The All-in-One GraphRAG Framework for enterprise-grade AI with document-centric ingestion, dual Tree+Graph retrieval, and verifiable attribution.
VeritasGraph is an enterprise-grade GraphRAG framework built on the principle "Don't Chunk. Graph." — ingesting whole pages or sections as graph nodes instead of traditional 500-token chunks, preserving document structural integrity. The framework employs a dual Tree + Graph retrieval architecture: PageIndex-style hierarchical TOC navigation runs in parallel with knowledge graph semantic reasoning, supporting cross-section linking and multi-hop reasoning for complex cross-document questions. Every generated claim provides 100% verifiable attribution traceable to exact source document locations, making it suitable for high-compliance domains such as legal, medical, and financial sectors.
## Retrieval & Reasoning
- Tree-based Navigation: PageIndex-style hierarchical TOC navigation with cross-section linking
- Graph-based Semantic Search: Knowledge graph-connected semantic retrieval, not mere vector similarity matching
- Multi-hop Reasoning: Complex reasoning across documents and sections
- Document-Centric Ingestion: Whole pages/sections as nodes, avoiding context loss from chunking
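The dual Tree + Graph idea, hierarchical TOC lookup running alongside multi-hop graph traversal, can be sketched with plain stdlib structures. The toy data, node names, and helper functions below are purely illustrative and are not part of the VeritasGraph API:

```python
from collections import deque

# Toy TOC tree: section -> child sections (tree-based navigation).
toc = {
    "Report": ["Methodology", "Findings"],
    "Methodology": [],
    "Findings": [],
}

# Toy knowledge graph: entity -> related entities (graph-based search).
graph = {
    "DrugA": ["TrialX"],
    "TrialX": ["OutcomeY"],
    "OutcomeY": [],
}

def tree_lookup(root, target):
    """Depth-first path from the TOC root down to a target section."""
    if root == target:
        return [root]
    for child in toc.get(root, []):
        path = tree_lookup(child, target)
        if path:
            return [root] + path
    return None

def graph_hops(start, target):
    """Breadth-first multi-hop path between two entities."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(tree_lookup("Report", "Methodology"))  # ['Report', 'Methodology']
print(graph_hops("DrugA", "OutcomeY"))       # ['DrugA', 'TrialX', 'OutcomeY']
```

Running both lookups in parallel and merging their results is the essence of answering a cross-document, multi-hop question.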
## Ingestion Sources
- PDF: via `pipeline.ingest_pdf()` or the CLI `veritasgraph ingest`
- YouTube: automatic subtitle extraction from a URL
- Web Articles: Direct URL ingestion via CLI
- Plain Text: Standard text ingestion
- Charts/Tables: Vision RAG mode converts to knowledge graph nodes
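A single ingestion entry point serving all of these sources implies some routing by source type. The classifier below is a minimal sketch of that idea, not VeritasGraph's actual internals; the function name and routing rules are assumptions:

```python
from urllib.parse import urlparse

def classify_source(source: str) -> str:
    """Guess which ingestion path a source string should take."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if "youtube.com" in host or "youtu.be" in host:
            return "youtube"  # subtitle extraction
        return "web"          # direct article ingestion
    if source.lower().endswith(".pdf"):
        return "pdf"          # whole pages/sections become nodes
    return "text"             # plain text ingestion

print(classify_source("https://youtu.be/abc123"))  # youtube
print(classify_source("report.pdf"))               # pdf
```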
## Verifiability & Visualization
- Verifiable Attribution: Every claim includes a precise attribution path traceable to exact source locations
- Interactive Graph Visualization: PyVis-powered 2D graph browser showing entities, relations, and reasoning paths in real time
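One way to model a verifiable attribution path is as a record that ties each claim to an exact source location. The field names below are a sketch for illustration, not VeritasGraph's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribution:
    """A generated claim's trace back to its exact source location."""
    claim: str
    document: str
    section: str
    page: int

attr = Attribution(
    claim="Revenue grew 12% year over year.",
    document="annual_report.pdf",
    section="Financial Highlights",
    page=7,
)
print(f"{attr.document} > {attr.section} (p. {attr.page})")
```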
## Deployment Modes

| Mode | Description | Dependencies |
|---|---|---|
| `lite` | Cloud API, zero configuration | OpenAI-compatible API key |
| `local` | Fully offline, Ollama local inference | Ollama (8 GB RAM required) |
| `full` | Production-grade, one-click Docker | Docker + Neo4j + Ollama |
## LLM/Embedding Compatibility
Unified through an OpenAI-compatible API abstraction, supporting mixed configurations (e.g., Groq for LLM + Ollama for Embeddings): OpenAI, Azure OpenAI, Groq, Together AI, OpenRouter, LM Studio, vLLM, Ollama.
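Because every provider is addressed through the same OpenAI-compatible interface, a mixed setup reduces to pointing the LLM and embedding clients at different base URLs. The dictionary below is an illustrative sketch of the Groq + Ollama combination mentioned above; the base URLs are those providers' documented OpenAI-compatible endpoints, while the structure and model names are example assumptions, not VeritasGraph's config format:

```python
# Mixed configuration: Groq serves the LLM, a local Ollama
# instance serves embeddings -- both behind OpenAI-compatible APIs.
config = {
    "llm": {
        "api_base": "https://api.groq.com/openai/v1",
        "model": "llama-3.1-70b-versatile",
    },
    "embedding": {
        "api_base": "http://localhost:11434/v1",
        "model": "nomic-embed-text",
    },
}

print(config["llm"]["api_base"])
```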
## Architecture
- Graph Engine Layer: Based on Microsoft GraphRAG for indexing and querying, Neo4j as persistent graph database
- Retrieval Layer: Tree-based navigation and Graph-based semantic search running in parallel
- Document Processing Layer: Document-centric ingestion with whole pages/sections as single retrievable nodes
- LLM Abstraction Layer: OpenAI-compatible API interface unifying multiple local/cloud LLM providers
- Visualization Layer: PyVis interactive 2D graph browser, Gradio Web UI
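The layers above stack top-down, from the user-facing UI to persistent graph storage. A trivial rendering of that ordering (the list is a reading aid, not code from the project):

```python
# Illustrative layer ordering, top (user-facing) to bottom (storage).
layers = [
    "Visualization (PyVis browser, Gradio Web UI)",
    "LLM Abstraction (OpenAI-compatible API)",
    "Retrieval (Tree navigation + Graph semantic search)",
    "Document Processing (whole pages/sections as nodes)",
    "Graph Engine (Microsoft GraphRAG + Neo4j)",
]
for depth, layer in enumerate(layers):
    print("  " * depth + layer)
```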
## Installation & Quick Start

```bash
pip install veritasgraph
veritasgraph demo --mode=lite
```
Optional dependencies: `veritasgraph[web]` (Gradio UI + visualization), `veritasgraph[graphrag]` (Microsoft GraphRAG integration), `veritasgraph[ingest]` (YouTube & web ingestion), `veritasgraph[all]` (all features).
Docker one-click deployment (full mode):
```bash
cd docker/five-minute-magic-onboarding
docker compose up --build
# Ports: Gradio UI :7860, Neo4j Browser :7474, Ollama API :11434
```
## Python API Example

```python
from veritasgraph import VisionRAGPipeline, VisionRAGConfig

# Basic usage: ingest a PDF, then query it.
pipeline = VisionRAGPipeline()
doc = pipeline.ingest_pdf("document.pdf")
result = pipeline.query("What are the key findings?")
print(result.answer)

# Document-centric mode: inspect and navigate the section tree.
config = VisionRAGConfig(ingest_mode="document-centric")
pipeline = VisionRAGPipeline(config)
doc = pipeline.ingest_pdf("annual_report.pdf")
print(pipeline.get_document_tree())
section = pipeline.navigate_to_section("Methodology")
```
## Key Environment Variables

| Variable | Purpose |
|---|---|
| `GRAPHRAG_API_KEY` | LLM API key |
| `GRAPHRAG_LLM_MODEL` | LLM model name |
| `GRAPHRAG_LLM_API_BASE` | LLM API base URL |
| `GRAPHRAG_EMBEDDING_API_KEY` | Embedding API key |
| `GRAPHRAG_EMBEDDING_MODEL` | Embedding model name |
| `GRAPHRAG_EMBEDDING_API_BASE` | Embedding API base URL |
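Since the embedding variables mirror the LLM ones, a configuration loader can fall back to the LLM key when no embedding-specific key is set. The helper below is a sketch of that pattern; the fallback behavior is an assumption for illustration, not a documented VeritasGraph guarantee:

```python
def resolve_embedding_key(env: dict) -> str:
    """Use the embedding-specific key if set, else fall back to the LLM key."""
    return env.get("GRAPHRAG_EMBEDDING_API_KEY") or env["GRAPHRAG_API_KEY"]

# In practice `env` would be os.environ or values loaded from a .env file.
env = {"GRAPHRAG_API_KEY": "sk-example", "GRAPHRAG_LLM_MODEL": "gpt-4o-mini"}
print(resolve_embedding_key(env))  # sk-example
```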
## Unconfirmed Information
- Exact PyPI version and release date (JS rendering limitation on PyPI page)
- Independent website/docs URL (README mentions "Live documentation" but no URL provided)
- Formal publication of the accompanying paper (PDF in repo, no arXiv or journal link found)
- Deployed HuggingFace Space address
- Whether GPU is mandatory for local mode
- Performance benchmarks for large-scale document sets