An opinionated, local-first Agentic RAG framework powered by LanceDB and Pydantic AI. Features hybrid search, multi-agent collaborative research, sandboxed code execution, and parsing of 40+ document formats, with MCP support.
## Overview

haiku.rag is a robust Retrieval-Augmented Generation (RAG) solution designed to handle private data in a local-first manner. It combines the efficient vector retrieval of LanceDB with the agent orchestration capabilities of Pydantic AI. Maintained by ggozad; current version 0.32.0, released under the MIT License.

## Core Features

### Multiple Agent Modes
- QA Agent: Precise Q&A with page number and section citations
- Research Agent: Graph-based multi-step workflow (Plan → Search → Evaluate → Synthesize) using pydantic-graph
- RLM Agent: Sandboxed Python code execution for cross-document computation and aggregation
- Conversational RAG: Multi-turn dialogue with conversation memory
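The Research Agent's Plan → Search → Evaluate → Synthesize workflow can be sketched as a simple loop. This is an illustrative outline only, with hypothetical stand-in functions; the real implementation is a typed graph built on pydantic-graph, not this code.

```python
# Sketch of a plan/search/evaluate/synthesize research loop.
# All callables passed in are hypothetical stand-ins, not haiku.rag APIs.
def research(question, plan, search, evaluate, synthesize, max_rounds=3):
    findings = []
    queries = plan(question)          # Plan: break the question into sub-queries
    for _ in range(max_rounds):
        for q in queries:
            findings.extend(search(q))            # Search: hybrid retrieval per query
        done, queries = evaluate(question, findings)  # Evaluate: enough evidence yet?
        if done:
            break
    return synthesize(question, findings)         # Synthesize: final grounded answer
```

The loop terminates either when the evaluation step judges coverage sufficient or after a fixed number of rounds, mirroring how graph-based agents bound multi-step retrieval.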
### Deep Document Understanding

- Powered by the Docling engine; supports 40+ formats including PDF, DOCX, PPTX, images
- Preserves document logical structure (headings, paragraphs, page numbers) with context expansion
- Visual Grounding: Highlight retrieved chunks on original page images
### Hybrid Search Technology
- Vector search + Full-text search (BM25) + Reciprocal Rank Fusion
- Reranking support: MxBAI, Cohere, Zero Entropy, vLLM
- Time Travel: Query database state at specific historical timestamps
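Reciprocal Rank Fusion (RRF) is the standard technique for merging the vector and BM25 result lists named above: each document scores the sum of 1/(k + rank) across the lists it appears in, with k = 60 as the conventional constant. The sketch below shows the generic algorithm; the function name and inputs are illustrative, not haiku.rag's internal API.

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking via RRF.

    A document's fused score is sum(1 / (k + rank)) over every list it
    appears in, so items ranked highly by multiple retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d2"]  # ranked by vector similarity
bm25_hits = ["d1", "d4", "d3"]    # ranked by BM25 score
fused = rrf_fuse([vector_hits, bm25_hits])  # d1 leads: it is top-2 in both lists
```

Note that a document found by only one retriever (like d4) can still outrank one ranked low by a single retriever, which is why RRF is robust without any score normalization.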
## Deployment & Integration

### Storage Architecture
- Embedded LanceDB, no additional server required
- Cloud storage support: S3, GCS, Azure, LanceDB Cloud
- File system monitoring with automatic indexing
### Interface Options
- Complete CLI and Python API
- MCP Server: Integrates with Claude Desktop and other AI assistants
- Inspector TUI: Terminal interface for browsing documents, chunks, and search results
## Requirements

- Python 3.12+
- Ollama (default embedding and LLM backend)
## Installation

```bash
# Full installation
pip install haiku.rag

# Slim installation
pip install haiku.rag-slim
```
## Quick Start

```bash
# Index documents
haiku-rag add-src paper.pdf

# Hybrid search
haiku-rag search "attention mechanism"

# Q&A with citations
haiku-rag ask "What datasets were used?" --cite

# Research mode
haiku-rag research "What are the limitations?"

# MCP server
haiku-rag serve --mcp --stdio
```
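To wire the MCP server into Claude Desktop, an entry along these lines goes in `claude_desktop_config.json` (the `mcpServers` shape is Claude Desktop's standard convention; the exact entry name and any extra arguments haiku.rag expects may differ, so check the project's docs):

```json
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
```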
## Python API Example

```python
import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        await rag.create_document_from_source("paper.pdf")
        results = await rag.search("self-attention")
        answer, citations = await rag.ask("What is the complexity?")

asyncio.run(main())
```
## Supported Providers
- Embeddings: Ollama (default), OpenAI, VoyageAI, LM Studio, vLLM
- QA/Research: All Pydantic AI supported models
## Use Cases
- Enterprise internal knowledge base Q&A
- Academic literature research and analysis
- Complex data analysis tasks requiring multi-document aggregation
- Local memory/knowledge retrieval backend for AI assistants (e.g., Claude Desktop)