An opinionated, local-first Agentic RAG framework powered by LanceDB and Pydantic AI. Features hybrid search, multi-agent collaborative research, sandboxed code execution, and parsing of 40+ document formats, with MCP support.
## Overview

haiku.rag is a robust Retrieval-Augmented Generation (RAG) solution designed to handle private data in a local-first manner. It combines the efficient vector retrieval of LanceDB with the agent orchestration capabilities of Pydantic AI. Maintained by ggozad; current version 0.32.0, released under the MIT License.

## Core Features

### Multiple Agent Modes
- QA Agent: Precise Q&A with page number and section citations
- Research Agent: Graph-based multi-step workflow (Plan → Search → Evaluate → Synthesize) using pydantic-graph
- RLM Agent: Sandboxed Python code execution for cross-document computation and aggregation
- Conversational RAG: Multi-turn dialogue with conversation memory
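The Research Agent's Plan → Search → Evaluate → Synthesize workflow can be sketched as a simple loop. This is an illustrative outline only, with hypothetical stand-in functions; the real implementation is a typed graph built on pydantic-graph, not this code.

```python
# Sketch of a plan/search/evaluate/synthesize research loop.
# All callables passed in are hypothetical stand-ins, not haiku.rag APIs.
def research(question, plan, search, evaluate, synthesize, max_rounds=3):
    findings = []
    queries = plan(question)          # Plan: break the question into sub-queries
    for _ in range(max_rounds):
        for q in queries:
            findings.extend(search(q))            # Search: hybrid retrieval per query
        done, queries = evaluate(question, findings)  # Evaluate: enough evidence yet?
        if done:
            break
    return synthesize(question, findings)         # Synthesize: final grounded answer
```

The loop terminates either when the evaluation step judges coverage sufficient or after a fixed number of rounds, mirroring how graph-based agents bound multi-step retrieval.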
### Deep Document Understanding

- Powered by the Docling engine; supports 40+ formats including PDF, DOCX, PPTX, images
- Preserves document logical structure (headings, paragraphs, page numbers) with context expansion
- Visual Grounding: Highlight retrieved chunks on original page images
### Hybrid Search Technology
- Vector search + Full-text search (BM25) + Reciprocal Rank Fusion
- Reranking support: MxBAI, Cohere, Zero Entropy, vLLM
- Time Travel: Query database state at specific historical timestamps
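Reciprocal Rank Fusion (RRF) is the standard technique for merging the vector and BM25 result lists named above: each document scores the sum of 1/(k + rank) across the lists it appears in, with k = 60 as the conventional constant. The sketch below shows the generic algorithm; the function name and inputs are illustrative, not haiku.rag's internal API.

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking via RRF.

    A document's fused score is sum(1 / (k + rank)) over every list it
    appears in, so items ranked highly by multiple retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d2"]  # ranked by vector similarity
bm25_hits = ["d1", "d4", "d3"]    # ranked by BM25 score
fused = rrf_fuse([vector_hits, bm25_hits])  # d1 leads: it is top-2 in both lists
```

Note that a document found by only one retriever (like d4) can still outrank one ranked low by a single retriever, which is why RRF is robust without any score normalization.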
## Deployment & Integration

### Storage Architecture
- Embedded LanceDB, no additional server required
- Cloud storage support: S3, GCS, Azure, LanceDB Cloud
- File system monitoring with automatic indexing
### Interface Options
- Complete CLI and Python API
- MCP Server: Integrates with Claude Desktop and other AI assistants
- Inspector TUI: Terminal interface for browsing documents, chunks, and search results
## Requirements

- Python 3.12+
- Ollama (default embedding and LLM backend)
## Installation

```bash
# Full installation
pip install haiku.rag

# Slim installation
pip install haiku.rag-slim
```
## Quick Start

```bash
# Index documents
haiku-rag add-src paper.pdf

# Hybrid search
haiku-rag search "attention mechanism"

# Q&A with citations
haiku-rag ask "What datasets were used?" --cite

# Research mode
haiku-rag research "What are the limitations?"

# MCP server
haiku-rag serve --mcp --stdio
```
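To wire the MCP server into Claude Desktop, an entry along these lines goes in `claude_desktop_config.json` (the `mcpServers` shape is Claude Desktop's standard convention; the exact entry name and any extra arguments haiku.rag expects may differ, so check the project's docs):

```json
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
```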
## Python API Example

```python
import asyncio

from haiku.rag.client import HaikuRAG

async def main():
    async with HaikuRAG("research.lancedb", create=True) as rag:
        await rag.create_document_from_source("paper.pdf")
        results = await rag.search("self-attention")
        answer, citations = await rag.ask("What is the complexity?")

asyncio.run(main())
```
## Supported Providers
- Embeddings: Ollama (default), OpenAI, VoyageAI, LM Studio, vLLM
- QA/Research: All Pydantic AI supported models
## Use Cases
- Enterprise internal knowledge base Q&A
- Academic literature research and analysis
- Complex data analysis tasks requiring multi-document aggregation
- Local memory/knowledge retrieval backend for AI assistants (e.g., Claude Desktop)