A lightweight, modular, sovereign-by-design RAG framework supporting multimodal document ingestion, hybrid retrieval with reranking, and an OpenAI API-compatible interface.
OpenRag is a full-stack RAG framework developed by Linagora, a French open-source company, and built with a "sovereign-by-design" philosophy free from vendor lock-in. It covers the complete pipeline from document ingestion to question answering: multimodal file parsing (txt/md/pdf/docx/pptx/audio/images) with unified Markdown conversion; format-aware chunking enriched with contextual summaries; vectorization via jina-embeddings-v3 into Milvus; and hybrid semantic + BM25 retrieval with Query Reformulation, HyDE, and GTE/Jina v2 multilingual reranking. The LLM layer follows an OpenAI API-compatible design, connecting to Mistral, GPT-4, Claude, or local vLLM models, with an independently configured VLM for image understanding. A FastAPI service layer exposes an OpenAI-compatible Chat API for seamless integration with OpenWebUI, LangChain, and N8N. The built-in Indexer document-management UI and Chainlit chat UI both support i18n. Features include multi-tenant partition isolation, Token/OIDC authentication, Ray distributed parallelism, Kubernetes Helm Chart deployment, and an automatic evaluation pipeline using UMAP + HDBSCAN. Licensed under AGPL-3.0, primarily Python (94.2%), current version v1.1.9.
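Because the Chat API follows the OpenAI format, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python client; the base URL, token, and model name are placeholder assumptions, not documented values.

```python
from openai import OpenAI

# Minimal sketch of calling an OpenRag OpenAI-compatible Chat API.
# The endpoint, token, and model name below are assumptions for illustration.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local OpenRag FastAPI endpoint
    api_key="YOUR_OPENRAG_TOKEN",         # Bearer token when AUTH_MODE=token
)

response = client.chat.completions.create(
    model="openrag",  # placeholder model name
    messages=[
        {"role": "user", "content": "What do the indexed documents say about renewal terms?"}
    ],
)
print(response.choices[0].message.content)
```

The same endpoint can be plugged into OpenWebUI, LangChain, or N8N wherever an OpenAI-compatible backend is expected.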
Core Capabilities
- Multimodal Document Ingestion: Text (txt/md); documents (pdf/docx/doc/pptx, parsed with MarkerLoader by default for OCR and complex layouts, or optionally Docling); audio (wav/mp3/mp4, etc., with automatic transcription); images (png/jpeg/jpg/svg, with VLM-generated descriptions replacing the original images). All formats are unified to Markdown.
- Hybrid Retrieval & Reranking: Semantic vector search + BM25 keyword search, Query Reformulation and HyDE query augmentation, and optional multilingual reranking via Infinity Inference Server using GTE or Jina v2 (an illustrative fusion sketch follows this list).
- LLM-Agnostic Design: Supports any OpenAI API-compatible LLM as well as locally deployed vLLM models, with independent VLM configuration for image understanding.
- OpenAI API-Compatible Interface: Provides OpenAI-format compatible Chat API for integration with OpenWebUI, LangChain, N8N and other frontend/workflow tools.
- Multi-Tenant Partitions: Documents organized by Partition, supporting isolation of different user/team document collections.
- Automatic Evaluation Pipeline: Built-in UMAP + HDBSCAN clustering generates synthetic QA datasets from indexed documents, and a local LLM scores query-chunk pairs to quantify retrieval quality.
- Web UI: Native Indexer document management interface and Chainlit chat interface, both with i18n support.
- Authentication: Token mode (default Bearer Token) and OIDC mode (supporting Keycloak, LemonLDAP::NG and other IdPs).
- Distributed Scaling: Ray-based parallelization of chunking, embedding, and ingestion tasks across multi-node, multi-GPU setups; Kubernetes Helm Chart (charts/openrag-stack) and Ansible playbooks for production deployment.
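How the semantic and BM25 scores are fused before reranking is not detailed above. The sketch below shows one common strategy, reciprocal rank fusion, assuming hypothetical `vector_search` and `bm25_search` helpers; it is illustrative only and not OpenRag's actual implementation.

```python
from collections import defaultdict

def hybrid_retrieve(query, vector_search, bm25_search, top_k=20, k=60):
    """Illustrative reciprocal-rank fusion of semantic and BM25 rankings.

    `vector_search` and `bm25_search` are hypothetical callables that return
    ranked lists of chunk IDs for a query; OpenRag's fusion may differ.
    """
    scores = defaultdict(float)
    for ranking in (vector_search(query), bm25_search(query)):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] += 1.0 / (k + rank + 1)

    # The fused top-k candidates would then go to the optional reranker
    # (e.g. a GTE or Jina v2 model served via Infinity Inference Server).
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```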
Deployment
- Quick Start: Docker Compose with GPU (NVIDIA Container Toolkit) and CPU profiles.
- Production: Kubernetes Helm Chart or Ansible automation.
- Prerequisites: Python 3.12+, Docker and Docker Compose.
Key Configuration
| Setting | Description | Default |
|---|---|---|
| BASE_URL / API_KEY / MODEL | LLM endpoint, API key, and model name | Set manually (no default) |
| VLM_BASE_URL / VLM_MODEL | Vision-language model for image understanding | Same as LLM |
| EMBEDDER_MODEL_NAME | Embedding model | jinaai/jina-embeddings-v3 |
| RETRIEVER_TOP_K | Number of documents retrieved (top-K) | 20 |
| RERANKER_ENABLED | Enable reranking | true |
| RERANKER_MODEL | Reranking model | Alibaba-NLP/gte-multilingual-reranker-base |
| AUTH_MODE | Authentication mode | token |
| WEBSEARCH_API_TOKEN | Web search API token (optional) | Unset (web search silently disabled) |
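For reference, a minimal Python sketch of reading these settings from the environment; the variable names and defaults come from the table above, while anything else is an assumption.

```python
import os

# Sketch of loading OpenRag-style settings; names and defaults follow the table above.
llm_base_url = os.environ["BASE_URL"]   # no default: must be set manually
llm_api_key = os.environ["API_KEY"]
llm_model = os.environ["MODEL"]

vlm_base_url = os.getenv("VLM_BASE_URL", llm_base_url)  # falls back to the LLM endpoint
vlm_model = os.getenv("VLM_MODEL", llm_model)

embedder = os.getenv("EMBEDDER_MODEL_NAME", "jinaai/jina-embeddings-v3")
top_k = int(os.getenv("RETRIEVER_TOP_K", "20"))
reranker_enabled = os.getenv("RERANKER_ENABLED", "true").lower() == "true"
reranker_model = os.getenv("RERANKER_MODEL", "Alibaba-NLP/gte-multilingual-reranker-base")
auth_mode = os.getenv("AUTH_MODE", "token")

# Optional: web search is silently disabled when this token is unset.
websearch_token = os.getenv("WEBSEARCH_API_TOKEN")
```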