A lightweight, modular, sovereign-by-design RAG framework supporting multimodal document ingestion, hybrid retrieval with reranking, and an OpenAI API-compatible interface.
OpenRag is a full-stack RAG framework developed by Linagora, a French open-source company, and built with a "sovereign-by-design" philosophy free from vendor lock-in. It covers the complete pipeline from document ingestion to question answering: multimodal file parsing (txt/md/pdf/docx/pptx/audio/images) with unified Markdown conversion; format-aware chunking enriched with contextual summaries; vectorization via jina-embeddings-v3 into Milvus; and hybrid semantic + BM25 retrieval with Query Reformulation, HyDE, and GTE/Jina v2 multilingual reranking. The LLM layer follows an OpenAI API-compatible design, connecting to Mistral, GPT-4, Claude, or local vLLM models, with an independently configured VLM for image understanding. A FastAPI service layer exposes an OpenAI-compatible Chat API for seamless integration with OpenWebUI, LangChain, and N8N. The built-in Indexer document-management UI and Chainlit chat UI both support i18n. Features include multi-tenant partition isolation, Token/OIDC authentication, Ray distributed parallelism, Kubernetes Helm Chart deployment, and an automatic evaluation pipeline using UMAP + HDBSCAN. Licensed under AGPL-3.0, primarily Python (94.2%), current version v1.1.9.
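Because the Chat API follows the OpenAI format, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python client; the base URL, token, and model name are placeholder assumptions, not documented values.

```python
from openai import OpenAI

# Minimal sketch of calling an OpenRag OpenAI-compatible Chat API.
# The endpoint, token, and model name below are assumptions for illustration.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local OpenRag FastAPI endpoint
    api_key="YOUR_OPENRAG_TOKEN",         # Bearer token when AUTH_MODE=token
)

response = client.chat.completions.create(
    model="openrag",  # placeholder model name
    messages=[
        {"role": "user", "content": "What do the indexed documents say about renewal terms?"}
    ],
)
print(response.choices[0].message.content)
```

The same endpoint can be plugged into OpenWebUI, LangChain, or N8N wherever an OpenAI-compatible backend is expected.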
Core Capabilities
- Multimodal Document Ingestion: Text (txt/md); documents (pdf/docx/doc/pptx, parsed with MarkerLoader by default for OCR and complex layouts, or optionally Docling); audio (wav/mp3/mp4, etc., with automatic transcription); images (png/jpeg/jpg/svg, with VLM-generated descriptions replacing the original images). All formats are unified to Markdown.
- Hybrid Retrieval & Reranking: Semantic vector search + BM25 keyword search, Query Reformulation and HyDE query augmentation, and optional multilingual reranking via Infinity Inference Server using GTE or Jina v2 (an illustrative fusion sketch follows this list).
- LLM-Agnostic Design: Supports any OpenAI API-compatible LLM as well as locally deployed vLLM models, with independent VLM configuration for image understanding.
- OpenAI API-Compatible Interface: Provides OpenAI-format compatible Chat API for integration with OpenWebUI, LangChain, N8N and other frontend/workflow tools.
- Multi-Tenant Partitions: Documents organized by Partition, supporting isolation of different user/team document collections.
- Automatic Evaluation Pipeline: Built-in UMAP + HDBSCAN clustering generates synthetic QA datasets from indexed documents, and a local LLM scores query-chunk pairs to quantify retrieval quality.
- Web UI: Native Indexer document management interface and Chainlit chat interface, both with i18n support.
- Authentication: Token mode (default Bearer Token) and OIDC mode (supporting Keycloak, LemonLDAP::NG and other IdPs).
- Distributed Scaling: Ray-based parallelization of chunking, embedding, and ingestion tasks across multi-node, multi-GPU setups; Kubernetes Helm Chart (charts/openrag-stack) and Ansible playbooks for production deployment.
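How the semantic and BM25 scores are fused before reranking is not detailed above. The sketch below shows one common strategy, reciprocal rank fusion, assuming hypothetical `vector_search` and `bm25_search` helpers; it is illustrative only and not OpenRag's actual implementation.

```python
from collections import defaultdict

def hybrid_retrieve(query, vector_search, bm25_search, top_k=20, k=60):
    """Illustrative reciprocal-rank fusion of semantic and BM25 rankings.

    `vector_search` and `bm25_search` are hypothetical callables that return
    ranked lists of chunk IDs for a query; OpenRag's fusion may differ.
    """
    scores = defaultdict(float)
    for ranking in (vector_search(query), bm25_search(query)):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] += 1.0 / (k + rank + 1)

    # The fused top-k candidates would then go to the optional reranker
    # (e.g. a GTE or Jina v2 model served via Infinity Inference Server).
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```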
Deployment
- Quick Start: Docker Compose with GPU (NVIDIA Container Toolkit) and CPU profiles.
- Production: Kubernetes Helm Chart or Ansible automation.
- Prerequisites: Python 3.12+, Docker and Docker Compose.
Key Configuration
| Setting | Description | Default |
|---|---|---|
| BASE_URL / API_KEY / MODEL | LLM endpoint, API key, and model name | Set manually (no default) |
| VLM_BASE_URL / VLM_MODEL | Vision-language model for image understanding | Same as LLM |
| EMBEDDER_MODEL_NAME | Embedding model | jinaai/jina-embeddings-v3 |
| RETRIEVER_TOP_K | Number of documents retrieved (top-K) | 20 |
| RERANKER_ENABLED | Enable reranking | true |
| RERANKER_MODEL | Reranking model | Alibaba-NLP/gte-multilingual-reranker-base |
| AUTH_MODE | Authentication mode | token |
| WEBSEARCH_API_TOKEN | Web search API token (optional) | Unset (web search silently disabled) |
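For reference, a minimal Python sketch of reading these settings from the environment; the variable names and defaults come from the table above, while anything else is an assumption.

```python
import os

# Sketch of loading OpenRag-style settings; names and defaults follow the table above.
llm_base_url = os.environ["BASE_URL"]   # no default: must be set manually
llm_api_key = os.environ["API_KEY"]
llm_model = os.environ["MODEL"]

vlm_base_url = os.getenv("VLM_BASE_URL", llm_base_url)  # falls back to the LLM endpoint
vlm_model = os.getenv("VLM_MODEL", llm_model)

embedder = os.getenv("EMBEDDER_MODEL_NAME", "jinaai/jina-embeddings-v3")
top_k = int(os.getenv("RETRIEVER_TOP_K", "20"))
reranker_enabled = os.getenv("RERANKER_ENABLED", "true").lower() == "true"
reranker_model = os.getenv("RERANKER_MODEL", "Alibaba-NLP/gte-multilingual-reranker-base")
auth_mode = os.getenv("AUTH_MODE", "token")

# Optional: web search is silently disabled when this token is unset.
websearch_token = os.getenv("WEBSEARCH_API_TOKEN")
```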