A modular Python toolkit developed by the University of Innsbruck that integrates information retrieval, re-ranking, and RAG generation, featuring 40+ pre-processed datasets and single-line pipeline construction.
Overview#
Rankify is an open-source Python toolkit developed by the Data Science team at the University of Innsbruck (DataScienceUIBK) to address the fragmentation of IR and RAG toolchains. It unifies document retrieval, re-ranking, and Retrieval-Augmented Generation (RAG) in a single framework. Current version: v0.1.4, released under the Apache-2.0 license.
Core Capabilities#
Retrieval#
- Sparse Retrieval: BM25
- Dense Retrieval: DPR, ANCE, ColBERT, BGE, Contriever, BPR, HyDE
- SOTA Retrievers: SFR, E5, GritLM, M2, Nomic, Instructor, RaDeR, ReasonIR, BGE-Reasoner, ReasonEmbed, DiverRetriever
- Pre-built Indices: Wikipedia and MS MARCO corpora
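To make the sparse-retrieval entry concrete, the following is a minimal, self-contained sketch of BM25 scoring over a tokenized in-memory corpus. It is not Rankify's implementation; the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the particular IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against query_tokens with BM25.

    Illustrative only: corpus_tokens is a list of token lists, and the
    k1/b defaults are common choices, not values taken from Rankify.
    """
    n_docs = len(corpus_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    ["machine", "learning", "models"],
    ["cooking", "recipes"],
    ["deep", "learning"],
]
scores = bm25_scores(["machine", "learning"], corpus)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
```

Dense retrievers replace this term-matching score with a similarity between learned query and passage embeddings, but the surrounding "score every candidate, then sort" structure is the same.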
Re-ranking#
Integrates 24+ state-of-the-art re-ranking models:
- Cross-Encoders
- RankGPT / RankGPT-API
- MonoT5, MonoBERT, RankT5
- LiT5Score, LiT5Distill
- Vicuna Reranker, Zephyr Reranker
- FlashRank, InRanker
- Transformer Reranker (bge-reranker, mxbai-rerank, gte-multilingual, etc.)
- API Services (Voyage, Jina, Mixedbread.ai)
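Whatever the model, a re-ranker follows the same pattern: score each (query, passage) pair, then reorder the candidates by that score. The sketch below shows the pattern with a toy token-overlap scorer standing in for a neural model. The names `rerank` and `overlap_score` are hypothetical and are not part of Rankify's API.

```python
def overlap_score(query, passage):
    """Toy relevance score: fraction of query tokens found in the passage.

    A stand-in for a real model such as a cross-encoder.
    """
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def rerank(query, passages, score_fn=overlap_score, top_k=None):
    """Return passages sorted by descending score_fn(query, passage)."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    # Python's sort is stable, so tied passages keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ordered = [passage for _, passage in scored]
    return ordered if top_k is None else ordered[:top_k]


candidates = [
    "Paris is the capital of France",
    "Berlin is in Germany",
    "The capital of France is Paris",
]
top_two = rerank("capital of France", candidates, top_k=2)
```

Swapping in a different model only means passing a different `score_fn`, which is the kind of uniform interface a re-ranking toolkit aims to provide.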
RAG Generation#
- Generation Strategies: Zero-shot, Basic-RAG, Chain-of-Thought-RAG, FiD (Fusion-in-Decoder), In-Context Learning RALM
- LLM Backends: Hugging Face, vLLM, LiteLLM, OpenAI
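The difference between zero-shot and basic RAG generation comes down to what goes into the prompt: the question alone, or the question plus retrieved passages. The sketch below illustrates that step. The helper `build_rag_prompt` and the prompt wording are assumptions for illustration and do not reflect Rankify's actual templates.

```python
def build_rag_prompt(question, passages, max_passages=3):
    """Build a prompt for an LLM from a question and retrieved passages.

    With no passages this degrades to a zero-shot prompt; otherwise the
    top passages are numbered and placed before the question.
    """
    selected = passages[:max_passages]
    if not selected:
        return f"Question: {question}\nAnswer:"
    context = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


zero_shot_prompt = build_rag_prompt("What is machine learning?", [])
rag_prompt = build_rag_prompt(
    "What is machine learning?",
    ["Machine learning is a field of study in artificial intelligence."],
)
```

The resulting string would then be sent to whichever backend is configured; that call is omitted here because it depends on the backend in use.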
Datasets & Evaluation#
- 40+ Pre-retrieved Benchmark Datasets: NQ, TriviaQA, HotpotQA, FEVER, ELI5, PopQA, Musique, StrategyQA, BoolQ, WebQ, etc.
- Each dataset ships with 1,000 pre-retrieved documents per query
- Evaluation Metrics: Recall@k, Precision@k, MRR, nDCG, MAP
- RAG Evaluation: Integrated RAGAS framework
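For readers unfamiliar with the ranking metrics listed above, here is a minimal sketch of their per-query forms using binary relevance. The function names are illustrative and independent of Rankify's own metrics module. MRR and MAP are obtained by averaging the per-query reciprocal rank and average precision over all queries.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    relevant = set(relevant_ids)
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary gains and a log2 position discount."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking: all relevant documents placed at the top positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
metrics = {
    "recall@2": recall_at_k(ranked, relevant, 2),
    "precision@2": precision_at_k(ranked, relevant, 2),
    "rr": reciprocal_rank(ranked, relevant),
    "ndcg@4": ndcg_at_k(ranked, relevant, 4),
}
```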
Architecture#
Modular Design#
- rankify.retrievers: Multiple retriever implementations
- rankify.models.reranking: Unified re-ranking interface
- rankify.generator: RAG generator
- rankify.dataset: Dataset management and loading
- rankify.metrics: Evaluation metrics
- rankify.agent: AI-assisted model selection (RankifyAgent)
- rankify.server: REST API server
- rankify.integrations: Framework integrations
Pipeline API#
Single-line pipeline creation:
from rankify import pipeline
# Complete RAG pipeline
rag = pipeline("rag", retriever="bge", reranker="flashrank", generator="basic-rag")
answers = rag("What is machine learning?", documents)
# Other pipeline types
pipeline("search") # Document retrieval only
pipeline("rerank") # Retrieval + re-ranking
Deployment#
REST API Server#
rankify serve --port 8000 --retriever bge --reranker flashrank
Python API Deployment#
from rankify.server import RankifyServer
server = RankifyServer(retriever="bge", reranker="flashrank")
server.start(port=8000)
Custom Index Building#
rankify-index index data/wikipedia_10k.jsonl --retriever bm25 --output ./indices
Framework Integration#
- LangChain
- LlamaIndex
- Gradio Interactive Interface (Web Playground)
Installation#
# Environment setup
conda create -n rankify python=3.10
conda activate rankify
# PyTorch installation
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
# Full installation
pip install "rankify[all]"
# Optional modules
pip install "rankify[retriever]" # Retrieval functionality
pip install "rankify[reranking]" # Re-ranking functionality
pip install "rankify[rag]" # RAG endpoints
Use Cases#
- Academic Research: Comparative studies of information retrieval and re-ranking methods
- RAG System Benchmarking
- QA System Prototyping
- Enterprise Document Retrieval and Ranking
- Knowledge Base QA System Construction
- Multi-model Performance Comparison
Important Notes#
- Full dataset is approximately 1.48 TB, requiring significant storage and bandwidth
- Some retrievers (e.g., ColBERT) require additional dependencies and a specific compilation environment
- Recommended: PyTorch 2.5.1 and Python 3.10+
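Given the size noted above, it may be worth checking free disk space before starting a full download. The following sketch uses only the standard library. The helper `has_room`, the 10% headroom, and the reading of 1.48 TB as decimal terabytes are assumptions made for illustration.

```python
import shutil

# Approximate full-dataset size from the note above, read as decimal TB.
FULL_DATASET_BYTES = 1_480 * 10**9


def has_room(path=".", required_bytes=FULL_DATASET_BYTES, headroom=0.10):
    """Return True if the filesystem holding `path` has enough free space.

    `headroom` adds a safety margin on top of the required size.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * (1 + headroom)


if not has_room("."):
    print("Not enough free space for the full dataset; consider a subset.")
```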
Authors#
Abdelrahman Abdallah, Bhawna Piryani, Jamshid Mozafari, Mohammed Ali, Adam Jatowt (University of Innsbruck)