Rankify

A modular Python toolkit developed at the University of Innsbruck that integrates retrieval, re-ranking, and Retrieval-Augmented Generation (RAG), with 40+ pre-retrieved benchmark datasets and single-line pipeline construction.

Overview#

Rankify is an open-source Python toolkit developed by the Data Science team at the University of Innsbruck (DataScienceUIBK) to address the fragmentation of information retrieval (IR) and RAG toolchains. It unifies document retrieval, re-ranking, and Retrieval-Augmented Generation (RAG) in a single framework. Current version: v0.1.4, released under the Apache-2.0 license.

Core Capabilities#

Retrieval#

Sparse Retrieval: BM25
Dense Retrieval: DPR, ANCE, ColBERT, BGE, Contriever, BPR, HyDE
SOTA Retrievers: SFR, E5, GritLM, M2, Nomic, Instructor, RaDeR, ReasonIR, BGE-Reasoner, ReasonEmbed, DiverRetriever
Pre-built Indices: Wikipedia and MS MARCO corpora
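
To make the sparse-retrieval entry above concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not Rankify's BM25 retriever: the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Illustrative only: tokenization is a lowercase whitespace split,
    and `docs` is assumed to be a non-empty list of strings.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (always positive)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document-length normalization
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "machine learning is a branch of artificial intelligence",
    "the alps are a mountain range in europe",
    "deep learning is a kind of machine learning",
]
scores = bm25_scores("machine learning", docs)
# Document indices ordered from most to least relevant
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Dense retrievers such as DPR or BGE replace this term-matching score with a similarity between learned query and document embeddings.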

Re-ranking#

Integrates 24+ state-of-the-art re-ranking models:

Cross-Encoders
RankGPT / RankGPT-API
MonoT5, MonoBERT, RankT5
LiT5Score, LiT5Distill
Vicuna Reranker, Zephyr Reranker
FlashRank, InRanker
Transformer Reranker (bge-reranker, mxbai-rerank, gte-multilingual, etc.)
API Services (Voyage, Jina, Mixedbread.ai)

RAG Generation#

Generation Strategies: Zero-shot, Basic-RAG, Chain-of-Thought-RAG, FiD (Fusion-in-Decoder), In-Context Learning RALM
LLM Backends: Hugging Face, vLLM, LiteLLM, OpenAI

Datasets & Evaluation#

40+ Pre-retrieved Benchmark Datasets: NQ, TriviaQA, HotpotQA, FEVER, ELI5, PopQA, MuSiQue, StrategyQA, BoolQ, WebQ, etc.
Each dataset provides 1,000 pre-retrieved documents per query
Evaluation Metrics: Recall@k, Precision@k, MRR, nDCG, MAP
RAG Evaluation: Integrated RAGAS framework
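
For reference, the ranking metrics listed above can be written out in a few lines for a single query with binary relevance. This is a sketch of the standard definitions, not the code in `rankify.metrics`; all function names here are illustrative. MRR and MAP are the means of `reciprocal_rank` and `average_precision` over all queries.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision values taken at each relevant hit."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Example: relevant documents d1 and d2 appear at ranks 2 and 4
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2"}
p3 = precision_at_k(ranked, relevant, 3)   # 1 hit in top 3
r3 = recall_at_k(ranked, relevant, 3)      # 1 of 2 relevant found
rr = reciprocal_rank(ranked, relevant)     # first hit at rank 2
ap = average_precision(ranked, relevant)   # (1/2 + 2/4) / 2
ndcg5 = ndcg_at_k(ranked, relevant, 5)
```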

Architecture#

Modular Design#

rankify.retrievers: Multiple retriever implementations
rankify.models.reranking: Unified re-ranking interface
rankify.generator: RAG generator
rankify.dataset: Dataset management and loading
rankify.metrics: Evaluation metrics
rankify.agent: AI-assisted model selection (RankifyAgent)
rankify.server: REST API server
rankify.integrations: Framework integrations

Pipeline API#

Single-line pipeline creation:

from rankify import pipeline

# Complete RAG pipeline
# `documents` must be defined before this call; it holds the documents to search
rag = pipeline("rag", retriever="bge", reranker="flashrank", generator="basic-rag")
answers = rag("What is machine learning?", documents)

# Other pipeline types
pipeline("search")  # Document retrieval only
pipeline("rerank")  # Retrieval + re-ranking
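
The idea behind such a pipeline is that each stage takes the query and a list of documents and hands a narrower or reordered list to the next stage. The sketch below illustrates that composition pattern in plain Python. It does not reflect Rankify's internals: `make_pipeline`, `keyword_retriever`, and `overlap_reranker` are hypothetical stand-ins.

```python
import re
from typing import Callable, List

# A stage maps (query, documents) to a filtered or reordered document list
Stage = Callable[[str, List[str]], List[str]]

def tokens(text: str) -> set:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def make_pipeline(*stages: Stage) -> Stage:
    """Chain stages so the output of one becomes the input of the next."""
    def run(query: str, documents: List[str]) -> List[str]:
        for stage in stages:
            documents = stage(query, documents)
        return documents
    return run

def keyword_retriever(query: str, documents: List[str]) -> List[str]:
    """Toy retriever: keep documents sharing at least one query term."""
    terms = tokens(query)
    return [d for d in documents if terms & tokens(d)]

def overlap_reranker(query: str, documents: List[str]) -> List[str]:
    """Toy re-ranker: order documents by number of shared query terms."""
    terms = tokens(query)
    return sorted(documents, key=lambda d: len(terms & tokens(d)), reverse=True)

search_then_rerank = make_pipeline(keyword_retriever, overlap_reranker)
corpus = [
    "Learning rates control optimizer step size",
    "The Alps span eight countries",
    "Machine learning studies algorithms that learn from data",
]
results = search_then_rerank("What is machine learning?", corpus)
```

A generation stage would follow the same shape, except that it returns an answer string built from the top-ranked documents instead of a document list.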

Deployment#

REST API Server#

rankify serve --port 8000 --retriever bge --reranker flashrank

Python API Deployment#

from rankify.server import RankifyServer
server = RankifyServer(retriever="bge", reranker="flashrank")
server.start(port=8000)

Custom Index Building#

rankify-index index data/wikipedia_10k.jsonl --retriever bm25 --output ./indices

Framework Integration#

LangChain
LlamaIndex
Gradio Interactive Interface (Web Playground)

Installation#

# Environment setup
conda create -n rankify python=3.10
conda activate rankify

# PyTorch installation
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

# Full installation
pip install "rankify[all]"

# Optional modules
pip install "rankify[retriever]"  # Retrieval functionality
pip install "rankify[reranking]"   # Re-ranking functionality
pip install "rankify[rag]"         # RAG endpoints
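
After installing, a quick sanity check of the environment can save debugging time later. The snippet below is a small sketch that uses only the standard library. The helper name `check_environment` is made up for this example, and the package names passed to it are assumed to match the distribution names installed above.

```python
import sys
from importlib import metadata

def check_environment(min_python=(3, 10), packages=("torch", "rankify")):
    """Report whether Python is recent enough and which packages are installed.

    Returns a dict with "python_ok" (bool) and, per package, the installed
    version string or None when the package is not found.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```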

Use Cases#

Academic Research: Comparative studies of information retrieval and re-ranking methods
RAG System Benchmarking
QA System Prototyping
Enterprise Document Retrieval and Ranking
Knowledge Base QA System Construction
Multi-model Performance Comparison

Important Notes#

Full dataset is approximately 1.48 TB, requiring significant storage and bandwidth
Some retrievers (e.g., ColBERT) require additional build-time dependencies to be present in the environment
Recommended: PyTorch 2.5.1 and Python 3.10+
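
Given the dataset size noted above, it may be worth checking free disk space before starting a large download. A minimal sketch with the standard library follows. The helper `has_room_for` is hypothetical, and reading 1.48 TB as decimal terabytes (10^12 bytes each) is an assumption.

```python
import shutil

# 1.48 TB, interpreted as decimal terabytes (assumption)
FULL_DATASET_BYTES = int(1.48 * 1000**4)

def has_room_for(path, required_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes

if not has_room_for(".", FULL_DATASET_BYTES):
    print("Not enough free space for the full dataset; consider downloading a subset.")
```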

Authors#

Abdelrahman Abdallah, Bhawna Piryani, Jamshid Mozafari, Mohammed Ali, Adam Jatowt (University of Innsbruck)