The context optimization layer for LLM applications: 40-90% token reduction through deterministic compression and intelligent caching, with multi-modal support and a reversible CCR (Compress-Cache-Retrieve) mechanism.
## Overview
Headroom is context optimization middleware for LLM applications. It combines deterministic (non-LLM) compression algorithms with a CCR (Compress-Cache-Retrieve) mechanism to shrink context before requests are sent to the model.
Core Values:
- Zero code changes (proxy mode)
- Provider-agnostic (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure, OpenRouter, and more)
- Reversible compression (retrieve original data on demand)
- Deterministic compression (no LLM calls, predictable results)
## Core Compression Capabilities
| Feature | Description | Compression Rate |
|---|---|---|
| SmartCrusher | Universal JSON compression, statistical array pattern analysis, preserves errors/anomalies/boundary values | 70-95% |
| CodeCompressor | AST-aware code compression, supports Python/JS/Go/Rust/Java/C++ | - |
| LLMLingua-2 | ML-based text compression (optional) | ~20x |
| Image Compression | Image token optimization with trained ML Router for automatic strategy selection | 40-90% |
| CCR Mechanism | Reversible compression with original data caching, LLM can retrieve on demand via tool calls | - |
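The CCR row above can be made concrete with a minimal sketch. All names below are illustrative, not the actual Headroom API: the full payload is cached under a content hash, a truncated view plus a retrieval key is sent to the model, and a tool call carrying that key restores the original.

```python
import hashlib
import json

# Minimal sketch of a Compress-Cache-Retrieve (CCR) round trip.
# Names and structure are illustrative, not the real Headroom API.

_cache: dict[str, str] = {}

def ccr_compress(payload: list[dict], keep: int = 3) -> dict:
    """Cache the full payload; return a truncated view plus a retrieval key."""
    raw = json.dumps(payload)
    key = hashlib.sha256(raw.encode()).hexdigest()[:12]
    _cache[key] = raw
    return {
        "preview": payload[:keep],
        "omitted": max(len(payload) - keep, 0),
        "retrieval_key": key,  # the LLM can pass this back via a tool call
    }

def ccr_retrieve(key: str) -> list[dict]:
    """Restore the original data on demand (e.g. inside a tool handler)."""
    return json.loads(_cache[key])

results = [{"id": i, "status": "ok"} for i in range(100)]
compact = ccr_compress(results)
assert ccr_retrieve(compact["retrieval_key"]) == results
```

Because compression is deterministic, the same payload always hashes to the same key, so repeated tool outputs hit the cache instead of being re-stored.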
## Context Management
- Intelligent Context Manager: Multi-factor importance scoring for message management (TOIN learning mode, semantic similarity, error metrics)
- CacheAligner: Extracts dynamic content (dates, UUIDs) to stabilize prefixes, optimizing KV Cache hit rates
- Rolling Window: Token limit-based rolling window management
- Compression Summaries: Generates summaries of omitted content (e.g., "87 passed, 2 failed, 1 error")
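The CacheAligner idea can be sketched as follows (illustrative only, not the library's implementation): volatile values such as dates and UUIDs are swapped for stable placeholders, so the prompt prefix stays byte-identical across requests, which is what provider-side prefix/KV caches key on.

```python
import re

# Sketch of prefix stabilization: pull volatile values (ISO dates, UUIDs)
# out of the prompt and substitute stable placeholders. Illustrative only.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def stabilize(text: str) -> tuple[str, dict[str, str]]:
    """Replace dynamic tokens with placeholders; return the mapping."""
    extracted: dict[str, str] = {}

    def _sub(kind: str, pattern: re.Pattern, s: str) -> str:
        def repl(m: re.Match) -> str:
            slot = f"{{{kind}_{len(extracted)}}}"
            extracted[slot] = m.group(0)
            return slot
        return pattern.sub(repl, s)

    text = _sub("UUID", UUID_RE, text)  # replace UUIDs first so dates
    text = _sub("DATE", DATE_RE, text)  # inside them can't misfire
    return text, extracted

stable, values = stabilize(
    "Run 3f2504e0-4f89-11d3-9a0c-0305e82c3301 started 2025-01-15"
)
# stable == "Run {UUID_0} started {DATE_1}"
```

The stabilized text goes into the cached prefix; the extracted values can be appended after it, where a cache miss is cheap.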
## Installation & Quick Start
```bash
pip install headroom-ai               # Core library
pip install "headroom-ai[all]"        # Full installation (recommended)
pip install "headroom-ai[proxy]"      # Proxy server
pip install "headroom-ai[mcp]"        # Claude Code MCP integration
pip install "headroom-ai[langchain]"  # LangChain integration
```
### Proxy Mode (Zero Code Changes)
```bash
# Start the proxy, then point existing tools at it via their base-URL env vars:
headroom proxy --port 8787

ANTHROPIC_BASE_URL=http://localhost:8787 claude
OPENAI_BASE_URL=http://localhost:8787/v1 cursor
```
### Python SDK
```python
from anthropic import Anthropic
from headroom import compress

client = Anthropic()

# `messages` is your existing chat history (a list of role/content dicts).
# Compress it before sending the request to the model.
result = compress(messages, model="claude-sonnet-4-5-20250929")

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=result.messages,
)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```
## Typical Use Cases & Results
| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 tokens | 1,408 tokens | 92% |
| SRE incident debugging | 65,694 tokens | 5,118 tokens | 92% |
| Codebase exploration | 78,502 tokens | 41,254 tokens | 47% |
| GitHub issue classification | 54,174 tokens | 14,761 tokens | 73% |
Applicable scenarios:
- AI agent tool output compression (logs, search results, database queries)
- Long conversation context management
- Image token optimization in multi-modal applications
- Context budget control in RAG systems
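As a rough illustration of the tool-output case, a SmartCrusher-style pass might keep anomalous rows verbatim and replace the homogeneous majority with a summary (an illustrative sketch, not the actual algorithm):

```python
# Illustrative sketch of SmartCrusher-style tool-output compression:
# errors and anomalies always survive verbatim; the uniform majority
# is collapsed into a few samples plus a summary line.

def crush(rows: list[dict], keep_head: int = 2) -> dict:
    anomalies = [r for r in rows if r.get("status") != "ok"]
    normal = [r for r in rows if r.get("status") == "ok"]
    return {
        "head": normal[:keep_head],  # a few representative rows
        "anomalies": anomalies,      # errors are never dropped
        "summary": f"{len(normal)} ok, {len(anomalies)} failed (of {len(rows)})",
    }

rows = [{"id": i, "status": "ok"} for i in range(98)] + [
    {"id": 98, "status": "error", "msg": "timeout"},
    {"id": 99, "status": "error", "msg": "500"},
]
out = crush(rows)
# out["summary"] == "98 ok, 2 failed (of 100)"
```

This mirrors the compression-summary behavior described above ("87 passed, 2 failed, 1 error"): the model sees the shape of the data and every anomaly, at a fraction of the token cost.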
## Operations Features
- Prometheus metrics endpoint
- Request logging and cost tracking
- Budget limits and rate control
## Cloud Provider Support
```bash
headroom proxy --backend bedrock --region us-east-1      # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1  # Google Vertex AI
headroom proxy --backend azure                           # Azure OpenAI
headroom proxy --backend openrouter                      # OpenRouter
```