The context optimization layer for LLM applications: 40-90% token reduction through deterministic compression and intelligent caching, with multi-modal support and a reversible CCR (Compress-Cache-Retrieve) mechanism.
## Overview
Headroom is context optimization middleware for LLM applications. It combines deterministic (non-LLM) compression algorithms with a CCR (Compress-Cache-Retrieve) mechanism to shrink context before requests are sent to the model.
Core Values:
- Zero code changes (proxy mode)
- Provider-agnostic (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure, OpenRouter, and more)
- Reversible compression (retrieve original data on demand)
- Deterministic compression (no LLM calls, predictable results)
## Core Compression Capabilities
| Feature | Description | Compression Rate |
|---|---|---|
| SmartCrusher | Universal JSON compression, statistical array pattern analysis, preserves errors/anomalies/boundary values | 70-95% |
| CodeCompressor | AST-aware code compression, supports Python/JS/Go/Rust/Java/C++ | - |
| LLMLingua-2 | ML-based text compression (optional) | ~20x |
| Image Compression | Image token optimization with trained ML Router for automatic strategy selection | 40-90% |
| CCR Mechanism | Reversible compression with original data caching, LLM can retrieve on demand via tool calls | - |
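The CCR row above can be made concrete with a minimal sketch. All names below are illustrative, not the actual Headroom API: the full payload is cached under a content hash, a truncated view plus a retrieval key is sent to the model, and a tool call carrying that key restores the original.

```python
import hashlib
import json

# Minimal sketch of a Compress-Cache-Retrieve (CCR) round trip.
# Names and structure are illustrative, not the real Headroom API.

_cache: dict[str, str] = {}

def ccr_compress(payload: list[dict], keep: int = 3) -> dict:
    """Cache the full payload; return a truncated view plus a retrieval key."""
    raw = json.dumps(payload)
    key = hashlib.sha256(raw.encode()).hexdigest()[:12]
    _cache[key] = raw
    return {
        "preview": payload[:keep],
        "omitted": max(len(payload) - keep, 0),
        "retrieval_key": key,  # the LLM can pass this back via a tool call
    }

def ccr_retrieve(key: str) -> list[dict]:
    """Restore the original data on demand (e.g. inside a tool handler)."""
    return json.loads(_cache[key])

results = [{"id": i, "status": "ok"} for i in range(100)]
compact = ccr_compress(results)
assert ccr_retrieve(compact["retrieval_key"]) == results
```

Because compression is deterministic, the same payload always hashes to the same key, so repeated tool outputs hit the cache instead of being re-stored.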
## Context Management
- Intelligent Context Manager: Multi-factor importance scoring for message management (TOIN learning mode, semantic similarity, error metrics)
- CacheAligner: Extracts dynamic content (dates, UUIDs) to stabilize prefixes, optimizing KV Cache hit rates
- Rolling Window: Token limit-based rolling window management
- Compression Summaries: Generates summaries of omitted content (e.g., "87 passed, 2 failed, 1 error")
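The CacheAligner idea can be sketched as follows (illustrative only, not the library's implementation): volatile values such as dates and UUIDs are swapped for stable placeholders, so the prompt prefix stays byte-identical across requests, which is what provider-side prefix/KV caches key on.

```python
import re

# Sketch of prefix stabilization: pull volatile values (ISO dates, UUIDs)
# out of the prompt and substitute stable placeholders. Illustrative only.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def stabilize(text: str) -> tuple[str, dict[str, str]]:
    """Replace dynamic tokens with placeholders; return the mapping."""
    extracted: dict[str, str] = {}

    def _sub(kind: str, pattern: re.Pattern, s: str) -> str:
        def repl(m: re.Match) -> str:
            slot = f"{{{kind}_{len(extracted)}}}"
            extracted[slot] = m.group(0)
            return slot
        return pattern.sub(repl, s)

    text = _sub("UUID", UUID_RE, text)  # replace UUIDs first so dates
    text = _sub("DATE", DATE_RE, text)  # inside them can't misfire
    return text, extracted

stable, values = stabilize(
    "Run 3f2504e0-4f89-11d3-9a0c-0305e82c3301 started 2025-01-15"
)
# stable == "Run {UUID_0} started {DATE_1}"
```

The stabilized text goes into the cached prefix; the extracted values can be appended after it, where a cache miss is cheap.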
## Installation & Quick Start
```bash
pip install headroom-ai               # Core library
pip install "headroom-ai[all]"        # Full installation (recommended)
pip install "headroom-ai[proxy]"      # Proxy server
pip install "headroom-ai[mcp]"        # Claude Code MCP integration
pip install "headroom-ai[langchain]"  # LangChain integration
```
### Proxy Mode (Zero Code Changes)
```bash
# Start the proxy, then point existing tools at it via their base-URL env vars:
headroom proxy --port 8787

ANTHROPIC_BASE_URL=http://localhost:8787 claude
OPENAI_BASE_URL=http://localhost:8787/v1 cursor
```
### Python SDK
```python
from anthropic import Anthropic
from headroom import compress

client = Anthropic()

# `messages` is your existing chat history (a list of role/content dicts).
# Compress it before sending the request to the model.
result = compress(messages, model="claude-sonnet-4-5-20250929")

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=result.messages,
)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```
## Typical Use Cases & Results
| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 tokens | 1,408 tokens | 92% |
| SRE incident debugging | 65,694 tokens | 5,118 tokens | 92% |
| Codebase exploration | 78,502 tokens | 41,254 tokens | 47% |
| GitHub issue classification | 54,174 tokens | 14,761 tokens | 73% |
Applicable scenarios:
- AI agent tool output compression (logs, search results, database queries)
- Long conversation context management
- Image token optimization in multi-modal applications
- Context budget control in RAG systems
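As a rough illustration of the tool-output case, a SmartCrusher-style pass might keep anomalous rows verbatim and replace the homogeneous majority with a summary (an illustrative sketch, not the actual algorithm):

```python
# Illustrative sketch of SmartCrusher-style tool-output compression:
# errors and anomalies always survive verbatim; the uniform majority
# is collapsed into a few samples plus a summary line.

def crush(rows: list[dict], keep_head: int = 2) -> dict:
    anomalies = [r for r in rows if r.get("status") != "ok"]
    normal = [r for r in rows if r.get("status") == "ok"]
    return {
        "head": normal[:keep_head],  # a few representative rows
        "anomalies": anomalies,      # errors are never dropped
        "summary": f"{len(normal)} ok, {len(anomalies)} failed (of {len(rows)})",
    }

rows = [{"id": i, "status": "ok"} for i in range(98)] + [
    {"id": 98, "status": "error", "msg": "timeout"},
    {"id": 99, "status": "error", "msg": "500"},
]
out = crush(rows)
# out["summary"] == "98 ok, 2 failed (of 100)"
```

This mirrors the compression-summary behavior described above ("87 passed, 2 failed, 1 error"): the model sees the shape of the data and every anomaly, at a fraction of the token cost.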
## Operations Features
- Prometheus metrics endpoint
- Request logging and cost tracking
- Budget limits and rate control
## Cloud Provider Support
```bash
headroom proxy --backend bedrock --region us-east-1      # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1  # Google Vertex AI
headroom proxy --backend azure                           # Azure OpenAI
headroom proxy --backend openrouter                      # OpenRouter
```