
Headroom

Added Feb 24, 2026 · Agent & Tooling · Open Source

Tags: Python, Workflow Automation, LLM, Multimodal, SDK, Agent & Tooling, Model & Inference Framework, Developer Tools & Coding, Protocol, API & Integration

The Context Optimization Layer for LLM Applications: delivers 40-90% token reduction through deterministic compression and intelligent caching, with multi-modal support and a reversible CCR mechanism.

Overview

Headroom is context optimization middleware for LLM applications. It uses deterministic (non-LLM) compression algorithms combined with a CCR (Compress-Cache-Retrieve) mechanism to optimize context before requests are sent to the LLM.

Core Values:

  • Zero code changes (proxy mode)
  • Provider-agnostic (supports OpenAI, Anthropic, AWS, Google, and others)
  • Reversible compression (retrieve original data on demand)
  • Deterministic compression (no LLM calls, predictable results)
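The reversible compress-cache-retrieve idea can be sketched as follows. This is an illustrative sketch only: the function names, cache keying, and truncation-marker format are assumptions, not Headroom's actual API.

```python
import hashlib

_cache: dict[str, str] = {}  # original payloads, keyed by content hash

def ccr_compress(payload: str, keep_chars: int = 60) -> str:
    """Compress deterministically and cache the original so it stays
    retrievable. Illustrative sketch only, not Headroom's actual API."""
    key = hashlib.sha256(payload.encode()).hexdigest()[:12]
    _cache[key] = payload
    # The marker tells the model how to ask for the full content back.
    return f"{payload[:keep_chars]}... [truncated; retrieve with key={key}]"

def ccr_retrieve(key: str) -> str:
    """The tool the LLM calls when it needs the original data."""
    return _cache[key]

blob = "log line\n" * 100
short = ccr_compress(blob)
key = short.rsplit("key=", 1)[1].rstrip("]")
assert ccr_retrieve(key) == blob  # lossless round-trip
```

Because compression is deterministic and the original is cached rather than discarded, the LLM can always recover the full data via a tool call when the summary is not enough.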

Core Compression Capabilities

| Feature | Description | Compression Rate |
|---|---|---|
| SmartCrusher | Universal JSON compression; statistical array pattern analysis; preserves errors, anomalies, and boundary values | 70-95% |
| CodeCompressor | AST-aware code compression; supports Python/JS/Go/Rust/Java/C++ | - |
| LLMLingua-2 | ML-based text compression (optional) | ~20x |
| Image Compression | Image token optimization with a trained ML Router for automatic strategy selection | 40-90% |
| CCR Mechanism | Reversible compression with original-data caching; the LLM can retrieve originals on demand via tool calls | - |
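As a rough illustration of the SmartCrusher idea (keep anomalous entries verbatim, summarize the repetitive bulk), here is a minimal sketch. The anomaly heuristic and output field names are assumptions for illustration, not the real algorithm:

```python
import json

def crush_array(rows: list[dict]) -> dict:
    """Sketch of SmartCrusher-style array compression: keep rows that look
    anomalous (here, status == "error") verbatim and replace the repetitive
    rest with a compact summary. Illustrative only."""
    errors = [r for r in rows if r.get("status") == "error"]
    routine = [r for r in rows if r.get("status") != "error"]
    summary = {
        "omitted_rows": len(routine),
        "fields": sorted({k for r in routine for k in r}),
    }
    return {"kept": errors, "summary": summary}

rows = [{"id": i, "status": "ok"} for i in range(98)]
rows += [{"id": 98, "status": "error", "msg": "timeout"},
         {"id": 99, "status": "error", "msg": "HTTP 500"}]
crushed = crush_array(rows)
print(len(json.dumps(crushed)), "vs", len(json.dumps(rows)))
```

The 98 routine rows collapse into a two-field summary while both error rows survive intact, which is why this kind of compression is safe for agent tool outputs.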

Context Management

  • Intelligent Context Manager: Multi-factor importance scoring for message management (TOIN learning mode, semantic similarity, error metrics)
  • CacheAligner: Extracts dynamic content (dates, UUIDs) to stabilize prefixes, optimizing KV Cache hit rates
  • Rolling Window: Token limit-based rolling window management
  • Compression Summaries: Generates summaries of omitted content (e.g., "87 passed, 2 failed, 1 error")
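The CacheAligner idea can be sketched with plain regex substitution: pull volatile tokens out of the prompt so the remaining prefix is byte-identical across requests. The patterns and placeholder format below are assumptions for illustration, not Headroom's implementation.

```python
import re

UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def align_prefix(prompt: str) -> tuple[str, list[str]]:
    """Replace volatile tokens (dates, UUIDs) with stable placeholders so
    the prefix stays byte-identical across requests and can hit the
    provider's prompt/KV cache. Sketch of the CacheAligner idea."""
    extracted: list[str] = []
    def stash(match: re.Match) -> str:
        extracted.append(match.group(0))
        return "{var%d}" % (len(extracted) - 1)
    stable = ISO_DATE.sub(stash, UUID.sub(stash, prompt))
    return stable, extracted

a, _ = align_prefix("Today is 2026-02-24. Session 123e4567-e89b-12d3-a456-426614174000.")
b, _ = align_prefix("Today is 2026-02-25. Session 00000000-0000-0000-0000-000000000000.")
assert a == b  # identical prefixes across requests → cache hit
```

Since provider-side prompt caching matches exact prefixes, even a one-character difference (a new date) invalidates the cache; extracting the volatile parts restores the hit rate.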

Installation & Quick Start

pip install headroom-ai                 # Core library
pip install "headroom-ai[all]"          # Full installation (recommended)
pip install "headroom-ai[proxy]"        # Proxy server
pip install "headroom-ai[mcp]"          # Claude Code MCP integration
pip install "headroom-ai[langchain]"    # LangChain integration

Proxy Mode (Zero Code Changes)

headroom proxy --port 8787                                # start the local proxy
ANTHROPIC_BASE_URL=http://localhost:8787 claude           # route Claude Code through it
OPENAI_BASE_URL=http://localhost:8787/v1 cursor           # route OpenAI-compatible clients through it

Python SDK

import anthropic  # assumes the official Anthropic SDK is installed
from headroom import compress

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "..."}]  # your existing conversation

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=result.messages,
    max_tokens=1024,
)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")

Typical Use Cases & Results

| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 tokens | 1,408 tokens | 92% |
| SRE incident debugging | 65,694 tokens | 5,118 tokens | 92% |
| Codebase exploration | 78,502 tokens | 41,254 tokens | 47% |
| GitHub issue classification | 54,174 tokens | 14,761 tokens | 73% |
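The savings column is straightforward arithmetic on the before/after token counts (1 - after/before); a quick check:

```python
cases = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "Codebase exploration": (78_502, 41_254),
    "GitHub issue classification": (54_174, 14_761),
}
for name, (before, after) in cases.items():
    print(f"{name}: {1 - after / before:.0%}")
# Code search (100 results): 92%
# SRE incident debugging: 92%
# Codebase exploration: 47%
# GitHub issue classification: 73%
```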

Applicable scenarios:

  • AI agent tool-output compression (logs, search results, database queries)
  • Long-conversation context management
  • Image token optimization in multi-modal applications
  • Context budget control in RAG systems

Operations Features

  • Prometheus metrics endpoint
  • Request logging and cost tracking
  • Budget limits and rate control

Cloud Provider Support

headroom proxy --backend bedrock --region us-east-1     # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1 # Google Vertex
headroom proxy --backend azure                          # Azure OpenAI
headroom proxy --backend openrouter                     # OpenRouter
