A secure, cloud-sandboxed Recursive Language Models (RLM) framework built with DSPy and Modal for long-context code and document processing, offering Web UI, CLI, HTTP API, WebSocket, and MCP Server interfaces.
## Overview
Fleet-RLM is a Recursive Language Models (RLM) implementation built on the DSPy declarative programming framework and the Modal cloud sandbox. It originates from MIT/Stanford RLM research (arXiv:2512.24601), is maintained by the Qredence organization, and is currently at version v0.4.8 with 422+ commits under active development.
## Core Features
### Web UI Interaction (`fleet web`)
- Browser-based conversation with RLM-powered agents
- Runtime configuration (LM / Modal) directly in UI
- Access at: http://localhost:8000
### Secure Sandbox Execution
- Modal-based cloud sandbox for secure code execution
- Recursive long-context task processing that breaks through model context window limits
- Isolated runtime environment for untrusted code
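The recursive strategy can be sketched in a few lines. The helpers below are purely illustrative: `summarize` is a stand-in for a bounded-context model call, whereas Fleet-RLM's actual implementation delegates each chunk to a Modal-sandboxed sub-agent.

```python
# Illustrative sketch of recursive long-context processing. `summarize`
# stands in for a single bounded-context model call; Fleet-RLM instead
# delegates each chunk to a sandboxed sub-agent.

def summarize(text: str) -> str:
    """Placeholder for one model call with a bounded context."""
    return text[:40]

def recursive_process(text: str, window: int = 100) -> str:
    """Split text that exceeds the window, process each chunk,
    then recursively process the merged intermediate results."""
    if len(text) <= window:
        return summarize(text)
    chunks = [text[i:i + window] for i in range(0, len(text), window)]
    merged = " ".join(recursive_process(chunk, window) for chunk in chunks)
    return recursive_process(merged, window)

result = recursive_process("x" * 1000, window=100)
```

However long the input, the final answer is always produced by a single call that fits in the window; only the recursion depth grows.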
### Document Analysis
- PDF ingestion (MarkItDown with pypdf fallback)
- Intelligent processing of large documents
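The fallback chain can be sketched as a loader that prefers MarkItDown and degrades to pypdf. This is assumed behavior based on the feature list; the actual ingestion code may differ.

```python
# Sketch of a MarkItDown-first, pypdf-fallback PDF loader (assumed
# behavior; Fleet-RLM's real ingestion code may differ).

def extract_pdf_text(path: str) -> str:
    try:
        from markitdown import MarkItDown  # preferred: Markdown-aware output
        return MarkItDown().convert(path).text_content
    except ImportError:
        pass
    try:
        from pypdf import PdfReader  # fallback: plain text per page
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    except ImportError as exc:
        raise RuntimeError("Install markitdown or pypdf for PDF ingestion") from exc
```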
### Execution Observability
- Streaming output of execution events and traces
- WebSocket execution stream (`/api/v1/ws/execution`) for real-time monitoring of agent execution
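A client consuming the stream would parse each message and dispatch on its event type. Here is a minimal sketch with a hypothetical payload shape; the actual schema of `/api/v1/ws/execution` may differ.

```python
import json

# In practice `raw` would arrive over a WebSocket connection to
# ws://localhost:8000/api/v1/ws/execution. The event shape below is
# hypothetical; consult the API docs for the real schema.
raw = '{"type": "tool_call", "step": 2, "tool": "python", "output": "done"}'

event = json.loads(raw)
if event["type"] == "tool_call":
    line = f"step {event['step']}: {event['tool']} -> {event['output']}"
```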
### Multiple Interface Support
- CLI: `fleet` / `fleet-rlm` / `rlm-modal` commands
- HTTP API: FastAPI REST service
- MCP Server: `fleet-rlm serve-mcp --transport stdio` (Claude Desktop integration)
- WebSocket: Real-time chat and execution streams
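For the Claude Desktop integration, MCP servers are registered under the `mcpServers` key of `claude_desktop_config.json`. The entry below is a sketch generated from the command above; verify the command path against your installation.

```python
import json

# Sketch of a Claude Desktop MCP registration for Fleet-RLM. The
# "mcpServers" key is standard Claude Desktop config; the entry details
# should be checked against your installed fleet-rlm binary.
config = {
    "mcpServers": {
        "fleet-rlm": {
            "command": "fleet-rlm",
            "args": ["serve-mcp", "--transport", "stdio"],
        }
    }
}
snippet = json.dumps(config, indent=2)
```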
### Multi-tenant Persistence
- Neon database + RLS (Row-Level Security) isolation
- Tenant-aware run/step/artifact/memory persistence
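Conceptually, RLS isolation means every tenant-scoped table carries a Postgres row-level security policy like the one below. Table, column, and session-variable names here are hypothetical; Fleet-RLM's actual schema may differ.

```python
# Hypothetical Postgres RLS setup for a tenant-scoped "runs" table,
# e.g. as applied in a migration. All identifiers are assumptions.
RLS_POLICY = """
ALTER TABLE runs ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON runs
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""
```

With such a policy in place, every query automatically sees only rows whose `tenant_id` matches the current session's tenant, so the application code needs no per-query filtering.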
## Architecture Design
Four-layer architecture:
| Layer | Core Modules | Responsibility |
|---|---|---|
| Orchestration | `react/agent.py`, `streaming.py`, `tools*.py` | ReAct loop orchestration, tool dispatch, trace and stream event generation |
| Execution | `core/interpreter.py`, `driver.py`, `driver_factories.py` | Modal remote code execution, execution config, sandbox protocol handling |
| Service | `server/routers/*`, `deps.py`, `config.py` | HTTP/WebSocket transport, auth, runtime settings, session lifecycle |
| Persistence | `db/*`, `migrations/` | Tenant-aware persistence, RLS isolation |
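To make the layer boundaries concrete, here is a toy sketch (not actual Fleet-RLM code) of how a request flows: orchestration drives the loop and emits trace events from each layer, and the service layer forwards them to the client as a stream.

```python
from dataclasses import dataclass
from typing import Iterator

# Toy model of the four-layer hand-off; names and messages are
# illustrative only, not Fleet-RLM's real event types.

@dataclass
class TraceEvent:
    layer: str
    message: str

def orchestrate(question: str) -> Iterator[TraceEvent]:
    """Orchestration layer: run the ReAct loop, yielding trace events."""
    yield TraceEvent("orchestration", f"ReAct loop started for: {question}")
    yield TraceEvent("execution", "code executed in Modal sandbox")
    yield TraceEvent("persistence", "run/step persisted under tenant id")

def stream(question: str) -> list[str]:
    """Service layer: forward each event to the WebSocket/HTTP client."""
    return [f"[{e.layer}] {e.message}" for e in orchestrate(question)]

events = stream("sum the first 12 Fibonacci numbers")
```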
## Installation & Quick Start

Recommended (CLI tool):

```bash
uv tool install fleet-rlm
fleet web
```

Standard installation:

```bash
uv pip install fleet-rlm
fleet web
```

From source:

```bash
uv sync --extra dev --extra server
uv run fleet web
```
Additional commands:

```bash
fleet-rlm run-basic --question "What are the first 12 Fibonacci numbers?"
fleet-rlm serve-api --port 8000
fleet-rlm serve-mcp --transport stdio
fleet-rlm chat
```
## Typical Use Cases
- Long-context Code Tasks: Code analysis/generation beyond model context window
- Document Intelligence: Large PDF document ingestion and analysis
- AI Agent Development: Building agents with recursive delegation capabilities
- Secure Code Execution: Running untrusted code in an isolated sandbox
- MCP Integration: MCP server integration with Claude Desktop and similar clients
## Key Dependencies
- DSPy 3.1.3 (Declarative LM programming framework)
- Modal 1.3.2+ (Cloud sandbox)
- FastAPI + Uvicorn (HTTP service)
- WebSockets (Real-time communication)
- SQLModel + SQLAlchemy + asyncpg (Database)
- LiteLLM (Multi-model support)
- pgvector (Vector storage)
## Open Source License
MIT License
## Runtime Configuration
- LM (Language Model) settings
- Modal secrets configuration
- Authentication mode (Dev vs Entra)
- Neon DB connection