An AI-agent-driven automated experiment framework: point it at any code repo and it autonomously analyzes the code, designs and runs experiments, and keeps the improvements that work.
PaperFarm abstracts the full research experiment lifecycle into a four-stage automated pipeline: Scout analyzes the codebase and designs evaluation metrics, Prepare auto-resolves the environment and dependencies, Review presents the plan in an interactive TUI for user confirmation, and Experiment runs a Manager → Critic → Experiment loop that retains only changes that improve the metric.
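The "retain only metric-improving changes" rule in the Experiment stage can be illustrated with a minimal sketch. This is not PaperFarm's actual implementation: the names keep_if_improved and LoopResult are hypothetical, and the Manager and Critic roles are reduced to a plain list of proposals and an evaluate callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class LoopResult:
    """Outcome of a keep-if-improved run (hypothetical structure)."""
    best_metric: float
    kept: list = field(default_factory=list)
    discarded: list = field(default_factory=list)


def keep_if_improved(
    baseline: float,
    proposals: Iterable[str],
    evaluate: Callable[[str], float],
    higher_is_better: bool = True,
) -> LoopResult:
    """Evaluate each proposed change; keep it only if it beats the best metric so far."""
    result = LoopResult(best_metric=baseline)
    for change in proposals:
        metric = evaluate(change)
        if higher_is_better:
            improved = metric > result.best_metric
        else:
            improved = metric < result.best_metric
        if improved:
            # In a real run this is where the experiment's commit would be kept.
            result.best_metric = metric
            result.kept.append(change)
        else:
            # In a real run this is where the change would be rolled back.
            result.discarded.append(change)
    return result


# Illustrative scores only; these are not measured results.
scores = {"lr=3e-4": 0.81, "dropout=0.5": 0.78, "warmup=200": 0.84}
outcome = keep_if_improved(0.80, scores, scores.__getitem__)
print(outcome.kept, outcome.discarded, outcome.best_metric)
```

Each proposal is compared against the best metric seen so far rather than the original baseline, so a later change must beat every earlier accepted change to be kept.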
The framework supports six Agent backends (Claude Code, Codex CLI, Aider, OpenCode, Kimi CLI, and Gemini CLI) and auto-detects one when none is specified. Safety mechanisms include an isolated Git commit per experiment, automatic rollback on failure, timeout watchdogs, a crash counter (by default the run pauses after 3 consecutive crashes), Git worktree-isolated parallel Workers for multi-GPU scenarios, and a persistent failure ledger ranked by recovery success rate.
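As a rough illustration of the crash counter, the sketch below pauses after a configurable number of consecutive crashes and resets on any success. The class name CrashCounter and its method are assumptions made for this example; only the default threshold of 3 comes from the description above.

```python
class CrashCounter:
    """Track consecutive crashes and signal when the run should pause (hypothetical sketch)."""

    def __init__(self, max_consecutive: int = 3) -> None:
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, crashed: bool) -> bool:
        """Record one experiment outcome; return True if the loop should pause."""
        if crashed:
            self.consecutive += 1
        else:
            # Any successful experiment resets the streak.
            self.consecutive = 0
        return self.consecutive >= self.max_consecutive


counter = CrashCounter()
outcomes = [True, True, False, True, True, True]  # True means the experiment crashed
pause_flags = [counter.record(crashed) for crashed in outcomes]
print(pause_flags)
```

Counting consecutive rather than total crashes means an occasional failure does not halt a long run, while a persistently broken setup stops quickly.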
The interaction layer provides a Textual-based, three-tab TUI command center (Execution/Metrics/Logs, with braille trend charts and color-coded event streams) alongside a Headless mode that outputs JSON Lines for CI/CD integration. A built-in demo (paperfarm demo) lets you try the full TUI without any Agent or API key.
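A CI job could consume the Headless output along the lines of the sketch below. The event schema is not documented here, so the field names (event, id, metric, kept) are purely hypothetical; only the JSON Lines framing, one JSON object per line, is taken from the description above.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample events; the real field names may differ.
sample = [
    '{"event": "experiment_finished", "id": 1, "metric": 0.81, "kept": true}',
    "",
    '{"event": "experiment_finished", "id": 2, "metric": 0.78, "kept": false}',
]
events = list(parse_jsonl(sample))
kept_ids = [e["id"] for e in events if e.get("kept")]
print(kept_ids)
```

In practice the lines argument would be a file object or a subprocess pipe; because each line is a complete JSON document, events can be processed as they arrive.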
The architecture uses an entry-points-based plugin design covering nine extension points: storage, graph, scheduler, agents, orchestrator, execution, bootstrap, cli, and tui. All runtime state is stored in the .research/ directory, which supports checkpoint recovery.
Use cases span ML hyperparameter search and architecture optimization, research reproduction (baseline→experiment loop), non-ML performance optimization (e.g., Python JSON parser throughput), and GPU cluster parallel experiment scheduling. Official examples cover nanoGPT, Liger-Kernel, HF GLUE, CIFAR-10 Speedrun, YOLO Tiny, Whisper fine-tuning, CartPole RL, and Code Perf.
Requires Python 3.10–3.13 and is built with Hatchling; the codebase is 96.8% Python and 2.3% Jinja. The project is currently in the Alpha stage (v0.2.0b1, released 2026-03-09).
Unconfirmed: the Homepage/Repository URLs in pyproject.toml point to https://github.com/open-researcher/open-researcher, which is inconsistent with the actual repo https://github.com/shatianming5/PaperFarm; an open_researcher_v2 sub-package with a separate CLI entry exists, but its relationship to v1 is undocumented; PyPI page accessibility has not been directly verified.