PaperFarm

An AI-agent-driven automated experiment framework: point it at any code repository and it autonomously analyzes the code, designs and runs experiments, and keeps only the changes that improve the target metrics.

PaperFarm abstracts the full research experiment lifecycle into a four-stage automated pipeline: Scout analyzes the codebase and designs evaluation metrics, Prepare auto-resolves the environment and dependencies, Review presents the plan in an interactive TUI for user confirmation, and Experiment runs a Manager → Critic → Experiment loop that only retains metric-improving changes.
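The "only retains metric-improving changes" rule in the Experiment stage can be sketched as follows. This is a minimal illustration, not PaperFarm's actual code: the names LoopState and run_loop, the callable signatures, and the higher_is_better flag are all hypothetical, and the real framework additionally commits and rolls back through Git.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class LoopState:
    """Hypothetical record of what the loop kept and discarded."""
    best_metric: float
    kept: List[str] = field(default_factory=list)
    discarded: List[str] = field(default_factory=list)


def run_loop(
    baseline: float,
    proposals: Iterable[str],                          # ideas from a "Manager" (assumed)
    critic: Callable[[str], bool],                     # True if the idea may run (assumed)
    run_experiment: Callable[[str], Optional[float]],  # metric, or None on failure
    higher_is_better: bool = True,
) -> LoopState:
    state = LoopState(best_metric=baseline)
    for idea in proposals:
        # Critic step: rejected ideas never reach the experiment step.
        if not critic(idea):
            state.discarded.append(idea)
            continue
        metric = run_experiment(idea)
        if metric is None:
            improved = False  # a failed run never counts as an improvement
        elif higher_is_better:
            improved = metric > state.best_metric
        else:
            improved = metric < state.best_metric
        if improved:
            # Keep the change and raise the bar for later ideas.
            state.best_metric = metric
            state.kept.append(idea)
        else:
            state.discarded.append(idea)
    return state
```

Because the best metric is updated after every accepted change, each later idea has to beat the current best rather than the original baseline.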

The framework supports six Agent backends (Claude Code, Codex CLI, Aider, OpenCode, Kimi CLI, and Gemini CLI) and auto-detects the backend when none is specified. Safety mechanisms include an isolated Git commit per experiment, automatic rollback on failure, timeout watchdogs, a crash counter (by default, the run pauses after 3 consecutive crashes), Git worktree-isolated parallel Workers for multi-GPU scenarios, and a persistent failure ledger ranked by recovery success rate.
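Two of these mechanisms, the consecutive-crash counter and the ranked failure ledger, can be illustrated with a short sketch. Only the default threshold of 3 comes from the description above; the class names, fields, and the definition of success rate as recoveries divided by attempts are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


class CrashCounter:
    """Hypothetical counter that signals a pause after N consecutive crashes."""

    def __init__(self, max_consecutive: int = 3) -> None:
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, crashed: bool) -> bool:
        """Record one experiment outcome; return True if the run should pause."""
        # Any successful run resets the streak.
        self.consecutive = self.consecutive + 1 if crashed else 0
        return self.consecutive >= self.max_consecutive


@dataclass
class LedgerEntry:
    """Hypothetical ledger row: a failure, a fix tried for it, and its record."""
    failure: str
    fix: str
    attempts: int
    recoveries: int

    @property
    def success_rate(self) -> float:
        # Assumed definition: share of attempts in which the fix recovered the run.
        return self.recoveries / self.attempts if self.attempts else 0.0


def rank_fixes(entries: Iterable[LedgerEntry]) -> List[LedgerEntry]:
    """Order entries so the historically most successful fix comes first."""
    return sorted(entries, key=lambda e: e.success_rate, reverse=True)
```

Counting consecutive crashes rather than total crashes means an occasional failure does not halt a long run, while a persistent breakage stops it quickly.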

The interaction layer provides a Textual-based, three-tab TUI command center (Execution/Metrics/Logs, with braille trend charts and color-coded event streams) alongside a Headless mode that outputs JSON Lines for CI/CD integration. A built-in demo (paperfarm demo) lets users try the full TUI without any Agent or API key.
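Because JSON Lines carries one JSON object per line, a CI job can consume the Headless output as a stream. The sketch below shows one way to do that. The event schema is not documented here, so the "event" and "metric" field names and the sample records are purely hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(events: Iterable[Dict]) -> Counter:
    """Count events by their (assumed) "event" field."""
    return Counter(e.get("event", "unknown") for e in events)


# Hypothetical sample output; real field names may differ.
SAMPLE = [
    '{"event": "experiment_started", "id": 1}',
    '{"event": "experiment_finished", "id": 1, "metric": 0.82}',
    '',
    '{"event": "experiment_started", "id": 2}',
]

if __name__ == "__main__":
    print(summarize(parse_jsonl(SAMPLE)))
```

In a pipeline, the same functions could read from sys.stdin instead of the sample list, so the summary updates as each line arrives.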

The architecture uses an entry-points-based plugin design covering nine extension points: storage, graph, scheduler, agents, orchestrator, execution, bootstrap, cli, and tui. All runtime state is stored in the .research/ directory, which enables checkpoint recovery.

Use cases span ML hyperparameter search and architecture optimization, research reproduction (baseline→experiment loop), non-ML performance optimization (e.g., Python JSON parser throughput), and GPU cluster parallel experiment scheduling. Official examples cover nanoGPT, Liger-Kernel, HF GLUE, CIFAR-10 Speedrun, YOLO Tiny, Whisper fine-tuning, CartPole RL, and Code Perf.

PaperFarm requires Python 3.10–3.13 and is built with Hatchling; the codebase is 96.8% Python and 2.3% Jinja. It is currently in the Alpha stage (v0.2.0b1, released 2026-03-09).

Unconfirmed: the Homepage/Repository fields in pyproject.toml point to https://github.com/open-researcher/open-researcher, which is inconsistent with the actual repo https://github.com/shatianming5/PaperFarm; an open_researcher_v2 sub-package with a separate CLI entry exists, but its relationship to v1 is undocumented; the accessibility of the PyPI page has not been directly verified.

