An autonomous iterative improvement engine for Claude Code / OpenCode / OpenAI Codex that drives continuous codebase optimization through atomic Modify → Verify → Keep/Discard loops.
Claude Autoresearch is a Markdown-based Skill/Plugin definition (~5,000+ lines) inspired by Andrej Karpathy's autoresearch methodology. It recasts "codebase improvement" as a rigorous scientific-experiment workflow: after defining quantified goals and verification commands, the Agent executes one atomic change per iteration, validates it with mechanical metrics, auto-reverts on failure, and on success keeps the change and continues.
## Core Autonomous Loop
- Modify → Verify → Keep/Discard atomic loop: Single change per iteration, Git-tracked experiments, auto-revert on failure, runs indefinitely or N iterations with summary
- Noise handling: Median across multiple runs, minimum delta threshold, confirmation runs
- Crash recovery: Auto-fix up to 3 attempts, skip and continue on failure
- Stuck detection & escalation: After 5 consecutive discards, re-read all files, combine prior successful changes, and attempt reverse/aggressive strategies
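The atomic loop and its noise handling can be sketched as follows. This is a minimal illustration, not the actual protocol: the callback names, the fixed run count, and the in-memory revert are all assumptions (the real loop uses Git for keep/revert).

```python
import statistics

def measure(run_metric, runs=3):
    """Take the median across several runs to damp measurement noise."""
    return statistics.median(run_metric() for _ in range(runs))

def iterate(apply_change, revert, run_metric, min_delta=0.0, higher_is_better=True):
    """One Modify -> Verify -> Keep/Discard iteration. Returns True if the change is kept."""
    baseline = measure(run_metric)
    apply_change()                       # exactly one atomic change per iteration
    candidate = measure(run_metric)
    delta = candidate - baseline if higher_is_better else baseline - candidate
    if delta >= min_delta:               # minimum delta threshold filters noise-level "wins"
        return True                      # keep (the real loop commits to Git here)
    revert()                             # discard (the real loop reverts via Git)
    return False
```

The `min_delta` guard is what makes the loop robust to noisy metrics: an improvement smaller than the threshold is treated as a discard.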
## 10 Sub-Commands

| Command | Function | Key Method |
|---|---|---|
| `/autoresearch` | Core autonomous improvement loop | Goal → Modify → Verify → Keep/Discard |
| `/autoresearch:plan` | Interactive config wizard | Natural language → Scope/Metric/Verify config |
| `/autoresearch:debug` | Autonomous bug hunting | Scientific method + 7 investigation techniques |
| `/autoresearch:fix` | Autonomous error fixing | Fix errors one by one until zero remain |
| `/autoresearch:security` | Autonomous security audit | STRIDE + OWASP + red teaming |
| `/autoresearch:ship` | Universal shipping workflow | 9 delivery types supported |
| `/autoresearch:scenario` | Scenario-driven use case generator | 12 dimensions, 5 domains |
| `/autoresearch:predict` | Multi-persona prediction | 5 experts analyze independently, then debate to converge |
| `/autoresearch:learn` | Autonomous documentation engine | Scans the codebase; generates, updates, and validates docs |
| `/autoresearch:reason` | Adversarial refinement | Blind review panel via multi-agent debate for subjective content |
## Safety & Orchestration

- Guard command: Optional safety net ensuring changes don't break existing tests
- Command chaining: Sub-commands can be chained (e.g., predict → debug → fix → ship)
- MCP Server integration: Any configured MCP Server is callable during loops (databases, analytics, external APIs, etc.)
## Core Configuration

| Parameter | Description |
|---|---|
| Goal | Natural language improvement target |
| Scope | Glob pattern for modifiable files |
| Metric | Mechanical quantified metric (must output a number) |
| Verify | Verification shell command (with extractable numeric output) |
| Direction | Optimization direction (higher or lower is better) |
| Guard | Optional safety command that must always pass |
| `Iterations: N` | Run-count limit in bounded mode |
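A hypothetical configuration for a bundle-size goal, filling in the parameters above (all values are illustrative, not taken from the README):

```
Goal: Reduce production bundle size without breaking the build
Scope: src/**/*.ts
Metric: Bundle size in kilobytes
Verify: npm run build --silent && du -k dist/bundle.js | cut -f1
Direction: lower
Guard: npm test
Iterations: 20
```

Note that Verify must end by printing a single extractable number, and Guard is a pass/fail command rather than a metric.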
## Installation

Claude Code (Plugin):

```
/plugin marketplace add uditgoenka/autoresearch
/plugin install autoresearch@autoresearch
```

OpenCode / OpenAI Codex: via `./scripts/install.sh --opencode|--codex --global`.
## Architecture

Not executable code, but a Markdown protocol/Skill definition driven by the host Agent's native toolchain (Read, Edit, Write, Bash, Git). The core loop protocol runs through the stages Plan → Loop → Debug → Fix → Secure → Ship, and follows 7 principles: constraints as empowerment, strategy ≠ tactics, mechanical metrics, fast verification, iteration cost drives behavior, Git as memory, honest limitations. Results are recorded as Git commits (`experiment:` prefix) and TSV logs.
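Because results land in TSV logs, they are easy to post-process. A minimal sketch, assuming a hypothetical three-column layout of iteration, verdict, and metric value (the actual column layout is not documented in the README):

```python
import csv
import io

# Hypothetical TSV layout: iteration, verdict (keep/discard), metric value.
log = "1\tkeep\t42.0\n2\tdiscard\t41.5\n3\tkeep\t43.1\n"

rows = list(csv.reader(io.StringIO(log), delimiter="\t"))
kept = [float(value) for _iteration, verdict, value in rows if verdict == "keep"]
print(f"{len(kept)} kept experiments, best metric {max(kept)}")
```

The same history is recoverable from Git, since every kept experiment is a commit with the `experiment:` prefix.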
## Unconfirmed Information
- Minimum Claude Code version for Plugin Marketplace not specified in README
- No independent paper published (Karpathy's original has arXiv:2603.07300; this is a derivative work)
- Full sub-command compatibility on OpenCode/Codex not verified