An autonomous iterative improvement engine for Claude Code / OpenCode / OpenAI Codex that drives continuous codebase optimization through atomic Modify → Verify → Keep/Discard loops.
Claude Autoresearch is a Markdown-based Skill/Plugin definition (~5,000+ lines) inspired by Andrej Karpathy's autoresearch methodology. It recasts "codebase improvement" as a rigorous scientific-experiment workflow: after defining quantified goals and verification commands, the Agent executes one atomic change per iteration, validates it with mechanical metrics, auto-reverts on failure, and on success keeps the change and continues.
## Core Autonomous Loop
- Modify → Verify → Keep/Discard atomic loop: Single change per iteration, Git-tracked experiments, auto-revert on failure, runs indefinitely or N iterations with summary
- Noise handling: Median across multiple runs, minimum delta threshold, confirmation runs
- Crash recovery: Auto-fix up to 3 attempts, skip and continue on failure
- Stuck detection & escalation: After 5 consecutive discards, re-read all files, combine prior successful changes, and attempt reverse/aggressive strategies
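The atomic loop and its noise handling can be sketched as follows. This is a minimal illustration, not the actual protocol: the callback names, the fixed run count, and the in-memory revert are all assumptions (the real loop uses Git for keep/revert).

```python
import statistics

def measure(run_metric, runs=3):
    """Take the median across several runs to damp measurement noise."""
    return statistics.median(run_metric() for _ in range(runs))

def iterate(apply_change, revert, run_metric, min_delta=0.0, higher_is_better=True):
    """One Modify -> Verify -> Keep/Discard iteration. Returns True if the change is kept."""
    baseline = measure(run_metric)
    apply_change()                       # exactly one atomic change per iteration
    candidate = measure(run_metric)
    delta = candidate - baseline if higher_is_better else baseline - candidate
    if delta >= min_delta:               # minimum delta threshold filters noise-level "wins"
        return True                      # keep (the real loop commits to Git here)
    revert()                             # discard (the real loop reverts via Git)
    return False
```

The `min_delta` guard is what makes the loop robust to noisy metrics: an improvement smaller than the threshold is treated as a discard.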
## 10 Sub-Commands

| Command | Function | Key Method |
|---|---|---|
| `/autoresearch` | Core autonomous improvement loop | Goal → Modify → Verify → Keep/Discard |
| `/autoresearch:plan` | Interactive config wizard | Natural language → Scope/Metric/Verify config |
| `/autoresearch:debug` | Autonomous bug hunting | Scientific method + 7 investigation techniques |
| `/autoresearch:fix` | Autonomous error fixing | Fix errors one by one until zero remain |
| `/autoresearch:security` | Autonomous security audit | STRIDE + OWASP + red teaming |
| `/autoresearch:ship` | Universal shipping workflow | 9 delivery types supported |
| `/autoresearch:scenario` | Scenario-driven use case generator | 12 dimensions, 5 domains |
| `/autoresearch:predict` | Multi-persona prediction | 5 experts analyze independently, then debate to converge |
| `/autoresearch:learn` | Autonomous documentation engine | Scans the codebase; generates, updates, and validates docs |
| `/autoresearch:reason` | Adversarial refinement | Blind review panel via multi-agent debate for subjective content |
## Safety & Orchestration

- Guard command: Optional safety net ensuring changes don't break existing tests
- Command chaining: Sub-commands can be chained (e.g., predict → debug → fix → ship)
- MCP Server integration: Any configured MCP Server is callable during loops (databases, analytics, external APIs, etc.)
## Core Configuration

| Parameter | Description |
|---|---|
| Goal | Natural language improvement target |
| Scope | Glob pattern for modifiable files |
| Metric | Mechanical quantified metric (must output a number) |
| Verify | Verification shell command (with extractable numeric output) |
| Direction | Optimization direction (higher or lower is better) |
| Guard | Optional safety command that must always pass |
| `Iterations: N` | Run-count limit in bounded mode |
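A hypothetical configuration for a bundle-size goal, filling in the parameters above (all values are illustrative, not taken from the README):

```
Goal: Reduce production bundle size without breaking the build
Scope: src/**/*.ts
Metric: Bundle size in kilobytes
Verify: npm run build --silent && du -k dist/bundle.js | cut -f1
Direction: lower
Guard: npm test
Iterations: 20
```

Note that Verify must end by printing a single extractable number, and Guard is a pass/fail command rather than a metric.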
## Installation

Claude Code (Plugin):

```
/plugin marketplace add uditgoenka/autoresearch
/plugin install autoresearch@autoresearch
```

OpenCode / OpenAI Codex: via `./scripts/install.sh --opencode|--codex --global`.
## Architecture

Not executable code, but a Markdown protocol/Skill definition driven by the host Agent's native toolchain (Read, Edit, Write, Bash, Git). The core loop protocol runs through the stages Plan → Loop → Debug → Fix → Secure → Ship, and follows 7 principles: constraints as empowerment, strategy ≠ tactics, mechanical metrics, fast verification, iteration cost drives behavior, Git as memory, honest limitations. Results are recorded as Git commits (`experiment:` prefix) and TSV logs.
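Because results land in TSV logs, they are easy to post-process. A minimal sketch, assuming a hypothetical three-column layout of iteration, verdict, and metric value (the actual column layout is not documented in the README):

```python
import csv
import io

# Hypothetical TSV layout: iteration, verdict (keep/discard), metric value.
log = "1\tkeep\t42.0\n2\tdiscard\t41.5\n3\tkeep\t43.1\n"

rows = list(csv.reader(io.StringIO(log), delimiter="\t"))
kept = [float(value) for _iteration, verdict, value in rows if verdict == "keep"]
print(f"{len(kept)} kept experiments, best metric {max(kept)}")
```

The same history is recoverable from Git, since every kept experiment is a commit with the `experiment:` prefix.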
## Unconfirmed Information
- Minimum Claude Code version for Plugin Marketplace not specified in README
- No independent paper published (Karpathy's original has arXiv:2603.07300; this is a derivative work)
- Full sub-command compatibility on OpenCode/Codex not verified