
Claude Autoresearch

Added Apr 25, 2026
Agent & Tooling
Open Source
Workflow Automation · Model Context Protocol · AI Agents · Agent Framework · CLI · Agent & Tooling · Developer Tools & Coding · Automation, Workflow & RPA

An autonomous iterative improvement engine for Claude Code / OpenCode / OpenAI Codex that drives continuous codebase optimization through atomic Modify → Verify → Keep/Discard loops.

Claude Autoresearch is a Markdown-based Skill/Plugin definition (5,000+ lines) inspired by Andrej Karpathy's autoresearch methodology. It recasts codebase improvement as a rigorous scientific experiment workflow: once quantified goals and verification commands are defined, the Agent makes one atomic change per iteration, validates it against mechanical metrics, reverts automatically on failure, and keeps the change and continues on success.
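The workflow described above can be sketched as a small loop. This is an illustrative in-memory stand-in, not the project's implementation: the function name `improve`, the `propose` and `verify` callables, and the toy integer state are all hypothetical, and the real tool is described as working on files tracked in Git rather than on a Python value.

```python
from typing import Callable, List, Tuple

def improve(
    state: int,
    propose: Callable[[int, int], int],
    verify: Callable[[int], float],
    iterations: int,
    higher_is_better: bool = True,
) -> Tuple[int, List[str]]:
    """Run a bounded Modify -> Verify -> Keep/Discard loop over an in-memory state."""
    best = verify(state)  # baseline metric before any change
    log: List[str] = []
    for i in range(iterations):
        candidate = propose(state, i)   # Modify: exactly one change
        score = verify(candidate)       # Verify: mechanical metric
        improved = score > best if higher_is_better else score < best
        if improved:
            state, best = candidate, score          # Keep
            log.append(f"{i}\tkeep\t{score}")
        else:
            log.append(f"{i}\tdiscard\t{score}")    # Discard: state stays as it was
    return state, log

# Toy problem: move an integer toward 7; the metric is the negative squared distance.
steps = [3, -1, 2, 5, -2]
final, log = improve(
    0,
    propose=lambda x, i: x + steps[i],
    verify=lambda x: -(x - 7) ** 2,
    iterations=len(steps),
)
print(final)
print("\n".join(log))
```

Because a discarded candidate never replaces `state`, a failed experiment leaves no trace in the working state, which is the role the automatic revert plays in the described workflow.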

Core Autonomous Loop

  • Modify → Verify → Keep/Discard atomic loop: One change per iteration, Git-tracked experiments, automatic revert on failure; runs indefinitely or for N iterations, ending with a summary
  • Noise handling: Median across multiple runs, minimum delta threshold, confirmation runs
  • Crash recovery: Attempts up to 3 automatic fixes; if they fail, skips the change and continues
  • Stuck detection & escalation: After 5 consecutive discards, the Agent re-reads all files, combines prior successful changes, and attempts reverse or aggressive strategies
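The noise-handling and stuck-detection bullets above can be illustrated with a short sketch. The names `is_real_improvement` and `StuckDetector` are hypothetical, and treating a gain equal to the threshold as an improvement is an assumption; the source only states that medians, a minimum delta, and a limit of 5 consecutive discards are used.

```python
from statistics import median
from typing import Sequence

def is_real_improvement(
    baseline_runs: Sequence[float],
    candidate_runs: Sequence[float],
    min_delta: float,
    higher_is_better: bool = True,
) -> bool:
    """Compare medians of repeated runs and require at least min_delta of gain."""
    base = median(baseline_runs)
    cand = median(candidate_runs)
    gain = cand - base if higher_is_better else base - cand
    return gain >= min_delta  # assumption: a gain equal to the threshold counts

class StuckDetector:
    """Counts consecutive discards and signals when escalation is due."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        self.discards = 0

    def record(self, kept: bool) -> bool:
        # A kept change resets the streak; a discard extends it.
        self.discards = 0 if kept else self.discards + 1
        return self.discards >= self.limit

# Lower-is-better timings in ms: one outlier (120) does not move the median.
print(is_real_improvement([100, 102, 98], [97, 120, 96], min_delta=2, higher_is_better=False))
```

Using the median rather than the mean keeps a single slow run from masking or faking an improvement, which is presumably why the listing pairs it with a minimum delta.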

10 Sub-Commands

Command | Function | Key Method
/autoresearch | Core autonomous improvement loop | Goal → Modify → Verify → Keep/Discard
/autoresearch:plan | Interactive config wizard | Natural language → Scope/Metric/Verify config
/autoresearch:debug | Autonomous bug hunting | Scientific method + 7 investigation techniques
/autoresearch:fix | Autonomous error fixing | Fix one by one until zero errors
/autoresearch:security | Autonomous security audit | STRIDE + OWASP + red teaming
/autoresearch:ship | Universal shipping workflow | 9 delivery types supported
/autoresearch:scenario | Scenario-driven use case generator | 12 dimensions, 5 domains
/autoresearch:predict | Multi-persona prediction | 5 experts independently analyze, then debate to converge
/autoresearch:learn | Autonomous documentation engine | Scan codebase, generate/update/validate docs
/autoresearch:reason | Adversarial refinement | Blind review panel via multi-agent debate for subjective content

Safety & Orchestration

  • Guard command: Optional safety net ensuring changes don't break existing tests
  • Command chaining: Sub-commands can be chained (e.g., predict → debug → fix → ship)
  • MCP Server integration: Any configured MCP Server can be called during loops (databases, analytics, external APIs, etc.)
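As a rough illustration of the guard idea above, the sketch below keeps a change only when the metric improved and a guard command succeeded. It assumes the guard signals pass or fail through its exit code, which the listing does not state; `guard_passes` and `decide` are hypothetical names, and the guard commands here are tiny Python one-liners standing in for a real test suite.

```python
import subprocess
import sys
from typing import Sequence

def guard_passes(command: Sequence[str]) -> bool:
    """Run the guard command; exit code 0 is treated as a pass (assumption)."""
    return subprocess.run(list(command), capture_output=True).returncode == 0

def decide(improved: bool, guard_ok: bool) -> str:
    """A change is kept only if the metric improved and the guard still passes."""
    return "keep" if improved and guard_ok else "discard"

# Stand-in guards: one exits with status 0, the other with status 1.
passing_guard = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing_guard = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(decide(improved=True, guard_ok=guard_passes(passing_guard)))
print(decide(improved=True, guard_ok=guard_passes(failing_guard)))
```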

Core Configuration

Parameter | Description
Goal | Natural language improvement target
Scope | Glob pattern for modifiable files
Metric | Mechanical quantified metric (must output a number)
Verify | Verification shell command (extractable numeric output)
Direction | Optimization direction (higher or lower is better)
Guard | Optional safety command; must always pass
Iterations: N | Run count limit for bounded mode
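To make the parameters above concrete, here is a hypothetical representation of a configuration plus a naive way to pull a number out of a Verify command's output. The `LoopConfig` dataclass, its field names, the example values, and the "take the last number printed" rule are all assumptions for illustration; the project itself is a Markdown protocol and may express and parse these settings differently.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoopConfig:
    goal: str                         # natural language improvement target
    scope: str                        # glob pattern for modifiable files
    metric: str                       # name of the quantified metric
    verify: str                       # shell command that prints the metric
    direction: str                    # "higher" or "lower"
    guard: Optional[str] = None       # optional safety command
    iterations: Optional[int] = None  # None means unbounded

def extract_metric(output: str) -> float:
    """Return the last number in the output (naive; dates or versions would confuse it)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", output)
    if not matches:
        raise ValueError("verify output contained no number")
    return float(matches[-1])

def run_verify(cfg: LoopConfig) -> float:
    """Run the Verify command in a shell and extract its numeric result."""
    result = subprocess.run(cfg.verify, shell=True, capture_output=True, text=True, check=True)
    return extract_metric(result.stdout)

# Example values are invented for illustration only.
cfg = LoopConfig(
    goal="Raise test coverage",
    scope="src/**/*.py",
    metric="coverage percent",
    verify="echo coverage 87.5",
    direction="higher",
    iterations=20,
)
print(extract_metric("12 passed, coverage: 87.5%"))
```

Requiring the Verify command to print a single extractable number is what lets the keep/discard decision stay mechanical rather than a judgment call by the Agent.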

Installation

Claude Code (Plugin):

/plugin marketplace add uditgoenka/autoresearch
/plugin install autoresearch@autoresearch

OpenCode / OpenAI Codex: Install via ./scripts/install.sh, passing --opencode or --codex together with --global.

Architecture

Claude Autoresearch is not executable code but a Markdown protocol/Skill definition driven by the host Agent's native toolchain (Read, Edit, Write, Bash, Git). The core loop protocol is described as having 8 stages, and the sub-commands form a pipeline: Plan → Loop → Debug → Fix → Secure → Ship. It follows 7 principles: constraints as empowerment, strategy ≠ tactics, mechanical metrics, fast verification, iteration cost drives behavior, Git as memory, and honest limitations. Results are recorded as Git commits (with an experiment: prefix) and TSV logs.
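The result recording mentioned above might look roughly like the following. Only the experiment: commit prefix and the TSV format come from the listing; the column names in `FIELDS`, the helper names, and the sample row are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column layout; the listing does not specify the log's columns.
FIELDS = ["iteration", "status", "metric", "description"]

def commit_message(description: str) -> str:
    """Build a commit subject using the experiment: prefix."""
    return f"experiment: {description}"

def to_tsv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize experiment rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [{"iteration": 1, "status": "keep", "metric": 12.5, "description": "inline hot loop"}]
print(commit_message("inline hot loop"))
print(to_tsv(rows), end="")
```

Keeping one commit per experiment means the Git history doubles as a record of what was tried, in line with the "Git as memory" principle.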

Unconfirmed Information

  • Minimum Claude Code version for Plugin Marketplace not specified in README
  • No independent paper published (Karpathy's original has arXiv:2603.07300; this is a derivative work)
  • Full sub-command compatibility on OpenCode/Codex not verified
