A lightweight evaluation harness for coding agents featuring Docker isolation and weighted scoring, supporting 6 programming languages and 19 major coding agents.
SanityHarness is a CLI tool written in Go designed to standardize the evaluation of LLM coding agents.
Core Capabilities
- Docker container isolation for secure task execution
- 26 coding tasks across 6 languages: Go, Rust, TypeScript, Kotlin, Dart, Zig
- Built-in integration for 19 coding agents (Claude Code, Gemini, Codex CLI, Cline, Copilot, Kimi, Qwen, Goose, Junie, Kilocode, Amp, Crush, Pi, etc.)
- Difficulty-based weighted scoring system for fair comparison
- BLAKE3 hash integrity verification to detect tampering with results
- Bubblewrap sandbox isolation to limit agent system access
- Parallel evaluation (--parallel), watch mode (--watch), and resumable runs
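The README does not spell out the scoring formula. One plausible reading of "difficulty-based weighted scoring" is that each task carries a difficulty weight and the score is the share of total weight earned by passed tasks. The Go sketch below follows that assumption; the `TaskResult` type, the weight values, and the percentage scale are hypothetical and are not taken from SanityHarness or its summary.json schema.

```go
package main

import "fmt"

// TaskResult is a hypothetical record of one task outcome.
type TaskResult struct {
	Name   string
	Weight float64 // assumed difficulty weight; harder tasks weigh more
	Passed bool
}

// WeightedScore returns the percentage (0 to 100) of the total difficulty
// weight earned by passed tasks. An empty or zero-weight set scores 0.
func WeightedScore(results []TaskResult) float64 {
	var earned, total float64
	for _, r := range results {
		total += r.Weight
		if r.Passed {
			earned += r.Weight
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * earned / total
}

func main() {
	results := []TaskResult{
		{Name: "easy-task", Weight: 1, Passed: true},
		{Name: "medium-task", Weight: 2, Passed: false},
		{Name: "hard-task", Weight: 3, Passed: true},
	}
	// prints: weighted score: 66.7%
	fmt.Printf("weighted score: %.1f%%\n", WeightedScore(results))
}
```

With these weights, solving two of three tasks yields 66.7% rather than the unweighted 66.7% by coincidence only because the failed task sits in the middle. Failing the hard task instead would drop the weighted score to 50%, which is how weighting rewards harder solves.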
Use Cases
- Development teams: regression testing and capability assessment of the coding agents they build
- Researchers: comparing how different LLMs perform on code generation tasks
- Enterprises: benchmarking coding assistants before choosing one to adopt
Requirements
- Go 1.25+
- Docker (running daemon)
- bubblewrap (optional, for agent sandbox isolation)
Quick Start
git clone https://github.com/lemon07r/sanityharness.git
cd sanityharness
make tools && make build
./sanity list
./sanity eval --agent gemini --tier all --parallel 4
Core Commands
- ./sanity list [--language <lang>] [--tier <tier>]: List tasks
- ./sanity run <task> [--watch]: Run a single task
- ./sanity eval --agent <name> [--model <model>] [--parallel N]: Evaluate an agent
- ./sanity show <session-path>: View results
- ./sanity verify <path>: Verify submission integrity
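The --parallel N flag caps how many tasks are evaluated at once. The Go sketch below shows one common way to enforce such a cap, a buffered channel used as a counting semaphore; it is an illustration of the concept, not SanityHarness's actual scheduler. The `runTask` function and `RunParallel` name are hypothetical placeholders for driving an agent and running tests in a container.

```go
package main

import (
	"fmt"
	"sync"
)

// runTask is a hypothetical placeholder for evaluating one task.
func runTask(name string) string {
	return name + ": done"
}

// RunParallel evaluates tasks with at most n in flight at any moment.
// Results keep the input order because each goroutine writes its own index.
func RunParallel(tasks []string, n int) []string {
	if n < 1 {
		n = 1
	}
	results := make([]string, len(tasks))
	sem := make(chan struct{}, n) // counting semaphore bounding concurrency
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while n tasks are already running
		go func(i int, t string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task ends
			results[i] = runTask(t)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range RunParallel([]string{"task-a", "task-b", "task-c"}, 2) {
		fmt.Println(r)
	}
}
```

Acquiring the semaphore before spawning the goroutine keeps the number of live goroutines bounded as well, which matters when each one holds a container.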
Architecture
- CLI Layer: Built on Cobra
- Task System: Task files embedded at compile time for zero-dependency distribution
- Runtime: Containers stay running and are reused via docker exec to reduce overhead
- Config: Supports ./sanity.toml, ~/.sanity.toml, and ~/.config/sanity/config.toml
Output Structure
- summary.json: Complete results with weighted scores
- attestation.json: BLAKE3 hashes for integrity verification
- report.md: Human-readable report
- submission.json: Submission file in leaderboard format
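An attestation file works by recording a digest of each output so that any later edit shows up as a mismatch. The Go sketch below illustrates that record-then-recompute idea. SanityHarness uses BLAKE3, which is not in the Go standard library, so SHA-256 from crypto/sha256 is swapped in here; the map-based attestation shape and the `Attest`/`Verify` helpers are hypothetical and do not reflect the real attestation.json format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest hashes file content. SHA-256 stands in for BLAKE3 in this sketch.
func digest(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Attest records one hex digest per output file (hypothetical shape).
func Attest(files map[string][]byte) map[string]string {
	out := make(map[string]string, len(files))
	for name, content := range files {
		out[name] = digest(content)
	}
	return out
}

// Verify recomputes each recorded digest and returns, sorted, the names of
// files that are missing or whose content no longer matches.
func Verify(files map[string][]byte, attestation map[string]string) []string {
	var bad []string
	for name, want := range attestation {
		content, ok := files[name]
		if !ok || digest(content) != want {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	files := map[string][]byte{
		"summary.json": []byte(`{"example": true}`),
		"report.md":    []byte("# Report\n"),
	}
	att := Attest(files)
	files["summary.json"] = []byte(`{"example": false}`) // simulated tampering
	// prints: tampered: [summary.json]
	fmt.Println("tampered:", Verify(files, att))
}
```

Note that a digest only detects changes if the recorded hashes themselves are trusted; whoever can rewrite both a result file and its hash can bypass a check like this one.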
Project Info
- Current Version: v1.8.2
- License: MIT License
- Primary Language: Go (73.1%)
- Official Leaderboard: https://sanityboard.lr7.dev/