SanityHarness

A lightweight evaluation harness for coding agents featuring Docker isolation and weighted scoring, supporting 6 programming languages and 19 major coding agents.

SanityHarness is a CLI tool, written in Go, that standardizes the evaluation of LLM coding agents.

Core Capabilities

Docker container isolation for secure task execution
26 coding tasks across 6 languages: Go, Rust, TypeScript, Kotlin, Dart, Zig
Built-in integration for 19 coding agents (Claude Code, Gemini, Codex CLI, Cline, Copilot, Kimi, Qwen, Goose, Junie, Kilocode, Amp, Crush, Pi, etc.)
Difficulty-based weighted scoring system for fair comparison
BLAKE3 hash integrity verification to prevent result tampering
Bubblewrap sandbox isolation to limit agent system access
Parallel evaluation (--parallel), watch mode, and resumable runs
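
To make the idea of difficulty-based weighting concrete, here is a minimal Go sketch. The type, the weights, and the formula (weight of passed tasks divided by total weight) are assumptions for illustration only; the harness's actual weights and scoring formula are not described in this document.

```go
package main

import "fmt"

// TaskResult is a hypothetical record of one task outcome.
type TaskResult struct {
	Name   string
	Weight float64 // hypothetical difficulty weight; harder tasks weigh more
	Passed bool
}

// WeightedScore returns the percentage of total weight earned by passed
// tasks. It returns 0 for an empty result set.
func WeightedScore(results []TaskResult) float64 {
	var earned, total float64
	for _, r := range results {
		total += r.Weight
		if r.Passed {
			earned += r.Weight
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * earned / total
}

func main() {
	results := []TaskResult{
		{Name: "easy-task", Weight: 1, Passed: true},
		{Name: "medium-task", Weight: 2, Passed: true},
		{Name: "hard-task", Weight: 3, Passed: false},
	}
	// Two of three tasks pass, but only 3 of 6 weight units are earned.
	fmt.Printf("weighted score: %.1f%%\n", WeightedScore(results)) // prints "weighted score: 50.0%"
}
```

In this example an unweighted pass rate would be about 67%, while the weighted score is 50%, because the failed task is the hardest one.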

Use Cases

Coding agent development teams: regression testing and capability assessment
Researchers: comparing different LLMs on code generation tasks
Enterprises: benchmarking coding assistants before selecting one

Requirements

Go 1.25+
Docker (running daemon)
bubblewrap (optional, for agent sandbox isolation)

Quick Start

git clone https://github.com/lemon07r/sanityharness.git
cd sanityharness
make tools && make build
./sanity list
./sanity eval --agent gemini --tier all --parallel 4

Core Commands

./sanity list [--language <lang>] [--tier <tier>] - List tasks
./sanity run <task> [--watch] - Run single task
./sanity eval --agent <name> [--model <model>] [--parallel N] - Evaluate agent
./sanity show <session-path> - View results
./sanity verify <path> - Verify submission integrity

Architecture

CLI Layer: Built on Cobra
Task System: Task files embedded at compile time for zero-dependency distribution
Runtime: Containers stay running and are reused via docker exec, which avoids the overhead of starting a new container for each command
Config: Supports ./sanity.toml, ~/.sanity.toml, ~/.config/sanity/config.toml

Output Structure

summary.json - Complete results with weighted scores
attestation.json - BLAKE3 hash verification
report.md - Human-readable report
submission.json - Leaderboard format submission file
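
The idea behind hash-based attestation can be sketched in Go: record a digest for each output file, then recompute and compare the digests later. This example swaps in SHA-256 from the Go standard library because BLAKE3 is not part of it; the harness itself uses BLAKE3. The in-memory layout and the function names are hypothetical and do not reflect the actual format of attestation.json.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest returns the hex-encoded SHA-256 of data.
// SHA-256 is a stand-in here; SanityHarness uses BLAKE3.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// attest records a digest for each named file's contents (hypothetical layout).
func attest(files map[string][]byte) map[string]string {
	out := make(map[string]string, len(files))
	for name, data := range files {
		out[name] = digest(data)
	}
	return out
}

// verify returns the sorted names of files that are missing or whose
// contents no longer match the recorded digest.
func verify(files map[string][]byte, recorded map[string]string) []string {
	var bad []string
	for name, want := range recorded {
		data, ok := files[name]
		if !ok || digest(data) != want {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	files := map[string][]byte{
		"summary.json": []byte(`{"score": 50}`),
		"report.md":    []byte("# Report\n"),
	}
	recorded := attest(files)

	// Tamper with one file after the attestation was taken.
	files["summary.json"] = []byte(`{"score": 100}`)

	fmt.Println("mismatched files:", verify(files, recorded)) // prints "mismatched files: [summary.json]"
}
```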

Project Info

Current Version: v1.8.2
License: MIT License
Primary Language: Go (73.1%)
Official Leaderboard: https://sanityboard.lr7.dev/
