An open-source framework for large language model evaluations from the UK AI Security Institute (AISI), featuring a modular Datasets/Solvers/Scorers architecture, multi-model/tool support, sandboxed execution, and 100+ pre-built benchmarks.
## Overview
Inspect is an open-source framework for large language model evaluations developed by the UK AI Security Institute (AISI, formerly the AI Safety Institute). It aims to provide unified, extensible evaluation standards and tooling. The project is MIT-licensed and hosted on the UK Government's official GitHub organization.
## Core Architecture
Modular design centered on `Task`: Dataset (input) -> Solver (processing/reasoning) -> Scorer (evaluation) -> Result
Three Core Components:
- Datasets: Labeled samples with prompts as input and literal values or scoring guides as targets
- Solvers: Chainable execution units (`generate()`, `chain_of_thought()`, `self_critique()`)
- Scorers: Support for exact match, model-graded, and custom scoring
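The Dataset -> Solver -> Scorer pipeline can be sketched in plain Python. This is a conceptual illustration of the data flow, not Inspect's actual API; the names `Sample`, `run_task`, and `exact_match` here are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable

# A labeled sample: a prompt as input, a literal value as target.
@dataclass
class Sample:
    input: str
    target: str

# Solvers are chainable units, each transforming the working state.
def chain_of_thought(text: str) -> str:
    return text + "\nThink step by step before answering."

def generate(text: str) -> str:
    # Stand-in for a model call; a real solver would query an LLM here.
    return "4" if "2+2" in text else ""

# Scorer: compares the final model output against the sample's target.
def exact_match(output: str, target: str) -> bool:
    return output.strip() == target.strip()

def run_task(dataset: list[Sample],
             solvers: list[Callable[[str], str]],
             scorer: Callable[[str, str], bool]) -> float:
    correct = 0
    for sample in dataset:
        state = sample.input
        for solver in solvers:   # run the solver chain in order
            state = solver(state)
        correct += scorer(state, sample.target)
    return correct / len(dataset)

accuracy = run_task([Sample("What is 2+2?", "4")],
                    [chain_of_thought, generate],
                    exact_match)
print(accuracy)  # → 1.0
```

In the real framework, solvers operate on a richer task state (messages, tool calls, metadata) rather than a bare string, but the chaining principle is the same.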
## Key Features
### Agent & Tool Support
- Tool Calling: Custom Tools, MCP Tools, Bash, Python, Web Search/Browsing, Computer Tools
- Agent Evaluations: Built-in ReAct Agent, Multi-Agent, external agents (Claude Code, Codex CLI, Gemini CLI)
- Sandboxed Execution: Docker, Kubernetes, Modal, Proxmox
- Tool Approval: Fine-grained tool call approval policies
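A fine-grained approval policy can be thought of as a mapping from tool-name patterns to decisions, checked before each tool call executes. The sketch below is a hypothetical illustration of the idea, not Inspect's approval API; the `POLICY` table and `approve` helper are invented names.

```python
from fnmatch import fnmatch

# Hypothetical policy: first matching pattern wins, default-deny at the end.
POLICY = {
    "bash": "ask",        # shell commands require human sign-off
    "python": "approve",  # sandboxed code execution is pre-approved
    "web_*": "approve",   # covers web_search, web_browse, etc.
    "*": "reject",        # anything unrecognized is rejected
}

def approve(tool_name: str) -> str:
    for pattern, decision in POLICY.items():
        if fnmatch(tool_name, pattern):
            return decision
    return "reject"

print(approve("bash"))        # → ask
print(approve("web_search"))  # → approve
print(approve("rm_rf"))       # → reject
```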
### Model Provider Support
| Type | Providers |
|---|---|
| Commercial APIs | OpenAI, Anthropic, Google, Grok, Mistral, AWS Bedrock, Azure AI, TogetherAI, Groq |
| Local/Open Source | vLLM, Ollama, llama-cpp-python, HuggingFace |
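Models across these providers are addressed with a single string of the form `provider/model-name` (as in `openai/gpt-4o` in the CLI example below). A small sketch of how such a spec splits into its parts; the `parse_model` helper is hypothetical, not part of the framework:

```python
def parse_model(spec: str) -> tuple[str, str]:
    # Split only on the first "/": some model names contain slashes
    # themselves (e.g. a HuggingFace org/repo path).
    provider, _, model = spec.partition("/")
    if not model:
        raise ValueError(f"expected provider/model, got {spec!r}")
    return provider, model

print(parse_model("openai/gpt-4o"))
# → ('openai', 'gpt-4o')
print(parse_model("hf/meta-llama/Llama-3.1-8B"))
# → ('hf', 'meta-llama/Llama-3.1-8B')
```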
### Pre-built Evaluation Library (100+)
- Safeguards: AgentHarm, StrongREJECT, WMDP
- Coding: HumanEval, SWE-bench, BigCodeBench
- Knowledge: MMLU, GPQA, TruthfulQA
- Mathematics: AIME, GSM8K, MATH
- Reasoning: ARC, BBH, DROP
- Assistants: GAIA, OSWorld, Mind2Web
### Developer Tools
- CLI: `inspect eval`, `inspect view`
- Inspect View: Web-based evaluation monitoring and visualization
- VS Code Extension: Evaluation authoring, debugging, and visualization
## Installation & Usage
```shell
pip install inspect-ai
export OPENAI_API_KEY=your-key
inspect eval examples/task.py --model openai/gpt-4o
```
## Technical Specifications
| Attribute | Value |
|---|---|
| Developer | UK AI Security Institute |
| License | MIT License |
| Primary Languages | Python (81%), TypeScript (17.3%) |
| Python Version | >= 3.10 |
| Initial Release | 2024-05 |
## Extension Mechanism
Extensible via Python packages: Elicitation/Scoring techniques, Model APIs, Tool Execution Environments, Storage Platforms
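Extensions are distributed as ordinary Python packages that advertise themselves to the framework via an entry point. A hedged `pyproject.toml` sketch (the package name `inspect-custom-provider` and module name are hypothetical; the exact entry-point group Inspect expects should be checked against its documentation):

```toml
# pyproject.toml of a hypothetical extension package
[project]
name = "inspect-custom-provider"
version = "0.1.0"
dependencies = ["inspect-ai"]

# Registering under an entry-point group lets the framework discover
# the extension at runtime without explicit imports.
[project.entry-points.inspect_ai]
custom = "inspect_custom_provider"
```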