
Inspect

Added Feb 25, 2026
Category: Agent & Tooling
License: Open Source
Tags: Python, LLM, AI Agents, SDK, CLI, Agent & Tooling, Model & Inference Framework, Model Training & Inference, Security & Privacy

An open-source framework for large language model evaluations from the UK AI Safety Institute, featuring a modular Datasets/Solvers/Scorers architecture, multi-model/tool support, sandboxed execution, and 100+ pre-built benchmarks.

Overview

Inspect is an open-source framework for large language model evaluations developed by the UK AI Security Institute (AISI, formerly the AI Safety Institute). It aims to provide unified, extensible evaluation standards and tooling. The project is MIT-licensed and hosted in the UK Government's official GitHub organization.

Core Architecture

Modular design centered on Task: Dataset (Input) -> Solver (Processing/Reasoning) -> Scorer (Evaluation) -> Result

Three Core Components:

  • Datasets: Labeled samples with prompts as input and literal values or scoring guides as targets
  • Solvers: Chainable execution units (generate(), chain_of_thought(), self_critique())
  • Scorers: Support for Exact Match, Model Graded, and custom scoring
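The Dataset -> Solver -> Scorer flow above can be sketched in plain Python. This is a hypothetical mini-pipeline illustrating the data flow only, not Inspect's actual API (all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    input: str          # prompt presented to the model
    target: str         # literal value (or scoring guide) to grade against
    output: str = ""    # filled in by solvers

# Solvers are chainable: each takes a sample and returns it transformed.
def chain_of_thought(sample: Sample) -> Sample:
    sample.input = f"{sample.input}\nThink step by step."
    return sample

def generate(sample: Sample) -> Sample:
    # Stand-in for a model call; a real solver would query an LLM here.
    sample.output = "4"
    return sample

# Scorers map (output, target) pairs to a score.
def exact_match(sample: Sample) -> float:
    return 1.0 if sample.output.strip() == sample.target else 0.0

def run_task(dataset, solvers, scorer):
    results = []
    for sample in dataset:
        for solver in solvers:
            sample = solver(sample)
        results.append(scorer(sample))
    return results

dataset = [Sample(input="What is 2 + 2?", target="4")]
scores = run_task(dataset, [chain_of_thought, generate], exact_match)
print(scores)  # [1.0]
```

In the real framework, a Task bundles exactly these three pieces, and solvers compose in sequence the same way the list above does.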

Key Features

Agent & Tool Support

  • Tool Calling: Custom Tools, MCP Tools, Bash, Python, Web Search/Browsing, Computer Tools
  • Agent Evaluations: Built-in ReAct Agent, Multi-Agent, external agents (Claude Code, Codex CLI, Gemini CLI)
  • Sandboxed Execution: Docker, Kubernetes, Modal, Proxmox
  • Tool Approval: Fine-grained tool call approval policies
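A fine-grained approval policy can be pictured as a mapping from tool names to decisions, consulted before each tool call executes. This is a hedged sketch with hypothetical names, not Inspect's approval API:

```python
# Each entry maps a tool name to a decision; anything not listed is
# rejected by default (deny-by-default posture).
APPROVAL_POLICY = {
    "python": "approve",      # sandboxed interpreter, low risk
    "web_search": "approve",
    "bash": "escalate",       # route to a human reviewer
}

def review_tool_call(tool_name: str, policy=APPROVAL_POLICY) -> str:
    return policy.get(tool_name, "reject")

print(review_tool_call("python"))       # approve
print(review_tool_call("bash"))         # escalate
print(review_tool_call("delete_repo"))  # reject
```

The same idea generalizes to matching on tool arguments (e.g. approving `bash` only for an allow-listed set of commands) rather than names alone.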

Model Provider Support

  • Commercial APIs: OpenAI, Anthropic, Google, Grok, Mistral, AWS Bedrock, Azure AI, TogetherAI, Groq
  • Local/Open Source: vLLM, Ollama, llama-cpp-python, HuggingFace

Pre-built Evaluation Library (100+)

  • Safeguards: AgentHarm, StrongREJECT, WMDP
  • Coding: HumanEval, SWE-bench, BigCodeBench
  • Knowledge: MMLU, GPQA, TruthfulQA
  • Mathematics: AIME, GSM8K, MATH
  • Reasoning: ARC, BBH, DROP
  • Assistants: GAIA, OSWorld, Mind2Web

Developer Tools

  • CLI: inspect eval, inspect view
  • Inspect View: Web-based evaluation monitoring and visualization
  • VS Code Extension: Evaluation authoring, debugging, and visualization

Installation & Usage

pip install inspect-ai
export OPENAI_API_KEY=your-key
inspect eval examples/task.py --model openai/gpt-4o

Technical Specifications

  • Developer: UK AI Security Institute
  • License: MIT License
  • Primary Languages: Python (81%), TypeScript (17.3%)
  • Python Version: >= 3.10
  • Initial Release: 2024-05

Extension Mechanism

Inspect is extensible via Python packages in four areas: elicitation and scoring techniques, model APIs, tool execution environments, and storage platforms.
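A common pattern for this kind of package-based extensibility is a named registry that third-party packages populate at import time. The sketch below is hypothetical (it is not Inspect's actual plugin mechanism) and shows the idea for custom scorers:

```python
from typing import Callable, Dict

# Registry of scorer functions keyed by name; an extension package
# would add its entries when imported.
SCORERS: Dict[str, Callable[[str, str], float]] = {}

def register_scorer(name: str):
    """Decorator that records a scorer under the given name."""
    def decorator(fn: Callable[[str, str], float]):
        SCORERS[name] = fn
        return fn
    return decorator

@register_scorer("includes")
def includes(output: str, target: str) -> float:
    # Full credit if the target string appears anywhere in the output.
    return 1.0 if target in output else 0.0

print(SCORERS["includes"]("The answer is 42.", "42"))  # 1.0
```

Consumers then look up extensions by name (e.g. from a task config) without importing the implementing module directly.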
