An open-source AI monitoring and governance engine providing LLM hallucination detection, PII identification, prompt injection defense, and traditional ML model evaluation, featuring real-time guardrails and OpenInference support.
Overview#
Developed by Arthur AI, Arthur Engine is a comprehensive service framework for monitoring and governing AI/ML workloads. It supports evaluation, benchmarking, and real-time guardrails for both traditional machine learning and Generative AI applications.
Core Capabilities#
GenAI Evaluation#
- Hallucination Detection: Claim-based LLM Judge technology
- Response Quality: Measures relevance, token count, latency
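The response-quality signals above can be illustrated with a minimal sketch. The `call_llm` stub and the whitespace token count are assumptions for illustration; a real deployment would wrap an actual provider call and use the model's tokenizer.

```python
import time

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; in practice this wraps your provider."""
    return "Paris is the capital of France."

def response_quality(prompt: str) -> dict:
    """Collect two of the simpler response-quality signals mentioned above:
    token count (here a naive whitespace split) and wall-clock latency."""
    start = time.perf_counter()
    response = call_llm(prompt)
    latency_s = time.perf_counter() - start
    return {
        "response": response,
        "token_count": len(response.split()),  # real engines use the model tokenizer
        "latency_s": latency_s,
    }

metrics = response_quality("What is the capital of France?")
print(metrics["token_count"])  # 6
```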
Security & Compliance#
- Prompt Injection Detection: Deberta-v3-base-prompt-injection-v2
- Toxicity Detection: RoBERTa toxicity classifier
- Sensitive Data Identification: Few-shot optimized LLM Judge
- PII Identification: Presidio-based named entity recognition
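To give a feel for NER-style PII flagging, here is a deliberately simplified sketch. Arthur Engine's PII rule is built on Microsoft Presidio; the regexes and entity names below are toy stand-ins, not the engine's actual detectors.

```python
import re

# Toy illustration only -- the real engine delegates to Presidio's
# named-entity recognizers rather than hand-written regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text: str) -> list[dict]:
    """Return each detected entity with its character span and matched text."""
    hits = []
    for entity, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append({"entity": entity, "span": (m.start(), m.end()), "text": m.group()})
    return hits

print(find_pii("Contact jane@example.com, SSN 123-45-6789"))
```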
ML Evaluation#
- Model Monitoring: Drift, accuracy, precision, recall, F1, AUC
- Analysis Tools: Model comparison, feature importance analysis, optimization area identification
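The headline classification metrics above follow directly from confusion-matrix counts, as this small sketch shows (drift and AUC need score distributions, so they are omitted here):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m["precision"], m["accuracy"])  # 0.8 0.85
```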
Real-time Guardrails#
- Configurable real-time detection metrics
- Plugin-based extension support for custom models (including HuggingFace)
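A plugin-based rule system like the one described above can be sketched as a small registry. All class and function names here are hypothetical illustrations, not Arthur Engine's actual extension API; a HuggingFace-backed rule would implement the same interface around a model pipeline call.

```python
from abc import ABC, abstractmethod

class GuardrailRule(ABC):
    """Hypothetical base class for a pluggable guardrail check."""
    name: str

    @abstractmethod
    def evaluate(self, text: str) -> bool:
        """Return True if the text passes this rule."""

# Registry mapping rule names to configured rule instances.
RULES: dict[str, GuardrailRule] = {}

def register(rule: GuardrailRule) -> None:
    RULES[rule.name] = rule

class MaxLengthRule(GuardrailRule):
    """Example plugin: reject prompts longer than a configured limit."""
    name = "max_length"

    def __init__(self, limit: int = 1000):
        self.limit = limit

    def evaluate(self, text: str) -> bool:
        return len(text) <= self.limit

register(MaxLengthRule(limit=20))
print(RULES["max_length"].evaluate("short prompt"))  # True
```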
Deployment#
Docker Compose Quick Start#
```shell
git clone https://github.com/arthur-ai/arthur-engine.git
cd arthur-engine/deployment/docker-compose/genai-engine
cp .env.template .env
docker compose up
# Access at http://localhost:3030/docs
```
Development Setup#
```shell
pip install poetry
cd genai-engine
poetry env use 3.12   # select the interpreter before entering the shell
poetry shell
poetry install
docker compose up     # start Postgres
poetry run serve
```
Prerequisites: Docker Desktop and access to an OpenAI-compatible GPT model
API Usage#
- POST /api/v2/task: Create new LLM application tasks
- POST /api/v2/tasks/{task_id}/rules: Configure evaluation rules
- Task Based Validation endpoints: Submit LLM prompts and responses for evaluation
Authentication: Use the GENAI_ENGINE_ADMIN_KEY via the Authorize button on the /docs page
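A client-side sketch of assembling a task-creation request against the endpoints above. The endpoint path comes from this page; the Bearer header scheme, the payload fields, and the helper names are illustrative assumptions, not the engine's exact schema (check the /docs page for the real one).

```python
import json
import os

BASE_URL = "http://localhost:3030"
# Assumed to be exported in the environment, as in the .env setup above.
ADMIN_KEY = os.environ.get("GENAI_ENGINE_ADMIN_KEY", "changeme")

def auth_headers() -> dict:
    """Assumed Bearer-token scheme; verify against the Authorize dialog on /docs."""
    return {
        "Authorization": f"Bearer {ADMIN_KEY}",
        "Content-Type": "application/json",
    }

def create_task_request(name: str) -> tuple[str, dict, str]:
    """Return (url, headers, body) for creating an LLM application task.
    The {"name": ...} payload is a hypothetical minimal body."""
    return (f"{BASE_URL}/api/v2/task", auth_headers(), json.dumps({"name": name}))

url, headers, body = create_task_request("customer-support-bot")
print(url)  # http://localhost:3030/api/v2/task
```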
Ecosystem Integration#
Full OpenInference specification support, enabling integration with LangChain, LangGraph, LlamaIndex, the Vercel AI SDK, FastAPI/Flask applications, and model providers such as OpenAI, Anthropic, and Google.
Project Architecture#
arthur-engine/
├── genai-engine/ # GenAI engine core
├── ml-engine/ # ML engine
├── deployment/ # Docker Compose / CloudFormation / Helm
└── docs/ # Documentation
Tech Stack: Python (55.4%) / TypeScript (43.6%), FastAPI/Uvicorn, PostgreSQL + Alembic, Poetry, Docker
Related Resources#
- Example Notebooks: https://github.com/arthur-ai/example-shield-notebooks
- Agent Demo: https://github.com/arthur-ai/shield-autogen-agent-demo
- Shield User Guide: https://shield.docs.arthur.ai/