An open-source AI monitoring and governance engine providing LLM hallucination detection, PII identification, prompt injection defense, and traditional ML model evaluation, featuring real-time guardrails and OpenInference support.
Overview#
Developed by Arthur AI, Arthur Engine is a comprehensive service framework for monitoring and governing AI/ML workloads. It supports evaluation, benchmarking, and real-time guardrails for both traditional machine learning and Generative AI applications.
Core Capabilities#
GenAI Evaluation#
- Hallucination Detection: Claim-based LLM Judge technology
- Response Quality: Measures relevance, token count, latency
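The response-quality signals above can be illustrated with a minimal sketch. The `call_llm` stub and the whitespace token count are assumptions for illustration; a real deployment would wrap an actual provider call and use the model's tokenizer.

```python
import time

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; in practice this wraps your provider."""
    return "Paris is the capital of France."

def response_quality(prompt: str) -> dict:
    """Collect two of the simpler response-quality signals mentioned above:
    token count (here a naive whitespace split) and wall-clock latency."""
    start = time.perf_counter()
    response = call_llm(prompt)
    latency_s = time.perf_counter() - start
    return {
        "response": response,
        "token_count": len(response.split()),  # real engines use the model tokenizer
        "latency_s": latency_s,
    }

metrics = response_quality("What is the capital of France?")
print(metrics["token_count"])  # 6
```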
Security & Compliance#
- Prompt Injection Detection: Deberta-v3-base-prompt-injection-v2
- Toxicity Detection: RoBERTa toxicity classifier
- Sensitive Data Identification: Few-shot optimized LLM Judge
- PII Identification: Presidio-based named entity recognition
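To give a feel for NER-style PII flagging, here is a deliberately simplified sketch. Arthur Engine's PII rule is built on Microsoft Presidio; the regexes and entity names below are toy stand-ins, not the engine's actual detectors.

```python
import re

# Toy illustration only -- the real engine delegates to Presidio's
# named-entity recognizers rather than hand-written regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text: str) -> list[dict]:
    """Return each detected entity with its character span and matched text."""
    hits = []
    for entity, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append({"entity": entity, "span": (m.start(), m.end()), "text": m.group()})
    return hits

print(find_pii("Contact jane@example.com, SSN 123-45-6789"))
```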
ML Evaluation#
- Model Monitoring: Drift, accuracy, precision, recall, F1, AUC
- Analysis Tools: Model comparison, feature importance analysis, optimization area identification
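The headline classification metrics above follow directly from confusion-matrix counts, as this small sketch shows (drift and AUC need score distributions, so they are omitted here):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m["precision"], m["accuracy"])  # 0.8 0.85
```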
Real-time Guardrails#
- Configurable real-time detection metrics
- Plugin-based extension support for custom models (including HuggingFace)
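A plugin-based rule system like the one described above can be sketched as a small registry. All class and function names here are hypothetical illustrations, not Arthur Engine's actual extension API; a HuggingFace-backed rule would implement the same interface around a model pipeline call.

```python
from abc import ABC, abstractmethod

class GuardrailRule(ABC):
    """Hypothetical base class for a pluggable guardrail check."""
    name: str

    @abstractmethod
    def evaluate(self, text: str) -> bool:
        """Return True if the text passes this rule."""

# Registry mapping rule names to configured rule instances.
RULES: dict[str, GuardrailRule] = {}

def register(rule: GuardrailRule) -> None:
    RULES[rule.name] = rule

class MaxLengthRule(GuardrailRule):
    """Example plugin: reject prompts longer than a configured limit."""
    name = "max_length"

    def __init__(self, limit: int = 1000):
        self.limit = limit

    def evaluate(self, text: str) -> bool:
        return len(text) <= self.limit

register(MaxLengthRule(limit=20))
print(RULES["max_length"].evaluate("short prompt"))  # True
```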
Deployment#
Docker Compose Quick Start#
```shell
git clone https://github.com/arthur-ai/arthur-engine.git
cd arthur-engine/deployment/docker-compose/genai-engine
cp .env.template .env
docker compose up
# Access at http://localhost:3030/docs
```
Development Setup#
```shell
pip install poetry
cd genai-engine
poetry env use 3.12   # select the interpreter before entering the shell
poetry shell
poetry install
docker compose up     # start Postgres
poetry run serve
```
Prerequisites: Docker Desktop and access to an OpenAI-compatible GPT model
API Usage#
- POST /api/v2/task: Create new LLM application tasks
- POST /api/v2/tasks/{task_id}/rules: Configure evaluation rules
- Task Based Validation endpoints: Submit LLM prompts and responses for evaluation
Authentication: Use the GENAI_ENGINE_ADMIN_KEY via the Authorize button on the /docs page
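A client-side sketch of assembling a task-creation request against the endpoints above. The endpoint path comes from this page; the Bearer header scheme, the payload fields, and the helper names are illustrative assumptions, not the engine's exact schema (check the /docs page for the real one).

```python
import json
import os

BASE_URL = "http://localhost:3030"
# Assumed to be exported in the environment, as in the .env setup above.
ADMIN_KEY = os.environ.get("GENAI_ENGINE_ADMIN_KEY", "changeme")

def auth_headers() -> dict:
    """Assumed Bearer-token scheme; verify against the Authorize dialog on /docs."""
    return {
        "Authorization": f"Bearer {ADMIN_KEY}",
        "Content-Type": "application/json",
    }

def create_task_request(name: str) -> tuple[str, dict, str]:
    """Return (url, headers, body) for creating an LLM application task.
    The {"name": ...} payload is a hypothetical minimal body."""
    return (f"{BASE_URL}/api/v2/task", auth_headers(), json.dumps({"name": name}))

url, headers, body = create_task_request("customer-support-bot")
print(url)  # http://localhost:3030/api/v2/task
```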
Ecosystem Integration#
Full OpenInference specification support, enabling integration with LangChain, LangGraph, LlamaIndex, the Vercel AI SDK, FastAPI/Flask applications, and model providers such as OpenAI, Anthropic, and Google.
Project Architecture#
arthur-engine/
├── genai-engine/ # GenAI engine core
├── ml-engine/ # ML engine
├── deployment/ # Docker Compose / CloudFormation / Helm
└── docs/ # Documentation
Tech Stack: Python (55.4%) / TypeScript (43.6%), FastAPI/Uvicorn, PostgreSQL + Alembic, Poetry, Docker
Related Resources#
- Example Notebooks: https://github.com/arthur-ai/example-shield-notebooks
- Agent Demo: https://github.com/arthur-ai/shield-autogen-agent-demo
- Shield User Guide: https://shield.docs.arthur.ai/