OpenGuardrails

A full-stack security guardrails platform for enterprise AI applications, unifying LLM content safety and autonomous agent execution safety.

OpenGuardrails is a full-stack security guardrails platform for enterprise AI applications, covering both LLM content safety and autonomous AI agent execution safety. Built on a single 14B LLM (quantized to 3.3B via GPTQ), it provides unified detection across three risk dimensions: content safety (harmful/explicit text detection), manipulation defense (prompt injection, jailbreak attacks, code interpreter abuse), and data leakage (sensitive/private information exfiltration).

Key technical innovations include: Configurable Policy Adaptation (per-request dynamic adjustment of unsafe categories and sensitivity thresholds via probabilistic logit-space control for precision-recall tradeoff), Unified LLM-based Guard Architecture (single model handling both content safety and manipulation detection, outperforming separate model deployments), and Quantized Scalable Model Design (14B quantized to 3.3B retaining 98%+ accuracy).

Production capabilities span: 119-language support with cross-lingual benchmark SOTA, P95 latency of 274.6ms for high-concurrency deployment, text and image multimodal detection, context-aware risk judgment based on full conversation history, user-level ban policies, and OpenAI-compatible API for one-line integration.

For agent safety, the project provides TrustedExecBench, an evaluation framework covering email/calendar/file management, financial transactions, home security, and local system automation scenarios to assess whether autonomous agents exhibit unauthorized behavior. The ecosystem includes multi-language SDKs (Python/Go/Java/Node.js), workflow integrations (Dify plugin, n8n nodes, LiteLLM proxy), and an EDR telemetry-based AI agent asset discovery tool.

Key benchmark results: English Prompt F1 87.1% (+2.8% vs runner-up), English Response F1 88.5% (+8.0% vs runner-up), Multilingual Prompt F1 97.3% (+12.3% vs runner-up), Multilingual Response F1 97.2% (+19.1% vs runner-up).

Related Projects

vLLM-Omni

Gram — The MCP Cloud Platform

SciLink

STAY UPDATED