An open-source AI observability and evaluation platform providing OpenTelemetry-based distributed tracing, LLM-as-a-Judge evaluation, dataset versioning, and prompt optimization, designed for debugging and monitoring RAG and Agent applications.
## Overview
Phoenix is an open-source AI observability and evaluation platform developed by Arize AI, built on OpenTelemetry and following OpenInference specifications. Current version: v13.3.0, licensed under Elastic License 2.0 (ELv2).
## Core Features
| Feature | Description |
|---|---|
| Tracing | OTLP-based distributed tracing capturing model calls, retrieval, tool usage |
| Evaluation | LLM-as-a-Judge automated evaluation with RAG relevance, answer relevance metrics |
| Datasets | Versioned example datasets for experiments, evaluation, and fine-tuning |
| Experiments | Track changes to prompts, LLMs, and retrieval components |
| Playground | Interactive environment for prompt optimization, model comparison |
| Prompt Management | Version control, tagging, and experiment integration |
## Use Cases
- LLM Debugging: View complete execution flow, identify root causes
- RAG Optimization: Evaluate retrieval relevance and answer quality
- Agent Analysis: Trace tool calls and multi-step reasoning
- Production Monitoring: Track AI application performance, detect regressions
## Installation
```shell
pip install arize-phoenix
# or
conda install -c conda-forge arize-phoenix
```
Lightweight clients are also available as separate packages: `arize-phoenix-otel`, `arize-phoenix-client`, and `arize-phoenix-evals`.
## Deployment Options
- Local / Jupyter Notebook
- Docker / Docker Compose
- Kubernetes (Helm Chart / Kustomize)
- Phoenix Cloud managed service
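A single-container Docker deployment can be sketched as follows; the image name and port mapping reflect common Phoenix deployments but should be checked against the official deployment docs for your version.

```shell
# Hedged sketch: run Phoenix from the public Docker image.
# Port 6006 serves the web UI and HTTP OTLP collector;
# port 4317 serves the gRPC OTLP collector.
docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
```

For production, the Helm chart or Docker Compose setup additionally wires Phoenix to a PostgreSQL database instead of the embedded default storage.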
## Architecture
- Tracing: OpenTelemetry + OpenInference semantic conventions
- Storage: PostgreSQL (v16 recommended)
- Backend: Python with OpenAPI REST interface
- Frontend: TypeScript Web UI
- Security: RBAC, API Keys, data retention policies
## Framework Integrations
Native instrumentation support for OpenAI, LangChain, LlamaIndex, DSPy, CrewAI, Anthropic, AWS Bedrock, MistralAI, Google GenAI.