A learnable, configurable, and pluggable Omni-Avatar Assistant framework built on LiveKit, featuring real-time interaction, multimodal memory, user persona, and external tool integration.
## Project Overview
AlphaAvatar is an Apache-2.0 open-source project that aims to build a universal virtual assistant. It is a Python-based Agent framework that integrates real-time audio/video interaction (via LiveKit), LLM inference, long-term memory, user personas, and virtual characters into a single digital-assistant solution with "self-evolution" capabilities.
Core Value: Lowering the barrier to building real-time voice/video AI Agents with long-term memory and personalized interaction capabilities.
Use Cases:
- Real-time voice/video virtual companionship and assistance
- Intelligent customer service or educational tutoring Agents with long-term memory
- Multimodal interaction research (speech recognition, speaker diarization, Live2D character animation)
- Intelligent Q&A systems with external knowledge base (RAG) and deep web search
## Core Capabilities & Plugins
The project features a plugin-based design with two main categories: AlphaAvatar core plugins and tool plugins.
### AlphaAvatar Core Plugins
| Plugin | Status | Description |
|---|---|---|
| 🧠 Memory | Implemented | Self-improving memory module that captures and retrieves Assistant–User, Assistant–Tool, and Assistant self-memories |
| 🧬 Persona | Implemented | Automatic full-modality user persona construction, with speaker-ID verification and real-time profile matching |
| 😊 Virtual Character | Implemented | Real-time rendered virtual character, integrated with AIRI Live2D |
| 💡 Reflection | Planned | Optimizer for automatic internal knowledge base construction |
| 🗺️ Planning | Planned | Long-term planning for sequential and reliable actions |
| 🤖 Behavior | Planned | Behavior logic and process flow controller |
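The plugin split above suggests a narrow interface per capability. As a hypothetical sketch of how a Memory plugin could expose capture and retrieval behind an abstract base class (none of these class or method names come from AlphaAvatar itself, and a toy in-memory store stands in for the real vector database):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    """One captured memory: which relationship it concerns, and its content."""
    scope: str   # e.g. "assistant-user", "assistant-tool", "assistant-self"
    text: str


class MemoryPlugin(ABC):
    """Hypothetical plugin interface: capture interactions, retrieve by query."""

    @abstractmethod
    def capture(self, record: MemoryRecord) -> None: ...

    @abstractmethod
    def retrieve(self, query: str, limit: int = 5) -> list[MemoryRecord]: ...


class InMemoryMemory(MemoryPlugin):
    """Toy implementation using substring matching instead of vector search."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def capture(self, record: MemoryRecord) -> None:
        self._records.append(record)

    def retrieve(self, query: str, limit: int = 5) -> list[MemoryRecord]:
        hits = [r for r in self._records if query.lower() in r.text.lower()]
        return hits[:limit]


if __name__ == "__main__":
    mem = InMemoryMemory()
    mem.capture(MemoryRecord("assistant-user", "User prefers concise answers"))
    mem.capture(MemoryRecord("assistant-tool", "Tavily search returned 3 results"))
    print([r.text for r in mem.retrieve("concise")])
```

In the actual project, the retrieval step would go through Qdrant embeddings rather than substring matching; the point of the interface is that planned plugins (Reflection, Planning, Behavior) can slot in behind similarly small contracts.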
### Tool Plugins
| Plugin | Status | Description |
|---|---|---|
| 🔍 DeepResearch | Implemented | Network access and deep search via the Tavily API, supporting quick search, deep search, and web-to-PDF |
| 📖 RAG | Implemented | Document knowledge access via RAG Anything with DeepResearch page indexing |
## Installation & Quick Start
### Requirements
- Python 3.11+
- Dependencies: LiveKit Server, OpenAI API Key, Qdrant (cloud or self-hosted), Tavily API Key (optional)
### Install from PyPI (Stable)

```bash
uv venv .my-env --python 3.11
source .my-env/bin/activate
pip install alpha-avatar-agents
```
### Install from GitHub (Latest)

```bash
git clone --recurse-submodules https://github.com/AlphaAvatar/AlphaAvatar.git
cd AlphaAvatar
uv venv .venv --python 3.11
source .venv/bin/activate
uv sync --all-packages
```
### Environment Variables

```bash
export LIVEKIT_API_KEY=<your API Key>
export LIVEKIT_API_SECRET=<your API Secret>
export LIVEKIT_URL=<your LiveKit server URL>
export OPENAI_API_KEY=<your OpenAI API Key>
export QDRANT_URL='https://xxxxxx.us-east.aws.cloud.qdrant.io:6333'
export QDRANT_API_KEY=<your QDRANT API Key>
export TAVILY_API_KEY=<your TAVILY API Key> # Optional
```
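Before launching, it can help to fail fast on missing configuration rather than hitting an opaque error mid-session. A minimal sketch of such a check (the variable names come from the list above; the helper itself is not part of AlphaAvatar):

```python
import os

# Required by the core pipeline; TAVILY_API_KEY is only needed for DeepResearch.
REQUIRED = [
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
    "OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY",
]


def check_env(env=os.environ) -> list[str]:
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = check_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Environment OK")
```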
### Run in Development Mode

```bash
alphaavatar download-files
alphaavatar dev examples/pipline_openai_airi.yaml
# or
alphaavatar dev examples/pipline_openai_tools.yaml
```
## Architecture Design
- Core Framework: Built on LiveKit Agents for real-time interaction streams
- Modular Design: `avatar-agents` (core Agent logic & orchestration) + `avatar-plugins` (plugin implementations)
- Context Manager: Core routing component that distributes real-time interaction data to plugin models
- Data Storage: Qdrant vector database for Memory and Persona Embedding storage
- Multimodal Pipeline: LiveKit AV stream → STT → Context Manager (Persona/Memory) → LLM → Tools (DeepResearch/RAG) → TTS → Audio stream + Live2D drive
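The pipeline above can be pictured as a chain of stage functions, each mapping one intermediate representation to the next. This is an illustrative toy only (stub functions stand in for the real STT, Context Manager, LLM, and TTS components), not the project's actual orchestration code:

```python
from typing import Callable


def stt(audio: bytes) -> str:
    """Stub speech-to-text: pretend the audio decodes to a transcript."""
    return audio.decode("utf-8")


def context_manager(transcript: str) -> str:
    """Stub Persona/Memory enrichment: prepend retrieved context."""
    return f"[persona+memory] {transcript}"


def llm(prompt: str) -> str:
    """Stub LLM: return a canned reply."""
    return f"reply to: {prompt}"


def tts(text: str) -> bytes:
    """Stub text-to-speech: encode the reply back to 'audio' bytes."""
    return text.encode("utf-8")


def run_pipeline(audio: bytes,
                 stages: list[Callable] = [stt, context_manager, llm, tts]) -> bytes:
    """Thread the input through each stage in order."""
    out = audio
    for stage in stages:
        out = stage(out)
    return out


if __name__ == "__main__":
    print(run_pipeline(b"hello avatar"))
```

In the real system the stages are streaming and asynchronous (LiveKit AV frames in, audio plus Live2D drive signals out), but the data flow follows the same linear shape.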
## CLI Commands
- `alphaavatar download-files`: Initialize and download required resource files
- `alphaavatar dev <config_path>`: Start the Agent in development mode with the specified YAML config
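As a rough illustration of how a two-command CLI of this shape can be structured with `argparse` subcommands (a hypothetical stand-in, not AlphaAvatar's actual implementation):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="alphaavatar")
    sub = parser.add_subparsers(dest="command", required=True)

    # `download-files` takes no arguments.
    sub.add_parser("download-files", help="Initialize and download resource files")

    # `dev` requires a YAML config path.
    dev = sub.add_parser("dev", help="Start the Agent in development mode")
    dev.add_argument("config_path", help="Path to the YAML pipeline config")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.command)
```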
## Version History
| Date | Version | Key Updates |
|---|---|---|
| 2026-01 | v0.3.1 | Added tool calls made during user–Assistant interactions to the Memory module |
| 2026-01 | v0.3.0 | Added DeepResearch support via the Tavily API |
| 2025-12 | v0.2.0 | Added AIRI Live2D-based virtual character display |
| 2025-11 | v0.1.0 | Added automatic memory extraction, plus automatic user persona extraction and matching |
## Project Vision
Build a universal assistant capable of recognizing users through multimodal streaming input. It should possess self-memory, autonomous reflection, and iterative self-evolution for real-time interaction. The assistant will seamlessly integrate with mainstream external tools to solve practical problems efficiently.