A four-service collaborative AI desktop assistant framework with streaming tool calling, GRAG knowledge graph memory, Live2D avatar, and voice interaction
NagaAgent is a desktop-oriented AI assistant framework using a four-service microservice architecture (API Server, Agent Server, MCP Server, Voice Service).
Core Features#
Streaming Tool Calling#
- Independent of the OpenAI Function Calling API: uses `tool` code blocks in the LLM's text output to embed JSON tool descriptions
- Compatible with DeepSeek, Qwen, OpenAI, Ollama, and all OpenAI-compatible APIs
- Supports regex extraction + json5 fault-tolerant parsing + full-width character auto-normalization
- Runs at most 5 rounds of tool calling per loop (configurable) and supports parallel tool execution
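The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not NagaAgent's actual parser: the fence tag `tool`, the field names `tool_name` and `city`, and the policy of skipping unparseable blocks are all assumptions, and the standard-library `json` module stands in for the json5 parser the project uses.

```python
import json
import re

# Assumed fence tag: the text only says JSON tool descriptions are embedded in
# code blocks; the exact delimiter NagaAgent expects may differ.
TOOL_BLOCK = re.compile(r"```tool\s*\n(.*?)```", re.DOTALL)

# Full-width and curly punctuation that commonly breaks JSON, mapped to ASCII.
FULLWIDTH_TO_ASCII = str.maketrans({
    "｛": "{", "｝": "}", "［": "[", "］": "]",
    "：": ":", "，": ",", "“": '"', "”": '"', "＂": '"',
})


def extract_tool_calls(llm_text: str) -> list[dict]:
    """Return every JSON object found in a tool block, skipping bad blocks."""
    calls = []
    for match in TOOL_BLOCK.finditer(llm_text):
        payload = match.group(1).translate(FULLWIDTH_TO_ASCII).strip()
        try:
            parsed = json.loads(payload)  # stand-in for lenient json5 parsing
        except json.JSONDecodeError:
            continue  # hypothetical policy: ignore blocks that fail to parse
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


reply = (
    "Let me check the weather.\n"
    "```tool\n"
    "｛“tool_name”： “weather”， “city”： “Beijing”｝\n"
    "```\n"
    "One moment."
)
calls = extract_tool_calls(reply)
print(calls)
```

Because the calls are recovered from plain text, the same approach works with any model that can follow the output format, which is why no provider-specific function-calling API is needed.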
GRAG Knowledge Graph Memory#
- Automatically extracts quintuples (subject, subject_type, predicate, object, object_type) from conversations into Neo4j
- Supports structured output (Pydantic models) + JSON fallback extraction
- 3 asyncio worker coroutines consuming task queues, SHA-256 deduplication
- Dual storage: local files + Neo4j graph database
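The queue-and-deduplication pattern in this list might look something like the sketch below. The `Quintuple` class, the `ingest` helper, and the in-memory `stored` list are hypothetical stand-ins; the real system writes to local files and Neo4j, which is omitted here.

```python
import asyncio
import hashlib
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Quintuple:
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str

    def digest(self) -> str:
        """SHA-256 over the five fields, used as the deduplication key."""
        joined = "\x1f".join(astuple(self))  # separator keeps fields distinct
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()


async def worker(queue: asyncio.Queue, seen: set[str], stored: list[Quintuple]) -> None:
    """Consume quintuples forever, storing each distinct one exactly once."""
    while True:
        item = await queue.get()
        try:
            key = item.digest()
            if key not in seen:
                seen.add(key)
                stored.append(item)  # stand-in for the file + Neo4j writes
        finally:
            queue.task_done()


async def ingest(items: list[Quintuple], worker_count: int = 3) -> list[Quintuple]:
    queue: asyncio.Queue = asyncio.Queue()
    seen: set[str] = set()
    stored: list[Quintuple] = []
    workers = [
        asyncio.create_task(worker(queue, seen, stored))
        for _ in range(worker_count)
    ]
    for item in items:
        queue.put_nowait(item)
    await queue.join()  # wait until every queued item has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return stored


fact = Quintuple("Alice", "person", "likes", "green tea", "drink")
other = Quintuple("Alice", "person", "lives_in", "Shanghai", "city")
result = asyncio.run(ingest([fact, other, fact]))
print(len(result))
```

Since the workers are coroutines on one event loop and there is no `await` between the membership check and the insert, the shared `seen` set needs no lock in this sketch.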
Live2D Avatar#
- Renders Cubism Live2D models via pixi-live2d-display + PixiJS WebGL
- 4-channel orthogonal animation system: posture, motion, expression, tracking
- SSAA (supersampling anti-aliasing) and lip sync (5 parameters extracted at 60 FPS)
Voice Interaction#
- TTS: Edge-TTS, supports mp3/aac/wav/opus/flac formats
- ASR: FunASR local server with VAD endpoint detection and WebSocket real-time streaming
- Real-time voice dialogue: Full-duplex WebSocket voice interaction based on Qwen Omni
Four-Service Architecture#
| Service | Port | Responsibility |
|---|---|---|
| API Server | 8000 | Dialogue, streaming tool calling, document upload, auth proxy, memory API, config management, Skill marketplace |
| Agent Server | 8001 | Background intent analysis, OpenClaw integration, task scheduling and compressed memory |
| MCP Server | 8003 | MCP tool registration/discovery/parallel scheduling |
| Voice Service | 5048 | TTS + ASR + real-time voice |
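As a quick way to see which of the four services are up, a small probe like the one below could be used. It is only a sketch: it assumes the services listen on `127.0.0.1` at the ports listed in the table, and the `is_listening` helper is hypothetical rather than part of NagaAgent.

```python
import socket

# Ports taken from the table above; the host is an assumption.
SERVICE_PORTS = {
    "API Server": 8000,
    "Agent Server": 8001,
    "MCP Server": 8003,
    "Voice Service": 5048,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


status = {name: is_listening(port) for name, port in SERVICE_PORTS.items()}
for name, up in status.items():
    print(f"{name:<14} port {SERVICE_PORTS[name]:<5} {'up' if up else 'down'}")
```

A successful TCP connection only shows that something is bound to the port, not that the service behind it is healthy.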
MCP Tool Ecosystem#
Built-in Agents: weather/time, app launcher, game guides, online search, web scraping, browser automation, visual analysis, MQTT IoT control, document extraction. Supports dynamic registration, parallel scheduling, and one-click installation of community Skills via Skill Workshop.
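Dynamic registration and parallel scheduling can be illustrated with a small in-memory registry. The `ToolRegistry` class, its method names, and the two example tools are hypothetical and do not reflect the MCP Server's real interfaces; the sketch only shows the general pattern of dispatching several tool calls concurrently with `asyncio.gather`.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolHandler = Callable[..., Awaitable[Any]]


class ToolRegistry:
    """Hypothetical in-memory registry, not NagaAgent's actual MCP classes."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        self._tools[name] = handler  # tools can be added at runtime

    def discover(self) -> list[str]:
        return sorted(self._tools)

    async def call_parallel(self, calls: list[tuple[str, dict]]) -> list[Any]:
        """Run all calls concurrently; results keep the order of `calls`."""

        async def run_one(name: str, args: dict) -> Any:
            handler = self._tools.get(name)
            if handler is None:
                return {"error": f"unknown tool: {name}"}
            return await handler(**args)

        return await asyncio.gather(*(run_one(n, a) for n, a in calls))


async def get_time(timezone: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"time in {timezone}"


async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"weather in {city}"


registry = ToolRegistry()
registry.register("time", get_time)
registry.register("weather", get_weather)
results = asyncio.run(registry.call_parallel([
    ("weather", {"city": "Beijing"}),
    ("time", {"timezone": "UTC"}),
    ("missing", {}),
]))
print(results)
```

Returning an error value for an unknown tool, rather than raising, keeps one bad call from cancelling the others in the same batch; that is a design choice of this sketch.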
Requirements & Installation#
- Python version: >=3.11, <3.12
- Platform support: Windows / macOS / Linux
git clone https://github.com/RTGS2017/NagaAgent.git
cd NagaAgent
python setup.py # Auto-detect environment, create virtual environment, install dependencies
Configuration Example#
{
"api": {
"api_key": "your-api-key",
"base_url": "https://api.deepseek.com/v1",
"model": "deepseek-v3.2"
}
}
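A configuration of this shape could be validated before startup along the lines below. The `ApiConfig` class and `parse_api_config` function are illustrative names, not part of NagaAgent, and the sketch parses a string because the configuration file's name and location are not stated here.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiConfig:
    api_key: str
    base_url: str
    model: str


def parse_api_config(raw: str) -> ApiConfig:
    """Parse the "api" section and fail early on missing or empty fields."""
    data = json.loads(raw)
    api = data.get("api") if isinstance(data, dict) else None
    if not isinstance(api, dict):
        raise ValueError('config must contain an "api" object')
    missing = [key for key in ("api_key", "base_url", "model") if not api.get(key)]
    if missing:
        raise ValueError(f"missing api fields: {', '.join(missing)}")
    return ApiConfig(
        api_key=api["api_key"],
        base_url=api["base_url"].rstrip("/"),  # normalize a trailing slash
        model=api["model"],
    )


example = """
{
  "api": {
    "api_key": "your-api-key",
    "base_url": "https://api.deepseek.com/v1",
    "model": "deepseek-v3.2"
  }
}
"""
config = parse_api_config(example)
print(config.base_url, config.model)
```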
Startup Commands#
python main.py # Full startup
python main.py --headless # Headless mode without GUI
Use Cases#
- Personal desktop assistant: voice interaction, schedule management, document processing, system control
- Knowledge management: automatically build personal knowledge graphs, long-term memory of user preferences and facts
- Gaming assistant: guide Q&A, damage calculation, team recommendations
- IoT control: control smart home devices via MQTT protocol
- Browser automation: web operation automation based on Playwright
- Multi-Agent collaboration: parallel task scheduling via MCP tool system