On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
Edge-Veda is an on-device AI SDK for Flutter, designed as a managed local AI runtime that requires no cloud connectivity.
## Project Positioning

Edge-Veda addresses privacy concerns, network latency, and high backend costs in mobile AI development. It brings a complete AI runtime to edge devices, providing end-to-end capabilities from multimodal inference through speech processing to vector retrieval.
## Core Capabilities

### Inference
- Text Generation: streaming and blocking token generation, multi-turn dialog, 42–43 tok/s
- Vision Inference: VLM processing of camera frames with persistent model loading
- Image Generation: stable-diffusion.cpp + Metal GPU, 512×512 in ~14 s
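The vision path might be invoked as below. `EdgeVeda` and `describeImage` appear in this document's architecture diagram, but the parameter names are illustrative assumptions, not the SDK's confirmed signature:

```dart
import 'dart:typed_data';

import 'package:edge_veda/edge_veda.dart';

// Sketch: run the VLM over one camera frame. Because the model stays
// loaded after the first call, repeated frames avoid reload cost.
Future<void> describeFrame(EdgeVeda edgeVeda, Uint8List frameBytes) async {
  // Hypothetical parameter names (`image`, `prompt`).
  final description = await edgeVeda.describeImage(
    image: frameBytes,
    prompt: 'What is in this frame?',
  );
  print(description);
}
```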
### Speech Processing

- STT: whisper.cpp with Metal GPU acceleration, ~670 ms per 3-second audio chunk
- TTS: iOS AVSpeechSynthesizer wrapper, zero additional binary size
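A minimal STT sketch under stated assumptions: the `transcribe` method name is hypothetical (the document names only the underlying `WhisperWorker`), and audio is assumed to arrive as raw PCM chunks:

```dart
import 'dart:typed_data';

import 'package:edge_veda/edge_veda.dart';

// Sketch: feed a ~3 s audio chunk to the whisper.cpp-backed worker.
// Per the benchmarks above, each chunk takes roughly 670 ms on
// Metal-capable devices.
Future<void> transcribeChunk(EdgeVeda edgeVeda, Uint8List pcmChunk) async {
  // `transcribe` is an assumed API name.
  final text = await edgeVeda.transcribe(pcmChunk);
  print(text);
}
```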
### Advanced Features

- Function Calling: ToolDefinition + ToolRegistry, multi-turn tool chains, JSON recovery
- RAG Pipeline: built-in pure-Dart HNSW VectorIndex + RagPipeline
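The RAG pieces above might be wired together as follows. `VectorIndex`, `RagPipeline`, and `embed` are named in this document; the constructor parameters and the `add`/`query` method names are assumptions for illustration:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: index documents into the pure-Dart HNSW store, then answer a
// question with retrieved context. Method names are assumed.
Future<void> askWithRag(EdgeVeda edgeVeda, List<String> documents) async {
  final index = VectorIndex();
  for (final doc in documents) {
    // Embed each document on-device and insert it into the HNSW index.
    final embedding = await edgeVeda.embed(doc);
    index.add(embedding, metadata: doc);
  }

  // RagPipeline retrieves top-k chunks and passes them to the LLM.
  final rag = RagPipeline(edgeVeda: edgeVeda, index: index);
  final answer = await rag.query('Summarize the key findings.');
  print(answer);
}
```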
### Runtime Governance

- Compute Budget Contract: declare p95 latency, battery drain, thermal state, and memory limits
- QoS Levels: Full / Reduced / Minimal / Paused adaptive degradation
- Model Advisor: DeviceProfile hardware detection plus ModelAdvisor 4-dimensional scoring
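A budget contract might be declared like this. `EdgeVedaBudget` appears in the architecture diagram, but every field name below is an illustrative assumption:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: declare limits up front; the runtime is then responsible for
// degrading QoS (Full → Reduced → Minimal → Paused) instead of
// violating the contract. Field names are assumptions.
Future<void> initWithBudget(EdgeVeda edgeVeda, String modelPath) async {
  final budget = EdgeVedaBudget(
    p95LatencyMs: 250,            // target p95 token latency
    maxBatteryDrainPerHour: 0.05, // at most 5% battery per hour
    maxMemoryMb: 1024,            // working-set ceiling
  );
  await edgeVeda.init(EdgeVedaConfig(modelPath: modelPath, budget: budget));
}
```

A thermal-state ceiling would presumably be declared the same way; it is omitted here because the document does not name its type.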
## Installation

```yaml
# pubspec.yaml
dependencies:
  edge_veda: ^2.4.1
```

Minimum iOS version is 13.0; the XCFramework (~31 MB) is auto-downloaded during `pod install`.
## Quick Start

```dart
import 'dart:io';

import 'package:edge_veda/edge_veda.dart';

final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: modelPath));

// Streaming generation
await for (final chunk in edgeVeda.generateStream('Explain quantum computing')) {
  stdout.write(chunk.token);
}
```
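For multi-turn dialog, the SDK's `ChatSession` (listed in the architecture below) presumably retains conversation history; the `send` method name is an assumption:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: two turns over one session. The follow-up question relies on
// the session's retained history rather than a re-sent transcript.
// `send` is an assumed API name.
Future<void> chat(EdgeVeda edgeVeda) async {
  final session = ChatSession(edgeVeda);
  final first = await session.send('What is quantum computing?');
  print(first);

  final followUp = await session.send('Explain it to a five-year-old.');
  print(followUp);
}
```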
## Architecture

```
Flutter App (Dart)
└── ChatSession / RagPipeline / VectorIndex
    └── EdgeVeda (generate, embed, describeImage)
        └── Workers (StreamingWorker, VisionWorker, WhisperWorker)
            └── Scheduler + EdgeVedaBudget + TelemetryService
                └── FFI Bindings (43 C functions)
                    └── XCFramework (llama.cpp, whisper.cpp, stable-diffusion.cpp)
```

Key design decisions: all inference runs in background isolates, native pointers never cross isolate boundaries, and models stay loaded in memory after the first use.
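The isolate rule can be illustrated in plain Dart: the worker isolate owns any native state, and only serializable values (here, token strings) cross the boundary back to the main isolate. This is a minimal sketch of the pattern, not the SDK's actual worker code:

```dart
import 'dart:isolate';

// Worker entry point. In the real SDK this isolate would hold the
// native model pointer; only plain tokens are sent across.
void _worker(SendPort out) {
  for (final token in ['Hello', ' ', 'world']) {
    out.send(token);
  }
  out.send(null); // end-of-stream marker
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(_worker, port.sendPort);
  // Consume tokens as they arrive, mirroring generateStream's shape.
  await for (final message in port) {
    if (message == null) break;
    print(message);
  }
  port.close();
}
```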
## Platform Support

- iOS: full Metal GPU support; minimum iOS 13.0
- macOS: fully supported
- Android: skeleton implemented; Vulkan GPU support planned
## Code Scale
~22,700 LOC / 40 C API functions / 32 Dart SDK files
Primary languages: Dart (67.2%), C++ (8.7%), Shell (5.4%), Python (5.3%)