On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
Edge-Veda is an on-device AI SDK for Flutter, designed as a managed local AI runtime that requires no cloud connectivity.
## Project Positioning

Edge-Veda addresses privacy concerns, network latency, and high backend costs in mobile AI development. It brings a complete AI runtime to edge devices, providing end-to-end capabilities from multimodal inference through speech processing to vector retrieval.
## Core Capabilities

### Inference
- Text Generation: streaming and blocking token generation, multi-turn dialog, 42–43 tok/s
- Vision Inference: VLM processing of camera frames with persistent model loading
- Image Generation: stable-diffusion.cpp + Metal GPU, 512×512 in ~14 s
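The vision path might be invoked as below. `EdgeVeda` and `describeImage` appear in this document's architecture diagram, but the parameter names are illustrative assumptions, not the SDK's confirmed signature:

```dart
import 'dart:typed_data';

import 'package:edge_veda/edge_veda.dart';

// Sketch: run the VLM over one camera frame. Because the model stays
// loaded after the first call, repeated frames avoid reload cost.
Future<void> describeFrame(EdgeVeda edgeVeda, Uint8List frameBytes) async {
  // Hypothetical parameter names (`image`, `prompt`).
  final description = await edgeVeda.describeImage(
    image: frameBytes,
    prompt: 'What is in this frame?',
  );
  print(description);
}
```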
### Speech Processing

- STT: whisper.cpp with Metal GPU acceleration, ~670 ms per 3-second audio chunk
- TTS: iOS AVSpeechSynthesizer wrapper, zero additional binary size
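A minimal STT sketch under stated assumptions: the `transcribe` method name is hypothetical (the document names only the underlying `WhisperWorker`), and audio is assumed to arrive as raw PCM chunks:

```dart
import 'dart:typed_data';

import 'package:edge_veda/edge_veda.dart';

// Sketch: feed a ~3 s audio chunk to the whisper.cpp-backed worker.
// Per the benchmarks above, each chunk takes roughly 670 ms on
// Metal-capable devices.
Future<void> transcribeChunk(EdgeVeda edgeVeda, Uint8List pcmChunk) async {
  // `transcribe` is an assumed API name.
  final text = await edgeVeda.transcribe(pcmChunk);
  print(text);
}
```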
### Advanced Features

- Function Calling: ToolDefinition + ToolRegistry, multi-turn tool chains, JSON recovery
- RAG Pipeline: built-in pure-Dart HNSW VectorIndex + RagPipeline
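The RAG pieces above might be wired together as follows. `VectorIndex`, `RagPipeline`, and `embed` are named in this document; the constructor parameters and the `add`/`query` method names are assumptions for illustration:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: index documents into the pure-Dart HNSW store, then answer a
// question with retrieved context. Method names are assumed.
Future<void> askWithRag(EdgeVeda edgeVeda, List<String> documents) async {
  final index = VectorIndex();
  for (final doc in documents) {
    // Embed each document on-device and insert it into the HNSW index.
    final embedding = await edgeVeda.embed(doc);
    index.add(embedding, metadata: doc);
  }

  // RagPipeline retrieves top-k chunks and passes them to the LLM.
  final rag = RagPipeline(edgeVeda: edgeVeda, index: index);
  final answer = await rag.query('Summarize the key findings.');
  print(answer);
}
```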
### Runtime Governance

- Compute Budget Contract: declare p95 latency, battery drain, thermal state, and memory limits
- QoS Levels: Full / Reduced / Minimal / Paused adaptive degradation
- Model Advisor: DeviceProfile hardware detection plus ModelAdvisor 4-dimensional scoring
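A budget contract might be declared like this. `EdgeVedaBudget` appears in the architecture diagram, but every field name below is an illustrative assumption:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: declare limits up front; the runtime is then responsible for
// degrading QoS (Full → Reduced → Minimal → Paused) instead of
// violating the contract. Field names are assumptions.
Future<void> initWithBudget(EdgeVeda edgeVeda, String modelPath) async {
  final budget = EdgeVedaBudget(
    p95LatencyMs: 250,            // target p95 token latency
    maxBatteryDrainPerHour: 0.05, // at most 5% battery per hour
    maxMemoryMb: 1024,            // working-set ceiling
  );
  await edgeVeda.init(EdgeVedaConfig(modelPath: modelPath, budget: budget));
}
```

A thermal-state ceiling would presumably be declared the same way; it is omitted here because the document does not name its type.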
## Installation

```yaml
# pubspec.yaml
dependencies:
  edge_veda: ^2.4.1
```

Minimum iOS version is 13.0; the XCFramework (~31 MB) is auto-downloaded during `pod install`.
## Quick Start

```dart
import 'dart:io';

import 'package:edge_veda/edge_veda.dart';

final edgeVeda = EdgeVeda();
await edgeVeda.init(EdgeVedaConfig(modelPath: modelPath));

// Streaming generation
await for (final chunk in edgeVeda.generateStream('Explain quantum computing')) {
  stdout.write(chunk.token);
}
```
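For multi-turn dialog, the SDK's `ChatSession` (listed in the architecture below) presumably retains conversation history; the `send` method name is an assumption:

```dart
import 'package:edge_veda/edge_veda.dart';

// Sketch: two turns over one session. The follow-up question relies on
// the session's retained history rather than a re-sent transcript.
// `send` is an assumed API name.
Future<void> chat(EdgeVeda edgeVeda) async {
  final session = ChatSession(edgeVeda);
  final first = await session.send('What is quantum computing?');
  print(first);

  final followUp = await session.send('Explain it to a five-year-old.');
  print(followUp);
}
```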
## Architecture

```
Flutter App (Dart)
└── ChatSession / RagPipeline / VectorIndex
    └── EdgeVeda (generate, embed, describeImage)
        └── Workers (StreamingWorker, VisionWorker, WhisperWorker)
            └── Scheduler + EdgeVedaBudget + TelemetryService
                └── FFI Bindings (43 C functions)
                    └── XCFramework (llama.cpp, whisper.cpp, stable-diffusion.cpp)
```

Key design decisions: all inference runs in background isolates, native pointers never cross isolate boundaries, and models stay loaded in memory after the first use.
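The isolate rule can be illustrated in plain Dart: the worker isolate owns any native state, and only serializable values (here, token strings) cross the boundary back to the main isolate. This is a minimal sketch of the pattern, not the SDK's actual worker code:

```dart
import 'dart:isolate';

// Worker entry point. In the real SDK this isolate would hold the
// native model pointer; only plain tokens are sent across.
void _worker(SendPort out) {
  for (final token in ['Hello', ' ', 'world']) {
    out.send(token);
  }
  out.send(null); // end-of-stream marker
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(_worker, port.sendPort);
  // Consume tokens as they arrive, mirroring generateStream's shape.
  await for (final message in port) {
    if (message == null) break;
    print(message);
  }
  port.close();
}
```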
## Platform Support

- iOS: full Metal GPU support; minimum iOS 13.0
- macOS: fully supported
- Android: skeleton implemented; Vulkan GPU support planned
## Code Scale
~22,700 LOC / 40 C API functions / 32 Dart SDK files
Primary languages: Dart (67.2%), C++ (8.7%), Shell (5.4%), Python (5.3%)