Edge-Veda
✨On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
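The core step in SORT-style trackers such as those this library implements is associating existing tracks with new detections by bounding-box overlap. Below is a minimal, self-contained sketch of greedy IoU association for illustration only; it is not this library's API (function names like `associate` are hypothetical), and real SORT additionally uses a Kalman filter to predict each track's box and the Hungarian algorithm for optimal matching.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns (matches, unmatched_track_idxs, unmatched_detection_idxs).
    Unmatched detections typically spawn new tracks; unmatched tracks
    age out after a few missed frames.
    """
    pairs = sorted(
        ((iou(t_box, d_box), ti, di)
         for ti, t_box in enumerate(tracks)
         for di, d_box in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to match
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [ti for ti in range(len(tracks)) if ti not in used_t]
    unmatched_d = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched_t, unmatched_d

tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
dets = [(1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, dets))  # → ([(0, 0)], [1], [1])
```

Because the matcher is detection-agnostic, the same loop works with boxes from any detector (YOLO, DETR, etc.), which is the design point the library's detector-agnostic claim refers to.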
An edge-side (on-device) omnimodal LLM from Tsinghua's THUNLP that supports vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.
An open-source, end-to-end VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual VLM GLM-4V-9B, it enables cross-platform GUI automation and reasoning from screenshots and natural-language instructions.
MobileAgent is an autonomous mobile-agent framework powered by multimodal large language models (MLLMs), automating mobile app operations and task execution through visual perception and tool invocation.
A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.
A Python library for orchestrating zero-shot computer vision models, enabling custom end-to-end pipeline creation without needing to collect and annotate large training datasets.
LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.
A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.
VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.