Edge-Veda
✨On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
An end-to-side omnimodal LLM by Tsinghua THUNLP supporting vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.
An open-source framework by Stream for building vision AI agents that work with any model or video provider, leveraging Stream's edge network for ultra-low latency video experiences.
A customizable AI desktop companion project with character settings, voice conversations, long-term memory capabilities, and sub-1-second response times. Integrates with Live2D models for visual presentation.
Blades is a multimodal AI Agent framework for the Go language, supporting custom models, tools, memory, middleware, and more. It's designed for multi-turn conversations, chain-of-thought reasoning, and structured output applications.
A Python library for orchestrating zero-shot computer vision models, enabling custom end-to-end pipeline creation without needing to collect and annotate large training datasets.
A tool that gracefully solves hCaptcha challenges using multimodal large language models, without relying on browser extensions or third-party captcha services.
A reinforcement learning environment for Mario AI, offering trainable agents to play Super Mario games.
A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information。
Page 1 / 2 · 18 total
Get the latest AI tools and trends delivered straight to your inbox. No spam, just intelligence.