Roboflow Trackers
✨A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and a Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
Multimodal, Deep Learning, SDK
MiniCPM-o
✨An on-device (end-side) omnimodal LLM from Tsinghua THUNLP supporting vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment, with reported performance rivaling Gemini 2.5 Flash.
Large Language Model, Multimodal, Transformers
Vision-Agents
✨An open-source framework by Stream for building vision AI agents that work with any model or video provider, leveraging Stream's edge network for ultra-low latency video experiences.
Agent & Tooling, Python, PyTorch
hCaptcha Challenger
✨A tool that gracefully solves hCaptcha challenges using multimodal large language models, without relying on browser extensions or third-party captcha services.
Agent & Tooling, Python, Multimodal
mario-ai
✨A reinforcement learning environment for Mario AI, offering trainable agents to play Super Mario games.
Agent & Tooling, Python, PyTorch
VoxCPM
✨VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.
Model & Inference Framework, Python, PyTorch