Roboflow Trackers
✨A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.) and supports video files, cameras, RTSP streams, and other sources. Provides unified CLI tools and a Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
Multimodal · Deep Learning · SDK
WiFi DensePose
✨A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.
Multimodal · Deep Learning · Docker
VibeVoice
✨Microsoft's family of open-source frontier voice AI models including both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models, designed for long-form audio processing with multilingual support.
Model & Inference Framework · PyTorch · Python
Speech-AI-Forge
✨A project focused on TTS generation models, providing an API server and Gradio-based WebUI with support for multiple voice synthesis, voice cloning, and audio enhancement capabilities.
Model & Inference Framework · Python · Gradio
Embodied_AI_Paper_List
✨A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.
Docs, Tutorials & Resources · Python · Multimodal
DeepVideoDiscovery
✨A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information.
Agent & Tooling · Python · PyTorch
LLaVA-Plus
✨LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.
Model & Inference Framework · Python · PyTorch
CV
✨A comprehensive collection of learning notes covering multiple courses including PyTorch and deep learning, focused on computer vision and natural language processing, with accompanying video explanations and example datasets.
Docs, Tutorials & Resources · Python · PyTorch
ChatTTS
✨A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.
Model & Inference Framework · Python · PyTorch
VoxCPM
✨VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.
Model & Inference Framework · Python · PyTorch