DISCOVER THE FUTURE OF AI AGENTSarrow_forward

All Projects

10 projects

Roboflow Trackers

A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).

MultimodalDeep LearningSDK

WiFi DensePose

A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.

MultimodalDeep LearningDocker

VibeVoice

Microsoft's family of open-source frontier voice AI models including both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models, designed for long-form audio processing with multilingual support.

Model & Inference FrameworkPyTorchPython

Speech-AI-Forge

A project focused on TTS generation models, providing an API server and Gradio-based WebUI with support for multiple voice synthesis, voice cloning, and audio enhancement capabilities.

Model & Inference FrameworkPythonGradio

Embodied_AI_Paper_List

A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.

Docs, Tutorials & ResourcesPythonMultimodal

DeepVideoDiscovery

A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information。

Agent & ToolingPythonPyTorch

LLaVA-Plus

LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.

Model & Inference FrameworkPythonPyTorch

CV

A comprehensive collection of learning notes covering multiple courses including PyTorch and deep learning, focused on computer vision and natural language processing with accompanying video explanations and example datasets。

Docs, Tutorials & ResourcesPythonPyTorch

ChatTTS

A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.

Model & Inference FrameworkPythonPyTorch

VoxCPM

VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.

Model & Inference FrameworkPythonPyTorch
Per page

Page 1 / 1 · 10 total

STAY UPDATED

Get the latest AI tools and trends delivered straight to your inbox. No spam, just intelligence.

rocket_launch