Agent Park - Agent Project Navigator

All Projects

10 projects

Roboflow Trackers

A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).

Multimodal · Deep Learning · SDK

WiFi DensePose

A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.

Multimodal · Deep Learning · Docker

VibeVoice

Microsoft's family of open-source frontier voice AI models including both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models, designed for long-form audio processing with multilingual support.

Model & Inference Framework · PyTorch · Python

Speech-AI-Forge

A project focused on TTS generation models, providing an API server and Gradio-based WebUI with support for multiple voice synthesis, voice cloning, and audio enhancement capabilities.

Model & Inference Framework · Python · Gradio

Embodied_AI_Paper_List

A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.

Docs, Tutorials & Resources · Python · Multimodal

DeepVideoDiscovery

A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information.

Agent & Tooling · Python · PyTorch

LLaVA-Plus

LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.

Model & Inference Framework · Python · PyTorch

CV

A comprehensive collection of learning notes covering multiple courses including PyTorch and deep learning, focused on computer vision and natural language processing with accompanying video explanations and example datasets.

Docs, Tutorials & Resources · Python · PyTorch

ChatTTS

A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.

Model & Inference Framework · Python · PyTorch

VoxCPM

VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.

Model & Inference Framework · Python · PyTorch
