Edge-Veda
✨On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
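The core step in SORT-style trackers such as those this library implements is associating existing tracks with new detections by bounding-box overlap. Below is a minimal, self-contained sketch of greedy IoU association for illustration only; it is not this library's API (function names like `associate` are hypothetical), and real SORT additionally uses a Kalman filter to predict each track's box and the Hungarian algorithm for optimal matching.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns (matches, unmatched_track_idxs, unmatched_detection_idxs).
    Unmatched detections typically spawn new tracks; unmatched tracks
    age out after a few missed frames.
    """
    pairs = sorted(
        ((iou(t_box, d_box), ti, di)
         for ti, t_box in enumerate(tracks)
         for di, d_box in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to match
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [ti for ti in range(len(tracks)) if ti not in used_t]
    unmatched_d = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched_t, unmatched_d

tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
dets = [(1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, dets))  # → ([(0, 0)], [1], [1])
```

Because the matcher is detection-agnostic, the same loop works with boxes from any detector (YOLO, DETR, etc.), which is the design point the library's detector-agnostic claim refers to.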
An edge-side (on-device) omnimodal LLM from Tsinghua's THUNLP that supports vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.
An open-source, end-to-end VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual VLM GLM-4V-9B, it enables cross-platform GUI automation and reasoning from screenshots and natural-language instructions.
MobileAgent is an autonomous mobile-agent framework powered by multimodal large language models (MLLMs), automating mobile app operations and task execution through visual perception and tool invocation.
A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.
A Python library for orchestrating zero-shot computer vision models, enabling custom end-to-end pipeline creation without needing to collect and annotate large training datasets.
LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.
A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.
VoxCPM is an end-to-end Text-to-Speech (TTS) system built on continuous space modeling, eliminating the need for discrete tokenization. It delivers context-aware, expressive speech generation and enables true-to-life zero-shot voice cloning using short audio clips, making it ideal for high-quality voice synthesis and dubbing applications.