All Projects

9 projects

Roboflow Trackers

A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms such as SORT and ByteTrack. Its detector-agnostic design works with any object detection model (YOLO, DETR, etc.) and supports video files, cameras, RTSP streams, and other sources. It provides a unified CLI and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).

Multimodal · Deep Learning · SDK

MiniCPM-o

An end-side (on-device) omnimodal LLM from Tsinghua THUNLP that supports vision, speech, and full-duplex multimodal live streaming. It is optimized for mobile deployment, with reported performance rivaling Gemini 2.5 Flash.

Large Language Models · Multimodal · Transformers

Vision-Agents

An open-source framework by Stream for building vision AI agents that work with any model or video provider, leveraging Stream's edge network for ultra-low latency video experiences.

Agent & Tooling · Python · PyTorch

Odyssey: Empowering Minecraft Agents with Open-World Skills

Odyssey is a framework that empowers LLM-based Minecraft agents with open-world skills, featuring 40 primitive skills and 183 compositional skills, enabling AI to autonomously explore, learn, and execute diverse tasks in the Minecraft universe.

Agent & Tooling · Python · LangChain

hCaptcha Challenger

A tool that gracefully solves hCaptcha challenges using multimodal large language models, without relying on browser extensions or third-party captcha services.

Agent & Tooling · Python · Multimodal

mario-ai

A reinforcement learning environment for Mario AI, with trainable agents that learn to play Super Mario games.

Agent & Tooling · Python · PyTorch

gptme

Your AI assistant in the terminal, equipped with local tools to write code, run shell commands, browse the web, and view images: a local alternative to ChatGPT with Code Interpreter, Cursor Agent, and similar tools.

Agent & Tooling · Python · AI Agents

UI-TARS-desktop

An open-source multimodal AI Agent stack developed by ByteDance, comprising the general Agent TARS framework and the UI-TARS Desktop client. It enables natural language control of computers, browsers, and terminals via Vision-Language Models.

Agent & Tooling · TypeScript · Node.js

VoxCPM

VoxCPM is an end-to-end text-to-speech (TTS) system built on continuous-space modeling, which eliminates the need for discrete tokenization. It generates context-aware, expressive speech and supports true-to-life zero-shot voice cloning from short audio clips, making it well suited to high-quality voice synthesis and dubbing.

Model & Inference Framework · Python · PyTorch