DISCOVER THE FUTURE OF AI AGENTS

All Projects

54 projects

Clawd Cursor

AI desktop agent that sees your screen, controls your cursor, and completes tasks autonomously. Features a five-layer intelligent fallback pipeline, support for multiple AI providers (Anthropic/OpenAI/Ollama/Kimi), a web dashboard, and a REST API.

Multimodal · AI Agents · Agent Framework

Edge-Veda

On-device, full-stack AI SDK for Flutter covering LLM, vision, speech, image generation, and RAG; features compute-budget contracts and adaptive QoS with zero cloud dependency.

LLM · Multimodal · SDK

NagaAgent

A four-service collaborative AI desktop assistant framework with streaming tool calling, GRAG knowledge-graph memory, a Live2D avatar, and voice interaction.

RAG · Multimodal · AI Agents

Seline

A local-first AI desktop application integrating conversational AI, visual generation, vector search, and multi-channel connectivity, featuring deep research modes and local knowledge bases.

Multimodal · Model Context Protocol · RAG

Roboflow Trackers

A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms such as SORT and ByteTrack. Its detector-agnostic design works with any object detection model (YOLO, DETR, etc.) and supports video files, cameras, RTSP streams, and more. Provides a unified CLI and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity); see the usage sketch below.

Multimodal · Deep Learning · SDK
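
The detector-agnostic pattern works roughly as sketched below: any detector whose output can be converted into the common detections format can feed the tracker, which then assigns persistent IDs across frames. This is a minimal sketch assuming the `SORTTracker` class and `update()` method shown in the project's README, together with the `supervision` and `ultralytics` packages; verify the exact API against the installed version.

```python
# Minimal sketch of the detector-agnostic tracking pattern.
# Assumes SORTTracker and its update() method as shown in the project README;
# verify names against the installed `trackers` version.
import supervision as sv
from ultralytics import YOLO       # any detector works; YOLO is just one choice
from trackers import SORTTracker   # pip install trackers

model = YOLO("yolov8n.pt")
tracker = SORTTracker()
annotator = sv.BoxAnnotator()

def callback(frame, _index):
    # Detect on the current frame, convert to the detector-agnostic format,
    # then update the tracker so detections carry persistent track IDs.
    result = model(frame, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)
    detections = tracker.update(detections)
    return annotator.annotate(frame.copy(), detections)

# Run the callback over every frame and write the annotated result.
sv.process_video(source_path="input.mp4", target_path="tracked.mp4", callback=callback)
```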

MiniCPM-o

An on-device (end-side) omnimodal LLM from Tsinghua's THUNLP lab supporting vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.

LLM · Multimodal · Transformers

CogAgent

An open-source, end-to-end VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual GLM-4V-9B VLM, it enables cross-platform GUI automation and reasoning from screenshots and natural-language instructions; see the interaction sketch below.

Model & Inference Framework · LLM · Multimodal
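
The screenshot-in, action-out loop that CogAgent-style GUI agents follow can be summarized as below. This is a hypothetical sketch of the interaction pattern only: `take_screenshot`, `query_gui_agent`, and `execute_action` are illustrative placeholders, not CogAgent's actual inference API; see the project repository for real model calls.

```python
# Hypothetical sketch of the screenshot -> instruction -> action loop used by
# VLM-based GUI agents such as CogAgent. The callables passed in are
# illustrative placeholders, not CogAgent's real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuiAction:
    kind: str                                 # e.g. "click", "type", "scroll", "finish"
    target: tuple[int, int] | None = None     # screen coordinates for clicks
    text: str | None = None                   # text to type, if any

def run_task(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    query_gui_agent: Callable[[bytes, str, list[GuiAction]], GuiAction],
    execute_action: Callable[[GuiAction], None],
    max_steps: int = 20,
) -> list[GuiAction]:
    """Drive a GUI task from a natural-language instruction.

    Each step: capture the current screen, ask the VLM for the next grounded
    action (with the action history as context), execute it, and stop when
    the model signals completion.
    """
    history: list[GuiAction] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = query_gui_agent(screenshot, instruction, history)
        if action.kind == "finish":
            break
        execute_action(action)
        history.append(action)
    return history
```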

MobileAgent

An autonomous mobile-agent framework powered by multimodal large language models (MLLMs), automating mobile-app operations and task execution through visual perception and tool invocation.

Model & Inference Framework · LLM · Multimodal

AlphaAvatar

A learnable, configurable, and pluggable Omni-Avatar Assistant framework built on LiveKit, featuring real-time interaction, multimodal memory, user persona, and external tool integration.

Docs, Tutorials & Resources · RAG · Multimodal

WiFi DensePose

A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI (channel state information) signals, with advanced analytics such as fall detection and multi-person tracking.

Multimodal · Deep Learning · Docker
