Clawd Cursor
✨AI desktop agent that sees your screen, controls your cursor, and completes tasks autonomously. Features a 5-layer intelligent fallback pipeline, multiple AI providers (Anthropic/OpenAI/Ollama/Kimi), a web dashboard, and a REST API.
On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
A four-service collaborative AI desktop assistant framework with streaming tool calling, GRAG knowledge-graph memory, a Live2D avatar, and voice interaction.
A local-first AI desktop application integrating conversational AI, visual generation, vector search, and multi-channel connectivity, featuring deep research modes and local knowledge bases.
A plug-and-play multi-object tracking (MOT) Python library offering modular implementations of classic algorithms like SORT and ByteTrack. Features a detector-agnostic design compatible with any object detection model (YOLO, DETR, etc.), supporting video files, cameras, RTSP streams, and more. Provides unified CLI tools and Python API with built-in evaluation metrics (CLEAR, HOTA, Identity).
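The data-association step at the heart of SORT-style trackers like the ones this library implements can be sketched as follows. This is a minimal illustration only, not the library's actual API: real SORT pairs a Kalman-filter motion model with Hungarian matching, whereas this snippet shows only greedy IoU association between existing tracks and new detections.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns a list of (track_index, detection_index) pairs; unmatched
    detections would spawn new tracks, unmatched tracks eventually die.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh or ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    return matches
```

Because the matcher only consumes boxes, any detector (YOLO, DETR, etc.) can feed it, which is the detector-agnostic design the library advertises.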
An on-device omnimodal LLM by Tsinghua THUNLP supporting vision, speech, and full-duplex multimodal live streaming, optimized for mobile deployment with performance rivaling Gemini 2.5 Flash.
An open-source end-to-end VLM-based GUI agent developed by Tsinghua University and Zhipu AI, built on the bilingual GLM-4V-9B VLM, enabling cross-platform GUI automation and reasoning via screenshots and natural-language instructions.
MobileAgent is an autonomous mobile agent framework powered by multimodal large language models (MLLMs), enabling automated mobile app operation and task execution through visual perception and tool invocation.
A learnable, configurable, and pluggable Omni-Avatar Assistant framework built on LiveKit, featuring real-time interaction, multimodal memory, user persona, and external tool integration.
A production-ready implementation of InvisPose that enables real-time, camera-free full-body tracking through walls using commodity WiFi mesh routers and CSI signals, with advanced analytics like fall detection and multi-person tracking.