Clawd Cursor
AI desktop agent that sees your screen, controls your cursor, and completes tasks autonomously. It features a 5-layer intelligent fallback pipeline, supports multiple AI providers (Anthropic/OpenAI/Ollama/Kimi), and provides a Web Dashboard and REST API.
On-device full-stack AI SDK for Flutter with LLM, Vision, Speech, Image Gen, and RAG; features compute budget contracts and adaptive QoS with zero cloud dependency.
A local-first AI desktop application integrating conversational AI, visual generation, vector search, and multi-channel connectivity, featuring deep research modes and local knowledge bases.
An open-source, end-to-end VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual GLM-4V-9B VLM, it enables cross-platform GUI automation and reasoning from screenshots and natural language instructions.
MobileAgent is an autonomous mobile agent framework powered by Multimodal Large Language Models (MLLM), enabling automated mobile app operations and task execution through visual perception and tool invocation.
Blades is a multimodal AI Agent framework for the Go language, supporting custom models, tools, memory, middleware, and more. It's designed for multi-turn conversations, chain-of-thought reasoning, and structured output applications.
FilmAgent is a multi-agent collaborative system for end-to-end film automation in 3D virtual spaces. It simulates key crew roles—directors, screenwriters, actors, and cinematographers—and integrates efficient human workflows within a sandbox environment.
A Python library for orchestrating zero-shot computer vision models, enabling custom end-to-end pipeline creation without needing to collect and annotate large training datasets.
An open-source intelligent assistant framework for mobile devices that understands screen content through multimodal methods and performs automated operations to help users complete tasks.
JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It liberates human creativity by understanding user intent, mimicking professional artist reasoning, and coordinating over 200 tools in Adobe Lightroom.