Clawd Cursor
An AI desktop agent that sees your screen, controls your cursor, and completes tasks autonomously. It features a 5-layer intelligent fallback pipeline, support for multiple AI providers (Anthropic, OpenAI, Ollama, Kimi), a web dashboard, and a REST API.
A local-first AI desktop application integrating conversational AI, visual generation, vector search, and multi-channel connectivity, featuring deep research modes and local knowledge bases.
An open-source, end-to-end, VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual GLM-4V-9B VLM, it enables cross-platform GUI automation and reasoning from screenshots and natural language instructions.
MobileAgent is an autonomous mobile agent framework powered by Multimodal Large Language Models (MLLM), enabling automated mobile app operations and task execution through visual perception and tool invocation.
FilmAgent is a multi-agent collaborative system for end-to-end film automation in 3D virtual spaces. It simulates key crew roles—directors, screenwriters, actors, and cinematographers—and integrates efficient human workflows within a sandbox environment.
An open-source intelligent assistant framework for mobile devices that understands screen content through multimodal methods and performs automated operations to help users complete tasks.
JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It liberates human creativity by understanding user intent, mimicking professional artist reasoning, and coordinating over 200 tools in Adobe Lightroom.
A computer control agent driven by large vision-language models. It lets AI interact with GUIs by observing screenshots and outputting mouse and keyboard operations to complete multi-step tasks.
SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, built around large multimodal models (LMMs) such as GPT-4V. It consists of a codebase for running web agents on live websites and a framework that uses LMMs as generalist web agents.
A groundbreaking visual AI development environment for building no-code data pipelines and multimodal agents with real-time capabilities, social connectors, and AI-powered tools.