
All Projects

17 projects

Clawd Cursor

An AI desktop agent that sees your screen, controls your cursor, and completes tasks autonomously. It features a 5-layer fallback pipeline, supports multiple AI providers (Anthropic, OpenAI, Ollama, Kimi), and includes a web dashboard and a REST API.

Multimodal · AI Agents · Agent Framework

Seline

A local-first AI desktop application integrating conversational AI, visual generation, vector search, and multi-channel connectivity, featuring deep research modes and local knowledge bases.

Multimodal · Model Context Protocol · RAG

CogAgent

An open-source, end-to-end, VLM-based GUI agent developed by Tsinghua University and Zhipu AI. Built on the bilingual GLM-4V-9B VLM, it enables cross-platform GUI automation and reasoning from screenshots and natural language instructions.

Model & Inference Framework · Large Language Models · Multimodal

MobileAgent

MobileAgent is an autonomous mobile agent framework powered by multimodal large language models (MLLMs). It automates mobile app operations and task execution through visual perception and tool invocation.

Model & Inference Framework · Large Language Models · Multimodal

FilmAgent

FilmAgent is a multi-agent collaborative system for end-to-end film automation in 3D virtual spaces. It simulates key crew roles—directors, screenwriters, actors, and cinematographers—and integrates efficient human workflows within a sandbox environment.

Agent & Tooling · Python · C#

Open-AutoGLM

An open-source intelligent assistant framework for mobile devices that understands screen content through multimodal methods and performs automated operations to help users complete tasks.

Agent & Tooling · Python · Agent Framework

JarvisArt

JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It liberates human creativity by understanding user intent, mimicking professional artist reasoning, and coordinating over 200 tools in Adobe Lightroom.

Agent & Tooling · Python · AI Agents

ScreenAgent

A computer control agent driven by large vision-language models (VLMs). It lets AI interact with GUIs by observing screenshots and outputting mouse and keyboard operations to complete multi-step tasks.

Agent & Tooling · Python · PyTorch

SeeAct

SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, focusing on large multimodal models (LMMs) like GPT-4V. It consists of a robust codebase for running web agents on live websites and an innovative framework that utilizes LMMs as generalist web agents.

Agent & Tooling · Python · Playwright

Magick

A groundbreaking visual AI development environment for building no-code data pipelines and multimodal agents with real-time capabilities, social connectors, and AI-powered tools.

Agent & Tooling · Docker · PostgreSQL

Page 1 / 2 · 17 total
