Open-AutoGLM

An open-source intelligent assistant framework for mobile devices that understands screen content through multimodal methods and performs automated operations to help users complete tasks.

One-Minute Overview#

Open-AutoGLM is an intelligent assistant framework for mobile devices built on visual language models that understands screen content and performs automated operations to help users complete tasks. It's suitable for ordinary users, developers, and researchers who want to automate mobile phone operations. You can simply describe your needs in natural language, such as "Open Xiaohongshu and search for food," and the system will automatically parse your intent, understand the current interface, plan the next steps, and complete the entire process.

Core Value: Makes mobile automation simple and intuitive, enabling complex phone task automation without complex programming.

Quick Start#

Installation Difficulty: Medium - Requires Python environment setup, ADB/HDC tool installation, and device preparation

# Install dependencies
pip install -r requirements.txt
pip install -e .

Is this suitable for me?

✅ Automating repetitive mobile operations: Daily app check-ins, executing fixed workflows

✅ Mobile app testing and automation: Automated testing and operation of applications

✅ Remote phone control: Remotely controlling phones via WiFi to perform specific tasks

❌ Real-time control requiring ultra-low latency: Network connections may affect response speed

❌ Operations requiring extremely high precision: Visual recognition may have errors in complex interfaces

Core Capabilities#

1. Multimodal Screen Understanding - Truly Understanding Interface Content#

Analyzes mobile screen content in real-time through visual language models, accurately identifying UI elements, text information, and interface states. Actual Value: The system can "see" and understand the phone interface like human eyes, providing accurate judgment for subsequent operations.

2. Natural Language Task Parsing - Issuing Commands in Everyday Language#

Accepts task descriptions in natural language, automatically parses user intentions, and converts everyday commands like "Open Meituan and search for hot pot restaurants nearby" into executable operation sequences. Actual Value: Users don't need to learn any programming language; they just need to describe their needs in natural language for the system to understand and execute.

3. Intelligent Operation Planning - Automatically Completing Task Flows#

Automatically plans operation step sequences based on task goals, including application switching, element clicking, text input, and other complex processes. Actual Value: The system can independently complete multi-step tasks, reducing tedious manual operations and improving efficiency.

4. Safety Operation Mechanisms - Protecting User Data and Privacy#

Built-in sensitive operation confirmation mechanisms perform security verification before executing important operations, supporting human handover in login or CAPTCHA scenarios. Actual Value: Ensures user data security during automated operations, preventing data leakage or loss caused by accidental operations.

Model Support: Offers two models - AutoGLM-Phone-9B optimized for Chinese scenarios and AutoGLM-Phone-9B-Multilingual for multilingual scenarios
Third-party Integration: Integrated with Midscene.js open-source UI automation SDK, supporting multi-platform automation through JavaScript or Yaml formats
Cross-platform Support: Supports both Android devices and HarmonyOS devices, adapting to different operating system environments

Maintenance Status#

Development Activity: Project is actively developed with continuous updates and maintenance
Recent Updates: Recent feature updates and documentation improvements with high community participation
Community Response: Has dedicated WeChat community and developer incentive programs with good community support

Documentation & Learning Resources#

Documentation Quality: Comprehensive, including detailed installation guides, configuration instructions, and usage tutorials
Official Documentation: https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/README.md
Sample Code: Provides Python API usage examples and command-line usage methods

One-Minute Overview#

Quick Start#

Core Capabilities#

1. Multimodal Screen Understanding - Truly Understanding Interface Content#

2. Natural Language Task Parsing - Issuing Commands in Everyday Language#

3. Intelligent Operation Planning - Automatically Completing Task Flows#

4. Safety Operation Mechanisms - Protecting User Data and Privacy#

5. Remote Control Capabilities - Breaking Physical Distance Limitations#

Tech Stack & Integration#

Ecosystem & Extensions#

Maintenance Status#

Documentation & Learning Resources#

Related Projects

oh-my-codex

Ironcurtain

vibe-remote

STAY UPDATED