OS Agents are systems built on multimodal large language models (MLLMs) that automate tasks on computers, phones, and browsers by operating through the environments and interfaces that operating systems provide, such as graphical user interfaces (GUIs) and command-line interfaces (CLIs). This survey consolidates the current state of OS Agents research to guide both academic inquiry and industrial development in this emerging field.
One-Minute Overview#
This is an academic survey project on operating system agents (OS Agents): MLLM-based systems that automate tasks on computers, phones, and browsers by interacting with operating system interfaces such as the GUI and CLI. Supported by an ACL 2025 oral paper, the project is aimed at researchers, developers, and students interested in this field.
Core Value: Provides a systematic overview of the OS Agents field so that readers can quickly get up to date on its latest developments and research directions.
Quick Start#
Installation Difficulty: Low - This is an academic resource repository that requires no installation; information can be directly accessed online.
Is this suitable for my needs?
- ✅ Researchers: Need to understand the latest research progress and methods in the OS Agents field
- ✅ Developers: Need references on models and frameworks for building OS agents
- ✅ Students: Need a quick grounding in the fundamentals of this emerging research area
- ❌ Looking for ready-to-use applications: This project provides research resources and paper listings, not finished tools
Core Capabilities#
1. Research Paper Organization - Systematic Knowledge Base#
Categorizes key research papers in the OS Agents field into four core areas: Foundation Models, Agent Frameworks, Evaluation Benchmarks, and Safety & Privacy. Actual Value: Lets researchers survey the field in one place rather than piecing it together from scattered sources.
2. Foundation Models Research - Technological Development Context#
Systematically reviews foundation models used in OS contexts, including architecture types, training methods, and update timelines. Actual Value: Provides a reference for model selection and helps understand technological trends and best practices.
3. Agent Framework Analysis - Construction Methods Summary#
Organizes different types of OS agent frameworks, covering key components like perception, planning, memory, and action. Actual Value: Offers architectural designs and methodologies for building your own OS agents.
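To make the four components concrete, the following is a minimal sketch of how perception, planning, memory, and action might fit together in one loop. It is not taken from the survey or from any specific framework: `ToyEnvironment`, `Memory`, `perceive`, `plan`, and `run_agent` are hypothetical names, the environment is a simple counter standing in for a GUI or CLI, and the rule-based planner stands in for the MLLM call a real agent would make.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Memory:
    """Memory: keeps past (observation, action) pairs for later steps."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def remember(self, observation: str, action: str) -> None:
        self.steps.append((observation, action))


class ToyEnvironment:
    """Hypothetical stand-in for a GUI/CLI: a counter that must reach a target."""

    def __init__(self, target: int) -> None:
        self.value = 0
        self.target = target

    def observe(self) -> str:
        # Textual description of the current state, like a screen dump.
        return f"value={self.value} target={self.target}"

    def execute(self, action: str) -> None:
        # Apply one primitive operation; unknown actions are ignored.
        if action == "increment":
            self.value += 1


def perceive(env: ToyEnvironment) -> str:
    """Perception: turn the environment state into an observation."""
    return env.observe()


def plan(observation: str, memory: Memory) -> str:
    """Planning: choose the next action.

    Assumption: a real agent would query an MLLM here and could consult
    `memory`; this placeholder uses a fixed rule and ignores the history.
    """
    fields = dict(part.split("=") for part in observation.split())
    return "increment" if int(fields["value"]) < int(fields["target"]) else "stop"


def run_agent(env: ToyEnvironment, max_steps: int = 10) -> Memory:
    """Run the perceive -> plan -> act -> remember loop until done or out of steps."""
    memory = Memory()
    for _ in range(max_steps):
        observation = perceive(env)
        action = plan(observation, memory)
        if action == "stop":
            break
        env.execute(action)  # Action: apply the chosen operation.
        memory.remember(observation, action)
    return memory
```

Calling `run_agent(ToyEnvironment(target=3))` takes three `increment` steps and then stops. The `max_steps` cap is a design choice for the sketch so that a planner that never emits `stop` cannot loop forever.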
4. Benchmark Introduction - Performance Evaluation Standards#
Collects evaluation benchmarks for OS agents, categorized by platform (mobile/desktop) and test environment (real/simulated). Actual Value: Provides standardized methods for evaluating and comparing the performance of different OS agents.
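The two categorization axes can be sketched as a small data structure. This is an illustrative assumption rather than the project's own schema: the `Benchmark` record, the `select` and `success_rate` helpers, and the placeholder entries in `CATALOGUE` are all hypothetical, and task success rate is used here only as one plausible comparison metric.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Platform(Enum):
    MOBILE = "mobile"
    DESKTOP = "desktop"


class Environment(Enum):
    REAL = "real"
    SIMULATED = "simulated"


@dataclass(frozen=True)
class Benchmark:
    """One catalogue entry, tagged along the two axes described above."""
    name: str
    platform: Platform
    environment: Environment


# Placeholder entries only; these are not real benchmark names.
CATALOGUE: List[Benchmark] = [
    Benchmark("placeholder-mobile-sim", Platform.MOBILE, Environment.SIMULATED),
    Benchmark("placeholder-mobile-real", Platform.MOBILE, Environment.REAL),
    Benchmark("placeholder-desktop-real", Platform.DESKTOP, Environment.REAL),
]


def select(benchmarks: List[Benchmark], platform: Platform,
           environment: Environment) -> List[Benchmark]:
    """Return the benchmarks matching both the platform and the environment."""
    return [b for b in benchmarks
            if b.platform is platform and b.environment is environment]


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of tasks completed; defined as 0.0 when no tasks were run."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)
```

For example, `select(CATALOGUE, Platform.MOBILE, Environment.SIMULATED)` returns only the first placeholder entry, and `success_rate([True, False, True, True])` evaluates to `0.75`.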
Technology Stack & Integration#
- Primary Domains: Artificial Intelligence, Multimodal Large Language Models, Agent Systems
- Research Scope: Computer Vision, Natural Language Processing, Human-Computer Interaction, Reinforcement Learning
- Information Presentation: Academic paper tables, categorized resource repositories, research trend analysis
Ecosystem & Extensions#
- Paper Library Updates: Continuously collects and updates research papers in the OS Agents field, maintaining information timeliness
- Academic Collaboration: Establishes partnerships with corporate research teams like OPPO to promote academia-industry integration
- Recruitment Information: Provides recruitment information from related research teams to facilitate talent flow and development
Maintenance Status#
- Development Activity: Actively maintained; new research papers are added regularly
- Recent Updates: Recently active; research results from October 2024 are included
- Community Response: Published across multiple platforms (website, arXiv, GitHub, Zhihu, OpenReview, Twitter) to broaden its academic reach
Commercial & Licensing#
- ✅ Academic Use: Permitted for academic research and educational purposes
- ⚠️ Commercial Restrictions: The project itself is an academic resource; any use of the cited papers must follow their respective license terms
- ✅ Modification & Distribution: Derivative works based on this project's resources, and their redistribution, are allowed
Documentation & Learning Resources#
- Documentation Quality: Academic standard with clear structure
- Official Resources: GitHub Repository, Paper Preprint
- Sample Code: The project itself doesn't provide code, but linked research papers may contain related implementations