A minimal, self-evolving autonomous Agent framework that automatically consolidates task execution paths into reusable skills via layered memory
Core Positioning#
GenericAgent is a minimalist autonomous Agent framework with roughly 3,000 lines of core code and a roughly 100-line agent loop. Using only 9 atomic tools plus 2 memory management tools, it achieves multi-modal, system-level control over browsers, terminals, file systems, keyboard/mouse, screen vision, and Android devices (via ADB).
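To make the idea of a small agent loop concrete, here is a minimal sketch. It is not the project's actual loop: `run_agent`, the `decide` callback (a stand-in for an LLM call), and the action dictionary shape are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Action = Dict[str, str]            # hypothetical shape: {"tool": name, "arg": argument}
History = List[Tuple[str, str]]    # (source, text) pairs fed back to the policy


def run_agent(task: str,
              decide: Callable[[History], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Ask the policy for an action, run the named tool, feed the result back."""
    history: History = [("task", task)]
    for _ in range(max_steps):
        action = decide(history)              # stand-in for a model call
        if action["tool"] == "finish":
            return action["arg"]
        tool = tools.get(action["tool"])
        if tool is None:
            result = "unknown tool: " + action["tool"]
        else:
            result = tool(action["arg"])
        history.append((action["tool"], result))
    return "step limit reached"


def scripted_policy(history: History) -> Action:
    # Deterministic stand-in for the model: read one file, then finish.
    if len(history) == 1:
        return {"tool": "file_read", "arg": "notes.txt"}
    return {"tool": "finish", "arg": "done after seeing: " + history[-1][1]}


# A fake file_read so the sketch runs without touching the file system.
demo_tools = {"file_read": lambda path: "<contents of " + path + ">"}
answer = run_agent("summarize notes", scripted_policy, demo_tools)
```

The step limit is a design choice of this sketch: it guarantees termination even if the policy never emits a finish action.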
Self-Evolution Mechanism#
After completing each new task, execution paths are automatically consolidated into Skills stored in layered memory. Similar tasks subsequently reuse existing skills, building a richer skill tree over time. The entire repository was autonomously created by GenericAgent (including git init and all commits) without the author opening a terminal, serving as a bootstrap proof.
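The consolidate-then-reuse cycle described above can be sketched roughly as follows. This is an illustrative assumption, not the framework's storage format: `Skill`, `SkillStore`, and the word-set matching used to decide that two tasks are "similar" are all hypothetical, and a real system would likely match tasks far less crudely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Skill:
    name: str
    steps: List[str]
    uses: int = 0


class SkillStore:
    """Hypothetical skill memory: stores the steps of a finished task for reuse."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    @staticmethod
    def _key(task: str) -> str:
        # Crude similarity: tasks with the same set of lowercase words match.
        return " ".join(sorted(set(task.lower().split())))

    def consolidate(self, task: str, executed_steps: List[str]) -> Skill:
        skill = Skill(name=task, steps=list(executed_steps))
        self._skills[self._key(task)] = skill
        return skill

    def lookup(self, task: str) -> Optional[Skill]:
        skill = self._skills.get(self._key(task))
        if skill is not None:
            skill.uses += 1   # track reuse so popular skills can be identified
        return skill


store = SkillStore()
store.consolidate("Order lunch", ["open app", "pick item", "confirm payment"])
reused = store.lookup("lunch ORDER")     # same words, different order and case
missing = store.lookup("book flight")    # no stored skill yet
```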
Layered Memory System#
| Layer | Name | Function |
|---|---|---|
| L0 | Meta Rules | Core behavior rules and system constraints |
| L1 | Insight Index | Minimalist memory index for fast routing and recall |
| L2 | Global Facts | Stable knowledge accumulated over long-term operation |
| L3 | Task Skills / SOPs | Reusable processes for specific tasks |
| L4 | Session Archive | Archived records of completed tasks for long-range recall |
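One way to picture the five layers is as separate stores, with L1 acting as a keyword index that points into the others. The sketch below is an assumption made for illustration: the `Layer` enum mirrors the table, but `LayeredMemory`, its methods, and the keyword-based routing are hypothetical and not taken from the repository.

```python
from enum import IntEnum
from typing import Any, Dict, Iterable, List, Tuple


class Layer(IntEnum):
    META_RULES = 0
    INSIGHT_INDEX = 1
    GLOBAL_FACTS = 2
    TASK_SKILLS = 3
    SESSION_ARCHIVE = 4


class LayeredMemory:
    """Hypothetical layered store; L1 maps keywords to entries in other layers."""

    def __init__(self) -> None:
        self.store: Dict[Layer, Dict[str, Any]] = {layer: {} for layer in Layer}

    def write(self, layer: Layer, key: str, value: Any,
              keywords: Iterable[str] = ()) -> None:
        self.store[layer][key] = value
        index = self.store[Layer.INSIGHT_INDEX]
        for kw in keywords:
            # Each index entry is a list of (layer, key) pointers.
            index.setdefault(kw.lower(), []).append((layer, key))

    def recall(self, query: str) -> List[Tuple[Layer, str, Any]]:
        hits: List[Tuple[Layer, str, Any]] = []
        index = self.store[Layer.INSIGHT_INDEX]
        for word in query.lower().split():
            for layer, key in index.get(word, []):
                item = (layer, key, self.store[layer][key])
                if item not in hits:      # avoid duplicates across keywords
                    hits.append(item)
        return hits


memory = LayeredMemory()
memory.write(Layer.GLOBAL_FACTS, "browser", "login state is preserved",
             keywords=["browser", "login"])
memory.write(Layer.TASK_SKILLS, "adb_tap", ["connect device", "send tap"],
             keywords=["adb", "android"])
hits = memory.recall("browser login")
```

Routing through a small index first means only the matching entries need to be loaded into the model's context, which fits the stated goal of fast recall.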
Minimal Token Consumption#
The context window is kept under 30K tokens (versus 200K–1M for comparable frameworks). Layered memory keeps critical information in context while reducing noise. The README claims "6x less token consumption", but no specific benchmark data has been confirmed.
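Keeping a context under a fixed budget can be sketched as a priority-ordered packing problem. The example below is an assumption for illustration only: `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and `build_context` and its priority scheme are hypothetical.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def build_context(items: List[Tuple[int, str]],
                  budget: int = 30_000) -> Tuple[List[str], int]:
    """Pack items by priority (lower number = more important) under a budget.

    Items that would exceed the budget are skipped, so a smaller,
    lower-priority item can still fit after a large one is dropped.
    """
    chosen: List[str] = []
    used = 0
    for _priority, text in sorted(items, key=lambda item: item[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue
        chosen.append(text)
        used += cost
    return chosen, used


# Tiny budget so the effect is visible: the 400-character item does not fit.
demo_items = [(0, "a" * 40), (1, "b" * 400), (2, "c" * 40)]
chosen, used = build_context(demo_items, budget=25)
```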
9 Atomic Tools#
| Tool | Function |
|---|---|
| code_run | Execute arbitrary code |
| file_read | Read files |
| file_write | Write files |
| file_patch | Modify/patch files |
| web_scan | Perceive web page content |
| web_execute_js | Control browser behavior |
| ask_user | Human-in-the-loop confirmation |
| update_working_checkpoint | Persist current context |
| start_long_term_update | Long-term memory update |
Through code_run, the agent can dynamically install Python packages, write scripts, call external APIs, or control hardware, and then consolidate these temporary capabilities into permanent tools.
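A tool like code_run can be approximated by running a snippet in a separate Python process and capturing its output. The function below is a sketch under that assumption; the signature and the returned dictionary are hypothetical and do not describe the project's actual implementation.

```python
import subprocess
import sys
from typing import Dict, Union


def code_run(source: str, timeout: float = 10.0) -> Dict[str, Union[int, str]]:
    """Run Python source in a child interpreter and capture its output.

    Assumption: a real tool would add sandboxing and resource limits;
    this sketch only isolates the code in a subprocess with a timeout.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


result = code_run("print(2 + 3)")
failed = code_run("raise SystemExit(3)")
```

Because the snippet runs with the same privileges as the caller, anything that executes arbitrary code this way should be paired with human confirmation (for example via ask_user) or an isolated environment.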
LLM Compatibility#
Supports Claude, Gemini, Kimi, MiniMax, and other mainstream LLMs. The interface format is selected by the variable name used in mykey.py: oai_config (OpenAI-compatible), claude_config (Claude-compatible), and native_oai_config / native_claude_config (standard tool calling for weaker models).
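The naming convention can be illustrated with a small classifier over variable names. Only the four variable names come from the text above; the matching rule (prefix matching), the dictionary fields `apikey`, `apibase`, and `model`, and the helper names are assumptions made for this sketch, so check mykey_template.py for the real fields.

```python
from typing import Any, Dict, Optional

# Hypothetical example entries; field names and values are placeholders.
oai_config = {"apikey": "REPLACE_ME", "apibase": "https://example.invalid/v1",
              "model": "some-model"}
native_claude_config = {"apikey": "REPLACE_ME", "model": "some-model"}


def interface_for(var_name: str) -> Optional[Dict[str, Any]]:
    """Infer the interface family from a config variable name (assumed rule)."""
    native = var_name.startswith("native_")
    base = var_name[len("native_"):] if native else var_name
    if base.startswith("oai_config"):
        family = "openai"
    elif base.startswith("claude_config"):
        family = "claude"
    else:
        return None
    return {"family": family, "native_tool_calling": native}


def discover_configs(namespace: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Collect every dict-valued variable whose name matches the convention."""
    found: Dict[str, Dict[str, Any]] = {}
    for name, value in namespace.items():
        info = interface_for(name)
        if info is not None and isinstance(value, dict):
            found[name] = info
    return found


configs = discover_configs({
    "oai_config": oai_config,
    "native_claude_config": native_claude_config,
    "unrelated_setting": 42,
})
```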
Frontends & Integration#
GenericAgent ships with a Streamlit GUI, a Qt desktop app, and bot frontends for WeChat, QQ, Feishu, WeCom, DingTalk, and Telegram. Common chat commands are /new (start a new conversation) and /continue (restore a session snapshot). Advanced modes (Reflect, Plan, SubAgent, autonomous exploration, scheduled tasks) are self-documenting.
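The two chat commands can be pictured as a small dispatcher in front of the agent. The sketch below is hypothetical: `ChatSession` and its behavior, including the assumption that /new saves the outgoing conversation as the snapshot that /continue later restores, are illustrative guesses rather than the frontends' actual logic.

```python
from typing import List, Optional


class ChatSession:
    """Hypothetical command handler for /new and /continue."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.snapshot: Optional[List[str]] = None

    def handle(self, text: str) -> str:
        command = text.strip()
        if command == "/new":
            # Assumption: the outgoing conversation becomes the snapshot.
            self.snapshot = list(self.messages)
            self.messages = []
            return "started a new conversation"
        if command == "/continue":
            if self.snapshot is None:
                return "no snapshot to restore"
            self.messages = list(self.snapshot)
            return "restored {} messages".format(len(self.messages))
        self.messages.append(text)
        return "queued for the agent"


session = ChatSession()
session.handle("check today's orders")
session.handle("summarize them")
session.handle("/new")
restored = session.handle("/continue")
```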
Typical Scenarios#
- Browser automation with preserved login state
- Automated food delivery ordering
- Quantitative stock screening
- Mobile device control via ADB
- Autonomous web exploration and periodic summarization
- Chat platform Bot integration
Installation#
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
pip install requests streamlit pywebview
cp mykey_template.py mykey.py
# Edit mykey.py, fill in LLM API Key
python launch.pyw
Minimal CLI startup: python3 agentmain.py
Unconfirmed Information#
- arXiv paper (2604.17091) full experimental data and benchmarks not reviewed in detail
- "Million-level Skill library" specifics not detailed
- "Dintal Claw" government bot has no independent link
- Token consumption comparison lacks specific benchmark data
- V1.0 public date marked as 2026-01-16, discrepancy with current timeline