GenericAgent

A minimal, self-evolving autonomous Agent framework that automatically consolidates task execution paths into reusable skills via layered memory

Core Positioning

GenericAgent is a minimalist autonomous agent framework with roughly 3,000 lines of core code and a ~100-line agent loop. Using only 9 atomic tools plus 2 memory management tools, it provides multimodal, system-level control over browsers, terminals, file systems, keyboard and mouse, screen vision, and Android devices (via ADB).

Self-Evolution Mechanism

After each new task completes, its execution path is automatically consolidated into a Skill stored in layered memory. Later, similar tasks reuse existing skills, so the skill tree grows richer over time. As a bootstrap proof, the entire repository (including git init and every commit) was created autonomously by GenericAgent, without the author opening a terminal.
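
The consolidate-then-reuse cycle described above can be illustrated with a small sketch. This is not GenericAgent's actual implementation: the `Skill` and `SkillStore` classes, the keyword-overlap matching, and the `min_overlap` threshold are all hypothetical, chosen only to show the idea of storing an execution path and finding it again for a similar task.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """A reusable record of how a task was solved (hypothetical structure)."""
    name: str
    keywords: set[str]
    steps: list[str]


@dataclass
class SkillStore:
    """Toy stand-in for the skill layer of the memory system."""
    skills: list[Skill] = field(default_factory=list)

    def consolidate(self, task: str, executed_steps: list[str]) -> Skill:
        """Turn a finished task's execution path into a stored skill."""
        skill = Skill(
            name=task,
            keywords=set(task.lower().split()),
            steps=list(executed_steps),
        )
        self.skills.append(skill)
        return skill

    def find_similar(self, task: str, min_overlap: int = 2) -> Optional[Skill]:
        """Return the stored skill sharing the most keywords with the task."""
        words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self.skills:
            score = len(words & skill.keywords)
            if score > best_score:
                best, best_score = skill, score
        # Assumption: a match needs at least `min_overlap` shared keywords.
        return best if best_score >= min_overlap else None
```

For example, after `store.consolidate("order food delivery", [...])`, a later call to `store.find_similar("order food for lunch")` returns the stored skill, while an unrelated task such as `"screen stocks"` returns `None`. A real system would presumably rely on the LLM or an index rather than raw keyword overlap.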

Layered Memory System

Layer	Name	Function
L0	Meta Rules	Core behavior rules and system constraints
L1	Insight Index	Minimalist memory index for fast routing and recall
L2	Global Facts	Stable knowledge accumulated over long-term operation
L3	Task Skills / SOPs	Reusable processes for specific tasks
L4	Session Archive	Archived records of completed tasks for long-range recall
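
One way to picture the five layers in the table is as separate stores, with L1 acting as a small index that points into the deeper layers. The sketch below is illustrative only: the `Layer` enum, the `LayeredMemory` class, and the pointer-based routing are assumptions, not the project's actual data model.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layer numbers follow the table above; the names are illustrative."""
    META_RULES = 0
    INSIGHT_INDEX = 1
    GLOBAL_FACTS = 2
    TASK_SKILLS = 3
    SESSION_ARCHIVE = 4


class LayeredMemory:
    def __init__(self):
        # One key -> value store per layer.
        self.store = {layer: {} for layer in Layer}

    def write(self, layer, key, text, topic=None):
        """Save text in a layer; optionally register a topic in the L1 index."""
        self.store[layer][key] = text
        if topic is not None and layer != Layer.INSIGHT_INDEX:
            # Assumption: L1 holds only a pointer, never the full content.
            self.store[Layer.INSIGHT_INDEX][topic] = (layer, key)

    def recall(self, topic):
        """Route through the L1 index to the layer that holds the content."""
        pointer = self.store[Layer.INSIGHT_INDEX].get(topic)
        if pointer is None:
            return None
        layer, key = pointer
        return self.store[layer].get(key)
```

Keeping the index small and the bulky content in deeper layers is what would let a lookup stay cheap even as skills and archives accumulate.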

Minimal Token Consumption

The context window is kept under 30K tokens (vs. 200K–1M for comparable frameworks); layered memory keeps critical information in context while filtering out noise. The README claims "6x less token consumption" (specific benchmark data unconfirmed).
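
To make the budget idea concrete, here is a hedged sketch of assembling a prompt under a fixed token cap. The four-characters-per-token estimate, the `build_context` function, and the priority scheme are assumptions for illustration; the source does not describe how GenericAgent actually selects context.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def build_context(candidates, budget=30_000):
    """Pick snippets in priority order until the token budget is used up.

    candidates: list of (priority, text); a lower number is more important.
    Returns the chosen snippets and the estimated tokens they consume.
    """
    chosen, used = [], 0
    for _, text in sorted(candidates, key=lambda item: item[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip what does not fit; a smaller later item may still fit
        chosen.append(text)
        used += cost
    return chosen, used
```

With a priority per memory layer (for example, meta rules first and session archives last), low-priority material is the first to be dropped when the budget is tight.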

9 Atomic Tools

Tool	Function
`code_run`	Execute arbitrary code
`file_read`	Read files
`file_write`	Write files
`file_patch`	Modify/patch files
`web_scan`	Perceive web page content
`web_execute_js`	Control browser behavior
`ask_user`	Human-in-the-loop confirmation
`update_working_checkpoint`	Persist current context
`start_long_term_update`	Long-term memory update

Through `code_run`, the agent can dynamically install Python packages, write scripts, call external APIs, or control hardware, and then consolidate these temporary capabilities into permanent tools.
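
A small agent loop that dispatches named tools might look like the sketch below. It borrows three tool names from the table, but the bodies are toy stand-ins: `code_run` here executes Python in-process with `exec`, and the action format (`{"tool": ..., "args": ...}`) and the `next_action` callback that replaces the LLM call are assumptions, not GenericAgent's real interfaces.

```python
import contextlib
import io
from pathlib import Path


def code_run(code: str) -> str:
    """Toy stand-in: run Python source in-process and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"__name__": "__agent__"})
    return buffer.getvalue()


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars"


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


TOOLS = {"code_run": code_run, "file_write": file_write, "file_read": file_read}


def run_agent(next_action, max_turns=10):
    """Ask for an action, dispatch the tool, and record the result.

    next_action(history) stands in for the LLM call (hypothetical). It returns
    {"tool": name, "args": {...}}, or None when the task is done.
    """
    history = []
    for _ in range(max_turns):
        action = next_action(history)
        if action is None:
            break
        tool = TOOLS.get(action["tool"])
        if tool is None:
            result = f"unknown tool: {action['tool']}"
        else:
            try:
                result = tool(**action["args"])
            except Exception as exc:  # report failures instead of crashing the loop
                result = f"error: {exc}"
        history.append((action["tool"], result))
    return history
```

Driving the loop with a scripted plan such as `iter([{"tool": "code_run", "args": {"code": "print(6 * 7)"}}, None])` yields a history containing the tool name and the captured output. Executing model-generated code in-process is unsafe outside a demo; a real deployment would need isolation.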

LLM Compatibility

Supports Claude, Gemini, Kimi, MiniMax, and other mainstream LLMs. The interface format is determined by the variable name used in mykey.py: oai_config (OpenAI-compatible), claude_config (Claude-compatible), and native_oai_config / native_claude_config (standard tool calling for weaker models).
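
Name-based selection of this kind can be sketched as follows. Only the four variable names come from the text above; the format labels, the substring matching, and the `detect_format` / `collect_configs` helpers are hypothetical, and the actual fields expected inside each config are defined by mykey_template.py, not by this sketch.

```python
# Checked in this order so "native_oai_config" is not mistaken for "oai_config".
FORMAT_BY_MARKER = [
    ("native_claude_config", "claude-native-tools"),
    ("native_oai_config", "openai-native-tools"),
    ("claude_config", "claude-compatible"),
    ("oai_config", "openai-compatible"),
]


def detect_format(var_name):
    """Return an illustrative format label for a config variable name."""
    for marker, fmt in FORMAT_BY_MARKER:
        if marker in var_name:
            return fmt
    return None


def collect_configs(namespace):
    """Map each recognised config variable to its interface format.

    namespace: a dict of variable names to values, e.g. vars(module).
    """
    found = {}
    for name, value in namespace.items():
        fmt = detect_format(name)
        if fmt is not None and isinstance(value, dict):
            found[name] = fmt
    return found
```

The ordering of the markers matters: because the `native_` names contain the shorter names as substrings, the more specific markers must be tested first.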

Frontends & Integration

GenericAgent ships with a Streamlit GUI, a Qt desktop app, and bot frontends for WeChat, QQ, Feishu, WeCom, DingTalk, and Telegram. Common chat commands are /new (start a new conversation) and /continue (restore a session snapshot). Advanced modes (Reflect, Plan, SubAgent, autonomous exploration, scheduled tasks) are self-documenting.
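
The two chat commands can be pictured with a minimal handler. The `ChatSession` class and its snapshot behavior are assumptions made for illustration; in particular, the source does not say when a snapshot is taken, so this sketch simply saves one whenever /new discards a non-empty conversation.

```python
class ChatSession:
    """Toy model of a chat frontend's session state (hypothetical)."""

    def __init__(self):
        self.messages = []
        self.snapshot = None  # last saved conversation, if any

    def handle(self, text):
        command = text.strip()
        if command == "/new":
            # Assumption: keep a snapshot so /continue can bring it back.
            if self.messages:
                self.snapshot = list(self.messages)
            self.messages = []
            return "started a new conversation"
        if command == "/continue":
            if self.snapshot is None:
                return "no snapshot to restore"
            self.messages = list(self.snapshot)
            return f"restored {len(self.messages)} messages"
        # Anything else is treated as a normal message for the agent.
        self.messages.append(command)
        return "queued for the agent"
```

Each bot frontend (Telegram, Feishu, and so on) could route incoming text through a handler like this before passing ordinary messages to the agent loop.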

Typical Scenarios

Browser automation with preserved login state
Automated food delivery ordering
Quantitative stock screening
Mobile device control via ADB
Autonomous web exploration and periodic summarization
Chat platform Bot integration

Installation

git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
pip install requests streamlit pywebview
cp mykey_template.py mykey.py
# Edit mykey.py, fill in LLM API Key
python launch.pyw

Minimal CLI startup: python3 agentmain.py

Unconfirmed Information

Full experimental data and benchmarks in the arXiv paper (2604.17091) have not been reviewed in detail
Specifics of the "Million-level Skill library" are not detailed
The "Dintal Claw" government bot has no independent link
The token consumption comparison lacks specific benchmark data
The V1.0 public release date is listed as 2026-01-16, which is inconsistent with the current timeline