ST-WebAgentBench
✨An enterprise-oriented benchmark suite for evaluating web agent safety and trustworthiness, featuring 375 tasks across GitLab, SuiteCRM, and ShoppingAdmin with six policy dimensions to measure task completion under compliance constraints. Accepted by ICLR 2025.
Model & Inference Framework · Large Language Models · AI Agents
Open-CUAK
✨A platform for teaching, hiring and managing automation agents at scale, starting with browsers, offering a more reliable and privacy-focused alternative to OpenAI Operator.
Agent & Tooling · Docker · React
Hercules
✨Hercules is the world's first open-source testing agent, enabling UI, API, Security, Accessibility, and Visual validations without code or maintenance. It transforms simple Gherkin steps into fully automated end-to-end tests, making testing effortless and efficient.
Agent & Tooling · Python · Playwright
BrowserOS
✨An open-source Chromium fork that runs AI agents natively in your browser, offering a privacy-first alternative to services like ChatGPT Atlas, Perplexity Comet, and Dia.
Agent & Tooling · TypeScript · JavaScript