Agent Park - Agent Project Navigator

All Projects

1 project

ST-WebAgentBench

An enterprise-oriented benchmark suite for evaluating the safety and trustworthiness of web agents. It features 375 tasks across GitLab, SuiteCRM, and ShoppingAdmin, organized along six policy dimensions, to measure whether agents complete tasks while complying with policy constraints. Accepted by ICLR 2025.

Python · Docker · Large Language Models
