
All Projects

1 project

ST-WebAgentBench

An enterprise-oriented benchmark suite for evaluating the safety and trustworthiness of web agents. It comprises 375 tasks across GitLab, SuiteCRM, and ShoppingAdmin and uses six policy dimensions to measure task completion under compliance constraints. Accepted at ICLR 2025.

Python · Docker · Large Language Models
