Vision-Agents
An open-source framework by Stream for building vision AI agents that work with any model or video provider, leveraging Stream's edge network for ultra-low-latency video experiences.
A tool that gracefully solves hCaptcha challenges using multimodal large language models, without relying on browser extensions or third-party captcha services.
An AI assistant in your terminal, equipped with local tools to write code, use the terminal, browse the web, and see images: a local alternative to ChatGPT with Code Interpreter, Cursor Agent, and similar tools.
An open-source multimodal AI agent stack developed by ByteDance, comprising the general-purpose Agent TARS framework and the UI-TARS Desktop client. It enables natural-language control of computers, browsers, and terminals through vision-language models.