Odyssey: Empowering Minecraft Agents with Open-World Skills
✨Odyssey is a framework that empowers LLM-based Minecraft agents with open-world skills, featuring 40 primitive skills and 183 compositional skills, enabling AI to autonomously explore, learn, and execute diverse tasks in the Minecraft universe.
Agent & Tooling · Python · LangChain
mario-ai
✨A reinforcement learning environment for Mario AI, offering trainable agents to play Super Mario games.
Agent & Tooling · Python · PyTorch
Embodied_AI_Paper_List
✨A curated list of embodied AI research papers maintained by the Human Communication and Perception Laboratory at SYSU, providing researchers with the latest academic findings in the embodied intelligence field.
Docs, Tutorials & Resources · Python · Multimodal
DeepVideoDiscovery
✨A video content discovery tool developed by Microsoft that uses deep learning technology to automatically identify and extract key content from videos, helping users efficiently browse and understand video information.
Agent & Tooling · Python · PyTorch
OS-Agent-Survey
✨OS Agents are MLLM-based systems that automate tasks on computers, phones, and browsers by operating through the environments and interfaces provided by operating systems (GUI and CLI). This comprehensive survey consolidates the current state of OS Agents research, providing insights to guide both academic inquiry and industrial development in this emerging field.
Docs, Tutorials & Resources · Python · AI Agents
multimodal-agents-course
✨A free, open-source course that teaches how to build AI agents capable of understanding images, text, audio, and video, with components connected through the Model Context Protocol (MCP).
Docs, Tutorials & Resources · Python · AI Agents
LLaVA-Plus
✨LLaVA-Plus is a multimodal assistant system that learns to use tools, combining large language models with visual capabilities to enable AI agents to perform general vision tasks.
Model & Inference Framework · Python · PyTorch
awesome-chatgpt-code-interpreter-experiments
✨A curated collection showcasing what's possible with ChatGPT Code Interpreter, featuring experiments that push boundaries and unlock creative potential through the combination of AI and code execution.
Docs, Tutorials & Resources · Python · AI Agents
OSWorld
✨OSWorld is a benchmarking platform for evaluating multimodal agents' capabilities in performing open-ended tasks within real computer environments. It supports multiple virtualization platforms including VMware, VirtualBox, Docker, and AWS, offering diverse task scenarios and comprehensive evaluation metrics.
Agent & Tooling · Python · Docker
CV
✨A comprehensive collection of learning notes covering multiple courses including PyTorch and deep learning, focused on computer vision and natural language processing, with accompanying video explanations and example datasets.
Docs, Tutorials & Resources · Python · PyTorch