An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI, integrating LLMs for high-level planning and social interaction.
## Overview
FreeAskWorld is an interactive, closed-loop simulator for human-centric embodied AI, maintained by the AIR-DISCOVER organization. The associated paper was accepted to AAAI 2026 as an oral presentation. The framework addresses the lack of real human interaction in traditional Vision-Language Navigation (VLN) tasks by proposing the "Direction Inquiry Task" paradigm.
## Core Capabilities
- LLM-Driven Agents: Intent modeling, logical reasoning, natural dialogue, and instruction generation.
- Realistic Human Simulation: NPCs with personalized profiles, schedules, and distinct motion/navigation styles.
- Dynamic World Generation: Randomization of weather, lighting, traffic, and scene layouts.
- Closed-Loop Synchronization: WebSocket-based state exchange for real-time model-environment interaction.
- Direction Inquiry Task: Extends VLN by allowing agents to actively ask simulated humans for directions and adaptively replan.
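The closed-loop synchronization above amounts to a strict send-action / wait-for-state cycle per simulation tick. The sketch below illustrates that pattern in Python; the message schema (`type`, `kind`, `payload` fields) and the `FakeLink` stand-in are illustrative assumptions, not the actual FreeAskWorld wire format.

```python
import json
from dataclasses import dataclass
from typing import Protocol

# Hypothetical message shape; the real FreeAskWorld schema is not documented here.
@dataclass
class Action:
    kind: str            # e.g. "move", "turn", "ask"
    payload: dict

class Transport(Protocol):
    """Anything with send/recv, e.g. a WebSocket connection."""
    def send(self, text: str) -> None: ...
    def recv(self) -> str: ...

def step(link: Transport, action: Action) -> dict:
    """One closed-loop tick: send an action, block for the next world state."""
    link.send(json.dumps({"type": "action", "kind": action.kind, **action.payload}))
    return json.loads(link.recv())

# Offline stand-in for the WebSocket link, for illustration only.
class FakeLink:
    def send(self, text: str) -> None:
        self.last = json.loads(text)
    def recv(self) -> str:
        return json.dumps({"type": "state", "echo": self.last["kind"]})

obs = step(FakeLink(), Action("ask", {"utterance": "Which way to the plaza?"}))
print(obs["echo"])  # -> ask
```

The key design point is that the model never advances until the simulator confirms the new state, which keeps model and environment in lockstep.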
## Data Generation & Annotation
A modular data generation pipeline built on Unity Perception produces large-scale multi-modal datasets (6 tasks, 16 object categories, 63,429 frames, 17+ hours of interaction data) with:
- Visual annotations: 2D/3D bounding boxes, instance/semantic segmentation
- Geometric annotations: Depth maps, surface normal maps
- Visual observations: Panoramic RGB images, six 90° perspective views
- Interaction data: Natural language instructions, dialogue history, agent trajectories
- Spatial representations: 2D occupancy heatmaps, environment metadata
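To make the per-frame structure concrete, here is a minimal record mirroring the annotation list above. The field names are illustrative only; they are not the actual Unity Perception / Solo schema (which `pysolotools/` handles).

```python
import json
from dataclasses import dataclass, asdict, field

# Illustrative frame record; field names mirror the annotation list above,
# NOT the real Solo format.
@dataclass
class FrameRecord:
    frame_id: int
    rgb_panorama: str                                      # panoramic RGB image path
    perspective_views: list = field(default_factory=list)  # six 90-degree views
    boxes_2d: list = field(default_factory=list)           # 2D bounding boxes
    depth_map: str = ""                                    # geometric annotation path
    instruction: str = ""                                  # natural language instruction

rec = FrameRecord(0, "pano/000000.png",
                  perspective_views=[f"view/{i}.png" for i in range(6)])
blob = json.dumps(asdict(rec))           # records serialize cleanly for dataset export
print(len(json.loads(blob)["perspective_views"]))  # -> 6
```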
## Architecture
- Simulation Engine: Built on Unity Perception
- Primary communication: ROS2 (rclpy transport, listening on 127.0.0.1:10000)
- HTTP API: FastAPI + Uvicorn with health check endpoints (127.0.0.1:8787)
- Fallback link: WebSocket protocol (in the `closed_loop/` directory, experimental)
Core modules:
- `src/freeaskclaw/`: Core runtime code, CLI entry at `freeaskclaw.cli:main`
- `integrations/agent_ros2/`: ROS2 agent interface wrapper
- `closed_loop/`: WebSocket closed-loop bridge (includes the BEVBert trainer, etc.)
- `pysolotools/`: Solo-format data processing tools
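The HTTP health-check endpoint on `127.0.0.1:8787` can be probed from Python as well as `curl`. A minimal stdlib sketch (it assumes the `/healthz` route returns HTTP 200 when the runtime is up; the response body format is not documented here):

```python
import urllib.request

def healthz_url(host: str = "127.0.0.1", port: int = 8787) -> str:
    """Build the FastAPI health-check URL from the defaults listed above."""
    return f"http://{host}:{port}/healthz"

def check_health(url: str, timeout: float = 2.0) -> bool:
    """True if the runtime answers the health check (requires a running server)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...

print(healthz_url())  # -> http://127.0.0.1:8787/healthz
```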
## Baseline Models & Performance
Includes ETPNav, BEVBert, and their fine-tuned variants (ETPNav-FT, BEVBert-FT). Paper results show that introducing the "asking" mechanism improves the human baseline Success Rate (SR) from 40.2% to 82.6%, and reduces Navigation Error (NE) from 18.3m to 3.49m.
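The SR and NE figures above follow standard VLN metric definitions, sketched below. The 3 m success threshold is an assumption (a common choice in VLN benchmarks); the paper's exact threshold is not stated here.

```python
import math

def navigation_error(final_pos, goal_pos):
    """Euclidean distance (m) between the agent's final position and the goal."""
    return math.dist(final_pos, goal_pos)

def success_rate(episodes, threshold_m=3.0):
    """Fraction of episodes whose navigation error falls under the threshold."""
    hits = sum(navigation_error(p, g) < threshold_m for p, g in episodes)
    return hits / len(episodes)

# Two toy episodes: one ends 1 m from the goal (success), one 10 m away (failure).
episodes = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (10.0, 0.0))]
print(success_rate(episodes))  # -> 0.5
```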
## Agent Integration
Provides a standardized integration layer supporting OpenClaw, Claude Code, Codex, and custom Agent adapters via ROS2.
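A custom adapter only has to map simulator observations to actions; the ROS2 transport is handled by the integration layer in `integrations/agent_ros2/`. The interface below is a hypothetical sketch of that contract, not the repository's actual API:

```python
from abc import ABC, abstractmethod

# Hypothetical adapter contract; the real integration layer's API is not shown here.
class AgentAdapter(ABC):
    @abstractmethod
    def act(self, observation: dict) -> str:
        """Map one observation from the simulator to an action string."""

class AskFirstAgent(AgentAdapter):
    """Trivial custom adapter: always asks a nearby human for directions."""
    def act(self, observation: dict) -> str:
        return "ask_direction"

print(AskFirstAgent().act({"rgb": None}))  # -> ask_direction
```

Wrapping an LLM-backed planner (OpenClaw, Claude Code, Codex) would mean implementing `act` to call out to that model instead.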
## Quick Start
Prerequisites: Python ≥ 3.10, ROS2 Humble (manual installation for full interaction mode)
Minimal verification (no Unity required):
    git clone https://github.com/AIR-DISCOVER/FreeAskWorld
    cd FreeAskWorld
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    python -m integrations.agent_ros2.cli --help
    python -m integrations.agent_ros2.cli status --output-json
ROS2 live interaction mode:
    bash scripts/setup_envs.sh
    source .ros2_venv/bin/activate
    scripts/start_local_runtime.sh
    curl http://127.0.0.1:8787/healthz
    STEP_SECONDS=2 OBSERVE_SECONDS=1 scripts/run_live_smoke.sh
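The smoke test above is tuned via environment variables. A sketch of how such overrides are typically read on the Python side (the default values of 2 s and 1 s are assumptions taken from the example invocation, not documented defaults):

```python
import os

def loop_timing(env=os.environ):
    """Read step/observe durations (seconds) from environment overrides."""
    step = float(env.get("STEP_SECONDS", "2"))
    observe = float(env.get("OBSERVE_SECONDS", "1"))
    return step, observe

print(loop_timing({"STEP_SECONDS": "2", "OBSERVE_SECONDS": "1"}))  # -> (2.0, 1.0)
```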
## Use Cases
- Proactive Vision-Language Navigation (Proactive VLN)
- Social navigation and pedestrian behavior prediction
- Human-Computer Interaction (HCI) research
- ROS2-based RGBD SLAM validation
- Open-loop evaluation (nuScenes-like paradigm)
## Unconfirmed Information
- AAAI 2026 paper PDF / arXiv link not yet publicly available
- Unity simulator download method not specified
- Dataset download link to be confirmed
- Specific affiliated institution unclear (contributors appear to be from NTU; lab name unconfirmed)