An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI, integrating LLMs for high-level planning and social interaction.
## Overview
FreeAskWorld is an interactive, closed-loop simulator for human-centric embodied AI, maintained by the AIR-DISCOVER organization. The associated paper was accepted to AAAI 2026 as an oral presentation. The framework addresses the lack of real human interaction in traditional Vision-Language Navigation (VLN) tasks by proposing the "Direction Inquiry Task" paradigm.
## Core Capabilities
- LLM-Driven Agents: Intent modeling, logical reasoning, natural dialogue, and instruction generation.
- Realistic Human Simulation: NPCs with personalized profiles, schedules, and distinct motion/navigation styles.
- Dynamic World Generation: Randomization of weather, lighting, traffic, and scene layouts.
- Closed-Loop Synchronization: WebSocket-based state exchange for real-time model-environment interaction.
- Direction Inquiry Task: Extends VLN by allowing agents to actively ask simulated humans for directions and adaptively replan.
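The closed-loop synchronization above amounts to a strict send-action / wait-for-state cycle per simulation tick. The sketch below illustrates that pattern in Python; the message schema (`type`, `kind`, `payload` fields) and the `FakeLink` stand-in are illustrative assumptions, not the actual FreeAskWorld wire format.

```python
import json
from dataclasses import dataclass
from typing import Protocol

# Hypothetical message shape; the real FreeAskWorld schema is not documented here.
@dataclass
class Action:
    kind: str            # e.g. "move", "turn", "ask"
    payload: dict

class Transport(Protocol):
    """Anything with send/recv, e.g. a WebSocket connection."""
    def send(self, text: str) -> None: ...
    def recv(self) -> str: ...

def step(link: Transport, action: Action) -> dict:
    """One closed-loop tick: send an action, block for the next world state."""
    link.send(json.dumps({"type": "action", "kind": action.kind, **action.payload}))
    return json.loads(link.recv())

# Offline stand-in for the WebSocket link, for illustration only.
class FakeLink:
    def send(self, text: str) -> None:
        self.last = json.loads(text)
    def recv(self) -> str:
        return json.dumps({"type": "state", "echo": self.last["kind"]})

obs = step(FakeLink(), Action("ask", {"utterance": "Which way to the plaza?"}))
print(obs["echo"])  # -> ask
```

The key design point is that the model never advances until the simulator confirms the new state, which keeps model and environment in lockstep.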
## Data Generation & Annotation
A modular data generation pipeline built on Unity Perception produces large-scale multi-modal datasets (6 tasks, 16 object categories, 63,429 frames, 17+ hours of interaction data) with:
- Visual annotations: 2D/3D bounding boxes, instance/semantic segmentation
- Geometric annotations: Depth maps, surface normal maps
- Visual observations: Panoramic RGB images, six 90° perspective views
- Interaction data: Natural language instructions, dialogue history, agent trajectories
- Spatial representations: 2D occupancy heatmaps, environment metadata
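To make the per-frame structure concrete, here is a minimal record mirroring the annotation list above. The field names are illustrative only; they are not the actual Unity Perception / Solo schema (which `pysolotools/` handles).

```python
import json
from dataclasses import dataclass, asdict, field

# Illustrative frame record; field names mirror the annotation list above,
# NOT the real Solo format.
@dataclass
class FrameRecord:
    frame_id: int
    rgb_panorama: str                                      # panoramic RGB image path
    perspective_views: list = field(default_factory=list)  # six 90-degree views
    boxes_2d: list = field(default_factory=list)           # 2D bounding boxes
    depth_map: str = ""                                    # geometric annotation path
    instruction: str = ""                                  # natural language instruction

rec = FrameRecord(0, "pano/000000.png",
                  perspective_views=[f"view/{i}.png" for i in range(6)])
blob = json.dumps(asdict(rec))           # records serialize cleanly for dataset export
print(len(json.loads(blob)["perspective_views"]))  # -> 6
```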
## Architecture
- Simulation Engine: Built on Unity Perception
- Primary communication: ROS2 (rclpy transport, listening on 127.0.0.1:10000)
- HTTP API: FastAPI + Uvicorn with health check endpoints (127.0.0.1:8787)
- Fallback link: WebSocket protocol (in the `closed_loop/` directory, experimental)
Core modules:
- `src/freeaskclaw/`: Core runtime code, CLI entry at `freeaskclaw.cli:main`
- `integrations/agent_ros2/`: ROS2 agent interface wrapper
- `closed_loop/`: WebSocket closed-loop bridge (includes the BEVBert trainer, etc.)
- `pysolotools/`: Solo-format data processing tools
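The HTTP health-check endpoint on `127.0.0.1:8787` can be probed from Python as well as `curl`. A minimal stdlib sketch (it assumes the `/healthz` route returns HTTP 200 when the runtime is up; the response body format is not documented here):

```python
import urllib.request

def healthz_url(host: str = "127.0.0.1", port: int = 8787) -> str:
    """Build the FastAPI health-check URL from the defaults listed above."""
    return f"http://{host}:{port}/healthz"

def check_health(url: str, timeout: float = 2.0) -> bool:
    """True if the runtime answers the health check (requires a running server)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...

print(healthz_url())  # -> http://127.0.0.1:8787/healthz
```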
## Baseline Models & Performance
Includes ETPNav, BEVBert, and their fine-tuned variants (ETPNav-FT, BEVBert-FT). Paper results show that introducing the "asking" mechanism improves the human baseline Success Rate (SR) from 40.2% to 82.6%, and reduces Navigation Error (NE) from 18.3m to 3.49m.
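The SR and NE figures above follow standard VLN metric definitions, sketched below. The 3 m success threshold is an assumption (a common choice in VLN benchmarks); the paper's exact threshold is not stated here.

```python
import math

def navigation_error(final_pos, goal_pos):
    """Euclidean distance (m) between the agent's final position and the goal."""
    return math.dist(final_pos, goal_pos)

def success_rate(episodes, threshold_m=3.0):
    """Fraction of episodes whose navigation error falls under the threshold."""
    hits = sum(navigation_error(p, g) < threshold_m for p, g in episodes)
    return hits / len(episodes)

# Two toy episodes: one ends 1 m from the goal (success), one 10 m away (failure).
episodes = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (10.0, 0.0))]
print(success_rate(episodes))  # -> 0.5
```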
## Agent Integration
Provides a standardized integration layer supporting OpenClaw, Claude Code, Codex, and custom Agent adapters via ROS2.
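A custom adapter only has to map simulator observations to actions; the ROS2 transport is handled by the integration layer in `integrations/agent_ros2/`. The interface below is a hypothetical sketch of that contract, not the repository's actual API:

```python
from abc import ABC, abstractmethod

# Hypothetical adapter contract; the real integration layer's API is not shown here.
class AgentAdapter(ABC):
    @abstractmethod
    def act(self, observation: dict) -> str:
        """Map one observation from the simulator to an action string."""

class AskFirstAgent(AgentAdapter):
    """Trivial custom adapter: always asks a nearby human for directions."""
    def act(self, observation: dict) -> str:
        return "ask_direction"

print(AskFirstAgent().act({"rgb": None}))  # -> ask_direction
```

Wrapping an LLM-backed planner (OpenClaw, Claude Code, Codex) would mean implementing `act` to call out to that model instead.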
## Quick Start
Prerequisites: Python ≥ 3.10, ROS2 Humble (manual installation for full interaction mode)
Minimal verification (no Unity required):
    git clone https://github.com/AIR-DISCOVER/FreeAskWorld
    cd FreeAskWorld
    python3 -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    python -m integrations.agent_ros2.cli --help
    python -m integrations.agent_ros2.cli status --output-json
ROS2 live interaction mode:
    bash scripts/setup_envs.sh
    source .ros2_venv/bin/activate
    scripts/start_local_runtime.sh
    curl http://127.0.0.1:8787/healthz
    STEP_SECONDS=2 OBSERVE_SECONDS=1 scripts/run_live_smoke.sh
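The smoke test above is tuned via environment variables. A sketch of how such overrides are typically read on the Python side (the default values of 2 s and 1 s are assumptions taken from the example invocation, not documented defaults):

```python
import os

def loop_timing(env=os.environ):
    """Read step/observe durations (seconds) from environment overrides."""
    step = float(env.get("STEP_SECONDS", "2"))
    observe = float(env.get("OBSERVE_SECONDS", "1"))
    return step, observe

print(loop_timing({"STEP_SECONDS": "2", "OBSERVE_SECONDS": "1"}))  # -> (2.0, 1.0)
```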
## Use Cases
- Proactive Vision-Language Navigation (Proactive VLN)
- Social navigation and pedestrian behavior prediction
- Human-Computer Interaction (HCI) research
- ROS2-based RGBD SLAM validation
- Open-loop evaluation (nuScenes-like paradigm)
## Unconfirmed Information
- AAAI 2026 paper PDF / arXiv link not yet publicly available
- Unity simulator download method not specified
- Dataset download link to be confirmed
- Specific affiliated institution unclear (contributors appear to be from NTU; lab name unconfirmed)