VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions
Conference: ACL 2026 · arXiv: 2604.10533 · Code: https://vln-nf.github.io/ · Area: Robotics & Embodied AI · Keywords: Vision-Language Navigation, False Premise, NOT-FOUND, Embodied Exploration, Feasibility Awareness
TL;DR
VLN-NF is the first benchmark that requires VLN agents to recognize false-premise instructions and output NOT-FOUND in partially observable 3D environments. The paper also proposes the REV-SPL evaluation metric and the ROAM two-stage hybrid framework; ROAM reaches 6.1 REV-SPL, roughly a 45% improvement over the supervised baseline.
Method

Key Designs
- Dataset Construction Pipeline (Rewrite + Verify): An LLM rewriter produces semantically fluent but factually incorrect instructions by replacing the target object with a plausible alternative that is absent from the target room; a VLM verifier confirms the absence via open-vocabulary detection. A human audit puts the error rate below 2%.
- REV-SPL Evaluation Metric: Jointly scores navigation efficiency, exploration coverage, and the correctness of the FOUND/NOT-FOUND decision, penalizing premature stopping and wrong decisions.
- ROAM Two-Stage Hybrid Framework: Stage 1 uses a supervised DUET model for room-level navigation; Stage 2 uses an LLM/VLM for in-room exploration, guided by a free-space clearance prior.
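The Rewrite + Verify pipeline can be sketched as a simple generate-then-filter loop. This is a minimal illustration, not the paper's code: the function names are hypothetical, and the real LLM rewriter and open-vocabulary VLM detector are replaced by trivial stand-ins.

```python
def rewrite_instruction(instruction: str, target: str, substitute: str) -> str:
    """LLM-rewriter stand-in: swap the target object for a plausible alternative."""
    return instruction.replace(target, substitute)

def verify_false_premise(substitute: str, room_objects: set) -> bool:
    """VLM-verifier stand-in: open-vocabulary detection reduced to a set lookup.
    The rewritten instruction is a valid false-premise sample only if the
    substitute object is genuinely absent from the target room."""
    return substitute not in room_objects

def build_sample(instruction: str, target: str, substitute: str, room_objects: set):
    rewritten = rewrite_instruction(instruction, target, substitute)
    if verify_false_premise(substitute, room_objects):
        return {"instruction": rewritten, "label": "NOT-FOUND"}
    return None  # rejected: the substitute is actually present, so the premise is not false

sample = build_sample(
    "Walk to the kitchen and stop next to the toaster.",
    target="toaster",
    substitute="piano",
    room_objects={"toaster", "sink", "fridge"},
)
```

A rejected sample (substitute present in the room) would simply be dropped and regenerated; the paper reports that a final human audit keeps the residual error rate below 2%.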
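One plausible shape for a feasibility-aware metric like REV-SPL is an SPL-style path-efficiency term gated by decision correctness and scaled by exploration coverage. The function below is an illustrative assumption only; the paper's exact formula may differ.

```python
def rev_spl(shortest: float, taken: float, coverage: float, decision_correct: bool) -> float:
    """Illustrative feasibility-aware SPL variant (NOT the paper's exact definition).

    - A wrong FOUND/NOT-FOUND call zeroes the score, so an agent cannot profit
      from stopping prematurely and guessing.
    - The classic SPL term shortest / max(taken, shortest) rewards efficient paths.
    - Scaling by exploration coverage (in [0, 1]) penalizes under-exploration,
      which matters for NOT-FOUND episodes where the room must be swept.
    """
    if not decision_correct:
        return 0.0
    efficiency = shortest / max(taken, shortest) if taken > 0 else 0.0
    return efficiency * coverage
```

For example, an agent taking a path twice the shortest length with 80% coverage and a correct decision would score 0.5 × 0.8 = 0.4 under this toy definition.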
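ROAM's two-stage decomposition amounts to a simple control-flow split: a supervised policy handles coarse room-level navigation, then an LLM/VLM policy explores within the room and commits to FOUND or NOT-FOUND. The sketch below shows that dispatch with hypothetical names and toy stand-ins for both stages.

```python
def roam_episode(instruction, navigate_to_room, explore_in_room):
    """Minimal sketch of ROAM's two-stage control flow (names are hypothetical).
    Stage 1: supervised room-level navigation (a DUET-style policy).
    Stage 2: LLM/VLM-driven in-room exploration that terminates with
    FOUND or, if the target is never observed, NOT-FOUND."""
    room = navigate_to_room(instruction)       # Stage 1
    return explore_in_room(room, instruction)  # Stage 2

# Toy stand-ins: Stage 1 picks a room, Stage 2 scans that room's object set.
rooms = {"kitchen": {"sink", "fridge"}}
result = roam_episode(
    "Find the piano in the kitchen.",
    navigate_to_room=lambda instr: "kitchen",
    explore_in_room=lambda room, instr: "FOUND" if "piano" in rooms[room] else "NOT-FOUND",
)
```

The design choice here is that the expensive LLM/VLM reasoning is invoked only after Stage 1 has localized the agent, which keeps the supervised navigator's efficiency while delegating the open-ended feasibility judgment to the model better suited for it.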
Key Experimental Results
| Method | Type | REV-SPL |
|---|---|---|
| DUET + VLN-NF | Supervised | 4.2 |
| NaviLLM | LLM-based | 1.0 |
| ROAM | Hybrid | 6.1 |
Highlights & Insights
- Fills a reliability gap in VLN: the first systematic study of false-premise navigation in partially observable 3D environments
- Two-stage decomposition strategy is transferable to other embodied tasks requiring decisions under uncertainty
Rating
- Novelty: ⭐⭐⭐⭐⭐
- Experimental Thoroughness: ⭐⭐⭐⭐
- Writing Quality: ⭐⭐⭐⭐⭐
- Value: ⭐⭐⭐⭐