3260 papers • 126 benchmarks • 313 datasets
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. (Image credit: Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout)
These leaderboards are used to track progress in vision-language navigation.
Use these libraries to find vision-language navigation models and implementations.
This paper proposes to use a progress monitor developed in prior work as a learnable heuristic for search, and proposes two modules incorporated into an end-to-end architecture that significantly outperforms current state-of-the-art methods using greedy action selection.

A self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images, and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress.
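The progress-monitor idea above can be illustrated with a minimal sketch: if the agent attends over instruction tokens at each step, the attention-weighted mean of normalized token positions gives a crude estimate of how far through the instruction it is. This is an illustrative heuristic only; the function name and interface are not from the paper.

```python
def soft_progress(attn_weights):
    """Estimate navigation progress from attention over instruction tokens.

    Returns the attention-weighted mean of normalized token positions
    (0.0 = attending to the start of the instruction, 1.0 = the end).
    Illustrative sketch, not the paper's trained progress monitor.
    """
    n = len(attn_weights)
    positions = [i / (n - 1) for i in range(n)]  # normalize token index to [0, 1]
    total = sum(attn_weights)
    return sum(w * p for w, p in zip(attn_weights, positions)) / total
```

For example, attention concentrated on the last token yields a progress estimate of 1.0, while uniform attention yields 0.5.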
A general cross-lingual VLN framework is proposed to enable instruction-following navigation in different languages, and an adversarial domain adaptation loss is introduced to improve the model's transfer ability when given a certain amount of target-language data.
A novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task, significantly outperforming the baselines and achieving the best results on the real-world Room-to-Room dataset.
The Frontier Aware Search with backTracking (FAST) Navigator is presented, a general framework for action decoding that achieves state-of-the-art results on the 2018 Room-to-Room (R2R) Vision-and-Language Navigation challenge.
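The core mechanism behind frontier-aware search with backtracking can be sketched as best-first search over a global frontier of partial trajectories, scored by a learned heuristic: the agent always expands the highest-scoring frontier node, so backtracking to an earlier branch falls out naturally. The function names and interfaces below are assumptions for illustration, not the paper's actual code.

```python
import heapq

def frontier_search(start, expand, score, is_goal, max_steps=100):
    """Best-first search over a frontier of partial trajectories.

    `expand(node)` yields successor nodes; `score(node)` is a learned
    heuristic where higher is better; `is_goal(node)` tests termination.
    Because the best-scoring frontier node is expanded regardless of
    where the last expansion happened, backtracking is implicit.
    Illustrative sketch only.
    """
    frontier = [(-score(start), 0, start)]  # min-heap on negated score
    tie = 1  # tiebreaker so the heap never compares nodes directly
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for nxt in expand(node):
            heapq.heappush(frontier, (-score(nxt), tie, nxt))
            tie += 1
    return None
```

As a toy usage example, searching the integers with steps of +1 and +2 toward a target of 5, scored by negative distance to the target, returns 5.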
This paper presents a generalizable navigational agent, trained in two stages via mixed imitation and reinforcement learning, outperforming the state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task, and achieving the top rank on the leaderboard.
This work introduces a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation and Navigation from Dialog History tasks, and proposes to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments.
This work proposes an end-to-end framework for learning an exploration policy that decides i) when and where to explore, ii) what information is worth gathering during exploration, and iii) how to adjust the navigation decision after the exploration.
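The exploration policy described above can be caricatured as a confidence-gated decision loop: commit to the current best action when the policy is confident, otherwise gather extra observations and re-score before acting. Everything here — the threshold, the `explore_fn` interface — is a hypothetical sketch, not the paper's learned policy.

```python
def navigate_step(action_probs, explore_fn, threshold=0.5):
    """Decide an action, exploring first if the policy is uncertain.

    If the top action's probability is below `threshold`, call
    `explore_fn(action_probs)` to obtain re-scored probabilities
    (e.g., after peeking down candidate directions), then commit.
    Hypothetical interface for illustration.
    """
    best = max(range(len(action_probs)), key=action_probs.__getitem__)
    if action_probs[best] >= threshold:
        return best  # confident: act without exploring
    adjusted = explore_fn(action_probs)  # uncertain: gather info, re-score
    return max(range(len(adjusted)), key=adjusted.__getitem__)
```

The threshold trades off path efficiency against success rate: exploring more often lengthens trajectories but reduces wrong turns.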
This paper proposes a modular approach to deal with the combined navigation and object interaction problem without the need for strictly aligned vision and language training data, and proposes a novel geometry-aware mapping technique for cluttered indoor environments, and a language understanding model generalized for household instruction following.