This work learns policies for navigation over long planning horizons from language input by using imitation learning to warm-start the policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning.
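A minimal sketch of this two-phase schedule is given below: behavior cloning on expert demonstrations to warm-start a policy, then on-policy reinforcement learning (plain REINFORCE here). The `policy`, `demos`, and `env` objects and the simplified gym-style interface are assumptions for illustration, not the paper's actual training code.

```python
import torch
import torch.nn.functional as F

# Sketch only: `policy` maps an observation to action logits, `demos` yields
# (obs, expert_action) pairs, and `env` follows a simplified gym-style API.

def warm_start(policy, demos, optimizer, epochs=5):
    """Phase 1: imitation learning (behavior cloning) on expert demonstrations."""
    for _ in range(epochs):
        for obs, expert_action in demos:
            loss = F.cross_entropy(policy(obs), expert_action)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def reinforce_finetune(policy, env, optimizer, episodes=1000, gamma=0.99):
    """Phase 2: reinforcement learning starting from the warm-started policy."""
    for _ in range(episodes):
        obs, done, log_probs, rewards = env.reset(), False, [], []
        while not done:
            dist = torch.distributions.Categorical(logits=policy(obs))
            action = dist.sample()
            obs, reward, done, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            rewards.append(reward)
        # Discounted returns, then a vanilla policy-gradient update.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```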
This work proposes NaviLLM, the first generalist model for embodied navigation, which adapts LLMs to embodied navigation by introducing schema-based instruction; it demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.
CityEQA, a new task in which an embodied agent answers open-vocabulary questions through active exploration of dynamic city spaces, is introduced, along with Planner-Manager-Actor (PMA), a novel agent tailored for CityEQA that enables long-horizon planning and hierarchical task execution.
The VideoNavQA dataset, which contains pairs of questions and videos generated in the House3D environment, is built and used to establish an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
This work presents a generalization of EQA, Multi-Target EQA (MT-EQA), and proposes a modular architecture composed of a program generator, a controller, a navigator, and a VQA module, which together outperform previous methods and strong baselines by a significant margin.
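The following is a hedged sketch of how such a modular pipeline could be composed. The module interfaces, the `EQAPipeline` name, and the simplified environment whose `step()` returns the next observation are all illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

# Illustrative composition of the four modules named above; every interface
# here is a hypothetical stand-in for the real architecture.

@dataclass
class EQAPipeline:
    program_generator: callable   # question -> list of sub-goals (a "program")
    controller: callable          # (sub_goal, obs) -> "navigate" or "answer"
    navigator: callable           # (sub_goal, obs) -> low-level action
    vqa: callable                 # (sub_goal, obs) -> answer

    def answer(self, question, env):
        program = self.program_generator(question)
        obs = env.reset()
        answers = []
        for sub_goal in program:
            # Navigate until the controller decides the target has been reached.
            while self.controller(sub_goal, obs) == "navigate":
                obs = env.step(self.navigator(sub_goal, obs))
            # Query the VQA module on the final observation for this sub-goal.
            answers.append(self.vqa(sub_goal, obs))
        return answers
```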
It is shown through experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
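For illustration, a question-only baseline of this kind can be as simple as an LSTM over the question tokens feeding a linear answer classifier, with visual observations ignored entirely. The sketch below is an assumed minimal version, not the exact baseline from that study.

```python
import torch
import torch.nn as nn

# "Question-only" baseline sketch: the answer is predicted from question
# tokens alone, with no visual input. Layer sizes are placeholders.

class QuestionOnlyBaseline(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, question_tokens):           # (batch, seq_len) token ids
        _, (h_n, _) = self.encoder(self.embed(question_tokens))
        return self.classifier(h_n[-1])           # (batch, num_answers) logits
```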
The proposed VirtualHome2KG framework augments both synthetic video data of daily activities and the contextual semantic data corresponding to the video contents, based on a proposed event-centric schema and virtual-space simulation results. This makes context-aware data available for analysis and enables applications that have conventionally been difficult to develop due to the insufficient availability of relevant data and semantic information.
AllenAct, a modular and flexible learning framework designed around the unique requirements of Embodied AI research, is introduced; it provides first-class support for a growing collection of embodied environments, tasks, and algorithms.
This work proposes a novel Multimodal Environment Memory (MEM) module, facilitating the integration of embodied control with large models through the visual-language memory of scenes, and introduces the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.
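As a rough illustration of the memory-augmented pattern described here, the sketch below stores scene observations as (visual embedding, caption) pairs and retrieves them by naive caption overlap before prompting a language model for actions. The `EnvironmentMemory` and `plan_actions` names, the retrieval heuristic, and the `llm` callable are hypothetical stand-ins, not MEIA's actual components.

```python
from dataclasses import dataclass, field

# Hypothetical visual-language scene memory; all interfaces are illustrative.

@dataclass
class MemoryEntry:
    image_embedding: list      # visual feature vector for the scene
    caption: str               # language description of the scene

@dataclass
class EnvironmentMemory:
    entries: list = field(default_factory=list)

    def write(self, image_embedding, caption):
        self.entries.append(MemoryEntry(image_embedding, caption))

    def read(self, query, top_k=3):
        # Naive keyword overlap as a stand-in for embedding similarity.
        scored = sorted(
            self.entries,
            key=lambda e: len(set(query.split()) & set(e.caption.split())),
            reverse=True,
        )
        return scored[:top_k]

def plan_actions(task, memory, llm):
    """Compose a prompt from the task and retrieved scene memory, then ask a
    (hypothetical) language-model wrapper for a sequence of executable actions."""
    context = "\n".join(e.caption for e in memory.read(task))
    return llm(f"Scene memory:\n{context}\nTask: {task}\nActions:")
```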