A baseline model for 3D-QA is proposed, called ScanQA, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This descriptor correlates language expressions with the underlying geometric features of the 3D scan and facilitates the regression of 3D bounding boxes to localize the objects described in textual questions.
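The fusion-then-regression idea above can be sketched minimally. This is an illustrative toy, not the ScanQA implementation: the dimensions, weight matrices, and the simple additive-projection fusion are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- illustrative only, not the paper's configuration.
N_PROPOSALS, D_OBJ, D_TXT, D_FUSED = 32, 256, 300, 128

def fuse_and_regress(obj_feats, question_emb, w_obj, w_txt, w_box):
    """Fuse per-proposal 3D features with a question embedding,
    then regress a 3D box (center + size) for each proposal."""
    # Project both modalities into a shared space; the question
    # embedding broadcasts across all proposals.
    fused = np.tanh(obj_feats @ w_obj + question_emb @ w_txt)  # (N, D_FUSED)
    boxes = fused @ w_box  # (N, 6): cx, cy, cz, dx, dy, dz
    return fused, boxes

obj_feats = rng.normal(size=(N_PROPOSALS, D_OBJ))   # stand-in proposal features
question_emb = rng.normal(size=(D_TXT,))            # stand-in sentence embedding
w_obj = rng.normal(size=(D_OBJ, D_FUSED)) * 0.05
w_txt = rng.normal(size=(D_TXT, D_FUSED)) * 0.05
w_box = rng.normal(size=(D_FUSED, 6)) * 0.05

fused, boxes = fuse_and_regress(obj_feats, question_emb, w_obj, w_txt, w_box)
print(fused.shape, boxes.shape)  # (32, 128) (32, 6)
```

In the actual model the proposal features would come from a 3D detection backbone and the regression head would be trained end to end; here random weights only demonstrate the data flow.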
This work proposes to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs that can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on.
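Injecting 3D input into an LLM is commonly done by projecting scene features into the model's token-embedding space and prepending them as a soft prefix. The sketch below shows only that interface; the dimensions, projection, and prefix scheme are assumptions for illustration, not the 3D-LLM paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes -- illustrative only, not the paper's configuration.
N_SCENE_TOKENS, D_3D, D_LLM, N_TEXT_TOKENS = 64, 512, 768, 10

def build_llm_input(scene_feats, text_token_embs, w_proj):
    """Project 3D scene features into the LLM embedding space and
    prepend them to the text-token embeddings as a soft prefix."""
    scene_tokens = scene_feats @ w_proj  # (N_SCENE_TOKENS, D_LLM)
    return np.concatenate([scene_tokens, text_token_embs], axis=0)

scene_feats = rng.normal(size=(N_SCENE_TOKENS, D_3D))      # stand-in point-cloud features
text_token_embs = rng.normal(size=(N_TEXT_TOKENS, D_LLM))  # stand-in question tokens
w_proj = rng.normal(size=(D_3D, D_LLM)) * 0.02             # learned projection (random here)

llm_input = build_llm_input(scene_feats, text_token_embs, w_proj)
print(llm_input.shape)  # (74, 768)
```

The same prefix sequence can then serve any of the listed tasks (captioning, 3D-QA, grounding), with the task selected by the text prompt rather than by a separate head.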
This work proposes the first generalist model for embodied navigation, NaviLLM, which adapts LLMs to embodied navigation by introducing schema-based instruction. It demonstrates strong generalizability and presents impressive results on unseen tasks, e.g. embodied question answering and 3D captioning.