3260 papers • 126 benchmarks • 313 datasets
DeepIM, a deep neural network for 6D pose matching, is proposed; it is trained to predict a relative pose transformation using a disentangled representation of 3D location and 3D orientation, refined through an iterative matching process.
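The iterative-refinement idea can be illustrated with a minimal sketch: a predictor repeatedly outputs a relative transform (rotation and translation handled separately, mirroring the disentangled representation) that is composed with the current pose estimate. The `toy_delta` predictor below is a hypothetical stand-in for the trained network, not DeepIM's actual model.

```python
import numpy as np

def refine_pose(predict_delta, R, t, n_iters=4):
    """Iteratively refine a 6D pose estimate (R, t): each step
    predicts a relative transform and composes it with the current
    estimate, in the spirit of DeepIM's matching loop."""
    for _ in range(n_iters):
        dR, dt = predict_delta(R, t)   # disentangled rotation / translation
        R, t = dR @ R, t + dt
    return R, t

# Toy stand-in for the trained network: step halfway toward a known
# target pose (a real system would render the object at (R, t) and
# compare the rendering against the observed image).
target_t = np.array([1.0, 0.0, 0.5])
def toy_delta(R, t):
    return np.eye(3), 0.5 * (target_t - t)

R, t = refine_pose(toy_delta, np.eye(3), np.zeros(3))
print(t)  # residual translation error shrinks by half each iteration
```

With a halving step, four iterations reduce the translation error by a factor of 16, which is why a small, fixed number of refinement steps often suffices in practice.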
A novel method called SilhoNet is introduced that predicts 6D object pose from monocular images: a convolutional neural network pipeline takes region-of-interest proposals and simultaneously predicts an intermediate silhouette representation with an associated occlusion mask and a 3D translation vector.
This paper introduces a framework in which an object locomotion policy is first obtained in a realistic physics simulator; this policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable learning of the robot manipulation policy.
This paper introduces the 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics, and proposes DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine the DSR.
An extensive study of the most critical challenges in learning language-conditioned policies from offline free-form imitation datasets is conducted, and a novel approach is presented that significantly outperforms the state of the art on CALVIN, a challenging benchmark for language-conditioned long-horizon robot manipulation.
The main idea is to design an intrinsic reward that measures novelty via disagreement across an ensemble of learned reward models; this disagreement reflects uncertainty in the tailored human feedback and can guide exploration.
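The ensemble-disagreement idea can be sketched in a few lines: train several reward models on the same feedback, then use the spread of their predictions as an exploration bonus. The linear "models" below are hypothetical placeholders for learned reward networks.

```python
import numpy as np

def disagreement_bonus(reward_models, state_action):
    """Intrinsic reward from disagreement across an ensemble of learned
    reward models: high variance in their predictions marks inputs the
    ensemble is uncertain about, i.e. worth exploring."""
    preds = np.array([m(state_action) for m in reward_models])
    return preds.std()  # standard deviation across the ensemble

# Toy ensemble: each "model" is a linear reward head with its own
# random weights (a stand-in for independently trained networks).
rng = np.random.default_rng(0)
models = [lambda x, w=rng.normal(size=4): float(w @ x) for _ in range(5)]

familiar = np.zeros(4)                      # all heads agree here
novel = np.array([1.0, -2.0, 0.5, 3.0])     # heads disagree here

print(disagreement_bonus(models, familiar))  # 0.0 — no disagreement
print(disagreement_bonus(models, novel))     # positive — uncertain input
```

In practice this bonus is added to the learned extrinsic reward, so the agent is pushed toward regions where the reward models have not yet converged on the human feedback.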
This work proposes a unified transformer-based approach that integrates natural language instructions with multi-view scene observations, improving manipulation precision through the multiple views and outperforming the state of the art.
It is shown that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts interleaving textual and visual tokens; VIMA, a transformer-based robot agent, is designed to process these prompts and output motor actions autoregressively.
Act3D, a manipulation policy transformer, is introduced; it represents the robot's workspace as a 3D feature field with task-dependent adaptive resolution and sets a new state of the art on RLBench, an established manipulation benchmark.