3260 papers • 126 benchmarks • 313 datasets
(Image: Fan et al.)
These leaderboards are used to track progress in 3D Object Reconstruction From A Single Image.
Use these libraries to find 3D Object Reconstruction From A Single Image models and implementations.
No subtasks available.
This paper addresses the problem of 3D reconstruction from a single image, generating an unorthodox form of output (point cloud coordinates), and designs an architecture, loss function, and learning paradigm that are novel and effective, capable of predicting multiple plausible 3D point clouds from an input image.
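Point-set outputs pair naturally with set-to-set losses such as the Chamfer distance; the snippet below is a minimal PyTorch sketch of that loss (the function name and usage are illustrative, not the authors' released code).

```python
import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point clouds.

    pred: (N, 3) predicted points, gt: (M, 3) reference points.
    Averages nearest-neighbour squared distances in both directions.
    """
    d = torch.cdist(pred, gt, p=2) ** 2          # pairwise squared distances, (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Toy usage: compare a noisy copy of a random cloud to the original.
gt = torch.rand(1024, 3)
pred = gt + 0.01 * torch.randn_like(gt)
print(chamfer_distance(pred, gt))
```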
The proposed Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object, achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.
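A minimal sketch of the pixel-aligned query idea, assuming the projected pixel location and depth of each query point are already given and using hypothetical module names (this is not the released PIFu code): the 2D feature map is sampled at each point's pixel, and a small MLP maps the sampled feature plus the point's depth to an occupancy value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedQuery(nn.Module):
    """Toy pixel-aligned implicit query: feature(pixel) + depth -> occupancy."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, feat_map, xy, z):
        # feat_map: (B, C, H, W) image features from any 2D encoder.
        # xy: (B, N, 2) projected point locations in [-1, 1] image coordinates.
        # z:  (B, N, 1) point depth in the camera frame.
        grid = xy.unsqueeze(2)                                  # (B, N, 1, 2)
        f = F.grid_sample(feat_map, grid, align_corners=True)   # (B, C, N, 1)
        f = f.squeeze(-1).permute(0, 2, 1)                      # (B, N, C)
        return self.mlp(torch.cat([f, z], dim=-1))              # (B, N, 1) occupancy
```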
This work formulates a multi-level architecture that is end-to-end trainable and significantly outperforms existing state-of-the-art techniques on single-image human shape reconstruction by fully leveraging 1k-resolution input images.
We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. As input we use an off-the-shelf detector (such as MaskRCNN) trained only on visible keypoint annotations; this is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges, and a graph decoder network corrects the occluded keypoint locations from the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using a self-supervised re-projection loss. At test time, our approach successfully localizes keypoints in a single view under a diverse set of severe occlusion settings. We demonstrate and evaluate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those obtained from the geometric trifocal-tensor loss.
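As a rough illustration of the re-projection idea (assuming a weak-perspective camera; names and shapes are hypothetical, not the paper's exact formulation), predicted 3D keypoints can be projected with the predicted camera and compared against the detected 2D keypoints:

```python
import torch

def reprojection_loss(kp3d, kp2d, scale, trans):
    """Self-supervised keypoint re-projection loss (illustrative sketch).

    kp3d:  (B, K, 3) predicted 3D keypoints in the object frame.
    kp2d:  (B, K, 2) detected 2D keypoints used as supervision.
    scale: (B, 1)    predicted weak-perspective camera scale.
    trans: (B, 2)    predicted 2D image translation.
    """
    proj = scale.unsqueeze(-1) * kp3d[..., :2] + trans.unsqueeze(1)  # (B, K, 2)
    return ((proj - kp2d) ** 2).sum(dim=-1).mean()
```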
A convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image that gets its learning signal from a silhouette of the object in the input image, a form of self-supervision.
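One simplified way such silhouette self-supervision can be expressed (a hedged sketch, not this paper's exact loss): project the predicted 3D points into the image and penalize projections that land outside the object mask.

```python
import torch
import torch.nn.functional as F

def silhouette_loss(points_xy, silhouette):
    """Penalize predicted 3D points whose image projection falls outside the mask.

    points_xy:  (B, N, 2) projected point coordinates in [-1, 1].
    silhouette: (B, 1, H, W) object mask from the input image (float in [0, 1]).
    """
    grid = points_xy.unsqueeze(2)                                 # (B, N, 1, 2)
    # grid_sample is differentiable w.r.t. the grid, so gradients flow back
    # to the predicted 3D shape and camera through points_xy.
    inside = F.grid_sample(silhouette, grid, align_corners=True)  # (B, 1, N, 1)
    return (1.0 - inside).mean()
```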
This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
A key novelty of the proposed technique is to impose 3D geometric reasoning into predicted 3D point clouds by rotating them with randomly sampled poses and then enforcing cycle consistency on both 3D reconstructions and poses.
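A rough, hypothetical PyTorch sketch of this idea (shape_net and pose_net stand in for whatever networks produce reconstructions and pose estimates; this is not the paper's implementation): rotate the predicted cloud by a random pose and require the estimates made from the rotated cloud to agree with the originals.

```python
import torch

def random_rotations(batch):
    """Sample random 3x3 rotation matrices via QR decomposition of Gaussian noise."""
    a = torch.randn(batch, 3, 3)
    q, r = torch.linalg.qr(a)
    # Normalize column signs, then flip the whole matrix if needed so det(q) = +1.
    q = q * torch.diagonal(r, dim1=-2, dim2=-1).sign().unsqueeze(-2)
    q = q * torch.linalg.det(q).sign().view(-1, 1, 1)
    return q

def cycle_consistency_loss(points, shape_net, pose_net):
    """Cycle consistency on both reconstructions and poses under random rotation.

    points:    (B, N, 3) point cloud predicted from the image.
    shape_net: callable mapping a cloud to a canonical cloud (B, N, 3).
    pose_net:  callable mapping a cloud to a rotation estimate (B, 3, 3).
    """
    rot = random_rotations(points.shape[0])
    rotated = points @ rot.transpose(1, 2)
    shape_loss = ((shape_net(rotated) - points) ** 2).mean()
    pose_loss = ((pose_net(rotated) - rot) ** 2).mean()
    return shape_loss + pose_loss
```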
This paper derives a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes and proposes SDF-SRN, an approach that outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.
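As a generic sketch of silhouette supervision for an implicit SDF (SDF-SRN's actual differentiable rendering formulation is more involved; alpha, sdf_net, and the ray sampling here are assumptions): relax the inside/outside test with a sigmoid of the signed distance and compare the rendered coverage against the 2D silhouette.

```python
import torch
import torch.nn.functional as F

def silhouette_from_sdf(sdf_net, ray_points, alpha=50.0):
    """Soft silhouette from an SDF: a pixel is covered if any sample along its
    ray is inside the surface (sdf < 0), relaxed with a sigmoid.

    sdf_net:    callable mapping (..., 3) points to signed distances (...).
    ray_points: (B, P, S, 3) sample points for P pixels, S samples per ray.
    """
    sdf = sdf_net(ray_points)             # (B, P, S)
    inside = torch.sigmoid(-alpha * sdf)  # ~1 inside the surface, ~0 outside
    return inside.max(dim=-1).values      # (B, P) soft coverage per pixel

def silhouette_supervision(pred_cov, gt_mask):
    """Binary cross-entropy between rendered coverage and the 2D silhouette mask."""
    return F.binary_cross_entropy(pred_cov.clamp(1e-6, 1 - 1e-6), gt_mask)
```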
The main contributions are two ways for leveraging cross-instance consistency: progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; and neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture.
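A hypothetical sketch of the neighbor-reconstruction idea (decode, recon_loss, and the code shapes are placeholders; the paper's loss is richer): re-render each instance with the shape code of its nearest neighbor in latent space and keep the reconstruction loss against the original image low.

```python
import torch

def neighbor_reconstruction_loss(shape_codes, tex_codes, images, decode, recon_loss):
    """Swap in the nearest neighbour's shape code and require the image to still be explained.

    shape_codes: (B, D) per-instance shape latents.
    tex_codes:   (B, D) per-instance texture latents.
    images:      (B, 3, H, W) target images.
    decode:      callable (shape, texture) -> rendered images (B, 3, H, W).
    recon_loss:  callable (pred, target) -> scalar.
    """
    with torch.no_grad():
        d = torch.cdist(shape_codes, shape_codes)   # (B, B) code distances
        d.fill_diagonal_(float("inf"))              # exclude each instance itself
        nn_idx = d.argmin(dim=1)                    # nearest neighbour per instance
    swapped = decode(shape_codes[nn_idx], tex_codes)  # borrow the neighbour's shape
    return recon_loss(swapped, images)
```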
Adding a benchmark result helps the community track progress.