ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

Published in

Computer Vision and Pattern Recognition (2017)

TL;DR

This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Abstract

A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available – current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.
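To make the three annotation types in the abstract concrete, here is a minimal sketch of how a single annotated view might be represented and how its camera pose ties a labeled pixel to a 3D point. The `AnnotatedView` record, its field names, the meter depth unit, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions and not the dataset's actual file format or API. The last lines compute the average number of views per scene implied by the totals quoted above (about 1,650).

```python
from dataclasses import dataclass
from typing import List

Matrix4 = List[List[float]]


@dataclass
class AnnotatedView:
    """Hypothetical record for one RGB-D view (field names are illustrative)."""
    scene_id: str
    frame_index: int
    camera_to_world: Matrix4     # 4x4 rigid camera pose
    depth_m: List[List[float]]   # per-pixel depth in meters (assumed unit), 0 = missing
    labels: List[List[int]]      # per-pixel semantic class ids


def backproject_pixel(view, u, v, fx, fy, cx, cy):
    """Lift pixel (u, v) to a labeled world-space point with a pinhole model.

    Returns None when the depth at that pixel is missing.
    """
    z = view.depth_m[v][u]
    if z <= 0:
        return None
    # Point in camera coordinates (homogeneous).
    cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
    # Apply the top three rows of the camera-to-world pose.
    world = tuple(
        sum(view.camera_to_world[r][c] * cam[c] for c in range(4))
        for r in range(3)
    )
    return world, view.labels[v][u]


# Toy view: pose is a 1 m translation along x, one valid and one missing depth pixel.
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
view = AnnotatedView(
    scene_id="example_scene",
    frame_index=0,
    camera_to_world=pose,
    depth_m=[[2.0, 0.0]],
    labels=[[5, 0]],
)
point, label = backproject_pixel(view, u=0, v=0, fx=1.0, fy=1.0, cx=0.0, cy=0.0)

# Average views per scene implied by the abstract's totals (2.5M views, 1513 scenes).
views_per_scene = 2_500_000 / 1513
```

In this toy setup the valid pixel maps to the world point (1.0, 0.0, 2.0) with label 5, while the pixel with zero depth is skipped. Accumulating such labeled points over all views of a scene is one simple way to see how per-view poses and segmentations relate to a labeled 3D reconstruction.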

Authors

T. Funkhouser

13 papers

Angel X. Chang

16 papers

Angela Dai

11 papers

Maciej Halber

2 papers

M. Nießner

31 papers

Field of Study

Computer Science

Journal Information

Name

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Page

2432-2443

Venue Information

Name

Computer Vision and Pattern Recognition

Type

conference

URL

https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000147

Alternate Names

CVPR
Comput Vis Pattern Recognit
