DuRecDial
Recommendation SystemsA human-to-human Chinese dialog dataset (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot).
DrivingStereo
Depth EstimationDrivingStereo contains over 180k images covering a diverse set of driving scenarios, which is hundreds of times larger than the KITTI Stereo dataset. High-quality labels of disparity are produced by a model-guided filtering strategy from multi-frame LiDAR points.
The Double-Sided Braille Image dataset (DSBI) is a large-scale dataset for Braille image recognition. It has detailed Braille recto dots, verso dots and Braille cells annotation.
DublinCity
3D ReconstructionA novel benchmark dataset that includes a manually annotated point cloud for over 260 million laser scanning points into 100'000 (approx.) assets from Dublin LiDAR point cloud [12] in 2015. Objects are labelled into 13 classes using hierarchical levels of detail from large (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements.
Dunhuang Grottoes Painting Dataset
ImagesThis dataset provides a large number of training and testing example which is sufficient for a deep learning approach to address Dunhuang Grotto Painting restoration.
DWIE
Named Entity Recognition (NER)The 'Deutsche Welle corpus for Information Extraction' (DWIE) is a multi-task dataset that combines four main Information Extraction (IE) annotation sub-tasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document.
DynaSent
Sentiment AnalysisDynaSent is an English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilities human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers.
Drive&Act
Activity RecognitionThe Drive&Act dataset is a state of the art multi modal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels of 9.6 Million frames captured by 6 different views and 3 modalities (RGB, IR and depth).