miscellaneous-1

Benchmarking

3260 papers • 126 benchmarks • 313 datasets

This task has no description! Would you like to contribute one?

(Image credit: Papersgraph)

Benchmarks

These leaderboards are used to track progress in benchmarking-2

Trend

Dataset

Best Model

Actions

Wiki-40B

Libraries

i

Use these libraries to find benchmarking-2 models and implementations

Datasets

Subtasks

No subtasks available.

Most implemented papers

Learning Transferable Visual Models From Natural Language Supervision

I. Sutskever, Alec Radford, Jong Wook Kim, Chris Hallacy, A. Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger•Thu Feb 25 2021

It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.

39743

Content

0

Paper Graph

MMDetection: Open MMLab Detection Toolbox and Benchmark

Ziwei Liu, Chen Change Loy, Wanli Ouyang, Jingdong Wang, Tianheng Cheng, Dahua Lin, Xin Lu, Jifeng Dai, Qijie Zhao, Jimmy, Shuyang Sun, Jianping Shi, Zheng Zhang, Yu Xiong, Yue Wu, Kai Chen, Chenchen Zhu, Jiarui Xu, Jiangmiao Pang, Xiaoxiao Li, Yuhang Cao, Wansen Feng, Dazhi Cheng, Buyu Li, Rui Zhu•Sun Jun 16 2019

This paper presents MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules, and conducts a benchmarking study on different methods, components, and their hyper-parameters.

3282 0

Paper Graph

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Kashif Rasul, Han Xiao, Roland Vollgraf•Thu Aug 24 2017

Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.

10089 0

Paper Graph

CIDEr: Consensus-based image description evaluation

C. L. Zitnick, Devi Parikh, Ramakrishna Vedantam•Wed Nov 19 2014

A novel paradigm for evaluating image descriptions that uses human consensus is proposed and a new automated metric that captures human judgment of consensus better than existing metrics across sentences generated by various sources is evaluated.

5058 0

Paper Graph

Benchmarking Graph Neural Networks

Yoshua Bengio, X. Bresson, T. Laurent, Vijay Prakash Dwivedi, Chaitanya K. Joshi•Sat Dec 31 2022

A reproducible GNN benchmarking framework is introduced, with the facility for researchers to add new models conveniently for arbitrary datasets, and a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs).

1110 0

Paper Graph

The StarCraft Multi-Agent Challenge

Tabish Rashid, Mikayel Samvelyan, C. S. D. Witt, Gregory Farquhar, Jakob N. Foerster, Shimon Whiteson, Philip H. S. Torr, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung•Sun Feb 10 2019

The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem and an open-source deep multi-agent RL learning framework including state-of-the-art algorithms is opened.

1131 0

Paper Graph

Benchmarking Deep Reinforcement Learning for Continuous Control

P. Abbeel, Yan Duan, Xi Chen, Rein Houthooft, John Schulman•Thu Apr 21 2016

This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, task with partial observations, and tasks with hierarchical structure.

1749 0

Paper Graph

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Dan Hendrycks, Thomas G. Dietterich•Wed Mar 27 2019

This paper standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications, and proposes a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.

4000 0

Paper Graph

Habitat: A Platform for Embodied AI Research

Devi Parikh, Dhruv Batra, V. Koltun, Julian Straub, Jitendra Malik, M. Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Jia Liu•Mon Apr 01 2019

The comparison between learning and SLAM approaches from two recent works are revisited and evidence is found -- that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and the first cross-dataset generalization experiments are conducted.

1692 0

Paper Graph

Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

Jonas Rauber, Yinpeng Dong, I. Goodfellow, Nicolas Papernot, P. Mcdaniel, Tom B. Brown, Vahid Behzadan, Aurko Roy, Alexey Kurakin, Yash Sharma, Cihang Xie, David Berthelot, Jonathan Uesato, Fartash Faghri, Nicholas Carlini, Reuben Feinman, Alexander Matyasko, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, W. Gierke, P. Hendricks, Rujun Long•Sun Oct 02 2016

The core functionalities of the CleverHans library are presented, namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks.

537 0

Paper Graph

Adding a benchmark result helps the community track progress.