Research Connect

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Published in ACM Computing Surveys (2021-06-16)

On This Page

  • TL;DR
  • Abstract
  • Authors
  • Datasets
  • References

TL;DR

This is the first comprehensive survey of the efficient deep learning space: it covers the landscape of model efficiency, from modeling techniques to hardware support, along with the seminal work in each area.

Abstract

Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, and resources required to train, among others, have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey would provide readers with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.
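The experiment-based guide and code mentioned in the abstract are not reproduced on this page. Purely as an illustration of the kind of generic efficiency technique the abstract refers to, the sketch below applies post-training dynamic quantization with PyTorch to a small stand-in model; the model architecture, the quantized layer set, and the int8 dtype are assumptions made for this example, not the paper's released code.

```python
import io

import torch
import torch.nn as nn

# Stand-in float32 model; any trained model with nn.Linear layers works the same way.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types are stored
# as int8 and dequantized on the fly at inference time; activations stay in float.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_size_mb(m: nn.Module) -> float:
    """Rough footprint check: size of the serialized state_dict in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"float32 model:         {serialized_size_mb(model):.2f} MB")
print(f"dynamically quantized: {serialized_size_mb(quantized):.2f} MB")
```

For the quantized linear layers, storing weights as int8 shrinks their footprint by roughly 4x relative to float32 and typically speeds up CPU inference with little quality loss, which is the sort of immediate gain from generic techniques that the abstract describes.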

Authors

Gaurav Menghani

1 Paper

Datasets

CIFAR-10

Canadian Institute for Advanced Research, 10 classes

ImageNet

References (180 items)

1

PyTorch: An Imperative Style, High-Performance Deep Learning Library

2

Deep Residual Learning for Image Recognition

3

ImageNet: A large-scale hierarchical image database

4

TensorFlow: A System for Large-Scale Machine Learning (OSDI '16)


Research Impact: 273 citations, 180 references, 2 datasets

5

Xception: Deep Learning with Depthwise Separable Convolutions

6

Going deeper with convolutions

7

ImageNet classification with deep convolutional neural networks

8

Gradient-based learning applied to document recognition

9

High-Performance Neural Networks for Visual Object Classification

10

Learning both Weights and Connections for Efficient Neural Network

11

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

12

Big Self-Supervised Models are Strong Semi-Supervised Learners

13

Language Models are Few-Shot Learners

14

A Simple Framework for Contrastive Learning of Visual Representations

15

Self-Training With Noisy Student Improves ImageNet Classification

16

Unsupervised Representation Learning by Predicting Image Rotations

17

mixup: Beyond Empirical Risk Minimization

18

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

19

Unsupervised Visual Representation Learning by Context Prediction

20

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

21

On Bayesian Methods for Seeking the Extremum

22

Attention is All you Need

23

SMOTE: Synthetic Minority Over-sampling Technique

24

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

25

Random Search for Hyper-Parameter Optimization

26

Multiple Classifier Systems

27

MnasNet: Platform-Aware Neural Architecture Search for Mobile

28

Learning Multiple Layers of Features from Tiny Images

29

GloVe: Global Vectors for Word Representation

30

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

31

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

32

Efficient Neural Architecture Search via Parameter Sharing

33

Large-scale deep unsupervised learning using graphics processors

34

Very Deep Convolutional Networks for Large-Scale Image Recognition

35

DARTS: Differentiable Architecture Search

36

The Bottleneck

37

Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges

38

Distilling the Knowledge in a Neural Network

39

Randaugment: Practical automated data augmentation with a reduced search space

40

Learning Transferable Architectures for Scalable Image Recognition

41

AutoAugment: Learning Augmentation Strategies From Data

42

Data Augmentation by Pairing Samples for Images Classification

43

An Attentive Survey of Attention Models

44

Neural Machine Translation by Jointly Learning to Align and Translate

45

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

46

Optimal Brain Damage

47

Efficient Transformers: A Survey

48

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

49

Neural Architecture Search with Reinforcement Learning

50

MobileNetV2: Inverted Residuals and Linear Bottlenecks

51

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

52

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

53

Hierarchical Text-Conditional Image Generation with CLIP Latents

54

PaLM: Scaling Language Modeling with Pathways

55

The Efficiency Misnomer

56

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

57

A comprehensive survey on optimizing deep learning models by metaheuristics

58

Distilling Large Language Models into Tiny and Effective Students using pQRNN

59

Amazon SageMaker Automatic Model Tuning: Scalable Black-box Optimization

60

Characterising Bias in Compressed Models

61

A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions

62

Towards Accurate Post-training Network Quantization via Bit-Split and Stitching

63

Neural Structured Learning: Training Neural Networks with Structured Signals

64

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

65

Exploring Bayesian Optimization

66

Training with Quantization Noise for Extreme Model Compression

67

ProFormer: Towards On-Device LSH Projection Based Transformers

68

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

69

Hyper-Parameter Optimization: A Review of Algorithms and Applications

70

Multi-modal Self-Supervision from Generalized Data Transformations

71

TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers

72

Rigging the Lottery: Making All Tickets Winners

73

Fast Sparse ConvNets

74

Mastering Atari, Go, chess and shogi by planning with a learned model

75

Learning from a Teacher using Unlabeled Data

76

PRADO: Projection Attention Networks for Document Classification On-Device

77

Sparse Networks from Scratch: Faster Training without Losing Performance

78

Transferable Neural Projection Representations

79

Searching for MobileNetV3

80

Billion-scale semi-supervised learning for image classification

81

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning

82

The State of Sparsity in Deep Neural Networks

83

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

84

Microsoft Research

85

Rethinking the Value of Network Pruning

86

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

87

Tune: A Research Platform for Distributed Model Selection and Training

88

MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning

89

Quantizing deep convolutional networks for efficient inference: A whitepaper

90

Glow: Graph Lowering Compiler Techniques for Neural Networks

91

The Lottery Ticket Hypothesis: Training Pruned Neural Networks

92

Model compression via distillation and quantization

93

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

94

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

95

Regularized Evolution for Image Classifier Architecture Search

96

Universal Language Model Fine-tuning for Text Classification

97

Advances in Pre-Training Distributed Word Representations

98

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

99

Progressive Neural Architecture Search

100

Population Based Training of Neural Networks

101

To prune, or not to prune: exploring the efficacy of pruning for model compression

102

Google Vizier: A Service for Black-Box Optimization

103

ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

104

On Compressing Deep Models by Low Rank and Sparse Decomposition

105

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

106

In-datacenter performance analysis of a tensor processing unit

107

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

108

Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning

109

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer

110

Quasi-Recurrent Neural Networks

111

Pruning Convolutional Neural Networks for Resource Efficient Inference

112

Adaptive data augmentation for image classification

113

Pruning Filters for Efficient ConvNets

114

Ternary Weight Networks

115

Do Deep Convolutional Nets Really Need to be Deep and Convolutional?

116

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks

117

Structured Pruning of Deep Convolutional Neural Networks

118

Non-stochastic Best Arm Identification and Hyperparameter Optimization

119

Deep Speech: Scaling up end-to-end speech recognition

120

Sequence to Sequence Learning with Neural Networks

121

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

122

Model compression

123

Best practices for convolutional neural networks applied to visual document analysis

124

Similarity estimation techniques from rounding algorithms

125

Torch

126

Optimal Brain Surgeon and general network pruning

127

Neural Network Ensembles

128

Estimating PaLM's training cost (blog.heim)

129

Automatic Mixed Precision examples — PyTorch 1.8.1 documentation

130

TensorFlow models on the Edge TPU | Coral

131

XNNPACK backend for TensorFlow

132

Contributors to Wikimedia projects

133

advisor

134

Cloud TPU | Google Cloud

135

What makes TPUs fine-tuned for deep learning? | Google Cloud Blog

136

XNNPACK Authors

137

Edge TPU performance benchmarks | Coral

138

BFloat16: The Secret to High Performance on Cloud TPUs | Google Cloud Blog

139

Matrix Compression Operator

140

TensorFlow 2 MLPerf submissions demonstrate best-in-class performance on Google Cloud

141

Neural Networks API | Android NDK | Android Developers

142

The Illustrated Transformer

143

Performance Tuning Guide — PyTorch Tutorials 1.8.1+cu102 documentation

144

Accelerate | Apple Developer Documentation

145

Setting the learning rate of your neural network. Jeremy Jordan (Aug 2020)

146

GTC 2020: Accelerating Sparsity in the NVIDIA Ampere Architecture

147

Automating data augmentation: Practice, theory and new direction

148

Inside Volta: The World’s Most Advanced Data Center GPU | NVIDIA Developer Blog

149

Training Neural Networks with Tensor Cores

150

QNNPACK: Open Source Library for Optimized Mobile Deep Learning—Facebook Engineering

151

TensorFlow Model Optimization Toolkit — Post-Training Integer Quantization

152

Review: Xception — With Depthwise Separable Convolution, Better Than Inception-v3 (Image Classification)

153

Pixel 4 is here to help

154

Self-Governing Neural Networks for On-Device Short Text Classification

155

Yann LeCun @EPFL - "Self-supervised learning: could machines learn like humans?"

156

Binarized Neural Networks

157

Neural Machine Translation Systems for WMT 16

158

Ran El-Yaniv, and Yoshua Bengio

159

TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems

160

Google for Research

161

Improving the speed of neural networks on CPUs

162

google,我,萨娜

163

Neural Network Ensembles, Cross Validation, and Active Learning

164

Why systolic architectures?

165

Algorithms for VLSI Processor Arrays

166

Introduction to VLSI systems

167

Towards Unsupervised Speech Processing (The 11th International Conference on Information Sciences, Signal Processing and their Applications: Main Tracks)

168

AVX-512 (Wikipedia)

169

TensorFlow Lite | ML for Mobile and Edge Devices

170

Multiply-Accumulate Operation (Wikipedia)

171

NVIDIA Embedded Systems for Next-Gen Autonomous Machines

172

The Keras Blog

173

Model Optimization | TensorFlow Lite

174

Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer

175

XLA: Optimizing Compiler for Machine Learning | TensorFlow

176

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

177

Evaluation Strategy: This defines how we evaluate a model for fitness. It can simply be a conventional metric like validation loss or accuracy, or it can be a compound metric,

178

Gaurav

179

FC layers ignore the spatial information of the input pixels

180

SIMD ISAs | Neon-Arm Developer


Field of Study

Computer Science

Journal Information

Name

ACM Computing Surveys

Volume

55

Venue Information

Name

ACM Computing Surveys

Type

journal

URL

http://www.acm.org/pubs/surveys/

Alternate Names

  • ACM Comput Surv