Winning the Lottery with Continuous Sparsification (2019-09-25)

TL;DR

This paper proposes Continuous Sparsification, a new algorithm for finding winning tickets. It continuously removes parameters from a network during training and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies.

Abstract

The Lottery Ticket Hypothesis conjectures that, for a typically-sized neural network, it is possible to find small sub-networks that, when trained from scratch, match the performance of the dense counterpart given a comparable training budget. The proposed algorithm to search for winning tickets, Iterative Magnitude Pruning, consistently finds sparse sub-networks which train faster and better than the overparameterized models they were extracted from, creating potential applications to problems such as transfer learning. In this paper, we propose Continuous Sparsification, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. We show empirically that our method is capable of finding tickets that are sparser than the ones found by Iterative Magnitude Pruning, while achieving higher performance when trained from scratch. Moreover, our method can be efficiently parallelized, decreasing the ticket search cost measured in wall-clock time significantly given enough parallel computing resources.
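
The abstract does not say how the sub-network's structure is made learnable by gradients. One common way to do this, assumed here purely for illustration, is to multiply each weight by a smooth gate sigmoid(beta * s) and raise the temperature beta during training so the gate approaches a binary mask. The names `soft_mask`, `hard_mask`, `s`, and `beta` are hypothetical and do not come from this page; the sketch shows only the gating idea, not the paper's training procedure.

```python
import numpy as np

def soft_mask(s, beta):
    """Sigmoid gate in (0, 1); a larger beta pushes it toward a 0/1 step."""
    return 1.0 / (1.0 + np.exp(-beta * s))

def hard_mask(s):
    """Binary mask that the soft gate approaches as beta grows."""
    return (s > 0).astype(float)

def sparsity(mask):
    """Fraction of weights removed by a binary mask."""
    return 1.0 - mask.mean()

# Hypothetical gate parameters, one per weight (assumed values).
s = np.array([-2.0, -0.5, 0.5, 2.0])

# As beta increases, the differentiable gate sharpens toward the hard mask.
for beta in (1.0, 10.0, 100.0):
    print(beta, np.round(soft_mask(s, beta), 3))

print(hard_mask(s), sparsity(hard_mask(s)))
```

Because the gate is differentiable for any finite beta, the parameters `s` can be updated by the same gradient steps as the weights, which is what separates this family of approaches from a discrete prune-and-retrain loop.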

Authors

M. Maire

Pedro H. P. Savarese

Hugo Silva
