This open-source book represents an attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. It seamlessly integrates exposition, figures, math, and interactive examples with self-contained code.
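To give a flavor of that self-contained style, here is a minimal sketch (a hypothetical illustration written for this summary, not an excerpt from the book) that fits a one-dimensional linear regression with batch gradient descent, using only NumPy:

    # Hypothetical example in the book's self-contained spirit:
    # recover w = 2.0, b = -3.4 from noisy data via batch gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)                         # 100 scalar inputs
    y = 2.0 * x - 3.4 + 0.01 * rng.normal(size=100)  # noisy linear targets

    w, b, lr = 0.0, 0.0, 0.1                         # parameters, learning rate
    for _ in range(200):
        err = (w * x + b) - y                        # residuals of the forward pass
        w -= lr * (err @ x) / len(y)                 # gradient step for the weight
        b -= lr * err.mean()                         # gradient step for the bias

    print(f"learned w = {w:.2f}, b = {b:.2f}")       # approx. 2.00 and -3.40

Every example in the book bundles its imports, data, model, and training loop in one runnable cell in this way.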
Authors
Aston Zhang
Zachary Chase Lipton
Mu Li
Alexander J. Smola