This open-source book represents an attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. It seamlessly integrates exposition, figures, math, and interactive examples with self-contained code.
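To give a flavor of that self-contained style, here is a minimal sketch (a hypothetical illustration written for this summary, not an excerpt from the book) that fits a one-dimensional linear regression with batch gradient descent, using only NumPy:

    # Hypothetical example in the book's self-contained spirit:
    # recover w = 2.0, b = -3.4 from noisy data via batch gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)                         # 100 scalar inputs
    y = 2.0 * x - 3.4 + 0.01 * rng.normal(size=100)  # noisy linear targets

    w, b, lr = 0.0, 0.0, 0.1                         # parameters, learning rate
    for _ in range(200):
        err = (w * x + b) - y                        # residuals of the forward pass
        w -= lr * (err @ x) / len(y)                 # gradient step for the weight
        b -= lr * err.mean()                         # gradient step for the bias

    print(f"learned w = {w:.2f}, b = {b:.2f}")       # approx. 2.00 and -3.40

Every example in the book bundles its imports, data, model, and training loop in one runnable cell in this way.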
Authors
Aston Zhang
Zachary Chase Lipton
Mu Li
Alexander J. Smola