See Weight Decay. $L_{2}$ Regularization, or Weight Decay, is a regularization technique applied to the weights of a neural network. We minimize a loss function comprising both the primary loss function and a penalty proportional to the squared $L_{2}$ norm of the weights: $$L_{new}\left(w\right) = L_{original}\left(w\right) + \lambda{w^{T}w}$$ where $\lambda$ is a value determining the strength of the penalty (encouraging smaller weights). Weight decay can also be incorporated directly into the weight update rule, rather than only implicitly through the objective function. In practice, "weight decay" usually refers to the implementation that modifies the weight update rule directly, whereas "$L_{2}$ regularization" usually refers to the penalty specified in the objective function.
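To make this distinction concrete, here is a minimal sketch, assuming a toy linear-regression problem and vanilla gradient descent in plain NumPy (all names and hyperparameters below are illustrative): the penalty is applied once through the gradient of the objective and once directly in the update rule. For plain SGD the two coincide exactly; they diverge under adaptive optimizers such as Adam, which is why the terminological distinction matters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # toy inputs
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

lam, lr = 1e-2, 1e-2                             # penalty strength lambda and learning rate

# (1) L2 regularization in the objective: L_new(w) = L_original(w) + lambda * w^T w.
#     The penalty enters the update through its gradient, 2 * lambda * w.
w = np.zeros(5)
for _ in range(500):
    grad_data = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean-squared-error loss
    w -= lr * (grad_data + 2 * lam * w)

# (2) Weight decay in the update rule: shrink the weights directly at each step.
w_wd = np.zeros(5)
for _ in range(500):
    grad_data = 2 * X.T @ (X @ w_wd - y) / len(y)
    w_wd = (1 - lr * 2 * lam) * w_wd - lr * grad_data

print(np.allclose(w, w_wd))                      # True: identical for vanilla SGD
```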
The results provide an understanding of the relative difficulty of the scenarios and show that simple baselines (Adagrad, L2 regularization, and naive rehearsal strategies) can surprisingly achieve performance similar to that of current mainstream methods.
This work developed convolutional neural networks for a facial expression recognition task and employed a hybrid feature strategy by which a novel CNN model was trained with the combination of raw pixel data and Histogram of Oriented Gradients (HOG) features.
This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation, offering a new simple perspective on training that elucidates the efficacy of widely used but poorly understood methods in deep learning.
It is shown that the emergence of in-context learning (ICL) during transformer training is, in fact, often transient, and it is found that L2 regularization may offer a path to more persistent ICL that removes the need for early stopping based on ICL-style validation tasks.
This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting and concludes with an empirical analysis of the effect of several importance weight estimators on the estimation of the regularization parameter.
A novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments.
It is shown that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.
A smooth kernel regularizer is proposed that encourages spatial correlations in convolution kernel weights and can help constrain models for visual recognition, improving over an L2 regularization baseline.
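As a rough illustration (an assumption on my part, not necessarily that paper's exact formulation), one simple way to encourage spatial correlations in convolution kernel weights is to penalize squared differences between neighboring kernel entries, alongside a plain L2 baseline penalty:

```python
import numpy as np

def smoothness_penalty(kernels):
    """Penalize squared differences between spatially adjacent kernel weights.

    kernels: array of shape (out_channels, in_channels, kH, kW).
    """
    dh = np.diff(kernels, axis=2)    # differences along kernel height
    dw = np.diff(kernels, axis=3)    # differences along kernel width
    return (dh ** 2).sum() + (dw ** 2).sum()

def l2_penalty(kernels):
    return (kernels ** 2).sum()

# Toy convolution kernels and illustrative penalty strengths.
k = np.random.default_rng(0).normal(size=(8, 3, 3, 3))
lam_smooth, lam_l2 = 1e-3, 1e-4
reg = lam_smooth * smoothness_penalty(k) + lam_l2 * l2_penalty(k)
```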