3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Memorization.
Use these libraries to find Memorization models and implementations.
No subtasks available.
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
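The principle above is easy to state concretely. Below is a minimal numpy sketch of the mixup idea as summarized here: a batch of inputs and one-hot labels is mixed by a single convex-combination weight drawn from a Beta distribution. The function name `mixup_batch`, the default `alpha=0.2`, and the within-batch random pairing are illustrative assumptions rather than details taken from the summary.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix a batch of inputs and one-hot labels by a shared convex combination."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing weight sampled from Beta(alpha, alpha)
    perm = rng.permutation(len(x))               # random pairing of examples within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]      # convex combination of paired inputs
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]  # matching combination of labels
    return x_mix, y_mix
```

Training then proceeds as usual, with the network fit to `(x_mix, y_mix)` instead of the raw examples and labels.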
Wide & Deep learning, jointly trained wide linear models and deep neural networks, is presented to combine the benefits of memorization and generalization for recommender systems and is open-sourced in TensorFlow.
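As a rough illustration of how the two components are combined, here is a hedged numpy sketch of a joint wide-plus-deep forward pass: a linear logit over (cross-)features is summed with the logit of a small ReLU network, and both parts would be trained jointly. The function name, the plain-numpy layer shapes, and the single sigmoid output are assumptions for illustration, not the TensorFlow implementation referenced in the summary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_forward(x_wide, x_deep, w_wide, deep_weights, b=0.0):
    """Sum a wide linear logit and a deep MLP logit into one predicted probability."""
    wide_logit = x_wide @ w_wide                 # "memorization": linear model over sparse/cross features
    h = x_deep
    for W in deep_weights[:-1]:
        h = np.maximum(h @ W, 0.0)               # "generalization": ReLU layers over dense features
    deep_logit = h @ deep_weights[-1]
    return sigmoid(wide_logit + deep_logit + b)  # joint logit; both parts are trained together
```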
Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks; the latent alignment structure contained in the representations reflects the expected alignment between the tokens.
It is suggested that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
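One common way to make that concrete is to interpolate the base model's next-token distribution with a distribution induced by nearest-neighbour retrieval from a datastore of (context vector, next token) pairs. The sketch below assumes such a datastore (`keys`, `values`) plus an interpolation weight `lam` and neighbour count `k`; these names and defaults are illustrative, not taken from the summary.

```python
import numpy as np

def knn_interpolated_next_token(p_lm, context_vec, keys, values, vocab_size, k=8, lam=0.25):
    """Blend a base LM distribution with one built from nearest-neighbour retrieval."""
    d = np.sum((keys - context_vec) ** 2, axis=1)   # squared distances to stored context vectors
    nn = np.argsort(d)[:k]                          # indices of the k nearest stored contexts
    w = np.exp(-d[nn])
    w /= w.sum()                                    # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)                 # accumulate weight on each retrieved next token
    return lam * p_knn + (1.0 - lam) * p_lm         # interpolated next-token distribution
```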
A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters is introduced, demonstrating that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics.
This paper argues that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
A 540-billion-parameter, densely activated Transformer language model, called PaLM, achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark.
This work investigates a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters; the method creates redundant copies of stored information, which enables retrieval with reduced noise.
A robust learning paradigm called Co-teaching+ bridges the "Update by Disagreement" strategy with the original Co-teaching and is much superior to many state-of-the-art methods in the robustness of trained models.