3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in Memorization.
Use these libraries to find Memorization models and implementations.
No subtasks available.
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
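The principle above is easy to state concretely. Below is a minimal numpy sketch of the mixup idea as summarized here: a batch of inputs and one-hot labels is mixed by a single convex-combination weight drawn from a Beta distribution. The function name `mixup_batch`, the default `alpha=0.2`, and the within-batch random pairing are illustrative assumptions rather than details taken from the summary.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix a batch of inputs and one-hot labels by a shared convex combination."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing weight sampled from Beta(alpha, alpha)
    perm = rng.permutation(len(x))               # random pairing of examples within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]      # convex combination of paired inputs
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]  # matching combination of labels
    return x_mix, y_mix
```

Training then proceeds as usual, with the network fit to `(x_mix, y_mix)` instead of the raw examples and labels.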
Wide & Deep learning, jointly trained wide linear models and deep neural networks, is presented to combine the benefits of memorization and generalization for recommender systems and is open-sourced in TensorFlow.
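As a rough illustration of how the two components are combined, here is a hedged numpy sketch of a joint wide-plus-deep forward pass: a linear logit over (cross-)features is summed with the logit of a small ReLU network, and both parts would be trained jointly. The function name, the plain-numpy layer shapes, and the single sigmoid output are assumptions for illustration, not the TensorFlow implementation referenced in the summary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_forward(x_wide, x_deep, w_wide, deep_weights, b=0.0):
    """Sum a wide linear logit and a deep MLP logit into one predicted probability."""
    wide_logit = x_wide @ w_wide                 # "memorization": linear model over sparse/cross features
    h = x_deep
    for W in deep_weights[:-1]:
        h = np.maximum(h @ W, 0.0)               # "generalization": ReLU layers over dense features
    deep_logit = h @ deep_weights[-1]
    return sigmoid(wide_logit + deep_logit + b)  # joint logit; both parts are trained together
```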
Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks; the latent alignment structure contained in the representations reflects the expected alignment between the tokens.
It is suggested that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
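One common way to make that concrete is to interpolate the base model's next-token distribution with a distribution induced by nearest-neighbour retrieval from a datastore of (context vector, next token) pairs. The sketch below assumes such a datastore (`keys`, `values`) plus an interpolation weight `lam` and neighbour count `k`; these names and defaults are illustrative, not taken from the summary.

```python
import numpy as np

def knn_interpolated_next_token(p_lm, context_vec, keys, values, vocab_size, k=8, lam=0.25):
    """Blend a base LM distribution with one built from nearest-neighbour retrieval."""
    d = np.sum((keys - context_vec) ** 2, axis=1)   # squared distances to stored context vectors
    nn = np.argsort(d)[:k]                          # indices of the k nearest stored contexts
    w = np.exp(-d[nn])
    w /= w.sum()                                    # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)                 # accumulate weight on each retrieved next token
    return lam * p_knn + (1.0 - lam) * p_lm         # interpolated next-token distribution
```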
A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters is introduced, demonstrating that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics.
This paper argues that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
A 540-billion-parameter, densely activated Transformer language model, called PaLM, achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark.
This work investigates a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters; the method creates redundant copies of stored information, which enables retrieval with reduced noise.
A robust learning paradigm called Co-teaching+ bridges the "Update by Disagreement" strategy with the original Co-teaching and is much superior to many state-of-the-art methods in the robustness of trained models.