1
Solving Quantitative Reasoning Problems with Language Models
2
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
3
Designing Effective Sparse Expert Models
4
Scaling Up Models and Data with t5x and seqio
5
Training Compute-Optimal Large Language Models
6
Pathways: Asynchronous Distributed Dataflow for ML
7
Self-Consistency Improves Chain of Thought Reasoning in Language Models
8
Training language models to follow instructions with human feedback
9
Quantifying Memorization Across Neural Language Models
10
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
11
Deduplicating Training Data Mitigates Privacy Risks in Language Models
12
Competition-level code generation with AlphaCode
13
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
14
Chain of Thought Prompting Elicits Reasoning in Large Language Models
15
Reasoning Like Program Executors
16
LaMDA: Language Models for Dialog Applications
17
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
18
Improving language models by retrieving from trillions of tokens
19
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
20
Ethical and social risks of harm from Language Models
21
Show Your Work: Scratchpads for Intermediate Computation with Language Models
22
AI and the Everything in the Whole Wide World Benchmark
23
Training Verifiers to Solve Math Word Problems
24
Multitask Prompted Training Enables Zero-Shot Task Generalization
25
Learning Compact Metrics for MT
26
Challenges in Detoxifying Language Models
27
Finetuned Language Models Are Zero-Shot Learners
28
MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers
29
Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies
30
Program Synthesis with Large Language Models
31
On the Opportunities and Risks of Foundation Models
32
Deduplicating Training Data Makes Language Models Better
33
Evaluating Large Language Models Trained on Code
34
Break-It-Fix-It: Unsupervised Learning for Program Repair
35
Measuring and Improving BERT’s Mathematical Abilities by Predicting the Order of Reasoning
36
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
37
GSPMD: General and Scalable Parallelization for ML Computation Graphs
38
Societal Biases in Language Generation: Progress and Challenges
39
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
40
Carbon Emissions and Large Neural Network Training
41
RoFormer: Enhanced Transformer with Rotary Position Embedding
42
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
43
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
44
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
45
Are NLP Models really able to Solve Simple Math Word Problems?
46
Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs
47
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
48
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
49
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
50
Re-imagining Algorithmic Fairness in India and Beyond
51
ZeRO-Offload: Democratizing Billion-Scale Model Training
52
Persistent Anti-Muslim Bias in Large Language Models
53
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
54
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
55
HateCheck: Functional Tests for Hate Speech Detection Models
56
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
57
Beyond English-Centric Multilingual Machine Translation
58
Complete Multilingual Neural Machine Translation
59
WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization
60
Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information
61
Rethinking Attention with Performers
62
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
63
Measuring Massive Multitask Language Understanding
64
Big Bird: Transformers for Longer Sequences
65
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters
66
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
67
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
68
A domain-specific supercomputer for training deep neural networks
69
Memory-Efficient Pipeline-Parallel DNN Training
70
Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
71
Language Models are Few-Shot Learners
72
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
73
Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
74
Social Biases in NLP Models as Barriers for Persons with Disabilities
75
MLSUM: The Multilingual Summarization Corpus
76
Shortcut learning in deep neural networks
77
Efficient Content-Based Sparse Attention with Routing Transformers
78
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
79
Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
80
GLU Variants Improve Transformer
81
REALM: Retrieval-Augmented Language Model Pre-Training
82
Towards a Human-like Open-Domain Chatbot
83
Scaling Laws for Neural Language Models
84
Reformer: The Efficient Transformer
85
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
86
Measurement and Fairness
87
PIQA: Reasoning about Physical Commonsense in Natural Language
88
Microsoft Research Asia’s Systems for WMT19
89
Fast Transformer Decoding: One Write-Head is All You Need
90
Semantic Noise Matters for Neural Natural Language Generation
91
Adversarial NLI: A New Benchmark for Natural Language Understanding
92
Toward Gender-Inclusive Coreference Resolution
93
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
94
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
95
Neural Generation for Czech: Data and Baselines
96
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
97
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
98
On Measuring and Mitigating Biased Inferences of Word Embeddings
99
Natural Questions: A Benchmark for Question Answering Research
100
Quantifying Social Biases in Contextual Word Representations
101
Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data
103
The Risk of Racial Bias in Hate Speech Detection
104
Tagged Back-Translation
105
SPoC: Search-based Pseudocode to Code
106
Evaluating Gender Bias in Machine Translation
107
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
108
MASS: Masked Sequence to Sequence Pre-training for Language Generation
109
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
110
HellaSwag: Can a Machine Really Finish Your Sentence?
111
Generating Long Sequences with Sparse Transformers
112
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
113
The State of Sparsity in Deep Neural Networks
114
Measuring and Mitigating Unintended Bias in Text Classification
115
The adverse effects of code duplication in machine learning models of code
116
An Empirical Model of Large-Batch Training
117
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
118
Mesh-TensorFlow: Deep Learning for Supercomputers
119
Model Cards for Model Reporting
120
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
121
CoQA: A Conversational Question Answering Challenge
122
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
123
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
124
Know What You Don’t Know: Unanswerable Questions for SQuAD
125
Gender Bias in Coreference Resolution
126
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
127
Datasheets for datasets
128
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
129
Don't Decay the Learning Rate, Increase the Batch Size
130
DéjàVu: a map of code duplicates on GitHub
131
A Survey of Machine Learning for Big Code and Naturalness
132
Creating Training Corpora for NLG Micro-Planners
133
The E2E Dataset: New Challenges For End-to-End Generation
134
Attention is All you Need
135
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
136
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
137
RACE: Large-scale ReAding Comprehension Dataset From Examinations
138
DeepFix: Fixing Common C Language Errors by Deep Learning
139
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
140
The LAMBADA dataset: Word prediction requiring a broad discourse context
141
MAWPS: A Math Word Problem Repository
142
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
143
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network
144
An analysis of patch plausibility and correctness for generate-and-validate patch generation systems
145
Adam: A Method for Stochastic Optimization
146
Semantic Parsing on Freebase from Question-Answer Pairs
147
Understanding the exploding gradient problem
148
The Winograd Schema Challenge
149
ROUGE: A Package for Automatic Evaluation of Summaries
150
Sustainability at Google: Carbon neutral since 2007, carbon free by 2030. 2022
151
Structure-to-Text Generation with Self-Training, Acceptability Classifiers and Context-Conditioning for the GEM Shared Task
152
What do Bias Measures Measure?
153
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets
154
An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions
155
GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax
156
Jurassic-1: Technical details and evaluation
157
The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020)
158
PyTorch Distributed: Experiences on Accelerating Data Parallel Training
160
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
161
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
162
Multi-Agent Dual Learning
163
The NiuTrans Machine Translation Systems for WMT19
164
JAX: Composable transformations of Python+NumPy programs, 2018
165
Improving Language Understanding by Generative Pre-Training
166
Understanding back-translation at scale
167
NLTK: The Natural Language Toolkit
168
The Invisible Whiteness of Being: Whiteness, White Supremacy, White Privilege, and Racism.
169
Unsupervised Translation of Programming Languages
173
Google Cloud infoType detector