1. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2. What Language Model to Train if You Have One Million GPU Hours?
3. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
4. Rationale-Augmented Ensembles in Language Models
5. Emergent Abilities of Large Language Models
6. Memory-Based Model Editing at Scale
7. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
8. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
9. DeepStruct: Pretraining of Language Models for Structure Prediction
10. OPT: Open Pre-trained Transformer Language Models
11. GPT-NeoX-20B: An Open-Source Autoregressive Language Model
12. PaLM: Scaling Language Modeling with Pathways
13. Training Compute-Optimal Large Language Models
14. Compression of Generative Pre-trained Language Models via Quantization
15. DeepNet: Scaling Transformers to 1,000 Layers
16. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
17. Chain of Thought Prompting Elicits Reasoning in Large Language Models
18. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
19. LaMDA: Language Models for Dialog Applications
20. Counterfactual Memorization in Neural Language Models
21. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
22. Efficient Large Scale Language Modeling with Mixtures of Experts
23. Ethical and social risks of harm from Language Models
24. Scaling Language Models: Methods, Analysis & Insights from Training Gopher
25. Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs
26. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
27. NormFormer: Improved Transformer Pretraining with Extra Normalization
28. Multitask Prompted Training Enables Zero-Shot Task Generalization
29. Learning Compact Metrics for MT
30. Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
31. 8-bit Optimizers via Block-wise Quantization
32. TruthfulQA: Measuring How Models Mimic Human Falsehoods
33. Finetuned Language Models Are Zero-Shot Learners
34. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
35. Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies
36. On the Opportunities and Risks of Foundation Models
37. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
38. FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
39. Towards Understanding and Mitigating Social Biases in Language Models
40. CogView: Mastering Text-to-Image Generation via Transformers
41. Societal Biases in Language Generation: Progress and Challenges
42. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
43. Carbon Emissions and Large Neural Network Training
44. RoFormer: Enhanced Transformer with Rotary Position Embedding
45. The Power of Scale for Parameter-Efficient Prompt Tuning
46. Editing Factual Knowledge in Language Models
47. An Empirical Study of Training Self-Supervised Vision Transformers
48. Detecting Hate Speech with GPT-3
50. GLM: General Language Model Pretraining with Autoregressive Blank Infilling
51. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
52. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
53. Zero-Shot Text-to-Image Generation
54. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
55. Exploring Text-transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English
56. Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
57. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
58. RealFormer: Transformer Likes Residual Attention
59. Modifying Memories in Transformer Models
60. Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
61. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
62. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
63. Measuring Massive Multitask Language Understanding
64. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters
65. Memory-Efficient Pipeline-Parallel DNN Training
66. Self-Supervised Learning: Generative or Contrastive
67. Language Models are Few-Shot Learners
68. MLSUM: The Multilingual Summarization Corpus
69. StereoSet: Measuring stereotypical bias in pretrained language models
70. CLUE: A Chinese Language Understanding Evaluation Benchmark
71. Pre-trained models for natural language processing: A survey
72. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
73. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
74. On Layer Normalization in the Transformer Architecture
75. How Much Knowledge Can You Pack into the Parameters of a Language Model?
76. Measurement and Fairness
77. PIQA: Reasoning about Physical Commonsense in Natural Language
78. Semantic Noise Matters for Neural Natural Language Generation
79. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
80. Quantifying the Carbon Emissions of Machine Learning
81. Q8BERT: Quantized 8Bit BERT
82. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
83. Reducing Transformer Depth on Demand with Structured Dropout
84. TinyBERT: Distilling BERT for Natural Language Understanding
85. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
86. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
87. Entity, Relation, and Event Extraction with Contextualized Span Representations
88. “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal Commonsense Understanding
89. Natural Questions: A Benchmark for Question Answering Research
91. Energy and Policy Considerations for Deep Learning in NLP
92. GLTR: Statistical Detection and Visualization of Generated Text
93. Unified Language Model Pre-training for Natural Language Understanding and Generation
94. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
95. Are Sixteen Heads Really Better than One?
96. Parameter-Efficient Transfer Learning for NLP
97. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
98. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
99. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
100. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
101. Gender Bias in Coreference Resolution
102. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
103. Generating Wikipedia by Summarizing Long Sequences
104. Decoupled Weight Decay Regularization
105. Mixed Precision Training
106. Position-aware Attention and Supervised Data Improve Slot Filling
107. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly
108. Attention is All you Need
109. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
111. Gaussian Error Linear Units (GELUs)
112. The LAMBADA dataset: Word prediction requiring a broad discourse context
113. Semantic Parsing on Freebase from Question-Answer Pairs
114. Towards Robust Linguistic Analysis using OntoNotes
115. The Winograd Schema Challenge
116. Modeling Relations and Their Mentions without Labeled Text
117. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling
118. ROUGE: A Package for Automatic Evaluation of Summaries
119. A Linear Programming Formulation for Global Inference in Natural Language Tasks
120. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
121. The GENIA corpus: an annotated research abstract corpus in molecular biology domain
122. A bridging model for parallel computation
123. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
124. Accelerated inference for large transformer models using NVIDIA Triton Inference Server
126. Jurassic-1: Technical details and evaluation
127. Prefix-Tuning: Optimizing Continuous Prompts for Generation
128. WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models
129. Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets
130. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax
133. Ethos: an online hate speech detection dataset
134. The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020)
137. Language Models are Unsupervised Multitask Learners
139. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
141. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
142. Improving language understanding with unsupervised learning
143. From TreeBank to PropBank