1. RealTime QA: What's the Answer Right Now?
2. Emergent Abilities of Large Language Models
3. Can Foundation Models Help Us Achieve Perfect Secrecy?
4. Is a Question Decomposition Unit All We Need?
5. Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
6. Can Foundation Models Wrangle Your Data?
7. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
8. Language Models in the Loop: Incorporating Prompting into Weak Supervision
9. OPT: Open Pre-trained Transformer Language Models
10. PaLM: Scaling Language Modeling with Pathways
11. Can language models learn from explanations in context?
12. Training Compute-Optimal Large Language Models
13. Self-Consistency Improves Chain of Thought Reasoning in Language Models
14. Training language models to follow instructions with human feedback
15. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
16. Chain of Thought Prompting Elicits Reasoning in Large Language Models
17. WebGPT: Browser-assisted question-answering with human feedback
18. Ethical and social risks of harm from Language Models
19. Training Verifiers to Solve Math Word Problems
20. Multitask Prompted Training Enables Zero-Shot Task Generalization
21. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
22. Reframing Instructional Prompts to GPTk’s Language
23. On the Opportunities and Risks of Foundation Models
24. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right
25. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow
26. Calibrate Before Use: Improving Few-Shot Performance of Language Models
27. What Makes Good In-Context Examples for GPT-3?
28. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
29. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
30. Language Models are Few-Shot Learners
31. Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
32. Scaling Laws for Neural Language Models
33. How Can We Know What Language Models Know?
34. Adversarial NLI: A New Benchmark for Natural Language Understanding
35. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
36. Natural Questions: A Benchmark for Question Answering Research
37. The CommitmentBank: Investigating projection in naturally occurring discourse
38. Snorkel: rapid training data creation with weak supervision
39. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
40. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
41. Learning Dependency Structures for Weak Supervision Models
42. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
43. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
44. Training Complex Models with Multi-Task Weak Supervision
45. WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
46. Know What You Don’t Know: Unanswerable Questions for SQuAD
47. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
48. Snorkel: Rapid Training Data Creation with Weak Supervision
49. LSDSem 2017 Shared Task: The Story Cloze Test
50. Data Programming: Creating Large Training Sets, Quickly
51. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering
52. Character-level Convolutional Networks for Text Classification
53. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
54. Semantic Parsing on Freebase from Question-Answer Pairs
55. The Winograd Schema Challenge
56. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning
57. Robust principal component analysis?
58. ROUGE: A Package for Automatic Evaluation of Summaries
59. Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
60. BigScience Large Open-science Open-access Multilingual Language Model
61. STaR: Self-Taught Reasoner Bootstrapping Reasoning with Reasoning. arXiv:2203.14465v2, 2022
62. The EleutherAI models are trained on The Pile corpus (Black et al.).
63. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model, 2021
64. All tasks are scored using matching accuracy, except for DROP and RealTimeQA, which use text F1; WebQ and NQ, which use span-overlap accuracy; and MultiRC, which uses F1a (a minimal text-F1 sketch follows this list).
65. MultiRC description: multi-sentence reading comprehension.
66. The algorithm uses P(x) and D to first learn the dependency structure Ĝ among the prompts using the cited structure-learning approach.
67. ε_i is the error of prompt p_i on a labeled training set of 1000 examples, and η is a temperature hyperparameter, for which we perform a sweep over [0.25, …] (a weighted-vote sketch using ε_i and η follows this list).
68. Context (truncated): …Un in the 2014 film "The Interview", Minnesota governor Danny Chung in "Veep"…
69. Context (truncated): …born March 1, 1941) is an American poet. He served as Poet Laureate of the United States from…
70. Context: The Beatles were an English rock band…
71. Context (truncated): …born 7 April 1947 in The Bronx, New York) was an original member of the American singing girl group the Chiffons…
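
Item 64 above refers to a text F1 score for DROP and RealTimeQA. Below is a minimal sketch of a token-level (SQuAD-style) F1, assuming whitespace tokenization and lowercasing only; the function name text_f1 and this normalization are illustrative assumptions, not the exact scoring code behind the reported numbers.

from collections import Counter


def text_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer string and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers count as a match; otherwise no overlap is possible.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(text_f1("an English rock band", "The Beatles were an English rock band"))  # ~0.73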
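
Item 67 above names per-prompt errors ε_i and a temperature η but does not spell out how they combine. The sketch below assumes they enter a softmax-style weight exp(-ε_i / η) in a weighted majority vote over prompt predictions; the name weighted_vote and this weighting form are assumptions for illustration, and the full aggregator referenced in item 66 additionally uses the learned dependency structure Ĝ rather than treating prompts as independent.

import math
from collections import defaultdict


def weighted_vote(votes, errors, eta=0.25):
    """Pick the label with the largest total weight across prompt predictions."""
    scores = defaultdict(float)
    for vote, err in zip(votes, errors):
        scores[vote] += math.exp(-err / eta)  # lower-error prompts count more
    return max(scores, key=scores.get)


# Three prompts vote on one example; their errors were estimated on a labeled split.
print(weighted_vote(["yes", "no", "yes"], errors=[0.10, 0.45, 0.30]))  # -> "yes"

Under this weighting, a small η concentrates mass on the lowest-error prompts, while a large η approaches an unweighted majority vote, which is why item 67 describes sweeping η.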