1
Large Language Models are few(1)-shot Table Reasoners
2
Language Models are Multilingual Chain-of-Thought Reasoners
3
Binding Language Models in Symbolic Languages
4
Compositional Semantic Parsing with Large Language Models
5
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
6
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
7
Language Models (Mostly) Know What They Know
8
Emergent Abilities of Large Language Models
9
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
10
Large Language Models are Zero-Shot Reasoners
11
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models
12
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
13
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
14
AmbiPun: Generating Humorous Puns with Ambiguous Context
15
PaLM: Scaling Language Modeling with Pathways
16
Can language models learn from explanations in context?
17
Training Compute-Optimal Large Language Models
18
Self-Consistency Improves Chain of Thought Reasoning in Language Models
19
Training language models to follow instructions with human feedback
20
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
21
Predictability and Surprise in Large Generative Models
22
Competition-level code generation with AlphaCode
23
Chain of Thought Prompting Elicits Reasoning in Large Language Models
24
Reframing Human-AI Collaboration for Generating Free-Text Explanations
25
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
26
Show Your Work: Scratchpads for Intermediate Computation with Language Models
27
Few-Shot Self-Rationalization with Natural Language Prompts
28
An Explanation of In-context Learning as Implicit Bayesian Inference
29
MetaICL: Learning to Learn In Context
30
Multitask Prompted Training Enables Zero-Shot Task Generalization
31
Language Models are Few-shot Multilingual Learners
32
Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
33
A Recipe for Arbitrary Text Style Transfer with Large Language Models
34
Finetuned Language Models Are Zero-Shot Learners
35
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
36
Program Synthesis with Large Language Models
37
Evaluating Large Language Models Trained on Code
38
True Few-Shot Learning with Language Models
39
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
40
The Power of Scale for Parameter-Efficient Prompt Tuning
41
Calibrate Before Use: Improving Few-Shot Performance of Language Models
42
When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data
43
Scaling Laws for Transfer
44
Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
45
Language Models are Few-Shot Learners
46
Scaling Laws for Neural Language Models
47
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
48
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
49
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
50
On the Advance of Making Language Models Better Reasoners
51
Mapping Language Models to Grounded Conceptual Spaces
52
On the Machine Learning of Ethical Judgments from Natural Language
53
Natural Language Inference with a Human Touch: Using Human Explanations to Guide Model Attention
54
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
55
Amongst all the options, the only movie similar to these ones seems to be The Princess Bride (1987)
56
crowdworkers) or research with human participants?
57
Did you report the full text of instructions given to participants, including e.g., screenshots, disclaimers of any risks to participants or annotators
58
for preprocessing, for normalization, or for evaluation), did you report the implementation, model, and parameter settings used (e.g., NLTK, Spacy, ROUGE, etc
59
error bars around results, summary statistics from sets of experiments), and is it transparent whether you are reporting the max, mean, etc
60
crowdsourcing platform, students) and paid participants, and discuss if such payment is adequate given the participants' demographic
61
Kristian tells the truth
63
Was the data collection protocol approved (or determined exempt) by an ethics review board? No response
64
Did you report the basic demographic and geographic characteristics of the annotator population that is the source of the data? No response
65
Fidel says Maybelle lies
66
CoT Prompt for Snarks Determine which of two sentences is sarcastic
67
tasks: cause and effect, word unscrambling, movie dialog same or different, moral permissibility, fake text, discourse marker prediction, checkmate in one, mnist ascii, ascii word