What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Published in

Conference on Empirical Methods in Natural Lang...(2021)

External Links:

Generate Graph DownloadPDF

TL;DR

The possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface is discussed and the performance benefits of prompt-based learning are shown and how it can be integrated into the prompt engineering pipeline.

Abstract

GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performances on various downstream tasks in Korean. Also, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. Then we discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Published in

Conference on Empirical Methods in Natural Lang...(2021)

External Links:

Generate Graph DownloadPDF

TL;DR

Abstract

Authors

Kang Min Yoo

6 papers

Minsuk Chang

3 papers

Jung-Woo Ha

11 papers

Sang-Woo Lee

6 papers

Donghyun Kwak

2 papers

Dong-hyun Ham

2 papers

Soyoung Kang

2 papers

W. Park

2 papers

Sunghyun Park

3 papers

Boseop Kim

1 papers

Hyoungseok Kim

1 papers

Gichang Lee

1 papers

D. Jeon

1 papers

Sungju Kim

1 papers

Seonhoon Kim

1 papers

D. Seo

1 papers

Heungsub Lee

1 papers

Minyoung Jeong

1 papers

Sungjae Lee

1 papers

Minsub Kim

1 papers

SukHyun Ko

1 papers

Seokhun Kim

1 papers

Taeyong Park

1 papers

Nahyeon Ryu

1 papers

Soobin Suh

1 papers

Sookyo In

1 papers

Jinseong Park

1 papers

Kyungduk Kim

1 papers

Hiun Kim

1 papers

Jisu Jeong

1 papers

Y. Yeo

1 papers

Dongju Park

1 papers

Min Young Lee

1 papers

Inho Kang

1 papers

Nako Sung

1 papers

References56 items

KLUE: Korean Language Understanding Evaluation

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Carbon Emissions and Large Neural Network Training

The Power of Scale for Parameter-Efficient Prompt Tuning

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Generating Datasets with Pretrained Language Models

Retrieval Augmentation Reduces Hallucination in Conversation

Rainbow Memory: Continual Learning with a Memory of Diverse Samples

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

A Survey on Bias in Deep NLP

Calibrate Before Use: Improving Few-Shot Performance of Language Models

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models

What Makes Good In-Context Examples for GPT-3?

Persistent Anti-Muslim Bias in Large Language Models

WARP: Word-level Adversarial ReProgramming

Do Neural Language Models Overcome Reporting Bias?

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks

Controlling Style in Generated Dialogue

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Language Models are Few-Shot Learners

Recipes for Building an Open-Domain Chatbot

PhoBERT: Pre-trained language models for Vietnamese

Towards a Human-like Open-Domain Chatbot

Scaling Laws for Neural Language Models

RobBERT: a Dutch RoBERTa-based Language Model

CamemBERT: a Tasty French Language Model

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension

CTRL: A Conditional Transformer Language Model for Controllable Generation

Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

Decoupled Weight Decay Regularization

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Prefix-Tuning: Optimizing Continuous Prompts for Generation

PADA: A Prompt-based Autoregressive Approach for Adaptation to Unseen Domains

McMillanMajor, and Shmargaret Shmitchell

Keywords: slippers, indoor slippers, ofﬁce slippers, high heels, spring new arrival shoes, spring shoes, women’s slippers, female slippers, women’s high heels, female high heels

Knowledge distillation for lightweight roberta of korean

AutoPrompt: Eliciting knowledge from language models with automatically generated prompts

Korean, English, Japanese, and other languages, respectively. A.2 Data Cleaning In a similar way to the work

Keywords: imported bowl, vintage bowl, enamel pot, spoon and chopsticks set, strong cup, retro pot 날

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Language Models are Unsupervised Multitask Learners

Chatbot personalities matters

Title: Spring women’s ofﬁce slippers high heels SALE

We introduce HyperCLOVA, a large-scale Korean in-context learning-based LM with nearly 100B parameters, by constructing a large Korean-centric corpus of 560B tokens

TL;DR

Abstract

TL;DR

Abstract

Authors

References56 items

KLUE: Korean Language Understanding Evaluation

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Carbon Emissions and Large Neural Network Training

The Power of Scale for Parameter-Efficient Prompt Tuning

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Generating Datasets with Pretrained Language Models

Retrieval Augmentation Reduces Hallucination in Conversation

Rainbow Memory: Continual Learning with a Memory of Diverse Samples

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

A Survey on Bias in Deep NLP

Calibrate Before Use: Improving Few-Shot Performance of Language Models

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models

What Makes Good In-Context Examples for GPT-3?

Persistent Anti-Muslim Bias in Large Language Models

WARP: Word-level Adversarial ReProgramming

Do Neural Language Models Overcome Reporting Bias?

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks

Controlling Style in Generated Dialogue

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Language Models are Few-Shot Learners

Recipes for Building an Open-Domain Chatbot

PhoBERT: Pre-trained language models for Vietnamese

Towards a Human-like Open-Domain Chatbot

Scaling Laws for Neural Language Models

RobBERT: a Dutch RoBERTa-based Language Model

CamemBERT: a Tasty French Language Model

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension

CTRL: A Conditional Transformer Language Model for Controllable Generation

Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

Decoupled Weight Decay Regularization

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Prefix-Tuning: Optimizing Continuous Prompts for Generation

PADA: A Prompt-based Autoregressive Approach for Adaptation to Unseen Domains

McMillanMajor, and Shmargaret Shmitchell

Keywords: slippers, indoor slippers, ofﬁce slippers, high heels, spring new arrival shoes, spring shoes, women’s slippers, female slippers, women’s high heels, female high heels

Knowledge distillation for lightweight roberta of korean

AutoPrompt: Eliciting knowledge from language models with automatically generated prompts

Korean, English, Japanese, and other languages, respectively. A.2 Data Cleaning In a similar way to the work

Keywords: imported bowl, vintage bowl, enamel pot, spoon and chopsticks set, strong cup, retro pot 날

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Language Models are Unsupervised Multitask Learners

Chatbot personalities matters

Title: Spring women’s ofﬁce slippers high heels SALE

We introduce HyperCLOVA, a large-scale Korean in-context learning-based LM with nearly 100B parameters, by constructing a large Korean-centric corpus of 560B tokens

Title: White Day Couple Jewelry Sale

Title: Kitchenware overseas direct purchase discount exhibition

We discover the effect of language-speciﬁc tokenization on large-scale in-context LMs for training corpus of non-English languages