3260 papers • 126 benchmarks • 313 datasets
Linguistic Acceptability is the task of determining whether a sentence is grammatical or ungrammatical. Image Source: Warstadt et al.
These leaderboards are used to track progress in Linguistic Acceptability.
Use these libraries to find Linguistic Acceptability models and implementations.
No subtasks available.
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
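As a concrete illustration of that fine-tuning recipe, here is a minimal sketch of adding a single classification head to a pre-trained BERT checkpoint for binary acceptability judgments, assuming the Hugging Face transformers and PyTorch libraries; the checkpoint name, example sentences, labels, and learning rate are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumes Hugging Face `transformers` + PyTorch): fine-tune a
# pre-trained BERT checkpoint for binary acceptability classification by
# adding a single classification head. Checkpoint name, example sentences,
# labels and learning rate are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # acceptable vs. unacceptable
)

batch = tokenizer(
    ["The cat sat on the mat.", "The cat sat mat on the."],
    padding=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])  # 1 = acceptable, 0 = unacceptable

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # cross-entropy loss over the new head
outputs.loss.backward()
optimizer.step()
```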
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
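One of those parameter-reduction techniques is factorized embedding parameterization: the vocabulary is embedded into a small space of size E and then projected up to the hidden size H, so embedding parameters scale with V·E + E·H rather than V·H. The sketch below illustrates the idea; the sizes and PyTorch framing are illustrative assumptions.

```python
# Minimal sketch of ALBERT-style factorized embedding parameterization
# (illustrative sizes, PyTorch): the vocabulary is embedded into a small
# space of size E and projected up to the hidden size H, so parameters
# scale with V*E + E*H instead of V*H.
import torch.nn as nn

V, E, H = 30000, 128, 768            # vocab, embedding, hidden sizes (assumed)
factorized = nn.Sequential(
    nn.Embedding(V, E),              # V * E parameters
    nn.Linear(E, H, bias=False),     # E * H parameters
)
full = nn.Embedding(V, H)            # V * H parameters, for comparison

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(factorized), "vs", n_params(full))  # ~3.9M vs ~23.0M
```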
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
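The sketch below shows one plausible form of such a triple loss, combining masked-language-modeling cross-entropy, temperature-scaled distillation against a teacher, and a cosine loss on hidden states; the tensor shapes, temperature, and loss weights are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch of a triple training loss in the spirit described above:
# masked-LM cross-entropy + temperature-scaled distillation + cosine loss on
# hidden states. Shapes, temperature and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits,      # (B, T, vocab)
                student_hidden, teacher_hidden,      # (B, T, D)
                mlm_labels,                          # (B, T), -100 = not masked
                temperature=2.0, w_mlm=1.0, w_kd=1.0, w_cos=1.0):
    # 1) standard masked-language-modeling cross-entropy
    mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        mlm_labels.view(-1), ignore_index=-100,
    )
    # 2) distillation: KL divergence between softened output distributions
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # 3) cosine loss pulling student hidden states toward the teacher's
    target = student_hidden.new_ones(student_hidden.size(0) * student_hidden.size(1))
    cos = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target,
    )
    return w_mlm * mlm + w_kd * kd + w_cos * cos
```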
The FNet model is significantly faster: when compared to the “efficient Transformers” on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs).
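The source of that speed is replacing self-attention with an unparameterized Fourier transform for token mixing. Below is a minimal PyTorch sketch of such a mixing block; layer sizes and the residual/normalization arrangement are illustrative assumptions, not FNet's exact configuration.

```python
# Minimal sketch of a Fourier mixing block (illustrative sizes, PyTorch):
# token mixing uses an unparameterized 2-D FFT (real part kept) in place of
# self-attention, followed by a standard feed-forward sub-layer.
import torch
import torch.nn as nn

class FourierMixingBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                           # x: (batch, seq, d_model)
        # FFT over sequence and hidden dims; no learned mixing weights
        mixed = torch.fft.fft2(x, dim=(-2, -1)).real
        x = self.norm1(x + mixed)
        return self.norm2(x + self.ff(x))

x = torch.randn(2, 16, 256)
print(FourierMixingBlock()(x).shape)                # torch.Size([2, 16, 256])
```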
It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.
MT-DNN is a Multi-Task Deep Neural Network for learning representations across multiple natural language understanding (NLU) tasks; it allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
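The sketch below illustrates the general shared-encoder-plus-task-heads pattern behind this kind of multi-task model; the encoder, pooling, task names, and head shapes are illustrative assumptions, not the MT-DNN architecture itself.

```python
# Minimal sketch of the shared-encoder + task-specific-heads pattern
# (illustrative sizes and tasks, PyTorch): one encoder is shared across NLU
# tasks, with a small output head per task.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, task_outputs, d_model=256):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # one lightweight head per task on top of the shared representation
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, n_out) for name, n_out in task_outputs.items()}
        )

    def forward(self, x, task):                     # x: (batch, seq, d_model)
        pooled = self.encoder(x).mean(dim=1)        # shared encoder + mean pooling
        return self.heads[task](pooled)             # task-specific head

model = MultiTaskModel({"acceptability": 2, "nli": 3, "similarity": 1})
logits = model(torch.randn(4, 16, 256), task="acceptability")  # shape (4, 2)
```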
Data2vec is a framework that uses the same learning method for speech, NLP, and computer vision: it predicts latent representations of the full input data based on a masked view of the input, in a self-distillation setup using a standard Transformer architecture.
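A minimal sketch of that self-distillation setup is below, with a toy linear encoder standing in for the Transformer: the student regresses the teacher's representations of the full input from a masked view, and the teacher is an exponential moving average of the student. The masking rate, EMA decay, and MSE objective are illustrative assumptions.

```python
# Minimal sketch of a masked-prediction self-distillation step (illustrative:
# a toy linear "encoder", 15% masking, MSE loss, EMA decay 0.999): the student
# sees a masked view and regresses the teacher's representations of the full
# input; the teacher is an exponential moving average of the student.
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.999):
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

student = torch.nn.Linear(64, 64)            # stand-in for a Transformer encoder
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(8, 32, 64)                   # full input: (batch, time, dim)
mask = torch.rand(8, 32, 1) < 0.15           # which time steps are masked
masked_x = x.masked_fill(mask, 0.0)          # student only sees the masked view

with torch.no_grad():
    target = teacher(x)                      # teacher sees the full input
pred = student(masked_x)

loss = F.mse_loss(pred[mask.expand_as(pred)], target[mask.expand_as(target)])
loss.backward()
ema_update(teacher, student)                 # teacher tracks the student
```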
A new model architecture, DeBERTa (Decoding-enhanced BERT with disentangled attention), is proposed that improves on the BERT and RoBERTa models using two novel techniques, a disentangled attention mechanism and an enhanced mask decoder, which significantly improve the efficiency of model pre-training and the performance of downstream tasks.
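The sketch below illustrates the disentangled-attention idea in simplified form: each token has separate content and relative-position representations, and the attention score sums content-to-content, content-to-position, and position-to-content terms. The single-head formulation, projections, and scaling are simplified illustrative assumptions, not DeBERTa's exact implementation.

```python
# Simplified single-head sketch of disentangled attention (illustrative
# projections and scaling, PyTorch): content and relative-position vectors
# are kept separate, and attention scores sum content-to-content,
# content-to-position and position-to-content terms.
import torch
import torch.nn as nn

class DisentangledAttention(nn.Module):
    def __init__(self, d=64, max_rel=8):
        super().__init__()
        self.q_c, self.k_c, self.v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)
        self.q_r, self.k_r = nn.Linear(d, d), nn.Linear(d, d)
        self.rel_emb = nn.Embedding(2 * max_rel + 1, d)   # relative-position vectors
        self.max_rel = max_rel
        self.scale = d ** -0.5

    def forward(self, x):                                 # x: (batch, seq, d)
        b, n, _ = x.shape
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel, self.max_rel) + self.max_rel
        r = self.rel_emb(rel)                             # (n, n, d)

        qc, kc = self.q_c(x), self.k_c(x)
        c2c = qc @ kc.transpose(-2, -1)                   # content -> content
        c2p = torch.einsum("bid,ijd->bij", qc, self.k_r(r))  # content -> position
        p2c = torch.einsum("bjd,ijd->bij", kc, self.q_r(r))  # position -> content

        attn = ((c2c + c2p + p2c) * self.scale).softmax(dim=-1)
        return attn @ self.v(x)

x = torch.randn(2, 10, 64)
print(DisentangledAttention()(x).shape)                   # torch.Size([2, 10, 64])
```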