3260 papers • 126 benchmarks • 313 datasets
The goal of Paraphrase Identification is to determine whether a pair of sentences have the same meaning (a minimal sketch of the task follows the overview below).
Source: Adversarial Examples with Difficult Common Words for Paraphrase Identification
Image source: On Paraphrase Identification Corpora
These leaderboards are used to track progress in Paraphrase Identification
Use these libraries to find Paraphrase Identification models and implementations
No subtasks available.
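The sketch referenced above is a minimal illustration of the task interface using a BERT-style cross-encoder. It assumes the HuggingFace Transformers library; "bert-base-uncased" is only a pretrained backbone, so the pair-classification head below is randomly initialized and would need fine-tuning on a paraphrase corpus (e.g. MRPC or QQP) before its scores are meaningful.

```python
# Minimal sketch: scoring a sentence pair with a BERT-style cross-encoder.
# Assumes the HuggingFace Transformers library; "bert-base-uncased" is only a
# pretrained backbone, so the classification head is randomly initialized and
# must first be fine-tuned on a paraphrase corpus (e.g. MRPC or QQP).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # paraphrase vs. not paraphrase
)

s1 = "How do I learn Python quickly?"
s2 = "What is the fastest way to pick up Python?"

# Encode the two sentences jointly so the model sees them as a single pair.
inputs = tokenizer(s1, s2, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # class probabilities; label order depends on the fine-tuning data
```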
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
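A minimal sketch of the bidirectional pre-training signal described above, assuming the HuggingFace Transformers fill-mask pipeline and the public "bert-base-uncased" checkpoint: the model predicts a masked token from context on both its left and its right.

```python
# Sketch of BERT's bidirectional pre-training signal: fill in a masked token
# using context on both sides. Assumes HuggingFace Transformers.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The two sentences convey the [MASK] meaning.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```

Fine-tuning for paraphrase identification then adds only a single classification layer on top of this pretrained encoder, as in the task sketch above.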
XLNet is a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, overcoming the limitations of BERT thanks to its autoregressive formulation.
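A simplified sketch of the core idea: sample a factorization order and restrict attention so that each token only sees tokens that come earlier in that order. XLNet's two-stream attention and segment recurrence are omitted, so this is an illustration of the permutation objective rather than the paper's full mechanism.

```python
# Simplified sketch of permutation language modeling: sample a factorization
# order and build an attention mask so each token may only attend to tokens
# that precede it in that order. (Two-stream attention and other XLNet
# details are omitted.)
import torch

seq_len = 6
order = torch.randperm(seq_len)             # a random factorization order
rank = torch.empty(seq_len, dtype=torch.long)
rank[order] = torch.arange(seq_len)         # rank[i] = position of token i in the order

# attend[i, j] is True if token i may attend to token j.
attend = rank.unsqueeze(1) > rank.unsqueeze(0)
print(order)
print(attend.int())
```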
The FNet model is significantly faster: when compared to the “efficient Transformers” on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs).
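A minimal sketch of FNet's token-mixing sublayer as described in the paper: an unparameterized 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part, in place of self-attention (the surrounding layer norms and feed-forward blocks are omitted).

```python
# Sketch of FNet's mixing sublayer: a 2D FFT over the sequence and hidden
# dimensions, keeping only the real part, in place of self-attention.
import torch

def fnet_mixing(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, hidden)
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 8, 16)
print(fnet_mixing(x).shape)  # same shape as the input, no learned mixing weights
```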
This work proposes a bilateral multi-perspective matching (BiMPM) model under the "matching-aggregation" framework that achieves state-of-the-art performance on all evaluated tasks.
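A minimal sketch of one building block of BiMPM, multi-perspective cosine matching: each of the l perspectives element-wise reweights both vectors with a learned weight vector before taking cosine similarity. The full model applies this in both matching directions with several matching strategies and a BiLSTM aggregator, all omitted here.

```python
# Sketch of multi-perspective cosine matching, one building block of BiMPM:
# each perspective reweights both vectors with a learned weight vector
# before taking cosine similarity.
import torch
import torch.nn.functional as F

def multi_perspective_match(v1, v2, W):
    # v1, v2: (hidden,)   W: (l, hidden) learned perspective weights
    return F.cosine_similarity(W * v1, W * v2, dim=-1)  # one value per perspective

hidden, l = 16, 4
W = torch.randn(l, hidden)
m = multi_perspective_match(torch.randn(hidden), torch.randn(hidden), W)
print(m.shape)  # (4,)
```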
Data2vec is a framework that uses the same learning method for speech, NLP, and computer vision: it predicts latent representations of the full input data from a masked view of the input in a self-distillation setup using a standard Transformer architecture.
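A simplified sketch of the self-distillation loop described above: a teacher that is an exponential moving average of the student produces latent targets from the full input, and the student regresses those targets at masked positions. The real data2vec averages several top teacher layers and normalizes the targets; those details are omitted, and the tiny Transformer layer here is only a stand-in.

```python
# Simplified sketch of data2vec-style self-distillation: an EMA teacher
# produces latent targets from the full input, the student predicts them
# from a masked view. (Layer averaging and target normalization omitted.)
import copy
import torch
import torch.nn as nn

student = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(teacher, student, tau=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.data.mul_(tau).add_(ps.data, alpha=1 - tau)

x = torch.randn(2, 10, 32)                  # embedded input
mask = torch.rand(2, 10) < 0.15             # positions to mask
x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)

with torch.no_grad():
    targets = teacher(x)                    # latent targets from the full input
preds = student(x_masked)                   # student only sees the masked view
loss = nn.functional.mse_loss(preds[mask], targets[mask])
loss.backward()
ema_update(teacher, student)
```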
This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences and proposes three attention schemes that integrate mutual influence between sentences into CNNs; thus, the representation of each sentence takes into consideration its counterpart.
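A minimal sketch of the attention matrix that couples the two sentences' feature maps, using a 1/(1 + Euclidean distance) match score (one scoring choice associated with ABCNN); the convolution and pooling layers that consume this matrix are omitted.

```python
# Sketch of an ABCNN-style attention matrix between two sentences' feature
# maps: A[i, j] scores how strongly unit i of sentence 1 relates to unit j of
# sentence 2, and is then used to re-weight each sentence's representation.
import torch

def attention_matrix(f1, f2):
    # f1: (len1, dim), f2: (len2, dim) -- per-position feature maps
    dist = torch.cdist(f1.unsqueeze(0), f2.unsqueeze(0)).squeeze(0)  # (len1, len2)
    return 1.0 / (1.0 + dist)

f1, f2 = torch.randn(7, 16), torch.randn(5, 16)
A = attention_matrix(f1, f2)
print(A.shape)  # (7, 5): mutual influence between the two sentences
```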
A Multi-Task Deep Neural Network (MT-DNN) learns representations across multiple natural language understanding (NLU) tasks and allows domain adaptation with substantially fewer in-domain labels than pre-trained BERT representations.
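A minimal sketch of the multi-task setup: a shared encoder feeds one lightweight head per task, and each mini-batch updates the shared parameters through one task's head. The small feed-forward encoder here is only a stand-in for the BERT encoder used in the paper.

```python
# Sketch of the MT-DNN idea: a shared encoder feeds several task-specific
# heads; each mini-batch updates the shared encoder through one task's head.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, hidden=64, tasks=None):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(300, hidden), nn.ReLU())  # stand-in for BERT
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden, n_labels) for name, n_labels in (tasks or {}).items()
        })

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

model = MultiTaskModel(tasks={"paraphrase": 2, "nli": 3, "sentiment": 2})
batch = torch.randn(8, 300)
logits = model(batch, task="paraphrase")   # shared encoder, task-specific head
print(logits.shape)
```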
A new learning framework for robust and efficient fine-tuning of pre-trained models that attains better generalization performance and outperforms the state-of-the-art T5 model, the largest pre-trained model with 11 billion parameters, on GLUE.
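The summary above does not spell out the mechanism, so the sketch below shows a generic ingredient of robustness-oriented fine-tuning: a smoothness (consistency) penalty that keeps predictions stable under small perturbations of the input embeddings. Here `model` is any callable mapping embeddings to logits; this is an illustration of the general idea, not necessarily the paper's exact objective.

```python
# Illustrative sketch of a smoothness / consistency regularizer for robust
# fine-tuning: penalize divergence between predictions on the clean input
# embeddings and on a slightly perturbed copy. Generic version of the idea,
# not necessarily the paper's exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

def smoothness_penalty(model, embeds, eps=1e-3):
    noise = eps * torch.randn_like(embeds)
    p = F.log_softmax(model(embeds), dim=-1)
    q = F.log_softmax(model(embeds + noise), dim=-1)
    # symmetric KL between clean and perturbed predictions
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

# Tiny stand-in classifier: mean-pool token embeddings, then a linear layer.
classifier = nn.Linear(32, 2)
def model(embeds):
    return classifier(embeds.mean(dim=1))

embeds = torch.randn(4, 10, 32)
print(smoothness_penalty(model, embeds).item())
```

In frameworks of this kind, such a penalty is typically added to the ordinary task loss with a weighting coefficient.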
The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
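A minimal sketch of contiguous span masking, assuming whitespace-tokenized input: runs of adjacent tokens are masked instead of isolated tokens. The paper draws span lengths from a geometric distribution and adds a span-boundary objective on top; both are simplified away here.

```python
# Sketch of contiguous span masking as used in SpanBERT-style pre-training:
# mask runs of adjacent tokens rather than isolated tokens. (Geometric span
# lengths and the span-boundary objective are simplified away.)
import random

def mask_spans(tokens, mask_ratio=0.15, max_span=5, mask_token="[MASK]"):
    tokens = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    while budget > 0:
        length = min(random.randint(1, max_span), budget)
        start = random.randrange(0, len(tokens) - length + 1)
        for i in range(start, start + length):
            tokens[i] = mask_token
        budget -= length
    return tokens

print(mask_spans("the quick brown fox jumps over the lazy dog".split()))
```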
A novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models is proposed; by leveraging this new KD method, the abundant knowledge encoded in a large "teacher" BERT can be effectively transferred to a small "student" TinyBERT.
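A minimal sketch of one component of such Transformer distillation: an MSE loss that pulls the student's hidden states toward the teacher's through a learned linear projection when the hidden sizes differ (768 and 312 are illustrative dimensions). The full recipe also distills embeddings, attention maps, and prediction logits, all omitted here.

```python
# Sketch of one piece of Transformer-to-Transformer distillation: an MSE loss
# that pulls a small student's hidden states toward a large teacher's, through
# a learned projection when the hidden sizes differ.
import torch
import torch.nn as nn

teacher_hidden, student_hidden = 768, 312
proj = nn.Linear(student_hidden, teacher_hidden)   # learned mapping from student to teacher space

def hidden_state_loss(student_h, teacher_h):
    # student_h: (batch, seq, student_hidden), teacher_h: (batch, seq, teacher_hidden)
    return nn.functional.mse_loss(proj(student_h), teacher_h)

loss = hidden_state_loss(torch.randn(2, 10, student_hidden),
                         torch.randn(2, 10, teacher_hidden))
print(loss.item())
```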