3260 papers • 126 benchmarks • 313 datasets
Semantic textual similarity deals with determining how similar two pieces of text are. This can take the form of assigning a score from 1 to 5. Related tasks include paraphrase and duplicate identification. Image source: Learning Semantic Textual Similarity from Conversations
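As a concrete illustration, here is a minimal sketch of how such a score can be produced from sentence embeddings and cosine similarity; `embed_sentence` is a hypothetical placeholder encoder, not any specific model listed below.

```python
import numpy as np

def embed_sentence(sentence: str) -> np.ndarray:
    # Hypothetical placeholder: in practice this would be a trained sentence
    # encoder (e.g. one of the models below) returning a fixed-size vector.
    rng = np.random.default_rng(abs(hash(sentence)) % (2 ** 32))
    return rng.standard_normal(384)

def similarity_score(a: str, b: str, low: float = 1.0, high: float = 5.0) -> float:
    """Map the cosine similarity of two sentence embeddings onto a 1-5 scale."""
    u, v = embed_sentence(a), embed_sentence(b)
    cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    # Rescale cosine similarity from [-1, 1] to the annotation range [low, high].
    return low + (cos + 1.0) / 2.0 * (high - low)

print(similarity_score("A man is playing a guitar.", "Someone plays guitar."))
```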
These leaderboards are used to track progress in Semantic Textual Similarity
Use these libraries to find Semantic Textual Similarity models and implementations
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
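For example, a minimal fine-tuning sketch, assuming the Hugging Face `transformers` library; the checkpoint name and the gold score of 4.5 are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained BERT plus one additional output layer (num_labels=1 gives a regression head).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

# Encode the sentence pair jointly so BERT attends across both sentences in all layers.
inputs = tokenizer("A man is playing a guitar.", "Someone plays guitar.",
                   return_tensors="pt", truncation=True)
labels = torch.tensor([4.5])  # gold similarity score for this pair

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # an optimizer step would follow during fine-tuning
print(outputs.logits)    # predicted similarity score
```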
Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity, is presented.
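A short usage sketch, assuming the `sentence-transformers` package; the checkpoint name is illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Load an SBERT-style model that maps sentences to fixed-size embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint name

sentences = ["A man is playing a guitar.", "Someone plays guitar.", "The weather is nice."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Semantically meaningful embeddings can be compared directly with cosine similarity.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the paraphrase should score higher than the unrelated sentence
```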
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
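One of the two parameter-reduction techniques, factorized embedding parameterization, can be sketched in a few lines; the sizes below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Factorized embedding parameterization: keep the vocabulary embedding small (E)
# and project it up to the hidden size (H), so embedding parameters scale with
# V*E + E*H instead of V*H.
vocab_size, embed_size, hidden_size = 30000, 128, 768  # illustrative sizes
factorized_embedding = nn.Sequential(
    nn.Embedding(vocab_size, embed_size),
    nn.Linear(embed_size, hidden_size),
)

token_ids = torch.randint(0, vocab_size, (1, 16))
print(factorized_embedding(token_ids).shape)  # (1, 16, hidden_size)
```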
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
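A hedged sketch of how such a triple loss could be combined; the temperature, weights, and tensor shapes here are illustrative placeholders, not the paper's exact training configuration.

```python
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                mlm_loss, temperature=2.0, w_ce=1.0, w_mlm=1.0, w_cos=1.0):
    """Combine soft-target distillation, masked-LM, and cosine-distance losses.

    Logits have shape (N, vocab) and hidden states shape (N, dim), where N is the
    number of token positions considered; all weights are illustrative.
    """
    # Distillation: KL divergence between teacher and student output distributions.
    ce = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Cosine-distance loss aligning student and teacher hidden representations.
    target = torch.ones(student_hidden.size(0), device=student_hidden.device)
    cos = F.cosine_embedding_loss(student_hidden, teacher_hidden, target)

    return w_ce * ce + w_mlm * mlm_loss + w_cos * cos
```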
It is found that transfer learning using sentence embeddings tends to outperform word-level transfer, achieving surprisingly good performance with minimal amounts of supervised training data for a transfer task.
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
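Concretely, the permutation-based pretraining objective maximizes the expected log-likelihood over factorization orders, where Z_T denotes the set of permutations of a length-T index sequence:

```latex
\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
  \left[ \sum_{t=1}^{T} \log p_{\theta}\!\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]
```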
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
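A minimal sketch of the shared-encoder setup commonly used for such supervised NLI training; the encoder here is a generic placeholder rather than the paper's specific BiLSTM-max architecture, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class NLISentencePairClassifier(nn.Module):
    """Shared sentence encoder with the [u; v; |u-v|; u*v] feature combination."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = 3):
        super().__init__()
        self.encoder = encoder              # shared (siamese) sentence encoder
        self.classifier = nn.Sequential(
            nn.Linear(4 * embed_dim, 512),  # illustrative hidden size
            nn.ReLU(),
            nn.Linear(512, num_classes),    # entailment / neutral / contradiction
        )

    def forward(self, premise, hypothesis):
        u = self.encoder(premise)           # (batch, embed_dim) sentence embedding
        v = self.encoder(hypothesis)        # (batch, embed_dim) sentence embedding
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.classifier(features)
```

After training on NLI, the encoder alone is reused to produce sentence representations for transfer tasks.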
SimCSE is presented, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings and regularizes pre-trained embeddings’ anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
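A hedged sketch of the in-batch contrastive objective such a framework relies on; the temperature and embedding sizes are illustrative, and in the unsupervised setting the two embeddings of a positive pair come from two dropout-noised forward passes of the same sentence.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.05):
    """In-batch contrastive (InfoNCE-style) loss over cosine similarities.

    emb_a[i] and emb_b[i] form a positive pair; the other rows of emb_b act as
    in-batch negatives for emb_a[i].
    """
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.T / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 4 sentence pairs with 256-dimensional embeddings.
print(contrastive_loss(torch.randn(4, 256), torch.randn(4, 256)))
```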