3260 papers • 126 benchmarks • 313 datasets
Sentence pair modeling compares two sentences and determines their relationship based on their internal representations.
(Image credit: Papersgraph)
ZEN, a BERT-based Chinese text encoder enhanced by n-gram representations, is proposed: different combinations of characters are considered during training, so potential word and phrase boundaries are explicitly pre-trained and fine-tuned together with the character encoder (BERT).
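To make the n-gram idea concrete, here is a minimal sketch (not the ZEN implementation) of enumerating the character n-grams of a sentence that match an n-gram lexicon, so potential word and phrase boundaries become explicit extra inputs alongside the character sequence; the toy lexicon and function name are illustrative assumptions.

```python
# Minimal sketch, NOT the ZEN implementation: find lexicon-matched
# character n-grams whose spans mark potential word/phrase boundaries.
def matched_ngrams(chars, lexicon, max_n=4):
    """Return (start, end, ngram) spans whose text appears in `lexicon`."""
    spans = []
    for n in range(2, max_n + 1):
        for i in range(len(chars) - n + 1):
            ngram = "".join(chars[i:i + n])
            if ngram in lexicon:
                spans.append((i, i + n, ngram))
    return spans

lexicon = {"自然", "语言", "自然语言"}  # toy lexicon (assumption)
print(matched_ngrams(list("自然语言处理"), lexicon))
# [(0, 2, '自然'), (2, 4, '语言'), (0, 4, '自然语言')]
```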
The experiments show that subword models without any pretrained word embeddings achieve new state-of-the-art results on two social media datasets and competitive results on news data for paraphrase identification.
It is shown that encoding contextual information with an LSTM and modeling inter-sentence interactions are both critical; the Enhanced Sequential Inference Model performs best on larger datasets, while the Pairwise Word Interaction Model performs best when less data is available.
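As a rough illustration of the inter-sentence interactions such models rely on, the sketch below builds a word-by-word cosine similarity grid between two sentences; the embedding table is randomly initialized and purely illustrative, not either paper's model.

```python
# Sketch of a pairwise word interaction grid (illustrative only).
import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(10_000, 300)          # toy vocabulary (assumption)
a = emb(torch.tensor([12, 7, 99]))             # sentence A: 3 word ids
b = emb(torch.tensor([5, 7, 42, 8]))           # sentence B: 4 word ids

# Cosine similarity between every word pair -> a 3x4 interaction grid
# that a downstream network (e.g., a CNN) can pool into a pair score.
interactions = F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1)
print(interactions.shape)                      # torch.Size([3, 4])
```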
This work presents a simple yet efficient data augmentation strategy called Augmented SBERT, in which a cross-encoder labels a larger set of input pairs to augment the training data for a bi-encoder, and shows that selecting the sentence pairs in this process is non-trivial and crucial to the method's success.
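A hedged sketch of that recipe, assuming the sentence-transformers library (model names are examples, and the fit() call matches its pre-3.0 training API): the cross-encoder assigns "silver" scores to unlabeled pairs, which then extend the bi-encoder's training data.

```python
# Sketch of the Augmented SBERT recipe (assumes sentence-transformers < 3.0).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")  # example model
unlabeled_pairs = [("A man is eating food.", "A man eats something."),
                   ("A plane is taking off.", "A cat sits on a mat.")]

# 1) Label the larger pool of pairs with the slow-but-accurate cross-encoder.
silver_scores = cross_encoder.predict(unlabeled_pairs)
silver_data = [InputExample(texts=list(pair), label=float(score))
               for pair, score in zip(unlabeled_pairs, silver_scores)]

# 2) Train the fast bi-encoder on the (gold +) silver-labeled pairs.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")             # example model
loader = DataLoader(silver_data, shuffle=True, batch_size=16)
bi_encoder.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(bi_encoder))],
               epochs=1)
```

In practice the paper's point is that step 1 matters most: which unlabeled pairs get silver labels strongly affects the bi-encoder's final quality.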
Distilled Sentence Embedding is introduced, a model based on knowledge distillation from cross-attentive models and focused on sentence-pair tasks; it significantly outperforms several ELMo variants and other sentence embedding methods while accelerating the computation of query-candidate sentence-pair similarities.
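The distillation objective can be sketched as regressing the student's embedding similarity onto the teacher's pair score; the tensors below are random stand-ins for real model outputs, not the paper's architecture.

```python
# Sketch of distilling a cross-attentive teacher into embedding-based
# pair scoring (all tensors are illustrative stand-ins).
import torch
import torch.nn.functional as F

teacher_scores = torch.tensor([0.91, 0.12, 0.55])    # teacher's per-pair scores
u = torch.randn(3, 768, requires_grad=True)          # student embeddings, sentence 1
v = torch.randn(3, 768, requires_grad=True)          # student embeddings, sentence 2

student_scores = F.cosine_similarity(u, v, dim=-1)   # cheap pairwise similarity
loss = F.mse_loss(student_scores, teacher_scores)    # distillation objective
loss.backward()                                      # gradients flow into the student
print(float(loss))
```

Once trained, the student scores a query against many candidates with precomputed embeddings and dot products, which is where the speedup over cross-attention comes from.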