3260 papers • 126 benchmarks • 313 datasets
This is the problem of detecting duplicate questions in forums, an important step towards automating the process of answering new questions.
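As a rough illustration of the task, a minimal baseline can flag two questions as duplicates when their bag-of-words cosine similarity passes a threshold. This is only a sketch and not the method of any paper listed here; the function names and the `threshold=0.7` default are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(question_a, question_b):
    """Cosine similarity between two questions using raw token counts."""
    counts_a = Counter(tokenize(question_a))
    counts_b = Counter(tokenize(question_b))
    # Dot product over the tokens the two questions share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one question has no tokens
    return dot / (norm_a * norm_b)


def is_duplicate(new_question, existing_question, threshold=0.7):
    """Flag a pair as duplicate; the threshold is an illustrative assumption."""
    return cosine_similarity(new_question, existing_question) >= threshold
```

A lexical baseline like this misses paraphrases that share few words, which is the gap the learned models in the papers below aim to close.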
This work proposes a novel QA approach based on Recognizing Question Entailment (RQE) and describes the QA system and resources built and evaluated on real medical questions. The approach exceeds the best results of the medical task, with a 29.8% increase over the best official score.
A large-scale analysis of multilingual hate speech in 9 languages from 16 different sources shows that in low-resource settings, simple models such as LASER embeddings with logistic regression perform best, while in high-resource settings BERT-based models perform better.
A Support Vector Machine (SVM) based system that makes use of textual, domain-specific, word-embedding and topic-modeling features, and a novel method for dialogue chain identification in comment threads is proposed.
The method builds on a siamese CNN architecture extended by two attention mechanisms, and achieves 7th place with a MAP score of 86.24 points on the Question-Comment Similarity subtask.
This work presents a novel approach to learning representations for sentence-level semantic similarity using conversational data; it achieves the best performance among all neural models on the STS Benchmark and is competitive with the state-of-the-art feature-engineered and mixed systems for both tasks.
This paper proposes to inject structural representations into NNs by learning a model with Tree Kernels on relatively few pairs of questions, as gold-standard training data is typically scarce, and then predicting labels on a very large corpus of question pairs.
This work presents a FAQ retrieval system that considers the similarity between a user's query and a question as well as the relevance between the query and an answer, and demonstrates that the proposed method outperforms baseline methods on the evaluated datasets.
This work addresses the problem of detecting duplicate questions in forums, and focuses on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard.
SemEval-2017 Task 3 on Community Question Answering reran the four subtasks from SemEval-2016, providing all the data from 2015 and 2016 for training and fresh data for testing. It also added a new subtask E to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums.
This paper systematically combines and compares the two approaches to automatically detecting question similarity, and analyzes how preprocessing steps and word-meaning similarity based on different distributions affect performance on the task.
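The MAP score reported for the Question-Comment Similarity subtask above is mean average precision, a standard ranking metric. The sketch below shows one common way to compute it, assuming each query comes with a fully ranked candidate list of relevance flags. It is not the official SemEval scorer, which may apply a rank cutoff and other conventions; the function names are illustrative.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans in ranked order, True where the
    candidate at that rank is relevant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # Assumption: normalize by the relevant items present in the list.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision over all queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking with relevant candidates at ranks 1 and 3 has average precision (1/1 + 2/3) / 2. Published scores are often multiplied by 100.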