3260 papers • 126 benchmarks • 313 datasets
Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia)

Surveys on entity resolution:
Christophides et al.: End-to-End Entity Resolution for Big Data: A Survey, 2020.
Barlaug and Gulla: Neural Networks for Entity Matching: A Survey, 2021.

Entity resolution is closely related to entity alignment, which focuses on matching entities between knowledge bases. It differs from entity linking, which focuses on identifying entity mentions in free text.
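To make the task definition concrete, here is a minimal sketch of pairwise matching followed by clustering. It is a toy illustration, not a method from any of the papers listed on this page: the `normalize`, `similarity`, and `resolve` helpers, the 0.85 threshold, and the sample records are all assumptions chosen for the example, and the string similarity comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group record indices whose pairwise similarity meets the threshold.

    Matches are merged transitively with a small union-find structure.
    The threshold value is an arbitrary assumption for this sketch.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of records (quadratic; fine for a toy example).
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=lambda cluster: cluster[0])


records = ["Acme Corp.", "acme corp", "ACME  Corp", "Globex Inc"]
clusters = resolve(records)
print(clusters)
```

Comparing all pairs scales quadratically with the number of records, which is why practical systems typically restrict comparisons to candidate pairs (blocking) before matching.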
A principled model for scalable Bayesian ER, called “distributed Bayesian linkage” or d-blink, is proposed, which jointly performs blocking and ER without compromising posterior correctness.
This paper introduces a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures and evaluates its performance in the context of geographical candidate selection in English and Spanish.
This article introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a public-use patent data exploration platform that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled; these characteristics allow us to paint the first representative picture of PatentsView’s disambiguation performance. The results are used to inform PatentsView’s users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
It is demonstrated empirically that training the parser to directly generate EXR notation not only solves the problem of entity resolution in one fell swoop and overcomes a number of expressive limitations of TOP notation, but also results in significantly greater parsing accuracy.
The results show that deep learning (DL) does not outperform current solutions on structured entity matching (EM), but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
This paper surveys metrics used to evaluate ER results in order to iteratively improve performance and guarantee sufficient quality prior to deployment, and provides practitioners the basic knowledge to begin evaluating their entity resolution results.
This work proposes an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics, estimating key performance metrics such as cluster and pairwise precision and recall, and analyzing root causes for errors.
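As an illustration of the pairwise precision and recall metrics mentioned above, the following sketch computes them from a predicted clustering and a reference clustering. It assumes every record has a ground-truth label, so it is the naive computation and not the sampling-aware estimators described in the evaluation papers on this page; the function names and the example clusterings are hypothetical.

```python
from itertools import combinations


def cluster_pairs(clusters: list[list[int]]) -> set[tuple[int, int]]:
    """Return every unordered pair of record ids that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting makes each pair canonical, e.g. (1, 3) rather than (3, 1).
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(
    predicted: list[list[int]], truth: list[list[int]]
) -> tuple[float, float]:
    """Pairwise precision and recall of a predicted clustering.

    Precision: fraction of predicted co-clustered pairs that are true matches.
    Recall: fraction of true matching pairs that were predicted.
    An empty denominator is treated as a score of 1.0 (a convention
    assumed for this sketch).
    """
    predicted_pairs = cluster_pairs(predicted)
    true_pairs = cluster_pairs(truth)
    true_positives = len(predicted_pairs & true_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 1.0
    recall = true_positives / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical example: six records with ids 1 to 6.
predicted = [[1, 2, 3], [4], [5, 6]]
truth = [[1, 2], [3, 4], [5, 6]]
precision, recall = pairwise_precision_recall(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this example the prediction contains four co-clustered pairs, two of which are correct, and the reference contains three matching pairs, so precision is 2/4 and recall is 2/3.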