3260 papers • 126 benchmarks • 313 datasets
Entity resolution (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia) Blocking is a crucial step in any entity resolution pipeline because a pair-wise comparison of all records across two data sources is infeasible. Blocking applies a computationally cheap method to generate a smaller set of candidate record pairs, reducing the workload of the matcher. During matching, a more expensive pair-wise matcher then produces the final set of matching record pairs. Survey on blocking: Papadakis et al.: Blocking and Filtering Techniques for Entity Resolution: A Survey, 2020.
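The idea above can be sketched with standard key-based blocking: records are grouped by a cheap blocking key, and only records sharing a key become candidate pairs for the expensive matcher. The records and the key function here are hypothetical, chosen purely for illustration:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical records; in practice these come from one or more data sources.
records = [
    {"id": 1, "name": "John Smith", "city": "Boston"},
    {"id": 2, "name": "Jon Smith", "city": "Boston"},
    {"id": 3, "name": "Mary Jones", "city": "Denver"},
    {"id": 4, "name": "Marie Jones", "city": "Denver"},
    {"id": 5, "name": "John Smyth", "city": "Boston"},
]

def blocking_key(record):
    # Cheap key: first three letters of the surname plus the city.
    surname = record["name"].split()[-1]
    return (surname[:3].lower(), record["city"].lower())

# Standard blocking: group records by key, then pair only within each block.
blocks = defaultdict(list)
for r in records:
    blocks[blocking_key(r)].append(r["id"])

candidate_pairs = {
    pair
    for ids in blocks.values()
    for pair in combinations(sorted(ids), 2)
}

all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(candidate_pairs)} candidate pairs instead of {all_pairs}")
```

Note the trade-off this sketch exposes: the pair (1, 5) ("Smith" vs. "Smyth") is never compared because the two records land in different blocks, which is why blocking methods are evaluated on recall as well as on the reduction of the candidate set.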
(Image credit: Papersgraph)
The new extremely lightweight portrait segmentation model SINet is introduced, containing an information blocking decoder and spatial squeeze modules, and it is demonstrated that the method can be used for general semantic segmentation on the Cityscapes dataset.
A compact and efficient network for seamless attenuation of different compression artifacts is formulated and it is demonstrated that a deeper model can be effectively trained with the features learned in a shallow network.
A principled model for scalable Bayesian ER, called “distributed Bayesian linkage” or d-blink, is proposed, which jointly performs blocking and ER without compromising posterior correctness.
It is shown that the likelihood objective itself is at fault, producing a model that assigns too much probability to sequences containing repeats and frequent words, unlike sequences drawn from the human training distribution; the proposed objective thus provides a strong alternative to existing techniques.
A compact and efficient network for seamless attenuation of different compression artifacts is presented, showing superior performance over state-of-the-art methods both on benchmark datasets and in a real-world use case.
The proposed UniBlocker is a dense blocker that is pre-trained on a domain-independent, easily obtainable tabular corpus using self-supervised contrastive learning; it significantly outperforms previous self- and unsupervised dense blocking methods and is comparable and complementary to the state-of-the-art sparse blocking methods.
A new automated disambiguation solution is proposed that exploits more than one million crowdsourced annotations to learn an accurate classifier for identifying coreferring authors and to guide, in a semi-supervised way, the clustering of scientific publications by distinct authors.
Percival shows that image-based perceptual ad blocking is an attractive complement to today's dominant approach of block lists, and demonstrates the feasibility of deploying traditionally heavy models (i.e., deep neural networks) inside the critical path of a browser's rendering engine.
This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics and points out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty.