3260 papers • 126 benchmarks • 313 datasets
Knowledge base population is the task of filling the incomplete elements of a given knowledge base by automatically processing a large corpus of text.
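To make the task concrete, here is a minimal sketch of knowledge base population: a KB with a missing slot value is filled by running a hand-written extraction pattern over a small text corpus. The KB contents, sentences, and the regex pattern are all hypothetical toy examples; real systems learn their extractors or use neural relation classifiers.

```python
import re

# A toy knowledge base with one missing object value (None).
kb = {("Marie Curie", "born_in"): None,
      ("Marie Curie", "field"): "physics"}

corpus = ["Marie Curie was born in Warsaw.",
          "She pioneered research on radioactivity."]

# One hand-written extraction pattern per relation (toy example).
patterns = {
    "born_in": re.compile(r"(?P<subj>[A-Z][\w ]+?) was born in (?P<obj>[A-Z]\w+)")
}

def populate(kb, corpus, patterns):
    """Fill empty KB slots with values extracted from the corpus."""
    for sentence in corpus:
        for relation, pattern in patterns.items():
            m = pattern.search(sentence)
            if m and kb.get((m.group("subj"), relation)) is None:
                kb[(m.group("subj"), relation)] = m.group("obj")
    return kb

populate(kb, corpus, patterns)
print(kb[("Marie Curie", "born_in")])  # -> Warsaw
```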
(Image credit: Papersgraph)
These leaderboards are used to track progress in Knowledge Base Population
Use these libraries to find Knowledge Base Population models and implementations
No subtasks available.
An extensive evaluation compares popular KB inference models across popular datasets from the literature, proposes an extension to MF models so that they can better handle out-of-vocabulary (OOV) entity pairs, and develops a novel combination of TF and MF models.
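In matrix-factorization (MF) KB inference, a fact is scored as the dot product of an entity-pair embedding and a relation embedding, which breaks down when the pair was never seen in training. The sketch below illustrates one plausible back-off for OOV pairs (composing the two entity embeddings); all embeddings here are random stand-ins for learned parameters, and the composition rule is an assumption, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Stand-ins for learned embeddings (normally trained on observed facts).
relation_vecs = {"works_for": rng.normal(size=dim)}
pair_vecs = {("alice", "acme"): rng.normal(size=dim)}          # seen pairs
entity_vecs = {e: rng.normal(size=dim)
               for e in ["alice", "acme", "bob", "globex"]}

def score(pair, relation):
    """MF score: dot product of pair embedding and relation embedding.
    For an OOV pair, back off to composing the two entity embeddings."""
    v_rel = relation_vecs[relation]
    v_pair = pair_vecs.get(pair)
    if v_pair is None:                      # out-of-vocabulary entity pair
        v_pair = entity_vecs[pair[0]] + entity_vecs[pair[1]]
    return float(v_pair @ v_rel)

print(score(("alice", "acme"), "works_for"))   # seen pair
print(score(("bob", "globex"), "works_for"))   # OOV pair, composed
```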
An effective new model is proposed that combines an LSTM sequence model with a form of entity position-aware attention better suited to relation extraction; the work also builds TACRED, a large supervised relation extraction dataset obtained via crowdsourcing and targeted at TAC KBP relations.
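The core idea of position-aware attention is that each token's attention weight depends both on its LSTM hidden state and on its position relative to the subject and object entities. A minimal NumPy sketch, with random vectors standing in for the trained LSTM states, position embeddings, and attention parameters (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, hid, pos_dim = 6, 4, 3

h = rng.normal(size=(seq_len, hid))      # stand-in for LSTM hidden states
subj_idx, obj_idx = 1, 4                  # token positions of the two entities

# Embeddings indexed by relative distance to an entity (toy sizes).
max_dist = seq_len
pos_emb = rng.normal(size=(2 * max_dist + 1, pos_dim))

def rel(i, j):
    return pos_emb[i - j + max_dist]

# Attention parameters (random stand-ins for learned weights).
W_h = rng.normal(size=(hid,))
W_p = rng.normal(size=(2 * pos_dim,))

# Each token's score combines its hidden state with its position features
# relative to both the subject and the object.
scores = np.array([
    h[i] @ W_h
    + np.concatenate([rel(i, subj_idx), rel(i, obj_idx)]) @ W_p
    for i in range(seq_len)
])
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                      # softmax over tokens
summary = alpha @ h                       # representation fed to the classifier
print(summary.shape)                      # (4,)
```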
This work investigates an information extraction approach for grounding criteria from trials in ClinicalTrials.gov to a shared knowledge base, and concludes that the system is competitive with Criteria2Query, which it views as the current state of the art in criteria extraction.
Several strategies are described to improve the retriever and the generator of RAG in order to make it a better slot filler; the resulting system reached the top-1 position on the KILT leaderboard on both the T-REx and zsRE datasets by a large margin.
Reasoning over commonsense knowledge bases (CSKB) whose elements are in the form of free-text is an important yet hard task in NLP. While CSKB completion only fills the missing links within the domain of the CSKB, CSKB population is alternatively proposed with the goal of reasoning unseen assertions from external resources. In this task, CSKBs are grounded to a large-scale eventuality (activity, state, and event) graph to discriminate whether novel triples from the eventuality graph are plausible or not. However, existing evaluations on the population task are either not accurate (automatic evaluation with randomly sampled negative examples) or of small scale (human annotation). In this paper, we benchmark the CSKB population task with a new large-scale dataset by first aligning four popular CSKBs, and then presenting a high-quality human-annotated evaluation set to probe neural models' commonsense reasoning ability. We also propose a novel inductive commonsense reasoning model that reasons over graphs. Experimental results show that generalizing commonsense reasoning on unseen assertions is inherently a hard task. Models achieving high accuracy during training perform poorly on the evaluation set, leaving a large gap to human performance. We will make the data publicly available for future contributions. Code and data are available at https://github.com/HKUST-KnowComp/CSKB-Population.
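At its core, CSKB population reduces to scoring candidate triples (head event, relation, tail event) as plausible or not. The sketch below trains a tiny binary scorer with hashed bag-of-words features and logistic regression; this featurization is only a stand-in for the graph-based neural encoder the paper actually proposes, and the triples are invented toy examples.

```python
import numpy as np
from zlib import crc32

def featurize(head, relation, tail, dim=32):
    """Hashed bag-of-words over 'head relation tail' (toy stand-in for a
    learned graph-based encoder)."""
    v = np.zeros(dim)
    for tok in f"{head} {relation} {tail}".lower().split():
        v[crc32(tok.encode()) % dim] += 1.0
    return v

def train(triples, labels, dim=32, lr=0.5, epochs=200):
    """Logistic regression on labeled triples via batch gradient descent."""
    X = np.stack([featurize(*t, dim=dim) for t in triples])
    y = np.array(labels, dtype=float)
    w = np.zeros(dim)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted plausibility
        w -= lr * X.T @ (p - y) / len(y)     # gradient step on log loss
    return w

triples = [("PersonX eats breakfast", "xWant", "to go to work"),     # plausible
           ("PersonX eats breakfast", "xWant", "to fly to the moon")]  # not
w = train(triples, [1, 0])
plausibility = 1.0 / (1.0 + np.exp(-(featurize(*triples[0]) @ w)))
print(round(plausibility, 2))
```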
TIMEN is a community-driven tool for temporal expression normalisation derived from current best approaches; it is an independent tool, enabling easy integration into existing systems, and it invites the IE community to contribute to a knowledge base in order to solve the temporal expression normalisation problem.
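Temporal expression normalisation maps a surface expression to a standard date value relative to a reference (document) date. A minimal rule-based sketch in that spirit; the rule table is a toy example covering only three expressions, not TIMEN's actual knowledge base:

```python
from datetime import date, timedelta

# Toy rule table: expression -> function of the reference date.
RULES = {
    "today":     lambda ref: ref,
    "tomorrow":  lambda ref: ref + timedelta(days=1),
    "yesterday": lambda ref: ref - timedelta(days=1),
}

def normalize(expr, ref):
    """Normalize a temporal expression to an ISO-8601 date string,
    relative to the reference date `ref`."""
    expr = expr.strip().lower()
    if expr in RULES:
        return RULES[expr](ref).isoformat()
    raise ValueError(f"no rule for {expr!r}")

print(normalize("tomorrow", date(2012, 5, 1)))   # -> 2012-05-02
```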
This work investigates knowledge-guided linguistic rewrites as a secondary source of evidence and finds that they can vastly improve the quality of inference rule corpora, obtaining a 27 to 33 point precision improvement while retaining substantial recall.
TweeTime, a temporal tagger for recognizing and normalizing time expressions in Twitter, is described: a minimally supervised method that learns from large quantities of unlabeled data and requires no hand-engineered rules or hand-annotated training corpora.
The SemEval task of extracting keyphrases, and relations between them, from scientific documents is described; this task is crucial for understanding which publications describe which processes, tasks, and materials.
Adding a benchmark result helps the community track progress.