3260 papers • 126 benchmarks • 313 datasets
Keyword extraction is the task of automatically identifying the terms that best describe the subject of a document (Source: Wikipedia).
A parameterless method for constructing a graph of text that captures the contextual relations between words, together with a novel word-scoring method based on the connections between concepts; each is individually superior to the corresponding component used by state-of-the-art graph-based keyword extraction algorithms.
It is hypothesized that keywords are more likely to be found among the influential nodes of a graph-of-words than among its nodes that rank high on eigenvector-related centrality measures.
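To make the contrast concrete, the sketch below computes both kinds of measure on a toy graph-of-words. It assumes that "influential nodes" are identified by k-core decomposition (each node's core number), a common degeneracy-based notion of influence; that choice, and the helper names `build_graph`, `core_numbers` and `eigenvector_centrality`, are illustrative assumptions rather than the cited work's code.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {word: set(neighbours)} from word pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """k-core decomposition: repeatedly peel the node with minimum remaining degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda n: (degree[n], n))  # ties broken by name
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (A + I); the identity shift avoids oscillation on bipartite graphs."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


if __name__ == "__main__":
    g = build_graph([("keyword", "graph"), ("graph", "node"), ("keyword", "node"),
                     ("keyword", "simple"), ("simple", "method")])
    print(core_numbers(g))
    print(eigenvector_centrality(g))
```

Core numbers are coarse integers shared by whole dense regions, whereas eigenvector scores are continuous and can favor nodes that merely sit next to well-connected ones; the hypothesis concerns which of these signals better matches human-assigned keywords.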
A fully unsupervised, extractive text summarization system is presented. It leverages a submodularity framework that allows summaries to be generated greedily while preserving near-optimal performance guarantees.
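The greedy procedure behind such guarantees can be sketched as follows. This minimal example assumes word coverage (the number of distinct words covered by the chosen sentences) as the objective, which is monotone and submodular; the cited system's actual objective may differ, and `greedy_summary` is a hypothetical name. For a monotone submodular objective under a limit of k sentences, greedy selection is known to reach at least (1 - 1/e) of the optimal value.

```python
def greedy_summary(sentences, k):
    """Select up to k sentence indices, each time adding the sentence with the
    largest marginal gain in word coverage (a monotone submodular objective)."""
    word_sets = [set(s.lower().split()) for s in sentences]
    selected, covered = [], set()
    for _ in range(min(k, len(sentences))):
        best, best_gain = None, 0
        for i, words in enumerate(word_sets):
            if i in selected:
                continue
            gain = len(words - covered)  # marginal gain shrinks as coverage grows
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new words
            break
        selected.append(best)
        covered |= word_sets[best]
    return selected


if __name__ == "__main__":
    doc = ["the cat sat", "the cat sat on the mat", "dogs bark loudly"]
    print([doc[i] for i in greedy_summary(doc, 2)])
```

Diminishing returns are what make the greedy choice safe: once the second sentence is chosen, the first adds nothing new, so the budget goes to the sentence about dogs instead.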
YAKE!, a lightweight unsupervised automatic keyword extraction method, is described; it relies on statistical text features extracted from single documents to select the most relevant keywords of a text.
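YAKE! itself combines several single-document features (such as casing, word position and term frequency) into one score. The sketch below is a deliberately simplified stand-in, not YAKE!'s formula: it assumes a tiny hand-written stopword list and scores each word by its normalized frequency divided by one plus the index of the sentence where it first appears. The function name `simple_statistical_keywords` is hypothetical.

```python
import re
from collections import Counter

# Assumed toy stopword list; a real system would use a fuller, language-specific list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def simple_statistical_keywords(text, top_n=3):
    """Rank words by (tf / max_tf) / (1 + first sentence index); higher is better."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tf = Counter()
    first_sentence = {}
    for idx, sentence in enumerate(sentences):
        for token in re.findall(r"[A-Za-z]+", sentence):
            word = token.lower()
            if word in STOPWORDS:
                continue
            tf[word] += 1
            first_sentence.setdefault(word, idx)  # earliest sentence containing the word
    if not tf:
        return []
    max_tf = max(tf.values())
    scores = {w: (tf[w] / max_tf) / (1 + first_sentence[w]) for w in tf}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    print(simple_statistical_keywords(
        "Keyword extraction finds keywords. Extraction uses statistics. Statistics help extraction."))
```

Like YAKE!, this needs no training corpus or external resources: every feature is computed from the single input document.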
Corpus2graph is an open-source, NLP-application-oriented tool that generates a word co-occurrence network from a large corpus. It not only contains built-in methods to preprocess words, analyze sentences, extract word pairs, and define edge weights, but also supports user-customized functions.
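The core of such a pipeline, turning sentences into weighted word pairs, can be sketched in a few lines. This is not Corpus2graph's API: `cooccurrence_edges`, its sliding-window definition of co-occurrence, and the raw-count edge weight are illustrative assumptions. The `tokenize` argument stands in for the kind of user-supplied preprocessing hook the tool supports.

```python
from collections import Counter


def cooccurrence_edges(sentences, window=2, tokenize=lambda s: s.lower().split()):
    """Return {(u, v): count} for words at most `window` tokens apart in a sentence.

    `tokenize` is a pluggable preprocessing function (hypothetical hook).
    Edge keys are sorted pairs, so the network is undirected.
    """
    weights = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:  # neighbours within the window
                if u != v:
                    weights[tuple(sorted((u, v)))] += 1
    return dict(weights)


if __name__ == "__main__":
    corpus = ["Graphs of words capture co-occurrence", "Words co-occur in sentences"]
    print(cooccurrence_edges(corpus, window=2))
```

Swapping in a different `tokenize` (for example one that lemmatizes or drops stopwords) changes the nodes of the network without touching the pair-extraction and weighting logic.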
This work explores how load centrality, a graph-theoretic measure, can be applied to graphs derived from a given text to efficiently identify and rank keywords.
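Load centrality counts how much shortest-path "traffic" passes through a node when one unit is sent between every ordered pair of nodes and split equally wherever shortest paths branch. The stdlib-only sketch below computes the unnormalized value for an unweighted, undirected word graph; it is an illustration, not the cited work's implementation. NetworkX also provides `networkx.load_centrality`, which normalizes by default.

```python
from collections import deque


def load_centrality(adj):
    """Unnormalized load centrality for an unweighted, undirected graph {node: set(nbrs)}."""
    load = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: distances and shortest-path predecessors.
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    preds[w].append(v)
        # Every reached node is the target of one unit of flow from s.
        flow = {v: 1.0 for v in order}
        for v in reversed(order):  # farthest nodes first
            if v == s:
                continue
            share = flow[v] / len(preds[v])  # split equally among predecessors
            for p in preds[v]:
                flow[p] += share
        for v in order:
            if v != s:
                load[v] += flow[v] - 1.0  # exclude flow that ends at v itself
    return load


if __name__ == "__main__":
    words = "load centrality ranks words in a graph of words".split()
    adj = {w: set() for w in words}
    for u, v in zip(words, words[1:]):  # link adjacent words (toy graph)
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    scores = load_centrality(adj)
    print(sorted(scores, key=scores.get, reverse=True)[:3])
```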
A supervised framework for automatic keyword extraction from a single document is presented, substantiating the claim that graph-theoretic properties of words are effective discriminators between keywords and non-keywords.
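In a supervised setting, graph-theoretic properties become per-word feature vectors fed to a classifier. The sketch below assumes two features (degree and local clustering coefficient) and swaps in a plain perceptron as the classifier; the cited framework's features and learner may differ, and the toy graph and labels are made up for illustration.

```python
def node_features(adj):
    """Per-node graph features: [degree, local clustering coefficient]."""
    feats = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # Count edges among v's neighbours, each unordered pair once.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[v] = [float(k), clustering]
    return feats


def predict(w, b, x):
    """Linear decision rule: keyword (1) if w.x + b > 0, else non-keyword (0)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_perceptron(X, y, max_epochs=1000):
    """Classic perceptron; stops after the first epoch with no mistakes."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = predict(w, b, xi)
            if pred != yi:
                step = yi - pred  # +1 for a missed keyword, -1 for a false alarm
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


if __name__ == "__main__":
    adj = {"graph": {"keyword", "node", "edge", "score"}, "keyword": {"graph", "node"},
           "node": {"graph", "keyword"}, "edge": {"graph"}, "score": {"graph"}}
    feats = node_features(adj)
    words = sorted(feats)
    labels = [1 if w == "graph" else 0 for w in words]  # toy gold labels
    w, b = train_perceptron([feats[x] for x in words], labels)
    print({x: predict(w, b, feats[x]) for x in words})
```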
This research presents a novel algorithm for keyword identification (the extraction of one- or multi-word phrases representing key aspects of a given document) called Transformer-Based Neural Tagger for Keyword IDentification (TNT-KID). It is capable of overcoming the deficiencies of both supervised and unsupervised state-of-the-art approaches to keyword extraction.
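Treating keyword identification as token tagging means the tagger's per-token output must be decoded back into phrases. The sketch below assumes a binary scheme in which label 1 marks a keyword token and each contiguous run of 1s forms one phrase; this is an illustrative assumption, not necessarily TNT-KID's exact labeling or post-processing, and `decode_keyphrases` is a hypothetical name.

```python
def decode_keyphrases(tokens, labels):
    """Group contiguous tokens labelled 1 into keyphrases (binary tagging assumed).

    Duplicate phrases are removed while keeping first-occurrence order.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == 1:
            current.append(token)
        elif current:  # a run of keyword tokens just ended
            phrases.append(" ".join(current))
            current = []
    if current:  # flush a run that reaches the end of the sequence
        phrases.append(" ".join(current))
    return list(dict.fromkeys(phrases))


if __name__ == "__main__":
    toks = "transformer based tagging improves keyword extraction".split()
    print(decode_keyphrases(toks, [1, 1, 0, 0, 1, 1]))
```

Because the model labels tokens rather than choosing from a fixed candidate list, multi-word phrases of any length fall out of the decoding step naturally.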
This work takes a different path to detecting keywords in a text document: it models the main distribution of the document's words using local word vector representations. Experiments confirm the high performance of this approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods.
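One simple way to turn local word vectors into keyword scores is to summarize the document by the mean of its word vectors and rank words by cosine similarity to that reference vector. The sketch assumes the vectors have already been trained on the document itself and uses made-up two-dimensional toy vectors; the cited work's training and scoring details may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_reference_vector(vectors):
    """Rank words by similarity to the mean of all local word vectors."""
    dim = len(next(iter(vectors.values())))
    mean = [sum(vec[i] for vec in vectors.values()) / len(vectors) for i in range(dim)]
    scores = {w: cosine(vec, mean) for w, vec in vectors.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))


if __name__ == "__main__":
    toy_vectors = {"keyword": [1.0, 0.9], "extraction": [0.9, 1.0], "the": [-1.0, 0.2]}
    print(rank_by_reference_vector(toy_vectors))
```

Words whose local vectors point in the same direction as the bulk of the document are treated as representative of its main topic; outliers such as function words fall to the bottom.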
A set of nearly four million documents from health-care social media was collected and used to train a semantic model and obtain word embeddings. Features of the resulting semantic space were then used to re-rank the original TF-IDF scores through an iterative procedure, improving TF-IDF's otherwise moderate performance on informal texts.
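The baseline being improved is TF-IDF, sketched below with raw term counts and a log(N / df) inverse document frequency. The `rerank` function is a hypothetical illustration of iterative embedding-based re-ranking, not the cited paper's procedure: it repeatedly blends each word's TF-IDF score with the similarity-weighted average of its neighbours' current scores, where the `similarity` map is assumed to come from word embeddings.

```python
import math
from collections import Counter


def tfidf(docs):
    """Raw-count TF times log(N / df) IDF; returns one {word: score} dict per document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()} for doc in docs]


def rerank(scores, similarity, alpha=0.5, iterations=20):
    """HYPOTHETICAL iterative re-ranking with embedding similarities.

    similarity: {word: {other_word: non-negative similarity}}, assumed precomputed.
    """
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for w in scores:
            nbrs = {v: s for v, s in similarity.get(w, {}).items() if v in current and s > 0}
            total = sum(nbrs.values())
            if total:
                propagated = sum(s * current[v] for v, s in nbrs.items()) / total
            else:
                propagated = current[w]  # words without similar neighbours keep their score
            updated[w] = (1 - alpha) * scores[w] + alpha * propagated
        current = updated
    return current


if __name__ == "__main__":
    docs = [["pain", "relief", "pain"], ["pain", "sleep"], ["sleep", "well"]]
    base = tfidf(docs)[0]
    print(base)
    print(rerank(base, {"pain": {"relief": 1.0}, "relief": {"pain": 1.0}}))
```

With alpha below 1 the update is a contraction, so the iteration settles to a fixed point in which semantically related words pull each other's scores together.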