3260 papers • 126 benchmarks • 313 datasets
Extractive text summarization selects a subset of the words or sentences in a document that best represents a summary of that document.
(Image credit: Papersgraph)
These leaderboards are used to track progress in extractive text summarization.
Use these libraries to find extractive text summarization models and implementations.
A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator.
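The core of the pointer-generator idea is mixing two distributions: the decoder's vocabulary distribution (generation) and the attention distribution over source tokens (copying), weighted by a learned scalar. A minimal numpy sketch of that mixing step, with illustrative vocabulary size, source token ids, and mixing weight (the real model learns these per decoding step):

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attn_dist, src_ids, vocab_size):
    """Blend generation and copying into one distribution over the vocab."""
    out = p_gen * vocab_dist
    # Scatter-add copy probability mass onto the source tokens' vocab slots.
    np.add.at(out, src_ids, (1.0 - p_gen) * attn_dist)
    return out

vocab_size = 10
vocab_dist = np.full(vocab_size, 1.0 / vocab_size)  # softmax over vocabulary
attn_dist = np.array([0.5, 0.3, 0.2])               # attention over 3 source tokens
src_ids = np.array([2, 7, 2])                       # source tokens' vocab ids
dist = final_distribution(0.8, vocab_dist, attn_dist, src_ids, vocab_size)
print(dist.sum())  # still a valid probability distribution
```

Because both inputs are normalized and the weights sum to one, the mixed output remains a valid distribution, and out-of-vocabulary source words can be handled by extending the vocabulary axis with per-document slots.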
This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences and proposes a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
A novel efficient attention mechanism equivalent to dot-product attention but with substantially lower memory and computational costs is proposed, allowing more widespread and flexible integration of attention modules into a network and leading to better accuracy.
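The memory saving comes from reordering the matrix products: after normalizing queries and keys separately, matrix-multiplication associativity lets one compute the small d×d product K^T V first instead of the n×n attention map. A self-contained numpy sketch under those assumptions (shapes and the per-axis softmax normalization are illustrative of the idea, not a drop-in replacement for any library):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 4  # sequence length, feature dimension
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

Qn = softmax(Q, axis=1)  # normalize each query over features
Kn = softmax(K, axis=0)  # normalize each key column over positions

quadratic = (Qn @ Kn.T) @ V  # materializes an n x n map
linear = Qn @ (Kn.T @ V)     # only a d x d intermediate
print(np.allclose(quadratic, linear))
```

For long sequences (n much larger than d), the second ordering reduces memory from O(n²) to O(d²) while producing the same output up to floating-point error.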
This paper reports on the Lecture Summarization Service project, a Python-based RESTful service that uses the BERT model for text embeddings and k-means clustering to identify the sentences closest to the centroids for summary selection.
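The selection step can be sketched as: cluster sentence embeddings with k-means, then pick from each cluster the sentence nearest its centroid. Random vectors stand in for BERT embeddings here so the sketch stays self-contained; function names and dimensions are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: returns centroids and cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def select_summary(embeddings, k):
    """Pick, per cluster, the sentence embedding closest to the centroid."""
    centroids, labels = kmeans(embeddings, k)
    picks = []
    for j in range(k):
        dists = ((embeddings - centroids[j]) ** 2).sum(-1)
        dists[labels != j] = np.inf  # restrict search to this cluster
        picks.append(int(dists.argmin()))
    return sorted(picks)  # preserve document order in the summary

rng = np.random.default_rng(1)
sentence_embeddings = rng.normal(size=(12, 8))  # 12 sentences, dim 8
picks = select_summary(sentence_embeddings, k=3)
print(picks)
```

Returning sentences in document order (the final sort) keeps the extracted summary readable regardless of cluster ordering.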
This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
Two adaptive learning models are presented: AREDSUM-SEQ, which jointly considers salience and novelty during sentence selection; and the two-step AREDSUM-CTX, which scores salience first and then learns to balance salience and redundancy, enabling measurement of the impact of each aspect.
The DebateSum dataset, consisting of 187,386 unique pieces of evidence with corresponding arguments and extractive summaries, is presented, along with a search engine for the dataset that is used extensively by members of the National Speech and Debate Association today.
A centroid-based method for text summarization that exploits the compositional capabilities of word embeddings and achieves good performance even in comparison to more complex deep learning models.
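A minimal sketch of the centroid idea, assuming mean-pooled word vectors as sentence embeddings: score each sentence by cosine similarity between its embedding and the centroid of all the document's word vectors, then extract the top-scoring sentences. The tiny vocabulary and two-dimensional vectors are illustrative stand-ins for real pretrained embeddings:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {  # toy word vectors; real systems load pretrained ones
    "cats": np.array([1.0, 0.1]), "sleep": np.array([0.9, 0.2]),
    "dogs": np.array([0.8, 0.3]), "tax": np.array([0.0, 1.0]),
    "law": np.array([0.1, 0.9]),
}
sentences = [["cats", "sleep"], ["tax", "law"], ["dogs", "sleep"]]

# Document centroid: mean of every word vector in the document.
doc_centroid = np.mean([embeddings[w] for s in sentences for w in s], axis=0)

# Score each sentence by similarity of its mean-pooled embedding to the centroid.
scores = [cosine(np.mean([embeddings[w] for w in s], axis=0), doc_centroid)
          for s in sentences]
top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:2]
print(sorted(top))
```

The off-topic sentence ("tax law") scores lowest, so the two animal-themed sentences closest to the document's overall meaning are extracted.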
This work creates a resource for benchmarking the techniques for document level novelty detection via event-specific crawling of news documents across several domains in a periodic manner and releases the annotated corpus with necessary statistics.