3260 papers • 126 benchmarks • 313 datasets
Coreference resolution is the task of clustering mentions in text that refer to the same underlying real-world entities.

Example: "I voted for Obama because he was most aligned with my values", she said.

Here "I", "my", and "she" belong to one cluster, and "Obama" and "he" belong to another cluster.
(Image credit: Papersgraph)
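As a purely illustrative sketch of what the clustering output looks like, the snippet below (plain Python, no particular model or library) represents the two clusters from the example sentence as lists of character spans.

```python
import re

# Illustrative representation of coreference output: clusters of mention spans over the text.
text = '"I voted for Obama because he was most aligned with my values", she said.'

def first_span(pattern):
    # Character span (start, end) of the first whole-word match in `text`.
    m = re.search(pattern, text)
    return (m.start(), m.end())

clusters = {
    "speaker": [first_span(r"\bI\b"), first_span(r"\bmy\b"), first_span(r"\bshe\b")],
    "Obama":   [first_span(r"\bObama\b"), first_span(r"\bhe\b")],
}

for entity, mentions in clusters.items():
    print(entity, "->", [text[s:e] for s, e in mentions])
# speaker -> ['I', 'my', 'she']
# Obama -> ['Obama', 'he']
```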
These leaderboards are used to track progress in coreference resolution.
Use these libraries to find coreference resolution models and implementations.
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
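For illustration, the following minimal NumPy sketch shows the scaled dot-product attention operation underlying such an attention-only architecture; multi-head attention, masking, and the rest of the Transformer are omitted.

```python
import numpy as np

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```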
A new language representation model, BERT, is introduced, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
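As a hedged example of fine-tuning with a single additional output layer, the sketch below assumes the Hugging Face transformers library (not part of this page) and a two-label classification task.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The randomly initialized classification head on top of the pretrained encoder
# is the "one additional output layer"; num_labels is task-specific.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Obama said he would attend.", return_tensors="pt")
logits = model(**inputs).logits   # shape (1, 2); train end-to-end with a classification loss
```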
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
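The on-the-fly adaptation described here is driven by few-shot prompting; the purely illustrative prompt below, for 3-digit arithmetic, shows the format, with the answer left for the model to complete.

```python
# Illustrative only: the continuation comes from the model, not from this code.
prompt = """Q: 123 + 456 = ?
A: 579
Q: 804 - 291 = ?
A: 513
Q: 672 + 185 = ?
A:"""
# A capable model is expected to continue with " 857".
```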
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
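One way to read this mixing of signals is the layer-weighting recipe ELMo is known for; the rough sketch below (NumPy, with fixed stand-in weights that would be learned in practice) combines the layers of a pretrained bidirectional LM into a single contextual embedding per token.

```python
import numpy as np

n_layers, seq_len, dim = 3, 5, 16
layer_outputs = np.random.randn(n_layers, seq_len, dim)      # stand-in for biLM layer activations

raw_weights = np.zeros(n_layers)                              # learned parameters in practice
s = np.exp(raw_weights) / np.exp(raw_weights).sum()           # softmax-normalized layer weights
gamma = 1.0                                                   # task-specific scale (learned in practice)
elmo_repr = gamma * np.tensordot(s, layer_outputs, axes=1)    # (seq_len, dim) contextual embeddings
print(elmo_repr.shape)
```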
An innovative contextualized attention-based deep neural network, SDNet, is proposed to fuse context into traditional MRC models; it leverages both inter-attention and self-attention to comprehend conversation context and extract relevant information from the passage.
A new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) is proposed that improves the BERT and RoBERTa models using two novel techniques that significantly improve the efficiency of model pre-training and performance of downstream tasks.
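As a simplified sketch of the disentangled-attention idea, the snippet below (NumPy, with DeBERTa's relative-position bucketing and other details omitted) sums content-to-content, content-to-position, and position-to-content score terms.

```python
import numpy as np

n, d = 4, 8
rng = np.random.default_rng(0)
Hq, Hk = rng.normal(size=(n, d)), rng.normal(size=(n, d))    # content query / key projections
Pq, Pk = rng.normal(size=(n, d)), rng.normal(size=(n, d))    # relative-position query / key projections

scores = (Hq @ Hk.T        # content  -> content
          + Hq @ Pk.T      # content  -> position
          + Pq @ Hk.T)     # position -> content
scores /= np.sqrt(3 * d)   # scale over the three additive terms
```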
The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
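The following illustrative sketch, which is not the authors' code and uses uniform rather than geometric span lengths (and omits the span-boundary objective), shows what masking contiguous spans instead of individual tokens looks like.

```python
import random

def mask_spans(tokens, mask_ratio=0.15, max_span_len=5, seed=0):
    # Replace whole contiguous spans of tokens with [MASK] until the budget is spent.
    rng = random.Random(seed)
    tokens = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    masked = 0
    while masked < budget:
        span_len = min(rng.randint(1, max_span_len), budget - masked)
        start = rng.randrange(0, len(tokens) - span_len + 1)
        tokens[start:start + span_len] = ["[MASK]"] * span_len
        masked += span_len
    return tokens

print(mask_spans("coreference resolution groups mentions that refer to the same entity".split()))
```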
It is found that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups, and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).