3260 papers • 126 benchmarks • 313 datasets
Dialogue generation is the task, within natural language processing, of "understanding" natural language inputs in order to produce conversational output. Such systems are usually intended for conversing with humans, for instance in back-and-forth dialogue with a conversational agent like a chatbot. Example benchmarks for this task (see others under Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated via metrics such as BLEU, ROUGE, and METEOR, although these correlate weakly with human judgement, a limitation that newer metrics such as UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address.
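As a concrete illustration of the metrics mentioned above, here is a minimal sketch of scoring a generated reply against a reference with sentence-level BLEU using NLTK; the example strings are invented, and real evaluations average scores over a full test set.

```python
# Minimal sketch: sentence-level BLEU for one generated dialogue reply.
# The reference/candidate strings are illustrative assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "i am doing well , thanks for asking".split()
candidate = "i am fine , thanks".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short dialogue utterances.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```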
(Image credit: Papersgraph)
These leaderboards are used to track progress in Dialogue Generation
Use these libraries to find Dialogue Generation models and implementations
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts into a hard segment explicitly.
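The soft search described here is additive attention. A minimal NumPy sketch of the mechanism follows; the shapes, random weights, and variable names are illustrative assumptions, not the paper's implementation.

```python
# Additive (Bahdanau-style) soft attention: score every encoder state
# against the decoder state, softmax-normalize, and take the weighted sum
# as the context vector fed to the decoder.
import numpy as np

rng = np.random.default_rng(0)
T, h = 6, 8                            # source length, hidden size (assumed)
enc_states = rng.normal(size=(T, h))   # encoder annotations h_1..h_T
dec_state = rng.normal(size=(h,))      # previous decoder state s_{t-1}

W_a = rng.normal(size=(h, h))
U_a = rng.normal(size=(h, h))
v_a = rng.normal(size=(h,))

# e_tj = v_a^T tanh(W_a s_{t-1} + U_a h_j): one relevance score per position
scores = np.tanh(dec_state @ W_a + enc_states @ U_a) @ v_a
weights = np.exp(scores - scores.max())
weights /= weights.sum()               # softmax -> soft "search" over source
context = weights @ enc_states         # expected annotation (context vector)
print(weights.round(3), context.shape)
```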
A new approach to generative data-driven dialogue systems (e.g. chatbots), called TransferTransfo, is introduced: a combination of a transfer-learning-based training scheme and a high-capacity Transformer model, which shows strong improvements over the current state-of-the-art end-to-end conversational models.
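The general recipe behind such systems is to take a pretrained Transformer language model and fine-tune it on dialogue text. A hedged sketch using the Hugging Face transformers library follows; this is not the authors' code, and the model choice and toy training string are assumptions.

```python
# Sketch: fine-tune a pretrained Transformer LM on dialogue text.
# Model name ("gpt2") and the single toy turn are illustrative assumptions;
# real training concatenates persona, history, and reply with separators.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

text = "User: how are you? Bot: doing great, thanks!"
batch = tokenizer(text, return_tensors="pt")

model.train()
loss = model(**batch, labels=batch["input_ids"]).loss  # LM cross-entropy
loss.backward()
optimizer.step()
print(f"fine-tuning loss: {loss.item():.3f}")
```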
This work collects data and trains models to condition on their given profile information, and on information about the person they are talking to, resulting in improved dialogues, as measured by next-utterance prediction.
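Conditioning on a profile typically amounts to prepending persona sentences to the dialogue context before encoding. A toy illustration, with invented strings and an assumed separator token:

```python
# Toy illustration of persona conditioning: prepend profile sentences to
# the dialogue history so the model input is conditioned on the persona.
# The "<sep>" token and all strings are assumptions for illustration.
persona = ["i love hiking.", "i have two dogs."]
history = ["hi! what do you do for fun?"]

model_input = " ".join(persona) + " <sep> " + " <sep> ".join(history)
print(model_input)
```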
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
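The underlying training signal is REINFORCE: sample a response from the policy, score it with a hand-crafted reward, and scale the log-likelihood gradient by that reward. A minimal sketch follows; the tiny vocabulary and the toy reward are assumptions standing in for the paper's conversational properties.

```python
# REINFORCE sketch: loss = -log p(action) * reward. The toy reward
# penalizes dull-response tokens, a stand-in for informativity.
import torch
import torch.nn.functional as F

vocab = ["i", "don't", "know", "love", "hiking", "<eos>"]
logits = torch.zeros(len(vocab), requires_grad=True)  # stand-in policy

probs = F.softmax(logits, dim=-1)
dist = torch.distributions.Categorical(probs)
token = dist.sample()                       # sample one "response" token
word = vocab[token.item()]

reward = -1.0 if word in {"i", "don't", "know"} else 1.0

loss = -dist.log_prob(token) * reward       # policy gradient surrogate loss
loss.backward()
print(word, reward, logits.grad)
```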
This work applies adversarial training to open-domain dialogue generation, training a system to produce sequences that are indistinguishable from human-generated dialogue utterances, and investigates models for adversarial evaluation that use success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls.
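Adversarial evaluation boils down to training a binary discriminator on human versus machine replies and measuring how often generated replies are misclassified as human. A sketch with toy data follows; the bag-of-words features and logistic regression classifier are assumptions, not the paper's setup.

```python
# Sketch of adversarial evaluation: the generator's "score" is the
# fraction of its replies the discriminator mistakes for human text.
# The example sentences are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

human = ["sounds great, see you at eight", "i grew up near the coast"]
machine = ["i don't know", "i don't know what you mean"]

texts = human + machine
labels = [1, 1, 0, 0]                 # 1 = human, 0 = machine

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# Fooling rate: machine replies classified as human.
preds = clf.predict(vec.transform(machine))
print("fooling rate:", (preds == 1).mean())
```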
The Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines, contains about 13,000 utterances from 1,433 dialogues from the TV series Friends and shows the importance of contextual and multimodal information for emotion recognition in conversations.
This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy retraining of the full model.
A new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at multiple levels of abstraction, is introduced; these models outperform competing models by a substantial margin and generate more fluent, relevant, and goal-oriented responses.
An enhanced multi-flow sequence-to-sequence pre-training and fine-tuning framework named ERNIE-GEN is proposed, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method to make generation closer to human writing patterns.
An empirical study indicates that automated metrics such as BLEU correlate more strongly with human judgments in the task-oriented setting than has been observed in the non-task-oriented setting.
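Such metric-human correlation studies compute Pearson or Spearman correlation between per-response metric scores and human ratings. A minimal sketch follows; the score arrays are fabricated toy numbers, not results from the study.

```python
# Sketch: correlating automated metric scores with human ratings.
# Both arrays below are invented illustrative values.
from scipy.stats import pearsonr, spearmanr

bleu_scores = [0.12, 0.31, 0.05, 0.44, 0.27]   # toy per-response BLEU
human_scores = [2.0, 4.0, 1.5, 4.5, 3.0]       # toy human quality ratings

r, _ = pearsonr(bleu_scores, human_scores)
rho, _ = spearmanr(bleu_scores, human_scores)
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}")
```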
Adding a benchmark result helps the community track progress.