3260 papers • 126 benchmarks • 313 datasets
Dialogue generation is the natural language processing task of "understanding" natural language inputs in order to produce conversational output. Such systems are usually intended to converse with humans, for instance in back-and-forth exchanges with a conversational agent such as a chatbot. Example benchmarks for this task (see others under Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated with metrics such as BLEU, ROUGE, and METEOR, though these correlate only weakly with human judgement, a shortcoming that newer metrics such as UnSupervised and Reference-free (USR) and the Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address.
(Image credit: Papersgraph)
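As a concrete illustration of the word-overlap metrics mentioned above, here is a minimal sketch of sentence-level BLEU scoring with NLTK. The example strings are invented; in practice corpus-level BLEU with multiple references is preferred, and, as noted, overlap metrics correlate weakly with human judgement on dialogue.

```python
# Minimal sentence-level BLEU scoring with NLTK (illustrative example strings).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "i am doing well thanks for asking".split()
candidate = "i am doing fine thank you".split()

# Smoothing avoids zero scores when a higher-order n-gram has no overlap,
# which is common for short dialogue responses.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```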
These leaderboards are used to track progress in Dialogue Generation
Use these libraries to find Dialogue Generation models and implementations
This work collects data and trains models to condition on their given profile information and on information about the person they are talking to, resulting in improved dialogues as measured by next-utterance prediction.
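Below is a toy sketch of that next-utterance-prediction setup: candidate replies are ranked against a persona-conditioned context. The hashing "encoder" and the example texts are stand-in assumptions; the actual work trains neural encoders on the collected data.

```python
# Toy next-utterance prediction: rank candidate replies against a
# persona-conditioned context and report hits@1.
import numpy as np

def encode(text, dim=64):
    """Hash each token into a fixed-size bag-of-words vector (toy encoder)."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

persona = "i love hiking . i have two dogs ."
history = "what do you do on weekends ?"
context = encode(persona + " " + history)

candidates = [
    "i usually take my dogs hiking .",          # gold next utterance
    "the stock market closed higher today .",
    "pizza is my favorite food .",
]
scores = [context @ encode(c) for c in candidates]
ranked = np.argsort(scores)[::-1]
print("top candidate:", candidates[ranked[0]])
print("hits@1:", int(ranked[0] == 0))
```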
A new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo is introduced: it combines a transfer-learning training scheme with a high-capacity Transformer model, and it shows strong improvements over current state-of-the-art end-to-end conversational models.
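A rough sketch of the kind of input representation this style of fine-tuning relies on: persona, dialogue history, and reply are flattened into a single token sequence with per-token speaker segments. The special-token names below are illustrative assumptions, not the paper's exact vocabulary.

```python
# Build a single flat input sequence with speaker segments, in the spirit of
# TransferTransfo-style fine-tuning (token names are illustrative).
def build_input(persona, history, reply):
    """Flatten persona and dialogue turns into tokens plus segment labels."""
    turns = history + [reply]
    tokens = ["<bos>"] + persona
    segments = ["<persona>"] * len(tokens)
    for i, turn in enumerate(turns):
        # The reply is always the bot's turn; alternate speakers back from it.
        speaker = "<bot>" if (len(turns) - 1 - i) % 2 == 0 else "<user>"
        tokens += [speaker] + turn
        segments += [speaker] * (len(turn) + 1)
    tokens.append("<eos>")
    segments.append("<bot>")
    return tokens, segments

persona = ["i", "like", "playing", "football", "."]
history = [["hello", "how", "are", "you", "?"],
           ["i", "am", "fine", "thanks", "."]]
tokens, segments = build_input(persona, history, ["do", "you", "like", "sports", "?"])
print(" ".join(tokens))
```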
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
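The proposed (soft-)search is additive attention: the previous decoder state scores every encoder annotation via e_j = v_a^T tanh(W_a s_{i-1} + U_a h_j), and a softmax over the scores yields a context vector. A NumPy sketch with illustrative dimensions:

```python
# Additive (soft-search) attention over encoder annotations, in NumPy.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                      # source length, hidden size
H = rng.normal(size=(T, d))      # encoder annotations h_1..h_T
s_prev = rng.normal(size=d)      # previous decoder state s_{i-1}

W_a = rng.normal(size=(d, d))    # projects the decoder state
U_a = rng.normal(size=(d, d))    # projects the encoder annotations
v_a = rng.normal(size=d)         # scoring vector

# e_j = v_a^T tanh(W_a s_{i-1} + U_a h_j), computed for all j at once
e = np.tanh(s_prev @ W_a + H @ U_a) @ v_a
alpha = np.exp(e - e.max())
alpha /= alpha.sum()             # soft alignment weights, sum to 1
context = alpha @ H              # expected annotation c_i
print("alignment weights:", np.round(alpha, 3))
```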
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
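Schematically, each sampled response receives a combined reward, and a REINFORCE-style update scales its log-probability by that reward. The component reward functions below are simplified stand-ins for the paper's formulations:

```python
# Schematic REINFORCE step: reward a sampled response for the three
# conversational properties (toy reward functions, invented examples).
def ease_of_answering(resp, dull_set):
    """Penalize responses matching a list of dull utterances (toy: exact match)."""
    return 0.0 if resp in dull_set else 1.0

def information_flow(resp, prev_resp):
    """Penalize repeating the previous turn (toy: Jaccard token overlap)."""
    a, b = set(resp.split()), set(prev_resp.split())
    return 1.0 - len(a & b) / max(len(a | b), 1)

def coherence(resp, context):
    """Toy proxy for mutual information between context and response."""
    a, b = set(resp.split()), set(context.split())
    return len(a & b) / max(len(a), 1)

dull = {"i don't know"}
context, prev = "where are you going tonight ?", "i am going to the cinema"
sampled = "which film are you seeing ?"

r = (ease_of_answering(sampled, dull)
     + information_flow(sampled, prev)
     + coherence(sampled, context)) / 3.0
log_prob = -4.2                       # log p(sampled | context) from the policy
policy_gradient_loss = -r * log_prob  # minimize to reinforce high-reward turns
print(f"reward={r:.2f} loss={policy_gradient_loss:.2f}")
```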
This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.
This work applies adversarial training to open-domain dialogue generation, training a system to produce sequences that are indistinguishable from human-generated dialogue utterances, and investigates models for adversarial evaluation that use success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls.
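A sketch of the adversarial-evaluation idea: a trained discriminator estimates the probability that a (context, response) pair is human-generated, and a generator is scored by how often it fools that discriminator. The classifier below is a stub, not a trained model:

```python
# Adversarial evaluation sketch: score a generator by its success rate in
# fooling a human-vs-machine discriminator (stub classifier, toy data).
def discriminator(context, response):
    """Stub classifier: less generic replies look more 'human'."""
    generic = {"i don't know .", "yes .", "okay ."}
    return 0.2 if response in generic else 0.7

machine_pairs = [
    ("how was your day ?", "pretty good , i went for a run ."),
    ("any plans tonight ?", "i don't know ."),
]
adv_success = sum(
    discriminator(c, r) > 0.5 for c, r in machine_pairs
) / len(machine_pairs)
print(f"adversarial success rate: {adv_success:.2f}")
```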
The Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines, contains about 13,000 utterances from 1,433 dialogues from the TV series Friends and shows the importance of contextual and multimodal information for emotion recognition in conversations.
It is shown that automatic metrics provide better guidance than humans in discriminating system-level performance on Text Summarization and Controlled Generation tasks, and that a multi-aspect human-aligned metric (UniEval) is not necessarily dominant over single-aspect human-aligned metrics (CTC, CtrlEval) and task-agnostic metrics (BLEU, BERTScore), particularly on Controlled Generation tasks.
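The system-level analysis boils down to correlating per-system metric averages with per-system human ratings. A minimal sketch using Kendall's tau; the scores below are invented for illustration:

```python
# Correlate per-system metric averages with per-system human ratings.
from scipy.stats import kendalltau

# One average score per system, e.g. from five summarization systems.
human  = [3.1, 3.8, 2.9, 4.2, 3.5]
metric = [0.24, 0.29, 0.19, 0.33, 0.21]  # e.g. BLEU or BERTScore averages

tau, p_value = kendalltau(human, metric)
print(f"system-level Kendall tau = {tau:.2f} (p = {p_value:.3f})")
```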
A new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at multiple levels of abstraction, is introduced; these models outperform competing models by a substantial margin and generate more fluent, relevant, and goal-oriented responses.
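Schematically, generation proceeds in two passes: a coarse sequence of high-level tokens is emitted first, and the word-level response is then generated conditioned on both the context and that coarse plan. Both "models" below are stubs that only show the pipeline shape:

```python
# Two-level (multiresolution) generation pipeline, with stub decoders.
def coarse_decoder(context):
    """Stub for the high-level decoder (e.g., emits key nouns/activities)."""
    return ["install", "driver", "nvidia"]

def fine_decoder(context, coarse):
    """Stub for the word-level decoder conditioned on the coarse plan."""
    return "you can install the nvidia driver from the restricted repository ."

context = "my graphics card is not detected after the upgrade"
coarse = coarse_decoder(context)
response = fine_decoder(context, coarse)
print("coarse plan:", coarse)
print("response:  ", response)
```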
An end-to-end multi-turn proactive dialogue generation agent was built with the aid of data augmentation techniques and variant encoder-decoder structure designs, and a rank-based ensemble approach was developed to boost performance.
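One common form of rank-based ensembling, sketched below under the assumption that all models score a shared candidate list: each model's scores are converted to ranks, and the candidate with the best average rank wins. All values are illustrative:

```python
# Rank-based ensemble: average each candidate's rank across models.
import numpy as np

candidates = ["sure , what topic ?", "i don't know .", "tell me more about it ."]
# Per-model scores for each candidate (higher is better); 3 models x 3 candidates.
model_scores = np.array([
    [0.8, 0.1, 0.6],
    [0.5, 0.2, 0.9],
    [0.7, 0.3, 0.6],
])

# Convert scores to ranks within each model (0 = best), then average.
ranks = np.argsort(np.argsort(-model_scores, axis=1), axis=1)
avg_rank = ranks.mean(axis=0)
best = candidates[int(np.argmin(avg_rank))]
print("ensemble pick:", best)
```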
Adding a benchmark result helps the community track progress.