3260 papers • 126 benchmarks • 313 datasets
A chatbot, or conversational AI, is a language model designed and implemented to converse with humans. Source: Open Data Chatbot
These leaderboards are used to track progress in Chatbot
No benchmarks available.
Use these libraries to find Chatbot models and implementations
The end-to-end system not only outperforms modularized dialogue system baselines in both objective and subjective evaluation, but is also robust to noise, as demonstrated by several systematic experiments with different error granularities and rates specific to the language understanding module.
A retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and evaluated on metrics such as mean-reciprocal-rank of human response, and a family of neural encoder-decoder models, which outperform a number of sophisticated baselines.
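For concreteness, the mean-reciprocal-rank metric used in such retrieval-based protocols can be computed as in this minimal Python sketch; the `rankings` input format is an assumption for illustration, not the Visual Dialog evaluation API.

```python
def mean_reciprocal_rank(rankings):
    """rankings: list where each element is the 1-based rank the model
    assigned to the human response among the candidate answers."""
    return sum(1.0 / r for r in rankings) / len(rankings)

# Example: human response ranked 1st, 3rd, and 2nd across three dialogs.
print(mean_reciprocal_rank([1, 3, 2]))  # (1 + 1/3 + 1/2) / 3 ≈ 0.611
```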
QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA, and current chatbot benchmarks cannot be trusted to accurately evaluate chatbot performance.
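A minimal QLoRA-style finetuning setup with the Hugging Face `transformers` and `peft` libraries might look like the sketch below; the base checkpoint, adapter rank, and other hyperparameters are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint, not the paper's
    quantization_config=bnb_config,
)

# Trainable low-rank adapters on top of the quantized weights (the "LoRA" part).
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train
```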
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
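As a rough sketch of the underlying policy-gradient idea (REINFORCE over a sampled reply, weighted by a scalar conversational reward), assuming a placeholder reward and fake log-probabilities rather than the paper's actual models:

```python
import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    """REINFORCE: scale the sampled reply's log-likelihood by the
    (baselined) scalar reward, so higher-reward replies become more likely.

    log_probs: 1-D tensor of per-token log-probabilities of the sampled
               reply, produced by the dialogue policy (carries gradients).
    reward:    scalar combining signals such as informativity,
               coherence, and ease of answering.
    """
    return -(reward - baseline) * log_probs.sum()

# Toy example with fake probabilities; a real policy network would produce these.
log_probs = torch.log(torch.tensor([0.4, 0.6, 0.5, 0.3], requires_grad=True))
loss = reinforce_loss(log_probs, reward=0.7, baseline=0.5)
loss.backward()  # gradients flow back into the policy parameters
```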
The results reveal that strong LLM judges like GPT-4 can match both controlled and crowdsourced human preferences well, achieving over 80% agreement (the same level of agreement as between humans), and that LLM-as-a-judge is a scalable and explainable way to approximate human preferences, which are otherwise very expensive to obtain.
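A pairwise LLM-as-a-judge setup can be sketched as follows; `call_llm` is a hypothetical stand-in for a chat-completion client, and the prompt wording is illustrative, not the paper's exact template.

```python
JUDGE_TEMPLATE = """You are an impartial judge. Given a user question and two
assistant answers, decide which answer is better and explain briefly.
Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Reply with exactly one of [[A]], [[B]], or [[tie]], then your reasoning."""

def judge_pair(question, answer_a, answer_b, call_llm):
    """call_llm: hypothetical str -> str function backed by a strong LLM."""
    verdict = call_llm(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    for label in ("[[A]]", "[[B]]", "[[tie]]"):
        if label in verdict:
            return label.strip("[]"), verdict
    return "unparsed", verdict
```

Because LLM judges exhibit position bias, each pair is typically judged twice with the answer order swapped and the two verdicts reconciled.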
Human evaluations show the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements, and the limitations of this work are discussed by analyzing failure cases of the models.
This paper introduces the use of Semantic Hashing as embedding for the task of Intent Classification and achieves state-of-the-art performance on three frequently used benchmarks.
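The core trick, hashing n-grams of the utterance into a fixed-size vector that an ordinary classifier consumes, might look roughly like this sketch; the trigram choice, vector size, and downstream classifier are assumptions, not the paper's exact configuration.

```python
import numpy as np
from hashlib import md5

def semantic_hash(text, n=3, dim=1024):
    """Hash character trigrams of the utterance into a fixed-size binary vector."""
    text = f"#{text.lower()}#"  # pad so boundary trigrams are captured
    vec = np.zeros(dim, dtype=np.float32)
    for i in range(len(text) - n + 1):
        bucket = int(md5(text[i:i + n].encode()).hexdigest(), 16) % dim
        vec[bucket] = 1.0
    return vec

# These vectors then feed an ordinary classifier (e.g. logistic regression)
# over intent labels; the utterance below is purely illustrative.
x = semantic_hash("book me a flight to Boston")
```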
To build a high-quality open-domain chatbot, this work introduces the effective training process of PLATO-2 via curriculum learning, achieving new state-of-the-art results.
This work introduces Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency, which leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost.
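A causal sliding-window attention mask of the kind SWA uses, where each token attends only to the previous `w` positions, can be built as in this minimal NumPy sketch; the tiny window here is for illustration (Mistral 7B uses a 4096-token window).

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """True where query position q may attend to key position k:
    causal (k <= q) and within the last `window` tokens (q - k < window)."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    return (k <= q) & (q - k < window)

# Tiny example: 6 tokens, window of 3.
print(sliding_window_mask(6, 3).astype(int))
# Each row has at most `window` ones, so per-token attention cost stays
# O(window) while stacked layers still propagate information across the
# full sequence.
```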
Adding a benchmark result helps the community track progress.