Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning (2018-11-19)

TL;DR

By combining a switcher with active learning, the new framework, Switch-based Active Deep Dyna-Q (Switch-DDQ), significantly improves over DDQ and Q-learning baselines in both simulation and human evaluations.

Abstract

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and can therefore boost training efficiency using simulated experiences generated by that world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model or, implicitly, on the pre-specified ratio of real to simulated experiences used for Q-learning. To address this, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or a simulated experience for Q-learning. Furthermore, we explore the use of active learning to improve sample efficiency, by encouraging the world model to generate simulated experiences in regions of the state-action space that the agent has not (fully) explored. Our results show that, by combining the switcher with active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement over DDQ and Q-learning baselines in both simulation and human evaluations.
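As a rough illustration of the idea the abstract describes, the sketch below runs tabular Dyna-Q on a toy five-state chain and adds a simple gate that decides when simulated experience may be used. Everything in it is an assumption made for illustration: the chain environment, the names `env_step`, `q_update`, `greedy`, and `train`, and in particular the accuracy-threshold rule standing in for the switcher. It is not the paper's switcher, world model, or active-learning component, and it uses lookup tables rather than deep networks.

```python
import random
from collections import defaultdict

N_STATES, GOAL, ACTIONS = 5, 4, (0, 1)  # toy chain; action 0 = left, 1 = right

def env_step(state, action):
    """Real environment (hypothetical): deterministic chain, reward 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One standard Q-learning backup on the table q."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])

def greedy(q, s, rng):
    """Greedy action with random tie-breaking."""
    best = max(q[(s, b)] for b in ACTIONS)
    return rng.choice([b for b in ACTIONS if q[(s, b)] == best])

def train(episodes=200, threshold=0.8, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}           # learned world model: (s, a) -> (r, s2, done)
    hits = trials = 0    # how often the model predicted the real outcome
    real_updates = sim_updates = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = env_step(s, a)
            # Score the model's prediction before overwriting it.
            if (s, a) in model:
                trials += 1
                hits += model[(s, a)] == (r, s2, done)
            model[(s, a)] = (r, s2, done)
            # Direct reinforcement learning from the real experience.
            q_update(q, s, a, r, s2, done)
            real_updates += 1
            # Assumed stand-in for a switcher: plan with simulated experience
            # only while the model's measured accuracy clears the threshold.
            if trials and hits / trials >= threshold:
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
                sim_updates += 1
            s = s2
    return q, real_updates, sim_updates
```

Calling `train()` returns the Q-table together with counts of real and simulated updates, so the effect of changing `threshold` on how much planning occurs can be inspected directly. Because this toy environment is deterministic, the learned model is always accurate and the gate stays open once the model has been scored at least once; a noisy environment would be needed to see it close.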

Authors

Yuexin Wu

Yiming Yang

Jianfeng Gao

References (28 items)

Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

Neural Approaches to Conversational AI

Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning

A Tutorial on Thompson Sampling

End-to-End Task-Completion Neural Dialogue Systems

A User Simulator for Task-Completion Dialogues

Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Neural Belief Tracker: Data-Driven Dialogue State Tracking

Continuously Learning Neural Dialogue Management

Mastering the game of Go with deep neural networks and tree search

Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

Domain-Adversarial Training of Neural Networks

Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Human-level control through deep reinforcement learning

POMDP-Based Statistical Spoken Dialog Systems: A Review

A survey on metrics for the evaluation of user simulations

Rectified Linear Units Improve Restricted Boltzmann Machines

Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System

Finite-time Analysis of the Multiarmed Bandit Problem

Learning dialogue strategies within the Markov decision process framework

Long Short-Term Memory

Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking

and Hastie

Rmsprop: Divide the gradient by a running average of its recent magnitude. Neural networks for machine learning, Coursera lecture 6e

and Schmidhuber

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
