Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning (2018-01-18)

TL;DR

Deep Dyna-Q is, to the authors' knowledge, the first deep RL framework to integrate planning into task-completion dialogue policy learning. It builds into the dialogue agent a model of the environment, called the world model, that mimics real user responses and generates simulated experience.

Abstract

Training a task-completion dialogue agent via reinforcement learning (RL) is costly because it requires many interactions with real users. One common alternative is to use a user simulator. However, a user simulator usually lacks the language complexity of human interlocutors and the biases in its design may tend to degrade the agent. To address these issues, we present Deep Dyna-Q, which to our knowledge is the first deep RL framework that integrates planning for task-completion dialogue policy learning. We incorporate into the dialogue agent a model of the environment, referred to as the world model, to mimic real user response and generate simulated experience. During dialogue policy learning, the world model is constantly updated with real user experience to approach real user behavior, and in turn, the dialogue agent is optimized using both real experience and simulated experience. The effectiveness of our approach is demonstrated on a movie-ticket booking task in both simulated and human-in-the-loop settings.
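
The loop described in the abstract (learn from real experience, update the world model with that experience, then learn again from experience the world model simulates) can be illustrated with a minimal tabular Dyna-Q sketch. This is not the authors' implementation: it swaps the deep networks for lookup tables and the real user for a toy chain environment. The names ChainEnv, dyna_q, and planning_steps, and all hyperparameter values, are assumptions made for illustration.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical stand-in for the real user: a 5-state chain.

    Reaching the last state ends the episode with reward 1.
    """
    n_states = 5
    actions = (0, 1)  # 0 = move left, 1 = move right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        if a == 1:
            self.s = min(self.s + 1, self.n_states - 1)
        else:
            self.s = max(self.s - 1, 0)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           eps=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)], initialised to 0
    world_model = {}         # (state, action) -> (next_state, reward, done)

    def greedy(s):
        # Break ties at random so the untrained agent still explores.
        best = max(q[(s, a)] for a in env.actions)
        return rng.choice([a for a in env.actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in env.actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = rng.choice(env.actions) if rng.random() < eps else greedy(s)
            s2, r, done = env.step(a)
            update(s, a, r, s2, done)            # direct RL on real experience
            world_model[(s, a)] = (s2, r, done)  # world model learning
            for _ in range(planning_steps):      # planning on simulated experience
                ps, pa = rng.choice(list(world_model))
                ps2, pr, pdone = world_model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            if done:
                break
            s = s2
    return q
```

Calling dyna_q(ChainEnv()) returns the learned Q-table. Raising planning_steps trades extra computation for fewer real interactions, which is the trade-off the abstract motivates; in this toy the world model is an exact lookup table, whereas a learned model of real users would be approximate.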

Authors

Baolin Peng

Xiujun Li

Jianfeng Gao
