3260 papers • 126 benchmarks • 313 datasets
Imitation Learning is a framework for learning a behavior policy from demonstrations. Demonstrations are usually presented as state-action trajectories, with each pair indicating the action to take in the state being visited. The demonstrated actions are typically used in one of several ways. The first, known as Behavior Cloning (BC), treats each action as the target label for its state and learns a generalized mapping from states to actions in a supervised manner. Another, known as Inverse Reinforcement Learning (IRL), views the demonstrated actions as a sequence of decisions and aims to find a reward/cost function under which those decisions are optimal. Finally, a newer methodology, Inverse Q-Learning, aims to learn Q-functions directly from expert data, implicitly representing rewards, under which the optimal policy is given as a Boltzmann distribution, as in soft Q-learning.
Source: Learning to Imitate
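As a concrete illustration of the BC route, here is a minimal sketch that treats demonstrated actions as supervised labels. The dimensions and the random placeholder data are assumptions for illustration, not any particular paper's setup.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4  # assumed dimensions, for illustration only

# A small policy network mapping states to action logits.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder demonstrations: in practice these come from an expert.
demo_states = torch.randn(256, STATE_DIM)
demo_actions = torch.randint(0, N_ACTIONS, (256,))

for epoch in range(10):
    logits = policy(demo_states)
    loss = loss_fn(logits, demo_actions)  # the action is the target label
    opt.zero_grad()
    loss.backward()
    opt.step()
```

IRL and Inverse Q-Learning replace this supervised objective with an inferred reward or Q-function, as described above.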
These leaderboards are used to track progress in Imitation Learning.
No benchmarks available.
Use these libraries to find Imitation Learning models and implementations
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed; a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
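A minimal sketch of that adversarial instantiation: a discriminator is trained to tell expert state-action pairs from policy pairs, and its output is turned into a reward for the RL step. The sign convention, dimensions, and network sizes here are assumptions; implementations differ on these details.

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 8, 2  # assumed dimensions

disc = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.Tanh(),
                     nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def disc_step(expert_sa, policy_sa):
    # Expert pairs labeled 1, policy pairs labeled 0.
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = (bce(logits_e, torch.ones_like(logits_e)) +
            bce(logits_p, torch.zeros_like(logits_p)))
    opt.zero_grad(); loss.backward(); opt.step()

def imitation_reward(sa):
    # Surrogate reward for the policy's RL update: higher when the
    # discriminator believes the state-action pair looks expert-like.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(sa)) + 1e-8)
```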
This work evaluates different architectures for conditional imitation learning in vision-based driving, and conducts experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck trained to drive in a residential area.
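One way to realize the command-conditioned architectures compared in that work is a "branched" network: a shared backbone with one action head per high-level navigation command. The sketch below is an assumed, simplified layout (feature and action dimensions are illustrative), not the paper's exact model.

```python
import torch
import torch.nn as nn

N_COMMANDS, FEAT_DIM, ACT_DIM = 4, 128, 2  # e.g. left/right/straight/follow

class BranchedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(512, FEAT_DIM), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(FEAT_DIM, ACT_DIM) for _ in range(N_COMMANDS)])

    def forward(self, obs_feats, command):
        # obs_feats: (B, 512) precomputed image features; command: (B,) long.
        h = self.backbone(obs_feats)
        out = torch.stack([head(h) for head in self.heads], dim=1)  # (B, C, A)
        idx = command.view(-1, 1, 1).expand(-1, 1, ACT_DIM)
        return out.gather(1, idx).squeeze(1)  # only the commanded head is used
```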
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small amounts of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.
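The supervised component DQfD applies to demonstration transitions is a large-margin loss that pushes the demonstrated action's Q-value above all others by a margin. A sketch under assumed shapes (the margin value here is illustrative):

```python
import torch

def large_margin_loss(q_values, expert_actions, margin=0.8):
    # q_values: (B, n_actions); expert_actions: (B,) long.
    # J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is 0 at the
    # expert action and `margin` everywhere else.
    margins = torch.full_like(q_values, margin)
    margins.scatter_(1, expert_actions.unsqueeze(1), 0.0)
    augmented_max = (q_values + margins).max(dim=1).values
    q_expert = q_values.gather(1, expert_actions.unsqueeze(1)).squeeze(1)
    return (augmented_max - q_expert).mean()
```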
This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), in which the agent first acquires experience in a self-supervised fashion to develop a model, which is then used to learn a particular task by observing an expert perform that task, without knowledge of the specific actions taken.
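A compact sketch of the two phases with toy tensors: an inverse dynamics model is fit on the agent's own transitions, then used to infer actions for state-only demonstrations, which are cloned as in standard BC. All dimensions and the random placeholder data are assumptions.

```python
import torch
import torch.nn as nn

S, A = 6, 3  # assumed state dimension / number of discrete actions
xent = nn.CrossEntropyLoss()

# Phase 1: fit an inverse dynamics model p(a | s, s') on self-collected
# transitions (random data stands in for the agent's rollouts).
inv_model = nn.Sequential(nn.Linear(2 * S, 64), nn.ReLU(), nn.Linear(64, A))
inv_opt = torch.optim.Adam(inv_model.parameters(), lr=1e-3)
own_s, own_s2 = torch.randn(512, S), torch.randn(512, S)
own_a = torch.randint(0, A, (512,))
for _ in range(100):
    loss = xent(inv_model(torch.cat([own_s, own_s2], dim=1)), own_a)
    inv_opt.zero_grad(); loss.backward(); inv_opt.step()

# Phase 2: infer the expert's unobserved actions from state-only demos,
# then behavior-clone against the inferred labels.
demo_s, demo_s2 = torch.randn(128, S), torch.randn(128, S)
with torch.no_grad():
    inferred_a = inv_model(torch.cat([demo_s, demo_s2], dim=1)).argmax(dim=1)
policy = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, A))
pol_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(100):
    loss = xent(policy(demo_s), inferred_a)
    pol_opt.zero_grad(); loss.backward(); pol_opt.step()
```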
This work proposes a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck, and demonstrates that the proposed variational discriminator bottleneck (VDB) leads to significant improvements across three distinct application areas for adversarial learning algorithms.
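The bottleneck amounts to a KL penalty that keeps the discriminator encoder's posterior close to a unit Gaussian prior, with a dual update on the multiplier. A sketch in which the information target i_c and the dual step size are illustrative assumptions:

```python
import torch

def kl_to_unit_gaussian(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()

def vdb_penalty(mu, logvar, beta, i_c=0.5, beta_lr=1e-5):
    # Penalty added to the discriminator loss; beta rises when the
    # encoder carries more than i_c nats of information, capping flow.
    kl = kl_to_unit_gaussian(mu, logvar)
    penalty = beta * (kl - i_c)
    new_beta = max(0.0, beta + beta_lr * (kl.item() - i_c))  # dual ascent
    return penalty, new_beta
```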
This work proposes a simple alternative, called soft Q imitation learning (SQIL), that still uses RL but does not require learning a reward function, and can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm.
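The modification is small enough to show in full: demonstration transitions enter the replay buffer with reward 1, the agent's own transitions with reward 0, and the update rule itself is untouched. A sketch, assuming transitions come as (s, a, s', done) tuples:

```python
def fill_sqil_buffer(replay, demo_transitions, agent_transitions):
    # SQIL reward relabeling: expert data is stored with reward 1,
    # online experience with reward 0; no reward function is learned.
    for (s, a, s_next, done) in demo_transitions:
        replay.append((s, a, 1.0, s_next, done))
    for (s, a, s_next, done) in agent_transitions:
        replay.append((s, a, 0.0, s_next, done))
    return replay

# Minibatches sampled from `replay` then feed an otherwise unmodified
# soft Q-learning (or off-policy actor-critic) update.
```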
A new graph convolutional neural network model is proposed for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs.
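The representation in question encodes a MILP min c^T x s.t. Ax <= b as a bipartite graph with one node per constraint, one node per variable, and an edge wherever a coefficient is nonzero. A sketch with minimal, assumed node and edge features:

```python
import numpy as np

def milp_to_bipartite(A, b, c):
    # One node per constraint (row of A) and per variable (column of A);
    # an edge (i, j) wherever A[i, j] != 0, carrying the coefficient.
    cons_feats = b.reshape(-1, 1)   # minimal per-constraint features
    var_feats = c.reshape(-1, 1)    # minimal per-variable features
    rows, cols = np.nonzero(A)
    edge_index = np.stack([rows, cols])        # shape (2, n_edges)
    edge_feats = A[rows, cols].reshape(-1, 1)
    return cons_feats, var_feats, edge_index, edge_feats
```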
This work introduces a method for dynamics-aware IL that avoids adversarial training by learning a single Q-function which implicitly represents both reward and policy; it achieves state-of-the-art results in offline and online imitation learning settings, significantly outperforming existing methods in both the number of required environment interactions and scalability to high-dimensional spaces.
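Two pieces make the single-Q formulation concrete: the policy is read off the learned Q-function as a Boltzmann distribution (as in soft Q-learning), and a reward is implicitly recovered through the soft Bellman equation. A sketch with an assumed temperature:

```python
import torch

def boltzmann_policy(q_values, alpha=1.0):
    # pi(a | s) proportional to exp(Q(s, a) / alpha); q_values: (B, n_actions).
    return torch.softmax(q_values / alpha, dim=-1)

def soft_value(q_values, alpha=1.0):
    # V(s) = alpha * logsumexp_a Q(s, a) / alpha
    return alpha * torch.logsumexp(q_values / alpha, dim=-1)

def implied_reward(q_sa, v_next, gamma=0.99):
    # Reward implicitly represented by Q via the soft Bellman equation:
    # r(s, a) = Q(s, a) - gamma * V(s')
    return q_sa - gamma * v_next
```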
A new algorithm is proposed that can infer the latent structure of expert demonstrations in an unsupervised way, built on top of Generative Adversarial Imitation Learning, and can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations.
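One common way to expose latent structure in this setting is to condition the policy on a latent code and train a posterior network to recover that code from behavior, rewarding trajectories that stay identifiable. The sketch below shows such a mutual-information bonus under assumed dimensions; it illustrates the general mechanism, not the paper's exact objective.

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, N_CODES = 8, 2, 3  # assumed dimensions / behavior modes

posterior = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                          nn.Linear(64, N_CODES))

def info_bonus(sa, code):
    # log q(z | s, a): added to the policy's reward so that trajectories
    # remain identifiable from the latent code that generated them.
    log_probs = torch.log_softmax(posterior(sa), dim=-1)
    return log_probs.gather(1, code.unsqueeze(1)).squeeze(1)
```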