This paper proposes a simple algorithm in which an agent continually relabels and imitates the trajectories it generates, progressively learning goal-reaching behaviors from scratch. It formally shows that this iterated supervised learning procedure optimizes a bound on the RL objective, derives performance bounds for the learned policy, and empirically demonstrates improved goal-reaching performance and robustness over current RL algorithms on several benchmark tasks.
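The relabel-and-imitate loop can be sketched roughly as follows. This is a minimal illustration, not the paper's actual method: the 1-D chain environment, tabular count-based policy, and horizon are all assumptions introduced here, and the hindsight relabeling simply treats every state visited later in a trajectory as a goal that the earlier (state, action) pair reached.

```python
import random
from collections import defaultdict

# Illustrative toy setup (assumed, not from the paper): a 1-D chain of
# states with left/right actions and a fixed rollout horizon.
N_STATES, HORIZON, ACTIONS = 8, 12, (-1, 1)

# Goal-conditioned tabular policy: (state, goal) -> per-action imitation
# counts, initialized to 1 for Laplace smoothing.
counts = defaultdict(lambda: {a: 1 for a in ACTIONS})

def act(state, goal):
    """Sample an action in proportion to its imitation counts."""
    c = counts[(state, goal)]
    return random.choices(list(c), weights=c.values())[0]

def rollout(goal):
    """Roll out the current policy toward `goal`; return states and actions."""
    s, states, actions = 0, [0], []
    for _ in range(HORIZON):
        a = act(s, goal)
        s = max(0, min(N_STATES - 1, s + a))  # clamp to the chain
        actions.append(a)
        states.append(s)
    return states, actions

def hindsight_relabel(states, actions):
    """Relabel: every later state becomes a goal the earlier pair reached."""
    return [(states[t], states[k], actions[t])
            for t in range(len(actions))
            for k in range(t + 1, len(states))
            if states[k] != states[t]]

def imitate(dataset):
    """Supervised update: count each relabeled (state, goal) -> action pair."""
    for s, g, a in dataset:
        counts[(s, g)][a] += 1

random.seed(0)
for _ in range(200):  # iterated supervised learning: collect, relabel, imitate
    goal = random.randrange(N_STATES)
    states, actions = rollout(goal)
    imitate(hindsight_relabel(states, actions))
```

Because every trajectory is relabeled with goals it actually achieved, the imitation targets are always self-consistent, which is what lets the procedure improve from scratch without an external reward signal.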