Lottery tickets in deep learning [2] refer to highly sparse neural network initializations that train to the performance level of their dense counterparts. The existence of such sparse trainable initializations has previously been documented for a variety of gradient-based training settings. But is the lottery ticket phenomenon an idiosyncrasy of stochastic gradient descent, or does it generalize to evolutionary optimization? In this paper we establish the existence of highly sparse trainable initializations for evolution strategies (ES) and characterize qualitative differences compared to gradient descent (GD)-based sparse training. We introduce a novel signal-to-noise ratio (SNR) iterative pruning procedure, which extracts evolvable sub-networks and incorporates loss curvature information into the network pruning step. We demonstrate the existence of highly sparse evolvable initializations for a wide range of network architectures, evolution strategies and task settings. Furthermore, we find that these initializations encode an inductive bias, which transfers across different evolution strategies, related tasks and even GD-based training. Finally, we compare the local optima resulting from the different optimization paradigms and sparsity levels. In contrast to GD, ES explore diverse and flat local optima and do not preserve linear mode connectivity across sparsity levels and independent runs. The full paper was accepted at the ICML conference [4].
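To make the pruning criterion concrete, here is a minimal sketch of SNR-based iterative pruning, assuming an ES that maintains a diagonal Gaussian search distribution over weights (as in many natural evolution strategies). The function name and interface are hypothetical, and the loss-curvature term used in the paper is omitted for brevity: the sketch only scores each weight by the signal-to-noise ratio |mean| / std of the search distribution and keeps the highest-scoring fraction.

```python
import numpy as np

def snr_prune_mask(mean, std, sparsity, prev_mask=None):
    """Return a boolean keep-mask over weights (hypothetical helper).

    mean, std : per-weight mean and std of the ES search distribution.
    sparsity  : fraction of weights to prune away (0.0 keeps everything).
    prev_mask : optional mask from the previous pruning iteration;
                already-pruned weights stay pruned.
    """
    # Signal-to-noise ratio: weights the ES is both confident about
    # (small std) and uses strongly (large |mean|) score highest.
    snr = np.abs(mean) / (std + 1e-8)
    if prev_mask is not None:
        snr = np.where(prev_mask, snr, -np.inf)
    k = int(round((1.0 - sparsity) * mean.size))
    # Keep the k weights with the largest SNR.
    keep_idx = np.argsort(snr.ravel())[-k:]
    mask = np.zeros(mean.size, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(mean.shape)

# Usage: iteratively re-run ES on the masked network, then prune again
# at a higher sparsity level, rewinding surviving weights to their
# initialization to obtain the sparse trainable "ticket".
mean = np.array([1.0, 0.1, 2.0, 0.05])
std = np.full(4, 0.1)
mask = snr_prune_mask(mean, std, sparsity=0.5)
```

In this toy example the SNR values are [10, 1, 20, 0.5], so at 50% sparsity the mask keeps the first and third weights.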