Continuous Sparsification is proposed, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies.
The Lottery Ticket Hypothesis conjectures that, for a typically-sized neural network, it is possible to find small sub-networks that, when trained from scratch, match the performance of the dense counterpart given a comparable training budget. The proposed algorithm to search for winning tickets, Iterative Magnitude Pruning, consistently finds sparse sub-networks which train faster and better than the overparameterized models they were extracted from, creating potential applications to problems such as transfer learning. In this paper, we propose Continuous Sparsification, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. We show empirically that our method is capable of finding tickets that are sparser than the ones found by Iterative Magnitude Pruning, while achieving higher performance when trained from scratch. Moreover, our method can be efficiently parallelized, decreasing the ticket search cost measured in wall-clock time significantly given enough parallel computing resources.