Large Batch Optimization for Deep Learning: Training BERT in 76 minutes - Citation Graph | Papersgraph