Training data-efficient image transformers & distillation through attention - Citation Graph | Papersgraph