Distilled Dual-Encoder Model for Vision-Language Understanding - Citation Graph | Papersgraph