Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders (2020-08-12T00:00:00.000000Z)