Multimodal Grounding for Sequence-to-sequence Speech Recognition - Citation Graph | Papersgraph