Audiovisual Generalised Zero-shot Learning with Cross-modal Attention and Language