Temporal and cross-modal attention for audio-visual zero-shot learning (2022-07-20)