Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning (2023-03-25T00:00:00.000000Z)