mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (2023-02-01T00:00:00.000000Z)