Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (2022-06-15T00:00:00.000000Z)