Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer - Citation Graph | Papersgraph