We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG. It consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM), which implicitly mitigate visual-linguistic confounders through causal front-door intervention.
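
For reference, the textbook front-door adjustment underlying the term "causal front-door intervention" can be written as below. The symbols are assumptions introduced here for illustration and do not come from the text above: $X$ denotes the input, $Y$ the generated report, and $M$ a mediator lying on the causal path from $X$ to $Y$. The unobserved confounder affects $X$ and $Y$ but not $M$ directly.

```latex
% Front-door adjustment (standard form; X, M, Y are illustrative symbols).
% The outer sum ranges over mediator values m, the inner sum over input values x'.
\begin{equation}
  P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
\end{equation}
```

The inner sum averages over all inputs $x'$, which is what blocks the back-door path through the unobserved confounder without having to observe or model the confounder explicitly. This is the sense in which the mitigation is implicit. How VDM and LDM instantiate or approximate these terms is not specified in the sentence above.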