This work proposes SnAg, a method that uses a Transformer-based architecture with modality-level noise masking to robustly integrate multi-modal entity features in knowledge graphs (KGs). SnAg achieves state-of-the-art (SOTA) performance on ten datasets.
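The sketch below illustrates the general idea of modality-level noise masking. It is not SnAg's exact formulation, and it makes several assumptions. Each entity carries one feature vector per modality (for example structural, visual, and textual). During training, each modality vector is independently replaced, with probability `p`, by Gaussian noise of the same dimension. The function name `modality_noise_mask` and the parameters `p` and `sigma` are hypothetical. The Transformer fusion that would consume the masked features is not shown.

```python
import random
from typing import List, Optional, Tuple


def modality_noise_mask(
    modality_feats: List[List[float]],
    p: float = 0.2,
    sigma: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[bool]]:
    """Hypothetical modality-level noise masking for one entity.

    Each modality vector is independently replaced by N(0, sigma^2) noise
    with probability p; otherwise it is copied unchanged. Returns the
    (possibly) noised features and a per-modality mask flag.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    out: List[List[float]] = []
    masked: List[bool] = []
    for vec in modality_feats:
        # rng.random() is in [0, 1), so p=0 never masks and p=1 always masks.
        if rng.random() < p:
            # Replace the whole modality with same-sized Gaussian noise.
            out.append([rng.gauss(0.0, sigma) for _ in vec])
            masked.append(True)
        else:
            out.append(list(vec))
            masked.append(False)
    return out, masked


# Usage: one entity with three (hypothetical) modality feature vectors.
entity = [[0.1, 0.2, 0.3], [0.5, 0.5], [1.0]]
noisy, mask = modality_noise_mask(entity, p=0.5, rng=random.Random(0))
```

Masking whole modalities, rather than individual dimensions, simulates an entity whose image or text is missing or uninformative. The intent is that the downstream fusion module learns not to over-rely on any single modality. Under this assumption, masking would typically be disabled at inference time.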