CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction - Citation Graph | Papersgraph