CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction (2023-10-02)