Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models - Citation Graph | Papersgraph