Video-LLaVA: Learning United Visual Representation by Alignment Before Projection - Citation Graph | Papersgraph