This work scales both data and model size for InternVideo2, a model that outperforms other models on a range of video captioning, video dialogue, and long video understanding benchmarks, highlighting its ability to reason over and comprehend long temporal contexts.