In this report, we present our champion solutions to five tracks of the Ego4D challenge. We leverage our developed video foundation model, InternVideo, for five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D comprehensively surpasses both the baseline methods and the CVPR 2022 challenge champions, demonstrating the powerful representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions.
Jiahao Wang, Yu Qiao, Junting Pan, Kunchang Li, Yinan He, Zun Wang, Guo Chen, Hongjie Zhang, Bingkun Huang, Zhiyu Zhao, Yi Liu, Sen Xing, Yifei Huang, Yi Wang, Yizhuo Li, Yin-Dong Zheng, Tong Lu, Limin Wang