3260 papers • 126 benchmarks • 313 datasets
Extensive quantitative and qualitative analysis suggests that LEO significantly improves coherent synthesis of human videos over previous methods on the datasets TaichiHD, FaceForensics and CelebV-HQ, as well as content-preserving video editing.
This work introduces point-to-point video generation, which controls the generation process with two control points, the targeted start and end frames, and proposes maximizing a modified variational lower bound of the conditional data likelihood under a skip-frame training strategy.
This work proposes SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production, and extends current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and a novel replay grounding task.
BodyNet is an end-to-end trainable network that benefits from a volumetric 3D loss, a multi-view re-projection loss, and intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose and achieves state-of-the-art performance.
A deep-learning-based free-form video inpainting model is introduced, with proposed 3D gated convolutions to tackle the uncertainty of free-form masks and a novel Temporal PatchGAN loss to enhance temporal consistency.
Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes, takes RGB images of a dynamic scene as input, and creates a high-quality space-time geometry and appearance representation.
This tech report presents a two-stage paradigm for detecting which events happen in soccer broadcast videos and when: multiple action recognition models are fine-tuned on soccer data to extract high-level semantic features, and a transformer-based temporal detection module is designed to locate the target events.
This survey analyzes and characterizes misinformation videos at three levels (signal, semantic, and intent) and systematically reviews existing detection work, from features of various modalities to techniques for clue integration.
This work presents a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video; the method does not require any prior 3D knowledge about scene geometry or camera poses.
The proposed Face Diffusion NeRF (FaceDNeRF), a new generative method to reconstruct high-quality Face NeRFs from single images, complete with semantic editing and relighting capabilities, achieves exceptionally realistic results and unprecedented flexibility in editing compared with state-of-the-art 3D face reconstruction and editing methods.