1. WorldSimBench: Towards Video Generation Models as World Simulators
2. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
3. VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
4. PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
5. Photorealistic Video Generation with Diffusion Models
6. VBench: Comprehensive Benchmark Suite for Video Generative Models
7. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
8. PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
9. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
10. Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
11. ModelScope Text-to-Video Technical Report
12. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
13. Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
14. Exploiting Diffusion Prior for Real-World Image Super-Resolution
15. VideoChat: Chat-Centric Video Understanding
16. LEO: Generative Latent Image Animator for Human Video Synthesis
17. Collaborative Diffusion for Multi-Modal Face Generation and Editing
18. Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
19. Text2Performer: Text-Driven Human Video Generation
20. VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
21. LLaMA: Open and Efficient Foundation Language Models
22. Adding Conditional Control to Text-to-Image Diffusion Models
23. Zero-shot Image-to-Image Translation
24. Reference-Based Image and Video Super-Resolution via $C^{2}$-Matching
25. Towards Smooth Video Composition
26. MagicVideo: Efficient Video Generation With Latent Diffusion Models
27. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
28. LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models
29. Imagen Video: High Definition Video Generation with Diffusion Models
30. Make-A-Video: Text-to-Video Generation without Text-Video Data
31. Towards Robust Blind Face Restoration with Codebook Lookup Transformer
32. Generating Long Videos of Dynamic Scenes
33. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
34. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
35. Hierarchical Text-Conditional Image Generation with CLIP Latents
37. Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
38. Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
39. StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
40. High-Resolution Image Synthesis with Latent Diffusion Models
41. NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
42. Investigating Tradeoffs in Real-World Video Super-Resolution
43. LoRA: Low-Rank Adaptation of Large Language Models
44. Robust Reference-based Super-Resolution via $C^{2}$-Matching
45. A Good Image Generator Is What You Need for High-Resolution Video Synthesis
46. GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
47. BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
48. RoFormer: Enhanced Transformer with Rotary Position Embedding
49. VideoGPT: Video Generation using VQ-VAE and Transformers
50. Image Super-Resolution via Iterative Refinement
51. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
52. Learning Transferable Visual Models From Natural Language Supervision
53. Zero-Shot Text-to-Image Generation
54. Improved Denoising Diffusion Probabilistic Models
55. The MSR-Video to Text Dataset with Clean Annotations
56. InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation
57. Taming Transformers for High-Resolution Image Synthesis
58. Score-Based Generative Modeling through Stochastic Differential Equations
59. Denoising Diffusion Implicit Models
60. Denoising Diffusion Probabilistic Models
61. Long-Term Video Prediction via Criticization and Retrospection
62. Cross-Scale Internal Graph Neural Network for Image Super-Resolution
63. ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
64. Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders
65. G3AN: Disentangling Appearance and Motion for Video Generation
66. Analyzing and Improving the Image Quality of StyleGAN
67. Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns
68. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
69. Adversarial Video Generation on Complex Datasets
70. A Style-Based Generator Architecture for Generative Adversarial Networks
71. Large Scale GAN Training for High Fidelity Natural Image Synthesis
72. Disentangled Sequential Autoencoder
73. Neural Discrete Representation Learning
74. MoCoGAN: Decomposing Motion and Content for Video Generation
75. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
76. Temporal Generative Adversarial Nets with Singular Value Clipping
77. Generating Videos with Scene Dynamics
78. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
79. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
81. Null-text Inversion for Editing Real Images using Guided Diffusion Models
82. VDT: General-purpose Video Diffusion Transformers via Mask Modeling
83. Scalable Diffusion Models with Transformers
84. Latent Video Diffusion Models for High-Fidelity Video Generation with Arbitrary Lengths
85. Learning to Generate Human Videos
86. Auto-Encoding Variational Bayes
87. Generative Adversarial Nets
88. Deep Learning, volume 1
89. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis