Point-LLM is presented, the first 3D large language model (LLM) to follow 3D multi-modal instructions; it injects the semantics of Point-Bind into pre-trained LLMs such as LLaMA, requires no 3D instruction data, and yet exhibits superior 3D and multi-modal question-answering capacity.
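As a rough illustration of the injection idea, here is a minimal PyTorch sketch. It is not the authors' code: the class name, dimensions, and training setup are illustrative assumptions. It projects a 3D encoder's global embedding into a frozen LLM's token-embedding space and prepends it as a soft prefix token.

import torch
import torch.nn as nn

class PointPrefixInjector(nn.Module):
    """Hypothetical module: fuse a 3D embedding into an LLM's input."""
    def __init__(self, point_dim=512, llm_dim=4096):
        super().__init__()
        # Project the Point-Bind-style global feature into the LLM's
        # token-embedding space; only this projection would be trained.
        self.proj = nn.Linear(point_dim, llm_dim)

    def forward(self, point_emb, token_embs):
        # point_emb:  (batch, point_dim)        global point-cloud feature
        # token_embs: (batch, seq_len, llm_dim) embedded text prompt
        prefix = self.proj(point_emb).unsqueeze(1)   # (batch, 1, llm_dim)
        # Prepend the projected feature as one extra "token".
        return torch.cat([prefix, token_embs], dim=1)

# Usage: the fused sequence would replace the input embeddings of a
# frozen LLM (e.g., LLaMA); here we only check the shapes.
injector = PointPrefixInjector()
fused = injector(torch.randn(2, 512), torch.randn(2, 16, 4096))
print(fused.shape)  # torch.Size([2, 17, 4096])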
The hierarchical Latent Point Diffusion Model (LION) is introduced, set up as a variational autoencoder (VAE) with a hierarchical latent space that combines a global shape latent representation with a point-structured latent space for 3D shape generation.
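As a rough guide to the two-level setup (the notation below is an assumption, not the paper's): with a global shape latent z_g and a point-structured latent z_p, such a hierarchical VAE maximizes an evidence lower bound of the form

\mathcal{L}(x) = \mathbb{E}_{q(z_g \mid x)\, q(z_p \mid x, z_g)}\!\left[ \log p(x \mid z_g, z_p) \right] - D_{\mathrm{KL}}\!\left( q(z_g \mid x) \,\|\, p(z_g) \right) - \mathbb{E}_{q(z_g \mid x)}\!\left[ D_{\mathrm{KL}}\!\left( q(z_p \mid x, z_g) \,\|\, p(z_p \mid z_g) \right) \right],

after which, as the model's name suggests, diffusion models can be trained in the learned latent spaces.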
This work adapts score distillation to the publicly available, and computationally efficient, Latent Diffusion Models, which apply the entire diffusion process in the compact latent space of a pretrained autoencoder.
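For background, score distillation optimizes 3D parameters \theta by pushing rendered views toward the diffusion model's learned distribution; applied in a latent space, the gradient takes the standard score distillation sampling (SDS) form (notation assumed here, following the original formulation):

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\, \big( \hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon \big)\, \frac{\partial z}{\partial \theta} \right],

where z is the autoencoder latent of a rendered view, z_t its noised version at timestep t, y the text prompt, w(t) a weighting function, and \hat{\epsilon}_\phi the pretrained latent-diffusion denoiser.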
This work proposes to model the 3D parameter as a random variable instead of a constant as in SDS, and presents variational score distillation (VSD), a principled particle-based variational framework to explain and address the over-saturation, over-smoothing, and low-diversity issues of SDS in text-to-3D generation.
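Schematically (the symbols below are assumptions based on the commonly cited VSD formulation), VSD replaces the fixed noise target of SDS with the prediction of an auxiliary model that tracks the distribution of the current renderings:

\nabla_\theta \mathcal{L}_{\mathrm{VSD}}(\theta) = \mathbb{E}_{t,\epsilon,c}\!\left[ w(t)\, \big( \epsilon_{\mathrm{pretrained}}(x_t;\, y,\, t) - \epsilon_{\mathrm{lora}}(x_t;\, y,\, t,\, c) \big)\, \frac{\partial x}{\partial \theta} \right],

where \epsilon_{\mathrm{lora}} is a LoRA-adapted copy of the diffusion model trained on the renderings of the current 3D particles and c is the camera pose; maintaining several such particles realizes the variational distribution over the 3D parameter.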
This paper presents a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training.
Experiments show that SyncDreamer generates images with high consistency across different views, making it well-suited for various 3D generation tasks such as novel view synthesis, text-to-3D, and image-to-3D.
GeoDream is presented, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors to obtain unambiguous, 3D-consistent geometric structures without sacrificing diversity or fidelity, and that provides superior guidance for the refinement of 3D geometric priors.
This work revisits the impact of different 3D representations on generation quality and efficiency and proposes a progressive generation method through Voxel-Point Progressive Representation (VPP), which efficiently generates high-fidelity and diverse 3D shapes across different categories, while also exhibiting excellent representation transfer performance.
MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt, is introduced and it is demonstrated that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations.
V3D is introduced, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation and can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views.