Layout-to-image generation is the task of generating a scene from a given layout. The layout specifies the locations of the objects to be included in the output image. In this section, you can find state-of-the-art leaderboards for layout-to-image generation.
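For concreteness, a layout is typically a set of (class label, bounding box) pairs, which can be rasterized into a per-class mask that conditions a generator. The sketch below illustrates this representation; the function name and dimensions are made up for illustration and not taken from any specific paper.

```python
import torch

def rasterize_layout(boxes, labels, num_classes, size=64):
    """boxes: (N, 4) tensor of [x0, y0, x1, y1] in [0, 1]; labels: (N,) class ids."""
    mask = torch.zeros(num_classes, size, size)
    for (x0, y0, x1, y1), c in zip(boxes.tolist(), labels.tolist()):
        # paint each object's box into its class channel (at least 1 pixel)
        mask[c,
             int(y0 * size):max(int(y1 * size), int(y0 * size) + 1),
             int(x0 * size):max(int(x1 * size), int(x0 * size) + 1)] = 1.0
    return mask  # (num_classes, size, size), ready to condition a generator

boxes = torch.tensor([[0.1, 0.5, 0.4, 0.9],   # e.g. a "dog" box
                      [0.0, 0.0, 1.0, 0.4]])  # e.g. a "sky" box
labels = torch.tensor([3, 7])
layout = rasterize_layout(boxes, labels, num_classes=10)
```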
These leaderboards are used to track progress in layout-to-image generation.
Use these libraries to find layout-to-image generation models and implementations.
No subtasks available.
These latent diffusion models achieve new state-of-the-art scores for image inpainting and class-conditional image synthesis, along with highly competitive performance on tasks including unconditional image generation, text-to-image synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based diffusion models.
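As a rough illustration of the latent diffusion idea, the sketch below runs standard DDPM ancestral sampling in a compressed latent space and decodes the result to pixels. Here `denoiser`, `decoder`, and `cond` are hypothetical placeholder modules, not the actual LDM implementation.

```python
import torch

@torch.no_grad()
def sample_latent_diffusion(denoiser, decoder, cond, steps=1000, shape=(1, 4, 32, 32)):
    # Standard DDPM ancestral sampling, run on small latents instead of pixels.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(shape)                                   # start from latent noise
    for t in reversed(range(steps)):
        eps = denoiser(z, torch.full((shape[0],), t), cond)  # predicted noise at step t
        mean = (z - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        z = mean + betas[t].sqrt() * torch.randn_like(z) if t > 0 else mean
    return decoder(z)  # the autoencoder's decoder maps latents back to an image
```

Because the denoiser operates on, say, 32x32x4 latents rather than 256x256x3 pixels, each sampling step is far cheaper, which is the source of the computational savings described above.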
This work proposes a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships, and validates the approach on Visual Genome and COCO-Stuff.
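A simplified sketch of that pipeline: object and predicate embeddings are combined through a graph-convolution-style update, and a bounding box is then predicted per object to form the scene layout. All modules, dimensions, and ids below are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

num_objs, num_preds, d = 10, 5, 64
obj_emb = nn.Embedding(num_objs, d)
pred_emb = nn.Embedding(num_preds, d)
edge_mlp = nn.Linear(3 * d, d)     # message over (subject, predicate, object)
box_head = nn.Linear(d, 4)         # per-object [x0, y0, x1, y1]

objs = torch.tensor([0, 3, 7])     # e.g. sky, grass, sheep
triples = torch.tensor([[2, 1, 1]])  # (sheep, "on", grass) as (subj, pred, obj) indices
h = obj_emb(objs)
msg = edge_mlp(torch.cat([h[triples[:, 0]],
                          pred_emb(triples[:, 1]),
                          h[triples[:, 2]]], dim=-1))
h = h.index_add(0, triples[:, 0], msg)   # one graph-conv-style update
boxes = box_head(h).sigmoid()            # predicted layout for the generator
```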
The method separates a layout embedding from an appearance embedding, which leads to generated images that better match the scene graph, have higher visual quality, and support more complex scene graphs.
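A hedged sketch of that split: each object gets one embedding for where and what it is (from its class and box) and a separate embedding for how it looks, so appearance can be resampled without moving objects. The module below is hypothetical.

```python
import torch
import torch.nn as nn

class ObjectEmbedder(nn.Module):
    def __init__(self, num_classes, layout_dim=64, app_dim=64):
        super().__init__()
        self.cls_emb = nn.Embedding(num_classes, layout_dim)
        self.box_mlp = nn.Linear(4, layout_dim)
        self.app_dim = app_dim

    def forward(self, labels, boxes, appearance=None):
        layout = self.cls_emb(labels) + self.box_mlp(boxes)  # where / what
        if appearance is None:                               # sample a new style
            appearance = torch.randn(labels.shape[0], self.app_dim)
        return torch.cat([layout, appearance], dim=-1)       # per-object code

codes = ObjectEmbedder(num_classes=10)(torch.tensor([3, 7]),
                                       torch.rand(2, 4))    # (2, 128)
```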
This work presents a novel model that addresses semantic equivalence issues in graphs by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes.
A diffusion model named LayoutDiffusion is proposed that achieves higher generation quality and greater controllability than previous works; it constructs structural image patches with region information and transforms the patched image into a special layout, which is fused with the normal layout in a unified form.
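The fusion idea might be sketched as follows: image feature patches are given explicit region coordinates so they can be treated as layout-like tokens and concatenated with the object layout into one sequence, e.g. for self-attention. This is an illustration under assumed dimensions, not the official LayoutDiffusion code.

```python
import torch
import torch.nn as nn

def patches_as_layout(feat, proj):
    """feat: (B, C, H, W) image features -> (B, H*W, D) tokens with region info."""
    B, C, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # each patch becomes a tiny "object" with its own normalized bounding box
    boxes = torch.stack([xs / W, ys / H, (xs + 1) / W, (ys + 1) / H], dim=-1)
    tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
    boxes = boxes.reshape(1, H * W, 4).expand(B, -1, -1)
    return proj(torch.cat([tokens, boxes], dim=-1))

proj = nn.Linear(256 + 4, 256)               # hypothetical dimensions
feat = torch.randn(2, 256, 8, 8)
patch_tokens = patches_as_layout(feat, proj) # fuse with object layout tokens next
```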
An intuitive paradigm for the task, layout-to-mask-to-image, is proposed: object masks are unfolded in a weakly supervised way from an input layout and object style codes, and a method built on Generative Adversarial Networks (GANs) is presented.
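A minimal sketch of the mask stage, assuming a placeholder mask network: a soft per-object mask is predicted from the object's style code, and each mask would then be warped into its box and composed into a scene-level semantic map that drives the GAN generator.

```python
import torch
import torch.nn as nn

class MaskPredictor(nn.Module):
    def __init__(self, style_dim=64, size=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(style_dim, size * size), nn.Sigmoid())
        self.size = size

    def forward(self, style):                 # style: (N, style_dim)
        m = self.net(style)                   # one soft mask per object
        return m.view(-1, 1, self.size, self.size)

masks = MaskPredictor()(torch.randn(3, 64))  # (3, 1, 16, 16) soft object masks
# next step: warp each mask into its bounding box and fuse into a semantic map
```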
A context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of each object or stuff region is aware of the other objects and stuff coexisting in the scene.
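One simple way such a module can be realized is plain multi-head self-attention over the per-object features, so each object's encoding depends on its coexisting objects; this is a stand-in for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

objs = torch.randn(1, 5, 128)  # (batch, objects in the scene, feature dim)
attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
context_aware, _ = attn(objs, objs, objs)  # each object attends to the others
```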
This work proposes a multi-object generation framework that synthesizes images containing multiple objects without explicitly requiring their contextual information during generation, and shows that augmenting the training set with samples generated by the approach improves the performance of existing models.
This work introduces a spatio-semantic scene graph network that does not require direct supervision for constellation changes or image edits, and makes it possible to train the system from existing real-world datasets with no additional annotation effort.
Adding a benchmark result helps the community track progress.