3260 papers • 126 benchmarks • 313 datasets
This is a sub-class of diffusion personalization methods in which the model does not need to be fine-tuned on a few user-specific images. Instead, the diffusion model is additionally trained on a dataset so that personalization can be performed in a single forward pass at test time.
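As a rough illustration of the idea, the sketch below maps a frozen image encoder's embedding through a small adapter into extra conditioning tokens for a text-to-image UNet, so a new identity is injected in one forward pass with no per-user fine-tuning. All module and parameter names (`IdentityAdapter`, `cond_dim`, `num_tokens`) are illustrative assumptions, not any specific library's API.

```python
# Illustrative sketch only: encoder-based personalization with no per-user tuning.
import torch
import torch.nn as nn

class IdentityAdapter(nn.Module):
    """Maps a reference-image embedding to a few extra conditioning tokens."""
    def __init__(self, image_dim=768, cond_dim=1024, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.cond_dim = cond_dim
        self.proj = nn.Linear(image_dim, cond_dim * num_tokens)

    def forward(self, image_emb):                        # (B, image_dim)
        tokens = self.proj(image_emb)                    # (B, cond_dim * num_tokens)
        return tokens.view(-1, self.num_tokens, self.cond_dim)

def personalize_step(unet, text_tokens, image_emb, adapter, latents, t):
    # Single test-time forward pass: the backbone `unet` stays frozen.
    id_tokens = adapter(image_emb)                       # (B, num_tokens, cond_dim)
    cond = torch.cat([text_tokens, id_tokens], dim=1)    # text + identity conditioning
    return unet(latents, t, cond)                        # predicted noise (generic call)
```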
These leaderboards are used to track progress in tuning-free diffusion personalization.
Use these libraries to find tuning-free diffusion personalization models and implementations.
The proposed IP-Adapter is an effective and lightweight adapter that adds image prompt capability to pretrained text-to-image diffusion models; thanks to its decoupled cross-attention strategy, the image prompt also works well alongside the text prompt for multimodal image generation.
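A minimal sketch of the decoupled cross-attention idea referenced above, assuming separate key/value projections for text and image tokens whose attention outputs are summed; dimensions and names are placeholders rather than the paper's implementation.

```python
# Sketch of decoupled cross-attention: text and image prompts use separate K/V projections.
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim=1024, scale=1.0):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k_text = nn.Linear(dim, dim)   # from the frozen base model
        self.to_v_text = nn.Linear(dim, dim)   # from the frozen base model
        self.to_k_img = nn.Linear(dim, dim)    # newly added, trained for image prompts
        self.to_v_img = nn.Linear(dim, dim)    # newly added, trained for image prompts
        self.scale = scale

    def forward(self, hidden_states, text_tokens, image_tokens):
        q = self.to_q(hidden_states)
        text_out = F.scaled_dot_product_attention(
            q, self.to_k_text(text_tokens), self.to_v_text(text_tokens))
        image_out = F.scaled_dot_product_attention(
            q, self.to_k_img(image_tokens), self.to_v_img(image_tokens))
        return text_out + self.scale * image_out   # the two branches are combined additively
```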
This work proposes HyperDreamBooth, a hypernetwork that efficiently generates a small set of personalized weights from a single image of a person; coupled with fast fine-tuning, it can generate a person's face in various contexts and styles with high subject detail while preserving the model's crucial knowledge of diverse styles and semantic modifications.
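The sketch below illustrates the general hypernetwork-predicts-weights pattern: a small network maps a face embedding to a low-rank weight residual for a target layer, which fast fine-tuning could then refine. Shapes, ranks, and class names are assumptions for illustration, not the HyperDreamBooth architecture.

```python
# Sketch: a hypernetwork that predicts a low-rank weight delta from a single face embedding.
import torch
import torch.nn as nn

class WeightHyperNetwork(nn.Module):
    def __init__(self, face_dim=512, out_features=1024, in_features=1024, rank=4):
        super().__init__()
        self.shape_a = (out_features, rank)
        self.shape_b = (rank, in_features)
        self.head_a = nn.Linear(face_dim, out_features * rank)
        self.head_b = nn.Linear(face_dim, rank * in_features)

    def forward(self, face_emb):                           # (B, face_dim)
        a = self.head_a(face_emb).view(-1, *self.shape_a)  # (B, out_features, rank)
        b = self.head_b(face_emb).view(-1, *self.shape_b)  # (B, rank, in_features)
        return torch.bmm(a, b)                             # low-rank weight residual per sample

# The predicted residual would be added to a frozen base weight,
# e.g. personalized_weight = base_weight + delta[0].
```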
This work designs a novel IdentityNet that imposes strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer image generation, and demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount.
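Roughly, "strong semantic and weak spatial conditions" can be pictured as below: a face identity embedding is injected as an extra prompt token (strong semantic), while a landmark map contributes only a lightly weighted spatial residual (weak spatial). This is an illustrative simplification with made-up module names, not the IdentityNet architecture.

```python
# Sketch: strong semantic condition (identity token) + weak spatial condition (landmark residual).
import torch
import torch.nn as nn

class IdentityCondition(nn.Module):
    def __init__(self, face_dim=512, cond_dim=1024, latent_channels=4):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, cond_dim)                    # semantic branch
        self.landmark_proj = nn.Conv2d(3, latent_channels, 3, padding=1)  # spatial branch

    def forward(self, text_tokens, face_emb, landmark_map, latents, spatial_weight=0.1):
        # landmark_map is assumed to be rendered at the latent resolution
        id_token = self.face_proj(face_emb).unsqueeze(1)          # (B, 1, cond_dim)
        cond = torch.cat([text_tokens, id_token], dim=1)          # strong semantic condition
        spatial = latents + spatial_weight * self.landmark_proj(landmark_map)  # weak spatial cue
        return cond, spatial
```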
FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation, and paves the way for efficient, personalized, and high-quality multi-subject image creation.
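A simplified sampling loop sketching delayed subject conditioning: early denoising steps see only the plain text prompt (preserving layout and editability), and later steps switch to the subject-augmented prompt (locking in identity). It assumes a diffusers-style unet/scheduler interface and is not FastComposer's code.

```python
# Sketch: switch from text-only to subject-augmented conditioning partway through denoising.
def denoise_with_delayed_conditioning(unet, scheduler, latents,
                                      text_cond, subject_cond, delay_ratio=0.3):
    timesteps = scheduler.timesteps
    switch_step = int(delay_ratio * len(timesteps))
    for i, t in enumerate(timesteps):
        cond = text_cond if i < switch_step else subject_cond
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample  # diffusers-style call
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```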
PhotoMaker is introduced, an efficient personalized text-to-image generation method that encodes an arbitrary number of input ID images into a stacked ID embedding to preserve identity information, together with an ID-oriented data construction pipeline to assemble the training data.
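The stacked ID embedding can be pictured roughly as follows: several reference images of the same person are encoded, projected, fused, and written into the class-word slot of the text embedding. The names, the mean fusion, and the injection point are illustrative assumptions rather than PhotoMaker's actual pipeline.

```python
# Sketch: fuse an arbitrary number of ID image embeddings into one token of the text embedding.
import torch
import torch.nn as nn

class StackedIDEmbedding(nn.Module):
    def __init__(self, image_dim=768, cond_dim=1024):
        super().__init__()
        self.proj = nn.Linear(image_dim, cond_dim)

    def forward(self, id_image_embs, text_tokens, class_token_index):
        # id_image_embs: (B, N, image_dim) embeddings of N reference images of one person
        stacked = self.proj(id_image_embs)             # (B, N, cond_dim)
        fused = stacked.mean(dim=1)                    # simple fusion over the stack
        text_tokens = text_tokens.clone()
        text_tokens[:, class_token_index] = fused      # inject identity into the class-word slot
        return text_tokens
```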
This paper proposes a brand-new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), which harnesses the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models.
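At a very high level, the recaption/plan/generate flow could look like the outline below, where `mllm`, `generate_region`, and `compose` are hypothetical callables standing in for a multimodal LLM and a regional diffusion backend; this is a sketch of the idea, not the RPG implementation.

```python
# High-level outline: recaption the prompt, plan regions, generate and compose them.
def recaption_plan_generate(prompt, mllm, generate_region, compose):
    # 1. Recaption: have the MLLM rewrite the prompt into detailed per-object sub-prompts.
    subprompts = mllm(f"Rewrite this prompt as detailed per-object sub-prompts: {prompt}")
    # 2. Plan: have the MLLM assign each sub-prompt to an image region (e.g. a bounding box).
    layout = mllm(f"Assign each sub-prompt to a region of the canvas: {subprompts}")
    # 3. Generate: render each region with its sub-prompt, then compose the regions.
    regions = [generate_region(subprompt, box) for subprompt, box in layout]
    return compose(regions)
```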