This page tracks progress in text-to-face generation: synthesizing face images from natural-language descriptions.
This paper generates captions for images in the CelebA dataset with an algorithm that automatically converts a list of facial attributes into a set of captions, and models the highly multimodal problem of text-to-face generation as learning the conditional distribution of faces in the same latent space.
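As a rough illustration of the attribute-to-caption idea, the sketch below maps CelebA-style binary attributes to a templated caption. The template and the choice of attributes are hypothetical, not the paper's actual conversion rules:

```python
# Hypothetical sketch: turning CelebA-style binary attributes into a caption.
# "Male", "Smiling", "Eyeglasses", "Black_Hair" are real CelebA attribute names;
# the template itself is illustrative only.

def attributes_to_caption(attributes):
    """Convert a dict of binary CelebA attributes into one natural-language caption."""
    parts = []
    subject = "a man" if attributes.get("Male") else "a woman"
    if attributes.get("Smiling"):
        parts.append("smiling")
    if attributes.get("Eyeglasses"):
        parts.append("wearing eyeglasses")
    if attributes.get("Black_Hair"):
        parts.append("with black hair")
    description = " ".join(parts) if parts else "with a neutral look"
    return f"A photo of {subject}, {description}."

print(attributes_to_caption({"Male": 0, "Smiling": 1, "Black_Hair": 1}))
# -> "A photo of a woman, smiling with black hair."
```

In practice such templates can be randomized and combined to produce a set of distinct captions per image rather than a single one.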
ParaLip, a parallel decoding model for fast and high-fidelity text-to-lip generation, is proposed: it predicts the duration of the encoded linguistic features, models the target lip frames conditioned on those features and their durations in a non-autoregressive manner, and incorporates a structural similarity index (SSIM) loss and adversarial learning.
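A minimal PyTorch-style sketch of the non-autoregressive pattern described here: predict a duration for each encoded linguistic token, expand the features to frame rate, and decode all lip frames in one parallel pass. Module names, sizes, and the length-regulation rule are assumptions for illustration, not ParaLip's actual architecture:

```python
import torch
import torch.nn as nn

class NonAutoregressiveLipDecoder(nn.Module):
    """Illustrative duration-conditioned parallel decoder (sizes and modules
    are assumptions, not ParaLip's actual architecture)."""

    def __init__(self, feat_dim=256, lip_dim=80):
        super().__init__()
        self.duration_predictor = nn.Linear(feat_dim, 1)   # log-frames per token
        self.decoder = nn.Sequential(                      # decodes all frames at once
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, lip_dim),
        )

    def forward(self, enc):                                # enc: (num_tokens, feat_dim)
        log_durations = self.duration_predictor(enc).squeeze(-1)
        frames_per_token = log_durations.exp().round().clamp(min=1).long()
        # Length-regulate: repeat each token's features for its predicted duration.
        expanded = enc.repeat_interleave(frames_per_token, dim=0)
        return self.decoder(expanded)                      # (total_frames, lip_dim)

model = NonAutoregressiveLipDecoder()
lip_frames = model(torch.randn(12, 256))  # all frames produced in one parallel pass
```

The key property is that no frame depends on previously generated frames, which is what makes decoding parallel and fast compared with autoregressive alternatives.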
This Perspective highlights emerging positive use cases of AI-generated characters, specifically in supporting learning and well-being, and demonstrates an easy-to-use AI character generation pipeline to enable such outcomes.
Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from their flexible internal structure. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows using different off-the-shelf diffusion models, trained across various datasets, at sampling time alone to guide generation toward the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at: https://nithin-gk.github.io/projectpages/Multidiff
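The sampling strategy lends itself to a short sketch: at each reverse-diffusion step, noise predictions from several pre-trained models (one per constraint) are fused using per-model reliability weights before the standard DDPM update. The simple weighted average below and all names are assumptions for illustration; the paper derives the exact closed-form combination:

```python
import torch

def fused_ddpm_step(x_t, t, models, conditions, reliabilities, alphas, alphas_bar):
    """One illustrative DDPM denoising step fusing several models' noise
    predictions with reliability weights (a sketch, not the paper's exact rule).
    alphas / alphas_bar are 1-D tensors holding the noise schedule."""
    # Each model predicts the noise for its own constraint/modality.
    eps_preds = [m(x_t, t, c) for m, c in zip(models, conditions)]
    w = torch.tensor(reliabilities, dtype=x_t.dtype)
    w = w / w.sum()                                 # normalize reliability weights
    eps = sum(wi * e for wi, e in zip(w, eps_preds))
    # Standard DDPM posterior mean, using the fused noise estimate.
    a_t, ab_t = alphas[t], alphas_bar[t]
    mean = (x_t - (1 - a_t) / torch.sqrt(1 - ab_t) * eps) / torch.sqrt(a_t)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(1 - a_t) * noise
```

Because the fusion happens purely at sampling time, a new constraint can be added by plugging in another off-the-shelf model with its own reliability weight, with no retraining.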