Character-centric Story Visualization via Visual Planning and Token Alignment (2022-10-16)

TL;DR

This work adapts a recent architecture that augments VQ-VAE with a text-to-visual-token (transformer) module to the task of story visualization. The adapted model preserves characters better and produces higher-quality image sequences than strong baselines.

Abstract

Story visualization extends traditional text-to-image generation by generating multiple images from a complete story. This task requires machines to 1) understand long text inputs, and 2) produce a globally consistent image sequence that illustrates the contents of the story. A key challenge of consistent story visualization is preserving the characters that are essential to the story. To tackle this challenge, we propose to adapt a recent work that augments VQ-VAE with a text-to-visual-token (transformer) architecture. Specifically, we replace the text-to-visual-token module with a two-stage framework: 1) a character token planning model that predicts the visual tokens for characters only; 2) a visual token completion model that generates the remaining visual token sequence, which is sent to the VQ-VAE to produce the final images. To encourage characters to appear in the images, we further train the two-stage framework with a character-token alignment objective. Extensive experiments and evaluations demonstrate that the proposed method excels at preserving characters and produces higher-quality image sequences than strong baselines.
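The two-stage framework described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the grid size, codebook size, `MASK` convention, character names, and both function names are assumptions made for illustration, and random choices stand in for the trained transformer models. The sketch shows only the data flow: stage 1 fills character positions, and stage 2 fills the remaining positions while leaving the planned tokens untouched.

```python
import random

MASK = -1          # marks grid positions not yet filled (hypothetical convention)
GRID_SIZE = 16     # toy 4x4 grid of visual tokens (assumption)
CODEBOOK_SIZE = 8  # toy VQ-VAE codebook size (assumption)


def plan_character_tokens(sentence, characters, rng):
    """Stage 1 stand-in: emit tokens only at positions assigned to characters."""
    plan = [MASK] * GRID_SIZE
    mentioned = [c for c in characters if c.lower() in sentence.lower()]
    for _name in mentioned:
        # A trained planner would predict these; here two random positions
        # per mentioned character receive a random codebook index.
        for pos in rng.sample(range(GRID_SIZE), 2):
            plan[pos] = rng.randrange(CODEBOOK_SIZE)
    return plan


def complete_visual_tokens(plan, rng):
    """Stage 2 stand-in: fill the remaining positions, keeping planned tokens."""
    return [tok if tok != MASK else rng.randrange(CODEBOOK_SIZE) for tok in plan]


rng = random.Random(0)
characters = ["Alice", "Bob", "Carol"]  # hypothetical character names
plan = plan_character_tokens("Alice waves at Bob.", characters, rng)
tokens = complete_visual_tokens(plan, rng)
```

In the pipeline the abstract describes, the completed token sequence would then be passed to the VQ-VAE decoder to produce an image; that step is omitted here because it requires a trained model.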

Authors

Nanyun Peng

Te-Lin Wu

Hong Chen
