3260 papers • 126 benchmarks • 313 datasets
Story Visualization is the task of generating a coherent and aligned sequence of images from a sequence of textual captions that describe a story. It mainly comprises two settings: story generation and story continuation, where story continuation additionally receives ground-truth information in the form of the first frame.
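To make the distinction concrete, below is a minimal Python sketch of the two settings; the function names, placeholder arrays, and shapes are illustrative assumptions, not an established API.

```python
# Hypothetical interface sketch for the two settings described above; the function
# names and placeholder arrays are assumptions for illustration only.
from typing import List
import numpy as np

Image = np.ndarray  # e.g. an (H, W, 3) RGB array


def generate_story(captions: List[str]) -> List[Image]:
    """Story generation: every frame is synthesized from the captions alone."""
    # A real model would condition each frame on its caption and the story context.
    return [np.zeros((64, 64, 3), dtype=np.uint8) for _ in captions]


def continue_story(captions: List[str], first_frame: Image) -> List[Image]:
    """Story continuation: the ground-truth first frame is given as extra context."""
    # Generated frames should stay visually consistent with `first_frame`.
    return [first_frame.copy()] + [np.zeros_like(first_frame) for _ in captions[1:]]
```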
These leaderboards are used to track progress in Story Visualization.
Use these libraries to find Story Visualization models and implementations.
No subtasks available.
This work adapts a recent approach that augments VQ-VAE with a text-to-visual-token transformer architecture, which excels at preserving characters and produces higher-quality image sequences than strong baselines.
StoryImager enhances the storyboard generation ability inherited from a pre-trained text-to-image model for bidirectional generation and introduces a Target Frame Masking Strategy to extend and unify different story image generation tasks.
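A rough sketch of how such a per-frame target mask can unify these tasks is shown below; the helper name and mask convention are assumptions, not the StoryImager code.

```python
# Toy illustration of unifying story image generation tasks with a target-frame mask
# (an assumption-level sketch, not the StoryImager implementation).
from typing import List


def frame_mask(num_frames: int, known: List[int]) -> List[int]:
    """1 = frame is a generation target, 0 = frame is given as context."""
    return [0 if i in known else 1 for i in range(num_frames)]


print(frame_mask(5, known=[]))      # [1, 1, 1, 1, 1] -> story generation
print(frame_mask(5, known=[0]))     # [0, 1, 1, 1, 1] -> story continuation
print(frame_mask(5, known=[0, 4]))  # [0, 1, 1, 1, 0] -> in-filling between given frames
```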
We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story. At the core of our model is the ability to model the inter-related nature of the sentences within a story, as well as the ability to learn coherence to support reference resolution. The framework takes the form of an encoder-decoder architecture, where sentences are encoded using a hierarchical two-level sentence-story GRU, combined with an encoding of coherence, and sequentially decoded using a predicted feature representation into a consistent illustrative image sequence. We optimize all parameters of our network end-to-end with respect to an order-embedding loss that encodes entailment between images and sentences. Experiments on the VIST storytelling dataset [9] highlight the importance of our algorithmic choices and the efficacy of our overall model.
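A minimal PyTorch sketch of such a two-level sentence-story encoder follows; the class name and dimensions are assumptions, and the coherence encoding and order-embedding loss are omitted for brevity.

```python
# Minimal sketch (not the authors' code) of a hierarchical two-level sentence-story
# GRU encoder: a word-level GRU encodes each sentence, and a story-level GRU runs
# over the sentence vectors to produce one conditioning feature per frame.
import torch
import torch.nn as nn


class HierarchicalStoryEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.sentence_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.story_gru = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_sentences, num_words) word indices
        b, s, w = tokens.shape
        words = self.embed(tokens.view(b * s, w))       # (b*s, w, emb_dim)
        _, sent_h = self.sentence_gru(words)            # (1, b*s, hid_dim)
        sent_vecs = sent_h.squeeze(0).view(b, s, -1)    # (b, s, hid_dim)
        story_feats, _ = self.story_gru(sent_vecs)      # (b, s, hid_dim)
        return story_feats                              # one feature per frame to decode


enc = HierarchicalStoryEncoder(vocab_size=5000)
feats = enc(torch.randint(0, 5000, (2, 5, 12)))  # 2 stories, 5 sentences, 12 words each
print(feats.shape)  # torch.Size([2, 5, 256])
```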
A new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework, is proposed; it outperforms state-of-the-art models in image quality, contextual consistency metrics, and human evaluation.
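The sketch below illustrates the sequential conditional generation idea in the spirit of StoryGAN, with a recurrent story context feeding a per-frame generator; the toy linear decoder and all dimensions are assumptions, not the published architecture.

```python
# Illustrative sketch of sequential conditional generation (not the official StoryGAN
# code): a recurrent context encoder carries story state across frames, and each frame
# is generated from its caption embedding plus that state and a noise vector.
import torch
import torch.nn as nn


class SequentialGenerator(nn.Module):
    def __init__(self, text_dim: int = 128, ctx_dim: int = 128, noise_dim: int = 64):
        super().__init__()
        self.context_rnn = nn.GRUCell(text_dim, ctx_dim)
        self.to_image = nn.Sequential(                  # toy decoder to a 64x64 RGB frame
            nn.Linear(ctx_dim + noise_dim, 64 * 64 * 3), nn.Tanh()
        )
        self.noise_dim = noise_dim
        self.ctx_dim = ctx_dim

    def forward(self, caption_embs: torch.Tensor) -> torch.Tensor:
        # caption_embs: (batch, num_frames, text_dim)
        b, t, _ = caption_embs.shape
        h = torch.zeros(b, self.ctx_dim)
        frames = []
        for i in range(t):
            h = self.context_rnn(caption_embs[:, i], h)     # update story context
            z = torch.randn(b, self.noise_dim)
            frames.append(self.to_image(torch.cat([h, z], dim=1)).view(b, 3, 64, 64))
        return torch.stack(frames, dim=1)                   # (batch, num_frames, 3, 64, 64)


gen = SequentialGenerator()
imgs = gen(torch.randn(2, 5, 128))
print(imgs.shape)  # torch.Size([2, 5, 3, 64, 64])
```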
A number of improvements to prior modeling approaches are presented, including the addition of a dual learning framework that utilizes video captioning to reinforce the semantic alignment between the story and generated images, a copy-transform mechanism for sequentially-consistent story visualization, and MART-based transformers to model complex interactions between frames.
A new sentence representation is introduced that incorporates word information from all story sentences to mitigate the inconsistency problem, and a new discriminator with fusion features is proposed to improve image quality and story consistency.
This work enhances, or 'retro-fits', pretrained text-to-image synthesis models with task-specific modules for story continuation and facilitates copying of visual elements from the source image, thereby improving continuity in the generated visual story.
This work proposes AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and previously generated images; it extends the text-conditioned approach to multimodal conditioning and can generalize to new characters through adaptation.
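The auto-regressive conditioning loop can be sketched at a high level as follows; `generate_frame` is a hypothetical stand-in for a latent diffusion sampler and is not part of AR-LDM's actual API.

```python
# High-level sketch (not the AR-LDM code) of auto-regressive multimodal conditioning:
# each new frame is generated given the full history of captions and the frames
# generated so far. `generate_frame` stands in for a latent diffusion sampler.
from typing import Callable, List, Sequence
import numpy as np

Image = np.ndarray


def generate_story_autoregressively(
    captions: Sequence[str],
    generate_frame: Callable[[Sequence[str], Sequence[Image]], Image],
) -> List[Image]:
    history_images: List[Image] = []
    for i, _ in enumerate(captions):
        # Condition on all captions up to and including the current one,
        # plus every previously generated frame (multimodal history).
        frame = generate_frame(captions[: i + 1], history_images)
        history_images.append(frame)
    return history_images


# Toy sampler used only to make the sketch runnable.
dummy_sampler = lambda caps, hist: np.zeros((64, 64, 3), dtype=np.uint8)
story = generate_story_autoregressively(["a", "b", "c"], dummy_sampler)
print(len(story))  # 3 frames, one per caption
```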