StoryLLaVA: Enhancing Visual Storytelling with Multi-Modal Large Language Models

Published in

International Conference on Computational Lingu...(2024)

TL;DR

This work proposes a Topic-Driven Narrative Optimizer (TDNO) that improves both the training data and the multi-modal large language model (MLLM) by integrating image descriptions, topic generation, and GPT-4-based refinements. It also employs a preference-based ranked story sampling method that aligns model outputs with human storytelling preferences through positive-negative pairing.

Authors

Li Yang

Zhiding Xiao

Wenxin Huang

Xian Zhong
