Image-guided Story Ending Generation (IgSEG) aims to generate a story ending for a given multi-sentence story plot and an ending-related image.
These leaderboards are used to track progress in Image-guided Story Ending Generation.
Use these libraries to find Image-guided Story Ending Generation models and implementations.
No subtasks available.
This work proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
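As a point of reference for the attention mechanism these papers build on, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. The function name and toy shapes are illustrative, not taken from any of the listed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # row-wise softmax
    return w @ V                                         # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 queries of dimension 4
K = rng.standard_normal((3, 4))   # 3 keys
V = rng.standard_normal((3, 4))   # 3 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one attended vector per query
```

When all scores are equal (e.g. zero queries), the softmax weights are uniform and the output reduces to the mean of the values, which is a quick sanity check on the implementation.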
Two attention mechanisms are examined: a global approach that always attends to all source words, and a local one that looks at only a subset of source words at a time; both are shown to be effective on the WMT translation tasks between English and German in both directions.
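The global/local distinction can be sketched as a masking choice over the alignment scores: global attention normalizes over all source positions, while local attention zeroes out positions outside a window around an assumed aligned position. This is a simplified illustration; the window center and size here are hypothetical parameters, not the paper's learned predictive alignment.

```python
import numpy as np

def attention_weights(scores, window=None, center=None):
    """Softmax over alignment scores; if `window`/`center` are given,
    restrict attention to source positions within the window (local attention)."""
    scores = scores.astype(float).copy()
    if window is not None:
        n = scores.shape[-1]
        outside = np.abs(np.arange(n) - center) > window
        scores[outside] = -np.inf          # exclude positions outside the window
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

s = np.array([1.0, 2.0, 3.0, 2.0, 1.0])        # toy scores over 5 source words
global_w = attention_weights(s)                 # all 5 positions receive weight
local_w = attention_weights(s, window=1, center=2)  # only positions 1..3
```

In the local case the weights at positions 0 and 4 are exactly zero, while the remaining weights still sum to one.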
This work proposes Multimodal Memory Transformer (MMT), an end-to-end framework that models and fuses both contextual and visual information to effectively capture the multimodal dependency for IgSEG.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results.
A novel model for story ending generation that adopts an incremental encoding scheme to represent the context clues spanning the story context, and generates more reasonable story endings than state-of-the-art baselines.