Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories which were collected via crowd-sourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence col-lection and filtering process has allowed us to obtain stories that are more coherent, diverse, and visually grounded compared to previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and diverse than stories generated with the current state-of-the-art model. Our code, image features, annotations and collected stories are available at https://vwprompt.github.io/.
CITATION STYLE
Hong, X., Sayeed, A., Mehra, K., Demberg, V., & Schiele, B. (2023). Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences. Transactions of the Association for Computational Linguistics, 11, 565–581. https://doi.org/10.1162/tacl_a_00553
Mendeley helps you to discover research relevant for your work.