Multimodal content is central to digital communications and has been shown to increase user engagement, making it indispensable in today's digital economy. Image-text combinations are a common multimodal format across digital channels, e.g., banners, online ads, and social posts. The choice of a specific image-text combination is dictated by (a) the information to be represented, (b) the strength of the image and text modalities in representing that information, and (c) the needs of the reader consuming the content. Given input content that captures the information to be conveyed, creating multimodal fragment variants that account for these factors is a non-trivial and tedious task, calling for automation. In this paper, we propose a holistic approach that automatically creates multimodal image-text fragments from unstructured input content, tailored to a target need. The proposed approach aligns the fragment to the target need in terms of both content and style. Through metric-based and human evaluations, we demonstrate the effectiveness of the proposed approach in generating multimodal fragments that align with target needs while capturing the information to be presented.