We describe an approach for viewing any large, detail-rich picture on a small display by generating a video from the image, as taken by a virtual camera moving across it at varying distances. Our main innovation is the ability to build the virtual camera's motion from a textual description of the picture, e.g., a museum caption, so that the relevance and ordering of image regions are determined by co-analyzing image annotations and natural-language text. Furthermore, our system arranges the resulting presentation so that it is synchronized with an audio track generated from the text by a text-to-speech system. © Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering 2010.
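The abstract's core idea — ordering annotated image regions by where their labels are mentioned in the caption text, and timing the camera's dwell at each region to the narration — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the `Region` structure, the per-word speech-duration estimate, and all data are assumptions for the sake of the example.

```python
# Hypothetical sketch of the pipeline the abstract describes: regions of an
# annotated image are ordered by the first mention of their label in the
# caption, and each camera keyframe gets a dwell time proportional to the
# sentence that mentions it (a crude stand-in for real TTS timing).
from dataclasses import dataclass

@dataclass
class Region:
    label: str    # annotation label, e.g. "angel"
    bbox: tuple   # (x, y, width, height) in image pixels

def plan_camera_path(regions, caption, seconds_per_word=0.4):
    """Order regions by first mention in the caption and assign each a
    dwell time estimated from the mentioning sentence's word count."""
    text = caption.lower()
    mentioned = []
    for r in regions:
        pos = text.find(r.label.lower())
        if pos >= 0:
            mentioned.append((pos, r))
    mentioned.sort(key=lambda t: t[0])  # follow narration order

    keyframes, t = [], 0.0
    for pos, r in mentioned:
        # locate the sentence containing the label to estimate speech time
        start = text.rfind(".", 0, pos) + 1
        end = text.find(".", pos)
        sentence = text[start:end if end >= 0 else len(text)]
        dwell = max(1.0, len(sentence.split()) * seconds_per_word)
        keyframes.append({"time": t, "bbox": r.bbox, "label": r.label})
        t += dwell
    return keyframes

regions = [Region("angel", (40, 10, 120, 150)),
           Region("shepherd", (300, 200, 160, 180))]
caption = "A shepherd kneels in the foreground. Above him, an angel appears."
path = plan_camera_path(regions, caption)
# the shepherd's region comes first, matching the narration order
```

A real system would replace the word-count heuristic with the actual audio durations returned by the text-to-speech engine, so that each camera move lands on its region exactly when the narration reaches it.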
CITATION
Reiterer, B., Concolato, C., & Hellwagner, H. (2010). Natural-language-based conversion of images to mobile multimedia experiences. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (Vol. 40 LNICST, pp. 87–90). https://doi.org/10.1007/978-3-642-12630-7_10