Automatic caption generation for visual content has recently emerged as a challenging research field due to its broad impact on areas such as computer vision, information retrieval, autonomous vehicles, and natural language processing. Traditional models mainly focus on a single aspect of the visual features when generating descriptions. The proposed model combines spatial information about salient objects, capturing detailed characteristics, with a scene category that encodes the general image setting. These extracted features are processed by a topic-aware, attention-based language model to generate human-like captions. The performance of the proposed model is compared with state-of-the-art research through evaluation on benchmark image captioning datasets. The experimental results show that the proposed model performs competitively against captioning models reported in recent literature.
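The abstract describes attending over salient-object features under the guidance of a topic signal. The sketch below is a minimal NumPy illustration of one plausible form such topic-guided attention could take; the weight matrices, shapes, and fusion scheme are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def topic_guided_attention(region_feats, topic_vec, hidden, W_r, W_t, W_h):
    """Attend over salient-object region features, biased by a topic vector.

    region_feats: (k, d) features of k detected salient regions
    topic_vec:    (t,)   topic embedding of the image (assumed given)
    hidden:       (h,)   decoder hidden state at the current time step
    W_r, W_t, W_h: hypothetical projection matrices (illustrative only)
    """
    # query combines the decoder state and the topic signal
    query = W_h @ hidden + W_t @ topic_vec          # (d,)
    scores = region_feats @ (W_r @ query)           # (k,) one score per region
    alpha = softmax(scores)                         # attention weights, sum to 1
    context = alpha @ region_feats                  # (d,) attended spatial context
    return context, alpha

# Tiny shape-check demo with random values.
rng = np.random.default_rng(0)
k, d, h, t = 5, 8, 6, 4
context, alpha = topic_guided_attention(
    rng.normal(size=(k, d)),   # region features
    rng.normal(size=t),        # topic vector
    rng.normal(size=h),        # decoder hidden state
    rng.normal(size=(d, d)),   # W_r
    rng.normal(size=(d, t)),   # W_t
    rng.normal(size=(d, h)),   # W_h
)
```

In a full captioning decoder, the attended context would typically be concatenated with a scene-category embedding and the previous word embedding before predicting the next token; that fusion step is omitted here.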
CITATION STYLE
Zia, U., Riaz, M. M., & Ghafoor, A. (2022). Topic Guided Image Captioning with Scene and Spatial Features. In Lecture Notes in Networks and Systems (Vol. 450 LNNS, pp. 180–191). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-99587-4_16