Automatic image captioning has achieved great progress. However, existing captioning frameworks largely enumerate the objects in an image. The generated captions lack real-world knowledge about named entities and their relations, such as the relations among famous persons, organizations, and buildings. In contrast, humans interpret images by drawing on real-world knowledge, including the relations among such named entities. To generate human-like captions, we focus on captioning news images, where the accompanying news text provides real-world knowledge of the story behind the image. We propose a novel model that brings captions closer to human-like descriptions of an image by leveraging the semantic relevance of named entities. The named entities are not only extracted from the news text under the guidance of the image content but also extended with external knowledge based on their semantic relations. Specifically, we propose a sentence correlation analysis algorithm to selectively draw contextual information from the news text, and an entity-linking algorithm based on a knowledge graph to discover entity relations from a global perspective. Extensive experiments on a real-world dataset collected from news articles show that our model generates image captions that are closer to the corresponding ground-truth captions.
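To make the two components concrete, the minimal Python sketch below illustrates the general idea. The overlap-based relevance score, the toy knowledge graph, and all function names are hypothetical stand-ins chosen for illustration; the paper's actual sentence correlation analysis and knowledge-graph entity linking are more sophisticated than this.

    # Illustrative sketch of the two steps described in the abstract:
    # (1) selecting news sentences relevant to the image content, and
    # (2) expanding extracted named entities with knowledge-graph relations.
    # All scoring functions and data here are toy stand-ins, not the
    # authors' implementation.

    import re
    from collections import Counter

    def tokenize(text):
        """Lowercase word tokens; a stand-in for real preprocessing."""
        return re.findall(r"[a-z]+", text.lower())

    def sentence_relevance(sentence, image_keywords):
        """Score a news sentence by token overlap with image-derived
        keywords; a crude proxy for sentence correlation analysis."""
        tokens = Counter(tokenize(sentence))
        return sum(tokens[k] for k in image_keywords)

    def select_context(news_sentences, image_keywords, top_k=2):
        """Keep the top-k sentences most correlated with the image."""
        ranked = sorted(news_sentences,
                        key=lambda s: sentence_relevance(s, image_keywords),
                        reverse=True)
        return ranked[:top_k]

    # Toy knowledge graph: entity -> list of (relation, neighbor) edges.
    # In the paper this role is played by entity linking against a real KG.
    KNOWLEDGE_GRAPH = {
        "Angela Merkel": [("leader_of", "Germany"), ("member_of", "CDU")],
        "Berlin": [("capital_of", "Germany")],
    }

    def expand_entities(entities):
        """Attach KG relations to each recognized entity, giving the
        caption generator a global view of how the entities relate."""
        return {e: KNOWLEDGE_GRAPH.get(e, []) for e in entities}

    if __name__ == "__main__":
        news = [
            "Angela Merkel spoke at a press conference in Berlin.",
            "Stock markets were mixed after the announcement.",
            "The chancellor answered questions about the new policy.",
        ]
        image_keywords = {"merkel", "press", "conference", "berlin"}
        print("Selected context:", select_context(news, image_keywords))
        print("Entity relations:", expand_entities(["Angela Merkel", "Berlin"]))

Under these assumptions, ranking sentences by their correlation with image-derived keywords filters the news text down to image-relevant context, while the knowledge-graph lookup supplies relations between entities (who leads what, what is where) that no object detector can read off the pixels.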
Jing, Y., Xu, Z., & Gao, G. (2020). Context-Driven Image Caption with Global Semantic Relations of the Named Entities. IEEE Access, 8, 143584–143594. https://doi.org/10.1109/ACCESS.2020.3013321