Reducing unknown unknowns with guidance in image caption

Abstract

Deep recurrent models applied to image captioning, which bridges computer vision and natural language processing, have achieved excellent results, automatically generating natural sentences that describe an image. However, the mismatch in sample distribution between the training data and the open world may lead to large numbers of hidden Unknown Unknowns (UUs): instances on which the model is confidently wrong. Such errors can greatly harm the correctness of generated captions. In this paper, we present a framework for UU reduction and model optimization based on recurrently training with small amounts of external data detected with the assistance of crowd commonsense. We demonstrate and analyze our method on a current state-of-the-art image-to-text model. Aiming to reduce the number of UUs in generated captions, we obtain over 12% UU reduction and reinforce the model's cognition of these scenes.
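As a concrete illustration, the loop below sketches the kind of crowd-assisted, iterative fine-tuning procedure the abstract describes. It is a minimal sketch, not the authors' implementation: the callables generate, crowd_flag, and fine_tune, the Image placeholder type, and the default of three rounds are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Image = object  # hypothetical placeholder type for an input image


def reduce_unknown_unknowns(
    generate: Callable[[Image], str],                              # model's caption generator
    crowd_flag: Callable[[List[Tuple[Image, str]]], List[Tuple[Image, str]]],
    fine_tune: Callable[[List[Tuple[Image, str]]], None],
    open_world_images: Iterable[Image],
    rounds: int = 3,                                               # assumed round count
) -> None:
    """Recurrently fine-tune a captioning model on small external batches
    whose outputs crowd workers flagged as Unknown Unknowns (UUs).
    All callables are hypothetical stand-ins, not the paper's API."""
    images = list(open_world_images)
    for _ in range(rounds):
        # 1. Caption open-world images outside the training distribution.
        captions = [(img, generate(img)) for img in images]

        # 2. Crowd commonsense check: workers flag confidently wrong
        #    captions (the hidden UUs) and supply corrected sentences.
        flagged = crowd_flag(captions)
        if not flagged:
            break  # no UUs detected this round; stop early

        # 3. Fine-tune on the small corrected external set, reinforcing
        #    the model's cognition of the failure scenes.
        fine_tune(flagged)
```

A caller would plug in its own model wrapper and crowd-annotation pipeline for the three callables; the loop itself only expresses the detect-then-retrain cycle.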

Citation (APA)

Ni, M., Yang, J., Lin, X., & He, L. (2017). Reducing unknown unknowns with guidance in image caption. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10614 LNCS, pp. 547–555). Springer Verlag. https://doi.org/10.1007/978-3-319-68612-7_62
