BraIN: A Bidirectional Generative Adversarial Networks for image captions

Abstract

Although progress has been made in image captioning, machine-generated captions remain quite distinct from human-generated ones. Machine-generated captions score well on automated metrics, but because they are trained to maximize the likelihood of training samples, they lack naturalness, an essential characteristic of human language. We propose a novel model that generates more human-like captions than prior methods. Our model combines an attention mechanism, a bidirectional language generation model, and a conditional generative adversarial network. The attention mechanism captures image details by segmenting important information into smaller pieces. The bidirectional language generation model produces human-like sentences by considering multiple perspectives. The conditional generative adversarial network improves sentence quality by comparing a set of candidate captions. To evaluate the model, we compare human preferences for BraIN-generated captions with those for captions from baseline methods, and we also compare against actual human-generated captions using automated metrics. Results show that our model produces more human-like captions than the baseline methods.
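To make the described architecture concrete, below is a minimal sketch of the three components (region attention, a bidirectional caption generator, and a conditional discriminator) written in PyTorch. All module names, dimensions, and wiring are illustrative assumptions for this sketch, not the authors' BraIN implementation, which the abstract does not specify.

```python
# Hypothetical sketch of the three components named in the abstract (PyTorch).
# Module names, dimensions, and wiring are assumptions, not the BraIN code.
import torch
import torch.nn as nn


class RegionAttention(nn.Module):
    """Soft attention over image region features (captures image details)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, regions, hidden):
        # regions: (B, R, feat_dim), hidden: (B, hidden_dim)
        h = hidden.unsqueeze(1).expand(-1, regions.size(1), -1)
        weights = torch.softmax(self.score(torch.cat([regions, h], dim=-1)), dim=1)
        return (weights * regions).sum(dim=1)  # (B, feat_dim) context vector


class BidirectionalCaptioner(nn.Module):
    """Caption generator with forward and backward LSTM passes over the sentence."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = RegionAttention(feat_dim, hidden_dim)
        self.fwd = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.bwd = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, regions, tokens):
        # regions: (B, R, feat_dim), tokens: (B, T) caption token ids
        B, T = tokens.shape
        emb = self.embed(tokens)
        hf = cf = hb = cb = torch.zeros(B, self.fwd.hidden_size, device=tokens.device)
        fwd_states, bwd_states = [], []
        for t in range(T):                        # left-to-right pass
            ctx = self.attend(regions, hf)
            hf, cf = self.fwd(torch.cat([emb[:, t], ctx], dim=-1), (hf, cf))
            fwd_states.append(hf)
        for t in reversed(range(T)):              # right-to-left pass
            ctx = self.attend(regions, hb)
            hb, cb = self.bwd(torch.cat([emb[:, t], ctx], dim=-1), (hb, cb))
            bwd_states.append(hb)
        bwd_states.reverse()
        states = torch.stack(
            [torch.cat([f, b], dim=-1) for f, b in zip(fwd_states, bwd_states)], dim=1)
        return self.out(states)                   # (B, T, vocab_size) word logits


class ConditionalDiscriminator(nn.Module):
    """Scores how human-like a caption is, conditioned on the image features."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.judge = nn.Sequential(nn.Linear(hidden_dim + feat_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, regions, tokens):
        _, (h, _) = self.rnn(self.embed(tokens))
        img = regions.mean(dim=1)                 # pooled image representation
        return self.judge(torch.cat([h[-1], img], dim=-1))  # P(caption is human-written)


if __name__ == "__main__":
    vocab, B, R, T = 1000, 2, 36, 12
    regions = torch.randn(B, R, 2048)             # e.g. detector region features
    captions = torch.randint(0, vocab, (B, T))
    gen, disc = BidirectionalCaptioner(vocab), ConditionalDiscriminator(vocab)
    print(gen(regions, captions).shape)            # torch.Size([2, 12, 1000])
    print(disc(regions, captions).shape)           # torch.Size([2, 1])
```

In a conditional-GAN training loop of this kind, the discriminator would be trained to separate human captions from generated ones given the image, while the generator is rewarded for captions the discriminator judges human-like; the specific adversarial objective used by BraIN is not given in the abstract.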

Citation (APA)

Wang, Y., & Cook, D. (2020). BraIN: A Bidirectional Generative Adversarial Networks for image captions. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3446132.3446406
