This paper explores how humans conduct conversations with images by investigating an open-domain image conversation dataset, ImageChat. We examined the conversations with images from the perspectives of image relevancy and image information. We found that utterances and conversations are not always related to the given image, and that conversation topics diverge within three turns about half of the time. Besides image objects, more comprehensive non-object image information is also indispensable. After inspecting the causes, we suggest that understanding the overall scenario of an image and connecting objects based on their high-level attributes could help generate more engaging open-domain conversations when an image is presented. Based on our analysis, we proposed enriching the image information with image captions and object tags. With our proposed image+ features, we improved automatic metrics including BLEU and BERTScore, and increased the diversity and image relevancy of generated responses compared with a strong SOTA baseline. The results support our analysis as a source of useful insights that could facilitate future research on open-domain conversations with images.
CITATION STYLE
Chen, Y. P., Miyazaki, T., Shimizu, N., & Nakayama, H. (2022). How do people talk about images? A study on open-domain conversations with images. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 156–162). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.naacl-srw.20