Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals


Abstract

Video conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a video conferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.
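The abstract describes Visual Captions as a real-time system in which a fine-tuned large language model proactively suggests visuals for open-vocabulary conversation. The sketch below illustrates that general idea in Python; the suggest_visuals function, the prompt wording, and the "<subject> from <source>" output format are illustrative assumptions for this sketch, not the paper's actual model interface or implementation.

```python
# Minimal sketch of the Visual Captions idea: map a transcribed sentence to
# suggested visuals via a language model. The prompt format, the expected
# "<subject> from <source>" reply format, and the `generate` callable are
# assumptions made for illustration, not the authors' implementation.
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class VisualSuggestion:
    subject: str   # what to show, e.g. "golden gate bridge"
    source: str    # where to fetch it, e.g. "image search" or "emoji"


def suggest_visuals(sentence: str, generate) -> list[VisualSuggestion]:
    """Ask a language model which visuals (if any) fit a spoken sentence.

    `generate` stands in for any text-completion callable, such as a
    fine-tuned LLM; it is a placeholder, not a specific API.
    """
    prompt = (
        "Suggest visuals for the sentence as '<subject> from <source>' "
        "entries separated by ';', or reply 'none'.\n"
        f"Sentence: {sentence}\nVisuals:"
    )
    reply = generate(prompt).strip()
    if reply.lower() == "none":
        return []
    suggestions = []
    for entry in reply.split(";"):
        if " from " in entry:
            subject, source = entry.split(" from ", 1)
            suggestions.append(VisualSuggestion(subject.strip(), source.strip()))
    return suggestions
```

In a real-time setting such as the one the paper targets, a function like this would be called on each caption segment as it arrives, and the returned suggestions rendered alongside the video feed for the speaker to accept or dismiss.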

Citation (APA)

Liu, X. B., Kirilyuk, V., Yuan, X., Olwal, A., Chi, P., Chen, X. A., & Du, R. (2023). Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery. https://doi.org/10.1145/3544548.3581566
