Toward automatic audio description generation for accessible videos

Yujia Wang; Wei Liang

Conference Proceedings, Open Access

Conference on Human Factors in Computing Systems - Proceedings (2021)

DOI: 10.1145/3411764.3445347

Abstract

Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.

Citation (APA)

Wang, Y., & Liang, W. (2021). Toward automatic audio description generation for accessible videos. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3411764.3445347
