Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals

1Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.

Abstract

This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the groups. Through two studies, we find that both groups focus on specific human traits in audio-such as accents, vocal inflections, breathing patterns, and emotions-to assess audio authenticity. We further reveal that humans, irrespective of visual ability, can still outperform current state-of-the-art machine learning models in discerning audio authenticity; however, the task proves psychologically demanding. Moreover, detection accuracy scores between blind and sighted individuals are comparable, but each group exhibits unique strengths: the sighted group excels at detecting deepfake-generated audio, while the blind group excels at detecting text-to-speech (TTS) generated audio. These findings not only deepen our understanding of machine-manipulated and neural-renderer audio but also have implications for developing countermeasures, such as perceptible watermarks and human-AI collaboration strategies for spoofing detection.

Cite

CITATION STYLE

APA

Han, C., Mitra, P., & Billah, S. M. (2024). Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3613904.3642817

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free