Abstract
Voice recordings are increasingly implemented in web surveys, but the resulting audio data need to be transcribed before analysis. Because manual coding is too time- and labor-intensive, researchers often rely on automatic speech recognition (ASR) systems to transcribe the voice recordings. However, ASR tools may produce partly incorrect transcriptions and potentially change the content of responses. If ASR performance (i.e., accuracy and validity) differs by subgroup or contextual factors, a bias is introduced into the analysis of open-ended questions. We assessed the impact of sociodemographic and contextual factors on the accuracy and validity of ASR transcriptions with data from the Longitudinal Internet Studies for the Social Sciences (LISS) panel collected in December 2020. We find that background noise reduces the accuracy and validity of ASR transcriptions. In addition, validity improved when the respondent was alone during the survey. Fortunately, we did not find any evidence of systematic differences across subgroups (age, sex, education), devices, or respondent locations.
Citation
Meitinger, K., van der Sluis, S., & Schonlau, M. (2024). Keep the noise down: On the performance of automatic speech recognition of voice-recordings in web surveys. Survey Practice, 1–12. https://doi.org/10.29115/sp-2023-0022