Acquiring Speech Transcriptions Using Mismatched Crowdsourcing

  • Jyothi P
  • Hasegawa-Johnson M

Abstract

Transcribed speech is a critical resource for building statistical speech recognition systems. Recent work has looked towards soliciting transcriptions for large speech corpora from native speakers of the language using crowdsourcing techniques. However, native speakers of the target language may not be readily available for crowdsourcing. We examine the following question: can humans unfamiliar with the target language help transcribe? We follow an information-theoretic approach to this problem: (1) We learn the characteristics of a noisy channel that models the transcribers’ systematic perception biases. (2) We use an error-correcting code, specifically a repetition code, to encode the inputs to this channel, in conjunction with a maximum-likelihood decoding rule. To demonstrate the feasibility of this approach, we transcribe isolated Hindi words with the help of Mechanical Turk workers unfamiliar with Hindi. We successfully recover Hindi words with an accuracy of over 85% (and 94% in a 4-best list) using a 15-fold repetition code. We also estimate the conditional entropy of the input to this channel (Hindi words) given the channel output (transcripts from crowdsourced workers) to be less than 2 bits; this serves as a theoretical estimate of the average number of bits of auxiliary information required for errorless recovery.
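
The decoding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a heavily simplified channel in which each word is a single input symbol and each worker transcript is a single output symbol, and it treats the repeated transcripts as independent draws from that channel. The words, transcripts, probabilities, and function names (`CHANNEL`, `ml_decode`, `conditional_entropy`) are all hypothetical. The entropy function covers a single channel use with a uniform prior, which is simpler than the estimate reported in the abstract.

```python
import math

# Hypothetical toy channel P(transcript | word). All labels and numbers are
# invented for illustration; none of them come from the paper.
CHANNEL = {
    "word_a": {"t1": 0.6, "t2": 0.3, "t3": 0.1},
    "word_b": {"t1": 0.3, "t2": 0.5, "t3": 0.2},
    "word_c": {"t1": 0.1, "t2": 0.2, "t3": 0.7},
}


def log_likelihood(word, transcripts, channel=CHANNEL, floor=1e-9):
    """Log P(transcripts | word), assuming independent channel uses."""
    probs = channel[word]
    # Unseen transcripts get a small floor probability instead of zero.
    return sum(math.log(probs.get(t, floor)) for t in transcripts)


def ml_decode(transcripts, channel=CHANNEL, n_best=1):
    """Rank candidate words by likelihood and return the top n_best."""
    ranked = sorted(
        channel,
        key=lambda w: log_likelihood(w, transcripts, channel),
        reverse=True,
    )
    return ranked[:n_best]


def conditional_entropy(channel, prior=None):
    """H(X | Y) in bits for one channel use (uniform prior by default)."""
    words = list(channel)
    if prior is None:
        prior = {w: 1.0 / len(words) for w in words}
    joint = {}     # (word, transcript) -> P(x, y)
    marginal = {}  # transcript -> P(y)
    for w in words:
        for t, p in channel[w].items():
            joint[(w, t)] = prior[w] * p
            marginal[t] = marginal.get(t, 0.0) + prior[w] * p
    return -sum(
        p * math.log2(p / marginal[t]) for (w, t), p in joint.items() if p > 0
    )


if __name__ == "__main__":
    # Five noisy transcripts of the same word, i.e. a 5-fold repetition code.
    observed = ["t1", "t2", "t1", "t1", "t3"]
    print(ml_decode(observed, n_best=2))
    print(conditional_entropy(CHANNEL))
```

Raising the number of repetitions adds more terms to each log-likelihood, which is why a longer repetition code tends to separate the correct word from its competitors more reliably in this toy setting.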

Author-supplied keywords

  • Human Computation and Crowd Sourcing Track

Find this document

  • ISBN: 9781577357001
  • SGR: 84959879321
  • PUI: 608914145
  • SCOPUS: 2-s2.0-84959879321

Authors

  • Preethi Jyothi

  • Mark Hasegawa-Johnson
