We describe and analyze a new web-based spoken dialogue data collection framework. The framework enables the capture of conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report on the substantial improvements in the speed and cost of data capture we have observed with this crowd-sourced paradigm. We also analyze a range of data quality factors by comparing a crowd-sourced data set involving 196 remote users to a smaller but more quality-controlled lab-based data set. We focus our comparison on aspects that are especially important in our spoken dialogue research, including audio quality, the effect of communication latency on the interaction, our ability to synchronize the collected data, our ability to collect examples of excellent game play, and the naturalness of the resulting interactions. This analysis illustrates some of the current trade-offs between lab-based and crowd-sourced spoken dialogue data.
Manuvinakurike, R., & DeVault, D. (2015). Pair me up: A web framework for crowd-sourced spoken dialogue collection. In Natural Language Dialog Systems and Intelligent Assistants (pp. 189–201). Springer International Publishing. https://doi.org/10.1007/978-3-319-19291-8_18