Collecting spontaneous speech corpora that are open-ended yet topically constrained is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowdsourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when the general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and achieve accuracy significantly above the baseline. Finally, we conduct a cross-corpus evaluation by testing the best-performing approach on the Columbia/SRI/Colorado corpus.
Maredia, A. S., Schechtman, K., Levitan, S. I., & Hirschberg, J. (2017). Comparing approaches for automatic question identification. In *SEM 2017 - 6th Joint Conference on Lexical and Computational Semantics, Proceedings (pp. 110–114). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s17-1013