Data Acquisition and Linguistic Resources

  • Strassel S
  • Christianson C
  • McCary J
  • et al.

Abstract

All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high-quality linguistic resources is both costly and time-consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE’s challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres, both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word-aligned parallel text and Treebanks to rich semantic annotation.

Cite

Strassel, S., Christianson, C., McCary, J., Staderman, W., & Olive, J. (2011). Data Acquisition and Linguistic Resources. In Handbook of Natural Language Processing and Machine Translation (pp. 1–131). Springer New York. https://doi.org/10.1007/978-1-4419-7713-7_1
