Data Acquisition and Linguistic Resources

Stephanie Strassel; Caitlin Christianson; John McCary; William Staderman; Joseph Olive

Book Chapter

Springer New York (2011), pp. 1–131

DOI: 10.1007/978-1-4419-7713-7_1

Abstract

All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high quality linguistic resources is both costly and time consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE’s challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres -- both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word aligned parallel text and Treebanks to rich semantic annotation.

Citation (APA)

Strassel, S., Christianson, C., McCary, J., Staderman, W., & Olive, J. (2011). Data Acquisition and Linguistic Resources. In Handbook of Natural Language Processing and Machine Translation (pp. 1–131). Springer New York. https://doi.org/10.1007/978-1-4419-7713-7_1
