Measuring the distance between comparable corpora between languages

Serge Sharoff

Book Chapter

Springer Berlin Heidelberg, (2013), 113-130

DOI: 10.1007/978-3-642-20128-8_6

Abstract

The notion of comparable corpora rests on our ability to assess the difference between corpora which are claimed to be comparable, but this activity is still art rather than proper science. Here I will discuss attempts at approximating the content of corpora collected from the Web using various methods, also in comparison to traditional corpora, such as the BNC. The procedure for estimating the corpus composition is based on selecting keywords, followed by hard clustering or by building topic models. This can apply to corpora within the same language, e.g., the BNC against ukWaC, as well as to corpora in different languages, e.g., webpages collected using the same procedure for English and Russian.
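
The two-step procedure named in the abstract (keyword selection, then hard clustering) can be illustrated with a minimal sketch. The abstract does not say which keyword score or clustering algorithm the chapter uses, so the log-likelihood (G2) score and plain k-means below are assumptions chosen for illustration. The reference text, the four toy documents and all function names are invented for this example.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 score for a word seen a times in c tokens versus b times in d tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected_a)
    if b:
        g2 += b * math.log(b / expected_b)
    return 2.0 * g2

def select_keywords(doc, ref_counts, ref_total, top_n=2):
    """Return the top_n words that are overused in doc relative to the reference."""
    counts = Counter(doc)
    scored = []
    for word, a in counts.items():
        b = ref_counts.get(word, 0)
        if a / len(doc) > b / ref_total:  # keep only overused words
            scored.append((log_likelihood(a, b, len(doc), ref_total), word))
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken alphabetically
    return [word for _, word in scored[:top_n]]

def kmeans(vectors, k, iterations=10):
    """Plain k-means with farthest-first initialisation; one hard label per vector."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical general-language reference text (stands in for a reference corpus).
reference = ("the cat sat on the mat and the dog ran to the park and then the man "
             "went into the house in the rain a bit late").split()
ref_counts = Counter(reference)

# Hypothetical documents (stand in for web pages of unknown composition).
docs = [
    "the team scored a goal and the team defended the goal late".split(),
    "a late goal gave the team hope but the goal did not save the team".split(),
    "stir the flour into the butter then add flour and more butter".split(),
    "melt the butter and sift the flour then fold the flour into the butter".split(),
]

# Step 1: keywords per document. Step 2: hard clustering on keyword indicators.
keywords = [select_keywords(doc, ref_counts, len(reference)) for doc in docs]
vocab = sorted({w for kws in keywords for w in kws})
vectors = [[1.0 if w in kws else 0.0 for w in vocab] for kws in keywords]
labels = kmeans(vectors, k=2)

print(keywords)
print(labels)
```

On this toy input the first two documents share the keywords "goal" and "team", the last two share "butter" and "flour", and k-means places each pair in its own cluster. With real corpora the keyword lists would be longer, and a topic model could replace the hard clustering step, as the abstract notes.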

Cite

Sharoff, S. (2013). Measuring the distance between comparable corpora between languages. In Building and Using Comparable Corpora (pp. 113–130). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_6
