New advances in corpus-based lexicography

Arvi Hurskainen

Journal article (open access)

Lexikos, vol. 13 (2003), pp. 111–132

DOI: 10.5788/13-0-725

Abstract

This article presents various approaches used in corpus-based computational lexicography. It argues that, for computational lexicography to be efficient, precise and comprehensive, it should use a method in which the corpus text is first analysed and the results of this analysis are then processed further to meet the needs of a dictionary. This method has several advantages, including high precision and recall, and it allows the process to be automated much further than with more traditional computational methods. A frequency list based on the lemma (the equivalent of the headword) helps in selecting the words to be included in the dictionary. The approach is demonstrated through its various phases by applying SALAMA (the Swahili Language Manager) to the process. Manual work is needed in the phase where examples of use are selected from the corpus and possibly modified. However, the list of examples of use, arranged alphabetically by the corresponding headword, can also be produced automatically. This alphabetical list of headwords with examples of use is thus the material on which the lexicographer works manually. The article deals with problems encountered in compiling traditional printed dictionaries; electronic dictionaries and thesauri are outside its scope.
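
The two automatic steps described above, a lemma-based frequency list for choosing headwords and an alphabetical list of headwords with candidate examples of use, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SALAMA's implementation: it assumes the corpus has already been analysed into (surface form, lemma) pairs, and the data format, the function names, the frequency cut-off and the tiny hand-made Swahili sample are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical analyser output: each sentence is a list of
# (surface_form, lemma) pairs. The real output format of a
# morphological analyser such as SALAMA is not shown here.
Sentence = list[tuple[str, str]]


def lemma_frequencies(corpus: list[Sentence]) -> Counter:
    """Count tokens by lemma, so that all inflected forms of a word
    contribute to a single headword candidate."""
    return Counter(lemma for sentence in corpus for _, lemma in sentence)


def select_headwords(freqs: Counter, min_count: int) -> list[str]:
    """Keep lemmas occurring at least min_count times (a hypothetical
    cut-off) and return them in alphabetical order."""
    return sorted(lemma for lemma, n in freqs.items() if n >= min_count)


def examples_by_headword(corpus: list[Sentence], headwords: list[str],
                         max_examples: int = 2) -> dict[str, list[str]]:
    """Map each headword to up to max_examples corpus sentences containing
    it: raw material for the lexicographer to select and edit manually."""
    wanted = set(headwords)
    examples: defaultdict[str, list[str]] = defaultdict(list)
    for sentence in corpus:
        text = " ".join(surface for surface, _ in sentence)
        # dict.fromkeys removes repeated lemmas within one sentence
        # while keeping their order, so a sentence is added at most once.
        for lemma in dict.fromkeys(lemma for _, lemma in sentence):
            if lemma in wanted and len(examples[lemma]) < max_examples:
                examples[lemma].append(text)
    # Alphabetical order follows the order of the headword list.
    return {hw: examples[hw] for hw in headwords}


# Tiny hand-made sample (illustrative only).
corpus = [
    [("watoto", "mtoto"), ("wanasoma", "soma"), ("vitabu", "kitabu")],
    [("mtoto", "mtoto"), ("anasoma", "soma"), ("kitabu", "kitabu")],
    [("mwalimu", "mwalimu"), ("anasoma", "soma")],
]

headwords = select_headwords(lemma_frequencies(corpus), min_count=2)
print(headwords)  # → ['kitabu', 'mtoto', 'soma']
for hw, sentences in examples_by_headword(corpus, headwords).items():
    print(hw, "|", " / ".join(sentences))
```

Counting by lemma rather than by surface form is what lets both *watoto* and *mtoto* count toward the single headword *mtoto*; a plain word-form frequency list would split them into separate entries.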

Citation (APA)

Hurskainen, A. (2003). New advances in corpus-based lexicography. Lexikos, 13, 111–132. https://doi.org/10.5788/13-0-725