Lemmatization for ancient languages: Rules or neural networks?

Oksana Dereza

Conference Proceedings

Communications in Computer and Information Science (2018) 930 35-47

DOI: 10.1007/978-3-030-01204-5_4

Abstract

Lemmatisation, one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so that they can be analysed as a single item. This task is often considered solved for most modern languages regardless of their morphological type, but the situation is dramatically different for ancient languages. A rich inflectional system and a high level of orthographic variation, both common to these languages, together with a lack of resources, make lemmatising historical data a challenging task. The task is becoming increasingly important as manuscripts are extensively digitised, yet it remains poorly covered in the literature. In this work, I compare a rule-based and a neural-network-based approach to lemmatisation on Early Irish data (Old and Middle Irish are often described together as “Early Irish”).

Cite

Dereza, O. (2018). Lemmatization for ancient languages: Rules or neural networks? In Communications in Computer and Information Science (Vol. 930, pp. 35–47). Springer Verlag. https://doi.org/10.1007/978-3-030-01204-5_4
