Lemmatization for ancient languages: Rules or neural networks?

Abstract

Lemmatisation, one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so that they can be analysed as a single item. The task is often considered solved for most modern languages, regardless of their morphological type, but the situation is dramatically different for ancient languages. The rich inflectional systems and high degree of orthographic variation common to these languages, together with a lack of resources, make lemmatising historical data challenging. The task is becoming increasingly important as manuscripts are extensively digitised, yet it remains poorly covered in the literature. In this work, I compare a rule-based and a neural-network-based approach to lemmatisation for Early Irish data (Old and Middle Irish are often described together as “Early Irish”).
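Here is a minimal Python sketch of the simplest rule-based design, which makes "grouping inflected forms" concrete. It uses a lookup table from word forms to lemmas, preceded by a normalisation rule that absorbs some orthographic variation. It is not the system evaluated in the paper. The lexicon (a few forms of Old Irish fer 'man'), the normalisation rule and the helper names normalise, lemmatise and group_by_lemma are hypothetical and were chosen only for illustration.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical toy lexicon mapping inflected forms to their lemma.
# Forms of Old Irish "fer" ('man'), for illustration only.
LEXICON: Dict[str, str] = {
    "fer": "fer",
    "fir": "fer",
    "fiur": "fer",
    "firu": "fer",
    "feraib": "fer",
}


def normalise(form: str) -> str:
    """Hypothetical normalisation rule: lowercase the form and strip
    combining diacritics (e.g. length marks written inconsistently)."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lemmatise(form: str) -> Optional[str]:
    """Return the lemma of the normalised form, or None if it is not listed."""
    return LEXICON.get(normalise(form))


def group_by_lemma(tokens: List[str]) -> Dict[str, List[str]]:
    """Group tokens by lemma; forms missing from the lexicon go under '<unknown>'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        lemma = lemmatise(token)
        groups[lemma if lemma is not None else "<unknown>"].append(token)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_lemma(["Fer", "fir", "feraib", "ben"]))
```

Calling group_by_lemma(["Fer", "fir", "feraib", "ben"]) groups the first three tokens under "fer". It files "ben", which is absent from the toy lexicon, under "<unknown>". Out-of-vocabulary forms like this are where a pure lookup approach breaks down on resource-poor data. Stripping length marks is also a trade-off: it merges spelling variants, but it can merge distinct words as well, e.g. fir 'men' and fír 'true'.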

Citation (APA)

Dereza, O. (2018). Lemmatization for ancient languages: Rules or neural networks? In Communications in Computer and Information Science (Vol. 930, pp. 35–47). Springer Verlag. https://doi.org/10.1007/978-3-030-01204-5_4
