Lemmatisation, one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so that they can be analysed as a single item. This task is often considered solved for most modern languages regardless of their morphological type, but the situation is dramatically different for ancient languages. The rich inflectional systems and high level of orthographic variation common to these languages, together with a lack of resources, make lemmatising historical data a challenging task. The task is becoming increasingly important as manuscripts are now being digitised extensively, yet it remains poorly covered in the literature. In this work, I compare a rule-based and a neural-network-based approach to lemmatisation for Early Irish data (Old and Middle Irish are often described together as “Early Irish”).
CITATION STYLE
Dereza, O. (2018). Lemmatization for ancient languages: Rules or neural networks? In Communications in Computer and Information Science (Vol. 930, pp. 35–47). Springer Verlag. https://doi.org/10.1007/978-3-030-01204-5_4