Lexicon-assisted tagging and lemmatization in Latin: A comparison of six taggers and two lemmatization methods

Steffen Eger; Tim Vor Der Brück; Alexander Mehler

Conference Proceedings (Open Access)


Proceedings of the Annual Meeting of the Association for Computational Linguistics (2015), Vol. 2015-text, pp. 105–113

DOI: 10.18653/v1/w15-3716

17 Citations

79 Readers

Abstract

We present a survey of tagging accuracies, for both part-of-speech (PoS) and full morphological tagging, for six taggers measured on a corpus of medieval church Latin (see www.comphistsem.org). The best tagger in our sample, Lapos, achieves a PoS tagging accuracy of close to 96% and an overall tagging accuracy (including full morphological tagging) of about 85%. When we 'intersect' the taggers' output with our lexicon, the latter score rises to almost 91% for Lapos. A conservative assessment of lemmatization accuracy on our data yields an estimated 93–94% for a lexicon-based lemmatization strategy and 94–95% for lemmatization via trained lemmatizers. © 2015 Association for Computational Linguistics and The Asian Federation of Natural Language Processing.
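The abstract does not spell out how the tagger output is "intersected" with the lexicon. One plausible reading is sketched below under stated assumptions: the tagger returns candidate tags ranked best-first, and intersection keeps the best-ranked candidate that the lexicon lists for the word, falling back to the tagger's top guess otherwise. The sketch also shows why PoS accuracy can exceed full morphological accuracy, because a wrong case or number counts as an error only at the full-tag level. The tag format (e.g. `NOUN.abl.sg`), the toy lexicon, the fallback rule, and the function names `intersect_with_lexicon`, `pos_of`, and `accuracy` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Set

# Hypothetical lexicon: word form -> set of full morphological tags it may carry.
# The "POS.feature.feature" tag format is an illustrative assumption.
Lexicon = Dict[str, Set[str]]


def intersect_with_lexicon(ranked_tags: List[str], word: str, lexicon: Lexicon) -> str:
    """Pick the best-ranked tagger candidate that the lexicon licenses for `word`.

    Assumption: the tagger supplies candidates ordered best-first. If the word
    is missing from the lexicon, or no candidate is licensed, the tagger's top
    candidate is kept unchanged.
    """
    allowed = lexicon.get(word)
    if allowed:
        for tag in ranked_tags:
            if tag in allowed:
                return tag
    return ranked_tags[0]


def pos_of(tag: str) -> str:
    """Coarse part of speech: the part of the (hypothetical) tag before the first dot."""
    return tag.split(".", 1)[0]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Token-level accuracy: share of positions where prediction equals gold."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    lexicon: Lexicon = {"rosa": {"NOUN.nom.sg", "NOUN.voc.sg", "NOUN.abl.sg"}}
    # The tagger's first guess (genitive) is not licensed for "rosa" (genitive
    # singular is "rosae"); the second candidate (ablative) is, so it is chosen.
    print(intersect_with_lexicon(["NOUN.gen.sg", "NOUN.abl.sg"], "rosa", lexicon))  # → NOUN.abl.sg

    gold = ["NOUN.nom.sg", "VERB.pres.3sg"]
    pred = ["NOUN.abl.sg", "VERB.pres.3sg"]
    # A wrong case lowers full morphological accuracy but not PoS accuracy.
    print(accuracy(pred, gold))  # → 0.5
    print(accuracy([pos_of(t) for t in pred], [pos_of(t) for t in gold]))  # → 1.0
```

Falling back to the tagger's top guess keeps out-of-lexicon words tagged; a stricter variant could instead flag such tokens for review. Whether the paper uses ranked candidates, a fallback, or some other combination rule is not stated in this abstract.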


Citation (APA)

Eger, S., Vor Der Brück, T., & Mehler, A. (2015). Lexicon-assisted tagging and lemmatization in Latin: A comparison of six taggers and two lemmatization methods. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2015-text, pp. 105–113). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3716

Readers' Seniority

PhD / Post grad / Masters / Doc: 24 (63%)

Researcher: 9 (24%)

Lecturer / Post doc: 3

Professor / Associate Prof.: 2

Readers' Discipline

Computer Science: 28 (67%)

Linguistics: 8 (19%)

Social Sciences: 3

Arts and Humanities: 3
