Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

  • Kestemont M
  • De Gussem J

Abstract

In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech (PoS) tagging and lemmatization. Both are basic yet foundational preprocessing steps in applications such as text re-use detection. They are nevertheless complicated by the considerable orthographic variation that is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion: a lexicon is first used to generate all potential lemma-tag pairs for a token, and a context-aware PoS tagger then selects the most appropriate pair. Apart from the problem of out-of-lexicon items, error percolation is a major downside of such approaches, since a mistake made in an early step carries over into the later ones. In this paper we explore the possibility of solving both tasks elegantly with a single, integrated approach. To this end, we use a layered neural network architecture from the field of deep representation learning.
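The traditional cascaded, lexicon-dependent pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' system or any existing tool: the toy lexicon, the tag-bigram scores, and the function name `cascaded_tag` are all hypothetical, and a greedy left-to-right choice stands in for a real context-aware tagger. It is only meant to show where out-of-lexicon items fall through.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: token -> candidate (lemma, PoS tag) pairs.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "amor": [("amor", "NOUN"), ("amo", "VERB")],  # noun, or passive form of "amo"
    "dei": [("deus", "NOUN")],
    "est": [("sum", "VERB")],
    "mihi": [("ego", "PRON")],
}

# Hypothetical tag-bigram scores standing in for a context-aware tagger.
BIGRAM_SCORE: Dict[Tuple[str, str], float] = {
    ("<s>", "NOUN"): 0.6, ("<s>", "VERB"): 0.2,
    ("NOUN", "NOUN"): 0.4, ("NOUN", "VERB"): 0.5,
}

Tagged = Tuple[str, Optional[str], Optional[str]]


def cascaded_tag(tokens: List[str]) -> List[Tagged]:
    """Step 1: lexicon lookup. Step 2: pick the best pair given the previous tag."""
    prev_tag = "<s>"
    result: List[Tagged] = []
    for token in tokens:
        candidates = LEXICON.get(token.lower())
        if not candidates:
            # Out-of-lexicon item: the cascade has nothing to choose from.
            result.append((token, None, None))
            continue
        lemma, tag = max(
            candidates,
            key=lambda pair: BIGRAM_SCORE.get((prev_tag, pair[1]), 0.0),
        )
        result.append((token, lemma, tag))
        prev_tag = tag  # a wrong choice here would influence every later choice
    return result


if __name__ == "__main__":
    print(cascaded_tag(["amor", "dei", "est"]))
    # "michi" is a medieval spelling of "mihi" and is missing from the toy lexicon.
    print(cascaded_tag(["michi"]))
```

In this sketch the spelling variant "michi" receives no lemma or tag at all, and any wrong selection for one token feeds into the scores for the next, which are the two weaknesses the abstract attributes to cascaded approaches.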

Cite

Kestemont, M., & De Gussem, J. (2017). Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning. Journal of Data Mining & Digital Humanities, Special Issue on...(Towards a Digital Ecosystem:...). https://doi.org/10.46298/jdmdh.1398
