In this paper, we describe the development of a language identification system and a part-of-speech tagger for mixed Latin-Middle English text. To this end, we annotate data with language IDs and Universal POS tags (Petrov et al., 2012). For both sub-tasks, we train a conditional random field classifier whose features include the output of the TreeTagger models for both languages. We evaluate the system both in general and with respect to a specific task. Moreover, we describe our efforts to move beyond proof-of-concept implementations of tools towards a more task-oriented approach, showing how our techniques can be applied in Humanities research.
Schulz, S., & Keller, M. (2016). Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 43–51). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2105