Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text

Abstract

In this paper, we describe the development of a language identification system and a part-of-speech tagger for Latin-Middle English mixed text. To this end, we annotate data with language IDs and Universal POS tags (Petrov et al., 2012). We train a conditional random field classifier for both sub-tasks, including features generated by the TreeTagger models of both languages. The focus lies on both a general and a task-specific evaluation. Moreover, we describe our effort to move beyond a proof-of-concept implementation of tools towards a more task-oriented approach, showing how to apply our techniques in the context of Humanities research.
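To make the setup in the abstract more concrete, the following is a minimal sketch of how per-token features for a conditional random field might be assembled from the outputs of two monolingual taggers. It is not the authors' implementation: the abstract does not list the feature set, so every feature name, the helper functions `token_features` and `sentence_features`, and the example tokens and tags are assumptions made for illustration. The tagger outputs are passed in as plain lists rather than produced by TreeTagger itself.

```python
from typing import Dict, List


def token_features(
    tokens: List[str],
    i: int,
    latin_tags: List[str],
    english_tags: List[str],
) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Surface-form features of the token itself.
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        # Tags proposed by the two monolingual taggers (supplied by the caller).
        "latin_tag": latin_tags[i],
        "english_tag": english_tags[i],
        "taggers_agree": latin_tags[i] == english_tags[i],
    }
    # Context features, with explicit sentence-boundary markers.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(
    tokens: List[str],
    latin_tags: List[str],
    english_tags: List[str],
) -> List[Dict[str, object]]:
    """Return one feature dict per token of a sentence."""
    if not (len(tokens) == len(latin_tags) == len(english_tags)):
        raise ValueError("tokens and both tag sequences must have equal length")
    return [
        token_features(tokens, i, latin_tags, english_tags)
        for i in range(len(tokens))
    ]


# Invented mixed-language tokens and placeholder tags, for illustration only.
tokens = ["et", "the", "kyng"]
latin_tags = ["CONJ", "X", "X"]
english_tags = ["X", "DET", "NOUN"]
features = sentence_features(tokens, latin_tags, english_tags)
```

Feature dictionaries of this shape could then be paired with gold language IDs or Universal POS tags and passed to a CRF toolkit that accepts per-token feature dictionaries; which toolkit the authors used is not stated in the abstract.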

Cite

Schulz, S., & Keller, M. (2016). Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 43–51). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2105
