Detecting spelling variants in non-standard texts


Abstract

Spelling variation in non-standard language, e.g. computer-mediated communication and historical texts, is usually treated as a deviation from a standard spelling, e.g. 2mr as a non-standard spelling of tomorrow. Consequently, in normalization (the standard approach to dealing with spelling variation), so-called non-standard words are mapped to their corresponding standard words. However, a corresponding standard word does not always exist. This can be the case for single types (like emoticons in computer-mediated communication) or for a complete language, e.g. texts from historical languages that never developed a standard variety. The approach presented in this thesis proposal deals with spelling variation in the absence of a reference standard. The task is to detect pairs of types that are variants of the same morphological word. An approach to spelling-variant detection is presented in which pairs of potential spelling variants are generated with Levenshtein distance and subsequently filtered by supervised machine learning. The approach is evaluated on historical Low German texts. Finally, further perspectives are discussed.
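
The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the distance threshold of 1, and the example word list are assumptions made purely for illustration, and the subsequent supervised filtering step is not shown because the abstract does not specify its features or classifier.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def candidate_pairs(types, max_distance=1):
    """Return all pairs of distinct types within max_distance of each other.

    max_distance is an assumed, illustrative threshold. The comparison is
    quadratic in the number of types, which is fine for a small sketch only.
    """
    unique = sorted(set(types))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if levenshtein(a, b) <= max_distance
    ]


# Hypothetical word list, used only to show the shape of the output.
words = ["vnde", "unde", "und", "dat"]
pairs = candidate_pairs(words)
```

Each pair returned by `candidate_pairs` is only a potential spelling variant; in the approach described above, a trained classifier would then decide which of these candidates to keep.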


Barteld, F. (2017). Detecting spelling variants in non-standard texts. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of the Student Research Workshop (pp. 11–22). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/e17-4002
