Revealing phonological similarities between related languages from automatically generated parallel corpora

Karin Müller

Conference Proceedings

Texts@ACL 2005 - Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Proceedings of the Workshop (2005) 33-40

DOI: 10.3115/1654449.1654455

Abstract

In this paper, we present an approach to automatically revealing phonological correspondences between historically related languages. We create two bilingual pronunciation dictionaries, one for the language pair German-Dutch and one for German-English. The data is used to automatically learn phonological similarities within each language pair via EM-based clustering. We apply our models to predict the phonemes of a Dutch and an English cognate from the phonological form of a German word. The similarity scores show that German and Dutch phonemes are more similar than German and English phonemes, which supplies statistical evidence for the common knowledge that German is more closely related to Dutch than to English. We assess our approach qualitatively, finding meaningful classes caused by historical sound changes. The classes can be used for language learning.
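
To make the idea of EM-based clustering over aligned phoneme pairs more concrete, here is a minimal sketch in Python. It assumes a simple latent-class model, p(g, d) = Σ_c p(c) · p(g|c) · p(d|c), which is one common form of EM clustering and may differ from the model actually used in the paper. The function name `em_cluster`, the smoothing constant, and the toy German-Dutch pairs are illustrative assumptions, not data or code from the paper.

```python
import random
from collections import defaultdict


def em_cluster(pairs, n_classes=2, n_iter=50, seed=0, eps=1e-6):
    """Fit p(c), p(g|c), p(d|c) to (source_phoneme, target_phoneme) pairs.

    Assumed latent-class model: p(g, d) = sum_c p(c) * p(g|c) * p(d|c).
    `eps` is a small smoothing constant (an assumption of this sketch)
    that keeps every probability strictly positive.
    """
    rng = random.Random(seed)
    g_vocab = sorted({g for g, _ in pairs})
    d_vocab = sorted({d for _, d in pairs})

    def random_dist(keys):
        # Random positive weights, normalised to sum to 1.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = [1.0 / n_classes] * n_classes
    p_g = [random_dist(g_vocab) for _ in range(n_classes)]
    p_d = [random_dist(d_vocab) for _ in range(n_classes)]

    for _ in range(n_iter):
        c_count = [0.0] * n_classes
        g_count = [defaultdict(float) for _ in range(n_classes)]
        d_count = [defaultdict(float) for _ in range(n_classes)]

        # E-step: distribute each observed pair over the classes
        # according to the posterior p(c | g, d).
        for g, d in pairs:
            joint = [p_c[c] * p_g[c][g] * p_d[c][d] for c in range(n_classes)]
            total = sum(joint)
            for c in range(n_classes):
                post = joint[c] / total
                c_count[c] += post
                g_count[c][g] += post
                d_count[c][d] += post

        # M-step: re-estimate all parameters from the expected counts.
        n = sum(c_count)
        for c in range(n_classes):
            p_c[c] = (c_count[c] + eps) / (n + eps * n_classes)
            p_g[c] = {g: (g_count[c][g] + eps) / (c_count[c] + eps * len(g_vocab))
                      for g in g_vocab}
            p_d[c] = {d: (d_count[c][d] + eps) / (c_count[c] + eps * len(d_vocab))
                      for d in d_vocab}

    return p_c, p_g, p_d


# Toy aligned phoneme pairs (German, Dutch); illustrative only.
PAIRS = [("ts", "t"), ("ts", "t"), ("s", "t"), ("f", "p"),
         ("f", "p"), ("pf", "p"), ("a", "a"), ("a", "a")]

p_c, p_g, p_d = em_cluster(PAIRS, n_classes=2)
for c in range(2):
    best_g = max(p_g[c], key=p_g[c].get)
    best_d = max(p_d[c], key=p_d[c].get)
    print(f"class {c}: p(c)={p_c[c]:.2f}, top German={best_g}, top Dutch={best_d}")
```

Under this sketch, a class that gives high probability to a German phoneme and a Dutch phoneme at the same time can be read as a candidate sound correspondence. Which classes emerge depends on the data, the number of classes, and the random initialisation.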

Citation (APA)

Müller, K. (2005). Revealing phonological similarities between related languages from automatically generated parallel corpora. In Texts@ACL 2005 - Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Proceedings of the Workshop (pp. 33–40). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1654449.1654455
