An Expectation Maximisation Algorithm for Automated Cognate Detection

Abstract

In historical linguistics, cognate detection is the task of determining whether sets of words have common etymological roots. Inspired by the comparative method used by human linguists, we develop a system for automated cognate detection that frames the task as an inference problem for a general statistical model consisting of observed data (potentially cognate pairs of words), latent variables (the cognacy status of pairs) and unknown global parameters (which sounds correspond between languages). We then give a specific instance of such a model along with an expectation-maximisation algorithm to infer its parameters. We evaluate our system on a dataset of 8140 cognate sets and find the performance of our method to be comparable to the state of the art. We additionally carry out a qualitative analysis demonstrating several advantages of our method over existing systems. We also suggest several ways our work could be extended within the general theoretical framework we propose.
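
The framing in the abstract (observed word pairs, a latent cognacy status per pair, and global sound-correspondence parameters) can be illustrated with a deliberately simplified two-class mixture fitted by expectation-maximisation. This is a minimal sketch under stated assumptions, not the model from the paper. All of the following are hypothetical simplifications: each candidate pair is assumed to be pre-aligned segment by segment; a cognate pair's aligned segments are drawn from a joint correspondence distribution `theta`; a non-cognate pair's segments are drawn independently from fixed unigram frequencies; and `theta` uses add-alpha smoothing. The function name `em_cognates`, its parameters, and the toy data are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: each candidate word pair is already aligned into
# (sound in language A, sound in language B) tuples; alignment is out of scope.
Segment = Tuple[str, str]


def em_cognates(pairs: List[List[Segment]], n_iter: int = 50,
                alpha: float = 0.1, prior: float = 0.5):
    """Simplified two-class EM sketch: latent z = cognate / not cognate."""
    syms_a = sorted({x for p in pairs for x, _ in p})
    syms_b = sorted({y for p in pairs for _, y in p})

    # Assumed non-cognate model: sounds drawn independently from unigram
    # frequencies, estimated once from all pairs and held fixed.
    count_a = Counter(x for p in pairs for x, _ in p)
    count_b = Counter(y for p in pairs for _, y in p)
    total = sum(count_a.values())  # equals sum(count_b.values())
    log_bg = [sum(math.log(count_a[x] / total) + math.log(count_b[y] / total)
                  for x, y in p) for p in pairs]

    # Global parameters: joint sound-correspondence probabilities theta[(x, y)],
    # initialised uniformly, plus the prior probability pi of cognacy.
    theta: Dict[Segment, float] = {
        (x, y): 1.0 / (len(syms_a) * len(syms_b)) for x in syms_a for y in syms_b}
    pi = prior
    post: List[float] = []

    for _ in range(n_iter):
        # E-step: posterior P(cognate | pair) under the current parameters,
        # computed in log space for numerical stability.
        post = []
        for p, lb in zip(pairs, log_bg):
            lc = math.log(pi) + sum(math.log(theta[s]) for s in p)
            ln = math.log(1.0 - pi) + lb
            m = max(lc, ln)
            post.append(math.exp(lc - m) / (math.exp(lc - m) + math.exp(ln - m)))

        # M-step: re-estimate pi and theta from posterior-weighted counts,
        # with add-alpha smoothing so unseen correspondences keep some mass.
        pi = min(max(sum(post) / len(post), 1e-6), 1.0 - 1e-6)
        counts: Counter = Counter()
        for p, r in zip(pairs, post):
            for s in p:
                counts[s] += r
        norm = sum(counts.values()) + alpha * len(theta)
        theta = {s: (counts[s] + alpha) / norm for s in theta}

    return post, theta


# Toy data (invented for illustration): the first six pairs follow regular
# correspondences p:f, t:th, k:h, a:a; the last four pair sounds at random.
TOY_PAIRS = [
    [("p", "f"), ("a", "a"), ("t", "th")],
    [("t", "th"), ("a", "a"), ("k", "h")],
    [("k", "h"), ("a", "a"), ("p", "f")],
    [("p", "f"), ("a", "a"), ("k", "h")],
    [("t", "th"), ("a", "a"), ("p", "f")],
    [("k", "h"), ("a", "a"), ("t", "th")],
    [("p", "h"), ("a", "th"), ("t", "a")],
    [("a", "f"), ("k", "a"), ("t", "h")],
    [("t", "f"), ("p", "a"), ("a", "h")],
    [("k", "th"), ("t", "a"), ("a", "f")],
]

if __name__ == "__main__":
    post, theta = em_cognates(TOY_PAIRS)
    for p, r in zip(TOY_PAIRS, post):
        print(" ".join(f"{x}:{y}" for x, y in p), round(r, 3))
```

On this toy input, recurring correspondences accumulate probability mass in `theta`, so pairs built from them receive high posterior cognacy while the randomly paired words are better explained by the background model. Holding the background distribution fixed keeps the sketch short; a fuller treatment could also re-estimate it from the non-cognate posteriors, and would need to infer alignments rather than assume them.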

Citation (APA)

MacSween, R., Caines, A., & Buttery, P. (2020). An Expectation Maximisation Algorithm for Automated Cognate Detection. In CoNLL 2020 - 24th Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 476–485). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.conll-1.38
