Investigating phonological theories with crowd-sourced data: The Inventory Size Hypothesis in the light of Lingua Libre


Abstract

Data-driven research in phonetics and phonology relies heavily on spoken-language resources and on access to them. We explore a question in comparative linguistics using Lingua Libre, Wikimedia's participatory linguistic library and an open-source, crowd-sourced corpus, to show that such corpora may offer a solution to typologists wishing to study many languages at once. As a proof of concept, we compare the realizations of Italian and Spanish vowels (sample size = 5000) to investigate whether vowel production is influenced by the size of the phonemic inventory (the Inventory Size Hypothesis), by the exact shape of the inventory (the Vowel Quality Hypothesis), or by neither. The results suggest that inventory size does not influence vowel production, in line with previous research, but that the shape of the inventory may well be a factor determining the extent of variation in vowel production. Above all, these results show that Lingua Libre has the potential to provide valuable data for linguistic inquiry.
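To make the notion of "extent of variation in vowel production" concrete, here is a minimal Python sketch of one common way to quantify it: the mean Euclidean distance of each vowel token from its category centroid in F1/F2 space, computed per language and vowel. The abstract does not specify the measure the authors used, so the metric, the function names (`vowel_spread`, `spread_by_vowel`) and the formant values in `TOY_MEASUREMENTS` are all assumptions made for illustration; the numbers are invented placeholders, not data from Lingua Libre or from the study.

```python
from math import sqrt
from statistics import mean


def vowel_spread(tokens):
    """Mean Euclidean distance (Hz) of (F1, F2) tokens from their centroid.

    Illustrative dispersion measure only; not necessarily the paper's metric.
    """
    f1_centroid = mean(f1 for f1, _ in tokens)
    f2_centroid = mean(f2 for _, f2 in tokens)
    return mean(
        sqrt((f1 - f1_centroid) ** 2 + (f2 - f2_centroid) ** 2)
        for f1, f2 in tokens
    )


def spread_by_vowel(measurements):
    """Group (language, vowel, F1, F2) rows and compute the spread per group."""
    grouped = {}
    for language, vowel, f1, f2 in measurements:
        grouped.setdefault((language, vowel), []).append((f1, f2))
    return {key: vowel_spread(tokens) for key, tokens in grouped.items()}


# Hypothetical formant values in Hz, invented purely for this example.
TOY_MEASUREMENTS = [
    ("spa", "a", 700.0, 1400.0),
    ("spa", "a", 720.0, 1380.0),
    ("ita", "a", 690.0, 1450.0),
    ("ita", "a", 750.0, 1350.0),
]

if __name__ == "__main__":
    for (language, vowel), spread in sorted(spread_by_vowel(TOY_MEASUREMENTS).items()):
        print(f"{language} /{vowel}/: mean distance to centroid = {spread:.1f} Hz")
```

Under the Inventory Size Hypothesis, one would expect a language with fewer vowel phonemes to show larger per-vowel spreads than a language with more; a comparison of this kind across matched vowel categories is one way such a prediction could be checked.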

Cite

Hutin, M., & Allassonnière-Tang, M. (2022). Investigating phonological theories with crowd-sourced data: The Inventory Size Hypothesis in the light of Lingua Libre. In SIGMORPHON 2022 - 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Proceedings of the Workshop (pp. 23–28). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.sigmorphon-1.3
