Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)

Abstract

Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for more than half a century. Techniques for automating the measurement and coding of a wide range of sociolinguistic data have been on the rise over recent decades, but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind those for others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing automated procedures. This paper explores several of these considerations in the automated and manual coding of sociolinguistic variables and provides baseline performance data for both kinds of coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders and the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with carefully human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper.
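As a rough illustration of what "agreement with human coders" can mean in practice, the sketch below computes raw agreement and Cohen's kappa between one human coder and one automated coder. It is a minimal sketch under stated assumptions: the abstract does not say which agreement statistic the authors used, the function name `cohen_kappa` is hypothetical, and the eight (ING) codes are invented for the example rather than taken from the study's data.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' categorical labels over the same tokens."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(codes_a)
    # Proportion of tokens on which the two coders gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's marginal label counts.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented codes for eight (ING) tokens: "in" = talkin', "ing" = talking.
human = ["in", "ing", "in", "in", "ing", "ing", "in", "ing"]
auto = ["in", "ing", "ing", "in", "ing", "in", "in", "ing"]

raw_agreement = sum(h == a for h, a in zip(human, auto)) / len(human)
kappa = cohen_kappa(human, auto)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.2f}")
```

With these invented codes the two coders agree on six of eight tokens, while kappa is lower because half of that agreement would be expected by chance given balanced labels. The same comparison could be run between pairs of human coders to gauge how much variability to expect before judging an automated method.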

Kendall, T., Vaughn, C., Farrington, C., Gunter, K., McLean, J., Tacata, C., & Arnson, S. (2021). Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING). Frontiers in Artificial Intelligence, 4. https://doi.org/10.3389/frai.2021.648543
