Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)

Tyler Kendall; Charlotte Vaughn; Charlie Farrington; Kaylynn Gunter; Jaidan McLean; Chloe Tacata; Shelby Arnson

Journal Article, Open Access

Frontiers in Artificial Intelligence (2021) 4

DOI: 10.3389/frai.2021.648543

Abstract

Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over half a century. Techniques for automating the measurement and coding of a wide range of sociolinguistic data have been on the rise over recent decades, but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing of automated procedures. This paper explores several of these considerations in the automated and manual coding of sociolinguistic variables and provides baseline performance data for both kinds of coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with carefully human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper.
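One way to make "agreement with human coders" concrete is a chance-corrected agreement statistic such as Cohen's kappa. The abstract does not say which agreement measures the paper reports, so the following is only a minimal sketch under that assumption. The function name `cohen_kappa`, the labels "ing" and "in", and the two short code lists are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(codes_a)

    # Observed agreement: share of tokens given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Expected agreement by chance, from each coder's marginal label counts.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for eight (ING) tokens: one human coder, one classifier.
human = ["ing", "in", "in", "ing", "ing", "in", "ing", "in"]
auto = ["ing", "in", "ing", "ing", "ing", "in", "in", "in"]

print(cohen_kappa(human, auto))  # 6 of 8 tokens agree; chance agreement is 0.5
```

The same function can compare two human coders with each other, which gives a baseline against which human-versus-algorithm agreement can be read.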

Citation (APA)

Kendall, T., Vaughn, C., Farrington, C., Gunter, K., McLean, J., Tacata, C., & Arnson, S. (2021). Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING). Frontiers in Artificial Intelligence, 4. https://doi.org/10.3389/frai.2021.648543
