Abstract
There is no scientific consensus on what is meant by "emotion": researchers have examined various phenomena spanning brain modes, feelings, sensations, and cognitive structures, among others, in their study of emotional experiences. For the purposes of developing an AI speech emotion recognition (SER) system, however, emotion must be defined, bounded, and instantiated as ground truth in the training data. This means practical choices must be made in which particular emotional ontologies are prioritized over others in the construction of SER datasets. In this paper, I explore these tensions around fairness, accountability, and transparency by analyzing open-source datasets used for SER applications along with their accompanying methodology papers. Specifically, I critique the centrality of discrete emotion theory in SER applications as a contestable emotional framework that is invoked primarily for its practical utility and its alignment with machine learning epistemologies rather than for its scientific rigor. In so doing, I also shed light on the role of dataset creators as emotional designers in their attempts to produce, elicit, record, and index emotional expressions for the purposes of crafting SER training datasets. Ultimately, by further querying SER through the aperture of Critical Disability Studies, I use this empirical work to examine the sociopolitical stakes of SER as a normative and regulatory technology that siphons emotion into a broader agenda of capitalistic productivity in the context of call center optimization.
Citation
Kang, E. B. (2023). On the Praxes and Politics of AI Speech Emotion Recognition. In ACM International Conference Proceeding Series (pp. 455–466). Association for Computing Machinery. https://doi.org/10.1145/3593013.3594011