Comparison of Inter-rater Reliability of Human and Computer Prosodic Annotation Using Brazil’s Prosody Model

  • Kang, O.
  • Johnson, D. O.

Abstract

The current study examined whether computer annotations of prosody based on Brazil’s (1997) framework were comparable with human annotations. A series of statistical tests was performed for each prosodic feature: tone unit (two accuracy scores and Pearson’s correlation), prominent syllable (accuracy, F-measure, and Cohen’s kappa), tone choice (accuracy and Fleiss’ kappa), and relative pitch (accuracy, Fleiss’ kappa, and Pearson’s correlation). One population consisted of the inter-rater reliability scores among the three human coders, and the other consisted of the inter-rater reliability scores between the computer and the three humans; t-tests compared the two populations for each measure. If the difference between the populations was significant, the computer and human annotations were considered not comparable; if it was not significant, they were considered comparable. The results indicated that the computer and human annotations were comparable for tone choice and not comparable for prominent syllable. For tone unit, two of the t-tests provided evidence that the annotations were comparable and one did not. The relative pitch t-tests showed a significant disparity between the humans’ estimates of relative pitch and the computer’s actual relative pitch calculation.
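To make the comparison procedure concrete, the sketch below builds two populations of Cohen’s kappa scores (human–human and computer–human) for a binary, prominence-style annotation and compares them with an independent-samples t-test. The data, error rates, and variable names are synthetic illustrations, not the authors’ actual annotations or pipeline.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import cohen_kappa_score

# Hypothetical prominence annotations (1 = prominent syllable, 0 = not)
# over the same stretch of speech; real data would come from the coders.
rng = np.random.default_rng(0)
gold = rng.integers(0, 2, size=200)
human_codings = [np.where(rng.random(200) < 0.90, gold, 1 - gold) for _ in range(3)]
computer_coding = np.where(rng.random(200) < 0.85, gold, 1 - gold)

# Population 1: pairwise agreement among the three human coders.
human_kappas = [cohen_kappa_score(a, b) for a, b in combinations(human_codings, 2)]

# Population 2: agreement between the computer and each human coder.
computer_kappas = [cohen_kappa_score(computer_coding, h) for h in human_codings]

# Independent-samples t-test: a non-significant difference would suggest the
# computer's annotations are comparable to the humans' for this feature.
stat, p_value = ttest_ind(human_kappas, computer_kappas)
print(f"t = {stat:.3f}, p = {p_value:.3f}")
```

The same pattern would apply to the other measures (accuracy, F-measure, Fleiss’ kappa, Pearson’s correlation) by swapping in the appropriate agreement statistic before running the t-test.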

Citation (APA)

Kang, O., & Johnson, D. O. (2015). Comparison of Inter-rater Reliability of Human and Computer Prosodic Annotation Using Brazil’s Prosody Model. English Linguistics Research, 4(4). https://doi.org/10.5430/elr.v4n4p58
