The current study examined whether computer annotations of prosody based on Brazil’s (1997) framework were comparable to human annotations. A series of statistical tests was performed for each prosodic feature: tone unit (two accuracy scores and Pearson’s correlation), prominent syllable (accuracy, F-measure, and Cohen’s kappa), tone choice (accuracy and Fleiss’ kappa), and relative pitch (accuracy, Fleiss’ kappa, and Pearson’s correlation). We treated the inter-rater reliability scores among the three human coders as one population and the inter-rater reliability scores between the computer and each of the three humans as the other. If the difference between these two populations was significant, the computer and human annotations were considered not comparable; if it was not significant, they were considered comparable. The results indicated that the computer and human annotations were comparable for tone choice and not comparable for prominent syllable. For tone unit, two of the t-tests provided evidence that the annotations were comparable and one did not. The relative-pitch t-tests showed a significant disparity between the humans’ estimates of relative pitch and the computer’s actual relative pitch calculation.
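The comparison logic described above can be illustrated with a minimal sketch (not the authors’ code): compute pairwise agreement among the human coders as one population, computer–human agreement as the other, and test whether the two populations differ. The annotation arrays, sample size, and choice of Cohen’s kappa for prominent syllables are hypothetical placeholders standing in for the study’s actual data and full set of metrics.

```python
# Hedged sketch of the comparison procedure, using hypothetical
# prominence annotations (1 = prominent syllable, 0 = not).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 200  # hypothetical number of syllables

# Simulated annotations for three human coders and the computer.
human_1 = rng.integers(0, 2, n)
human_2 = human_1.copy(); human_2[rng.choice(n, 20, replace=False)] ^= 1
human_3 = human_1.copy(); human_3[rng.choice(n, 25, replace=False)] ^= 1
computer = human_1.copy(); computer[rng.choice(n, 40, replace=False)] ^= 1
humans = [human_1, human_2, human_3]

# Population 1: pairwise kappa among the three human coders.
human_human = [cohen_kappa_score(a, b)
               for i, a in enumerate(humans) for b in humans[i + 1:]]

# Population 2: kappa between the computer and each human coder.
computer_human = [cohen_kappa_score(computer, h) for h in humans]

# A non-significant difference is taken as evidence that the computer's
# annotations are comparable to the humans'.
t_stat, p_value = ttest_ind(human_human, computer_human)
print(f"human-human kappa:    {np.mean(human_human):.3f}")
print(f"computer-human kappa: {np.mean(computer_human):.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The same scheme extends to the other features by swapping in the relevant agreement statistic (accuracy, F-measure, Fleiss’ kappa, or Pearson’s correlation) before running the t-test.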
CITATION STYLE
Kang, O., & Johnson, D. O. (2015). Comparison of Inter-rater Reliability of Human and Computer Prosodic Annotation Using Brazil’s Prosody Model. English Linguistics Research, 4(4). https://doi.org/10.5430/elr.v4n4p58