Existing tests of interrater agreement have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of interrater agreement, applicable to nominal or ordinal categories, is presented. The test statistic can be expressed as a ratio (labeled Q_A, ranging from 0 to infinity) or as a proportion (labeled P_A, ranging from 0 to 1). This test weighs information supporting agreement against information supporting disagreement. The new test's effectiveness (power and specificity) is compared with that of five other tests of interrater agreement in a series of Monte Carlo simulations. The new test, although slightly less powerful than the other tests reviewed, is the only one sensitive to agreement only. We also introduce confidence intervals on the proportion of agreement.
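
For readers unfamiliar with Cohen's kappa, the statistic underlying some of the existing tests mentioned above, the following is a minimal sketch of its textbook definition (observed agreement corrected for chance agreement). It is not the new test: the abstract does not define the ratio or proportion statistics, so they are not reproduced here. The function name `cohens_kappa` and the toy ratings are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters (textbook definition, hypothetical helper)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)

    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data (made up for illustration): rater B systematically shifts every
# category, so the ratings are clearly not random, yet the raters never agree.
rater_a = ["x", "y", "z"] * 4
rater_b = ["y", "z", "x"] * 4

print(round(cohens_kappa(rater_a, rater_b), 3))  # approximately -0.5
```

The toy data illustrate the kind of pattern the abstract is concerned with: the two sets of ratings are strongly related without showing any agreement, so a procedure that only detects departure from randomness could not distinguish this case from genuine agreement.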