Evaluation of aircrew performance serves the critical functions of assessing the qualifications of individual pilots and, under newer proficiency-based training programs, providing data for modifying training programs. We apply psychometric methods to assessing and improving the quality of evaluation of aircrew performance. Quality evaluations require human judges to recognize and discriminate changes in performance levels (sensitivity) and map these observations onto the appropriate grade-scale values (accuracy). We define statistical measures for both of these properties. A distinction is made between referent-rater reliability (RRR) and traditional interrater reliability, and we argue that RRR more meaningfully measures evaluators’ grading performance and has clearer training implications. We also discuss the implementation of training and calibration sessions that are intended to help improve evaluators’ ratings of aircrew performance. We offer several practical guidelines for designing and conducting these sessions.
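The contrast between referent-rater and interrater comparisons can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: sensitivity is approximated by the Pearson correlation between a rater's grades and the referent grades, accuracy by the mean absolute difference from the referent, and the grades themselves are invented values on a hypothetical 1-5 scale. These are not the authors' definitions or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length grade lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_abs_diff(x, y):
    """Mean absolute difference between two equal-length grade lists."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical grades for six recorded performances (assumed 1-5 scale).
referent = [1, 2, 3, 4, 2, 3]  # assumed "gold standard" grades
rater_a = [2, 3, 4, 5, 3, 4]   # tracks the referent, one grade too lenient
rater_b = [2, 3, 4, 5, 2, 4]   # nearly identical to rater A

# Interrater view: how closely the two raters agree with each other.
interrater_r = pearson(rater_a, rater_b)
interrater_gap = mean_abs_diff(rater_a, rater_b)

# Referent-rater view: how each rater compares with the referent.
sensitivity_a = pearson(rater_a, referent)     # assumed sensitivity proxy
accuracy_gap_a = mean_abs_diff(rater_a, referent)  # assumed accuracy proxy
accuracy_gap_b = mean_abs_diff(rater_b, referent)

if __name__ == "__main__":
    print(f"interrater r = {interrater_r:.2f}, gap = {interrater_gap:.2f}")
    print(f"rater A vs referent: r = {sensitivity_a:.2f}, gap = {accuracy_gap_a:.2f}")
    print(f"rater B vs referent: gap = {accuracy_gap_b:.2f}")
```

In this invented data the two raters agree closely with each other, yet both grade well above the referent. An interrater comparison alone would not reveal that shared leniency, whereas the referent comparison separates a rater who discriminates performance levels well (high correlation) from one who also places them at the right scale values (small gap).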