Abstract
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were then related to how different NEAT equating methods treat population ability differences. A score model was developed from the most important features of the reviewed score models and used in a simulation study of reliability across 45 measurement conditions (5 test and anchor reliability combinations × 3 population ability difference conditions × 3 sample sizes). Ten equating methods were considered: chained linear; chained equipercentile with raw and smoothed frequencies; Tucker; frequency estimation equipercentile with raw and smoothed frequencies; Levine observed using Angoff-estimated reliabilities and the “correct” reliabilities based on the data generation model used in this study; and Levine true using Angoff-estimated and correct reliabilities. The results were consistent with what is known about equating functions and their variability. Unequal and/or low reliability inflates equating function variability and alters equating functions when population abilities differ.
Moses, T., & Kim, S. (2007). Reliability and the nonequivalent groups with anchor test design. ETS Research Report Series, 2007(1), i–40. https://doi.org/10.1002/j.2333-8504.2007.tb02058.x