RELIABILITY AND THE NONEQUIVALENT GROUPS WITH ANCHOR TEST DESIGN

Abstract

This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different NEAT equating methods. A score model was then developed based on the most important features of the reviewed score models and used to study reliability in a simulation study across a total of 45 measurement conditions (= 5 test and anchor reliability combinations × 3 population ability difference conditions × 3 sample sizes). Ten equating methods were considered: chained linear, chained equipercentile with raw and smoothed frequencies, Tucker, frequency estimation equipercentile with raw and smoothed frequencies, Levine observed using Angoff-estimated and the “correct” reliabilities based on the data generation model used in this study, and Levine true using Angoff-estimated and correct reliabilities. The results were consistent with what is known about equating functions and their variability. Unequal and/or low reliability inflates equating function variability and alters equating functions when population abilities differ.
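As an illustration of one of the methods named above, chained linear equating can be sketched in a few lines. This is a minimal sketch of the standard textbook form, not the report's own implementation: new-form scores X are linked to the anchor A by matching mean and standard deviation in the population that took X (called P here), the anchor is linked to old-form scores Y in the population that took Y (called Q here), and the two links are composed. The function names, the population labels P and Q, and the toy score lists are all assumptions made for this example.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Return a function that maps the from-scale onto the to-scale
    by matching the mean and standard deviation of the two score sets."""
    mu_from, sd_from = mean(from_scores), pstdev(from_scores)
    mu_to, sd_to = mean(to_scores), pstdev(to_scores)
    slope = sd_to / sd_from
    return lambda score: mu_to + slope * (score - mu_from)


def chained_linear(x_p, a_p, y_q, a_q):
    """Chained linear equating for a NEAT design (hypothetical helper).

    x_p, a_p: test X and anchor scores observed in population P
    y_q, a_q: test Y and anchor scores observed in population Q
    """
    x_to_a = linear_link(x_p, a_p)  # link X to the anchor, estimated in P
    a_to_y = linear_link(a_q, y_q)  # link the anchor to Y, estimated in Q
    return lambda x: a_to_y(x_to_a(x))


# Toy data, invented for illustration only.
x_p = [10, 12, 14, 16, 18]
a_p = [5, 6, 7, 8, 9]
y_q = [11, 14, 17, 20, 23]
a_q = [6, 7, 8, 9, 10]

equate = chained_linear(x_p, a_p, y_q, a_q)
print(equate(14))
```

In this sketch the anchor means differ between the two toy groups, so the composed function shifts X scores to reflect the group difference carried by the anchor. The reliability of X, Y, and A does not appear anywhere in the formula, which is one reason unequal reliability is of interest for this method.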

Moses, T., & Kim, S. (2007). RELIABILITY AND THE NONEQUIVALENT GROUPS WITH ANCHOR TEST DESIGN. ETS Research Report Series, 2007(1), i–40. https://doi.org/10.1002/j.2333-8504.2007.tb02058.x
