Achieving a stable scale for an assessment with multiple forms: Weighting test samples in IRT linking

Jiahe Qian; Alina A. von Davier; Yanming Jiang

Conference Proceedings

Springer Proceedings in Mathematics and Statistics (2013) 66 171-185

DOI: 10.1007/978-1-4614-9348-8_11

Abstract

In the quality control of an assessment with multiple forms, one goal is to maintain a stable scale over time. Variability and seasonality in examinee samples and test conditions can introduce variation into IRT linking and equating and can distort the "sampling exchangeability" assumed in the Draper–Lindley–de Finetti (DLD) measurement validity framework. As an initial exploration of optimal design in linking, we sought an improved sampling design for invariant Stocking–Lord test characteristic curve (TCC) linking across testing seasons. We applied statistical weighting techniques, such as raking and poststratification, to produce a weighted sample distribution consistent with the target population distribution. To assess the effect of weighting on linking, we first drew multiple subsamples from an original sample and then compared the linking parameters estimated from the subsamples with those from the original sample. The linking parameters from the weighted sample had smaller mean square errors (MSE) than those from the unweighted subsample. The techniques developed here can be applied to (1) assessments such as GRE® and TOEFL® that show variability and seasonality across multiple forms and (2) assessments, such as state assessments, whose linking decisions are based on small initial samples.
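The abstract names raking and poststratification but does not give the authors' implementation. The following is a minimal sketch of the two textbook techniques, assuming categorical examinee variables (the names `sex` and `region` and all data are hypothetical) and known population proportions for each category. In this sketch, poststratification weights each cell of a single cross-classification to its population share, while raking (iterative proportional fitting) adjusts the weights to several marginal distributions in turn until all of them match. The resulting weights would then enter the weighted estimation of the linking constants; that step is not shown.

```python
import numpy as np


def poststratify(cells, target_share):
    """Poststratification weights (sketch).

    Each examinee in cell c gets weight target_share[c] * n / n_c, so the
    weighted share of every cell equals its population share. Assumes every
    cell listed in target_share is observed at least once in the sample.
    """
    cells = np.asarray(cells)
    n = len(cells)
    w = np.zeros(n, dtype=float)
    for c, p in target_share.items():
        mask = cells == c
        w[mask] = p * n / mask.sum()
    return w


def rake(variables, targets, n_iter=500, tol=1e-10):
    """Raking (iterative proportional fitting) over several margins (sketch).

    variables: list of 1-D label arrays, one per weighting variable.
    targets:   list of dicts mapping category -> population proportion.
    Returns weights rescaled to sum to the sample size n.
    """
    variables = [np.asarray(v) for v in variables]
    n = len(variables[0])
    w = np.ones(n)
    for _ in range(n_iter):
        max_dev = 0.0
        # Adjust the weights to match each margin in turn
        for labels, target in zip(variables, targets):
            total = w.sum()
            for c, p in target.items():
                mask = labels == c
                factor = p * total / w[mask].sum()
                w[mask] *= factor
                max_dev = max(max_dev, abs(factor - 1.0))
        # Stop once no margin needs more than a negligible adjustment
        if max_dev < tol:
            break
    return w * n / w.sum()


def weighted_share(labels, w):
    """Weighted proportion of each category, keyed by category label."""
    labels = np.asarray(labels)
    return {str(c): float(w[labels == c].sum() / w.sum()) for c in np.unique(labels)}


# Hypothetical sample: 2 F / 4 M, 4 N / 2 S; hypothetical targets 50/50 and 60/40
sex = ["F", "F", "M", "M", "M", "M"]
region = ["N", "S", "N", "N", "N", "S"]
w_rake = rake([sex, region], [{"F": 0.5, "M": 0.5}, {"N": 0.6, "S": 0.4}])
print(weighted_share(sex, w_rake), weighted_share(region, w_rake))
```

One practical difference motivates having both: poststratification needs the population share of every joint cell, whereas raking needs only the marginal distributions, which is useful when the full cross-classification is unavailable or some cells are sparse.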

Citation (APA)

Qian, J., von Davier, A. A., & Jiang, Y. (2013). Achieving a stable scale for an assessment with multiple forms: Weighting test samples in IRT linking. In Springer Proceedings in Mathematics and Statistics (Vol. 66, pp. 171–185). Springer New York LLC. https://doi.org/10.1007/978-1-4614-9348-8_11