Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items


Abstract

Purpose: Previous studies have examined and identified demographic group score differences on United States Medical Licensing Examination (USMLE) Step examinations. It is necessary to explore potential etiologies of such differences to ensure fairness of examination use. Although score differences are largely explained by preceding academic variables, one potential concern is that item-level bias may be associated with remaining group score differences. The purpose of this 2019-2020 study was to statistically identify and qualitatively review USMLE Step 1 exam questions (items) using differential item functioning (DIF) methodology.

Method: Logistic regression DIF was used to identify and classify the effect size of DIF on Step 1 items meeting minimum sample size criteria. After using DIF to flag items statistically, subject matter expert (SME) review was used to identify potential reasons why items may have performed differently between racial and gender groups, including characteristics such as content, format, wording, context, or stimulus materials. USMLE SMEs reviewed items to identify the group difference they believed was present, if any; articulate a rationale behind the group difference; and determine whether that rationale would be considered construct relevant or construct irrelevant.

Results: All identified DIF rationales were relevant to the constructs being assessed and therefore did not reflect item bias. Where SME-generated rationales aligned with statistical differences (flags), they favored self-identified women on items tagged to women's health content categories and were judged to be construct relevant.

Conclusions: This study did not find evidence to support the hypothesis that group-level performance differences beyond those explained by prior academic performance variables are driven by item-level bias. Health professions examination programs have an obligation to assess for group differences, and when present, investigate to what extent, if any, measurement bias plays a role.
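
For readers unfamiliar with logistic regression DIF, the following is a minimal sketch of the general technique on simulated data, not the study's actual procedure. It compares a model that predicts item correctness from a matching variable alone against a model that adds group membership and a group-by-ability interaction, then reports a likelihood-ratio statistic and the change in Nagelkerke R². Everything here is an assumption for illustration: the function names (`lr_dif`, `simulate`), the simulated `ability` matching variable, and the A/B/C cutoffs of 0.035 and 0.070, which are commonly cited in the DIF literature but are not stated in this abstract.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_predictor(beta, x):
    eta = sum(b * v for b, v in zip(beta, x))
    return max(-30.0, min(30.0, eta))  # clamp so exp() stays well behaved


def fit_loglik(rows, y, iters=20):
    """Fit a logistic regression by Newton-Raphson; return its log-likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-linear_predictor(beta, x)))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = linear_predictor(beta, x)
        loglik += yi * eta - math.log1p(math.exp(eta))
    return loglik


def lr_dif(ability, group, correct):
    """Compare nested models: ability only vs. ability + group + interaction."""
    n = len(correct)
    p = sum(correct) / n
    ll_null = n * (p * math.log(p) + (1 - p) * math.log(1 - p))
    ll_base = fit_loglik([[1.0, a] for a in ability], correct)
    ll_full = fit_loglik([[1.0, a, g, a * g] for a, g in zip(ability, group)],
                         correct)

    def nagelkerke(ll):
        cox_snell = 1 - math.exp(2 * (ll_null - ll) / n)
        return cox_snell / (1 - math.exp(2 * ll_null / n))

    chi2 = max(2 * (ll_full - ll_base), 0.0)  # likelihood-ratio statistic, 2 df
    p_value = math.exp(-chi2 / 2)             # exact chi-square tail for 2 df
    delta_r2 = nagelkerke(ll_full) - nagelkerke(ll_base)
    # Assumed effect-size cutoffs (0.035, 0.070); not taken from the abstract.
    if delta_r2 < 0.035:
        label = "A (negligible)"
    elif delta_r2 < 0.070:
        label = "B (moderate)"
    else:
        label = "C (large)"
    return {"chi2": chi2, "p_value": p_value, "delta_r2": delta_r2, "label": label}


def simulate(n, group_shift, seed):
    """Hypothetical data: group_shift (in logits) adds uniform DIF for group 1."""
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group = [rng.randint(0, 1) for _ in range(n)]
    correct = [int(rng.random() < 1 / (1 + math.exp(-(1.2 * a + group_shift * g))))
               for a, g in zip(ability, group)]
    return ability, group, correct


if __name__ == "__main__":
    print(lr_dif(*simulate(1000, 0.0, seed=1)))  # item simulated without DIF
    print(lr_dif(*simulate(1000, 1.5, seed=1)))  # item simulated with DIF
```

A statistical flag from a procedure like this only indicates that matched groups performed differently on an item. As the abstract describes, deciding whether that difference is construct relevant or construct irrelevant is left to SME review.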

Cite

Rubright, J. D., Jodoin, M., Woodward, S., & Barone, M. A. (2022). Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items. Academic Medicine, 97(5), 718–722. https://doi.org/10.1097/ACM.0000000000004567
