Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data

This article is free to access.

Abstract

Responses to public health surveys are heterogeneous. The quality of a respondent’s answers depends on many factors, including cognitive abilities, interview context, and whether the interview is in person or self-administered. A largely unexplored issue is how the language in which a public health survey interview is conducted is associated with the survey response. We introduce a machine learning approach, Fuzzy Forests, which we use for model selection. We use the 2013 California Health Interview Survey (CHIS) as our training sample and the 2014 CHIS as the test sample. We found that non-English language survey responses differ substantially from English responses in reported health outcomes. We also found heterogeneity among the Asian languages, suggesting that caution should be used when interpreting results that compare across these languages. The Fuzzy Forests model trained on the 2013 data also correctly predicted 86% of good health outcomes in the 2014 test set. We show that the Fuzzy Forests methodology is potentially useful for screening for and understanding other types of survey response heterogeneity, especially in high-dimensional and complex surveys.

Citation (APA)

Ramirez, C. M., Abrajano, M. A., & Alvarez, R. M. (2019). Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data. Scientific Reports, 9(1). https://doi.org/10.1038/s41598-019-51862-x
