Contextual Gaps in Machine Learning for Mental Illness Prediction: The Case of Diagnostic Disclosures

Abstract

Obtaining training data for machine learning (ML) prediction of mental illness from social media is labor intensive. To work around this, ML teams extrapolate proxy signals, alternative signs in the data, to assess illness status and create training datasets. However, it has not been established whether these signals are valid, whether they align with important contextual factors, or how proxy quality impacts downstream model integrity. We use ML and qualitative methods to evaluate whether a popular proxy signal, diagnostic self-disclosure, produces a conceptually sound ML model of mental illness. Our findings identify major conceptual errors visible only through qualitative investigation: training data built from diagnostic disclosures encodes a narrow vision of diagnosis experiences that propagates into paradoxes in the downstream ML model. This gap is obscured by the strong performance of the ML classifier (F1 = 0.91). We discuss the implications of conceptual gaps in creating training data for human-centered models, and make suggestions for improving research methods.
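For readers unfamiliar with proxy labeling, the sketch below illustrates the general pattern the paper critiques. It is a toy Python example using scikit-learn, not the authors' pipeline; the disclosure regular expression, the corpus, and the model choice are all illustrative assumptions.

```python
# Minimal sketch, assuming a toy setup (NOT the authors' pipeline).
# It illustrates proxy labeling: posts matching a diagnostic
# self-disclosure phrase become positive training labels, and a
# classifier trained on those labels is scored with F1.
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical disclosure pattern; real studies use curated phrase lists.
DISCLOSURE = re.compile(r"\bi (was|am|have been) diagnosed with\b", re.IGNORECASE)

def proxy_label(post: str) -> int:
    """Return 1 if the post contains a diagnostic self-disclosure, else 0."""
    return int(bool(DISCLOSURE.search(post)))

# Fabricated posts standing in for scraped social media data.
posts = [
    "I was diagnosed with depression last spring and it changed everything.",
    "Loving the sunshine at the park today!",
    "I am diagnosed with anxiety and finally started therapy.",
    "Tried a new recipe tonight and it turned out great.",
    "I have been diagnosed with bipolar disorder for ten years now.",
    "Weekend hike photos are up, what a view.",
]
labels = [proxy_label(p) for p in posts]

X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X_train, y_train)

# A high F1 here mostly reflects the model re-learning the disclosure
# phrasing itself, echoing the paper's point that strong metrics can
# mask conceptual gaps in what the proxy label actually measures.
print("F1:", f1_score(y_test, clf.predict(X_test)))
```

Note how the labeling rule and the classifier attend to the same surface cues, which is one way the circularity the paper describes can arise even as reported performance looks strong.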

Citation (APA)

Chancellor, S., Feuston, J. L., & Chang, J. (2023). Contextual Gaps in Machine Learning for Mental Illness Prediction: The Case of Diagnostic Disclosures. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2). https://doi.org/10.1145/3610181
