Discovering and controlling for latent confounds in text classification using adversarial domain adaptation


Abstract

In text classification, the testing data often differ systematically from the training data, a problem called dataset shift. In this paper, we investigate a type of dataset shift we call confounding shift. Such a setting exists when two conditions are met: (a) there is a confound variable Z that influences both the text features X and the class label Y; (b) the relationship between Z and Y changes from training to testing. While recent work in this area has required confounds to be known ahead of time, this is unrealistic in many settings. To address this shortcoming, we propose a method both to discover and to control for potential confounds. The approach first uses neural network-based topic modeling to discover potential confounds that differ between training and testing data, then uses adversarial training to fit a classification model that is invariant to these discovered confounds. We find that the resulting method improves on a state-of-the-art domain adaptation method, while also producing results that are competitive with those obtained when confounds are known ahead of time.
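Conditions (a) and (b) can be illustrated with a small synthetic example. The sketch below is not the authors' method or data. It is a toy setup with binary variables in which every name and probability (`sample`, `accuracy`, `p_agree`, the noise rates 0.8 and 0.95) is an assumption chosen for illustration. It shows why a classifier that leans on a confound-driven feature can look strong in training and then fail once the relationship between Z and Y changes.

```python
import random


def sample(n, p_agree, seed):
    """Draw n rows (x_y, x_z, y); the confound Z equals Y with probability p_agree."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)                              # latent confound Z
        y = z if rng.random() < p_agree else 1 - z         # label Y depends on Z
        x_y = y if rng.random() < 0.8 else 1 - y           # feature driven by Y (noisy)
        x_z = z if rng.random() < 0.95 else 1 - z          # feature driven by Z (less noisy)
        rows.append((x_y, x_z, y))
    return rows


def accuracy(rows, feature_index):
    """Accuracy of the trivial classifier that predicts Y as the chosen feature."""
    return sum(row[feature_index] == row[2] for row in rows) / len(rows)


# Condition (b): the Z-Y relationship flips between training and testing.
train = sample(20000, p_agree=0.9, seed=0)
test = sample(20000, p_agree=0.1, seed=1)

for name, rows in (("train", train), ("test", test)):
    print(name,
          "label-driven feature:", round(accuracy(rows, 0), 3),
          "confound-driven feature:", round(accuracy(rows, 1), 3))
```

Under these assumed probabilities, the confound-driven feature agrees with the label about 86% of the time in training (0.9 × 0.95 + 0.1 × 0.05) but only about 14% of the time in testing, while the label-driven feature stays near 80% in both. A model fit only to the training data would therefore favor the confound-driven feature, which is the failure mode that confound-invariant training aims to avoid.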

Citation (APA)

Landeiro, V., Tran, T., & Culotta, A. (2019). Discovering and controlling for latent confounds in text classification using adversarial domain adaptation. In SIAM International Conference on Data Mining, SDM 2019 (pp. 298–305). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975673.34
