Discovering and controlling for latent confounds in text classification using adversarial domain adaptation

Virgile Landeiro; Tuan Tran; Aron Culotta

Conference Proceedings, Open Access

SIAM International Conference on Data Mining, SDM 2019 (2019) 298-305

DOI: 10.1137/1.9781611975673.34

Abstract

In text classification, the testing data often differ systematically from the training data, a problem called dataset shift. In this paper, we investigate a type of dataset shift we call confounding shift. Such a setting exists when two conditions are met: (a) there is a confound variable Z that influences both the text features X and the class label Y; (b) the relationship between Z and Y changes from training to testing. While recent work in this area has required confounds to be known ahead of time, this is unrealistic for many settings. To address this shortcoming, we propose a method both to discover and to control for potential confounds. The approach first uses neural network-based topic modeling to discover potential confounds that differ between training and testing data, then uses adversarial training to fit a classification model that is invariant to these discovered confounds. We find that the resulting method improves over a state-of-the-art domain adaptation method, while also producing results that are competitive with those obtained when confounds are known ahead of time.
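The two conditions for confounding shift can be illustrated with a toy simulation. The sketch below is not the authors' method or data: the function `make_split`, the parameter `p_agree`, and the two binary "features" are hypothetical names chosen only for illustration. It assumes a binary label Y, a binary confound Z that agrees with Y with probability `p_agree`, one feature driven by Y and one driven by Z. Changing `p_agree` between the training and testing splits corresponds to condition (b).

```python
import numpy as np


def make_split(n, p_agree, seed):
    """Draw n toy samples in which confound Z agrees with label Y
    with probability p_agree (hypothetical setup, for illustration)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)              # class label Y
    agree = rng.random(n) < p_agree
    z = np.where(agree, y, 1 - y)               # confound Z, tied to Y
    # Feature 0 follows Y with 80% reliability; feature 1 copies Z.
    x_label = np.where(rng.random(n) < 0.8, y, 1 - y)
    x_confound = z
    X = np.stack([x_label, x_confound], axis=1)
    return X, y, z


def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return float(np.mean(pred == y))


# Condition (b): the Z-Y relationship flips between training and testing.
X_tr, y_tr, z_tr = make_split(20000, p_agree=0.9, seed=0)
X_te, y_te, z_te = make_split(20000, p_agree=0.1, seed=1)

# A classifier that predicts Y from the confound-driven feature alone
# looks strong on training data and fails on testing data.
acc_conf_train = accuracy(X_tr[:, 1], y_tr)
acc_conf_test = accuracy(X_te[:, 1], y_te)

# A classifier that uses only the label-driven feature is stable.
acc_label_train = accuracy(X_tr[:, 0], y_tr)
acc_label_test = accuracy(X_te[:, 0], y_te)

print(f"confound feature: train={acc_conf_train:.2f}, test={acc_conf_test:.2f}")
print(f"label feature:    train={acc_label_train:.2f}, test={acc_label_test:.2f}")
```

In this toy setting the confound-driven feature is the more accurate predictor on the training split, so a model fit without any control would be expected to favor it. That is the failure mode that a classifier invariant to the confound is meant to avoid.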

Cite

Landeiro, V., Tran, T., & Culotta, A. (2019). Discovering and controlling for latent confounds in text classification using adversarial domain adaptation. In SIAM International Conference on Data Mining, SDM 2019 (pp. 298–305). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975673.34
