We present an extension of Chervonenkis and Vapnik's classical empirical risk minimization (ERM) in which the empirical risk is replaced by a median-of-means (MOM) estimator of the risk. The resulting new estimators are called MOM minimizers. While ERM is sensitive to corruption of the dataset for many classical loss functions used in classification, we show that MOM minimizers behave well in theory, in the sense that they achieve Vapnik's (slow) rates of convergence under weak assumptions: the functions in the hypothesis class are only required to have a finite second moment, and some outliers may also have corrupted the dataset. We propose algorithms, inspired by MOM minimizers, which may be interpreted as MOM versions of block stochastic gradient descent (BSGD). The key point of these algorithms is that the block of data on which a descent step is performed is chosen according to its "centrality" among the other blocks. This choice of "descent block" makes these algorithms robust to outliers; moreover, it is the only extra step added to classical BSGD algorithms. As a consequence, classical BSGD algorithms can easily be turned into robust MOM versions. Moreover, MOM algorithms perform a smart subsampling which may substantially reduce computation time and memory usage when applied to nonlinear algorithms. These empirical performances are illustrated on both simulated and real datasets.
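The two ideas sketched in the abstract (a median-of-means risk estimate, and picking the "most central" block for a descent step) can be illustrated with a minimal NumPy sketch. This is an illustrative assumption of how such a scheme could look, not the authors' exact algorithm; the function names `median_of_means` and `median_block_indices` are hypothetical:

```python
import numpy as np

def median_of_means(losses, n_blocks):
    # Split the per-sample losses into n_blocks roughly equal blocks
    # and return the median of the block means: a risk estimate that
    # is robust to a few corrupted samples, unlike the plain mean.
    blocks = np.array_split(losses, n_blocks)
    block_means = np.array([b.mean() for b in blocks])
    return np.median(block_means)

def median_block_indices(losses, n_blocks):
    # Return the sample indices of the block whose mean loss is the
    # median among all blocks. A MOM-style BSGD step would then run
    # a gradient step only on this "most central" block, so blocks
    # contaminated by outliers are unlikely to drive the descent.
    idx_blocks = np.array_split(np.arange(len(losses)), n_blocks)
    block_means = np.array([losses[idx].mean() for idx in idx_blocks])
    median_pos = np.argsort(block_means)[len(block_means) // 2]
    return idx_blocks[median_pos]
```

For example, with nine losses equal to 1 and one corrupted loss of 1000, the empirical mean is 100.9, while the median-of-means over 5 blocks is 1.0, and the selected descent block avoids the outlier entirely.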
CITATION
Lecué, G., Lerasle, M., & Mathieu, T. (2020). Robust classification via MOM minimization. Machine Learning, 109(8), 1635–1665. https://doi.org/10.1007/s10994-019-05863-6