Differentially Private Distance Learning in Categorical Data

Elena Battaglia; Simone Celano; Ruggero G. Pensa

Journal Article (Open Access)

Data Mining and Knowledge Discovery (2021) 35(5) 2050-2088

DOI: 10.1007/s10618-021-00778-0

Abstract

Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Distance-based methods, in particular, have limited applicability to categorical data, since standard distance measures do not capture the complexity of the relationships among the different values of a categorical attribute. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a sensitive dataset. To address this problem, we introduce a differentially private family of algorithms for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes, which together form the so-called context. We define several variants of our algorithm and show empirically that our approach consumes little privacy budget while providing accurate distances, making it suitable for distance-based applications such as clustering and classification.
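To make the idea of a context-based, differentially private distance more concrete, the following is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes a generic Laplace-mechanism baseline: for each context attribute, the contingency table of (target value, context value) counts is perturbed with Laplace noise, the conditional distributions P(y | x) are estimated from the noisy counts, and the distance between two target values is the normalized Euclidean distance between their conditional profiles. The privacy budget `epsilon` is split evenly across the context tables (sequential composition). The function names, the record format (a list of dicts), and the requirement that the attribute domains and the context be public and fixed without looking at the data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_table(records, target, ctx, domains, eps, rng):
    """Laplace-perturbed contingency table over (target value, ctx value) pairs.

    Adding or removing one record changes exactly one cell by 1 (L1 sensitivity 1),
    so noise with scale 1/eps makes this single table eps-DP. Clipping negative
    counts to 0 is post-processing and does not weaken the guarantee."""
    counts = Counter((r[target], r[ctx]) for r in records)
    return {
        (y, x): max(0.0, counts[(y, x)] + laplace(1.0 / eps, rng))
        for y in domains[target]
        for x in domains[ctx]
    }


def private_context_distances(records, target, context, domains, epsilon, seed=None):
    """Hypothetical helper: distances between values of `target` learned from how
    they co-occur with the values of the (public, pre-chosen) `context` attributes.

    Each context table gets epsilon / len(context), so by sequential composition
    the whole computation is epsilon-DP under the assumptions stated above."""
    rng = random.Random(seed)
    eps_per_table = epsilon / len(context)
    sq = Counter()  # accumulated squared differences for each pair of target values
    for ctx in context:
        table = noisy_table(records, target, ctx, domains, eps_per_table, rng)
        for x in domains[ctx]:
            col = sum(table[(y, x)] for y in domains[target])
            # Estimated conditional distribution P(y | x); an empty column contributes nothing.
            p = {y: (table[(y, x)] / col if col > 0 else 0.0) for y in domains[target]}
            for yi, yj in combinations(domains[target], 2):
                sq[(yi, yj)] += (p[yi] - p[yj]) ** 2
    # Normalize by the total number of context values so distances fall in [0, 1].
    norm = sum(len(domains[ctx]) for ctx in context)
    return {pair: math.sqrt(sq[pair] / norm) for pair in combinations(domains[target], 2)}


# Toy usage with made-up data: diagnoses A and B share the same context, C does not.
domains = {"diagnosis": ["A", "B", "C"], "ward": ["w1", "w2"], "age": ["young", "old"]}
records = (
    [{"diagnosis": "A", "ward": "w1", "age": "young"}] * 40
    + [{"diagnosis": "B", "ward": "w1", "age": "young"}] * 40
    + [{"diagnosis": "C", "ward": "w2", "age": "old"}] * 40
)
d = private_context_distances(records, "diagnosis", ["ward", "age"], domains, epsilon=1.0, seed=0)
print(d)
```

In this toy data, A and B co-occur with identical context values, so their noisy distance stays close to 0, while C ends up far from both. A smaller `epsilon` adds more noise to the counts and blurs that separation, which is the accuracy/privacy trade-off the abstract refers to.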

Citation (APA)

Battaglia, E., Celano, S., & Pensa, R. G. (2021). Differentially Private Distance Learning in Categorical Data. Data Mining and Knowledge Discovery, 35(5), 2050–2088. https://doi.org/10.1007/s10618-021-00778-0