With remarkable performance and extensive applications, reinforcement learning has become one of the most popular learning techniques. The policy released by a reinforcement learning model may contain sensitive information, and an adversary can infer demographic information by observing the outputs of the environment. In this paper, we formulate differential privacy in the reinforcement learning context and design mechanisms for ε-greedy and Softmax in the K-armed bandit problem that achieve differentially private guarantees. Our implementation and experiments show that the output policies enjoy good privacy guarantees at a tolerable utility cost.
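The abstract does not reproduce the paper's concrete mechanisms, but the general idea of a differentially private ε-greedy bandit can be sketched as follows. This is an illustrative assumption, not the authors' construction: the class name `DPEpsilonGreedyBandit` and the noise scale `1/(n_i · ε)` (valid for rewards bounded in [0, 1], where n_i is the pull count of arm i) are choices made for this sketch, in which Laplace noise is added to each arm's empirical mean before the greedy comparison.

```python
import random


def laplace_noise(scale):
    # Laplace(0, b) sampled as the difference of two Exp(1/b) draws.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)


class DPEpsilonGreedyBandit:
    """Hypothetical ε-greedy K-armed bandit that perturbs empirical
    mean rewards with Laplace noise before the greedy arm selection,
    so the released choice is less revealing of any single reward."""

    def __init__(self, k, explore_eps=0.1, privacy_eps=1.0):
        self.k = k                      # number of arms
        self.explore_eps = explore_eps  # exploration rate of ε-greedy
        self.privacy_eps = privacy_eps  # privacy budget per comparison
        self.counts = [0] * k           # pulls per arm
        self.sums = [0.0] * k           # cumulative reward per arm

    def select_arm(self):
        # Explore uniformly with probability explore_eps.
        if random.random() < self.explore_eps:
            return random.randrange(self.k)
        # Exploit: compare noisy means. Assumed sensitivity of a mean
        # over n_i rewards in [0, 1] is 1/n_i, so Laplace scale is
        # 1 / (n_i * privacy_eps).
        noisy_means = []
        for i in range(self.k):
            n = max(self.counts[i], 1)
            mean = self.sums[i] / n
            noisy_means.append(mean + laplace_noise(1.0 / (n * self.privacy_eps)))
        return max(range(self.k), key=noisy_means.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

A usage sketch: run the bandit against Bernoulli arms and feed each observed reward back with `update`; tightening `privacy_eps` injects more noise and hence costs utility, matching the privacy/utility trade-off the abstract describes.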
Ma, P., Wang, Z., Zhang, L., Wang, R., Zou, X., & Yang, T. (2020). Differentially Private Reinforcement Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11999 LNCS, pp. 668–683). Springer. https://doi.org/10.1007/978-3-030-41579-2_39