A Data Deduplication Scheme Based on DBSCAN With Tolerable Clustering Deviation

Yan Teng; Hequn Xian; Quanli Lu; Feng Guo

Journal Article, Open Access

IEEE Access (2023) 11 9742-9750

DOI: 10.1109/ACCESS.2022.3231604

10 Citations

16 Readers

Abstract

To protect data privacy, users prefer to store encrypted data in cloud servers. Cloud servers reduce the cost of storage and network bandwidth by eliminating duplicate copies. To address the potential internal data leakage problem, the concept of clustering deviation is proposed for the first time. We improve the DBSCAN algorithm to tolerate clustering deviation. A data deduplication scheme is built upon the new algorithm, which considers users as clustering samples. Instead of immediately re-clustering new users, a certain deviation is tolerated to assign the users to the existing classes. We determine the popularity of the data according to user clustering results and apply different encryption schemes to protect the security of unpopular data more effectively. The performance of the algorithm is analyzed and compared with other methods through experiments, and the results verify the feasibility and efficiency of the proposed deduplication scheme.
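The idea of assigning new users to existing classes while tolerating a bounded deviation can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define how clustering deviation is measured, so this sketch assumes it is simply a count of users assigned without re-clustering, and it assumes a new user joins the cluster of the nearest already-clustered user within `eps`. The names `TolerantClusterer`, `add_user`, and `max_deviation` are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point (NOISE for outliers)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already handled, do not expand again
            labels[j] = cluster
            j_neighbors = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


class TolerantClusterer:
    """Hypothetical wrapper: defers re-clustering until a deviation budget is spent."""

    def __init__(self, points, eps, min_pts, max_deviation):
        self.eps = eps
        self.min_pts = min_pts
        self.max_deviation = max_deviation
        self.points = list(points)
        self.labels = dbscan(self.points, eps, min_pts)
        self.deviation = 0  # assumption: deviation = users added since last full clustering

    def add_user(self, point):
        # Assumption: join the cluster of the nearest clustered user within eps.
        best_label, best_distance = NOISE, self.eps
        for other, label in zip(self.points, self.labels):
            d = dist(point, other)
            if label != NOISE and d <= best_distance:
                best_label, best_distance = label, d
        self.points.append(point)
        self.labels.append(best_label)
        self.deviation += 1
        if self.deviation > self.max_deviation:
            # Budget exceeded: fall back to a full DBSCAN run over all users.
            self.labels = dbscan(self.points, self.eps, self.min_pts)
            self.deviation = 0
        return self.labels[-1]
```

With two well-separated groups of three users each, `eps=1.5`, `min_pts=3`, and `max_deviation=2`, the first two calls to `add_user` reuse the existing clusters and the third triggers a full re-clustering. How the scheme derives data popularity and selects an encryption scheme from these clusters is not specified in the abstract and is left out of the sketch.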

Cite

Teng, Y., Xian, H., Lu, Q., & Guo, F. (2023). A Data Deduplication Scheme Based on DBSCAN With Tolerable Clustering Deviation. IEEE Access, 11, 9742–9750. https://doi.org/10.1109/ACCESS.2022.3231604
