A Gradient-Based Clustering for Multi-Database Mining

Salim Miloudi; Yulin Wang; Wenjia Ding

Journal Article, Open Access

IEEE Access (2021), vol. 9, pp. 11144–11172

DOI: 10.1109/ACCESS.2021.3050404

Abstract

Multinational corporations maintain multiple databases distributed across their branches, each storing millions of transactions per day. For business applications, identifying disjoint clusters of similar and relevant databases helps reveal the buying patterns shared among customers and can increase profits by targeting potential clients. This process, called clustering, is an important unsupervised technique for big data mining. In this article, we present an effective approach to searching for the optimal clustering of multiple transaction databases in a weighted undirected similarity graph. To assess clustering quality, we use dual gradient descent to minimize a constrained quasi-convex loss function whose parameters determine which edges are needed to form the optimal database clusters in the graph. As a result, the global minimum is guaranteed to be found in a finite and short time, in contrast to existing non-convex objectives, which require generating all possible candidate clusterings to find the ideal one. Moreover, our algorithm does not require the number of clusters to be specified a priori, and it uses a disjoint-set forest data structure to keep track of the clusters as they are updated. Through a series of experiments on public data samples and precomputed similarity matrices, we show that our algorithm is more accurate and faster in practice than existing clustering algorithms for multi-database mining.
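
To make the role of the disjoint-set forest concrete, the following is a minimal sketch of how clusters could be tracked once a set of edges has been selected in a weighted undirected similarity graph. It is not the authors' algorithm: the abstract states that the edges are determined by parameters of the minimized loss function, whereas this sketch simply assumes a fixed similarity threshold. The names `DisjointSet`, `cluster_databases`, and `threshold`, and the example matrix, are hypothetical.

```python
from typing import Dict, List


class DisjointSet:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster_databases(similarity: List[List[float]], threshold: float) -> List[List[int]]:
    """Group database indices connected by edges with weight >= threshold.

    Assumption: `threshold` stands in for the edge-selection parameter
    that the paper obtains by minimizing its loss function.
    """
    n = len(similarity)
    forest = DisjointSet(n)
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph: visit each pair once
            if similarity[i][j] >= threshold:
                forest.union(i, j)
    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(forest.find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical similarity matrix for four databases.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
print(cluster_databases(sim, 0.5))  # [[0, 1], [2, 3]]
```

Because clusters emerge from the connected components of the selected edges, the number of clusters does not need to be fixed in advance, which matches the property described in the abstract.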

Cite

Miloudi, S., Wang, Y., & Ding, W. (2021). A Gradient-Based Clustering for Multi-Database Mining. IEEE Access, 9, 11144–11172. https://doi.org/10.1109/ACCESS.2021.3050404
