Abstract
The volume of data produced by social media, the Internet of Things (IoT), and mobile devices has outgrown the capabilities of traditional relational databases. This growth has driven the rise of NoSQL databases, which are designed to handle large-scale, heterogeneous data. Although NoSQL databases are often schemaless, a substantial body of research highlights the need for modeling techniques that structure their data efficiently. Many papers have proposed design methodologies, and some have automated the modeling process. However, no studies have explored NoSQL database design using clustering techniques from artificial intelligence (AI). This paper presents KDN (K-means-based Design for NoSQL databases), an AI-based data modeling approach that leverages k-means, a widely used clustering algorithm, to create optimized schemas for document-oriented NoSQL databases. We illustrate the principles of the approach on an air flight management database and discuss the results and the advantages it offers. The optimal clustering is selected using the silhouette score, with a best score of 0.56 for our use case. Experiments on this database showed that our approach reduced query execution time by up to 90% compared to other design approaches. The results demonstrate that KDN significantly improves schema efficiency and simplifies the design process through automation, helping designers improve schema quality.
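As a rough illustration of choosing a clustering by silhouette score, the following minimal sketch runs k-means for several values of k and keeps the one with the highest mean silhouette. It is not the KDN implementation: the abstract does not say how schema elements are encoded as vectors, so the 2-D `points` are hypothetical toy data, and the helper names (`farthest_first`, `kmeans`, `silhouette_score`, `best_k`) are invented for this example. For reproducibility it seeds centroids with a deterministic farthest-first rule, which is an assumption of this sketch rather than anything stated in the paper.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not from the paper):
    start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # s(i) = 0 for singleton clusters by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_range):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in k_range}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

k, scores = best_k(points, range(2, 6))
print("selected k:", k)
```

In a real setting a library implementation with randomized restarts would usually be preferable; the hand-written version here only keeps the example free of dependencies.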
Citation
Dourhri, A., & Hanine, M. (2025). A Clustering-Based Approach for NoSQL Schema Design Using K-Means. International Journal of Intelligent Engineering and Systems, 18(8), 499–516. https://doi.org/10.22266/ijies2025.0930.31