Abstract
Clustering is one of the most important tasks performed in data mining applications. This paper presents an efficient SQL implementation of the EM algorithm for clustering very large databases. Our version can effectively handle high-dimensional data, a large number of clusters and, more importantly, a very large number of records. We present three strategies to implement EM in SQL: horizontal, vertical, and hybrid. We expect this work to be useful to data mining programmers and users who want to cluster large data sets inside a relational DBMS.
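The difference between a horizontal and a vertical layout can be made concrete with a small example. Below is a minimal sketch in Python using the standard-library `sqlite3` module. It is not the paper's SQL. The table names (`YH`, `YV`, `CH`, `CV`, `R`, `W`), the registered function name `py_exp`, the toy data, and the choice of a diagonal covariance shared by all clusters are all hypothetical and exist only for illustration. The sketch computes the squared Mahalanobis distance from each point to each cluster mean under both layouts. It then computes E-step responsibilities from the vertical distances. Neither the hybrid strategy nor the M-step is shown.

```python
import math
import sqlite3

# Hypothetical toy data: 4 points in p = 2 dimensions and k = 2 clusters.
POINTS = {1: (0.0, 0.0), 2: (0.5, 0.2), 3: (5.0, 5.0), 4: (5.2, 4.8)}
MEANS = {1: (0.0, 0.0), 2: (5.0, 5.0)}
VARIANCES = (1.0, 1.0)      # assumption: diagonal covariance shared by all clusters
WEIGHTS = {1: 0.5, 2: 0.5}  # mixture weights


def load(conn):
    """Store the same data twice: horizontally (one column per dimension)
    and vertically (one row per point-dimension pair). Names are hypothetical."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE YH (rid INTEGER, y1 REAL, y2 REAL)")
    cur.execute("CREATE TABLE CH (j INTEGER, c1 REAL, c2 REAL)")
    cur.execute("CREATE TABLE YV (rid INTEGER, d INTEGER, val REAL)")
    cur.execute("CREATE TABLE CV (j INTEGER, d INTEGER, val REAL)")
    cur.execute("CREATE TABLE R (d INTEGER, val REAL)")   # variances, one row per dimension
    cur.execute("CREATE TABLE W (j INTEGER, w REAL)")     # mixture weights
    for rid, y in POINTS.items():
        cur.execute("INSERT INTO YH VALUES (?, ?, ?)", (rid, *y))
        cur.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                        [(rid, d, v) for d, v in enumerate(y, start=1)])
    for j, mu in MEANS.items():
        cur.execute("INSERT INTO CH VALUES (?, ?, ?)", (j, *mu))
        cur.executemany("INSERT INTO CV VALUES (?, ?, ?)",
                        [(j, d, v) for d, v in enumerate(mu, start=1)])
    cur.executemany("INSERT INTO R VALUES (?, ?)", list(enumerate(VARIANCES, start=1)))
    cur.executemany("INSERT INTO W VALUES (?, ?)", list(WEIGHTS.items()))


# Horizontal: the expression needs one term per dimension, so its text grows with p.
HORIZONTAL_DIST = """
SELECT YH.rid, CH.j,
       (YH.y1 - CH.c1) * (YH.y1 - CH.c1) / ? +
       (YH.y2 - CH.c2) * (YH.y2 - CH.c2) / ? AS dist
FROM YH CROSS JOIN CH
ORDER BY YH.rid, CH.j
"""

# Vertical: a join on the dimension index plus GROUP BY; the text is independent of p.
VERTICAL_DIST = """
SELECT YV.rid AS rid, CV.j AS j,
       SUM((YV.val - CV.val) * (YV.val - CV.val) / R.val) AS dist
FROM YV JOIN CV ON YV.d = CV.d JOIN R ON YV.d = R.d
GROUP BY YV.rid, CV.j
"""

# E-step: responsibility = w_j * exp(-dist/2) normalized over clusters per point.
# The Gaussian normalizing constant cancels because the covariance is shared.
E_STEP = f"""
WITH D AS ({VERTICAL_DIST}),
P AS (SELECT D.rid AS rid, D.j AS j, W.w * py_exp(-0.5 * D.dist) AS p
      FROM D JOIN W ON D.j = W.j)
SELECT P.rid, P.j, P.p / T.total AS resp
FROM P JOIN (SELECT rid, SUM(p) AS total FROM P GROUP BY rid) AS T
     ON P.rid = T.rid
ORDER BY P.rid, P.j
"""


def main():
    conn = sqlite3.connect(":memory:")
    # Register Python's exp under a hypothetical name so the sketch does not
    # depend on SQLite being compiled with its optional math functions.
    conn.create_function("py_exp", 1, math.exp)
    load(conn)
    horiz = conn.execute(HORIZONTAL_DIST, VARIANCES).fetchall()
    vert = conn.execute(VERTICAL_DIST + " ORDER BY rid, j").fetchall()
    resp = conn.execute(E_STEP).fetchall()
    conn.close()
    return horiz, vert, resp
```

Calling `main()` returns `(rid, j, value)` rows for each query, and both layouts produce the same distances. This toy example highlights the design trade-off between the two layouts. The horizontal query must be generated with one term per dimension. The vertical query stays fixed as p grows, at the cost of p rows per point and a join.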
Cite
Ordonez, C., & Cereghini, P. (2000). SQLEM: Fast Clustering in SQL using the EM Algorithm. In SIGMOD 2000 - Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (pp. 559–570). Association for Computing Machinery, Inc. https://doi.org/10.1145/342009.335468