In this work, we jointly apply several text mining methods to a corpus of legal documents in order to compare the separation quality of two inherently different document classification schemes. Each classification scheme is compared with the clusters produced by the K-means algorithm. We believe that our comparison method can in the future be coupled with semi-supervised and active learning techniques. This paper also presents the idea of combining K-means and Principal Component Analysis for cluster visualization. The described idea allows the calculations to be performed in a reasonable amount of CPU time. © 2008 Springer-Verlag Berlin Heidelberg.
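The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Three things in it are assumptions made for illustration: that agreement between a classification scheme and the K-means clusters is measured by cluster purity, that the principal components are fitted on the K centroids only (one plausible way to keep the cost low), and the synthetic data that stands in for document vectors. All function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centroids

def purity(labels, classes):
    """Fraction of points whose cluster's majority class matches their class.
    Assumed agreement measure, not taken from the paper."""
    total = 0
    for j in np.unique(labels):
        total += np.bincount(classes[labels == j]).max()
    return total / len(labels)

def project_with_centroid_pca(X, centroids, n_components=2):
    """Fit PCA on the centroids only (assumption), then project all points."""
    mean = centroids.mean(axis=0)
    # Rows of vt are the principal directions of the centred centroids.
    _, _, vt = np.linalg.svd(centroids - mean, full_matrices=False)
    return (X - mean) @ vt[:n_components].T

# Synthetic stand-in for document vectors: three groups in five dimensions.
rng = np.random.default_rng(42)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0, 10.0, -10.0, 10.0, -10.0]])
X = np.vstack([c + rng.normal(size=(30, 5)) for c in centers])
scheme_a = np.repeat(np.arange(3), 30)   # hypothetical class labels, scheme A
scheme_b = np.tile(np.arange(3), 30)     # hypothetical class labels, scheme B

labels, centroids = kmeans(X, k=3)
purity_a = purity(labels, scheme_a)
purity_b = purity(labels, scheme_b)
coords = project_with_centroid_pca(X, centroids)  # 2-D coordinates for plotting
```

Under these assumptions, a scheme whose classes line up with the clusters gets a higher purity than one that cuts across them. The `coords` array can be passed to any plotting library and coloured by cluster or by class.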
CITATION STYLE
Šilić, A., Moens, M. F., Žmak, L., & Bašić, B. D. (2008). Comparing document classification schemes using K-means clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5177 LNAI, pp. 615–624). Springer Verlag. https://doi.org/10.1007/978-3-540-85563-7_78