Document clustering using a graph covering with pseudostable sets

Jens Dorpinghaus; Sebastian Schaaf; Juliane Fluck; Marc Jacobs

Conference Proceedings, Open Access

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017 (2017) 329-338

DOI: 10.15439/2017F84

Abstract

In text mining, document clustering is the task of assigning unstructured documents to clusters, which in turn usually correspond to topics. Clustering is widely used in science for data retrieval and organisation. In this paper we present a new graph-theoretical approach to document clustering and its application to a real-world data set. We show that the well-known graph partition into stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This generalization allows soft clustering as well as hard clustering. We present an integer linear programming formulation and a greedy approach for this NP-complete problem, and discuss results on random instances and on real-world data for different similarity measures.
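
As an illustration of the classical starting point named in the abstract, the partition of a graph into stable sets, the following is a minimal Python sketch of a greedy heuristic. It is not the paper's pseudostable-set method, its integer linear program, or its greedy algorithm. The function name `greedy_stable_partition`, the toy graph, and the convention that an edge joins two dissimilar documents are assumptions made for this example only.

```python
def greedy_stable_partition(adj):
    """Greedily partition the vertices of an undirected graph into stable sets.

    adj maps each vertex to the set of its neighbours. A set is stable
    (independent) if no two of its vertices are adjacent. This is the
    textbook first-fit heuristic, not the algorithm from the paper.
    """
    parts = []  # each entry is one stable set, i.e. one hard cluster
    # Visit high-degree vertices first; ties are broken by vertex name.
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        for part in parts:
            # v may join a part only if it has no neighbour inside it.
            if adj[v].isdisjoint(part):
                part.add(v)
                break
        else:
            # No existing part can take v, so open a new one.
            parts.append({v})
    return parts


# Hypothetical toy graph: an edge means "these two documents are dissimilar".
adj = {
    "d1": {"d3", "d4"},
    "d2": {"d3", "d4"},
    "d3": {"d1", "d2"},
    "d4": {"d1", "d2"},
}
clusters = greedy_stable_partition(adj)
print(clusters)
```

Under this convention, every pair of documents inside one part is non-adjacent, so each part can be read as a hard cluster of mutually similar documents. A soft clustering in the sense of the abstract would additionally let a document belong to more than one set, which this sketch does not attempt.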

Citation (APA)

Dorpinghaus, J., Schaaf, S., Fluck, J., & Jacobs, M. (2017). Document clustering using a graph covering with pseudostable sets. In Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017 (pp. 329–338). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.15439/2017F84
