Clustering with domain-specific usefulness scores

Yale Chang; Junxiang Chen; Michael H. Cho; Peter J. Castaldi; Edwin K. Silverman; Jennifer G. Dy

Conference Proceedings, Open Access

Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (2017) 207-215

DOI: 10.1137/1.9781611974973.24

Abstract

Clustering is challenging because the same data set can be grouped in multiple different ways. Which of these clustering solutions is interesting depends on the domain application. Thus, incorporating domain expert input often improves clustering performance. However, most existing semi-supervised clustering techniques can only incorporate instance-level constraints (a few labels or must-link/cannot-link constraints), which domain experts may not be comfortable providing in knowledge discovery problems because the categories are not known in advance. Fortunately, domain experts often have an idea of the properties that clustering solutions should have in order to be useful in the domain application, and these properties can be expressed as domain-relevant scores. In this paper, we provide a framework for jointly optimizing the usefulness and quality of a clustering solution. Experiments on a synthetic data set, a benchmark data set, and a real-world disease subtyping problem demonstrate the usefulness of our proposed approach.
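The abstract does not state the objective function, so the following is only an illustrative sketch of what trading off clustering quality against a domain usefulness score could look like, not the authors' method. Everything in it is an assumption made for illustration: quality is taken to be the negative within-cluster sum of squares, the usefulness score is a hypothetical gap in the mean of an outcome variable between clusters, and the weight `lam` and the toy data are invented.

```python
from statistics import mean

def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for k in set(labels):
        members = [p for p, l in zip(points, labels) if l == k]
        centroid = [mean(coord) for coord in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members
        )
    return total

def usefulness(outcome, labels):
    """Hypothetical domain score: gap between the largest and smallest
    per-cluster mean of an outcome variable (larger is more useful)."""
    groups = {}
    for y, l in zip(outcome, labels):
        groups.setdefault(l, []).append(y)
    cluster_means = [mean(values) for values in groups.values()]
    return max(cluster_means) - min(cluster_means)

def combined_score(points, outcome, labels, lam):
    """Assumed joint objective: quality plus lam times usefulness."""
    return -within_cluster_ss(points, labels) + lam * usefulness(outcome, labels)

def best_candidate(points, outcome, candidates, lam):
    """Return the name of the candidate labeling with the highest score."""
    return max(
        candidates,
        key=lambda name: combined_score(points, outcome, candidates[name], lam),
    )

# Toy data: four points on a rectangle that can be split along x or along y.
points = [(0.0, 0.0), (0.0, 2.0), (2.2, 0.0), (2.2, 2.0)]
# Invented outcome that varies with the second coordinate only.
outcome = [0, 1, 0, 1]
candidates = {
    "split_by_x": [0, 0, 1, 1],  # tighter clusters, unrelated to the outcome
    "split_by_y": [0, 1, 0, 1],  # looser clusters, aligned with the outcome
}

quality_only = best_candidate(points, outcome, candidates, lam=0.0)
with_domain_score = best_candidate(points, outcome, candidates, lam=2.0)
print(quality_only, with_domain_score)
```

With `lam=0.0` the choice rests on compactness alone; raising `lam` lets the domain score change which of the two equally plausible groupings is preferred, which is the situation the abstract describes.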

Citation (APA)

Chang, Y., Chen, J., Cho, M. H., Castaldi, P. J., Silverman, E. K., & Dy, J. G. (2017). Clustering with domain-specific usefulness scores. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 207–215). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.24
