Clustering with domain-specific usefulness scores

Abstract

Clustering is a challenging problem because the same data set can be grouped in multiple different ways. Which of these clustering solutions is interesting depends on the domain application. Thus, incorporating domain expert input often improves clustering performance. However, most existing semi-supervised clustering techniques can only incorporate instance-level constraints (a few labels or must-link/cannot-link constraints), which domain experts may not be comfortable providing in knowledge discovery problems because the categories are not known in advance. Fortunately, domain experts often have an idea of the properties that a clustering solution should have in order to be useful in the domain application, as measured by domain-relevant scores. In this paper, we provide a framework for jointly optimizing the usefulness and quality of a clustering solution. Experiments on a synthetic data set, a benchmark data set, and a real-world disease subtyping problem demonstrate the usefulness of our proposed approach.
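To make the idea of trading off clustering quality against a domain usefulness score concrete, here is a minimal toy sketch. It is not the paper's framework, whose objective and optimizer are not described in this abstract. Everything in it is an assumption chosen for illustration: quality is taken to be the negative within-cluster sum of squares, the hypothetical usefulness score is the gap in mean `outcome` between two clusters, the two are combined with a weight `lam`, and the search is brute force over all two-cluster labelings of four points.

```python
from itertools import product


def within_ss(points, labels):
    """Within-cluster sum of squared distances to each cluster mean (lower is tighter)."""
    total = 0.0
    for k in set(labels):
        members = [p for p, l in zip(points, labels) if l == k]
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total


def usefulness(outcome, labels):
    """Hypothetical domain score: absolute gap in mean outcome between clusters 0 and 1."""
    a = [o for o, l in zip(outcome, labels) if l == 0]
    b = [o for o, l in zip(outcome, labels) if l == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def best_clustering(points, outcome, lam):
    """Brute-force search for the 2-cluster labeling maximizing quality + lam * usefulness."""
    best, best_score = None, float("-inf")
    for labels in product([0, 1], repeat=len(points)):
        # Skip labelings with an empty cluster, and mirrored duplicates of one another.
        if len(set(labels)) < 2 or labels[0] != 0:
            continue
        score = -within_ss(points, labels) + lam * usefulness(outcome, labels)
        if score > best_score:
            best, best_score = labels, score
    return best


# Toy data (made up): the features form two tight pairs, but the
# domain-relevant outcome separates the points differently.
points = [0.0, 0.2, 1.0, 1.2]
outcome = [0.0, 10.0, 0.0, 10.0]

geometry_only = best_clustering(points, outcome, lam=0.0)
with_usefulness = best_clustering(points, outcome, lam=1.0)
print(geometry_only, with_usefulness)
```

With `lam=0.0` the search ignores the outcome and groups the points by feature proximity. With `lam=1.0` the usefulness term dominates on this toy data and the selected grouping separates low-outcome from high-outcome points. The weight is a free parameter in this sketch.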

Cite

Chang, Y., Chen, J., Cho, M. H., Castaldi, P. J., Silverman, E. K., & Dy, J. G. (2017). Clustering with domain-specific usefulness scores. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 207–215). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.24