Clustering is challenging because the same data set can often be grouped in several different ways. Which of these clustering solutions is interesting depends on the application domain. Thus, incorporating input from domain experts often improves clustering performance. However, most existing semi-supervised clustering techniques can only incorporate instance-level constraints (a few labels or must-link/cannot-link constraints), which domain experts may be reluctant to provide in knowledge discovery problems, where the categories are not known in advance. Fortunately, domain experts often have an idea, expressed through domain-relevant scores, of the properties a clustering solution should have in order to be useful in their application. In this paper, we provide a framework for jointly optimizing the usefulness and quality of a clustering solution. Experiments on a synthetic data set, a benchmark data set, and a real-world disease subtyping problem demonstrate the usefulness of the proposed approach.
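The abstract does not say how usefulness and quality are combined, so the following is only a rough illustration of the trade-off, not the paper's method. It assumes that quality is the negative within-cluster sum of squares, that usefulness is the between-cluster variance of a per-instance domain-relevant score, and that a weight `lam` balances the two when choosing among candidate clusterings. The functions `quality`, `usefulness`, and `best_clustering` and the toy data are hypothetical.

```python
from collections import defaultdict


def _group(labels):
    """Map each cluster label to the list of instance indices assigned to it."""
    groups = defaultdict(list)
    for i, k in enumerate(labels):
        groups[k].append(i)
    return groups


def quality(points, labels):
    """Assumed quality measure: negative within-cluster sum of squared
    distances to each cluster centroid (higher is better)."""
    sse = 0.0
    for idx in _group(labels).values():
        dim = len(points[idx[0]])
        centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        sse += sum((points[i][d] - centroid[d]) ** 2 for i in idx for d in range(dim))
    return -sse


def usefulness(scores, labels):
    """Hypothetical usefulness proxy: between-cluster variance of a
    per-instance domain score (higher means clusters differ more in the score)."""
    n = len(scores)
    overall = sum(scores) / n
    total = 0.0
    for idx in _group(labels).values():
        mean_k = sum(scores[i] for i in idx) / len(idx)
        total += len(idx) * (mean_k - overall) ** 2
    return total / n


def best_clustering(points, scores, candidates, lam):
    """Return the index of the candidate labeling that maximizes
    quality + lam * usefulness."""
    combined = [quality(points, c) + lam * usefulness(scores, c) for c in candidates]
    return max(range(len(candidates)), key=combined.__getitem__)


# Toy example: four 2-D points and a hypothetical domain score per point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
scores = [0.0, 10.0, 0.0, 10.0]
# Candidate 0 splits on the first feature (compact clusters, score not separated);
# candidate 1 splits on the second feature (loose clusters, score well separated).
candidates = [[0, 0, 1, 1], [0, 1, 0, 1]]

print(best_clustering(points, scores, candidates, lam=0.0))  # quality only
print(best_clustering(points, scores, candidates, lam=1.0))  # usefulness weighted in
```

With `lam=0.0` only geometric compactness counts and candidate 0 wins (quality -1.0 versus -25.0); with `lam=1.0` candidate 1's usefulness of 25.0 outweighs its lower quality, so the selection flips. Selecting among precomputed candidates keeps the sketch short; a real joint optimization would instead search over clusterings directly.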
Chang, Y., Chen, J., Cho, M. H., Castaldi, P. J., Silverman, E. K., & Dy, J. G. (2017). Clustering with domain-specific usefulness scores. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 207–215). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.24