Multidimensional data sets often include categorical information. When most dimensions have categorical information, clustering the data set as a whole can reveal interesting patterns. However, categorical information is often more useful as a way to partition the data set, for example into gene expression data for healthy versus diseased samples, or into stock performance for common, preferred, or convertible shares. We present novel ways to use categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to use categorical information together with clustering algorithms: users can partition the data set vertically or horizontally according to categorical information, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including the insights gained and potential future work. Copyright © 2007, Lawrence Erlbaum Associates, Inc.
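The idea of partitioning by a categorical variable and turning each partition's clustering result into a new categorical variable can be illustrated with a minimal sketch. It makes several assumptions not stated in the abstract. Rows are dicts. Only the row-wise partition direction is shown; partitioning the other way would be the transposed case. The hypothetical helpers `kmeans` and `cluster_by_partition` use a tiny deterministic k-means as a stand-in clustering algorithm, which is not necessarily the algorithm the authors used. The label format `"category/cN"` is likewise illustrative.

```python
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Tiny k-means (Lloyd's algorithm) on lists of floats.

    Stand-in clustering algorithm for illustration only. Centroids are
    seeded with the first k distinct points so results are deterministic.
    """
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_partition(rows, category_key, feature_keys, k):
    """Hypothetical helper: split rows by a categorical column, cluster each
    partition separately, and return one new categorical label per row."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[category_key]].append(i)

    new_labels = [None] * len(rows)
    for category, idx in partitions.items():
        points = [[float(rows[i][f]) for f in feature_keys] for i in idx]
        labels = kmeans(points, min(k, len(points)))
        # The per-partition cluster id becomes a new categorical value.
        for i, lab in zip(idx, labels):
            new_labels[i] = f"{category}/c{lab}"
    return new_labels


rows = [
    {"status": "healthy", "g1": 1.0, "g2": 1.1},
    {"status": "healthy", "g1": 1.2, "g2": 0.9},
    {"status": "healthy", "g1": 5.0, "g2": 5.2},
    {"status": "diseased", "g1": 9.0, "g2": 0.5},
    {"status": "diseased", "g1": 9.1, "g2": 0.4},
]
print(cluster_by_partition(rows, "status", ["g1", "g2"], k=2))
```

Because each partition is clustered independently, cluster ids are only meaningful within their own category. Prefixing them with the category keeps the derived variable unambiguous, so it can be fed back into further partitioning or ranking like any other categorical column.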