Exploratory data analysis with categorical variables: An improved rank-by-feature framework and a case study

  • Seo J
  • Gordish-Dressman H

Multidimensional data sets often include categorical information. When most dimensions carry categorical information, clustering the data set as a whole can reveal interesting patterns. However, categorical information is often more useful as a way to partition the data set: for example, gene expression data for healthy versus diseased samples, or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize categorical information together with clustering algorithms. Users can partition the data set according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work. Copyright © 2007, Lawrence Erlbaum Associates, Inc.


