Parallel Sets: interactive exploration and visual analysis of categorical data
- ISSN: 10772626
- DOI: 10.1109/TVCG.2006.76
Abstract
Categorical data dimensions appear in many real-world data sets, but few visualization methods exist that properly deal with them. Parallel Sets are a new method for the visualization and interactive exploration of categorical data that shows data frequencies instead of the individual data points. The method is based on the axis layout of parallel coordinates, with boxes representing the categories and parallelograms between the axes showing the relations between categories. In addition to the visual representation, we designed a rich set of interactions. Parallel Sets allow the user to interactively remap the data to new categorizations and, thus, to consider more data dimensions during exploration and analysis than usually possible. At the same time, a metalevel, semantic representation of the data is built. Common procedures, like building the cross product of two or more dimensions, can be performed automatically, thus complementing the interactive visualization. We demonstrate Parallel Sets by analyzing a large CRM data set, as well as investigating housing data from two US states.
Robert Kosara, Member, IEEE Computer Society, Fabian Bendix, and
Helwig Hauser, Member, IEEE Computer Society
Index Terms—Information visualization, interaction, nominal data, categorical data, multivariate data.
1 INTRODUCTION
Categorical dimensions play a very important role in the analysis of many real-world data sets. Numerical
attributes often can only be understood in the context of
categorizations, and users working with data often examine
different classes before even looking at numbers. While
numerical dimensions are well understood in both statistics
and visualization, the categorization of products, customers,
etc., provides a special challenge for visualization.
Categorical dimensions are generally data dimensions
that only contain a small number of different values, which
often have special meanings. Categories usually do not
have an inherent order (e.g., bank account types and ethnic
groups), which means that the mapping to numerical values
is arbitrary, and also the differences between these values
are not meaningful.
Dimensions with many categories are also often organized
hierarchically: customer surveys contain sections
with related questions, split one piece of information
between several questions (e.g., education), or ask the same
question several times for cross-checking; bank accounts are
classified in several ways that will often involve hierarchical
categorizations, etc. Using these hierarchies for visualization
is extremely helpful for the user, because they provide
a natural way of aggregating and abstracting data. To make
use of these hierarchies, the visualization application has to
know about them, which requires additional data about the
data set, or metadata. Interaction is
also required, because the user will want to switch back and
forth between a detailed investigation and a more general
overview by means of these hierarchies.
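
The aggregation along a hierarchy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in Parallel Sets: the account categories, the `hierarchy` mapping, and the `roll_up` helper are all hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical metadata: maps each detailed category to its parent group.
hierarchy = {
    "student checking": "checking",
    "business checking": "checking",
    "fixed-term savings": "savings",
    "instant-access savings": "savings",
}

def roll_up(values, parent_of):
    """Aggregate detailed categories to their parent level.

    Categories without a known parent are kept unchanged.
    """
    return Counter(parent_of.get(value, value) for value in values)

# Made-up data items, one detailed category per item.
detailed = [
    "student checking",
    "business checking",
    "student checking",
    "fixed-term savings",
    "instant-access savings",
]

detail_counts = Counter(detailed)                # detailed view
overview_counts = roll_up(detailed, hierarchy)   # aggregated overview
```

Switching between `detail_counts` and `overview_counts` corresponds to moving between the detailed investigation and the general overview; the total number of items is the same at both levels.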
Most existing work has focused on the visualization of
numerical data, treating categories as a special case with
only a few values. The approach presented in this paper
had to be radically different in order to accommodate the
special properties of categorical data and large categorical
data sets in practice.
Many visualization systems also implicitly assume
that the user will perform a whole analysis in one
uninterrupted session and will never return to the same
kind of analysis or the same data set. Our experience has
shown that this is not the case, however. Users often deal
with similar data sets and similar tasks, which consequently
require them to go through the same or similar sets of
actions for each new data set. Also, the analysis of a typical
real-world data set requires many sessions, potentially
spread out over a long time period. The user needs to be
able to save results to continue where he or she left off as
seamlessly as possible.
We present a new approach to information visualization,
called Parallel Sets [1] (Fig. 1), which was developed
specifically for categorical data. This paper presents additional
features as well as a new case study to demonstrate
the method. Parallel Sets support interactive visual exploration
and analysis [2] by combining a new visual metaphor
with an advanced interaction scheme and automated
procedures. Parallel Sets adopt the advantages of two older
and well-proven visualization techniques: the flexible layout
of Parallel Coordinates [3] (Fig. 2b), treating all dimensions
as visually independent (in contrast to recursive space-subdivision
approaches like Mosaic Displays), and displaying
frequencies as representatives for the categories
(Fig. 2c), as opposed to the usual one-by-one, item-based
visualization of data.
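
To make the frequency-based idea concrete, the following Python sketch computes the two quantities such a display needs: the relative frequency of each category on one axis (the size of its box) and the relative frequency of each category pair between two axes (the width of the connecting parallelogram). It is a minimal sketch with made-up records; the function names `category_frequencies` and `ribbon_frequencies` are assumptions for illustration and do not come from the paper.

```python
from collections import Counter

# Toy records (made up for illustration): each row is one data item
# described by two categorical dimensions.
records = [
    {"account": "checking", "region": "north"},
    {"account": "checking", "region": "south"},
    {"account": "savings", "region": "north"},
    {"account": "checking", "region": "north"},
    {"account": "savings", "region": "south"},
    {"account": "savings", "region": "south"},
]

def category_frequencies(rows, dim):
    """Relative frequency of each category on one axis (box sizes)."""
    counts = Counter(row[dim] for row in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def ribbon_frequencies(rows, dim_a, dim_b):
    """Relative frequency of each category pair between two axes
    (parallelogram widths)."""
    counts = Counter((row[dim_a], row[dim_b]) for row in rows)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

boxes = category_frequencies(records, "account")
ribbons = ribbon_frequencies(records, "account", "region")
```

The parallelograms leaving a box add up to the size of that box, so the display draws one shape per category combination rather than one line per data item.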
558 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 12, NO. 4, JULY/AUGUST 2006
- R. Kosara is with the Computer Science Department, University of North
Carolina at Charlotte, 435D Woodward Hall, 9201 University City
Boulevard, Charlotte, NC 28223-0001. E-mail: rkosara@uncc.edu.
- F. Bendix and H. Hauser are with the VRVis Research Center,
Donau-City-Strasse 1, A-1220 Vienna, Austria.
E-mail: {Bendix, Hauser}@VRVis.at.
Manuscript received 1 Nov. 2005; revised 31 Dec. 2005; accepted 30 Jan.
2006; published online 10 May 2006.
For information on obtaining reprints of this article, please send e-mail to:
tvcg@computer.org, and reference IEEECS Log Number TVCG-0145-1105.
1077-2626/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.


