Integration of unsupervised clustering, interaction and parallel coordinates for the exploration of large multivariate data
- ISSN: 10939547
- ISBN: 0769521770
- DOI: 10.1109/IV.2004.1320124
Abstract
Parallel coordinates are widely used in many applications for visualization of multivariate data. Because of the nature of parallel coordinates, the visualization technique is often used for data overview. However, when the number of tuples to be visualized becomes very large, this technique makes it difficult to distinguish the overall structure. In this paper we present a novel technique which uses a classification approach, the self-organizing map (an unsupervised learning algorithm), to solve this problem by creating an initial clustering of the data. By initially only visualizing the resulting representational clusters, the inherited global structure can be shown. Using linked views and allowing the user to perform drill-down and filtering on these representations reveals the single data items without loss of context.
Jimmy Johansson
Linköping University
jimjo@itn.liu.se
Robert Treloar
Unilever Research
robert.treloar@unilever.com
Mikael Jern
Linköping University
mikje@itn.liu.se
Keywords: Parallel coordinates, unsupervised clustering, linked views, interactive visualization.
1. Introduction
Visualization of multivariate data is a challenging task. The goal is not the display of multiple data dimensions but user comprehension of the multivariate data. Parallel coordinates [7] is one of the established techniques transforming multi-dimensional patterns into two-dimensional patterns. Visualization is facilitated by viewing the two-dimensional representation of the m-dimensional data items as lines crossing m parallel axes (figure 1), each of which represents one dimension of the original feature space. This approach scales well with increasing m and has been incorporated into several data analysis tools.
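As a minimal sketch of the mapping described above, the following Python function turns each m-dimensional tuple into a polyline with one vertex per axis. The function name `to_polylines`, the placement of axis j at x = j, and the per-dimension min-max scaling are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_polylines(tuples: Sequence[Sequence[float]]) -> List[List[Point]]:
    """Map each m-dimensional tuple to a polyline over m parallel axes.

    Assumptions (hypothetical, for illustration only): axis j is drawn
    at x = j, and each dimension is min-max scaled so that every axis
    spans y in [0, 1].
    """
    if not tuples:
        return []
    m = len(tuples[0])
    if any(len(t) != m for t in tuples):
        raise ValueError("all tuples must have the same number of dimensions")

    # Per-dimension value range, used to scale each axis independently.
    lows = [min(t[j] for t in tuples) for j in range(m)]
    highs = [max(t[j] for t in tuples) for j in range(m)]

    def scale(value: float, j: int) -> float:
        span = highs[j] - lows[j]
        # A constant dimension has no spread; place it at mid-axis.
        return 0.5 if span == 0 else (value - lows[j]) / span

    # One vertex per axis; consecutive vertices are joined by line segments.
    return [[(float(j), scale(t[j], j)) for j in range(m)] for t in tuples]


# Three tuples with m = 3, analogous to the situation shown in figure 1.
data = [(1.0, 20.0, 300.0), (2.0, 10.0, 300.0), (3.0, 15.0, 300.0)]
polylines = to_polylines(data)
```

Each returned polyline has exactly m vertices, so the cost of drawing one tuple grows linearly with m, while the number of overlapping polylines grows with the number of tuples.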
There are well-known issues with this representation when the number of tuples in a data set becomes large. In our present application, we define a large data set, for the purpose of visualization with parallel coordinates, as one that contains at least 10,000 tuples, each containing more than 10 data items. For such data the parallel coordinates technique does not provide a good overview, and it becomes hard to see the structure in the data (figure 2). Pre-processing or filtering of the data is therefore required as an integrated step in the visualization process.
Figure 1. Three tuples visualized with parallel coordinates.
In recent years, several research efforts have been directed towards enhancing the parallel coordinates technique to make it more effective for exploring large multivariate data sets. Fua, Ward and Rundensteiner [4] propose a multiresolutional view of the data via hierarchical clustering. By displaying groups of data at different levels of abstraction, the amount of clutter can be reduced. Their implementation provides a number of interaction techniques, such as drill-down, dimension zooming and structure-based brushing, in order to manipulate the data and to navigate within the hierarchy. Wong and Bergeron [19] propose wavelet brushing as a technique for browsing large multidimensional multivariate data sets. Siirtola [15] uses a technique called polyline averaging, which aggregates a set of user-specified lines in a parallel coordinates chart to provide a better overview of the data.
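The idea of aggregating a set of selected lines can be illustrated with a small sketch. It assumes that the aggregate is the per-axis arithmetic mean of the selected tuples. The text above only states that the lines are aggregated, so the cited technique may differ in its details, and the function name `average_polyline` is hypothetical.

```python
from typing import List, Sequence


def average_polyline(selected: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a set of selected tuples into one representative tuple.

    Assumption (not confirmed by the source): the aggregate is the
    arithmetic mean taken independently on each axis.
    """
    if not selected:
        raise ValueError("at least one tuple must be selected")
    m = len(selected[0])
    if any(len(t) != m for t in selected):
        raise ValueError("all tuples must have the same number of dimensions")
    count = len(selected)
    # Average the values of all selected tuples on each of the m axes.
    return [sum(t[j] for t in selected) / count for j in range(m)]


# Two user-selected tuples are replaced by a single averaged line.
selected = [(1.0, 4.0, 10.0), (3.0, 8.0, 20.0)]
mean_line = average_polyline(selected)
```

Drawing `mean_line` instead of every selected tuple reduces the number of polylines on screen, which is the source of the improved overview.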
Linked or coordinated views [3, 1] is another recognized technique for analyzing multivariate data. A movement or change in one view automatically propagates to all other views. This technique was successfully implemented by Brodbeck and Girardin [2] to analyze high-dimensional data in the field of geo-engineering and has also been used by Mukherjea, Foley and Hudson [12] to make a complex hypermedia system understandable to the user. In evaluating their snap-together visualization system, North and Shnei-
Proceedings of the Eighth International Conference on Information Visualisation (IV’04)
1093-9547/04 $ 20.00 IEEE


