An Introduction to Diffusion Maps

by J De Porte, B M Herbst, W Hereman, S J Van Der Walt
Techniques (2008)


Available from inside.mines.edu
An Introduction to Diffusion Maps
J. de la Porte†, B. M. Herbst†, W. Hereman‡, S. J. van der Walt†
† Applied Mathematics Division, Department of Mathematical Sciences,
University of Stellenbosch, South Africa
‡ Colorado School of Mines, United States of America
jolanidlp@googlemail.com, herbst@sun.ac.za,
hereman@mines.edu, stefan@sun.ac.za
Abstract
This paper describes a mathematical technique [1] for
dealing with dimensionality reduction. Given data in a
high-dimensional space, we show how to find parameters
that describe the lower-dimensional structures of which it
is comprised. Unlike other popular methods such as
Principal Component Analysis and Multi-dimensional Scaling,
diffusion maps are non-linear and focus on discovering
the underlying manifold (the lower-dimensional constrained
“surface” upon which the data is embedded). By
integrating local similarities at different scales, a global
description of the data-set is obtained. In comparisons, it
is shown that the technique is robust to noise perturbation
and is computationally inexpensive. Illustrative examples
and an open implementation are given.
1. Introduction: Dimensionality Reduction
The curse of dimensionality, a term which vividly re-
minds us of the problems associated with the process-
ing of high-dimensional data, is ever-present in today’s
information-driven society. The dimensionality of a data-
set, or the number of variables measured per sample, can
easily amount to thousands. Think, for example, of a
100 by 100 pixel image, where each pixel can be seen
to represent a variable, leading to a dimensionality of
10,000. In such a high-dimensional feature space, data
points typically occur sparsely, which causes numerous
problems: some algorithms slow down or fail entirely,
function and density estimation become expensive or
inaccurate, and global similarity measures break down [4].
The breakdown of common similarity measures ham-
pers the efficient organisation of data, which, in turn, has
serious implications in the field of pattern recognition.
For example, consider a collection of n × m images,
each encoding a digit between 0 and 9. Furthermore, the
images differ in their orientation, as shown in Fig.1. A
human, faced with the task of organising such images,
would likely first notice the different digits, and only
thereafter that they are oriented differently. The observer
intuitively attaches greater value to parameters that encode
larger variances in the observations, and therefore clusters the data
in 10 groups, one for each digit. Inside each of the 10
groups, digits are furthermore arranged according to the
angle of rotation. This organisation leads to a simple two-
dimensional parametrisation, which significantly reduces
the dimensionality of the data-set, whilst preserving all
important attributes.
Figure 1: Two images of the same digit at different rota-
tion angles.
On the other hand, a computer sees each image as a
data point in $\mathbb{R}^{nm}$, an nm-dimensional coordinate space.
The data points are, by nature, organised according to
their position in the coordinate space, where the most
common similarity measure is the Euclidean distance.
A small Euclidean distance between vectors almost
certainly indicates that they are highly similar. A large
distance, on the other hand, provides very little informa-
tion on the nature of the discrepancy. This Euclidean dis-
tance therefore provides a good measure of local similar-
ity only. In higher dimensions, distances are often large,
given the sparsely populated feature space.
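The growth of distances with dimension can be checked numerically. The following is a minimal sketch, assuming points drawn uniformly from the unit hypercube (a sampling model chosen here for illustration, not taken from the paper); the helper name `mean_pairwise_distance` is hypothetical. It estimates the mean pairwise Euclidean distance for a fixed number of points as the dimension increases up to 10,000, the dimensionality of the 100 by 100 pixel image mentioned earlier.

```python
import numpy as np

def mean_pairwise_distance(n_points, dim, rng):
    """Mean Euclidean distance between n_points uniform samples in [0, 1]^dim."""
    X = rng.uniform(size=(n_points, dim))
    # Squared distances via the Gram matrix: |x|^2 + |y|^2 - 2 x.y
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by rounding before taking the root.
    dists = np.sqrt(np.clip(sq_dists, 0.0, None))
    # Use each unordered pair once (strict upper triangle).
    upper = np.triu_indices(n_points, k=1)
    return dists[upper].mean()

rng = np.random.default_rng(0)
means = {dim: mean_pairwise_distance(200, dim, rng) for dim in (2, 100, 10000)}
for dim, value in means.items():
    print(f"dim={dim:6d}  mean pairwise distance={value:.2f}")
```

Under this sampling model the root-mean-square distance is sqrt(dim/6), so with the number of samples held fixed, typical distances grow with the square root of the dimension.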
Key to non-linear dimensionality reduction is the re-
alisation that data is often embedded in (lies on) a lower-
dimensional structure or manifold, as shown in Fig. 2. It
would therefore be possible to characterise the data and
the relationship between individual points using fewer di-
mensions, if we were able to measure distances on the
manifold itself instead of in Euclidean space. For ex-
ample, taking into account its global structure, we could
represent the data in our digits data-set using only two
variables: digit and rotation.
Figure 2: Low dimensional data measured in a high-
dimensional space.
The challenge, then, is to determine the lower-
dimensional data structure that encapsulates the data,
leading to a meaningful parametrisation. Such a repre-
sentation achieves dimensionality reduction, whilst pre-
serving the important relationships between data points.
One realisation of the solution is diffusion maps.
In Section 2, we give an overview of three other
well-known techniques for dimensionality reduction. Section
3 introduces diffusion maps and explains how they
function. In Section 4, we apply the knowledge gained in a
real-world scenario. Section 5 compares the performance
of diffusion maps against the other techniques discussed
in Section 2. Finally, Section 6 demonstrates the organi-
sational ability of diffusion maps in an image processing
example.
2. Other Techniques for Dimensionality
Reduction
There exist a number of dimensionality reduction tech-
niques. These can be broadly categorised into those that
are able to detect non-linear structures, and those that are
not. Each method aims to preserve some specific property
of interest in the mapping. We focus on three well-known
techniques: Principal Component Analysis (PCA),
multidimensional scaling (MDS) and isometric feature map
(Isomap).
2.1. Principal Component Analysis (PCA)
PCA [3] is a linear dimensionality reduction technique.
It aims to find a linear mapping between a high-dimensional
space (n-dimensional) and a subspace (d-dimensional,
with d < n) that captures most of the variability in
the data. The subspace is specified by d orthogonal
vectors: the principal components. The PCA mapping is a
projection onto that subspace.
The principal components are the dominant eigenvec-
tors (i.e., the eigenvectors corresponding to the largest
eigenvalues) of the covariance matrix of the data.
Principal component analysis is simple to implement,
but many real-world data-sets have non-linear character-
istics which a PCA mapping fails to encapsulate.
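The description above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `pca` and the synthetic data are assumptions made here. It centres the data, forms the covariance matrix, and projects onto the eigenvectors with the largest eigenvalues.

```python
import numpy as np

def pca(X, d):
    """Project the rows of X (N samples, n features) onto d principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    components = eigvecs[:, order]              # columns are principal components
    return X_centered @ components, components, eigvals[order]

# Synthetic example: 3-D points lying close to the line spanned by (1, 2, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t,
                     2.0 * t + 0.01 * rng.normal(size=200),
                     0.01 * rng.normal(size=200)])
Y, components, variances = pca(X, d=1)
print(components[:, 0])   # approximately parallel to (1, 2, 0), up to sign
```

Because the mapping is a single linear projection, it recovers the line in this example, but it would not unroll data lying on a curved manifold.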
2.2. Multidimensional Scaling (MDS)
MDS [6] aims to embed data in a lower-dimensional
space in such a way that pair-wise distances between data
points, $X_{1..N}$, are preserved. First, a distance matrix $D_X$
is created. Its elements contain the distances between
points in the feature space, i.e. $D_X[i, j] = d(x_i, x_j)$.
For simplicity's sake, we consider only Euclidean distances
here.
The goal is to find a lower-dimensional set of feature
vectors, $Y_{1..N}$, for which the distance matrix, $D_Y[i, j] =
d(y_i, y_j)$, minimises a cost function, $\rho(D_X, D_Y)$. Of the
different cost functions available, strain is the most
popular (MDS using strain is called “classical MDS”):
$$\rho_{\mathrm{strain}}(D_X, D_Y) = \| J^T (D_X^2 - D_Y^2) J \|_F^2 .$$
Here, J is the centering matrix, so that $J^T X J$ subtracts
the vector mean from each component in X.
The Frobenius matrix norm, $\|X\|_F$, is defined as
$$\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} |x_{ij}|^2} .$$
The intuition behind this cost function is that it
preserves variation in distances, so that scaling by a constant
factor has no influence [2]. Minimising the strain has a
convenient solution, given by the dominant eigenvectors
of the matrix $-\frac{1}{2} J^T D_X^2 J$.
MDS, when using Euclidean distances, is criticised
for weighing large and small distances equally. We
mentioned earlier that large Euclidean distances provide
little information on the global structure of a data-set,
and that only local similarity can be accurately inferred.
For this reason, MDS cannot capture non-linear, lower-
dimensional structures according to their true parameters
of change.
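The eigenvector solution for classical MDS can be sketched as follows. The sketch assumes the usual centering matrix J = I - (1/N) 1 1^T and scales each eigenvector by the square root of its eigenvalue, a common convention that the text above does not spell out; the function names `classical_mds` and `pairwise_distances` are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, d):
    """Embed N points in d dimensions from an N x N distance matrix D."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N          # centering matrix (assumed form)
    B = -0.5 * J.T @ (D ** 2) @ J                # doubly centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]        # d dominant eigenpairs
    lam = np.clip(eigvals[order], 0.0, None)     # guard against rounding below zero
    return eigvecs[:, order] * np.sqrt(lam)

# Synthetic check: a 2-D point cloud rotated into 5-D space.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))     # random orthogonal matrix
X = np.hstack([P, np.zeros((30, 3))]) @ Q
D_X = pairwise_distances(X)
Y = classical_mds(D_X, d=2)
D_Y = pairwise_distances(Y)
print(np.abs(D_X - D_Y).max())                   # small: distances are preserved
```

In this example the data are genuinely planar, so the Euclidean distances are reproduced up to rounding. For data on a curved manifold the same procedure preserves the straight-line distances through the ambient space, which is the limitation described above.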
2.3. Isometric Feature Map (Isomap)
Isomap [5] is a non-linear dimensionality reduction tech-
nique that builds on MDS. Unlike MDS, it preserves
geodesic distance, and not Euclidean distance, between
data points. The geodesic represents a straight line in
curved space or, in this application, the shortest curve
along the geometric structure defined by our data points
[2]. Isomap seeks a mapping such that the geodesic
distance between data points matches the corresponding
Euclidean distance in the transformed space. This preserves
the true geometric structure of the data.
How do we approximate the geodesic distance be-
tween points without knowing the geometric structure of
our data? We assume that, in a small neighbourhood
(determined by K-nearest neighbours or points within a
specified radius), the Euclidean distance is a good ap-
proximation for the geodesic distance. For points fur-
ther apart, the geodesic distance is approximated as the
sum of Euclidean distances along the shortest connecting
path. There exist a number of graph-based algorithms for
calculating this approximation.
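The geodesic approximation can be illustrated with a short sketch. It assumes a K-nearest-neighbour graph and uses the Floyd-Warshall all-pairs shortest-path algorithm, one of several graph algorithms that could be used here; the function name `geodesic_distances` and the semicircle data are choices made for this example, and the sketch assumes no duplicate points.

```python
import numpy as np

def geodesic_distances(X, k):
    """Approximate geodesic distances via shortest paths on a k-NN graph."""
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distances
    G = np.full((N, N), np.inf)                  # inf marks "no direct edge"
    np.fill_diagonal(G, 0.0)
    # Column 0 of the sorted indices is the point itself, so skip it.
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(N), k)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]                # keep the graph symmetric
    # Floyd-Warshall: allow each point in turn as an intermediate stop.
    for m in range(N):
        G = np.minimum(G, G[:, m, None] + G[None, m, :])
    return G, D

# Points on a unit semicircle: the endpoints are 2 apart in a straight line,
# but pi apart along the curve.
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
G, D = geodesic_distances(X, k=2)
print(D[0, -1], G[0, -1])   # Euclidean distance vs. approximate geodesic
```

Isomap would then pass the matrix of approximate geodesic distances to MDS in place of the Euclidean distance matrix.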
