
Kernel principal component analysis

by Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller
Artificial Neural Networks—ICANN97 (1997)

Abstract

Kernel Principal Component analysis is a nonlinear generalization of the popular linear multivariate analysis method. However, this method assumes that the observed data is independent, a disadvantage for many practical applications. In order to overcome this difficulty, the authors propose a combination of Kernel Principal Component analysis and hidden Markov models. The novelty of the proposed method consists mainly in the way in which a static dimensionality reduction technique has been combined with a classic mixture model in time, to enhance the capabilities of transformation, reduction and classification of voice disorder data. Experimental results show improvements in classification accuracies even with highly reduced representations of the two databases used.


Available from www.springerlink.com

Kernel Principal Component Analysis
Bernhard Schölkopf¹, Alexander Smola², Klaus-Robert Müller²
¹ Max-Planck-Institut f. biol. Kybernetik, Spemannstr. 38, 72076 Tübingen, Germany
² GMD First, Rudower Chaussee 5, 12489 Berlin, Germany
Abstract. A new method for performing a nonlinear form of Principal
Component Analysis is proposed. By the use of integral operator kernel
functions, one can efficiently compute principal components in
high-dimensional feature spaces, related to input space by some nonlinear
map; for instance the space of all possible d-pixel products in images.
We give the derivation of the method and present first experimental
results on polynomial feature extraction for pattern recognition.
1 Introduction
Principal Component Analysis (PCA) is a basis transformation to diagonalize
an estimate of the covariance matrix of the data $x_k$, $k = 1, \dots, \ell$,
$x_k \in \mathbb{R}^N$, $\sum_{k=1}^{\ell} x_k = 0$, defined as
$$C = \frac{1}{\ell} \sum_{j=1}^{\ell} x_j x_j^\top. \tag{1}$$
The new coordinates in the Eigenvector basis, i.e. the orthogonal projections
onto the Eigenvectors, are called principal components.
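
As a point of reference for the nonlinear case, the linear procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `linear_pca` and the synthetic data are hypothetical and not part of the paper.

```python
import numpy as np

def linear_pca(X):
    """PCA by diagonalizing C = (1/l) * sum_j x_j x_j^T, as in (1).

    X has shape (l, N): one observation x_k per row.
    """
    X = X - X.mean(axis=0)                  # enforce sum_k x_k = 0
    l = X.shape[0]
    C = X.T @ X / l                         # covariance estimate (1)
    eigvals, eigvecs = np.linalg.eigh(C)    # C is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = X @ eigvecs                # projections onto the Eigenvectors
    return eigvals, eigvecs, components

# Illustrative synthetic data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
eigvals, eigvecs, pcs = linear_pca(X)
```

In the new coordinates the covariance estimate is diagonal, with the Eigenvalues of C on the diagonal.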
In this paper, we generalize this setting to a nonlinear one of the following
kind. Suppose we first map the data nonlinearly into a feature space $F$ by
$$\Phi : \mathbb{R}^N \to F, \quad x \mapsto X. \tag{2}$$
We will show that even if $F$ has arbitrarily large dimensionality, for certain
choices of $\Phi$, we can still perform PCA in $F$. This is done by the use of kernel
functions known from support vector machines (Boser, Guyon, & Vapnik, 1992).
2 Kernel PCA
Assume for the moment that our data mapped into feature space,
$\Phi(x_1), \dots, \Phi(x_\ell)$, is centered, i.e.
$\sum_{k=1}^{\ell} \Phi(x_k) = 0$. To do PCA for the covariance matrix
$$C = \frac{1}{\ell} \sum_{j=1}^{\ell} \Phi(x_j) \Phi(x_j)^\top, \tag{3}$$
we have to find Eigenvalues $\lambda \ge 0$ and Eigenvectors $V \in F \setminus \{0\}$ satisfying
$\lambda V = CV$. Substituting (3), we note that all solutions $V$ lie in the span of
$\Phi(x_1), \dots, \Phi(x_\ell)$. This implies that we may consider the equivalent equation
$$\lambda (\Phi(x_k) \cdot V) = (\Phi(x_k) \cdot CV) \quad \text{for all } k = 1, \dots, \ell, \tag{4}$$
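and that there exist coefficients $\alpha_1, \dots, \alpha_\ell$ such that
$$V = \sum_{i=1}^{\ell} \alpha_i \Phi(x_i). \tag{5}$$
Substituting (3) and (5) into (4), and defining an $\ell \times \ell$ matrix $K$ by
$$K_{ij} := (\Phi(x_i) \cdot \Phi(x_j)), \tag{6}$$
we arrive at
$$\ell \lambda K \alpha = K^2 \alpha, \tag{7}$$
where $\alpha$ denotes the column vector with entries $\alpha_1, \dots, \alpha_\ell$. To find solutions of
(7), we solve the Eigenvalue problem
$$\ell \lambda \alpha = K \alpha \tag{8}$$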
for nonzero Eigenvalues. Clearly, all solutions of (8) do satisfy (7). Moreover, it
can be shown that any additional solutions of (7) do not make a difference in
the expansion (5) and thus are not interesting for us.
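
The substitution step is compact in the text, so it may help to spell it out. The following is a sketch of the intermediate algebra, using only the definitions (3), (5) and (6); the auxiliary vector $\beta$ is introduced here purely for illustration.

```latex
% Left-hand side of (4), with V expanded as in (5):
\lambda\,(\Phi(x_k)\cdot V)
  = \lambda \sum_{i=1}^{\ell} \alpha_i\,(\Phi(x_k)\cdot\Phi(x_i))
  = \lambda\,(K\alpha)_k .

% Right-hand side of (4), with C from (3) and V from (5):
(\Phi(x_k)\cdot CV)
  = \frac{1}{\ell}\sum_{j=1}^{\ell}(\Phi(x_k)\cdot\Phi(x_j))
    \sum_{i=1}^{\ell}\alpha_i\,(\Phi(x_j)\cdot\Phi(x_i))
  = \frac{1}{\ell}\,(K^2\alpha)_k .

% Equating both sides for all k = 1, ..., \ell gives (7):
\ell\lambda\,K\alpha = K^2\alpha .

% Any \beta with K\beta = 0 contributes nothing to the expansion (5), since
\Bigl\|\sum_{i=1}^{\ell}\beta_i\,\Phi(x_i)\Bigr\|^2 = \beta^\top K\beta = 0 .
```

The last line is why coefficient vectors that differ only by a null-space component of $K$ describe the same vector $V$ in $F$.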
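We normalize the solutions $\alpha^k$ belonging to nonzero Eigenvalues by requiring
that the corresponding vectors in $F$ be normalized, i.e. $(V^k \cdot V^k) = 1$. By virtue
of (5), (6) and (8), this translates into
$$1 = \sum_{i,j=1}^{\ell} \alpha_i^k \alpha_j^k (\Phi(x_i) \cdot \Phi(x_j)) = (\alpha^k \cdot K \alpha^k) = \lambda_k (\alpha^k \cdot \alpha^k), \tag{9}$$
where $\lambda_k$ here denotes the $k$-th Eigenvalue of $K$, i.e. the quantity $\ell\lambda$ in (8).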
For principal component extraction, we compute projections of the image of a
test point $\Phi(x)$ onto the Eigenvectors $V^k$ in $F$ according to
$$(V^k \cdot \Phi(x)) = \sum_{i=1}^{\ell} \alpha_i^k (\Phi(x_i) \cdot \Phi(x)). \tag{10}$$
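
The steps (6), (8), (9) and (10) can be put together in a short NumPy sketch. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the code keeps the text's working assumption that the mapped data are centered in F (it performs no centering of its own). A degree-1 polynomial kernel on centered inputs is used in the demo so that the assumption holds exactly.

```python
import numpy as np

def poly_kernel(d):
    """Return a function computing the matrix of (x . y)^d for rows of X and Y."""
    return lambda X, Y: (X @ Y.T) ** d

def kernel_pca_fit(X, kernel, n_components):
    """Solve l*lambda*alpha = K alpha (8) and normalize alpha as in (9).

    Assumes the mapped data are already centered in feature space.
    """
    K = kernel(X, X)                        # K_ij = (Phi(x_i) . Phi(x_j)), eq. (6)
    mu, A = np.linalg.eigh(K)               # mu = l * lambda, in ascending order
    order = np.argsort(mu)[::-1][:n_components]
    mu, A = mu[order], A[:, order]          # keep the leading Eigenvalues
    if np.any(mu <= 1e-10 * mu[0]):
        raise ValueError("requested components include (near-)zero Eigenvalues")
    A = A / np.sqrt(mu)                     # now mu_k * (alpha^k . alpha^k) = 1
    return A, mu / len(X)                   # coefficients and Eigenvalues lambda of C

def kernel_pca_project(X_train, A, kernel, X_test):
    """Projections (V^k . Phi(x)) = sum_i alpha_i^k k(x_i, x), eq. (10)."""
    return kernel(X_test, X_train) @ A

# Illustrative synthetic data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)                      # centered in input space
A, lam = kernel_pca_fit(X, poly_kernel(1), n_components=2)
Z = kernel_pca_project(X, A, poly_kernel(1), X)
```

With the degree-1 kernel the feature space coincides with input space, so `Z` should agree with ordinary linear PCA projections up to the sign of each component. For kernels whose feature map does not produce centered data, the kernel matrix would need to be centered first, which this sketch omits.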
Note that neither (6) nor (10) requires the $\Phi(x_i)$ in explicit form: they are
only needed in dot products. Therefore, we are able to use kernel functions for
computing these dot products without actually performing the map $\Phi$ (Aizerman,
Braverman, & Rozonoer, 1964; Boser, Guyon, & Vapnik, 1992): for some choices
of a kernel $k(x, y)$, it can be shown by methods of functional analysis that there
exists a map $\Phi$ into some dot product space $F$ (possibly of infinite dimension)
such that $k$ computes the dot product in $F$. Kernels which have successfully been
used in support vector machines (Schölkopf, Burges, & Vapnik, 1995) include
polynomial kernels
$$k(x, y) = (x \cdot y)^d, \tag{11}$$
radial basis functions $k(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$, and sigmoid kernels
$k(x, y) = \tanh(\kappa (x \cdot y) + \Theta)$. It can be shown that polynomial kernels of degree
$d$ correspond to a map $\Phi$ into a feature space which is spanned by all products
of $d$ entries of an input pattern, e.g., for the case of $N = 2$, $d = 2$,
$$(x \cdot y)^2 = (x_1^2, x_1 x_2, x_2 x_1, x_2^2)(y_1^2, y_1 y_2, y_2 y_1, y_2^2)^\top. \tag{12}$$
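
The identity (12) can be checked numerically. The sketch below writes out the three kernels named above and an explicit degree-2 feature map; the function and parameter names (`sigma`, `kappa`, `theta`, `phi_degree2`) and the sample vectors are assumptions chosen for illustration.

```python
import numpy as np

def poly_kernel(x, y, d):
    """Polynomial kernel (x . y)^d, as in (11)."""
    return np.dot(x, y) ** d

def rbf_kernel(x, y, sigma):
    """Radial basis function kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa, theta):
    """Sigmoid kernel tanh(kappa * (x . y) + theta)."""
    return np.tanh(kappa * np.dot(x, y) + theta)

def phi_degree2(x):
    """Explicit map for d = 2: all ordered products of two entries of x.

    For N = 2 this gives (x1*x1, x1*x2, x2*x1, x2*x2), matching (12).
    """
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
lhs = poly_kernel(x, y, d=2)                 # kernel evaluated in input space
rhs = phi_degree2(x) @ phi_degree2(y)        # dot product in feature space
```

Both `lhs` and `rhs` evaluate the same quantity, but the kernel form needs only one dot product in input space, whereas the explicit map has $N^d$ entries.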
