
Performance of the Bayesian online algorithm for the perceptron.

by Evaldo Araújo De Oliveira, Roberto Castro Alamino
IEEE Transactions on Neural Networks (2007)

Abstract

In this letter, we derive continuum equations for the generalization error of the Bayesian online algorithm (BOnA) for the one-layer perceptron with a spherical covariance matrix using the Rosenblatt potential and show, by numerical calculations, that the asymptotic performance of the algorithm is the same as the one for the optimal algorithm found by means of variational methods with the added advantage that the BOnA does not use any inaccessible information during learning.


902 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 3, MAY 2007
Letters
Index Terms—Bayesian algorithms, online gradient methods, pattern
classification.
I. INTRODUCTION
Online algorithms are important in applications mainly because, if
suitably designed, they can adapt to situations where the rule is
changing, although in general they perform worse than offline
algorithms in static scenarios.
The optimal performance of any perceptron learning rule is achieved
by the so-called Bayes learning rule which gives rise to a lower bound
for the generalization error that cannot be surpassed by any other
learning algorithm [4]. It is also generally accepted that online
Bayesian methods should perform better than non-Bayesian ones
because the former use the available information in the best possible
way.
Based on this and on the positive results obtained by applying the
Bayesian approach to a broad range of situations, a great deal of
work¹ has been done on Bayesian methods for machine learning.
However, exact Bayesian methods turned out to be computationally
time-consuming, and approximations had to be developed. One important
approximation, which we call from now on the Bayesian online algorithm
(BOnA), was proposed and analyzed by Opper [8] for online learning on
perceptrons. It relies on projecting the posterior probabilities of the
parameters to be estimated onto a space of tractable distributions by
minimizing the Kullback–Leibler divergence between the two.
A different approach to learning is provided by variational methods.
Variational methods rely on minimizing the generalization error in each
step of learning to obtain the best possible performance in each case.
Applying a variational method to a one-layer perceptron learning with
a Hebbian rule, Kinouchi and Caticha [5] showed by means of numerical
calculations that the asymptotic behavior of its generalization error
when $\alpha \to \infty$, where $\alpha$ is a scaling parameter
proportional to the number of examples, is approximately $0.88/\alpha$,
which turns out to be twice that of the offline Bayes learning rule.
However, the derived algorithm makes use of inaccessible information:
the teacher field (to be defined later). This problem is circumvented
in the cited paper by using the mean of this variable as an estimator
of its true value.

Manuscript received January 4, 2006; revised October 23, 2006; accepted
November 2, 2006. The work of E. de Oliveira was supported in part by
Fundação de Apoio à Pesquisa do Estado de São Paulo (FAPESP) under Grant
05/60141-0. The work of R. C. Alamino was supported by the Evergrow Project.
E. A. de Oliveira is with the Instituto de Astronomia, Geofísica e Ciências
Atmosféricas, Universidade de São Paulo, São Paulo, CEP 01060-970, Brazil
(e-mail: evaldo@model.iag.usp.br).
R. C. Alamino is with the Neural Computing Research Group, Aston
University, Birmingham B4 7ET, U.K. (e-mail: alaminrc@aston.ac.uk;
robertoalamino@yahoo.com).
Digital Object Identifier 10.1109/TNN.2007.891189
¹ This can be seen from the growing number of papers on Bayesian methods
presented at the Neural Information Processing Systems (NIPS) Conference
(http://www.nips.cc/).

In this letter, we derive continuum equations for the generalization
error of the one-layer perceptron learning by the BOnA with a simpli-
fied covariance matrix, which we assume to be spherical, and compare
the resulting generalization curve with the optimal algorithm obtained
using the variational method in [5]. We show that the performance of
the Bayesian algorithm coincides with that of the optimal algorithm,
with the additional advantage that it needs no inaccessible parameter,
only the information available in the given data set.
The rest of this letter is organized as follows. In Section II, we review
the variational approach to online learning given in [5]. In Section III,
the Bayesian method is presented and the Bayesian online algorithm is
described. In Section IV, we write the simplified Bayesian equations
and, finally, in Section V, we discuss the results.
II. VARIATIONAL ALGORITHM
Let us consider the supervised learning situation where a one-layer
perceptron with $N$ input units, parameterized by its synaptic
weights $\omega \in \mathbb{R}^N$, is trained with a data set of
examples given by pairs $y_\mu = (\sigma_\mu, \xi_\mu)$, where
$\sigma_\mu \in \{-1, 1\}$ is the answer given by a teacher perceptron
with synaptic weights $\omega^{*} \in \mathbb{R}^N$ to the input vector
$\xi_\mu$. The teacher is normalized as $\|\omega^{*}\| = 1$.
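The teacher-student setup above can be sketched in a few lines of Python. This is only an illustration: the function names `make_teacher` and `make_example` are hypothetical, and the inputs are assumed to be drawn with independent standard Gaussian components, which the text does not specify at this point.

```python
import math
import random

def make_teacher(n, rng):
    """Draw a random teacher weight vector and normalize it to unit length."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

def make_example(teacher, rng):
    """Return one pair (sigma, xi): the teacher's answer and the input vector.

    Assumption of this sketch: input components are i.i.d. standard Gaussians.
    """
    xi = [rng.gauss(0.0, 1.0) for _ in teacher]
    b = sum(t * x for t, x in zip(teacher, xi))  # teacher's field for this input
    sigma = 1 if b >= 0 else -1                  # teacher's answer in {-1, 1}
    return sigma, xi
```

A data set of $P$ examples is then just `[make_example(teacher, rng) for _ in range(P)]`, with `rng = random.Random(seed)` for reproducibility.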
A variational algorithm for a one-layer perceptron learning by a
Hebbian rule is given in [5]. Using the update equation
$$\omega_{\mu+1} = \omega_\mu + \frac{1}{N} W_\mu \sigma_\mu \xi_\mu \qquad (1)$$
the modulation function that gives the best gain in generalization
ability per example is found by taking the functional derivative, with
respect to $W_\mu$, of the rate of variation of $\rho$ (the overlap
between the synaptic vectors of the teacher and the student) with the
number of examples, and equating it to zero. The solution is
$$W^{*}_\mu = \|\omega_\mu\| \, \sigma_\mu \left( \frac{b_\mu}{\rho} - h_\mu \right) \qquad (2)$$
where $b_\mu = \omega^{*} \cdot \xi_\mu$ and
$h_\mu = \omega_\mu \cdot \xi_\mu / \|\omega_\mu\|$ are known,
respectively, as the teacher and student fields.
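A minimal simulation sketch of update rule (1) follows. It takes the modulation function to be $W^{*} = \|\omega\| \sigma (b/\rho - h)$, which is this sketch's reading of (2), and it deliberately uses the true teacher field $b$ and overlap $\rho$, i.e., the idealized version that relies on inaccessible information. Further assumptions that are not in the text: Gaussian inputs, a Hebbian first step to initialize the student, and the standard relation $e_g = \arccos(\rho)/\pi$ for the generalization error under spherically symmetric inputs. All function names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generalization_error(rho):
    # e_g = arccos(rho) / pi; the clamp guards against rounding slightly past +-1
    return math.acos(max(-1.0, min(1.0, rho))) / math.pi

def simulate(n=50, alpha=10.0, seed=0):
    """Run alpha * n online steps; return the overlap rho before each step and at the end."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t_norm = math.sqrt(dot(teacher, teacher))
    teacher = [t / t_norm for t in teacher]          # unit-norm teacher

    # Assumed initialization: one Hebbian step, so that rho starts positive.
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sigma = 1.0 if dot(teacher, xi) >= 0 else -1.0
    student = [sigma * x for x in xi]

    overlaps = []
    for _ in range(int(alpha * n)):
        w_norm = math.sqrt(dot(student, student))
        rho = dot(student, teacher) / w_norm
        overlaps.append(rho)

        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = dot(teacher, xi)                         # teacher field (inaccessible in practice)
        h = dot(student, xi) / w_norm                # student field
        sigma = 1.0 if b >= 0 else -1.0
        w_star = w_norm * sigma * (b / rho - h)      # modulation function, as read from (2)
        student = [w + w_star * sigma * x / n for w, x in zip(student, xi)]  # update (1)

    w_norm = math.sqrt(dot(student, student))
    overlaps.append(dot(student, teacher) / w_norm)
    return overlaps
```

Calling `simulate()` and mapping `generalization_error` over the returned overlaps gives a learning curve for this idealized rule. Because it uses $b$ directly, it is not the algorithm whose $0.88/\alpha$ behavior is quoted in the text.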
However, the above modulation function depends on a variable that is
not accessible in most practical applications: the teacher field
$b_\mu$. In the cited paper, the authors use an estimate of $W^{*}_\mu$
given by its expected value over $|b|$
$$\hat{W}_\mu = \frac{\int d|b| \, P(b, h) \, W^{*}_\mu}{\int d|b| \, P(b, h)}. \qquad (3)$$
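The average in (3) can be illustrated numerically. The sketch below assumes, beyond what the text states, that $b$ and $h$ are jointly Gaussian with zero mean, unit variance, and correlation $\rho$, so that $b$ given $h$ is Gaussian with mean $\rho h$ and variance $1 - \rho^2$. It also assumes the modulation function $W^{*} = \|\omega\| \sigma (b/\rho - h)$. Under these assumptions the average over $|b|$ with $\sigma$ fixed has the closed form $\|\omega\| (s/\rho)\, \phi(z) / \Phi(\sigma z)$, with $s = \sqrt{1-\rho^2}$ and $z = \rho h / s$. The function names are hypothetical, and the second function checks the closed form by direct integration.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def w_hat(h, sigma, rho, w_norm=1.0):
    """Closed-form average of W* over |b| under the Gaussian assumption."""
    s = math.sqrt(1.0 - rho * rho)
    z = rho * h / s
    return w_norm * (s / rho) * normal_pdf(z) / normal_cdf(sigma * z)

def w_hat_numeric(h, sigma, rho, w_norm=1.0, steps=20000, b_max=12.0):
    """Midpoint-rule evaluation of the ratio of integrals over |b| in (3)."""
    s = math.sqrt(1.0 - rho * rho)
    db = b_max / steps
    num = den = 0.0
    for k in range(steps):
        b = sigma * (k + 0.5) * db                    # b has the sign of sigma
        p = math.exp(-0.5 * ((b - rho * h) / s) ** 2)  # P(b | h) up to a constant
        w_star = w_norm * sigma * (b / rho - h)
        num += p * w_star
        den += p
    return num / den
```

The estimate depends only on the accessible quantities $h$ and $\sigma$ (plus $\rho$), and it is larger when the student field disagrees in sign with the teacher's answer than when it agrees.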
The asymptotic behavior of the generalization error of the resulting
algorithm for $\alpha \to \infty$, with $\alpha = P/N$, where $P$ is the
number of examples, is shown by numerical calculations to be
approximately $0.88/\alpha$ (assuming a spherical distribution for
$\xi$). This implies that, for a large number of examples, the
generalization error of this algorithm is approximately twice that of
the offline Bayesian algorithm [4].
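As a quick worked check of the factor of two, using only the asymptotic form quoted above: the offline value is inferred here from the stated ratio, and $\alpha = 100$ is an arbitrary illustrative choice.

```latex
e_g^{\text{online}}(\alpha) \simeq \frac{0.88}{\alpha}
\quad\Longrightarrow\quad
e_g^{\text{offline Bayes}}(\alpha) \simeq \frac{1}{2}\cdot\frac{0.88}{\alpha} = \frac{0.44}{\alpha},
\qquad
\alpha = 100:\;\; e_g^{\text{online}} \approx 0.0088,\;\;
e_g^{\text{offline Bayes}} \approx 0.0044 .
```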
1045-9227/$25.00 © 2007 IEEE

