
Fast Weak Learner Based on Genetic Algorithm

by Boris Yangel
Proceedings of GraphiCon09 (2009)

Abstract

An approach to accelerating the boosting of parametric weak classifiers is proposed. A weak classifier is called parametric if it has a fixed number of parameters and can therefore be represented as a point in a multidimensional space. A genetic algorithm is used instead of exhaustive search to learn the parameters of such a classifier. The proposed approach also accounts for cases in which an efficient algorithm exists for learning some of the classifier parameters. Experiments confirm that this approach can dramatically decrease classifier training time while keeping both training and test errors small.


arXiv:0906.0872v1 [cs.LG] 4 Jun 2009
Boris Yangel
Department of Computational Mathematics and Cybernetics
Moscow State University, Moscow, Russia
hr0nix@acm.org
Keywords: boosting, genetic algorithm, classification, Haar feature
1 Introduction
Boosting is one of the most commonly used approaches to classifier learning. It is a machine learning meta-algorithm that iteratively learns an additive model consisting of weighted weak classifiers that belong to some classifier family $W$. For the two-class classification problem, which we consider in this paper, the boosted classifier usually has the form
$$s(y) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i w_i(y) \right). \tag{1}$$
Here $y \in Y$ is a sample to classify, $w_i \in W$ are the weak classifiers learned during the boosting procedure, $\alpha_i$ are the weak classifier weights, $w_i(y) \in \{-1, 1\}$ and $s(y) \in \{-1, 1\}$. The set $W$ is referred to as the weak classifier family because its elements need only have an error rate slightly better than random guessing. This expresses the key idea of boosting: a strong classifier can be built on top of many weak ones.
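As an illustration of Eq. (1), the following minimal Python sketch evaluates a boosted classifier from a list of weak classifiers and their weights. The function name `strong_classify`, the three toy threshold classifiers and their weights are hypothetical, and the treatment of a zero score is an assumption, since Eq. (1) does not specify it.

```python
def strong_classify(y, weak_classifiers, alphas):
    """Evaluate s(y) = sgn(sum_i alpha_i * w_i(y)) as in Eq. (1)."""
    score = sum(a * w(y) for a, w in zip(alphas, weak_classifiers))
    # Assumption: a score of exactly zero is mapped to +1.
    return 1 if score >= 0 else -1


# Three hypothetical weak classifiers on a scalar sample, each returning -1 or +1.
weak = [
    lambda y: 1 if y > 0.2 else -1,
    lambda y: 1 if y > 0.5 else -1,
    lambda y: 1 if y > 0.8 else -1,
]
alphas = [0.4, 0.9, 0.3]

label_high = strong_classify(0.6, weak, alphas)  # 0.4 + 0.9 - 0.3 > 0, so +1
label_low = strong_classify(0.3, weak, alphas)   # 0.4 - 0.9 - 0.3 < 0, so -1
```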
Many boosting procedures exist; they differ in the type of loss that is optimized for the final classifier. Whatever boosting procedure is used, on each iteration it must select (learn) a weak classifier with minimal weighted loss from the family $W$ using a special algorithm called a weak learner. Fast and accurate optimization methods are often not applicable here (especially when the classifier parameters are discrete), so exhaustive search over the weak classifier parameter space is used as the weak learner. Unfortunately, exhaustive search can take a lot of time. For example, learning a cascade of boosted classifiers based on Haar features with AdaBoost and exhaustive search over the classifier parameter space took several weeks in the well-known work [Viola and Jones 2001]. It is therefore often very important to decrease weak classifier learning time using an appropriate numerical optimization approach.
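To make the cost of the baseline concrete, here is a minimal sketch of an exhaustive-search weak learner. It assumes the weighted loss is the weighted misclassification error and that the family is given as an explicit list of callables; both are simplifying assumptions, and all names are hypothetical. The running time grows with the product of the family size and the number of samples, which is what makes large families expensive.

```python
def weighted_error(classifier, samples, labels, weights):
    """Sum of the weights of the samples that the classifier gets wrong."""
    return sum(w for x, t, w in zip(samples, labels, weights) if classifier(x) != t)


def exhaustive_weak_learner(family, samples, labels, weights):
    """Scan the whole family and return (best classifier, its weighted error)."""
    best = min(family, key=lambda c: weighted_error(c, samples, labels, weights))
    return best, weighted_error(best, samples, labels, weights)


def make_stump(threshold):
    """Hypothetical one-parameter weak classifier on scalar samples."""
    return lambda x: 1 if x > threshold else -1


samples = [1, 2, 3, 6, 7, 8]
labels = [-1, -1, -1, 1, 1, 1]
weights = [1 / 6] * 6
family = [make_stump(t + 0.5) for t in range(10)]

best_stump, best_error = exhaustive_weak_learner(family, samples, labels, weights)
```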
One widely used approach to numerical optimization is the genetic algorithm [Goldberg 1989], which is based on ideas from biological evolution. A solution to the optimization problem is encoded as a chromosome vector. An initial population of solutions is created using a random number generator. A fitness function then assigns a fitness value to every population member, and the solutions with the highest fitness values are selected for the next step. In that step, genetic operators (usually crossover and mutation) are applied to the selected chromosomes to produce new solutions and to modify existing ones slightly. The modified solutions form a new generation, and the process repeats. This is how evolution is modeled. It continues until a global or suboptimal solution is found or the time allowed for evolution runs out. Genetic algorithms are often used to search for a global extremum in large and complicated search spaces, which makes a genetic algorithm a good candidate for a weak classifier learner.
2 Related work
The use of a genetic algorithm to accelerate the weak learner has already been proposed in several works. In [Treptow and Zell 2004], a genetic weak learner with special crossover and mutation operators was used to learn a classifier based on an extended Haar feature set. In [Ramirez 2007], a genetic algorithm was used to select a few thousand weak classifiers with the smallest error on the unweighted training set before the boosting process starts; exhaustive search over the selected classifiers was then performed on each boosting iteration to select the one with minimal weighted loss. In [Masada et al. 2008], the boosting procedure was completely integrated with the genetic algorithm: on each boosting iteration a few classifiers were selected from the solution population and added to the strong classifier, and the selected classifiers were then used to produce new population members by applying genetic operators. Finally, in [Abramson et al. 2006] the weak learner was a special evolutionary algorithm that the authors called Evolutionary Hill-Climbing. It did not use a crossover operator. Instead, 5 different mutations were applied to every population member on each iteration of the algorithm, and the result of each mutation was rejected if it did not improve the fitness function value.
There were two main reasons for using genetic search rather than other approaches in these works. Most of the classifiers used in them were extensions of the Haar classifier family originally proposed in [Viola and Jones 2001]. First, the huge size of such a weak classifier family does not allow exhaustive-search-based optimization to be applied. Second, the complicated discrete structure of the weak classifier rules out all other optimization options.
Another important observation is that in every case the authors were forced to implement a specialized solution for the genetic weak learner. This work therefore investigates whether the evolutionary approach to learning a weak classifier can be generalized.
3 Proposed method
We are interested in developing a general approach to learning a weak classifier. This approach should work much faster than exhaustive search over the classifier parameter space. One such approach is presented in the following subsections. It is based on the fact that when the number of classifier parameters to optimize is fixed, the weighted loss optimization problem becomes a multivariate function minimization problem, which is a well-developed application area of genetic algorithms.
3.1 Population member
Let $W$ be a parametric family of weak classifiers. This means that every weak classifier $w \in W$ can be described by a set of $n$ real-valued parameters $x_1, \ldots, x_n$. Let us also assume that for the last $l$ parameters ($l$ can be equal to zero) there exists an efficient learning algorithm $L_E : \mathbb{R}^{n-l} \to \mathbb{R}^{l}$. We will refer to these parameters as linked. For given values of the parameters $x_1, \ldots, x_{n-l}$, called free, $L_E$ finds the values of the linked parameters that minimize the loss function $E : \mathbb{R}^{n} \to \mathbb{R}_{+}$. Our task is therefore to find the values of the free parameters that minimize $E[x_1, \ldots, x_{n-l}, L_E(x_1, \ldots, x_{n-l})]$. The set of parameters $x_1, \ldots, x_{n-l}$ thus represents a solution to our optimization problem and forms a member of the genetic algorithm population.
3.2 Fitness function
It is natural to assume that a classifier with a small error on the training set should have a greater probability of reaching the next generation of the genetic algorithm. This allows us to introduce the fitness function $F : \mathbb{R}^{n-l} \to \mathbb{R}_{+}$ as follows:
$$F(x_1, \ldots, x_{n-l}) = \frac{1}{E[x_1, \ldots, x_{n-l}, L_E(x_1, \ldots, x_{n-l})]}. \tag{2}$$
We do not consider the case $E = 0$: a classifier with zero error on the training set cannot be called weak. If such a classifier is present in the weak classifier family, it alone can be returned as the result of the whole boosting procedure.
3.3 Genetic representation
Any approach that allows us to encode a set of free parameters is appropriate for representing a population member. In this work we selected the binary string representation, which has been confirmed to be effective in function optimization problems. Some alternative representations can be found, for example, in [Goldberg 1989].
To form the binary string representation of a classifier, each classifier parameter is first represented as a binary string of fixed length using a fixed-precision encoding. All the parameters are then simply concatenated to form the final binary string of fixed length.
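The encoding step can be sketched as follows. The paper only states that a fixed-precision encoding is used; the uniform quantization of every parameter over a common range `[lo, hi]`, the most-significant-bit-first order and the names `encode` and `decode` are assumptions made for this illustration.

```python
def encode(params, bits, lo, hi):
    """Concatenate the fixed-length binary codes of all parameters into one bit list."""
    levels = (1 << bits) - 1
    chromosome = []
    for p in params:
        # Quantize p to an integer level in [0, levels], clamping out-of-range values.
        q = min(max(round((p - lo) / (hi - lo) * levels), 0), levels)
        chromosome.extend((q >> k) & 1 for k in reversed(range(bits)))
    return chromosome


def decode(chromosome, bits, lo, hi):
    """Invert encode(): split the bit list into fixed-length groups and rescale."""
    levels = (1 << bits) - 1
    params = []
    for start in range(0, len(chromosome), bits):
        q = 0
        for b in chromosome[start:start + bits]:
            q = (q << 1) | b
        params.append(lo + q / levels * (hi - lo))
    return params


chromosome = encode([0.0, 1.0], bits=4, lo=0.0, hi=1.0)
restored = decode(chromosome, bits=4, lo=0.0, hi=1.0)
```

With this scheme the quantization error per parameter is at most half of one level, that is, `(hi - lo) / (2 * levels)`.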
Sometimes a point $p \in \mathbb{R}^{n}$ has no corresponding classifier. For various families of image region classifiers this happens, for example, when one of the free parameters representing the top-left corner of the classifier window is negative. In this case the fitness function value for the population member representing that point can be forced to zero. This is how such situations were handled in the experiments described in Section 4. Another possible approach is to select the representation and the genetic operators in a way that simply does not allow such points to appear, but that approach is less general.
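A minimal sketch of the fitness evaluation, combining Eq. (2) with the zero-fitness rule for points that have no corresponding classifier. The callables `learn_linked`, `loss` and `is_valid` are hypothetical stand-ins for $L_E$, $E$ and the validity check; returning infinity for $E = 0$ is an assumption, since the paper excludes that case.

```python
import math


def fitness(free_params, learn_linked, loss, is_valid):
    """F = 1 / E[x_free, L_E(x_free)] as in Eq. (2); 0 for points without a classifier."""
    if not is_valid(free_params):
        return 0.0
    linked = learn_linked(free_params)
    err = loss(tuple(free_params) + tuple(linked))
    # Assumption: E = 0 is not considered in the paper; rank it above everything.
    return math.inf if err == 0 else 1.0 / err


# Toy stand-ins (hypothetical): one free parameter, one linked parameter.
is_valid = lambda p: p[0] >= 0
learn_linked = lambda p: (2 * p[0],)
loss = lambda v: 1 + abs(v[0] - 3) + abs(v[1] - 6)

f_best = fitness((3,), learn_linked, loss, is_valid)      # loss is 1, so fitness is 1.0
f_invalid = fitness((-1,), learn_linked, loss, is_valid)  # invalid point, fitness 0.0
```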
3.4 Genetic operators
In this work we used the two most common genetic operators: mutation and crossover. For the binary string representation they are usually defined as follows:
• The crossover operator selects a random position in the binary string and swaps all the bits to the right of that position between two chromosomes. This implementation is called 1-point crossover.
• The mutation operator flips the value of a randomly chosen chromosome bit.
In our case, the crossover operator produces two new solutions from the two given chromosomes as follows: the parameters to the left of the selected position are taken from the first classifier, and the parameters to the right are taken from the second. The parameter whose bits straddle the selected position, if there is one, combines bits from both the first and the second classifier. The mutation operator simply produces a new solution by changing the value of a random classifier parameter.
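The two operators described above can be sketched on bit lists as follows. The function names are hypothetical, and restricting the cut position to lie strictly inside the string (so that both children differ from their parents' prefixes) is an assumption.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap all bits to the right of a random cut position between two chromosomes."""
    cut = rng.randrange(1, len(a))  # assumption: the cut is strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chromosome, rng):
    """Return a copy of the chromosome with one randomly chosen bit flipped."""
    child = list(chromosome)
    child[rng.randrange(len(child))] ^= 1
    return child


rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
child_1, child_2 = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng)
```

Both functions return new lists and leave their arguments unchanged, which keeps the parents available for elitist selection.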
3.5 Algorithm summary
Algorithm 1 Genetic weak learner
  Generate an initial population of $N$ random binary strings
  for $i = 1, \ldots, K_{max}$ do
    Add $\lceil N R_c \rceil$ members to the population by applying the crossover operator to pairs of the best population members
    Apply the mutation operator to $\lceil N R_m \rceil$ random population members
    Calculate the value of (2) for each population member
    Remove all population members except the $N$ best (those with the highest value of (2))
  end for
  return the weak classifier associated with the point represented by the best population member
Algorithm 1 uses elitism as its population member selection approach. It has 4 parameters:
• $N > 0$: population size.
• $K_{max} > 0$: number of generations.
• $R_c \in (0, 1]$: crossover rate.
• $R_m \in (0, 1]$: mutation rate.
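The following Python sketch is one possible reading of Algorithm 1 over bit-list chromosomes, not the author's implementation. Two interpretation choices are assumptions: the "pairs of the best population members" are taken as adjacent pairs in the fitness ranking, and mutants are added as new members rather than overwriting their parents, so the current best is never lost. The toy fitness (one plus the number of set bits) is purely illustrative.

```python
import math
import random


def genetic_weak_learner(fitness, n_bits, N, K_max, R_c, R_m, rng):
    """Sketch of Algorithm 1; returns the best chromosome found."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(N)]
    n_children = math.ceil(N * R_c)
    n_mutants = math.ceil(N * R_m)
    for _ in range(K_max):
        ranked = sorted(pop, key=fitness, reverse=True)
        # Crossover step: 1-point crossover of adjacent pairs of the best members.
        children, pair = [], 0
        while len(children) < n_children:
            a, b = ranked[pair % N], ranked[(pair + 1) % N]
            cut = rng.randrange(1, n_bits)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            pair += 2
        children = children[:n_children]
        # Mutation step: one-bit mutants of random members (appended: an assumption).
        mutants = []
        for parent in rng.sample(pop + children, n_mutants):
            mutant = list(parent)
            mutant[rng.randrange(n_bits)] ^= 1
            mutants.append(mutant)
        # Selection step: score everyone and keep only the N fittest (elitism).
        pop = sorted(pop + children + mutants, key=fitness, reverse=True)[:N]
    return max(pop, key=fitness)


# Toy fitness (hypothetical): strictly positive, maximal for the all-ones string.
best = genetic_weak_learner(lambda c: 1 + sum(c), n_bits=16, N=20,
                            K_max=40, R_c=0.5, R_m=0.2, rng=random.Random(0))
```

In a real weak learner the fitness would decode the chromosome into free parameters and evaluate Eq. (2); caching fitness values would avoid the repeated evaluations this sketch performs while sorting.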
3.6 Discussion
The advantage of the proposed method is that the computational complexity of the weak learner does not depend on the size of the weak classifier family. One can balance training time against classifier performance simply by changing the values of $N$, $K_{max}$ and $S$ (the number of runs, introduced in Section 4.2). A similar effect can be achieved by shrinking the weak classifier family itself, but in most cases prior knowledge about weak classifier performance in boosting is simply not available.
One of the main disadvantages of the proposed weak learner is that many potentially interesting weak classifiers cannot be represented as a parameter vector of constant length. For example, decision trees, which are widely used in boosting, can have a variable number of nodes. In addition, the misclassification loss we want to optimize should be reasonably stable as a function of the classifier's free parameters. If small perturbations of the free parameter vector lead to unpredictable changes in the loss function value, genetic optimization does not make much sense and degenerates into random search. Unfortunately, this situation occurs quite often, especially if the number of classifier parameters is small. A common example is when one of the free parameters represents a feature index and features with nearby indices are not correlated at all.
4 Experiments
4.1 Algorithms for experiments
Two boosting-based algorithms were implemented to compare the proposed genetic weak learner with the original learners proposed by the authors of those algorithms. Viola-Jones [Viola and Jones 2001] and face alignment via boosted ranking model [Wu et al. 2008] were selected for this purpose because both algorithms use parametric weak classifiers applied to image regions. The algorithms are based on distinct boosting procedures (AdaBoost and GentleBoost), so the loss, sample weight and classifier weight functions they use differ considerably. The selected algorithms also differ in the problem they solve: two-class classification in [Viola and Jones 2001] and ranking in [Wu et al. 2008]. The training time of a naive implementation is quite long for both algorithms, so acceleration of the boosting process is necessary.
The weak classifiers used in both algorithms are based on Haar features and share a common set of adjustable parameters, so a weak classifier in both problems can be represented as $w_i = (x_i, y_i, \mathit{width}_i, \mathit{height}_i, \mathit{type}_i, g_i, t_i)$. Here $x_i$, $y_i$, $\mathit{width}_i$ and $\mathit{height}_i$ describe the image region, $\mathit{type}_i$ encodes the Haar feature type, $g_i$ is the Haar feature sign and $t_i$ is the weak classifier threshold. Parameters $g_i$ and $t_i$ are linked because both algorithms have an efficient algorithm for learning them. Parameter $\mathit{type}_i$ was also treated as linked: changing the feature type during genetic optimization does not make much sense because it can change the fitness function value significantly after just one mutation or crossover. Instead, a separate run of the algorithm was performed for each feature type, and the best result over all runs was selected. We used the same 5 Haar feature types as in [Wu et al. 2008] for training both classifiers.
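The paper does not spell out how the linked parameters $g_i$ and $t_i$ are learned. As a hedged illustration of what such an $L_E$ might look like for a weighted misclassification loss, the sketch below uses a common sorted-scan technique: once the feature responses are sorted, every candidate threshold and both signs are evaluated in a single pass. The function name, the decision rule "predict $g$ if the response exceeds $t$, otherwise $-g$" and the toy data are all assumptions.

```python
def learn_threshold_and_sign(responses, labels, weights):
    """Return (g, t, error) minimizing the weighted error of g * (+1 if f > t else -1)."""
    order = sorted(range(len(responses)), key=lambda j: responses[j])
    total = sum(weights)
    # Threshold below every response: with g = +1 all samples are predicted +1.
    err_plus = sum(w for lab, w in zip(labels, weights) if lab == -1)
    best = (1, float("-inf"), err_plus)
    if total - err_plus < best[2]:
        best = (-1, float("-inf"), total - err_plus)
    for pos, j in enumerate(order):
        # Moving the threshold past sample j flips its prediction to -1 (for g = +1).
        err_plus += weights[j] if labels[j] == 1 else -weights[j]
        if pos + 1 < len(order) and responses[order[pos + 1]] == responses[j]:
            continue  # no threshold can separate tied responses
        for g, err in ((1, err_plus), (-1, total - err_plus)):
            if err < best[2]:
                best = (g, responses[j], err)
    return best


# Toy feature responses for four samples (hypothetical values).
responses = [0.1, 0.4, 0.35, 0.8]
labels = [-1, 1, -1, 1]
weights = [0.25] * 4
g, t, err = learn_threshold_and_sign(responses, labels, weights)
```

The cost is dominated by the sort, so it is much cheaper than treating the threshold as another free parameter of the genetic search.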
4.2 Run patterns
This work also compares two different run patterns of the genetic algorithm. The first pattern is to run genetic optimization once with a large population. The second is to run the optimization algorithm multiple times (the number of runs is denoted by $S$) with a small population and then select the best classifier found. When the population is small, the final solution depends heavily on the initial population, so different runs can give considerably different results. Although this run pattern produces worse classifiers, it can be implemented very efficiently on multiprocessor and multicore architectures: each processing unit can run its own genetic simulation, which makes perfect parallel acceleration possible.
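The second run pattern can be sketched as a best-of-$S$ wrapper around a single run. The name `best_of_runs`, the convention that a run returns a `(fitness, solution)` pair and the use of the run index as a random seed are assumptions for illustration; the toy `run_once` stands in for a full genetic optimization.

```python
def best_of_runs(run_once, S, map_fn=map):
    """Run `run_once(seed)` for seeds 0..S-1 and keep the result with the highest fitness.

    `run_once` must return a (fitness, solution) pair. `map_fn` is any map-like
    callable, so the independent runs can be dispatched to separate workers.
    """
    return max(map_fn(run_once, range(S)), key=lambda result: result[0])


def run_once(seed):
    """Hypothetical stand-in for one small-population genetic run."""
    return 1.0 / (1 + abs(seed - 3)), seed


best_fitness, best_solution = best_of_runs(run_once, S=6)
```

Because the runs are independent, `map_fn` could be replaced by the `map` method of a `concurrent.futures.ProcessPoolExecutor`, provided `run_once` is defined at module level so that it can be pickled.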
4.3 Training and test sets
As in [Treptow and Zell 2004], the human face database of [Carbonetto 2002] was used to train and test the classifier for the Viola-Jones algorithm. The database was divided in half to form the training and test sets. Each sample is 24 × 24 pixels.
Face images with landmarks from the FG-NET aging database were used to form the database for learning the face alignment ranker proposed in [Wu et al. 2008]. 600 face images were selected from the database and resized to 40 × 40 pixels; 400 images were used to produce the training set and the other 200 the test set. 10 sequential 6-step random landmark position perturbations were then applied to the selected face images to produce images of misaligned faces, as described in the original paper. Training and test samples were then formed from pairs of images with increasing alignment quality.
Table 1: Viola-Jones, acceleration
Run pattern              Time (sec)   Acceleration
S      N      Kmax
1      50     10         2.82         329.38
1      100    20         9.40         98.77
1      400    40         100.29       9.26
10     10     20         4.00         231.94
20     20     40         28.74        32.31
Brute force              928.52       1.00

Table 2: Viola-Jones, error
Run pattern              Error
S      N      Kmax       Learning     Test
1      50     10         0.0005       0.0356
1      100    20         0.0002       0.0380
1      400    40         0.0000       0.0328
10     10     20         0.0003       0.0378
20     20     40         0.0000       0.0391
Brute force              0.0000       0.0349
4.4 Hardware
All experiments were performed on a PC equipped with a 2.33 GHz Intel Core 2 Quad processor and 2 GB of DDR2 RAM.
4.5 Results
Tables 1 and 3 show the average duration of 1 boosting iteration together with the acceleration relative to exhaustive search. Tables 2 and 4 show the error rates of the final classifiers on the training and test sets. We did not train any classifier using exhaustive search for the boosted ranking model because it would take about a year to finish the process on our training set.
Experiments with the Viola-Jones object detector showed that a classifier trained using the genetic weak learner performs only slightly worse than a classifier trained using exhaustive search over the classifier space. For $N = 400$ the final classifier even shows better test performance. The classifier trained with $S = 1$, $N = 50$ and $K_{max} = 10$ accelerates boosting roughly 330 times compared to exhaustive search (Table 1) while still performing well on the test set. Classifiers trained with small $N$ and large $S$ values (using the second run pattern) generally perform worse than the others. However, as mentioned before, such classifiers can be trained very efficiently on multiprocessor or multicore systems.
Experiments with face alignment via boosted ranking model (BRM) showed how classifier performance depends on the values of $S$, $N$ and $K_{max}$. Increasing the value of each parameter increases training time but also improves classifier performance. Nevertheless, the difference in training time is much more significant than the difference in prediction error. The classifier with $S = 1$, $N = 25$ and $K_{max} = 10$ was trained about 50 times faster than the best classifier obtained for BRM, but its error is only 1.2 times worse. This makes such a classifier a perfect candidate for the preliminary experiments that usually take place before training of the final classifier starts.
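The two ratios quoted above follow directly from the first and last rows of Tables 3 and 4; the short check below reproduces the arithmetic. Only the variable names are introduced here, and the numbers are copied from the tables.

```python
# (S, N, Kmax): (seconds per boosting iteration from Table 3, test error from Table 4)
brm = {
    (1, 25, 10): (68.15, 0.0317),
    (4, 100, 20): (3582.37, 0.0259),
}

fast_time, fast_error = brm[(1, 25, 10)]
best_time, best_error = brm[(4, 100, 20)]

speedup = best_time / fast_time       # how much faster the small configuration trains
error_ratio = fast_error / best_error  # how much larger its test error is

print(f"speedup: {speedup:.1f}x, test error ratio: {error_ratio:.2f}")
```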
Table 3: Face alignment via BRM, acceleration
Run pattern              Time (sec)   Acceleration
S      N      Kmax
1      25     10         68.15        5195.88
1      50     10         173.33       2043.09
2      75     15         909.55       389.34
4      100    20         3582.37      98.85

Table 4: Face alignment via BRM, error
Run pattern              Error
S      N      Kmax       Learning     Test
1      25     10         0.0278       0.0317
1      50     10         0.0246       0.0297
2      75     15         0.0199       0.0268
4      100    20         0.0173       0.0259

5 Conclusion
An approach to accelerating the boosting procedure was proposed in this work. The approach is based on a special genetic weak learner that learns a weak classifier on each boosting iteration. The genetic weak learner uses a genetic algorithm with binary chromosomes. This genetic algorithm is designed to solve the optimization problem of selecting the weak classifier with the smallest weighted loss from a parametric classifier family. The proposed method was generalized to the case in which an efficient algorithm exists for learning some of the parameters of a weak classifier. Experiments have shown that this approach dramatically accelerates the training process for practical tasks while keeping the prediction error small.
The genetic weak learner proposed in this work cannot be used to boost tree-based classifiers. This limits its use in many scenarios because stump weak classifiers cannot represent relationships between different object features. In future work we therefore plan to generalize our approach to accelerate tree-based boosting.
Another direction for future research is to perform additional experiments with classifiers that are not related to Haar features in any way. This would confirm the benefit of the proposed algorithm in computer vision problems that are not biased towards Haar features. More generally, it would be useful to identify the parametric classifier families that can be boosted efficiently using the proposed weak learner.
References
ABRAMSON, Y., MOUTARDE, F., STANCIULESCU, B., AND STEUX, B. 2006. Combining AdaBoost with a hill-climbing evolutionary feature search for efficient training of performant visual object detectors. In FLINS06.
CARBONETTO, P. 2002. Viola-Jones training data. http://www.cs.ubc.ca/~pcarbo/viola-traindata.tar.gz.
GOLDBERG, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Professional, January.
MASADA, K., CHEN, Q., WU, H., AND WADA, T. 2008. GA based feature generation for training cascade object detector. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, 1–4.
RAMIREZ, G. A. 2007. Face and street detection with asymmetric Haar features.
TREPTOW, A., AND ZELL, A. 2004. Combining AdaBoost learning and evolutionary search to select features for real-time object detection.
VIOLA, P., AND JONES, M. 2001. Robust real-time object detection. In International Journal of Computer Vision.
WU, H., LIU, X., AND DORETTO, G. 2008. Face alignment via boosted ranking model. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 1–8.
