Towards Life-Long Meta Learning
Portfolio The Magazine Of The Fine Arts (2005)
Available from iitrl.acadiau.ca
Matteo Gagliolo † Jürgen Schmidhuber †‡
†IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland
‡TU Munich, Boltzmannstr. 3, 85748 Garching, München, Germany
{matteo,juergen}@idsia.ch
Abstract
We reformulate algorithm selection as a time allocation problem: all candidate algorithms are run in parallel, and their relative priorities are continually updated based on their current time to solution, estimated according to a parametric model that is trained and used while solving a sequence of problems.
1 Motivation
Meta-Learning techniques typically require a long training phase, during which a large number of problems is repeatedly solved with each of the available algorithms, in order to learn a mapping from (problem, algorithm) pairs to expected performance, to be used for algorithm selection. This approach poses a number of problems. It presumes that such a mapping can be learned at all, i.e. that the actual performance of an algorithm on a given problem can be predicted with enough precision before even starting it. It also assumes that the problem instances met during the training phase are statistically representative of subsequent ones. For these reasons, there is usually no way to detect a relevant discrepancy between the expected and the actual performance of the chosen algorithm. It also neglects computational complexity issues: the ranking between algorithms is often based solely on the expected quality of the performance, and the time spent during the training phase is not even considered.
The Algorithm Portfolio paradigm [1] consists of selecting a subset of the available algorithms, to be run in parallel with the same priority until the fastest one solves the problem. This simple scheme is more robust, as it is less likely that the performance estimates will be wrong for all selected algorithms, but it requires the same expensive training procedure, and it also involves additional overhead, due to the “brute force” parallel execution of all candidate solvers.
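To put a number on the overhead of equal priorities, the following minimal Python sketch simulates an equal-share portfolio. It assumes, hypothetically, that each solver $i$ would need a known solo time $T_i$, and that machine time is handed out round-robin in slices of length $\Delta t$ split evenly among the $k$ solvers; the function name and the example times are illustrative only. The run stops as soon as the first solver finishes, so for small $\Delta t$ the cost approaches $k \cdot \min_i T_i$, about $k$ times what an oracle that picked the fastest solver would spend.

```python
from fractions import Fraction


def equal_share_portfolio_time(solo_times, dt=1):
    """Machine time used by an equal-priority portfolio until the first solver finishes.

    solo_times[i] is the (hypothetical) time solver i would need if run alone.
    Every slice of length dt is split evenly (p_i = 1/k) and handed out in a
    fixed round-robin order; exact fractions avoid rounding in the comparison.
    """
    k = len(solo_times)
    share = Fraction(dt) / k
    spent = [Fraction(0)] * k     # time already given to each solver
    total = Fraction(0)           # machine time consumed so far
    while True:
        for i, solo in enumerate(solo_times):
            need = Fraction(solo) - spent[i]
            if need <= share:     # solver i finishes inside its share
                return total + need
            spent[i] += share
            total += share


if __name__ == "__main__":
    times = [10, 40, 25]
    cost = equal_share_portfolio_time(times)
    print(cost, float(cost), len(times) * min(times))
```

With solo times 10, 40 and 25 and $\Delta t = 1$, the function returns 88/3 (about 29.3), close to $3 \cdot 10 = 30$; the small difference arises because the fastest solver is served first in each round, so the others miss their share in the final round.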
In our view, a crucial weakness of these approaches is that they do not exploit any feedback from the actual execution of the algorithms. We take a step in this direction by introducing Dynamic Algorithm Portfolios. Instead of first choosing a portfolio and then running it, we iteratively allocate a time slice, share it among all the available algorithms, and update the relative priorities of the algorithms based on their current state, so as to favor the most promising ones. Instead of basing the priority attribution on performance quality, we fix a target performance and try to minimize the time needed to reach it. To this end, we search for a mapping from (problem, algorithm, current algorithm state) triples to the expected time needed to reach the desired performance quality. To further reduce computational complexity, we focus on lifelong-learning techniques that drop the artificial boundary between training and
usage, exploiting the mapping during training, and including training time in performance
evaluation. In [2, 3] we termed this approach Adaptive Online Time Allocation (AOTA).
2 Previous work
A number of interesting “dynamic” exceptions to the otherwise static algorithm selection paradigm can be found in the literature (see the technical report version of [2] for a more exhaustive bibliography). In [4], algorithm recommendation is based on the performance of the candidate algorithms during a predefined amount of time, called the observational horizon. In anytime algorithm monitoring [5], the dynamic performance profile of a planning technique is updated according to its performance, in order to stop the planning phase when further improvements in the planned actions are no longer worth the time spent evaluating them. The “Parameterless GA” [6] is a fixed heuristic time allocation technique for Genetic Algorithms. In a Reinforcement Learning setting, algorithm selection can be formulated as a Markov Decision Process: in [7], the algorithm set includes sequences of recursive algorithms, formed dynamically at run time by solving a sequential decision problem, and a variation of Q-learning is used to find an online algorithm selection policy; in [8], a set of deterministic algorithms is considered and, under some limitations, static and dynamic algorithm selection techniques based on dynamic programming are presented.
3 AOTA framework
Consider a sequence $B$ of $m$ problem instances $b_1, b_2, \dots, b_m$, roughly sorted in increasing order of difficulty and featuring precise stopping criteria (e.g. search problems in which the solution is known to exist and can be recognized, or optimization problems in which a reachable target value for performance is given); and a set $A$ of $n$ algorithms $a_1, a_2, \dots, a_n$ that can be applied to the problems in $B$, paused and resumed at any time, and queried at negligible cost for state information $d \in \mathcal{D}$ related to their progress in solving the current instance. We aim to minimize the time needed to solve the whole problem sequence $B$.
To describe the state of a Dynamic Algorithm Portfolio (DAP), let $t_i$ be the time already spent on $a_i$; $\tau_i$ the current estimate of the time still needed by $a_i$ to solve the current problem; $x_i$ a feature vector, possibly including information about the current problem instance, the algorithm $a_i$ itself (e.g. its kind, the values of its parameters), and its current state $d_i$; $H_i = \{(x_i^{(r)}, t_i^{(r)}),\ r = 0, \dots, h_i\}$ the set of collected samples of these pairs; and $f_\tau$ a model that maps histories $H_i$ to estimates $\tau_i$. If the model $f_\tau$ were precise enough, we would not need to run more than one algorithm: the $a_i$ mapped to the lowest $\tau$ before its start ($t_i = 0$). It is more realistic to assume that the model's estimates are rough, but can be improved by collecting more data in $H_i$, i.e. by getting more run-time feedback on the actual performance of $a_i$ on the current problem instance.
We then introduce a set of nonnegative scalars $P_A = \{p_1, \dots, p_n\}$, with $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$, that represent the current bias of the portfolio. Machine time is sliced into small intervals $\Delta t$, and each time slice is shared among the algorithms in proportion to the current bias. Before each iteration, the bias is updated according to a function $f_P$ of $\{\tau_i\}$ that gives more time to the algorithms expected to be faster (i.e. those with a low $\tau_i$); after a share $p_i \Delta t$ has expired, $\tau_i$ is updated based on the current $H_i$ (Fig. 1). In intra-problem AOTA, the predictive model $f_\tau$ is fixed; in inter-problem AOTA, $f_\tau$ itself is adaptive and is updated after each problem is solved.
For $f_P$, one reasonable heuristic, which gave good results, assigns 1/2 of the current time slice to the algorithm expected to be fastest (i.e. the one with the lowest $\tau_i$), 1/4 to the second fastest, and so on. This heuristic cannot be applied directly to inter-problem AOTA, though, as the model would be unreliable during the first problems of the sequence. In this case it is better to start the problem sequence with a “brute force” $f_P$ ($p_i = 1/n$) and vary it gradually towards the “ranking” $f_P$ described above.
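The two biases and a gradual switch between them can be sketched in Python as follows. The function names, the renormalization of the halving weights (a finite series 1/2 + 1/4 + ... stops short of 1), the linear blend and the schedule for the blending weight alpha are all assumptions; the text fixes only the halving pattern, the uniform start $p_i = 1/n$, and that the transition is gradual.

```python
def ranking_fp(tau):
    """'Ranking' f_P: 1/2 of the slice to the lowest tau, 1/4 to the next, ...

    Assumption: the weights 2**-(rank+1) are renormalized to sum to 1, since
    1/2 + 1/4 + ... + 1/2**n falls short of 1 for a finite n.
    """
    order = sorted(range(len(tau)), key=lambda i: tau[i])  # fastest first
    weights = [0.0] * len(tau)
    for rank, i in enumerate(order):
        weights[i] = 0.5 ** (rank + 1)
    total = sum(weights)
    return [w / total for w in weights]


def brute_force_fp(tau):
    """'Brute force' f_P: p_i = 1/n regardless of the estimates."""
    return [1.0 / len(tau)] * len(tau)


def blended_fp(tau, alpha):
    """Move gradually from brute force (alpha = 0) to ranking (alpha = 1).

    The linear blend is an assumption; a convex combination of two
    probability vectors still sums to 1.
    """
    flat, ranked = brute_force_fp(tau), ranking_fp(tau)
    return [(1.0 - alpha) * f + alpha * r for f, r in zip(flat, ranked)]


if __name__ == "__main__":
    tau = [30.0, 10.0, 20.0]
    num_problems = 5
    for k in range(num_problems):
        alpha = k / (num_problems - 1)   # hypothetical schedule: 0 -> 1
        print(k, [round(p, 3) for p in blended_fp(tau, alpha)])
```

For $\tau = (30, 10, 20)$, ranking_fp assigns 4/7 of the slice to the second algorithm, 2/7 to the third and 1/7 to the first; blended_fp with alpha = 0 reproduces the brute-force bias.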
Figure 1: Pseudocode for inter-problem AOTA
For each problem b_k
    initialize {τ_i}, and set t_i = 0, H_i = ∅ for each a_i
    While (b_k not solved)
        update P_A = f_P({τ_i})
        For each algorithm a_i
            run a_i for p_i ∆t
            update t_i = t_i + p_i ∆t
            update H_i = H_i ∪ {(x_i, t_i)}
            update τ_i = f_τ(H_i)
        End
    End
    update f_τ based on {H_i}
End
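The loop of Figure 1 can be turned into a small runnable Python sketch. Everything concrete here is hypothetical: ToySolver stands in for a pausable algorithm whose scalar state x grows at a fixed hidden rate until it reaches 1.0, a problem is represented only by a difficulty factor, and LinearExtrapolation is a stand-in $f_\tau$ in the spirit of the fixed heuristic of [2] described in Section 4 (extrapolating the recent learning curve to the target). Its fit method does nothing, so this run is effectively intra-problem AOTA; an adaptive model would be retrained at that point. The bias is the “ranking” $f_P$, with the halving weights renormalized to sum to 1 (an assumption).

```python
import math


class ToySolver:
    """Hypothetical pausable solver: its state x grows at a fixed rate until x = 1."""

    def __init__(self, rate):
        self.rate = rate
        self.x = 0.0

    def reset(self):
        self.x = 0.0

    def run(self, difficulty, duration):
        # Advance by `duration` time units on a problem of the given difficulty.
        self.x = min(1.0, self.x + self.rate / difficulty * duration)
        return self.x  # state feature x_i reported after the run

    def solved(self):
        return self.x >= 1.0


class LinearExtrapolation:
    """Stand-in f_tau: extrapolate the last `window` points of H_i to x = 1."""

    def __init__(self, window=3, prior=1.0):
        self.window = window
        self.prior = prior  # estimate used before two samples exist

    def predict(self, history):
        pts = history[-self.window:]
        if len(pts) < 2:
            return self.prior
        (x0, t0), (x1, t1) = pts[0], pts[-1]
        rate = (x1 - x0) / (t1 - t0)
        return math.inf if rate <= 0 else (1.0 - x1) / rate

    def fit(self, histories):
        pass  # fixed model; an adaptive f_tau would be retrained here


def ranking_fp(tau):
    """'Ranking' f_P: shares 1/2, 1/4, ... by increasing tau, renormalized."""
    order = sorted(range(len(tau)), key=lambda i: tau[i])
    w = [0.0] * len(tau)
    for rank, i in enumerate(order):
        w[i] = 0.5 ** (rank + 1)
    s = sum(w)
    return [v / s for v in w]


def aota(difficulties, solvers, f_p, f_tau, dt=1.0):
    """Figure 1 loop; returns the machine time spent on each problem."""
    costs = []
    for d in difficulties:                       # for each problem b_k
        n = len(solvers)
        for s in solvers:
            s.reset()
        t = [0.0] * n                            # t_i: time spent on a_i
        H = [[] for _ in range(n)]               # H_i: (x_i, t_i) samples
        tau = [f_tau.predict(h) for h in H]      # initialize {tau_i}
        spent = 0.0
        while not any(s.solved() for s in solvers):
            p = f_p(tau)                         # P_A = f_P({tau_i})
            for i, s in enumerate(solvers):
                share = p[i] * dt
                x = s.run(d, share)              # run a_i for p_i * dt
                t[i] += share
                spent += share                   # the full share is charged
                H[i].append((x, t[i]))           # H_i = H_i U {(x_i, t_i)}
                tau[i] = f_tau.predict(H[i])     # tau_i = f_tau(H_i)
                if s.solved():
                    break                        # b_k solved: stop this slice
        f_tau.fit(H)                             # update f_tau based on {H_i}
        costs.append(spent)
    return costs


if __name__ == "__main__":
    solvers = [ToySolver(0.05), ToySolver(0.2), ToySolver(0.1)]
    print(aota([1.0, 2.0], solvers, ranking_fp, LinearExtrapolation()))
```

With these three toy solvers, the portfolio spends about 9.7 time units on the first problem, against 5 for the fastest solver alone and roughly 15 for an equal-share portfolio: after two slices the estimates identify the fastest solver, which then receives 4/7 of every slice.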
4 Example AOTAs and experiments
In [2] we presented a fixed heuristic $f_\tau$. We considered algorithms with a scalar state $x$ that had to reach a target value: in this case $H_i$ is simply a learning curve. Through a shifting-window linear regression, we extrapolated for each $i$ the time $t_{i,\mathrm{sol}}$ at which the current learning curve $H_i$ would reach the target value, in order to estimate the time to solution $\tau_i = t_{i,\mathrm{sol}} - t_i$. Even though the estimates were optimistic, they were updated so often that the overall performance of the intra-problem AOTA was remarkably good; its obvious limitations were that it required some prior knowledge about the algorithms, and a simple relationship between the learning curve and the time to solution.
What if we instead want to learn a potentially complex mapping $f_\tau$ from scratch? For a successful algorithm $a_i$ that solved the problem at time $t_i^{(h_i)}$, we can evaluate a posteriori the correct residual time $\tau_i^{(r)} = t_i^{(h_i)} - t_i^{(r)}$ for each pair $(x_i^{(r)}, t_i^{(r)})$ in $H_i$. In a first tentative experiment, which led to poor results, these values were used as targets to learn a regression from pairs $(x, t)$ to residual time values $\tau$. The main problem with this approach is which $\tau$ values to choose as targets for the unsuccessful algorithms. The alternative we presented in [3] is inspired by censored sampling for lifetime distribution estimation, and consists of learning a parametric model $g(\tau \mid x; w)$ of the conditional probability density function (pdf) of the residual time $\tau$. One advantage of this approach is that it fully exploits the gathered state history information, as it also allows learning from the unsuccessful algorithms. The model was obtained by training a neural network to map $x$ values to the two parameters of an Extreme Value distribution of the time to solution, on data collected while solving a sequence $B$ of 21 deceptive problems with a set $A$ of 76 different Genetic Algorithms.
In Fig. 2 we compare the NN model (NN-AOTA) from [3] with a simpler one, a quadratic expansion of $x$ of the form $w_0 + \sum_i w_i x_i + \sum_{i,j} w_{i,j} x_i x_j$ (L2-AOTA). The average ratio between the time spent by the whole portfolio and the time spent by the best element in the set (usually different at each run and on each task) was about 11 for the fixed-$f_\tau$ AOTA and 8 for the adaptive-$f_\tau$ AOTA. The latter would be, e.g., the performance of an already trained “static” Algorithm Portfolio that picked, for each problem, 8 of the 76 algorithms, always including the fastest; to compare fairly with such a technique, though, we should also consider its additional training time.
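The censored-sampling idea can be sketched in Python. This is an illustration under explicit assumptions, not the model of [3]: we take a Gumbel (type I extreme value) law for the residual time, with location mu(x) and scale beta(x), and replace the neural network by arbitrary callables; all function names are hypothetical. Samples from the algorithm that solved the problem contribute the log-density of their exact residual time $\tau_i^{(r)} = t_i^{(h_i)} - t_i^{(r)}$; samples from unsuccessful algorithms only tell us that their residual time exceeds $t_i^{(h_i)} - t_i^{(r)}$, so they contribute the log of the survival function instead.

```python
import math


def censored_samples(history, t_stop, solved):
    """Turn H_i = [(x_r, t_r), ...] into (x, residual_time, is_exact) triples.

    t_stop is the time spent on a_i when the problem was solved (by a_i or by
    another algorithm). If a_i solved it, t_stop - t_r is the exact residual
    time; otherwise it is only a lower bound (a right-censored observation).
    """
    return [(x, t_stop - t, solved) for x, t in history]


def gumbel_censored_nll(samples, mu_of, beta_of):
    """Negative log-likelihood of a right-censored Gumbel (max) model.

    mu_of and beta_of map a feature vector x to location and scale; they are
    hypothetical stand-ins for the outputs of a trained model.
    """
    nll = 0.0
    for x, tau, exact in samples:
        mu, beta = mu_of(x), beta_of(x)
        z = (tau - mu) / beta
        e = math.exp(min(-z, 700.0))        # exp(-z), clamped against overflow
        if exact:
            # log g(tau) = -log(beta) - z - exp(-z)
            nll += math.log(beta) + z + e
        else:
            # log S(tau) = log(1 - exp(-exp(-z))), ~ -z when exp(-z) is tiny
            nll -= math.log(-math.expm1(-e)) if e > 1e-12 else -z
    return nll


def fit_constant_mu(samples, beta=1.0):
    """Crude grid search for a feature-independent mu (illustration only)."""
    grid = [0.1 * k for k in range(1, 201)]
    return min(grid, key=lambda m: gumbel_censored_nll(
        samples, lambda x: m, lambda x: beta))


if __name__ == "__main__":
    exact_only = [(None, 5.0, True)]
    with_censored = exact_only + [(None, 8.0, False)]
    print(fit_constant_mu(exact_only), fit_constant_mu(with_censored))
```

In the demo, adding one censored sample (an algorithm whose residual time is only known to exceed 8) moves the fitted location above the value 5.0 obtained from the exact sample alone: unsuccessful runs still carry information that a plain regression on exact targets cannot use.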
We advocate the use of Dynamic Algorithm Portfolios with sets of computationally expensive algorithms. For faster ones, a more refined approach should also take into account the cost of updating the model. The model $f_\tau$ was trained on all the historical data gathered so far, in a “batch learning” approach: for longer problem sequences, an online method would be preferable, in order to obtain a scalable life-long meta learning technique. In future work we plan to address these and other limitations; ongoing experiments focus on alternative parametric models and different algorithm set/problem sequence combinations.
[Figure 2 plot: cumulative time (fitness function evaluations, axis up to $5 \times 10^6$) versus task sequence, from 1 to 21; curves: BEST, L2-AOTA (adaptive $f_\tau$), NN-AOTA (adaptive $f_\tau$), AOTA_ra (fixed $f_\tau$), BRUTE.]
Figure 2: The cumulative time spent on the sequence of tasks by the adaptive-$f_\tau$ method, with a neural network (NN-AOTA) and a quadratic model (L2-AOTA), compared with the fixed $f_\tau$ from [2], this time with the ranking $f_P$ (AOTA_ra). Also shown are the performance of the fastest solver of the set (BEST; usually different at each run and on each task), which would be the performance of an ideal algorithm selection with “foresight” of the correct $\tau_i$ values at $t_i = 0$; and the estimated performance of a brute force approach (BRUTE), i.e. running all the algorithms in parallel until one solves the problem, which leaves the figure and completes the task sequence at time $3.3 \times 10^7$. Time is measured in fitness function evaluations; values shown are upper 95% confidence limits calculated on 20 runs.
References
[1] C. P. Gomes and B. Selman. Algorithm portfolios. Artificial Intelligence, 126(1–2):43–62, 2001.
[2] M. Gagliolo, V. Zhumatiy, and J. Schmidhuber. Adaptive online time allocation to search algorithms. In J. F. Boulicaut et al., editors, Machine Learning: ECML 2004, pages 134–143. Springer, 2004. Extended technical report: http://www.idsia.ch/idsiareport/IDSIA-23-04.ps.gz.
[3] M. Gagliolo and J. Schmidhuber. A neural network model for inter-problem adaptive online time allocation. In W. Duch et al., editors, ICANN 2005, Proceedings, Part 2, pages 7–12, 2005.
[4] E. Horvitz, Y. Ruan, C. P. Gomes, H. A. Kautz, B. Selman, and D. Maxwell Chickering. A Bayesian approach to tackling hard computational problems. In UAI ’01, pages 235–244, 2001.
[5] E. A. Hansen and S. Zilberstein. Monitoring and control of anytime algorithms: A dynamic programming approach. Artificial Intelligence, 126(1–2):139–157, 2001.
[6] G. R. Harick and F. G. Lobo. A parameter-less genetic algorithm. In W. Banzhaf et al., editors, GECCO, volume 2, 1999.
[7] M. G. Lagoudakis and M. L. Littman. Algorithm selection using reinforcement learning. In Proc. 17th ICML, pages 511–518, 2000.
[8] M. Petrik. Statistically optimal combination of algorithms. SOFSEM 2005.