Genetic Programming: Theory and Practice

  • Worzel B
  • Riolo R
N/ACitations
Citations of this article
44Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Genetic programming is a machine learning technique to automatically create computer programs from high-level specifications of a problem. It achieves this goal by genetically breeding a population of computer programs using the principles of Darwinian natural selection and biologically inspired operations. Selection is obtained by attaching to each program a fitness value, which quantifies how well it solves the problem. In spite of the numerous practical successes of genetic programming and the good quality of theory that has been developed until nowadays, some unsolved problems persist. In synthesis: methodologies to predict, or to measure, the difficulty of problems have not been developed yet (i.e. no technique to measure the capability of genetic programming to find good solutions for a given problem exists) and genetic programming is, in general, a slow and resource consuming process. Proposing solutions to these problems is the guiding thread of this thesis. Two measures of problem difficulty are presented:fitness distance correlation (based on the idea that what makes a problem difficult or easy for genetic programming is the relationship between fitness and distance to the goal) and negative slope coefficient (which quantifies some aspects of evolvability, i.e. the ability of genetic operators to produce offsprings that are fitter than their parents). These measures are based on statistical sampling of the search space. Both of them succeed in correctly measuring the difficulty of a wide range of different problems. Advantages and drawbacks of these measures are discussed in depth. Furthermore, a discussion of the concept of fitness landscape, on which these two measures are inspired, is proposed. Among the main causes of inefficiency of genetic programming, one may mention: premature convergence (or the tendency to produce populations in which all the individuals have similar characteristics), bloat (or progressive individuals' code growth) and the fact that fitness evaluation often requires the execution of programs on many different input data (known as fitness cases). This thesis shows how distributing individuals into separate communicating subpopulations naturally counteracts premature convergence and bloat, allowing genetic programming to find solutions of better quality, more quickly. The advantage of parallelising genetic programming is twofold: on the one hand it enables to achieve time savings by distributing the computational effort on a set of calculating agents, on the other hand, the parallel setting offers benefits from the algorithmic point of view, in analogy with the natural parallel evolution of spatially distributed populations. Furthermore, techniques to dynamically tune the size of populations, and to limit the number of fitness cases to be tested, in order to save computational effort, are proposed.

Cite

CITATION STYLE

APA

Worzel, B., & Riolo, R. (2003). Genetic Programming: Theory and Practice. In Genetic Programming Theory and Practice (pp. 1–10). Springer US. https://doi.org/10.1007/978-1-4419-8983-3_1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free