Friday, September 14, 2007

1990 Natural solutions give their best

Natural solutions give their best: Suppose a sales rep has to visit several towns. What is the shortest, most economical route? This is just one optimisation problem that is benefiting from lessons learnt from nature

  • 14 April 1990
  • From New Scientist Print Edition. Subscribe and get 4 free issues.

THE SHAPE of a soap bubble, the amount of inventory to keep in a shop, and the most economical way to lay out a printed circuit board are all solutions to problems of optimisation that both nature and human engineers often face. In each case, you can think of the optimal solution as the least or greatest value of a mathematical function describing how some variable behaves. The soap bubble minimises its surface area, the correct amount of inventory minimises cost, and the best printed circuit board layout is the one that requires the least time to assemble.

During the past couple of decades, new techniques inspired by natural systems in physics and biology have emerged for solving problems like these. A key element in the success of these techniques aimed at optimisation is randomness.

Optimisation is the problem of finding the best solution to a particular problem. This is usually achieved by creating a what is called a 'cost' function, a term borrowed from economics. It measures how good any particular solution is. The problem of finding the best solution then becomes one of finding where this cost function is minimised.

If we think of the cost function as a landscape and altitude as a measure of cost, then the optimal location in a landscape is where the deepest valley has its bottom. This is called the global optimum, in order to distinguish it from local optima, which are points at the bottom of smaller valleys where the cost function is lower than at any immediately adjacent point (see Figure 1a). The landscape of possible solutions is called a search space; each possible solution is a single point in this space.

Optimisation problems are difficult because in most of them information is available only about the immediate area. To understand why this makes optimisation difficult, imagine that you have become lost hill-walking in the dark, and have run out of water. Your objective is to find your way downhill to a lake; the only information you have is the slope of the terrain immediately around you.

Clearly, an exhaustive search of the landscape around you would take far too long to find the lake. The next simplest strategy is to pick the direction that seems to lead downhill most quickly and walk that way. Follow the direction of steepest descent as it changes in order to lose altitude as quickly as possible. This method, known as gradient descent, is an example of an adaptive search, in which information gained from the previous search step guides the direction of the next search step (see Figure 1b).

Gradient descent works for simple problems, but fails when it encounters terrain like that shown in Figure 1c. Following the direction of the slope, or gradient, from any of the starting points on the right leads to the global optimum, but starting from the left leads only to one of the local optima.

Almost all real problems have local optima of this sort, which makes the simple gradient-descent approach inadequate for most interesting problems. One way to get around this difficulty is simply to pick many starting points and see where gradient descent leads for each. This, however, requires a great deal more computation.

Another technique, known as conjugate gradient descent, sometimes offers an improvement on simple gradient descent. At the starting point, we can pick the direction along which the gradient is steepest, and move in this direction by small steps until the terrain starts to bend uphill, even if the direction in which we move is not that of steepest descent at all points. Once the terrain starts to bend uphill, we stop, examine our surroundings, pick a new direction perpendicular to the old one and then move that way (see Figure 1d). This method works better than simple gradient descent because it allows us to accept small short-term penalties in exchange for potentially larger long-term rewards. It turns out that one reason that systems found in nature can optimise so well is that they can often move uphill in order to escape from local optima.

An example of a system that does this is a cooling metal. The perfect configuration for a crystal - the one with the least total energy - is the one in which all the atoms are aligned in a lattice. As the metal cools, atoms line up with their neigh bours, but different regions may come to point in different directions (see Figure 2). If we measure cost by counting the number of unaligned atoms, changing the orientation of one atom at a time does not move us toward a better solution. Whole regions must be reversed in order to move out of the local optimum created by having two unaligned crystals.

The energy to do this is available as the heat in the metal. At any temperature above absolute zero, atoms vibrate with an energy that depends on the temperature of the metal. This energy can cause atoms to reorientate themselves with a probability that increases with the temperature. When the energy in the metal is large, it is easy for atoms to reorientate themselves, even if this temporarily increases the total energy of the configuration. Realignments such as this are a way for the metal to jump out of a local optimum. As the temperature drops it becomes more difficult for the system to undergo large changes. When the temperature approaches absolute zero, flipping becomes impossible and the state of the atoms is frozen, almost always with an energy near that of the global optimum. The more slowly the metal is cooled, the more nearly perfect the crystal that it forms is likely to be, and the nearer its energy will be to that of the global optimum.

The word 'near' is important. No real metal will ever form perfect crystals. Instead, its state will be somewhere near perfect - in some local optimum whose value is very close to that of the global optimum. In practical optimisation problems, such nearly perfect solutions are usually acceptable, because engineers and scientists prefer finding a good, cheap solution in a hurry to taking a long time to find a perfect solution which is only slightly better.

Temperature-dependent behaviour like this is the basis for an optimisation technique called simulated annealing. A cost function usually called the 'energy' of the system describes how it behaves, by analogy with the energy of a cooling crystal. The system is randomly changed by some small amount. If the change decreases the value of the cost function, the change is automatically accepted. If the change increases the system's energy, on the other hand, it is accepted with a probability that depends on the current temperature of the system according to the rule (see Figure 3):

Prob(accept change) = e-E/kT

where E is the change in energy, T is the temperature, and k is a constant known as Boltzmann's constant, after the German physicist who developed much of modern thermodynamics in the 19th century.

The simulated temperature is initially set very high so that the system is free to wander about the search space relatively easily. Each time that you use the acceptance procedure described above, the temperature is lowered very slightly. By the time the temperature has dropped so low that the system is no longer capable of escaping from a deep valley, it will usually be near a very good solution to the problem.

To see how this works in practice, consider an optimisation problem known as the travelling sales rep problem, or TSP. Each of the points in Figure 4 represents a city that a sales rep must visit. In order to keep her costs low, she must visit each city once, using as short a path as possible. In practice, it is not necessary for the sales rep to use the shortest possible path; any path that is almost as short will do.

We can solve this problem using simulated annealing by choosing a way to represent solutions (paths), a function to be minimised (the total path length), and a way of moving from one possible solution to another, such as exchanging two randomly chosen links in the path. If such an exchange decreases the total path length, the exchange is accepted. If it increases the total path length, the change is accepted with a probability that depends on the current temperature. As the temperature is slowly decreased, we reach a point after which the system no longer accepts any changes. The con figuration at this point is the solution to the problem. Figures 4a to 4d show a computer performing simulated annealing on a small example of the TSP.

Working at IBM's research centre in Yorktown Heights, New York State, in the mid 1970s, Scott Kirkpatrick and others used simulated annealing to solve several problems that arise in manufacturing computers and computer components. One problem is that of partitioning circuits into small packages, which can then be put in separate chips. Another is that of deciding where to put these chips on printed circuit boards so that the paths between them are as short as possible, and cross one another as infrequently as possible. Finally, there is the problem of actually laying out the circuits within a particular chip. Because each chip may contain hundreds of thousands of interconnected units, it is impossible to obtain good solutions by hand.

Kirkpatrick explored each of these problems in turn using simulated annealing. To solve the chip placement problem, he and his group developed a cost function that counted the number of wires crossing the board. They then moved chips randomly between different board sites in an attempt to minimise this function. The results obtained this way were as good as, or better than, the results obtained using more conventional optimisation methods.

An alternative to simulated annealing that has been receiving a great deal of attention during the past 10 years takes its inspiration from the ways plants and animals evolve. Species of plants and animals do not really perform optimisation in the strict sense, because their goal (if they can be said to have one at all) is that of reproducing efficiently and robustly. But in the process of evolving, they do adaptively improve, and often develop solutions that are in some sense nearly optimal, such as the shape of albatross wings. Evolution in natural systems would, therefore, seem to be a good place to look for inspiration in developing optimisation techniques.

Evolution can happen whenever there are systems that make copies of themselves. A replicating system might usually produce exact copies of itself, but no copying process always works perfectly. If the replicating system is very sophisticated, any mistakes will probably impair the copy's ability to reproduce successfully itself. The mutant system will probably die off, or at least become less numerous than its parent type. Sometimes, however, the mutant will be superior to the original, in which case it may be more successful than its parent and become more common.

Because resources are limited, there is usually competition between different organisms and species. This makes it appropriate to think of organisms having a 'reproductive fitness'. Many things will contribute to this fitness - in a squirrel, factors such as the sharpness of its eyes and the agility of its limbs, its attractiveness to other squirrels, and perhaps its strength, all determine how likely the squirrel is to reproduce.

In 1975, inspired by these concepts, John Holland developed his genetic algorithm for performing adaptive searches. This approach used all the ideas and language of genetics - chromosomes, genes, mutations and so on. His idea was to maintain a pool of solutions to the given problem and to allocate each solution a fitness that measured how good it was. He stored possible solutions as 'chromosomes' - strings of numbers, each having some particular meaning in the problem. These reproduced according to their fitness. Because only some fixed number of chromosomes were stored, those which were less fit would tend to 'die off'.

In order to understand how we can use this model to solve optimisation problems, we must first understand how the chromosome stores information and reproduces. A chromosome contains all the information necessary to specify a position in the search space - a set of numbers called 'genes'. For example, when trying to find the bottom of a valley, we could use x, y coordinates, such as used for mapping, as genes. A position in the search space would then be represented by:

For more complex problems, we may need more than two numbers to specify a possible solution. In this case, we simply make the chromosome as long as is necessary to accommodate them all. The key requirement is that all possible combinations of genes represent valid points in the space. We form new chromosomes from old ones in two different ways, the most important of which is 'crossing-over'. In this process, we combine two parent chromosomes to make a child by aligning the parents, picking some random position along their common length, and swapping the two halves of the chromosome. An example for a chromosome with seven genes and a crosspoint between the third and fourth positions would be:

Crossover never changes the values of genes; it simply arranges existing gene values in different ways. It is analogous to sexual reproduction in animals, from which it draws direct inspiration. Crossover is the mechanism responsible for most of the diversity we see within species.

The second way of producing a new chromosome is by changing the values of one or more of its genes. This is called mutation. Unlike crossover, mutation changes the genes' values, thereby destroying information. It is necessary to have mutation because as less fit members of the population are killed off, all the instances of some value of a gene may disappear. Crossover alone would be able to explore only an ever-decreasing part of the search space; mutation is needed to allow the search to reach new parts of the space. Thus, we think of crossover as being the driving force underpinning the genetic algorithm, and of mutation as being responsible for keeping the gene pool, on which it draws, well-stocked.

There is a third important approach that should be used in a genetic algorithm - inversion. The aim of crossover is to develop 'co-adapted' sets of genes - ones that work well together. We would like these to be copied together as well, but if a chromosome develops co-adapted genes at its opposite ends, there is little chance of them remaining together because the splicing of the chromosome will usually break them up.

The closeness of a pair of genes on the chromosome is referred to as their linkage, and inversion provides a way of moving information around on the chromosome to alter the linkage between different parts of the solution. We can think of inversion as 'flipping' part of the chromosome, to change which genes are adjacent to one another without changing what those genes actually do. Clearly you cannot do this in the naive way, for crossover could not align the chromosomes properly. But there are ways of implementing inversion so that genes can, in effect, move around the chromosome and thus become more or less tightly linked.

Representing the solution to a problem as a chromosome with the characteristics described is not always simple. In particular, permutation problems, such as the travelling sales rep problem, cause difficulties. The obvious way to describe a tour would be to number the towns. The chromosome becomes a list of towns in the order that they should be visited. This is no good, however, because two such tours cannot usually be crossed over to produce another valid tour - the new tour would be legitimate only if the cros sed sections happened to contain the same subset of towns to be visited. In such circumstances, genetic algorithms do not usually perform particularly well.

David Glover at the University of Iowa used a genetic algorithm to solve the difficult problem of mapping East Asian languages onto conventional Latin alphabet keyboards. There are typically between 3000 and 5000 ideograms to represent in such languages, each made up from a few of perhaps 160 possible components.

Because no keyboard has 160 keys, you have to use each key for several characters. Providing various special keys like the shift key on a normal typewriter could solve the problem, but a neater solution is to make use of the fact that not all combinations of components are valid, and to try to arrange the components on keys in such a way that no pair of component sequences for different characters are the same.

This is an extremely hard optimisation problem, because the number of possible arrangements is very large (typically of the order 10180), and the density of good solutions is very small. The genetic algorithm employed was slightly different from the one employed for the travelling sales rep problem described above but it used all the same ideas. The measure of fitness used was related to the number of characters that had a unique sequence of keys in the keyboard configuration. The genetic algorithm consistently found configurations of a goodness that were as rare as one in 1016 after examining only a few tens of thousands of configurations.

Another group that has had success with genetic algorithms is in the department of mechanical engineering at the University of Edinburgh. Philip Husbands, Frank Mill and Stephen Warrington have used genetic algorithms for computer-aided process planning, in which a computer schedules machining operations automatically. The Edinburgh group treats each of the features of an object as a gene. A sequence of these genes then describes an order in which to carry out the machining. This generates an initial population of plans using several different strategies. One such strategy is 'do all the machining with one tool', while another is 'machine each feature on the cheapest tool'. Clearly, these approaches often contradict one another. One of the advantages of using genetic algorithms is that breeding and competition produce an optimal mix of these strategies which has a much lower total cost than either separately.

Using only mutation and crossover, the Edinburgh group has been able to produce plans that are much cheaper than those produced by traditional optimisation methods. The group's success, as measured by the cost functions of real-world economics, is helping to ensure the spread of these 'natural' techniques into other fields.

Nick Radcliffe and Greg Wilson work in the Parallel Computing Centre at the University of Edinburgh.

Further reading Genetic Algorithms and Simulated Annealing, Lawrence Davis (ed), Pitman, 1987.

From issue 1712 of New Scientist magazine, 14 April 1990, page

No comments: