Friday, September 14, 2007

Book review Emergence: From Chaos to Order

Emergence: From Chaos to Order

John H. Holland
Redwood City, California: Addison-Wesley
1998
Cloth: ISBN 0-201-14943-5


Reviewed by
Tony Curzon Price, W3, ESRC Centre for Economic Learning and Social Evolution, University College London.


Here is one of my favourite illustrations of emergence:

... consider one particular copper atom at the tip of the nose of the statue of Sir Winston Churchill that stands in Parliament Square in London. Let me try to explain why that copper atom is there. It is because Churchill served as Prime Minister in the House of Commons nearby; and because his ideas and leadership contributed to the Allied victory in the Second World War; and because it is customary to honour such people by putting up statues of them; and because bronze is the traditional material for such statues, and so on. Thus we explain a low-level physical observation - the presence of a copper atom at a particular location - through extremely high level theories about emergent phenomena such as ideas, leadership, war and tradition. (Deutsch 1997, page 22).

John Holland's book "Emergence" is an unflagging attempt to characterise the conditions for emergence and in this way to define the term.[1] The example from Deutsch vividly expresses the fact that explanations rely on the organisation of the world into types. The best explanation for the presence of a copper atom on Churchill's nose will not (in this case) come from chemistry or physics, but from notions of "ideas, leadership, war and tradition". In a rough sense, "emergence" can be said to have occurred when one of the types that constitute reality is shown to be the product of other interacting types. The first type has then emerged from the others.

Holland studies phenomena (types) under the hypothesis that they are emergent. He thereby hopes to extract general rules about emergent phenomena, thus laying the groundwork for a science of emergence. This science of emergence will not only clarify the nature of explanation, but may also allow intelligent "re-engineering" of the world to foster the emergent phenomena we like, and discourage those we do not.

"It is the thesis of this book", Holland (1998, page 123) tells us (half way through it), "that the study of emergence is closely tied to [the] ability to specify a large, complicated domain via a small set of 'laws'". Euclid's, Newton's and Maxwell's models of the physical world all automatically generate huge numbers of deductions and predictions from a limited number of laws that we continue to explore to this day. The models provided by checkers (draughts), chess and Go do the same, although their "deductions" have a more tenuous relation to the ordinary parts of our world. (On the other hand, these games have been regarded as good representations of some aspects of the world, for example strategy during war.) Machines that mimic neurons and the connections between them show promise for the emergence alchemist's laboratory, where we seek settings for which "... more comes out than was put in ..." (ibid., page 225).

Holland shows us that the relatively concise language of constrained generating procedures (cgp's), and in particular the cgp's that allow for variable inter-connections (cgp-v's), can formally describe these diverse models.[2] At the same time, he wants to argue that emergence per se, rather than any particular emergent phenomenon, is a worthy object of attention, and that it will one day allow us to control and predict aspects of the world that have so far eluded us. Among the domains he mentions are economies, businesses, ecosystems, life, consciousness and systems in which "the interactions of agents produce an aggregate entity that is more flexible and adaptive than its component agents" (ibid., page 248). There are probably others I failed to pick up.

The book unfolds with characteristic breadth, clarity and synthetic talent. My train of thought ran on no fixed rails as I read, and the developing argument proved fertile in suggesting new avenues for my own work. When I found myself persistently asking a question, I soon found it addressed explicitly. Holland usually digs down to an appropriate level of detail in his examples: the descriptions of Samuel's artificial checkers' player or the "triangle-perceiver" are sufficient for the reader to build these machines. In later examples, once the elements of the argument are more familiar and the formalisms have been developed, examples are only sketched.

Laws (cgp's and games) allow us to move from the few to the many, because the sets that they generate are huge. A version of this observation is very familiar. Science has been reductionist. It has taken phenomena and has decomposed them into interacting component parts. In what sense are these the same observation? A multitude of phenomena are explained by the (few) laws that science has discovered and by the particular parameters of the case. Conversely, the laws and parameters generate the multitude of phenomena. Holland tells us early on (1998, page 8) that "... emergence in rule-governed systems comes close to being the obverse of reduction".

Systematic thinking about reduction dates back at least as far as Berkeley's objection to Locke's treatment of abstraction. (In a position similar to Holland's, Berkeley explains that when we draw a triangle to demonstrate a geometric theorem, the triangle represents all triangles. It becomes a model of triangularity.) The twentieth century has seen a great deal of analysis of the logico-deductive method. A popular characterisation of the method presents knowledge as growing from a to-and-fro motion between phenomenon and theory which is very reminiscent of the design for Art Samuel's checkers' learner.

Start with your phenomenon ("the wave of water in front of the rock in a fast moving stream"), use your current theory to make a prediction relating to it ("it is composed of moving particles of water"), design an experiment to test the prediction ("pour coloured dye into the wave"). If the experiment confirms the theory, fine; if not, you might redescribe your experiment to make it fit ("the dye changed the behaviour of the stream because ..."), or modify the laws in your theory. Hence, we gather knowledge either through reduction (reduce the effect of the dye to make the theory generate the prediction), or through its opposite, emergence, when we change the theory. In other words, the standard logico-deductive view of the scientific process already uses the methods of emergence, and philosophers of science have theorised about emergence although they did not call it that. Yet Holland's book does not read like a tome on philosophy. What is he doing that is different?

Holland acknowledges the philosophical treatments of reduction (1998, page 8), but (barring the incontrovertible Dennett 1996), he sees these contributions as only peripherally relevant. As a scientist, his approach is to try to create "bare bones" emergence in the laboratory (in fact, in virtuo). Most of the book is therefore about modelling, and the basic components of the models which Holland discusses are "Games" and "Numbers".

For Holland, numbers illustrate our ability to "make the one stand for the many". A number says nothing about a thing except its quantity. One number can stand for a multitude of particulars because it strips out every detail but quantity. The relevance to modelling is apparent. Just like Berkeley's triangle and just like simple numbers, models must be composed of elements that can "stand for" a wide range of others. This is one thing that takes a model beyond the merely "photographic".[3]

Models are possible because (somehow) we can solve the problem of "the one and the many".[4]

But that problem also has a more direct link to the problem of emergence. When we "parse" all triangles we perceive to be instances of the class "triangle", the concept of "triangularity" has emerged. This means that a proper account of emergence will also be an account of what makes "standing for" (reference) possible.[5]

Holland describes a set of interconnected artificial neurons that learn to identify triangles. The only behavioural detail imposed by the modeller is that the machine is connected to an "eye" which responds to boredom (situations in which internal states change by only small amounts) by shifting its attention to another vertex. An ordinary neural net would not have produced so much with so little modeller input. Holland (1998, page 110) therefore has to use neurons with variable thresholds, fatigue, and connection weights updated by Hebb's rule, a learning rule that strengthens the connection between two neurons when they fire together. We presume that this is close to the minimum set of requirements for any "triangle parser". What does this tell us about the ability to generalise? The machine described could be used to sort the world into triangular and non-triangular entities. In some sense, it does this by a process that is similar to Locke's description of abstraction. In Holland's simulation, a triangle in the world eventually produces a synchronous firing pattern in the net. Every triangle produces a different firing pattern, depending on orientation and size. But the firing patterns themselves induce firing patterns that have persistent common characteristics. Those patterns can in turn induce their own patterns, and the further removed a firing pattern is from that of a specific triangle, the more general it becomes. Patterns at any level can interact with those at any other level. In Locke's account, an impression is stripped of detail and turned into an "abstract idea" that is a picture of something but nothing in particular. In both cases, we are a long way from having an object that is capable of representing all triangles for the purposes of a Euclidean demonstration. (This was Berkeley's point.) Categorisation is not all that is involved in the meaningful use of a concept. As Holland says:

[The process of net patterns firing up other net patterns] is a precursor of that everyday, but astonishing, human ability [...]: humans effortlessly parse unfamiliar scenes into familiar objects, an accomplishment that so far eludes even the most sophisticated computer programs.
The emphasis is mine - Holland is faultless at not claiming too much for his case, while also maintaining the sense of excitement surrounding the research program. (Not all authors of popular science are so careful.) Emergence relies on generalisation. Models that demonstrate the phenomenon will do some generalising, and will be composed of elements that may themselves have been "generalised" by other models, as if in a cascade.
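Hebb's rule, mentioned above, is simple enough to state in a few lines. The sketch below is a minimal illustration that rests on several assumptions:

- firing is binary;
- learning uses a fixed rate and a slow weight decay;
- fatigue is a linear rise in the threshold of any neuron that has just fired.

The function names and parameters are hypothetical. This is not Holland's triangle-perceiver, only the kind of local update its neurons are described as using.

```python
def hebbian_step(weights, fired, rate=0.1, decay=0.01):
    """Return updated weights after one Hebbian step.

    weights[i][j] is the strength of the connection from neuron j to neuron i;
    fired[k] is 1 if neuron k fired on this step, else 0. Every weight decays
    slightly, and the connection between any two distinct neurons that fired
    together is strengthened by `rate` ("fire together, wire together").
    """
    n = len(fired)
    return [
        [
            (1 - decay) * weights[i][j]
            + (rate if i != j and fired[i] and fired[j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def fatigue_step(thresholds, fired, fatigue=0.2, recovery=0.1, base=1.0):
    """Raise the firing threshold of neurons that just fired (fatigue) and
    let every threshold relax back toward `base` at rate `recovery`."""
    return [t + fatigue * f - recovery * (t - base) for t, f in zip(thresholds, fired)]


# Three neurons, no initial connections; neurons 0 and 1 fire together.
weights = [[0.0] * 3 for _ in range(3)]
weights = hebbian_step(weights, [1, 1, 0])
thresholds = fatigue_step([1.0, 1.0, 1.0], [1, 1, 0])
print(weights[0][1], weights[0][2])  # 0.1 0.0
print(thresholds)                    # [1.2, 1.2, 1.0]
```

Repeated co-firing builds up strong connections, while fatigue keeps any one neuron from dominating. How closely this matches the specific mechanics on Holland's page 110 would need checking against the book.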

The second basic component on which modelled emergence relies is illustrated by the concept of the Game. In chess, a small number of rules can generate a huge number of board configurations or "states".[6] The states of a game are analogous to the configurations of matter obeying natural laws. We usually think of games as invented by humans to generate a particular type of outcome. This is a kind of "reverse science" in which laws are invented to generate phenomena.

We can see the attraction of the "game design" model in attempts to display bare-bones emergence in the laboratory. It involves choosing components, inventing rules, letting phenomena emerge and (possibly) decomposing (that is, reducing) these to close the loop and understand the process. A good example is provided by a recent BBC/Discovery Channel television film on avalanches. Some Swiss engineers from the ETH in Zürich wanted to study avalanches in a controlled setting. The danger of doing this directly led them to build a (physical) model in a tank of liquid with snow represented as plastic pellets. They had (theoretical) reason to believe that their artificial world was a good replica of ours. They chose components (liquid of a given density and pellets to mimic different types of snow). The physics of this world determined the rules (that is what a physical model constrains one to; it can be a very efficient calculator), and avalanches occurred. As television will, this film left the subject too early. I presume that they are now designing controlled experiments, validating, and also trying to decompose the avalanches into the behaviour of elementary parts.

The model of the game establishes a clear definition of possibility, and of the requirements on every component (or agent) to generate each possibility. Take chess, for example, where for any state of the board, we could enumerate every legal next state, and for each of those, every possible next state ... This process defines possibility in the world of chess. Moreover, for any board in our list of possible boards, we will be able to list the sets of inputs that would have been required from the machines (or humans) playing the game to achieve it.[7]

Holland describes in fascinating detail the way that Art Samuel built a good checkers (draughts) playing machine in the 1950s. Holland uses the example both to elaborate on the correspondence between games and models, and to demonstrate the emergence, through automatic learning, of good play.[8] The example allows him to underline the huge number of possible states that a few simple rules generate. The exhaustive way of determining the possible states of the game is to lay out its entire tree. Every possible move at the first round is listed and each of these determines a board state for round two. For each round two board state, every possible set of moves is listed to determine the possible board states at round three, and so on until the end of the game. Once we have listed all the ways of finishing the game, our enumeration is over. Just a few rules about the permitted transitions from one state to another have provided a rich and complicated universe of possibilities.
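The exhaustive enumeration just described can be sketched in code. The tree for checkers is far too large to walk, so the sketch below swaps in noughts and crosses (tic-tac-toe) as an illustrative stand-in. A handful of rules generate 255,168 distinct complete games over 5,478 distinct reachable boards. The rules are: players alternate, each move marks an empty square, and play stops at three in a row or a full board. The helper names are hypothetical.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, squares numbered 0-8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_over(board):
    return winner(board) is not None or " " not in board


def successors(board, player):
    """Every legal next board: `player` marks one empty square."""
    return [board[:i] + player + board[i + 1:]
            for i, cell in enumerate(board) if cell == " "]


@lru_cache(maxsize=None)
def count_games(board=" " * 9, player="X"):
    """Number of complete games (root-to-leaf paths in the tree) below this board."""
    if is_over(board):
        return 1
    other = "O" if player == "X" else "X"
    return sum(count_games(b, other) for b in successors(board, player))


def reachable_boards():
    """Every distinct board that can arise in legal play, including the empty one."""
    seen, stack = set(), [(" " * 9, "X")]
    while stack:
        board, player = stack.pop()
        if board in seen:
            continue
        seen.add(board)
        if not is_over(board):
            other = "O" if player == "X" else "X"
            stack.extend((b, other) for b in successors(board, player))
    return seen


print(count_games())            # 255168
print(len(reachable_boards()))  # 5478
```

Most of these paths would never be followed by competent players, which is the point taken up in the next paragraph.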

Holland thinks that our social, biological and physical reality may stand in the same sort of relation to theory as the checkers' board tree does to the rules of checkers.[9] When "good" checkers' players confront each other, many of the paths through the tree will never be followed, because they would rely on bad mistakes being made. And in the same way, many social, biological or physical possible worlds will not usually be encountered because of their instability. Samuel's program shows us that once we have a set of "relevant" categories to apply to boards, an automatic procedure can create (or lead to the emergence of) a good checkers' player.[10] It does this by a procedure that is complicated (but actually intuitively attractive) in which categories contribute to explanations and forecasts of performance, and forecasting errors in turn lead to changes in the contributions. Generating good play is already impressive, despite the ex machina categories. The huge set of possible trees is effectively (although only approximately) pruned to the set of trees that involve good players.
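In the learning loop described above, categories contribute to a forecast, and forecasting errors adjust their contributions. That loop can be sketched as a linear evaluation function trained by an error-correcting (delta-rule) update. This is a simplified stand-in, not Samuel's actual procedure. The two features echo the categories mentioned in the text. The "target" is assumed to be a better estimate of the board's value, for instance one backed up from a deeper look-ahead search.

```python
# Hypothetical board features ("categories"), e.g. piece advantage and centre control.


def evaluate(weights, features):
    """Forecast of how good a board is: a weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, features))


def adjust(weights, features, target, rate=0.05):
    """Move each weight so that the forecast for this board moves toward
    `target`; features that contributed more get adjusted more."""
    error = target - evaluate(weights, features)
    return [w + rate * error * f for w, f in zip(weights, features)]


weights = [0.0, 0.0]
board = [1.0, 2.0]          # one piece ahead, some central control
for _ in range(200):        # repeatedly correct toward an assumed deeper-search value of 3.0
    weights = adjust(weights, board, target=3.0)
print(round(evaluate(weights, board), 6))  # 3.0
```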

The ex machina categories are themselves emergent phenomena of good play in checkers. Board configurations like "one piece advantage" or, more strangely, "board moment" (and some others that do even better) are phenomena of the checkers' world that would be tracked by a good player. Holland moves on to a discussion of the triangle perceiver (mentioned above) to show that categories could emerge automatically. So (filling in for Holland a little, I think) we can imagine that a fully emergent checkers' world might be composed of the following hierarchy:

  • Some checkers' playing agents that are induced to play well. They might do this through differential reproduction.
  • These agents have the power to piece together categorising machines which might be like the triangle perceiver.
  • The agents would combine the outputs of perceiving machines to determine their checkers' moves. (They could do this by using the trick of honing predictions through "bootstrapping" before "playing for real".)
  • Eventually, the agents would play each other and the consequent pay-offs would determine inducements, at which point the loop would start again.
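The first and last steps of that hierarchy correspond to a standard generational step of the kind used in genetic algorithms: agents are induced to play well through differential reproduction, and pay-offs feed back into the next round. The sketch below assumes each agent is just a list of evaluation weights, as a checkers evaluator might use, and that pay-offs are non-negative. Selection is fitness-proportional and mutation is Gaussian. Both are conventional choices, not anything the review or the book prescribes, and `next_generation` is a hypothetical name.

```python
import random


def next_generation(population, payoffs, rng, mutation=0.1):
    """One round of differential reproduction.

    Each agent (a list of weights) is copied into the next generation with
    probability proportional to its pay-off, then every weight in every
    copy is perturbed by Gaussian noise with standard deviation `mutation`.
    """
    parents = rng.choices(population, weights=payoffs, k=len(population))
    return [[w + rng.gauss(0.0, mutation) for w in parent] for parent in parents]


rng = random.Random(0)
population = [[0.0, 0.0], [1.0, 2.0], [0.5, 0.5]]
payoffs = [0.0, 3.0, 1.0]   # the first agent lost every game, so it leaves no copies
population = next_generation(population, payoffs, rng)
print(len(population))      # 3
```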

There is nothing trivial about building any of these machines. But we can see that a single rule - "survival in a noisy world depends on playing good legal checkers" - could lead to identification of emergent phenomena in that game.[11] Low-level patterns which persist (like the neural cycles associated with "one piece ahead") can combine with other low level patterns in an "emergent" player, that itself becomes a persistent pattern operated on by the selection algorithm.

Much of "Emergence" formalises the notions of the game, the agent and the "level". The first two are analysed as "constrained generating procedures" (cgp), and "variable structure cgp's" (cgp-v). The formalisation is very clear, and makes fertile reading for modellers. (New ways of describing problems suggest new modelling strategies.) The cgp and the cgp-v are both built from finite state machines. Finite state machines are not the only way of modelling games, but they have been used to considerable effect as a framework in evolutionary game theory (Abreu and Rubinstein 1988, Nowak et al. 1995). They are ideally suited to reductionist modelling (and therefore to the production of emergence) because they force the modeller to be totally explicit about the interactions of the component agents.
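To make the finite-automaton framing concrete, here is a minimal sketch in the spirit of the repeated-game literature cited above. Two strategies for the repeated prisoner's dilemma are written as Moore machines: each state has a fixed output, and the opponent's last move drives the transition. Each machine's output is wired to the other's input, as in a cgp. The pay-off numbers are the conventional textbook values (5, 3, 1, 0), chosen for illustration. The class and state names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """A Moore machine: the output depends only on the current state."""
    start: str
    output: dict       # state -> action, "C" (cooperate) or "D" (defect)
    transition: dict   # (state, opponent's action) -> next state


TIT_FOR_TAT = Machine(
    start="nice",
    output={"nice": "C", "angry": "D"},
    transition={("nice", "C"): "nice", ("nice", "D"): "angry",
                ("angry", "C"): "nice", ("angry", "D"): "angry"},
)
ALWAYS_DEFECT = Machine(
    start="mean",
    output={"mean": "D"},
    transition={("mean", "C"): "mean", ("mean", "D"): "mean"},
)

# (row action, column action) -> (row pay-off, column pay-off)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def play(m1, m2, rounds):
    """Couple two machines, each reading the other's output, and total the pay-offs."""
    s1, s2 = m1.start, m2.start
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = m1.output[s1], m2.output[s2]
        p1, p2 = PAYOFF[(a1, a2)]
        total1, total2 = total1 + p1, total2 + p2
        s1, s2 = m1.transition[(s1, a2)], m2.transition[(s2, a1)]
    return total1, total2


print(play(TIT_FOR_TAT, TIT_FOR_TAT, 10))    # (30, 30)
print(play(TIT_FOR_TAT, ALWAYS_DEFECT, 10))  # (9, 14)
```

Every interaction is explicit in the transition tables, which is the property that makes such machines attractive for reductionist modelling.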

Much more elusive is the notion of a level. Emergent phenomena are composed of persistent patterns. They emerge from "messy" low-level interactions, and their emergence is recognised by the fact that they can be operated on by a separate set of rules. This is the sense in which emergent phenomena define "levels". Holland's framework allows a nice formalisation of this notion. Take a system composed of interacting cgps (or cgp-vs) and aggregate the cgps in some way. Now construct a cgp which behaves just like the aggregate. The intuition is that one can define a function that performs the relevant transformations at the aggregate level while omitting unnecessary detail about the connections that are internal to the aggregate. As usual, Holland provides a nice example to hook into as soon as the going gets a bit tough. He shows (1998, pages 194-197) that aggregations of cellular automata sufficient to carry "gliders" in Conway's game "Life" could be defined as an aggregate cgp in this way.
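The glider makes the idea of a level easy to see. At the level of cells, Life's rule says only that a cell is born with exactly three live neighbours and survives with two or three. At the aggregate level, a glider obeys a much simpler derived law: every four generations it reappears shifted one cell diagonally. The sketch below uses a standard set-based implementation of Life, not Holland's cgp formalism. It checks that the two descriptions agree for one glider.

```python
from collections import Counter


def life_step(live):
    """One generation of Conway's Life on an unbounded grid of (x, y) cells."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live)}


def glider_macro_step(cells):
    """Derived, aggregate-level dynamics: shift the whole pattern by (1, 1)."""
    return {(x + 1, y + 1) for x, y in cells}


# A glider heading towards larger x and y (y increases downwards):
#   .#.
#   ..#
#   ###
GLIDER = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

micro = GLIDER
for _ in range(4):                 # four steps of the low-level rule ...
    micro = life_step(micro)
print(micro == glider_macro_step(GLIDER))  # True: ... equal one aggregate step
```

The derived law holds only under "standard conditions". A glider that runs into other live cells stops obeying it.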

However, the "tiled automata" example provides a challenge. As Holland points out, the description of the aggregated cgp is potentially very large and complicated. Holland then notes that if we know the rules governing the component automata, then "... the glider is once more simply described, and many other regularities become obvious. If we must attempt our reduction without knowing the origin of the tiled automaton, standard techniques exist for exploring possible decompositions (reductions), but the task is not easy." Many emergent phenomena are "obvious regularities". Some examples that Holland is fond of (like the wave before a rock in a fast moving stream) are certainly of this sort, as is the example given in the opening quote from Deutsch. Explanation at the emergent level is quite simple, but gets much more difficult at the reduced level.

So is there a tension? On one hand, understanding the low-level rules of an aggregate means that "regularities become obvious". On the other, it is sometimes only possible to provide meaningful explanations by appealing to aggregate properties. In fact, there is no conflict. It is simply that reduction may be too computationally intensive to be useful. Even in the most successful case of reduction we know, that of chemistry to quantum physics, it is still the case that the fastest way to compute the results of a reaction is usually to carry it out![12] As Holland writes (page 201), "... when we observe regularities, we can often move the description up a level ... [at which] ... the regularities persist and a simpler, "derived" dynamics can be found."

In Holland's view, the definition of "levels" via persistent phenomena arises from the possibility of developing such "reduced form" models that operate in "standard conditions". However, it seems to me that the converse statement also has a certain plausibility: "... reduced form models are possible whenever phenomena persist". This formulation views persistence as the fundamental driver of emergence (and not our ability to develop reduced form models). I think Holland may agree with this, since he claims early on that emergence is not to be explained "... in the eye of the beholder." So if we have an emergent phenomenon, one modelling strategy which this suggests is to search for equilibria (persistent configurations) of relevant agents constrained by relevant rules.

I think that this modelling strategy is very close to the one adopted by neo-classical economics. Take the employment contract between ego and alter. It persists in the economy "under normal circumstances" just as the wave in front of the rock persists. It defines a flow of effort out (and earnings in) just as the standing wave is composed of flowing water and eroding rock. The explananda are the form of the contract (linear, non-linear, performance-related or effort-related), its duration, flexibility, payment rate and many other attributes. These are equivalent to the elements of the wave that we would like to forecast, like its height and the velocity of the liquid. A very large number of "wage contract games" have been elaborated by neo-classical economists. In all cases, the method adopted is as advocated by Holland: find the agents and rules that produce the explananda as persistent states. These models themselves often rely on equilibria of lower level models, which they take as "reduced form inputs". One model might explain the contract as a bargaining game. This model will need the value of "outside options" as an input. The "outside option" in a bargain is the value that can be guaranteed to a party even if no deal is struck. The value of the outside option can itself be the output of another model, for example one representing the worker's outside option as a function of the search costs involved in finding another job. So the search process could be explicitly modelled, together with the worker's attitudes to costs and risks. The search process relies on many social parameters (geographical and social mobility or signalling through employment history) that are themselves targets for modelling. And the worker's preferences and attitude to risk can themselves be modelled by appeal to an evolutionary game in genes or memes.
Hence, the wage contract is an emergent phenomenon, a persistent pattern in a bargaining game, that itself builds on the emergent phenomenon of the outside option, which is a persistent pattern from a search problem, that itself arises out of forms of social organisation (themselves to be explained as emergent phenomena) and preferences, themselves persistent phenomena (equilibria) of genetic or memetic games ...[13]
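The chain of reduced-form inputs can be sketched as two small functions, one feeding the other. The bargaining step uses the generalised Nash bargaining split with transferable pay-offs: each side gets its outside option plus its share of the surplus. The search step is a deliberately crude, hypothetical one-period model. In it, the outside option is the chance of an alternative offer times that offer's expected value, less the cost of searching. All names and numbers are illustrative.

```python
def search_outside_option(offer_probability, expected_offer, search_cost):
    """Hypothetical reduced-form search model: what a worker can expect
    if this bargain fails and they look for another job instead."""
    return offer_probability * expected_offer - search_cost


def bargained_wage(match_value, worker_outside, firm_outside, worker_power=0.5):
    """Generalised Nash bargaining with transferable pay-offs: the worker gets
    their outside option plus a share `worker_power` of the surplus.
    Returns None when there is no surplus, i.e. no deal is struck."""
    surplus = match_value - worker_outside - firm_outside
    if surplus < 0:
        return None
    return worker_outside + worker_power * surplus


# The output of the lower-level model is an input to the higher-level one.
outside = search_outside_option(offer_probability=0.5, expected_offer=60.0, search_cost=5.0)
wage = bargained_wage(match_value=100.0, worker_outside=outside, firm_outside=20.0)
print(outside, wage)  # 25.0 52.5
```

In this sketch, raising the search cost lowers the worker's outside option and hence the bargained wage. That is a small instance of the cascade between levels described above.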

There are many ways to build a rule-and-agent system that will produce the right kind of persistence. Yet there are models that become classics (Samuel's checkers' player, Hofstadter and Mitchell's "Copycat"), while others are forgotten. Holland spends his penultimate chapter trying to find out what it is that makes a good model, and whether the process of innovation can be sufficiently understood to be controlled. One admonition he makes which struck a chord with me (and might with others too, given the great facility with which we can create new models) is the importance of being "... so familiar with the elements of your discipline that you no longer have to think about how they are combined ..." for creative modelling to be possible. At the same time, he urges inter-disciplinarity, because he believes that just as metaphors acquire power through the overlapping "aura" of associations, so good new models (the scientific versions of metaphors) will come from the overlapping "aura" of disciplines. I, for one, can only hope that Holland is right here, because of the sheer fun of inter-disciplinarity.

Holland's non-technical writing has always pointed me in interesting directions. I was building large structural models of industrial sectors for a consultancy company when I read his Scientific American introduction to the Genetic Algorithm. In no time, I was dreaming of adding a layer of behavioural simulation to my models that would make them more than simple "scenario tools". I did not seriously get down to this, however, until after reading Hidden Order, which made me think the project was in fact sensible. I feel the same will be the case with Emergence: I already try to see firms and industries as interconnected finite state machines, and ideas from Emergence will certainly guide the exploration and explanation of models of this kind.

* Notes

1 Holland closes the book with almost necessary and almost sufficient conditions for emergence in models. It takes Holland much of his tightly argued and illustrated book to get this close to a definition, so I hope I will be excused for not offering any reduced form definition myself.

2 I only say a little more about cgp's and cgp-v's in this review. The cgp is essentially a collection of finite state machines. The outputs of some constituent machines are inputs to others. The inputs to a machine, and the functions it encodes, determine its state and its output. Holland provides a very lucid account of the way that such simple devices can encode a great variety of behaviours. The cgp-v is an even more wonderful device. The "variable" refers to the fact that the components of the cgp-v can modify their relationship to other components. Each component machine can take descriptions of machines as inputs. The functions encoded in each machine can therefore be a machine transforming function and the output can be a new machine. Holland illustrates all this with very clear examples, showing us, for example, that the general purpose programmable computer can be described as a cgp-v.

3 An author I can no longer remember once referred to a story by Borges that tells of a Babylonian king who grew wealthy and decided to build a perfect replica of his kingdom. He impoverished himself in emptying the old kingdom to fill its representation. Clearly, he would not have made a very good or imaginative scientist.

4 Here is another attempt at formulating the problem: when we make one entity stand for all relevantly similar entities, it seems as if we are going beyond any simple empirical operation. The (Platonic) temptation is to say that we have some sort of "template" (or "form") that is used to match instances to a type. But what makes the template of that type? Does it conform to some higher template? And if so, how do we halt the infinite regress which results?

5 This shows the degree of ambition in Holland's overall project. That his building blocks are also his objects of explanation is not a surprise. In a way, Holland is trying to reduce emergence to its necessary and sufficient conditions.

6 Of course, there is a sense in which the fewer the rules, the larger the number of possible board configurations, because the fewer the constraints (though I cannot make sense of the notion of no rules at all). The notion of a constrained generating procedure emphasises precisely that the rules of the game must restrict the realm of the possible as well as fill it.

7 I resist the temptation to say "the actions" or "the choices" rather than "the sets of inputs" because these notions are tainted by the overtones of freedom, deliberation and other mysterious attributes we feel are essentially human. Saying just "inputs" makes it clear that we could model billiard ball collisions as a game too. (Holland describes this process at greater length in his book Hidden Order.) It also makes it clear that to view phenomena as games is very much first a way of structuring information. Only later does it lead to theories of solution, equilibrium and so on.

8 Is saying that good play "emerges" not pushing the analogy a little? In what sense is this similar to the ETH avalanche, for instance? In fact, it is very similar: good play and avalanches are equally determined and reproducible.

9 Why only "may"? Holland writes: "That a modeling technique as abstract as mathematics should be so efficacious is a mystery often noted by scientists ..." That reality will continue to be explained in this way is still a working hypothesis.

10 I will discuss the role of the "relevant categories" further. Samuel gives his program some pre-programmed ways of categorising boards, like: "all boards in which opponent is +/- x pieces ahead" or "momentum of pieces about the centre is y". These categories are "given" in the sense that Samuel's good player has not discovered that these are useful emergent properties to keep track of if you want to win.

11 Could we make this single rule emergent? The "physics" of the world in which checkers emerges would need to be such as to make the persistence of good players more likely. The question here is the same as that addressed by some Artificial Life researchers who try to generate self-reproduction from chemical precursors. So, in principle, we could follow the checkers trail further.

12 If "levels" are about generating comprehensible explanations, what is the link with the ease of computation? If the elements of Deutsch's explanation (in terms of leadership and tradition) are in fact emergent, then in principle we could discern them in a chemical description of the world. That is, there do exist necessary and sufficient conditions for leadership elaborated entirely in the language of chemistry. But these will be incomprehensible to us because we are computationally limited.

13 Holland urges the modeller not to feel overly bound to empirical validation of a model in its early development. This will also strike a chord with many social scientists.

* References


ABREU D. and A. Rubinstein 1988. The Structure of Nash Equilibrium in Repeated Games with Finite Automata. Econometrica, 56:1259-1281.

DENNETT D. 1996. Darwin's Dangerous Idea, The M.I.T. Press, Cambridge, MA.

DEUTSCH D. 1997. The Fabric of Reality. Allen Lane, London.

HOLLAND J. H. 1995. Hidden Order: How Adaptation Builds Complexity. Addison-Wesley, Redwood City, CA.

HOLLAND J. H. 1998. Emergence: From Chaos to Order. Addison-Wesley, Redwood City, CA.

NOWAK M. A., R. M. May and K. Sigmund 1995. The Arithmetics of Mutual Help. Scientific American, 272:50-53.
