Saturday, September 15, 2007

Automated Scientific Discovery, the holy goal of AI? from AAAI


"[Bruce] Buchanan was trained as a philosopher of science at a time when the profession was dominated by Popper's (1965) view that there is no logic of discovery. Buchanan stated the new research program:

'The traditional problem of finding an effective method for formulating true hypotheses that best explain phenomena has been transformed into finding heuristic methods that generate plausible explanations. The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates' [and finding methods for constructing the candidates]."

-from Recent Work in Computational Scientific Discovery

Good Places to Start

2020 Computing: Exceeding human limits. Scientists are turning to automated processes and technologies in a bid to cope with ever higher volumes of data. But automation offers so much more to the future of science than just data handling. By Stephen H. Muggleton. Nature 440, 409-410 (23 March 2006). "During the twenty-first century, it is clear that computers will continue to play an increasingly central role in supporting the testing, and even formulation, of scientific hypotheses. This traditionally human activity has already become unsustainable in many sciences without the aid of computers. This is not only because of the scale of the data involved but also because scientists are unable to conceptualize the breadth and depth of the relationships between relevant databases without computational support. The potential benefits to science of such computerization are high -- knowledge derived from large-scale scientific data could well pave the way to new technologies, ranging from personalized medicines to methods for dealing with and avoiding climate change. [fn: Towards 2020 Science (Microsoft, 2006)]. ... Meanwhile, machine-learning techniques from computer science (including neural nets and genetic algorithms) are being used to automate the generation of scientific hypotheses from data. Some of the more advanced forms of machine learning enable new hypotheses, in the form of logical rules and principles, to be extracted relative to predefined background knowledge. ... One exciting development that we might expect in the next ten years is the construction of the first microfluidic robot scientist, which would combine active learning and autonomous experimentation with microfluidic technology."

'Knowledge discovery'. California Computer News (October 20, 2004). "In the recent science-fiction thriller 'Minority Report,' Tom Cruise plays a detective who solves future crimes by being immersed in a 'data cave,' where he rapidly accesses all the relevant information about the identity, location and associates of the potential victim. A team at Purdue University currently is developing a similar 'data-rich' environment for scientific discovery that uses high-performance computing and artificial intelligence software to display information and interact with researchers in the language of their specific disciplines. 'If you were a chemist, you could walk right up to this display and move molecules and atoms around to see how the changes would affect a formulation or a material's properties,' said James Caruthers, a professor of chemical engineering at Purdue. The method represents a fundamental shift from more conventional techniques in computer-aided scientific discovery. 'Most current approaches to computer-aided discovery center on mining data in a process that assumes there is a nugget of gold that needs to be found in a sea of irrelevant information,' Caruthers said. 'This data-mining approach is appropriate for some scientific discovery problems, but scientific understanding often proceeds through a different method, a 'knowledge discovery' approach. 'Instead of mining for a nugget of gold, knowledge discovery is more like sifting through a warehouse filled with small gears, levers, etc., none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts.' ... Discovery informatics depends on a two-part repeating cycle made up of a 'forward model' and an 'inverse process' and two types of artificial intelligence software: hybrid neural networks and genetic algorithms."

Iridescent Software Illuminates Research Data. By Mike Martin. Sci-Tech Today (January 27, 2004). "Bioinformatics researchers at the University of Texas (UT) Southwestern Medical Center have developed Iridescent, a software program that helps scientists easily identify obscure commonalities in research data and directly relate them to their own work, saving money and speeding the process of discovery. 'This work is about teaching computers to 'read' the literature and make relevant associations so they can be summarized and scored for their potential relevance,' said Dr. Jonathan Wren, a researcher in the department of botany and microbiology at the University of Oklahoma. 'For humans to answer the same questions objectively and comprehensively could entail reading tens of thousands of papers.' ... Iridescent is unveiled in the current issue of the journal Bioinformatics."

  • Shared relationship analysis: ranking set cohesion and commonalities within a literature-derived relationship network. Wren JD, Garner HR. Bioinformatics. 2004 Jan 22;20(2):191-8.

Toward Automated Discovery in the Biological Sciences. By Bruce G. Buchanan and Gary R. Livingston. AI Magazine 25(1): Spring 2004, 69-84. "The end point of scientific discovery is a concept or hypothesis that is interesting and new (Buchanan 1966). Insofar as there is a distinction at all between discovery and hypothesis formation, discovery is often described as more opportunistic search in a less well-defined space, leading to a psychological element of surprise. The earliest demonstration of self-directed, opportunistic discovery was Doug Lenat's program, AM (Lenat 1982). It was a successful demonstration of AI methods for discovery in a formal domain characterized by axioms (set theory) or rules (games). AM used an agenda-based framework and heuristics to evaluate existing concepts and then create new concepts from the existing concepts. It continued creating and examining concepts until the 'interestingness' of operating on new or existing concepts (determined using some of AM's heuristics) dropped below a threshold. Although some generalization and follow-up research with AM was performed (Lenat 1983), this research was limited to discovery in axiomatic domains (Haase 1990; Shen 1990; Sims 1987). Our long-range goal is to develop an autonomous discovery system for discovery in empirical domains, namely, a program that peruses large collections of data to find hypotheses that are interesting enough to warrant the expenditure of laboratory resources and subsequent publication. Even longer range, we envision a scientific discovery system to be the generator of plausible hypotheses for a completely automated science laboratory in which the hypotheses can be verified experimentally by a robot that plans and executes new experiments, interprets their results, and maintains careful laboratory records with the new data."
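
The agenda-and-threshold loop described in that excerpt can be illustrated with a small sketch. This is not Lenat's AM; it is a toy, assuming a hypothetical `expand` heuristic that proposes new concepts with an interestingness score, and it simply stops once the best remaining task scores below the threshold.

```python
import heapq

def run_agenda(seed_tasks, expand, threshold):
    """Explore concepts in order of interestingness until the best one left
    scores below `threshold`.

    seed_tasks: iterable of (interestingness, concept) pairs.
    expand: hypothetical heuristic; given a concept and its score, it returns
            new (interestingness, concept) pairs.
    """
    # heapq is a min-heap, so scores are negated to pop the highest first.
    agenda = [(-score, concept) for score, concept in seed_tasks]
    heapq.heapify(agenda)
    seen = {concept for _, concept in agenda}
    explored = []
    while agenda:
        neg_score, concept = heapq.heappop(agenda)
        if -neg_score < threshold:
            break  # nothing left on the agenda is interesting enough
        explored.append(concept)
        for score, new_concept in expand(concept, -neg_score):
            if new_concept not in seen:
                seen.add(new_concept)
                heapq.heappush(agenda, (-score, new_concept))
    return explored

def toy_expand(concept, score):
    # Toy heuristic (an assumption for illustration): each concept yields
    # two derived concepts, each half as interesting as its parent.
    return [(score / 2, concept + "a"), (score / 2, concept + "b")]

explored = run_agenda([(1.0, "set")], toy_expand, threshold=0.3)
print(explored)
```

With these toy numbers the loop explores the seed and its two children, then halts because every remaining concept scores 0.25.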

A Machine With a Mind of Its Own - Ross King wanted a research assistant who would work 24/7 without sleep or food. So he built one. By Oliver Morton. Wired Magazine (August 2004, Issue 12.08). "The 'robot scientist' (King has resisted the temptation of a jazzy acronym) may look like a mere labor-saving gizmo, shuttling back and forth ad nauseam, but it's much more than that. Biology is full of tools with which to make discoveries. Here's a tool that can make discoveries on its own. ... It wasn't until he moved to Aberystwyth in the mid-'90s that King found comrades who fully appreciated the potential of AI and machine learning. One of the first people he encountered there was Douglas Kell, a voluble, handlebar-mustached biologist with a clear view of where his field was headed. ... Stephen Muggleton argues that the life sciences are peculiarly well suited to machine learning. 'There's an inherent structure in biological problems that lends itself to computational approaches,' he says. In other words, biology reveals the machinelike substructure of the living world; it's not surprising that machines are showing an aptitude for it."

  • Also see this related article: Mark of time. The Engineer Online (September 18, 2006). "A pioneering study at Manchester University is using a 'robot scientist' to examine blood samples for biological markers that may diagnose Alzheimer's disease. ... The robot scientist combines the automatic operation of a blood analysis technique called GCGC-MS with artificial intelligence to determine which experiment to carry out next. ... Douglas Kell, a professor of bioanalytical science at Manchester, was one of the developers of the robot scientist. 'The original idea was to automate the process of scientific discovery,' said Kell. 'There is a model by which we alternate the world of ideas with the world of experience. We carry out an experiment then revise our hypothesis in a cyclic loop. The robot scientist can combine working out what experiment is best to do next with actually carrying it out.' ... The robot uses Inductive Logic Programming, a machine learning process. The scientists give it the background knowledge about the experiment, called the domain. It then decides which hypothesis to follow using the available data."
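
The cyclic loop Kell describes (pick the best next experiment, run it, revise the hypotheses) might be sketched as follows. This is a toy under stated assumptions, not the Robot Scientist's method: it uses plain elimination of inconsistent hypotheses rather than Inductive Logic Programming, and the names `predict` and `run_experiment` are hypothetical stand-ins for the background knowledge and the laboratory.

```python
def choose_experiment(hypotheses, experiments, predict):
    """Pick the experiment whose predicted outcomes split the surviving
    hypotheses most evenly, so either result rules out about half of them."""
    def imbalance(exp):
        positives = sum(1 for h in hypotheses if predict(h, exp))
        return abs(2 * positives - len(hypotheses))
    return min(experiments, key=imbalance)

def discovery_loop(hypotheses, experiments, predict, run_experiment):
    """Alternate between choosing an experiment and revising the hypothesis set."""
    hypotheses = list(hypotheses)
    remaining = list(experiments)
    log = []
    while len(hypotheses) > 1 and remaining:
        exp = choose_experiment(hypotheses, remaining, predict)
        remaining.remove(exp)
        outcome = run_experiment(exp)  # the "world of experience"
        # Keep only hypotheses whose prediction matches what was observed.
        hypotheses = [h for h in hypotheses if predict(h, exp) == outcome]
        log.append((exp, outcome))
    return hypotheses, log

# Toy domain (an assumption): hypotheses are the integers 0..7, an experiment
# reads one bit, and the hidden truth is 5.
def predict(hypothesis, bit):
    return bool((hypothesis >> bit) & 1)

surviving, log = discovery_loop(
    hypotheses=range(8),
    experiments=[0, 1, 2],
    predict=predict,
    run_experiment=lambda bit: predict(5, bit),
)
print(surviving, log)
```

Choosing the most evenly splitting experiment is one simple way to decide "what experiment is best to do next"; a real system would also weigh the cost of each experiment.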

Herbert A. Simon: Scientific Discovery. One of Professor Simon's departmental web pages (2001) at Carnegie Mellon University's Department of Psychology. "Understanding the processes scientists use to discover new laws and to test hypotheses has been an active domain of cognitive research and AI modeling for several decades, and was one of Herb Simon's chief areas of research activity. Scientific discovery is an interesting and important task domain because it involves highly ill-structured problems that call on the whole range of human cognitive resources, and thereby provides deep insights into complex and creative human thinking. ... Thus, research on scientific discovery requires one to address fundamental problems in cognitive psychology (the processes of discovery), in the philosophy of science (the relation between the discovery and validation, or disconfirmation, of hypotheses), and in computer science (languages for discovery, heuristic search in discovery environments)."

Readings Online

A robot that likes to play with test tubes. By David Akin. The Globe & Mail (January 17, 2004). "[The Robot Scientist] probably will become a vital tool for researchers, particularly in biological fields, to advance human knowledge. That is because in many scientific areas, such as nanotechnology, molecular genetics and the exploration of space, information is being generated too fast for humans to analyze it effectively. 'Biology is in a great data-gathering phase at the moment, a bit like it was in the 19th century,' said Stephen Oliver, a professor and genomics researcher at the University of Manchester in England and another of the eight researchers. The Human Genome Project, the monster science project that identified and explained the function of the genes in a human being, made great use of computers and sophisticated software programs to automate the scientific discovery process. Indeed, there is now a branch of artificial intelligence research devoted to scientific discovery."

Robo-scientist goes it alone. BBC News (January 14, 2004). "The world's first 'robot scientist' that can interpret experiments without any human help has been developed by scientists at the University of Wales, Aberystwyth. It generates a set of hypotheses from what it knows about biochemistry, and then designs experiments to test them. ... Although artificial intelligence has made a number of significant contributions to scientific discovery during the last 30 years, its general impact on experimental science has been limited. But this may be about to change with the increased use of automation in scientific research."

Undergraduate Projects in the Application of Artificial Intelligence to Chemistry. II Self-organizing Maps. By Hugh Cartwright. (2000). The Chemical Educator, Volume 5, Issue 4; 196-206. "The determination of relationships among samples is a task to which Artificial Intelligence is increasingly being applied. In this paper we investigate the Self-Organizing Map (SOM), whose role is to perform just this kind of task; in other words, to cluster data samples so as to reveal the relationships that exist among them."

  • More resources are available from Dr H.M. Cartwright's home page and research group page at the Physical & Theoretical Chemistry Laboratory, University of Oxford.

Artificial Intelligence and Scientific Creativity. By Simon Colton and Graham Steel, Division of Informatics, University of Edinburgh. "Papers presented at [the 1999 AISB Symposium on AI and Scientific Creativity, which took place in Edinburgh, Scotland] addressed the theoretical aspects of and computational possibilities for machine creativity. They also reported on systems implemented to achieve automated discovery in science. The intention of the symposium was that the papers proposing models of scientific creativity would help researchers concerned with implementing discovery programs, and the papers discussing the successes and techniques employed in working systems will help researchers extract general frameworks for scientific machine discovery. This note is a survey of current research on creativity in science, and in particular the automation of discovery tasks in science."

Recent Work in Computational Scientific Discovery. By Lindley Darden. In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society. Michael Shafto and Pat Langley (Eds.). Mahwah, New Jersey: Lawrence Erlbaum, 1997, pp. 161-166. "The study of computational scientific discovery emerged from the view that science is a problem solving activity, that heuristics for problem solving can be applied to the study of scientific discovery in either historical or contemporary cases, and that methods in artificial intelligence provide techniques for building computational systems. Pioneers in this work are Bruce Buchanan (e.g., 1982) and Herbert Simon (e.g., 1977)."

  • Also by Lindley Darden (1998): Anomaly-Driven Theory Redesign: Computational Philosophy of Science Experiments. In T.W. Bynum and J.H. Moor, The Digital Phoenix: How Computers are Changing Philosophy. New York: Blackwell Publishers, pp. 62-78. "I have been asked to discuss how computers have affected my work in philosophy. This paper discusses the use of artificial intelligence (AI) models to investigate both the representation of scientific knowledge and reasoning strategies for scientific change. The focus is on the reasoning strategies used to revise a theory, given an anomaly, which is a failed prediction of the theory."

The computer revolution in science: Steps towards the realization of computer-supported discovery environments. By H. de Jong and A. Rip. (1997). Artificial Intelligence, 91(2). "The tools that scientists use in their search processes together form so-called discovery environments. The promise of artificial intelligence and other branches of computer science is to radically transform conventional discovery environments by equipping scientists with a range of powerful computer tools including large-scale, shared knowledge bases and discovery programs." -from the Abstract.

The Computer-Aided Discovery of Scientific Knowledge. By Pat Langley. 1998. Proceedings of the First International Conference on Discovery Science. "In this paper, we review AI research on computational discovery and its recent application to the discovery of new scientific knowledge. ... As evidence for the advantages of such human-computer cooperation, we report seven examples of novel, computer-aided discoveries that have appeared in the scientific literature...."

  • More of Pat Langley's publications can be found in his Computational Scientific Discovery collection which begins with this historical note: "I became fascinated with the nature of scientific discovery as an undergraduate at TCU, and the interest has remained to this day. My dissertation work at CMU focused on Bacon, an AI system that rediscovered numeric laws from the history of physics. Herbert Simon served as my advisor and contributed many ideas to the effort. Gary Bradshaw and I extended the system to handle additional laws, including ones from the history of chemistry. After Jan Zytkow joined our group, we developed new systems (Stahl and Dalton) that dealt with the discovery of qualitative laws and structural models. This CMU work forms the basis of my early publications on scientific discovery...."

Towards 2020 Science. Produced under the aegis of Microsoft Research Cambridge (2006). "In the summer of 2005, an international expert group was brought together for a workshop to define and produce a new vision and roadmap of the evolution, challenges and potential of computer science and computing in scientific research in the next fifteen years. The resulting document, Towards 2020 Science, sets out the challenges and opportunities arising from the increasing synthesis of computing and the sciences." In addition to the report and the roadmap, be sure to see the related, special issue of Nature.

Introducing robo-scientist - Could robots take over from graduate students in the lab? By Mark Peplow. Nature (January 15, 2004). "A robot scientist has been unveiled that can formulate theories, carry out experiments and interpret results - all more cheaply than its human counterparts. As far as artificial intelligence goes, the Robot Scientist - designed by Ross King of the University of Wales in Aberystwyth, UK, and his colleagues - isn't as smart as other computers, such as those that compete in international chess competitions. But combining the smarts of a computer with the agility of a robot wasn't trivial. ... Geneticist Stephen Oliver of the University of Manchester, UK, who helped to select the robot's research project, says there is potential for the robot to do more than just drudgery. 'The next big step is to make our robot discover something completely new,' says Oliver, 'perhaps by applying it to drug discovery.'"

  • The journal article: Oliver, S. G. et al. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427, 247 - 252, doi:10.1038/nature02236 (2004).
  • And consider this: A Robot Scientist - As ye sow... A machine can now do science. The Economist (January 15, 2004). "One question is, if their robot does make an important discovery, will it be eligible to win a Nobel prize?"

Editorial: Scientific Discovery and Simplicity of Method. By Herbert A. Simon, Raul E. Valdes-Perez and Derek H. Sleeman. (1997). Artificial Intelligence, 91(2):177-181. "[C]omplexity of programs or of their outputs is not a measure of their 'intelligence'. Given very complex tasks, complex algorithms may be a necessity, but they are clearly not a virtue. A critical lesson of artificial intelligence, and of computing in general, is that if a task domain has strong structure and if sufficient domain information can be obtained, either a priori or in the course of computation, then rather simple programs may suffice."

Systematic Methods of Scientific Discovery: Papers from the 1995 Spring Symposium, ed. Raul Valdes-Perez. Technical Report SS-95-03. American Association for Artificial Intelligence, Menlo Park, California. Here are just some of the papers you'll find in this collection:

  • Herbert A. Simon's What is a Systematic Method of Scientific Discovery?
  • Pat Langley's Stages in the Process of Scientific Discovery.
  • Joshua S. Lederberg's Notes on Systematic Hypothesis Generation, and Application to Disciplined Brainstorming.

Some Recent Human-Computer Discoveries in Science and What Accounts for Them. By Raul E. Valdes-Perez. AI Magazine 16(3): Fall 1995, 37-44. "My collaborators and I have recently reported in domain science journals several human-computer discoveries in biology, chemistry, and physics. One might ask what accounts for these findings, for example, whether they share a common pattern. My conclusion is that each finding involves a new representation of the scientific task: The problem spaces searched were unlike previous task problem spaces. Such new representations need not be wholly new to the history of science; rather, they can draw on useful representational pieces from elsewhere in natural or computer science. This account contrasts with earlier explanations of machine discovery based on the expert system view. My analysis also suggests a broader potential role for (AI) computer scientists in the practice of natural science."

Neural Networks Meet CombiChem. By Emil Venere. Bio.com (January 22, 2002). "The different types of software work together in a repeating two-phase cycle of discovery. First, hybrid neural networks analyze the formulas of the numerous catalysts, or other materials, created by the parallel technique. The neural networks determine the properties of the materials, based on their chemical structures. In the second phase, genetic algorithms cull the best materials and eliminate the poor performers, just like survival of the fittest. The algorithms also generate 'mutations' of the best materials to create even better versions, and the software determines the chemical structures of those mutations. The resulting formulas are returned to the neural network software, and the cycle starts over again, progressively creating better and better materials, said Venkat Venkatasubramanian, a professor of chemical engineering who has been working with Caruthers to develop the software for more than a decade. [James M.] Caruthers said he observes how formulation chemists come up with new ideas. Then he models their trains of thought in software programs."
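
The two-phase cycle in that excerpt (a model scores candidates, then a genetic algorithm culls and mutates them) can be sketched in a few lines. Everything here is an assumption for illustration: `forward_model` is a fixed formula standing in for a trained hybrid neural network, a "formula" is just a pair of numbers, and the selection and mutation settings are arbitrary. It is not the Purdue software.

```python
import random

def forward_model(formula):
    # Stand-in for a trained neural network (an assumption): predicts a
    # performance score that peaks when the formula is (0.3, 0.7).
    x, y = formula
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)

def evolve(population, generations, rng, keep=5, sigma=0.05):
    """Repeat the cycle: score every candidate, keep the best, mutate them."""
    history = []
    for _ in range(generations):
        # Phase 1: the forward model predicts each candidate's performance.
        ranked = sorted(population, key=forward_model, reverse=True)
        # Phase 2: cull the poor performers and mutate the survivors.
        survivors = ranked[:keep]
        history.append(forward_model(survivors[0]))
        mutants = [
            tuple(gene + rng.gauss(0.0, sigma) for gene in rng.choice(survivors))
            for _ in range(len(population) - keep)
        ]
        population = survivors + mutants
    best = max(population, key=forward_model)
    return best, history

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [(rng.random(), rng.random()) for _ in range(20)]
best, history = evolve(start, generations=30, rng=rng)
print(best, forward_model(best))
```

Because the survivors are carried over unchanged into the next generation, the best predicted score never gets worse from one cycle to the next.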

Text-Based Discovery in Biomedicine: The Architecture of the DAD-system. By M. Weeber, H. Klein, A. R. Aronson, J. G. Mork, L. Jong-van den Berg, and R. Vos. Presented at The American Medical Informatics Association 2000 Symposium. "Current scientific research takes place in highly specialized contexts with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful for the other without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based Natural Language Processing system for PubMed citations that provides the biomedical researcher such a tool."

Related Web Sites

"ARROWSMITH is interactive software that extends the power of a MEDLINE search. It operates on the output of a conventional search in a way that helps the user see new relationships and form and assess novel scientific hypotheses. It is based on the premise that information developed in one area of research can be of value in another without anyone being aware of the fact." At this site, which is maintained by Don R. Swanson at The University of Chicago, you'll find articles and manuals that show you how it works.

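The premise quoted above, that findings in one research area can matter to another without anyone noticing, can be illustrated by looking for terms that two separate literatures share. The sketch below is not ARROWSMITH's implementation; the function name `linking_terms`, the scoring rule, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

def linking_terms(records_a, records_c, topic_a, topic_c):
    """Rank intermediate terms that occur in both literatures.

    records_a / records_c: lists of term sets, one set per article title.
    """
    counts_a = Counter(t for rec in records_a for t in rec if t != topic_a)
    counts_c = Counter(t for rec in records_c for t in rec if t != topic_c)
    shared = set(counts_a) & set(counts_c)
    # Score a shared term by its weaker side, so it needs support in both
    # literatures; ties are broken alphabetically.
    return sorted(shared, key=lambda t: (-min(counts_a[t], counts_c[t]), t))

# Invented toy records, not real search output.
literature_a = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "blood viscosity", "triglycerides"},
]
literature_c = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "vasoconstriction"},
    {"raynaud", "platelet aggregation"},
    {"raynaud", "blood viscosity"},
]

links = linking_terms(literature_a, literature_c, "fish oil", "raynaud")
print(links)
```

Terms that appear in only one of the two literatures are dropped, which leaves a short list of candidate connections for a researcher to assess.
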
Imperial College Computational Bioinformatics Laboratory (CBL):

Related Pages

More Readings

Scientific Discovery - Computational Explorations of the Creative Processes. By Pat Langley, Herbert A. Simon, Gary L. Bradshaw and Jan M. Zytkow. The MIT Press (February 1987). "Using the methods and concepts of contemporary information-processing psychology (or cognitive science) the authors develop a series of artificial-intelligence programs that can simulate the human thought processes used to discover scientific laws. The programs - BACON, DALTON, GLAUBER, and STAHL - are all largely data-driven, that is, when presented with series of chemical or physical measurements they search for uniformities and linking elements, generating and checking hypotheses and creating new concepts as they go along."
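
A data-driven search for "uniformities" of the kind that description mentions can be illustrated with a small example. This is not BACON's heuristic procedure; it is a brute-force sketch, assuming the search space is limited to terms of the form distance^p / period^q with small integer exponents, run on approximate textbook values for five planets.

```python
from itertools import product

# Approximate textbook values: (distance from the Sun in AU, period in years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
}

def relative_spread(values):
    """How far the values are from being constant, relative to their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

def find_invariant(observations, max_exponent=3):
    """Try every term distance**p / period**q and return the most nearly
    constant one as ((p, q), spread)."""
    best = None
    for p, q in product(range(1, max_exponent + 1), repeat=2):
        values = [d ** p / t ** q for d, t in observations]
        spread = relative_spread(values)
        if best is None or spread < best[1]:
            best = ((p, q), spread)
    return best

exponents, spread = find_invariant(list(PLANETS.values()))
print(exponents, round(spread, 4))
```

On these measurements the most nearly constant term is distance cubed over period squared, which is Kepler's third law.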

Molecular Treasure Hunt - A software tool elicits previously undiscovered gene or protein pathways by combing through hundreds of thousands of journal articles. By Gary Stix. Scientific American (May 2005; subscription req'd.). "When Andrey Rzhetsky arrived at Columbia University as a research scientist in 1996, the first project he collaborated on involved a literature search to try to understand why white blood cells called lymphocytes do not die in chronic lymphocytic leukemia. The mathematician-biologist found a few hundred articles on apoptosis (programmed cell death) and the cancer.... The experience led him to an idea that would have made his job on that first project much easier: an automated search tool that could supplant the mind-numbing task of finding and reading all the literature. But it also might do much more; it could even let a machine conduct research on its own, discovering the patterns among the data much as a human would do...."

FYI: As explained in this announcement, on March 1, 2007 AAAI changed its name from the American Association for Artificial Intelligence to the Association for the Advancement of Artificial Intelligence.
