
Appendix E

Edward A. Feigenbaum. “The art of artificial intelligence – Themes and case studies of knowledge engineering,” pp 227-240, in the Proceedings of the National Computer Conference 1978, copyrighted 1978. Reproduced by permission of AFIPS Press.

The art of artificial intelligence—Themes and case studies of knowledge engineering

by EDWARD A. FEIGENBAUM Stanford University Stanford, California


This paper will examine emerging themes of knowledge engineering, illustrate them with case studies drawn from the work of the Stanford Heuristic Programming Project, and discuss general issues of knowledge engineering art and practice.

Let me begin with an example new to our workbench: a system called PUFF, the early fruit of a collaboration between our project and a group at the Pacific Medical Center (PMC) in San Francisco.*

A physician refers a patient to PMC's pulmonary function testing lab for diagnosis of possible pulmonary function disorder. For one of the tests, the patient inhales and exhales a few times in a tube connected to an instrument/computer combination. The instrument acquires data on flow rates and volumes, the so-called flow-volume loop of the patient's lungs and airways. The computer measures certain parameters of the curve and presents them to the diagnostician (physician or PUFF) for interpretation. The diagnosis is made along these lines: normal or diseased; restricted lung disease or obstructive airways disease or a combination of both; the severity; the likely disease type(s) (e.g., emphysema, bronchitis, etc.); and other factors important for diagnosis.

PUFF is given not only the measured data but also certain items of information from the patient record, e.g., sex, age, number of pack-years of cigarette smoking. The task of the PUFF system is to infer a diagnosis and print it out in English in the normal medical summary form of the interpretation expected by the referring physician.

Everything PUFF knows about pulmonary function diagnosis is contained in (currently) 55 rules of the IF... THEN... form. No textbook of medicine currently records these rules. They constitute the partly-public, partly-private knowledge of an expert pulmonary physiologist at PMC, and were extracted and polished by project engineers working intensively with the expert over a period of time. Here is an example of a PUFF rule (the unexplained acronyms refer to various data measurements):

RULE 31

IF:
1) The severity of obstructive airways disease of the patient is greater than or equal to mild, and
2) The degree of diffusion defect of the patient is greater than or equal to mild, and
3) The tlc (body box) observed/predicted of the patient is greater than or equal to 110, and
4) The observed-predicted difference in rv/tlc of the patient is greater than or equal to 10

THEN:
1) There is strongly suggestive evidence (.9) that the subtype of obstructive airways disease is emphysema, and
2) It is definite (1.0) that "OAD, Diffusion Defect, elevated TLC, and elevated RV together indicate emphysema." is one of the findings.
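To make the IF... THEN... form concrete, RULE 31 can be sketched as a small Python function. This is only an illustration, not PUFF's actual representation: the field names, the ordering of severity levels, and the patient values are assumptions, and each premise is treated as simply true or false rather than carrying a certainty of its own.

```python
# Illustrative encoding of RULE 31; all identifiers are hypothetical.
# Assumed ordinal scale: the source names only some of these levels.
SEVERITY = ["none", "mild", "moderate", "moderately-severe", "severe"]


def at_least(level, threshold):
    """True if `level` is at or above `threshold` on the assumed scale."""
    return SEVERITY.index(level) >= SEVERITY.index(threshold)


def rule_31(case):
    """Return RULE 31's conclusions with their certainty values, or None."""
    premises = [
        at_least(case["oad_severity"], "mild"),
        at_least(case["diffusion_defect"], "mild"),
        case["tlc_body_box_obs_over_pred"] >= 110,
        case["rv_tlc_obs_minus_pred"] >= 10,
    ]
    if not all(premises):
        return None
    return {
        "oad_subtype": ("emphysema", 0.9),
        "finding": (
            "OAD, Diffusion Defect, elevated TLC, and elevated RV "
            "together indicate emphysema.",
            1.0,
        ),
    }


# Made-up patient values, chosen only so that every premise holds.
example_case = {
    "oad_severity": "moderately-severe",
    "diffusion_defect": "mild",
    "tlc_body_box_obs_over_pred": 127,
    "rv_tlc_obs_minus_pred": 21,
}
print(rule_31(example_case)["oad_subtype"])
```

Keeping each rule as a separate, self-contained unit like this is what allows rules to be added one at a time and tested against new cases.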

One hundred cases, carefully chosen to span the variety of disease states with sufficient exemplary information for each, were used to extract the 55 rules. As the knowledge emerged, it was represented in rule form, added to the system and tested by running additional cases. The expert was sometimes surprised, sometimes frustrated, by the occasional gaps and inconsistencies in the knowledge, and the incorrect diagnoses that were logical consequences of the existing rule set. The interplay between knowledge engineer and expert gradually expanded the set of rules to remove most of these problems.

* Dr. J. Osborn, Dr. R. Fallat, John Kunz, Diane McClung.

As cumulation of techniques in the art demands and allows, a new tool was not invented when an old one would do. The knowledge engineers pulled out of their toolkit a version of the MYCIN system (to be discussed later), with the rules about infectious diseases removed, and used it as the inference engine for the PUFF diagnoses. Thus PUFF, like MYCIN, is a relatively simple backward-chaining inference system. It seeks a valid line-of-reasoning based on its rules and rooted in the instrument and patient data. With a little more work at fitting some existing tools together, PUFF will be able to explain this line-of-reasoning, just as MYCIN does.
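The backward-chaining strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MYCIN inference engine: the rule and fact names are hypothetical, certainty values are omitted, and the rule set is assumed to contain no cycles.

```python
# Each rule is (conclusion, [premises]); all names are hypothetical.
RULES = [
    ("emphysema", ["oad", "diffusion_defect", "elevated_tlc", "elevated_rv"]),
    ("oad", ["reduced_flow_rates"]),
]


def prove(goal, facts, rules, trace):
    """Backward-chain from `goal` toward the given data.

    Returns True if `goal` is a known fact or follows from some rule whose
    premises can all be proved in turn. Each rule that fires is appended to
    `trace`, giving a record of the line of reasoning.
    Assumes an acyclic rule set; a cycle would recurse forever.
    """
    if goal in facts:
        return True
    for conclusion, premises in rules:
        if conclusion == goal and all(
            prove(p, facts, rules, trace) for p in premises
        ):
            trace.append((conclusion, premises))
            return True
    return False


# Made-up observations standing in for instrument and patient data.
facts = {"reduced_flow_rates", "diffusion_defect", "elevated_tlc", "elevated_rv"}
trace = []
print(prove("emphysema", facts, RULES, trace))
for conclusion, premises in trace:
    print(conclusion, "because", ", ".join(premises))
```

The trace list hints at why explanation comes almost for free in such a system: the rules that fired, in order, are the line of reasoning.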

As it is, PUFF only prints out the final interpretation, of which the following is an example:

The degree of dyspnea: MODERATELY-SEVERE
The severity of coughing: MILD
Sputum production: MODERATELY-SEVERE
The number of pack-years of smoking: 48
referral diagnosis: BRONCHITIS
IVC/IVC-predicted: 80
RV/RV-predicted: 191
FVC/FVC-predicted: 87
TLC(body box)observed/predicted: 127
Predicted FEV1/FVC: 83
TLC(DLCO)observed/predicted: 83
FEV1/FVC ratio: 50
RV/TLC Observed-Predicted: 21
MMF/MMF-predicted: 19
the DLCO/DLCO-predicted: 48
The slope (F50_obs-F25_obs)/FVC_obs: 19

DEGREE OF OBSTRUCTIVE AIRWAYS DISEASE:
OAD degree by SLOPE: (MODERATELY-SEVERE 700)
OAD degree by MMF: (SEVERE 900)
OAD degree by FEV1: (MODERATELY-SEVERE 700)
FINAL OAD DEGREE: (MODERATELY-SEVERE 910) (SEVERE 900)
No conflict. Final degree: (MODERATELY-SEVERE 910)

INTERPRETATION:
Obstruction is indicated by curvature of the flow-volume loop.
Forced Vital Capacity is normal and peak flow rates are reduced, suggesting airway obstruction.
Flow rate from 25-75 of expired volume is reduced, indicating severe airway obstruction.
OAD, Diffusion Defect, elevated TLC, and elevated RV together indicate emphysema.
OAD, Diffusion Defect, and elevated RV indicate emphysema.
Change in expired flow rates following bronchodilation shows that there is reversibility of airway obstruction.
The presence of a productive cough is an indication that the OAD is of the bronchitic type.
Elevated lung volumes indicate overinflation.
Air trapping is indicated by the elevated difference between observed and predicted RV/TLC ratios.
Improvement in airway resistance indicates some reversibility of airway
Airway obstruction is consistent with the patient's smoking history.
The airway obstruction accounts for the patient's dyspnea.
Although bronchodilators were not useful in this one case, prolonged use may prove to be beneficial to the patient.
The reduced diffusion capacity indicates airway obstruction of the mixed bronchitic and emphysematous types.
Low diffusing capacity indicates loss of alveolar capillary surface.
Obstructive Airways Disease of mixed types

150 cases not studied during the knowledge acquisition process were used for a test and validation of the rule set. PUFF inferred a diagnosis for each. PUFF-produced and expert-produced interpretations were coded for statistical analysis to discover the degree of agreement. Over various types of disease states, and for two conditions of match between human and computer diagnoses ("same degree of severity" and "within one degree of severity"), agreement ranged between approximately 30 percent and 100 percent.

The PUFF story is just beginning and will be told perhaps at a later NCC. The surprising punchline to my synopsis is that the current state of the PUFF system as described above was achieved in less than 50 hours of interaction with the expert and less than 10 man-weeks of effort by the knowledge engineers. We have learned much in the past decade of the art of engineering knowledge-based intelligent agents!

In the remainder of this essay, I would like to discuss the route that one research group, the Stanford Heuristic Programming Project, has taken, illustrating progress with case studies, and discussing themes of the work.

The dichotomy that was used to classify the collected papers in the volume Computers and Thought still characterizes well the motivations and research efforts of the AI community. First, there are some who work toward the construction of intelligent artifacts, or seek to uncover principles, methods, and techniques useful in such construction. Second, there are those who view artificial intelligence as (to use Newell's phrase) "theoretical psychology," seeking explicit and valid information processing models of human thought.

For purposes of this essay, I wish to focus on the motivations of the first group, these days by far the larger of the two. I label these motivations "the intelligent agent viewpoint" and here is my understanding of that viewpoint:

"The potential uses of computers by people to accomplish tasks can be 'one-dimensionalized' into a spectrum representing the nature of instruction that must be given the computer to do its job. Call it the WHAT-to-HOW spectrum. At one extreme of the spectrum, the user supplies his intelligence to instruct the machine with precision exactly HOW to do his job, step-by-step. Progress in Computer Science can be seen as steps away from the extreme 'HOW' point on the spectrum: the familiar panoply of assembly languages, subroutine libraries, compilers, extensible languages, etc. At the other extreme of the spectrum is the user with his real problem (WHAT he wishes the computer, as his instrument, to do for him). He aspires to communicate WHAT he wants done in a language that is comfortable to him (perhaps English); via communication modes that are convenient for him (including perhaps, speech or pictures); with some generality, some vagueness, imprecision, even error; without having to lay out in detail all necessary subgoals for adequate performance, with reasonable assurance that he is addressing an intelligent agent that is using knowledge of his world to understand his intent, to fill in his vagueness, to make specific his abstractions, to correct his errors, to discover appropriate subgoals, and ultimately to translate WHAT he really wants done into processing steps that define HOW it shall be done by a real computer. The research activity aimed at creating computer programs that act as "intelligent agents" near the WHAT end of the WHAT-TO-HOW spectrum can be viewed as the long-range goal of AI research." (Feigenbaum, 1974)

Our young science is still more art than science. Art: "the principles or methods governing any craft or branch of learning." Art: "skilled workmanship, execution, or agency." These the dictionary teaches us. Knuth tells us that the endeavor of computer programming is an art, in just these ways. The art of constructing intelligent agents is both part of and an extension of the programming art. It is the art of building complex computer programs that represent and reason with knowledge of the world. Our art therefore lives in symbiosis with the other worldly arts, whose practitioners, experts of their art, hold the knowledge we need to construct intelligent agents. In most "crafts or branches of learning" what we call "expertise" is the essence of the art. And for the domains of knowledge that we touch with our art, it is the "rules of expertise" or the rules of "good judgment" of the expert practitioners of that domain that we seek to transfer to our programs.

Lessons of the past

Two insights from previous work are pertinent to this essay.

The first concerns the quest for generality and power of the inference engine used in the performance of intelligent acts (what Minsky and Papert [see Goldstein and Papert, 1977] have labeled "the power strategy"). We must hypothesize from our experience to date that the problem solving power exhibited in an intelligent agent's performance is primarily a consequence of the specialist's knowledge employed by the agent, and only very secondarily related to the generality and power of the inference method employed. Our agents must be knowledge-rich, even if they are methods-poor. In 1970, reporting the first major summary-of-results of the DENDRAL program (to be discussed later), we addressed this issue as follows:

". . . general problem-solvers are too weak to be used as the basis for building high-performance systems. The behavior of the best general problem-solvers we know, human problem-solvers, is observed to be weak and shallow, except in the areas in which the human problem-solver is a specialist. And it is observed that the transfer of expertise between specialty areas is slight. A chess master is unlikely to be an expert algebraist or an expert mass spectrum analyst, etc. In this view, the expert is the specialist, with a specialist's knowledge of his area and a specialist's methods and heuristics." (Feigenbaum, Buchanan and Lederberg, 1971, p. 187)

Subsequent evidence from our laboratory and all others has only confirmed this belief.

AI researchers have dramatically shifted their view on generality and power in the past decade. In 1967, the canonical question about the DENDRAL program was: "It sounds like good chemistry, but what does it have to do with AI?" In 1977, Goldstein and Papert write of a paradigm shift in AI:

"Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use and interaction." (Goldstein and Papert, 1977)

The second insight from past work concerns the nature of the knowledge that an expert brings to the performance of a task. Experience has shown us that this knowledge is largely heuristic knowledge, experiential, uncertain, mostly "good guesses" and "good practice," in lieu of facts and rigor. Experience has also taught us that much of this knowledge is private to the expert, not because he is unwilling to share publicly how he performs, but because he is unable. He knows more than he is aware of knowing. (Why else is the Ph.D. or the Internship a guild-like apprenticeship to a presumed "master of the craft"? What the masters really know is not written in the textbooks of the masters.) But we have learned also that this private knowledge can be uncovered by the careful, painstaking analysis of a second party, or sometimes by the expert himself, operating in the context of a large number of highly specific performance problems. Finally, we have learned that expertise is multifaceted, that the expert brings to bear many and varied sources of knowledge in performance. The approach to capturing his expertise must proceed on many fronts simultaneously.
