« PreviousContinue »
embody the given rules of inference). If one is found, the problem is solved. Or the search might fail completely. But finally, the search may yield one or more "problems” which are usually propositions from which I may be deduced directly. If one of these can, in turn, be proved a theorem the main problem will be solved. (The situation is actually slightly more complex.) Each such subproblem is adjoined to the "subproblem list” (after a limited preliminary attempt) and LT works around to it later. The full power of LT, such as it is, can be applied to each subproblem, for LT can use itself as a subroutine in a recursive fashion.
The heuristic technique of working backward yields something of a teleological process, and LT is a forerunner of more complex systems which construct hierarchies of goals and subgoals. Even so, the basic administrative structure of the program is no more than a nested set of searches through lists in memory. We shall first outline this structure and then mention a few heuristics that were used in attempts to improve performance. 1. Take the next problem from problem list.
(If there are no more problems, EXIT with total failure.) 2. Choose the next of the three basic Methods.
(If no more methods, go to 1.) 3. Choose the next member of the list of axioms and previous theorems.
(If no more, go to 2.)
If new subproblem arises, go to 4.
If problem is solved, EXIT with complete proof.
go to 3.
Among the heuristics that were studied were (1) a similarity test. to reduce the work in step 4 (which includes another search through the theorem list), (2) a simplicity test to select apparently easier problems from the problem list, and (3) a strong nonprovability test to remove from the problem list expressions which are probably false and hence not provable. In a series of experiments "learning" was used to find which earlier theorems had been most useful and should be given priority in step 3. We cannot review the effects of these changes in detail. Of interest was the balance between the extra cost for administration of certain heuristics and the resultant search reduction; this balance was quite delicate in some cases when computer memory became saturated. The system seemed to be quite sensitive to the training sequence—the order in which problems were given. And some heuristics which gave no significant over-all improvement did nevertheless affect the class of solvable problems. Curiously enough, the general efficiency of LT was not greatly improved by any or all of these devices. But all this practical experience is reflected in the design of the much more sophisticated “GPS” system described briefly in Sec. IVD2.
Wang (1960) has criticized the LT project on the grounds that there exist, as he and others have shown, mechanized proof methods which, for the particular run of problems considered, use far less machine effort than does LT and which have the advantage that they will ultimately find a proof for any provable proposition. (LT does not have this exhaustive “decision procedure” character and can fail ever to find proofs for some theorems.) The authors of “Empirical Explorations of the Logic Theory Machine,” perhaps unaware of the existence of even moderately efficient exhaustive methods, supported their arguments by comparison with a particularly inefficient exhaustive procedure. Nevertheless, I feel that some of Wang's criticisms are misdirected. He does not seem to recognize that the authors of LT are not so much interested in proving these theorems as they are in the general problem of solving difficult problems. The combinatorial system of Russell and Whitehead (with which LT deals) is far less simple and elegant than the system used by Wang.24 (Note, e.g., the emphasis in Newell, Shaw and Simon (1958a, 1958b).] Wang's problems, while logically equivalent, are formally much simpler. His methods do not include any facilities for using previous results (hence they are sure to degrade rapidly at a certain level of problem complexity), while LT is fundamentally oriented around this problem. Finally, because of the very effectiveness of Wang's method on the particular set of theorems in question, he simply did not have to face the fundamental heuristic problem of when to decide to give up on a line of attack. Thus the formidable performance of his program (1960) perhaps diverted his attention from heuristic problems that must again spring up when real mathematics is ultimately encountered.
This is not meant as a rejection of the importance of Wang's work and discussion. He and others working on “mechanical mathematics” have discovered that there are proof procedures which are much more efficient than has been suspected. Such work will unquestionably help in constructing intelligent machines, and these procedures will certainly be preferred, when available, to "unreliable heuristic methods.” Wang, Davis and
* Wang's procedure (1960a), too, works backward, and can be regarded as a generalization of the method of “falsification” for deciding truth-functional tautology. In Wang (1960b) and its unpublished sequel, he introduces more powerful methods (for much more difficult problems).
Putnam, and several others are now pushing these new techniques into the far more challenging domain of theorem proving in the predicate calculus (for which exhaustive decision procedures are no longer available). We have no space to discuss this area, 25 but it seems clear that a program to solve real mathematical problems will have to combine the mathematical sophistication of Wang with the heuristic sophistication of Newell, Shaw and Simon.26
B. Heuristics for Subproblem Selection In designing a problem-solving system, the programmer often comes equipped with a set of more or less distinct “Methods”-his real task is to find an efficient way for the program to decide where and when the different methods are to be used.
Methods which do not dispose of a problem may still transform it to create new problems or subproblems. Hence, during the course of solving one problem we may become involved with a large assembly of interrelated subproblems. A “parallel” computer, yet to be conceived, might work on many at a time. But even the parallel machine must have procedures to allocate its resources because it cannot simultaneously apply all its methods to all the problems. We shall divide this administrative problem into two parts: the selection of those subproblem(s) which seem most critical, attractive, or otherwise immediate, and, in the next section, the choice of which method to apply to the selected problem.
In the basic program for LT (Sec. IVA), subproblem selection is very simple. New problems are examined briefly and (if not solved at once) are placed at the end of the (linear) problem list. The main program proceeds along this list (step 1), attacking the problems in the order of their generation. More powerful systems will have to be more judicious (both in generation and selection of problems) for only thus can excessive branching be restrained.27 In more complex systems we can expect to consider for each subproblem, at least these two aspects: (1) its apparent “centrality”—how will its solution promote the main goal, and (2) its apparent “difficulty”—how much effort is it liable to consume. We need heuristic methods to estimate each of these quantities and, further, to
* See Davis and Putnam (1960), and Wang (1960b).
* All these efforts are directed toward the reduction of search effort. In that sense they are all heuristic programs. Since practically no one still uses “heuristic” in a sense opposed to “algorithmic,” serious workers might do well to avoid pointless argument on this score. The real problem is to find methods which significantly delay the apparently inevitable exponential growth of search trees.
* Note that the simple scheme of LT has the property that each generated problem will eventually get attention, even if several are created in a step 3. If one were to turn full attention to each problem, as generated, one might never return to alternate branches.
select accordingly one of the problems and allocate to it some reasonable quantity of effort.28 Little enough is known about these matters, and so it is not entirely for lack of space that the following remarks are somewhat cryptic.
Imagine that the problems and their relations are arranged to form some kind of directed-graph structure (Minsky, 1956b; Newell and Simon, 1956b; Gelernter and Rochester, 1958). The main problem is to establish a “valid” path between two initially distinguished nodes. Generation of new problems is represented by the addition of new, not-yet-valid paths, or by the insertion of new nodes in old p?ths. Then problems are represented by not-yet-valid paths, and "centrality" by location in the structure. Associate with each connection, quantities describing its current validity state (solved, plausible, doubtful, etc.) and its current estimated difficulty.
1. GLOBAL METHODS The most general problem-selection methods are "global”—at each step they look over the entire structure. There is one such simple scheme which works well on at least one rather degenerate interpretation of our problem graph. This is based on an electrical analogy suggested to us by a machine designed by Shannon (related to one described in Shannon (1955) which describes quite a variety of interesting game-playing and learning machines] to play a variant of the game marketed as “Hex” (and known among mathematicians as “Nash”). The initial board position can be represented as a certain network of resistors. (See Fig. 11.) One player's goal is to construct a short-circuit path between two given boundaries; the opponent tries to open the circuit between them. Each move consists of shorting (or opening), irreversibly, one of the remaining resistors. Shannon's machine applies a potential between the boundaries and selects that resistor which carries the largest current. Very roughly speaking, this resistor is likely to be most critical because changing it will have the largest effect on the resistance of the net and, hence, in the goal direction of shorting (or opening) the circuit. And although this argument is not perfect, nor is this a perfect model of the real combinatorial situation, the machine does play extremely well. (It can make unsound moves in certain artificial situations, but no one seems to have been able to force this during a game.)
The use of such a global method for problem selection requires that the available “difficulty estimates” for related subproblems be arranged to
* One will want to see if the considered problem is the same as one already considered, or very similar. See the discussion in Gelernter and Rochester (1958). This problem might be handled more generally by simply remembering the (Characters of) problems that have been attacked, and checking new ones against this memory, e.g., by methods of Mooers (1956), looking more closely if there seems to be a match.
Figure 11. This board game (due to C. E. Shannon) is played on a network of equal resistors. The first player's goal is to open the circuit between the end points; the second player's goal is to short the circuit. A move consists of opening or shortening a resistor. If the first player begins by opening resistor 1, the second player might counter by shorting resistor 4, following the strategy described in the text. The remaining move pairs (if both players use that strategy) would be (5, 8) (9, 13) (12, 10 or 2) (2 or 10 win). In this game the first player should be able to force a win, and the maximum-current strategy seems always to do so, even on larger networks.
combine in roughly the manner of resistance values. Also, we could regard this machine as using an “analog model” for “planning.” (See Sec. IVD.) 29
2. LOCAL, AND "HEREDITARY,” METHODS The prospect of having to study at each step the whole problem structure is discouraging, especially since the structure usually changes only slightly after each attempt. One naturally looks for methods which merely update or modify a small fragment of the stored record. Between the extremes of the “first-come-first-served” problem-list method and the full global-survey methods, lie a variety of compromise techniques. Perhaps the most attractive of these are what we will call the Inheritance methods—essentially recursive devices.
In an Inheritance method, the effort assigned to a subproblem is determined only by its immediate ancestry; at the time each problem is created it is assigned a certain total quantity Q of time or effort. When a problem is later split into subproblems, such quantities are assigned to them by some local process which depends only on their relative merits and on what remains of Q. Thus the centrality problem is managed implicitly. Such schemes are quite easy to program, especially with the new programming systems such as IPL (Newell and Tonge, 1960c) and LISP (McCarthy, 1960) (which are themselves based on certain hereditary or recursive operations). Special cases of the Inheritance method arise when one can get along with a simple all-or-none Q, e.g., a “stop condition"—this yields the
* A variety of combinatorial methods will be matched against the network-analogy opponent in a program being completed by R. Silver, Lincoln Laboratory, MIT, Lexington, Mass.