« PreviousContinue »
exploratory method called "backtracking” by Golumb (1961). The decoding procedure of Wozencraft (1961) is another important variety of Inheritance method.
In the complex exploration process proposed for chess by Newell, Shaw, and Simon (1958b) we have a form of Inheritance method with a nonnumerical stop condition. Here, the subproblems inherit sets of goals to be achieved. This teleological control has to be administered by an additional goal-selection system and is further complicated by a global (but reasonably simple) stop rule of the back-up variety (Sec. IIIC). (Note: we are identifying here the move-tree-limitation problem with that of problem selection.) Even though extensive experimental results are not yet available, we feel that the scheme of Newell, Shaw, and Simon (19586) deserves careful study by anyone planning serious work in this area. It shows only the beginning of the complexity sure to come in our development of intelligent machines.30
C. "Character-Method” Machines Once a problem is selected, we must decide which method to try first. This depends on our ability to classify or characterize problems. We first compute the Character of our problem (by using some pattern recognition technique) and then consult a “Character-Method” table or other device which is supposed to tell us which method(s) are most effective on problems of that Character. This information might be built up from experience, given initially by the programmer, deduced from “advice” (McCarthy, 1959), or obtained as the solution to some other problem, as suggested in the GPS proposal (Newell, Shaw and Simon, 1959a). In any case, this part of the machine's behavior, regarded from the outside, can be treated as a sort of stimulus-response, or “table look-up,” activity.
If the Characters (or descriptions) have too wide a variety of values, there will be a serious problem of filling a Character-Method table. One might then have to reduce the detail of information, e.g., by using only a few important properties. Thus the Differences of GPS (see Sec. IVD2) describe no more than is necessary to define a single goal, and a priority scheme selects just one of these to characterize the situation. Gelernter and Rochester (1958) suggest using a property-weighting scheme, a special case of the “Bayes net” described in Sec. IIG.
D. Planning Ordinarily one can solve a complicated problem only by dividing it into a number of parts, each of which can be attacked by a smaller search (or be further divided). Generally speaking, a successful division will reduce
» Some further discussion of this question may be found in Slagle (1961).
the search time not by a mere fraction, but by a fractional exponent. In a graph with 10 branches descending from each node, a 20-step search might involve 1020 trials, which is out of the question, while the insertion of just four lemmas or sequential subgoals might reduce the search to only 5 X 10+ trials, which is within reason for machine exploration. Thus it will be worth a relatively enormous effort to find such “islands” in the solution of complex problems.31 Note that even if one encountered, say, 10o failures of such procedures before success, one would still have gained a factor of perhaps 1010 in over-all trial reduction! Thus practically any ability at all to “plan,” or “analyze," a problem will be profitable, if the problem is difficult. It is safe to say that all simple, unitary, notions of how to build an intelligent machine will fail, rather sharply, for some modest level of problem difficulty. Only schemes which actively pursue an analysis toward obtaining a set of sequential goals can be expected to extend smoothly into increasingly complex problem domains.
Perhaps the most straightforward concept of planning is that of using a simplified model of the problem situation. Suppose that there is available. for a given problem, some other problem of “essentially the same character” but with less detail and complexity. Then we could proceed first to solve the simpler problem. Suppose, also, that this is done using a second set of methods, which are also simpler, but in some correspondence with those for the original. The solution to the simpler problem can then be used as a "plan" for the harder one. Perhaps each step will have to be expanded in detail. But the multiple searches will add, not multiply, in the total search time. The situation would be ideal if the model were, mathematically, a homomorphism of the original. But even without such perfection the model solution should be a valuable guide. In mathematics one's proof procedures usually run along these lines: one first assumes, e.8., that integrals and limits always converge, in the planning stage. Once the outline is completed, in this simpleminded model of mathematics, then one goes back to try to “make rigorous” the steps of the proof, i.e., to replace them by chains of argument using genuine rules of inference. And even if the plan fails, it may be possible to patch it by replacing just a few
of its steps.
Another aid to planning is the semantic, as opposed to the homomorphic, model (Minsky, 1956a, 1959a). Here we may have an interpretation of the current problem within another system, not necessarily simpler, kui with which we are more familiar and have already more powerful methits. Thus, in connection with a plan for the proof of a theorem, we will ini to know whether the proposed lemmas, or islands in the proof, are 10tually true; if not, the plan will surely fail. We can often easily tell if a proposition is true by looking at an interpretation. Thus the truth of a
* See sec. 10 of Ashby (1956).
proposition from plane geometry can be supposed, at least with great reliability, by actual measurement of a few constructed drawings (or the analytic geometry equivalent). The geometry machine of Gelernter and Rochester (1958, 1959) uses such a semantic model with excellent results; it follows closely the lines proposed in Minsky (1956a).
1. THE “CHARACTER-ALGEBRA” MODEL Planning with the aid of a model is of the greatest value in reducing search. Can we construct machines which find their own models? I believe the following will provide a general, straightforward way to construct certain kinds of useful, abstract models. The critical requirement is that we be able to compile a “Character-Method Matrix” (in addition to the simple Character-Method table in Sec. IVC). The CM matrix is an array of entries which predict with some reliability what will happen when methods are applied to problems. Both of the matrix dimensions are indexed by problem Characters; if there is a method which usually transforms prob. lems of character Ci into problems of character C; then let the matrix entry Cij be the name of that method (or a list of such methods). If there is no such method the corresponding entry is null.
Now suppose that there is no entry for Cis-meaning that we have no direct way to transform a problem of type C; into one of type Cj. Multiply the matrix by itself. If the new matrix has a non-null (i,j) entry then there must be a sequence of two methods which effects the desired transformation. If that fails, we may try higher powers. Note that [if we put unity for the (i,i) terms] we can reach the 2" matrix power with just n multiplications. We don't need to define the symbolic multiplication operation; one may instead use arithmetic entries--putting unity for any non-null entry and zero for any null entry in the original matrix. This yields a simple connection, or flow diagram, matrix, and its nth power tells us something about its set of paths of length 2". 32 [Once a non-null entry is discovered, there exist efficient ways to find the corresponding sequences of methods. The problem is really just that of finding paths through a maze, and the method of Moore (1959) would be quite efficient. Almost any problem can be converted into a problem of finding a chain between two terminal expressions in some formal system.) If the Characters are taken to be abstract representations of the problem expressions, this “CharacterAlgebra” model can be as abstract as are the available pattern-recognition facilities. See Minsky (1956a, 1959a).
The critical problem in using the Character-Algebra model for planning is, of course, the prediction reliability of the matrix entries. One cannot expect the Character of a result to be strictly determined by the Character of the original and the method used. And the reliability of the pre
*2 See, e.g., Hohn, Seshu, and Aufenkamp (1957).
dictions will, in any case, deteriorate rapidly as the matrix power is raised. But, as we have noted, any plan at all is so much better than none that the system should do very much better than exhaustive search, even with quite poor prediction quality.
This matrix formulation is obviously only a special case of the character planning idea. More generally, one will have descriptions, rather than fixed characters, and one must then have more general methods to calculate from a description what is likely to happen when a method is applied.
2. CHARACTERS AND DIFFERENCES In the GPS (General Problem Solver) proposal of Newell, Shaw, and Simon (1959a, 1960a) we find a slightly different framework: they use a notion of Difference between two problems (or expressions) where we speak of the Character of a single problem. These views are equivalent if we take our problems to be links or connections between expressions. But this notion of Difference (as the Character of a pair) does lend itself more smoothly to teleological reasoning. For what is the goal defined by a problem but to reduce the "difference" between the present state and the desired state? The underlying structure of GPS is precisely what we have called a “Character-Method Machine” in which each kind of Difference is associated in a table with one or more methods which are known to “reduce” that Difference. Since the characterization here depends always on (1) the current problem expression and (2) the desired end result, it is reasonable to think, as its authors suggest, of GPS as using “meansend” analysis.
To illustrate the use of Differences, we shall review an example (Newell, Shaw, and Simon, 1960a). The problem, in elementary propositional calculus, is to prove that from S^(-PDQ) we can deduce (Q VP) A S. The program looks at both of these expressions with a recursive matching process which branches out from the main connectives. The first Difference it encounters is that S occurs on different sides of the main connective "A." It therefore looks in the Difference-Method table under the heading "change position.” It discovers there a method which uses the theorem (A^B) = (B1A) which is obviously useful for removing, or “reducing," differences of position. GPS applies this method, obtaining (- P Q 1S. GPS now asks what is the Difference between this new expression and the goal. This time the matching procedure gets down into the connectives inside the left-hand members and finds a Difference between the connectives “)” and “V.” It now looks in the CM table under the heading "Change Connective” and discovers the appropriate method using ( - A ) B) = (A V B). It applies this method, obtaining (P VQ) A S. In the final cycle, the difference-evaluating procedure discovers the need for a “change position” inside the left member, and applies a
method using (A VB) = (BVA). This completes the solution of the problem.33
Evidently, the success of this “means-end” analysis in reducing general search will depend on the degree of specificity that can be written into the Difference-Method table-basically the same requirement for an effective Character-Algebra.
It may be possible to plan using Differences, as well. One might imagine a “Difference-Algebra” in which the predictions have the form D = D'D''. One must construct accordingly a difference-factorization algebra for discovering longer chains D = D. : ..D, and corresponding method plans. We should note that one cannot expect to use such planning methods with such primitive Differences as are discussed in Newell, Shaw, and Simon (1960a); for these cannot form an adequate Difference-Algebra (or Character-Algebra). Unless the characterizing expressions have many levels of descriptive detail, the matrix powers will too swiftly become degenerate. This degeneracy will ultimately limit the capacity of any formal planning device.
One may think of the general planning heuristic as embodied in a recursive process of the following form. Suppose we have a problem P:
1. Form a plan for problem P.
(If no more steps, exit with “success.") 3. Try the suggested method(s):
Success: return to (b), i.e., try next step in the pian.
rent plan to avoid this step.
problem of the current step. Observe that such a program schema is essentially recursive; it uses itself as a subroutine (explicitly, in the last step) in such a way that its current state has to be stored, and restored when it returns control to itself.34
3 Compare this with the "matching” process described in Newell, and Simon (1956). The notions of "Character,” “Character-Algebra," etc., originate in Minsky (1956) but seem useful in describing parts of the “GPS" system of Newell and Simon (1956) and Newell, Shaw, and Simon (1960a). The latter contains much additional material we cannot survey here. Essentially, GPS is to be self-applied to the problem of discovering sets of Differences appropriate for given problem areas. This notion of "bootstrapping"-applying a problem-solving system to the task of improving some of its own methods—is old and familiar, but in Newell, Shaw, and Simon (1960) we find perhaps the first specific proposal about how such an advance might be realized.
3* This violates, for example, the restrictions on “DO loops” in programming systems such as FORTRAN. Convenient techniques for programming such processes were developed by Newell, Shaw and Simon (1960b); the program state variables