-----

CS 151 (Artificial Intelligence)
Assignment #1: Unification Matcher

-----

Due Monday, 31 January, at 5pm.

See additional detail on submission added 27 Jan.

The Big Picture

For this assignment, you will write a unification matcher in Scheme. This assignment would be pretty simple, except for the fact that you have probably never used Scheme. So this assignment also serves as a crash introduction to Scheme. Because you are just starting with Scheme, this assignment gives rather more explicit hints about code structure than I'll give in future assignments.

Here are some examples for testing your functions.

Getting started

Ensure that you have a working account on turing, and that you have a copy of the R5RS Scheme manual and the handout on scheme48 extensions. Locate the Scheme page for CS 151, which contains a variety of useful pointers and information.

Then, work through the Scheme tutorial. This should familiarize you with the features you need to do the assignment, and help you get acquainted with the Scheme manuals.

What should the matcher do?

The top-level matcher function should take two expressions as input. It should return a list of variable substitutions that will make the expressions match one another. If the expressions will not match, it should return #f. Your matcher code must consist of a number of small functions: do not attempt to write this using a couple giant functions!

An expression is either a constant, a variable, or a function applied to some number of expressions.

You may remember from CS 80 that functions and predicates are distinct. For the purposes of this assignment, however, treat them as the same type of object.

Function applications can be nested, as in

   (is-dining-hall? (building-east-of olin))
   (<= (f (g ?x ?y ) 17) (sin (+ ?z ?x)))
   (more-famous-than? (boyfriend-of monica) (president-of hmc))

In order for two expressions to match, the function names and constants must match exactly. A variable in one expression can be matched to a sub-expression (e.g. a constant, a variable, a function application) in the other expression. For example, the following two expressions match, because ?x can be matched to monica, and ?y can be matched to (president-of hmc).

   (more-famous-than? (boyfriend-of monica) ?y)
   (more-famous-than? (boyfriend-of ?x) (president-of hmc))

It is ok to match a variable to another variable, and then also match the pair of them to something else. So the following pair of expressions match, with ?x, ?y, and (g ?z) all matched to one another.

   (f ?x ?y)
   (f (g ?z) ?x)

However, it is illegal to match a variable to two different, incompatible values. So these two expressions do not match.

   (more-famous-than? (boyfriend-of monica) ?x)
   (more-famous-than? (boyfriend-of ?x) (president-of hmc))

It is also illegal to match a variable to an expression containing that same variable, creating a sort of circular or recursive match. So these two expressions also do not match.

   (more-famous-than? (boyfriend-of monica) ?x)
   (more-famous-than? (boyfriend-of monica) (president-of ?x))

Getting started

An useful (but optional) exercise for getting started is to write a function that checks whether an input is a legal expression. That is:

    is the expression a symbol or number? (i.e. a variable or constant)
    if not, is the expression a list? (i.e. a function application)
      if so, 
         -- check that the list contains at least one
            element and that the first element is a non-variable symbol
         -- recursively call the checker on the other (i.e. non-first)
            elements in the list, to make sure they have a legitimate form
    otherwise indicate an error or return #f

The form of this function is classic lisp/scheme. There are several cases (usually done with a cond statement), each handling a different type of input object. The conditions for the cases use a lot of type predicates such as list?, symbol?, or null? Some of these cases call the main function recursively, to handle sub-structure.

Substitution lists

For this assignment, a list of variable substitutions (often called "bindings") should be represented as a list of pairs. Each pair should contain a variable and the value it is matched to. For example, if (f ?x ?y) is matched to (f (g ?z) ?x), the substitution list would be

   ((?x (g ?z)) (?y ?x))
or ((?y ?x) (?x (g ?z))) 

With the substitution list in this format, you can use the Scheme function assoc to find the value associated with a given variable. (Try this out in the interpreter.) However, that value could be another variable, which in turn is associated with a third value, as in the above list.

Now, write a function that takes a substitution list and a variable name, and returns the ultimate value for that variable. If there is a chain of substitutions, as in the above list, it would walk to the end of the chain. So, if given the above list and the variable ?y, it would return (g ?z) rather than ?x. In Scheme, it is usually easier to write such a function using recursion (having the function call itself) rather than a loop.

Also write a function that takes an expression and a substitution list, and applies the substitution list to the expression. That is, it returns a new expression in which each variable, if it occurs in the substitution list, is replaced by its ultimate value from the substitution list.

Because Scheme is interpreted, it is easy to test such utility functions before using them to construct bigger functions. Take advantage of this feature.

In order to write these functions, you will need a function that determines whether a scheme object is a variable. Because this uses some slightly obscure functions, I thought you'd prefer not to write it as your first Scheme exercise. So here it is:

  (define (variable? object)
     (and (symbol? object)
          (equal? (string-ref (symbol->string object) 0) #\?)))

When comparing symbols or other objects for equality, you should use the function equal? rather than similar functions such as eqv? or eq?.

Occurs-check

Now, write an auxiliary function called occurs-check. (There seems to be a tradition of always using this name for this function.) Occurs-check should take a variable and an expression as input. It should return #t or #f, depending on whether that variable occurs anywhere in that expression. Like most of the other functions in this assignment, occurs-check should call itself recursively to inspect sub-expressions for occurrences of the variable.

The matcher itself

The guts of the matcher should be a recursive function that takes two expressions and a substitution list as input. It should attempt to match those expressions, assuming that the substitutions in the list are already set in concrete, but perhaps adding more substitutions to the list. It should return this new substitution list, or #f if the expressions cannot be matched.

Here is pseudo-code for this recursive function:

unify-aux (expression1 expression2 substitutions)

   If either expression1 or expression2 is a variable, replace it
      with its ultimate substitution.

   Then examine the types of the two expressions:
      -- If the two expressions are equal, return substitutions.
      -- Otherwise if expression1 is a variable, check whether this 
         variable occurs in expression2.  If so, return #f.  If not,
         add the pair (expression1 expression2) to the substitution 
         list and return it.
      -- Otherwise if expression2 is a variable, do a similar thing.
      -- Otherwise if expression1 and expression2 are both lists, call 
         unify-lists to figure out whether they match. 
      -- Otherwise, the expressions don't match:  return #f.

Here is pseudo-code for the utility function unify-lists:

unify-lists (expression1 expression2 substitutions)

   Expression1 and expression2 are assumed to be lists, i.e. function calls.

   If the two expressions aren't the same length, or if their
     first elements (the function names) aren't equal, return #f.

   Otherwise, loop through the inputs to the two function calls,
     checking whether corresponding inputs match.  
     -- Suppose that expression1 has the form (ff s1 s2 ... sn) and
           expression2 has the form (ff t1 t2 ... tn).
     -- Call unify-aux to determine whether s1 and t1 match, with the
           input substitution list.  This will return a new substitution list.
     -- If that succeeds, call unify-aux on s2 and t2, giving unify-aux the
           new substitution list.
     -- Continue up to sn and tn, each time giving unify-aux the substitution
           list produced by the previous call to unify-aux
     -- If one of the calls to unify-aux fails, halt the loop and return #f.
     -- Otherwise, return the substitution list produced by the final call to 
           unify-aux (on sn and tn).

If you are curious, a similar, but slightly different, unification algorithm can be found in Russell and Norvig, p. 303. (They use "var/val" to indicate the pair (var val) in a substitution list.) And other AI texts (e.g. in Sprague) present yet more variant algorithms, all of which compute the same result.

What to turn in?

Submit a copy of your code using the submit system. Use the command cs151submit, e.g.

    cs151submit foo.scm

For those of you who haven't used cs70submit or its clones, this program will ask you for the assignment number, email your submission to the grader account (aka Prof. Fleck), and email you an acknowledgement. When running this program, you must be in the same directory as the file you are submitting.

There is also a program cs151submitall which submits all the files in your current directory that have suitable extensions (e.g. scm) or special names (e.g. README).

Also turn in a hardcopy of your code: in class, under my office door, in the box outside my office, or in my mailbox in Olin 240. Make sure that the code file has your name at the top, and that the code is commented.

With the hardcopy version of your code, turn in a brief (probably less than one page) description of what your code does and how it works. You don't have to say much about the algorithm, unless you did something different from what's in the instructions (e.g. modified algorithm, extra features). However, you should make sure to tell me the name of the top-level function and how to call it. Discuss what error checking it does. Also, provide a couple examples of inputs and what your program produced on them.

Remember that your code should be properly commented and formatted. Your description should be typed, written in literate English, and formatted so that it is easy to read. See the style guidelines.


This page is maintained by Margaret Fleck.