Depth-First Search in Prolog

It is assumed that the reader is familiar with depth-first search.
Gross and Rosen, A Linear Time Planarity Algorithm for 2-Complexes

We assume that the reader is familiar with depth-first search
Ramalingam, Identifying Loops in Almost Linear Time

In this discussion, we assume familiarity with depth-first search (DFS) algorithms Fu et al., A Genetic Algorithm-Based Approach for Building Accurate Decision Trees

…

Although Prolog can feel very “declarative”, using Prolog effectively requires understanding how Prolog works under the hood. There are two fundamental technologies required to execute queries in Prolog; the second is depth-first search.

Depth-First Search

In general, Depth-First Search (DFS) is a strategy where we make an initial decision, and then (recursively) try as hard as we can to make that decision work. Only when we have exhaustively checked all possible consequences will we backtrack to our original decision and try an alternative.

DFS can be contrasted with alternate strategies like Breadth-First Search (BFS), which considers all possible first decisions, then the consequences of those decisions, then the consequences of those consequences, and so on.

In the analogy of maze solving, DFS is a strategy where we keep moving as far and possible; if we get stuck we backtrack only as far as the last choice we made and try going forward again.

(BFS would be the strategy where we check all locations one step away from the start, then all locations two steps away, and so on.)

An Example

Prolog searches for proofs using Depth-First Search, mainly for efficiency reasons. It is easiest to see how this works with an example, so let’s return to our family tree database of facts:

A Family Tree

parent(homer, bart).    female(marge).   male(homer).
parent(marge, bart).    female(jackie).  male(gomer).
parent(homer, lisa).    female(selma).   male(gemini).
parent(marge, lisa).    female(patty).   male(glum).
parent(homer, maggie).  female(cher).    male(bart).
parent(marge, maggie).  female(lisa).    male(millhouse).
% ...etc...             ...etc...        ...etc...

sib(X, Y) :- parent(P, X), parent(P, Y), X \== Y.

Now suppose that we want to execute the query

?- sib(bart, Z).

Prolog will scan its database of facts and rules (starting from the top), and observe that the query sib(bart, P) unifies with the conclusion of the sib rule (with X = bart, Y = Z); so Prolog will recurively try to prove the premises of that rule; after the unification the premises are:

        parent(P, bart), parent(P, Z), bart \== Z

Since comma means “and”, we need to find values of P and Z making all three conditions true. There are a number of ways this could be achived, but Prolog always works left-to-right in DFS fashion. As a consequence, Prolog will start with the first subgoal parent(P, bart).

In our database, the first fact that unifies with this subgoal is parent(homer, bart), so Prolog will set P = homer and continue on with the remaining subgoals, which are now parent(homer, Z), bart \== Z .
1. Prolog continues working left-to-right. The first item in the database that unifies with parent(homer, Z) is parent(homer, bart), so Prolog sets Z = bart and continues on with the remaining subgoals, which (after unification) are just bart \== bart.
  - Prolog sees that this goal is impossible, and backtracks (undoing Z = bart !) to the last decision point.
2. Prolog continues through the database and discovers that the next way to satisfy the subgoal parent(homer, Z) is to set Z = lisa. Prolog continues on with the new subgoal bart \== lisa.
  - Prolog sees that this goal is trivially true, and reports success:
```
Z = lisa.
```
    (Prolog will show the variables in the original query, but not intermediate variables like P, X, or Y.) If the user accepts this answer (by hitting period or enter), the query is done. If the user rejects this answer (by hitting semicolon), Prolog backtracks and the search continues
3. The final way to satisfy the subgoal parent(homer, Z) is to set Z = maggie. Prolog continues on with the subgoal bart \== maggie and succeeds, reporting
  
  Z = maggie.
4. There are no more ways to satisfy parent(homer, Z) in the database, so if the answer maggie is rejected, Prolog has to backtrack even further, undoing the decision that P = homer
Finally are back to our original subgoals:
```
            parent(P, bart), parent(P, Z), bart \== Z
```
Prolog now looks for the next way to satisfy parent(P, bart), and finds the fact parent(marge, bart). This sets P = marge and search continues forward with the new subgoals parent(marge, Z), bart \== Z
1. As before, Prolog will consider Z = bart but then discover this choice fails the inequality, and then discover
```
Z = lisa.
```
  (the same answer as before, but via a different parent P !) and, similarly, if search continues, the duplicate solution
```
Z = maggie.
```
2. There are no more ways to satisfy parent(marge, Z), so if we get this far search backtracks to see if bart has any other parents
There are no other ways to satisfy parent(P, bart), so if all previous solutions were rejected, then Prolog reports a failed search.

Another Example

Suppose we have the same database of family-tree facts with the rule

aunt(A,N) :- parent(P,N), sib(P,A), female(A).

and this time we execute the query

?- aunt(A, bart)

which immediately expands into the subgoals

    parent(P,bart), sib(P,A), female(A).

The (universally quantified) variable A in the rule is a different variable from the (existentially quantified) variable A in the query, but since they’re immediately unified we will ignore the distinction and just write A for both variables.

Knowing that Prolog works in a left-to-right DFS fashion, we can predict that Bart’s aunts will be found as follows:

Prolog DFS

Prolog finds the first parent of bart, namely P = homer.
1. Prolog finds the first sibling of homer, namely A = gomer, and then tries to demonstrate female(gomer). This fails, so search backtracks.
2. In this database homer has no other siblings, so search backtracks further.
Prolog finds the next parent of bart in the database, namely P = marge.
1. Prolog finds the first sibling of marge, namely A = glum, and then tries to demonstrate female(glum). This fails, so search backtracks.
2. Prolog finds the next sibling of marge, namely A = selma, and then tries to demonstrate female(selma). This fact is in our database, so Prolog reports success
```
A = selma.
```
3. Though not shown in the diagram, if forced to continue Prolog will continue working on sib(marge, A) to find more solutions, including:
```
A = patty.
```
  (Marge’s third sibling in the database), and
```
A = selma.
```
```
A = patty.
```
  (because sib(marge, A) finds siblings via both parents, so half-siblings are returned once but full siblings are returned twice) before finally running out of siblings and backtracking.
Bart has no more parents, so if we get this far search fails.