Trees
and
their Implementation as Lists

trees
The Tree is a Pervasive Information Structure
Files & Directories
Family Trees
Management Hierarchies
Decision Trees
Image Trees
In the current discussion:
Trees are the abstraction
Lists are the implementation

Files and Directories
Family Trees
Organization Chart
(Management Hierarchy)
Decision Trees
Image Trees
Definition of Tree
There are many different varieties of trees.
We discuss only some of them.
Use your knowledge of these to generalize to other varieties.
We will base our definition on paths and related concepts.

Paths in Directed Graphs
A path in a graph G is a list of nodes n0, n1, É, nk such that each successive pair (ni, ni+1) is in the corresponding binary relation.

Cycles
A cycle is a path that starts and ends on the same node.
Examples:
a, c, e, a
e, a, c, e, a, c, e

Cyclic and Acyclic
A cyclic graph is one that has at least one cycle.
An acyclic graph is one that has no cycles.

DAGs
DAG is an acronym for
ÒDirected Acyclic GraphÓ
ÒDAGÓ is mainly used because it is more pronounceable than ADG
(ÒAcyclic Directed GraphÓ)

Target Set
The target set of a node n is the set of nodes to which there is an arc from n.
targets(a) = {b, c}
targets(b) = {d}
targets(c) = {d, e}
targets(d) = { }
targets(e) = {a}

Leaves
If a nodeÕs target set is empty, that node is called a leaf.

Fan-In
A directed graph is said to fan-in at node n if the node is in the target sets of two or more different nodes.
A directed graph Òhas fan-inÓ if it fans in at least one node.

Roots
A root of a directed graph is a node that is not in any nodeÕs target set.

Tree at Last
A tree is a directed graph such that:
The graph is acyclic.
There is exactly one root.
It has no fan-in.

Tree vs Not
Classify these for Tree-dom
More Graphs to Classify
Reverse Graphs
Some graphs that may look tree-like arenÕt technically trees unless we consider the reverse graph (one with all of the arcs of the original reversed).

Reconvergence, an Alternative
A reconvergence is a pair of different paths that start and end, respectively, on the same nodes.
Therefore, a tree can also
be characterized as a
directed graph that
has one root
has no cycles
has no reconvergences

Subsets of Three Properties
DAG: acyclic, but
may have multiple roots,
may have fan-in
Forest: acyclic, and no fan-in but
may have multiple roots
A forest can also be characterized as a collection of disjoint trees.
Each tree could be identified with its root.

Adding/Removing Arcs
Adding arcs to a ______ that is not a tree may make it into a tree.
Adding arcs to a ______ that is not a tree will never make it into a tree.
Removing arcs from a ______ that is not a tree may make it into a tree.
Removing arcs from a ______ that is not a tree will never it into a tree.

Ordered Directed Graphs
We use the adjective ordered to indicate that the order of targets of a node matters.
This property is implicit with trees much of the time.
Because we are going to represent trees by lists, we can have ordering for free if we want it.

Representing/Implementing
Trees as Lists
Every tree can be represented as a list.
Obvious:
Tree is a special kind of directed graph.
Every directed graph can be represented as a list of pairs.
But we want a representation that makes it clear that we have a tree.

First Try: Target Sets
We know that sets can be represented as lists.
List the nodes of the tree.
Associate each node with its list of targets.

Target-Lists Representation
[ [a, [b, c]],
  [b, []],
  [c, [d, e]],
  [d, [ ]],
  [e, [ ]]
]

Nested Target-Lists Representation
Recursively, list the root, followed by the representation of each sub-tree:
[a, ____, ____]
[a, [b], [c, ___, ___ ]]
[a, [b], [c, [d], [e]]]

Modified
Nested Target-Lists Representation
(Recall: A leaf is a node with no targets.)
When a sub-tree is a leaf, omit the brackets around it.
[a, ____, ____]
[a, b, [c, ___, ___ ]]
[a, b, [c, d, e]]
Less-cluttered appearance,
but also less uniform processing.

Testing leaf property
In rex, values are either:
atomic: numbers, strings
non-atomic: lists, arrays
atomic(X) tells whether X is atomic.

Representation of ÒUnlabelledÓ Trees
In this model, only leaves have labels.
A leaf is represented by its label
A non-leaf tree is represented by a list of the representations of the targets of the root.
[a, ____ ]
[a, [b, c]]

Exercise
How could you represent a tree in which both nodes and arcs have labels?

Representing Lists by Ordered Trees
This is a kind of converse to previous discussion.
Every list can be represented as an ordered binary tree (tree in which each node has at most two targets).
This corresponds to a ÒboxÓ storage abstraction, where the data items may themselves be lists.

Representing Lists as Trees
An atomic item (non-list) is represented by itself.
The null list is represented as a leaf [].
A list [First | Rest] is represented by a node with two targets:
The left target is the representation of First.
The right target is the representation of Rest.
Note that ordering of targets is essential.

Representing Lists as Trees
Representing Lists as Trees
Matters are actually simpler if we rotate the tree 45¼, so that ÒrightÓ is horizontally right and ÒleftÓ is down.

Example: Binary Tree
Represent as a binary tree: [1, 2, 3]

Example: Binary Tree
Represent as a binary tree:
[1, [2, 3], [4]]

Corresponding Box Diagram
[1, [2, 3], [4]]

Tree-Dichotomy for Recursion
We often want a different dichotomy than for lists:
The tree has a single node (and is thus a leaf).
The tree is a node with offspring (and is thus not a leaf).

Model Independence
We can abstract away the specific representation being used:
isLeaf(T) 1 when T is leaf, 0 otherwise.
offspring(T) the list of offspring of a non-leaf, undefined otherwise
The implementation depends on which tree model we are using.

Example Implementations
Labeled-Tree Implementation:
makeTree(Root, ListOfSubTrees) =
    [Root | ListOfSubTrees];
isLeaf(T) = null(rest(T));
getOffspring(T) = rest(T);
getRoot(T) = first(T);

Recursion on Trees
Basis: What happens on a single leaf.
Induction step: What happens on a non-leaf.

Example: Height of a Tree
The height of a tree is the length of the longest path from the root.
height(T) => isLeaf(T) ? 1;
height(T) => 1 + sum(map(height, getOffspring(T));
Let recursion do the work for you.

Searching Trees & Graphs
Searching a Labeled Tree
Suppose we want to find all nodes with labels having a property P.

Searching a Labeled Tree
findInTree(P, T) =

     Root = getRoot(T),

     foundInRest =
           mappend((S)=>findInTree(P, S), getOffspring(T)),
 
     P(Root) ? [Root | foundInRest] : foundInRest;

Depth-First Search
The preceding expresses only one form of search:

depth-first search

The pattern is to search ÒdeeperÓ before ÒbroaderÓ.

Depth- vs. Breadth- First
Breadth-First: Wavefront Analogy
Advantage of Breadth-First
Breadth-First
Searching a Labeled Tree
findInTreeBF(P, T) = findInForest(P, [T]);
findInForest(P, [ ]) => [ ];
 findInForest(P, [Tree | Trees]) =>
Root = root(Tree),
foundInRest = findInForest(P, append(Trees, getOffspring(Tree))),
P(Root) ? [Root | foundInRest] : foundInRest;

Searching Graphs vs. Trees
Basic ideas still apply, but
In graph, avoid re-searching same nodes due to fanin-in
In graph, avoid infinite loops.
How?

Depth- vs. Breadth- First in Graph
Searching Without Recursion
Depth-First: Use Stack
Breadth-First: Use Queue
Avoid Fan-in and Cycles:
ÒMarkÓ nodes as encountered
Refuse to re-search from a marked node
Marking can be metaphoric, e.g. by membership on a list, or
The node itself can be marked (non-functional programming).

Searching a Maze
A maze is an implicit graph
Nodes are identifiable by position
The arcs are implicit, determined by adjacent spatial positions.
Marking can be done in a Òparallel arrayÓ

Recovering Path
Searching for the first element satisfying some property.
Want not just the element, but the path from the starting point to the element.
How to accomplish?