Previous    Next

### Properties of DFS Forests

As noted in , the trees that describe the recursive structure of DFS method calls give us the key to understanding how DFS operates. In this section, we examine properties of the algorithm by examining properties of DFS trees. If we add external nodes to the DFS tree to record the moments when we skipped recursive calls for vertices that had already been visited, we get the compact representation of the dynamics of DFS illustrated in Screenshot. This representation is worthy of careful study. The tree is a representation of the graph, with a vertex corresponding to each graph vertex and an edge corresponding to each graph edge. We can choose to show the two representations of the edge that we process (one in each direction), as shown in the left part of the figure, or just one representation of each edge, as shown in the center and right parts of the figure. The former is useful in understanding that the algorithm processes each and every edge; the latter is useful in understanding that the DFS tree is simply another graph representation. Traversing the internal nodes of the tree in preorder gives the vertices in the order in which DFS visits them; moreover, the order in which we visit the edges of the tree as we traverse it in preorder is the same as the order in which DFS examines the edges of the graph.

##### Screenshot DFS tree representations

If we augment the DFS recursive-call tree to represent edges that are checked but not followed, we get a complete description of the DFS process (left). Each tree node has a child representing each of the nodes adjacent to it, in the order they were considered by the DFS, and a preorder traversal gives the same information as Screenshot: First we follow 0-0, then 0-2, then we skip 2-0, then we follow 2-6, then we skip 6-2, then we follow 6-4, then 4-3, and so forth. The ord vector specifies the order in which we visit tree vertices during this preorder walk, which is the same as the order in which we visit graph vertices in the DFS. The st vector is a parent-link representation of the DFS recursive-call tree (see Screenshot). There are two links in the tree for every edge in the graph, one for each of the two times it encounters the edge. The first is to an unshaded node and either corresponds to making a recursive call (if it is to an internal node) or to skipping a recursive call because it goes to an ancestor for which a recursive call is in progress (if it is to an external node). The second is to a shaded external node and always corresponds to skipping a recursive call, either because it goes back to the parent (circles) or because it goes to a descendent of the parent for which a recursive call is in progress (squares). If we eliminate shaded nodes (center), then replace the external nodes with edges, we get another drawing of the graph (right).

Indeed, the DFS tree in Screenshot contains the same information as the trace in Screenshot or the step-by-step illustration of Trémaux traversal in Figures 18.2 and 18.3. Edges to internal nodes represent edges (passages) to unvisited vertices (intersections), edges to external nodes represent occasions where DFS checks edges that lead to previously visited vertices (intersections), and shaded nodes represent edges to vertices for which a recursive DFS is in progress (when we open a door to a passage where the door at the other end is already open). With these interpretations, a preorder traversal of the tree tells the same story as that of the detailed maze-traversal scenario. To study more intricate graph properties, we classify the edges in a graph according to the role that they play in the search. We have two distinct edge classes:

• Edges representing a recursive call (tree edges)
• Edges connecting a vertex with an ancestor in its DFS tree that is not its parent (back edges)

When we study DFS trees for digraphs in , we examine other types of edges, not just to take the direction into account but also because we can have edges that go across the tree, connecting nodes that are neither ancestors nor descendants in the tree. Since there are two representations of each graph edge that each correspond to a link in the DFS tree, we divide the tree links into four classes, using the preorder numbers and the parent links (in the ord and st arrays, respectively) that our DFS code computes. We refer to a link from v to w in a DFS tree that represents a tree edge as

• A tree link if w is unmarked
• A parent link if st[w] is v

and a link from v to w that represents a back edge as

• A back link if ord[w] < ord[v]
• A down link if ord[w] > ord[v]

##### Screenshot DFS trace (tree link classifications)

This version of Screenshot shows the classification of the DFS tree link corresponding to each graph edge representation encountered. Tree edges (which correspond to recursive calls) are represented as tree links on the first encounter and parent links on the second encounter. Back edges are back links on the first encounter and down links on the second encounter.

##### Screenshot DFS forest

The DFS forest at the top represents a DFS of an adjacency-matrix representation of the graph at the bottom right. The graph has three connected components, so the forest has three trees. The ord vector is a preorder numbering of the nodes in the tree (the order in which they are examined by the DFS) and the st vector is a parent-link representation of the forest. The cc vector associates each vertex with a connected-component index (see Program 18.3). As in Screenshot, edges to circles are tree edges, edges that go to squares are back edges, and shaded nodes indicate that the incident edge was encountered earlier in the search, in the other direction.

With an adjacency-lists representation, we visit the edges connected to each vertex in an order different from that for the adjacency-matrix representation, so we get a different DFS forest, as illustrated in Screenshot. DFS trees and forests are graph representations that describe not only the dynamics of DFS but also the internal representation of the graph. For example, by reading the children of any node in Screenshot from left to right, we see the order in which they appear on the adjacency list of the vertex corresponding to that node. We can have many different DFS forests for the same graph—each ordering of nodes on adjacency lists leads to a different forest.

##### Screenshot Another DFS forest

This forest describes depth-first search of the same graph as Screenshot, but using an adjacency-list representation, so the search order is different because it is determined by the order that nodes appear in adjacency lists. Indeed, the forest itself tells us that order: It is the order in which children are listed for each node in the tree. For instance, the nodes on 0's adjacency list were found in the order 5 2 1 6, the nodes on 4's list are in the order 6 5 3, and so forth. As before, all vertices and edges in the graph are examined during the search, in a manner that is precisely described by a preorder walk of the tree. The ord and st vectors depend upon the graph representation and the search dynamics and are different from Screenshot, but the vector cc depends on graph properties and is the same.

Details of the structure of a particular forest inform our understanding of how DFS operates for particular graphs, but most of the important DFS properties that we consider depend on graph properties that are independent of the structure of the forest. For example, the forests in Figures 18.11 and 18.12 both have three trees (as would any other DFS forest for the same graph) because they are just different representations of the same graph, which has three connected components. Indeed, a direct consequence of the basic proof that DFS visits all the nodes and edges of a graph (see Properties 18.2 through 18.4) is that the number of connected components in the graph is equal to the number of trees in the DFS forest. This example illustrates the basis for our use of graph search throughout the tutorial: A broad variety of graph-processing class implementations are based on learning graph properties by processing a particular graph representation (a forest corresponding to the search). Potentially, we could analyze DFS tree structures with the goal of improving algorithm performance. For example, should we attempt to speed up an algorithm by rearranging the adjacency lists before starting the search? For some of the important classical DFS-based algorithms, the answer to this question is no, because they are optimal—their worst-case running time depends on neither the graph structure nor the order in which edges appear on the adjacency lists (they essentially process each edge exactly once). Still, DFS forests have a characteristic structure that is worth understanding because it distinguishes them from other fundamental search schema that we consider later in this chapter. Screenshot shows a DFS tree for a larger graph that illustrates the basic characteristics of DFS search dynamics. The tree is tall and thin, and it demonstrates several characteristics of the graph being searched and of the DFS process.

##### Screenshot Depth-first search

This figure illustrates the progress of DFS in a random Euclidean near-neighbor graph (left). The figures show the DFS tree vertices and edges in the graph as the search progresses through 1/4, 1/2, 3/4, and all of the vertices (top to bottom). The DFS tree (tree edges only) is shown at the right. As is evident from this example, the search tree for DFS tends to be quite tall and thin for this type of graph (as it is for many other types of graphs commonly encountered in practice). We normally find a vertex nearby that we have not seen before.

• There exists at least one long path that connects a substantial fraction of the nodes.
• During the search, most vertices have at least one adjacent vertex that we have not yet seen.
• We rarely make more than one recursive call from any node.
• The depth of the recursion is proportional to the number of vertices in the graph.

This behavior is typical for DFS, though these characteristics are not guaranteed for all graphs. Verifying facts of this kind for graph models of interest and various types of graphs that arise in practice requires detailed study. Still, this example gives an intuitive feel for DFS-based algorithms that is often borne out in practice. Screenshot and similar figures for other graph-search algorithms (see Figures 18.24 and 18.29) help us understand differences in their behavior.

#### Exercises

Draw the DFS forest that results from a standard adjacency-matrix DFS of the graph

Draw the DFS forest that results from a standard adjacency-lists DFS of the graph

Write a DFS trace program to produce output that classifies each of the two representations of each graph edge as corresponding to a tree, parent, back, or down link in the DFS tree, in the style of Screenshot.

18.18 Write a program that computes a parent-link representation of the full DFS tree (including the external nodes), using an array of E integers between 0 and V – 1. Hint: The first V entries in the array should be the same as those in the st array described in the text.

18.19 Instrument the spanning-forest DFS class (Program 18.2) by adding methods (and appropriate private data fields) that return the height of the tallest tree in the forest, the number of back edges, and the percentage of edges processed to see every vertex.

• 18.20 Run experiments to determine empirically the average values of the quantities described in Exercise 18.19 for graphs of various sizes, drawn from various graph models (see Exercises 17.6476).

• 18.21 Write a method that builds a graph by inserting edges from a given array into an initially empty graph, in random order. Using this method with an adjacency-lists implementation of the graph ADT, run experiments to determine empirically properties of the distribution of the quantities described in Exercise 18.19 for all the adjacency-lists representations of large sample graphs of various sizes, drawn from various graph models (see Exercises 17.6476).

 Previous    Next