Graph-Search ADT Methods
DFS and the other graph-search methods that we consider later in this chapter all involve following graph edges from vertex to vertex, with the goal of systematically visiting every vertex and every edge in the graph. But following graph edges from vertex to vertex can lead us only to the vertices in the same connected component as the starting vertex. In general, of course, graphs might not be connected, so we need one call on a search method for each connected component. We will typically use graph-search methods that perform the following steps until all of the vertices of the graph have been marked as having been visited: find an unmarked vertex (a start vertex), then visit (and mark as visited) every vertex in the connected component that contains it.
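This outer loop can be sketched as follows. The sketch uses a minimal adjacency-lists Graph class as a stand-in for the book's graph ADT (its names are assumptions, not the book's code); the ord array plays the role of the mark array, and searchC marks one connected component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the graph-search skeleton described above: scan for an
// unmarked vertex, then search (and mark) its entire connected component.
// The nested Graph class is a minimal stand-in, not the book's ADT.
public class SearchSkeleton {
    static class Graph {
        final List<List<Integer>> adj;
        Graph(int V) {
            adj = new ArrayList<>();
            for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        }
        int V() { return adj.size(); }
        void insert(int v, int w) { adj.get(v).add(w); adj.get(w).add(v); }
    }

    private final int[] ord;   // ord[v] < 0 means v has not been visited
    private int cnt = 0;       // order in which vertices are marked
    private int components = 0;

    SearchSkeleton(Graph g) {
        ord = new int[g.V()];
        Arrays.fill(ord, -1);
        for (int v = 0; v < g.V(); v++)     // scan in order of increasing index
            if (ord[v] < 0) { components++; searchC(g, v); }
    }

    private void searchC(Graph g, int v) {  // mark v's connected component
        ord[v] = cnt++;
        for (int w : g.adj.get(v))
            if (ord[w] < 0) searchC(g, w);
    }

    int count() { return components; }
}
```

The constructor makes one searchC call per connected component, so count() reports the number of components when it finishes.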
The method for marking vertices is not specified in this description, but we most often use the same method that we used for our earlier DFS implementations: we initialize all entries in a private vertex-indexed array to a negative integer, and mark vertices by setting their corresponding entries to nonnegative values. Using this procedure amounts to using a single bit (the sign bit) for the mark; most implementations also keep other information associated with marked vertices in the array (such as, for the DFS implementation in Program 18.1, the order in which vertices are marked). The method for finding a vertex in the next connected component is also not specified, but we most often use a scan through the array in order of increasing index. We pass an edge to the search method (using a dummy self-loop in the first call for each connected component), instead of passing its destination vertex, because the edge tells us how we reached the vertex. Knowing the edge corresponds to knowing which passage led to a particular intersection in a maze. This information is useful in many DFS classes: when we are simply keeping track of which vertices we have visited, it is of little consequence, but more interesting problems require that we always know from whence we came. Program 18.2 is an implementation that illustrates these choices. It is a DFS-based class for computing a spanning forest, but, for the moment, we note simply that it uses the ord array to keep track of which vertices have been visited, allowing it to visit each vertex precisely once, as illustrated in the screenshot. Typically, the classes that we consider also examine all edges incident upon each vertex visited. In such cases, knowing that we visit all vertices tells us that we visit all edges as well, as in Trémaux traversal.
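The edge-passing convention can be sketched as follows (a sketch, not the book's code): Edge is a hypothetical (v, w) pair, the dummy self-loop v-v starts the search in each component, and the sign bit of the ord entries serves as the mark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the edge-passing convention: searchC receives the edge v-w by
// which vertex w was reached, and a dummy self-loop starts each component.
// Graph and Edge are minimal stand-ins for the book's classes.
public class EdgeSearch {
    static class Edge {
        final int v, w;
        Edge(int v, int w) { this.v = v; this.w = w; }
    }
    static class Graph {
        final List<List<Integer>> adj;
        Graph(int V) {
            adj = new ArrayList<>();
            for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        }
        int V() { return adj.size(); }
        void insert(int v, int w) { adj.get(v).add(w); adj.get(w).add(v); }
    }

    final int[] ord;   // sign bit is the mark; the value records visit order
    private int cnt = 0;

    EdgeSearch(Graph g) {
        ord = new int[g.V()];
        Arrays.fill(ord, -1);
        for (int v = 0; v < g.V(); v++)
            if (ord[v] < 0) searchC(g, new Edge(v, v)); // dummy self-loop
    }

    private void searchC(Graph g, Edge e) {
        int w = e.w;
        ord[w] = cnt++;           // mark w; e.v records how we reached it
        for (int t : g.adj.get(w))
            if (ord[t] < 0) searchC(g, new Edge(w, t));
    }
}
```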
Screenshot: Graph search
The table at the bottom shows vertex marks (the contents of the ord vector) during a typical search of the graph at the top. Initially, the constructor unmarks all vertices by setting all of the marks to -1 (indicated by asterisks). Then it calls searchC for the dummy edge 0-0, which marks all of the vertices in the same connected component as 0 (second row) by setting them to nonnegative values. In this example, it marks 0, 1, 4, and 9 with the values 0 through 3, in that order. Next, it scans from left to right to find the unmarked vertex 2 and calls searchC for the dummy edge 2-2 (third row), which marks the seven vertices in the same connected component as 2. Continuing the left-to-right scan, it calls searchC for the dummy edge 8-8 to mark 8 and 11 (bottom row). Finally, the constructor completes the search by discovering that 9 through 12 are all marked.
To compute a spanning forest, Program 18.2 adds a private array st to hold a parent-link representation of the forest (initialized in the constructor); defines a searchC that is similar to searchC from Program 18.1, except that it takes an edge v-w as its parameter and sets st[w] to v; and adds a query method that allows clients to learn the parent of any vertex. Spanning forests are of interest in many applications, but our primary interest in them in this chapter is their relevance in understanding the dynamic behavior of DFS, a topic to which we return later in this chapter.

In a connected graph, the constructor in Program 18.2 calls searchC once for 0-0 and then finds that all the other vertices are marked. In a graph with more than one connected component, the constructor checks all the connected components in a straightforward manner. DFS is the first of several methods that we consider for searching a connected component. No matter which method (and no matter which graph representation) we use, we visit all the graph vertices.

Property 18.2 A graph-search method checks each edge and marks each vertex in a graph if and only if the search method that it uses marks each vertex and checks each edge in the connected component that contains the start vertex.

Proof: By induction on the number of connected components.
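A sketch in the spirit of Program 18.2 (not its verbatim text) shows how the st array records the forest: each searchC call sets st[w] to the vertex from which w was reached, and the dummy self-loop makes each component's root its own parent. Graph and Edge are again minimal stand-ins, and the query method's name is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// DFS that builds a parent-link spanning forest in st while marking
// vertices in ord. Component roots are their own parents, because the
// constructor passes a dummy self-loop for the first vertex of each
// component.
public class SpanningForestDFS {
    static class Edge {
        final int v, w;
        Edge(int v, int w) { this.v = v; this.w = w; }
    }
    static class Graph {
        final List<List<Integer>> adj;
        Graph(int V) {
            adj = new ArrayList<>();
            for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        }
        int V() { return adj.size(); }
        void insert(int v, int w) { adj.get(v).add(w); adj.get(w).add(v); }
    }

    private final int[] ord;  // visit order; -1 means unvisited
    private final int[] st;   // parent-link representation of the forest
    private int cnt = 0;

    SpanningForestDFS(Graph g) {
        ord = new int[g.V()];
        st = new int[g.V()];
        Arrays.fill(ord, -1);
        for (int v = 0; v < g.V(); v++)
            if (ord[v] < 0) searchC(g, new Edge(v, v)); // dummy self-loop
    }

    private void searchC(Graph g, Edge e) {
        int w = e.w;
        ord[w] = cnt++;
        st[w] = e.v;                 // remember how we reached w
        for (int t : g.adj.get(w))
            if (ord[t] < 0) searchC(g, new Edge(w, t));
    }

    int parent(int v) { return st[v]; }  // query method for clients
}
```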
Thus, graph-search methods provide a systematic way of processing each vertex and each edge in a graph. Generally, our implementations are designed to run in linear or near-linear time, by doing a fixed amount of processing per edge. We prove this fact now for DFS, noting that the same proof technique works for several other search strategies.

Property 18.3 DFS of a graph represented with an adjacency matrix requires time proportional to V².

Proof: An argument similar to the proof of Property 18.1 shows that searchC not only marks all vertices connected to the start vertex but also calls itself exactly once for each such vertex (to mark that vertex). An argument similar to the proof of Property 18.2 shows that invoking the constructor leads to exactly one invocation of searchC for each graph vertex. In searchC, the iterator checks every entry in the vertex's row in the adjacency matrix. In other words, the search checks each entry in the adjacency matrix precisely once.
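The counting argument in the proof can be made concrete with an instrumented sketch (a stand-in for an adjacency-matrix DFS such as one built on Program 17.7, not the book's code): searchC runs once per vertex and scans that vertex's entire row, so exactly V·V matrix entries are examined.

```java
import java.util.Arrays;

// Sketch of DFS on an adjacency matrix: searchC scans an entire row of
// the matrix for each vertex it marks, so every one of the V*V entries
// is examined exactly once, giving time proportional to V^2.
public class MatrixDFS {
    private final boolean[][] adj;  // adjacency matrix (minimal stand-in)
    private final int[] ord;        // -1 means unvisited
    private int cnt = 0;
    int entriesChecked = 0;         // instrumentation for the V^2 bound

    MatrixDFS(boolean[][] adj) {
        this.adj = adj;
        ord = new int[adj.length];
        Arrays.fill(ord, -1);
        for (int v = 0; v < adj.length; v++)
            if (ord[v] < 0) searchC(v);
    }

    private void searchC(int v) {
        ord[v] = cnt++;
        for (int w = 0; w < adj.length; w++) {  // scan v's entire row
            entriesChecked++;
            if (adj[v][w] && ord[w] < 0) searchC(w);
        }
    }
}
```

For a four-vertex graph, entriesChecked ends at 16 regardless of how many edges the graph has, which is the point of the V² bound.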
Property 18.4 DFS of a graph represented with adjacency lists requires time proportional to V + E.

Proof: From the argument just outlined, it follows that we call the recursive method precisely V times (hence the V term), and we examine each entry on each adjacency list (hence the E term).
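The same instrumentation applied to an adjacency-lists sketch (again a stand-in, not the book's Program 17.9) exhibits both terms of the bound: searchC is invoked exactly V times, and 2E list entries are scanned in all, since each edge appears on two lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of DFS on adjacency lists, instrumented to show the V + E bound:
// one recursive call per vertex, one scan per adjacency-list entry.
public class ListDFS {
    static class Graph {
        final List<List<Integer>> adj;
        Graph(int V) {
            adj = new ArrayList<>();
            for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        }
        int V() { return adj.size(); }
        void insert(int v, int w) { adj.get(v).add(w); adj.get(w).add(v); }
    }

    private final int[] ord;
    private int cnt = 0;
    int calls = 0;            // invocations of searchC (should equal V)
    int entriesExamined = 0;  // adjacency-list entries scanned (should equal 2E)

    ListDFS(Graph g) {
        ord = new int[g.V()];
        Arrays.fill(ord, -1);
        for (int v = 0; v < g.V(); v++)
            if (ord[v] < 0) searchC(g, v);
    }

    private void searchC(Graph g, int v) {
        calls++;
        ord[v] = cnt++;
        for (int w : g.adj.get(v)) {
            entriesExamined++;
            if (ord[w] < 0) searchC(g, w);
        }
    }
}
```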
The primary implication of Properties 18.3 and 18.4 is that they establish the running time of DFS to be linear in the size of the data structure used to represent the graph. In most situations, we are also justified in thinking of the running time of DFS as being linear in the size of the graph as well: if we have a dense graph (with the number of edges proportional to V²), then either representation gives this result; if we have a sparse graph, then we assume use of an adjacency-lists representation. Indeed, we normally think of the running time of DFS as being linear in E. That statement is technically not true if we are using adjacency matrices for sparse graphs, or for extremely sparse graphs with E << V and most vertices isolated, but we can usually avoid the former situation, and we can remove isolated vertices (see Exercise 17.34) in the latter situation.

As we shall see, these arguments all apply to any algorithm that has the same essential features as DFS. If the algorithm marks each vertex and examines all of that vertex's incident edges (and does any other work that takes time per vertex bounded by a constant), then these properties apply. More generally, if the time per vertex is bounded by some function f(V, E), then the time for the search is guaranteed to be proportional to E + V f(V, E). Later in this chapter, we see that DFS is one of a family of algorithms that have just these characteristics; in Chapters 19 through 22, we see that algorithms from this family serve as the basis for a substantial fraction of the code that we consider in this tutorial.

Much of the graph-processing code that we examine is ADT-implementation code for some particular task, where we develop a class that does a basic search to compute structural information in other vertex-indexed arrays. Many of our graph-processing classes are of this nature because we typically can uncover a graph's structure by searching it.
We normally add code to the search method that is executed when each vertex is marked, instead of working with a more generic search (for example, one that calls a specified method each time a vertex is visited), solely to keep the code compact and self-contained. Providing a more general ADT mechanism for clients to process all the vertices with a client-supplied method is a worthwhile exercise (see Exercises 18.13 and 18.14). In Sections 18.5 and 18.6, we examine numerous graph-processing methods that are based on DFS. In Sections 18.7 and 18.8, we look at other constructor implementations and at some graph-processing methods that are based on them. Although we do not build this layer of abstraction into our code, we take care to identify the basic graph-search strategy underlying each algorithm that we develop. For example, we use the term DFS class to refer to any implementation that is based on the recursive DFS scheme. The simple-path–search class Program 17.11 and the spanning-forest class Program 18.2 are examples of DFS classes.

Many graph-processing methods are based on the use of vertex-indexed arrays. We typically include such arrays as private data fields in class implementations to hold information about the structure of graphs (discovered during the search) that helps us solve the problem at hand. Examples of such arrays are the deg array in Program 17.11 and the ord array in Program 18.1. Some implementations that we will examine use multiple arrays to learn complicated structural properties.

Our convention in graph-search methods is to initialize all entries in vertex-indexed arrays to -1 and to set the entries corresponding to each vertex visited to nonnegative values in the search method. Any such array can play the role of the ord array (marking vertices as visited) in Programs 18.2 and 18.3.
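One way to provide the more general mechanism mentioned above, in the spirit of Exercises 18.13 and 18.14 (this is one possible answer, not the book's code), is to let the client pass a method that the search invokes once for each vertex it visits; Java's IntConsumer serves as the callback type here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntConsumer;

// Sketch of a generic search: the client supplies a method that is
// called exactly once per vertex, at the moment the vertex is marked.
// The nested Graph class is a minimal stand-in for the book's ADT.
public class VisitorDFS {
    static class Graph {
        final List<List<Integer>> adj;
        Graph(int V) {
            adj = new ArrayList<>();
            for (int i = 0; i < V; i++) adj.add(new ArrayList<>());
        }
        int V() { return adj.size(); }
        void insert(int v, int w) { adj.get(v).add(w); adj.get(w).add(v); }
    }

    private final int[] ord;
    private int cnt = 0;

    VisitorDFS(Graph g, IntConsumer visit) {
        ord = new int[g.V()];
        Arrays.fill(ord, -1);
        for (int v = 0; v < g.V(); v++)
            if (ord[v] < 0) searchC(g, v, visit);
    }

    private void searchC(Graph g, int v, IntConsumer visit) {
        ord[v] = cnt++;
        visit.accept(v);          // client-supplied processing, once per vertex
        for (int w : g.adj.get(v))
            if (ord[w] < 0) searchC(g, w, visit);
    }
}
```

A client might, for example, pass a list's add method to record the order in which vertices are visited, rather than editing the search itself.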
When a graph-search method is based on using or computing a vertex-indexed array, we often just implement the search and use that array to mark vertices, rather than maintaining the ord array. The specific outcome of a graph search depends not just on the nature of the search method but also on the graph representation and even the order in which the constructor examines the vertices. For specificity in the examples and exercises in this tutorial, we use the term standard adjacency-lists DFS to refer to the process of inserting a sequence of edges into a graph ADT implemented with an adjacency-lists representation (Program 17.9), then doing a DFS with, for example, Program 18.2. For the adjacency-matrix representation, the order of edge insertion does not affect search dynamics, but we use the parallel term standard adjacency-matrix DFS to refer to the process of inserting a sequence of edges into a graph ADT implemented with an adjacency-matrix representation (Program 17.7), then doing a DFS with, for example, Program 18.2.