Reachability in DAGs
To conclude our study of DAGs, we consider the problem of computing the transitive closure of a DAG. Can we develop algorithms for DAGs that are more efficient than the algorithms for general digraphs that we considered in ? Any method for topological sorting can serve as the basis for a transitive-closure algorithm for DAGs, as follows: We proceed through the vertices in reverse topological order, computing the reachability array for each vertex (its row in the transitive-closure matrix) from the rows corresponding to its adjacent vertices. The reverse topological sort ensures that all those rows have already been computed. In total, we check each of the V entries in the array corresponding to the destination vertex of each of the E edges, for a total running time proportional to VE. Although it is simple to implement, this method is no more efficient for DAGs than for general digraphs. When we use a standard DFS for the topological sort (see Program 19.7), we can improve performance for some DAGs, as shown in Program 19.9. Since there are no cycles in a DAG, there are no back edges in any DFS. More important, both cross edges and down edges point to nodes for which the DFS has completed. To take advantage of this fact, we develop a recursive method to compute all vertices reachable from a given start vertex, but (as usual in DFS) we make no recursive calls for vertices for which the reachable set has already been computed. In this case, the reachable vertices are represented by a row in the transitive closure, and the recursive method takes the logical or of all the rows associated with its adjacent edges. For tree edges, we do a recursive call to compute the row; for cross edges, we can skip the recursive call because we know that the row has been computed by a previous recursive call; for down edges, we can skip the whole computation because any reachable nodes that it would add have already been accounted for in the set of reachable nodes for the destination vertex (lower and earlier in the DFS tree). Using this version of DFS might be characterized as using dynamic coding to compute the transitive closure because we make use of results that have already been computed (and saved in the adjacency matrix rows corresponding to previously-processed vertices) to avoid making unnecessary recursive calls. Screenshot illustrates the computation of the transitive closure for the sample DAG in Screenshot.
Screenshot Transitive closure of a DAG
This sequence of row vectors is the transitive closure of the DAG in Screenshot, with rows created in reverse topological order, computed as the last action in a recursive DFS function (see Program 19.9). Each row is the logical or of the rows for adjacent vertices, which appear earlier in the list. For example, to compute the row for 0 we take the logical or of the rows for 5, 2, 1, and 6 (and put a 1 corresponding to 0 itself) because the edges 0-5, 0-2, 0-1, and 0-6 take us from 0 to any vertex that is reachable from any of those vertices. We can ignore down edges because they add no new information. For example, we ignore the edge from 0 to 3 because the vertices reachable from 3 are already accounted for in the row corresponding to 2.
Property 19.13 With dynamic coding and DFS, we can support constant query time for the abstract transitive closure of a DAG with space proportional to V2 and time proportional to V2 + VX for preprocessing (computing the transitive closure), where X is the number of cross edges in the DFS forest. Proof: The proof is immediate by induction from the recursive method in Program 19.9. We visit the vertices in reverse topological order. Every edge points to a vertex for which we have already computed all reachable vertices, and we can therefore compute the set of reachable vertices of any vertex by merging together the sets of reachable vertices associated with the destination vertex of each edge. Taking the logical or of the specified rows in the adjacency matrix accomplishes this merge. We access a row of size V for each tree edge and each cross edge. There are no back edges, and we can ignore down edges because we accounted for any vertices they reach when we processed any ancestors of both nodes earlier in the search.
If our DAG has no down edges (see Exercise 19.42), the running time of Program 19.9 is proportional to VE and represents no improvement over the transitive-closure algorithms that we examined for general digraphs in (such as, for example, Program 19.4) or the approach based on topological sorting that is described at the beginning of this section. On the other hand, if the number of down edges is large (or, equivalently, the number of cross edges is small), Program 19.9 will be significantly faster than these methods. The problem of finding an optimal algorithm (one that is guaranteed to finish in time proportional to V2) for computing the transitive closure of dense DAGs is still unsolved. The best known worst-case performance bound is VE. However, we are certainly better off using an algorithm that runs faster for a large class of DAGs, such as Program 19.9, than we are using one that always runs in time proportional to VE, such as Program 19.4. As we see in , this performance improvement for DAGs has direct implications for our ability to compute the transitive closure of general digraphs as well.