Transitive Closure Revisited
By putting together the results of the previous two sections, we can develop an algorithm to solve the abstract–transitive-closure problem for digraphs that—although it offers no improvement over a DFS-based solution in the worst case—will provide an optimal solution in many situations. The algorithm is based on preprocessing the digraph to build the latter's kernel DAG (see Property 19.2. The algorithm is efficient if the kernel DAG is small relative to the size of the original digraph. If the digraph is a DAG (and therefore is identical to its kernel DAG) or if it has just a few small cycles, we will not see any significant cost savings; however, if the digraph has large cycles or large strong components (and therefore a small kernel DAG), we can develop optimal or near-optimal algorithms. For clarity, we assume the kernel DAG to be sufficiently small that we can afford an adjacency-matrix representation, although the basic idea is still effective for larger kernel DAGs. To implement the abstract transitive closure, we preprocess the digraph as follows:
We can use Kosaraju's, Tarjan's, or Gabow's algorithm to find the strong components; a single pass through the edges to build the kernel DAG (as described in the next paragraph); and DFS (Program 19.9) to compute its transitive closure. After this preprocessing, we can immediately access the information necessary to determine reachability. Once we have a vertex-indexed array with the strong components of a digraph, building an adjacency-matrix representation of its kernel DAG is a simple matter. The vertices of the DAG are the component numbers in the digraph. For each edge s-t in the original digraph, we simply set D.adj[sc[s]][sc[t]] to true. We would have to cope with duplicate edges in the kernel DAG if we were using an adjacency-lists representation—in an adjacency matrix, duplicate edges simply correspond to setting to true a matrix entry that has already been set to true. This small point is significant because the number of duplicate edges is potentially huge (relative to the size of the kernel DAG) in this app. Property 19.17 Given two vertices s and t in a digraph D, let sc(s) and sc (t), respectively, be their corresponding vertices in D's kernel DAG K. Then, t is reachable from s in D if and only if sc(t) is reachable from sc(s) in K. This simple fact follows from the definitions. In particular, this property assumes the convention that a vertex is reachable from itself (all vertices have self-loops). If the vertices are in the same strong component (sc(s) = sc(t)), then they are mutually reachable.
We determine whether a given vertex t is reachable from a given vertex s in the same way as we built the kernel DAG: We use the vertex-indexed array computed by the strong-components algorithm to get the component numbers sc (s ) and sc (t ) (in constant time), which we interpret as abstract vertex indices for the kernel DAG. Using them as vertex indices for the transitive closure of the kernel DAG tells us the result. Program 19.13 is an implementation of the abstract–transitive-closure ADT that embodies these ideas. It uses an abstract–transitive-closure interface for the kernel DAG as well! The running time of this implementation depends not just on the number of vertices and edges in the digraph but also on properties of its kernel DAG. For purposes of analysis, we suppose that we use an adjacency-matrix representation for the kernel DAG because we expect the kernel DAG to be small, if not also dense.
Property 19.18 We can support constant query time for the abstract transitive closure of a digraph with space proportional to V + v2 and time proportional to E + v2 + vx for preprocessing (computing the transitive closure), where v is the number of vertices in the kernel DAG and x is the number of cross edges in its DFS forest. Proof: Immediate from Property 19.13.
If the digraph is a DAG, then the strong-components computation provides no new information, and this algorithm is the same as Program 19.9; in general digraphs that have cycles, however, this algorithm is likely to be significantly faster than Warshall's algorithm or the DFS-based solution. For example, Property 19.18 immediately implies the following result.
Table 19.2. Properties of random digraphs
We might consider other variations on these bounds. For example, if we are willing to use space proportional to E, we can achieve the same time bounds when the kernel DAG has up to vertices. Moreover, these time bounds are conservative because they assume that the kernel DAG is dense with cross edges—and certainly it need not be so. The primary limiting factor in the applicability of this method is the size of the kernel DAG. The more similar our digraph is to a DAG (the larger its kernel DAG), the more difficulty we face in computing its transitive closure. Note that (of course) we still have not violated the lower bound implicit in Property 19.9, since the algorithm runs in time proportional to V3 for dense DAGs; we have, however, significantly broadened the class of graphs for which we can avoid this worst-case performance. Indeed, constructing a random-digraph model that produces digraphs for which the algorithm is slow is a challenge (see Exercise 19.142). Table 19.2 displays the results of an empirical study; it shows that random digraphs have small kernel DAGs even for moderate densities and even in models with severe restrictions on edge placement. Although there can be no guarantees in the worst case, we can expect to see huge digraphs with small kernel DAGs in practice. When we do have such a digraph, we can provide an efficient implementation of the abstract–transitive-closure ADT.