Glossary and Rules of the Game
Our definitions for digraphs are nearly identical to those in for undirected graphs (as are some of the algorithms and programs that we use), but they are worth restating. The slight differences in the wording to account for edge directions imply structural properties that will be the focus of this chapter. Definition 19.1 A digraph (or directed graph ) is a set of vertices plus a set of directed edges that connect ordered pairs of vertices (with no duplicate edges). We say that an edge goes from its first vertex to its second vertex. As we did with undirected graphs, we disallow duplicate edges in this definition but reserve the option of allowing them when convenient for various apps and implementations. We explicitly allow self-loops in digraphs (and usually adopt the convention that every vertex has one) because they play a critical role in the basic algorithms. Definition 19.2 A directed path in a digraph is a list of vertices in which there is a (directed) digraph edge connecting each vertex in the list to its successor in the list. We say that a vertex t is reachable from a vertex s if there is a directed path from s to t. We adopt the convention that each vertex is reachable from itself and normally implement that assumption by ensuring that self-loops are present in our digraph representations. Understanding many of the algorithms in this chapter requires understanding the connectivity properties of digraphs and the effect of these properties on the basic process of moving from one vertex to another along digraph edges. Developing such an understanding is more complicated for digraphs than for undirected graphs. For example, we might be able to tell at a glance whether a small undirected graph is connected or contains a cycle; these properties are not as easy to spot in digraphs, as indicated in the typical example illustrated in Screenshot.
Screenshot A grid digraph
This small digraph is similar to the large grid network that we first considered in Chapter 1, except that it has a directed edge on every grid line, with the direction randomly chosen. Even though the graph has relatively few nodes, its connectivity properties are not readily apparent. Is there a directed path from the upper-left corner to the lower-right corner?
While examples like this highlight differences, it is important to note that what a human considers difficult may or may not be relevant to what a program considers difficult—for instance, writing a DFS class to find cycles in digraphs is no more difficult than for undirected graphs. More important, digraphs and graphs have essential structural differences. For example, the fact that t is reachable from s in a digraph indicates nothing about whether s is reachable from t. This distinction is obvious, but critical, as we shall see. As mentioned in , the representations that we use for digraphs are essentially the same as those that we use for undirected graphs. Indeed, they are more straightforward because we represent each edge just once, as illustrated in Screenshot. In the adjacency-lists representation, an edge s-t is represented as a list node containing t in the linked list corresponding to s. In the adjacency-matrix representation, we need to maintain a full V-by-V matrix and to represent an edge s-t by a true entry in row s and column t. We do not put a true entry in row t and column s unless there is also an edge t-s. In general, the adjacency matrix for a digraph is not symmetric about the diagonal.
Screenshot Digraph representations
The adjacency-matrix and adjacency-lists representations of a digraph have only one representation of each edge, as illustrated in the adjacency-matrix (top) and adjacency-lists (bottom) representation of the graph depicted in Screenshot. These representations both include self-loops at every vertex, which is typical in digraph processing.
There is no difference in these representations between an undirected graph and a directed graph with self-loops at every vertex and two directed edges for each edge connecting distinct vertices in the undirected graph (one in each direction). Thus, we can use the algorithms that we develop in this chapter for digraphs to process undirected graphs, provided that we interpret the results appropriately. In addition, we use the programs that we considered in as the basis for our digraph programs: Our Graph implementations in Programs 17.7 through 17.10 build digraphs when the constructor has true as a second parameter.
The indegree of a vertex in a digraph is the number of directed edges that lead to that vertex. The outdegree of a vertex in a digraph is the number of directed edges that emanate from that vertex. No vertex is reachable from a vertex of outdegree 0, which is called a sink; a vertex of indegree 0, which is called a source, is not reachable from any other vertex. A digraph where self-loops are allowed and every vertex has outdegree 1 is called a map (a function from the set of integers from 0 to V – 1 onto itself). We can easily compute the indegree and outdegree of each vertex (and therefore also find sources and sinks) in linear time and space proportional to V, using vertex-indexed arrays (see Exercise 19.19). The reverse of a digraph is the digraph that we obtain by switching the direction of all the edges. Screenshot shows the reverse and its representations for the digraph of Screenshot. We use the reverse in digraph algorithms when we need to know from where edges come because our standard representations tell us only where the edges go. For example, indegree and outdegree change roles when we reverse a digraph.
Screenshot Digraph reversal
For an adjacency-matrix representation, we could compute the reverse by making a copy of the matrix and transposing it (interchanging its rows and columns). If we know that the graph is not going to be modified, we can actually use the reverse without any extra computation by simply interchanging the vertices in our references to edges when we want to refer to the reverse. For example, an edge s-t in a digraph G is indicated by true in adj[s][t]. Thus, if we were to compute the reverse R of G, it would have true in adj[t][s]. We do not need to do so, however, if we base our client implementations on the edge test edge(s, t), because to switch to the reverse we just replace every such reference by edge(t, s). This opportunity may seem obvious, but it is often overlooked. For an adjacency-lists representation, the reverse is a completely different data structure, and we need to take time proportional to the number of edges to build it, as shown in Program 19.1. Yet another option, which we shall address in , is to maintain two representations of each edge, in the same manner as we do for undirected graphs (see ) but with an extra bit that indicates edge direction. For example, to use this method in an adjacency-lists representation we would represent an edge s-t by a node for t on the adjacency list for s (with the direction bit set to indicate that to move from s to t is a forward traversal of the edge) and a node for s on the adjacency list for t (with the direction bit set to indicate that to move from t to s is a backward traversal of the edge). This representation supports algorithms that need to traverse edges in digraphs in both directions. It is also generally convenient to include pointers connecting the two representations of each edge in such cases. We defer considering this representation in detail to , where it plays an essential role. In digraphs, by analogy to undirected graphs, we speak of directed cycles, which are directed paths from a vertex back to itself, and simple directed paths and cycles, where the vertices and edges are distinct. Note that s-t-s is a cycle of length 2 in a digraph but that cycles in undirected graphs must have at least three distinct vertices. In many apps of digraphs, we do not expect to see any cycles, and we work with yet another type of combinatorial object. Definition 19.3 A directed acyclic graph (DAG) is a digraph with no directed cycles. We expect DAGs, for example, in apps where we are using digraphs to model precedence relationships. DAGs not only arise naturally in these and other important apps but also, as we shall see, in the study of the structure of general digraphs. A sample DAG is illustrated in Screenshot.
Screenshot A directed acyclic graph (DAG)
This digraph has no cycles, a property that is not immediately apparent from the edge list or even from examining its drawing.
Screenshot Digraph terminology
Sources (vertices with no edges coming in) and sinks (vertices with no edges going out) are easy to identify in digraph drawings like this one, but directed cycles and strongly connected components are more difficult to identify. What is the longest directed cycle in this digraph? How many strongly connected components with more than one vertex are there?
Directed cycles are therefore the key to understanding connectivity in digraphs that are not DAGs. An undirected graph is connected if there is a path from every vertex to every other vertex; for digraphs, we modify the definition as follows: Definition 19.4 A digraph is strongly connected if every vertex is reachable from every vertex. The graph in Screenshot is not strongly connected because, for example, there are no directed paths from vertices 9 through 12 to any of the other vertices in the graph. As indicated by strongly, this definition implies a relationship between each pair of vertices stronger than reachability. In any digraph, we say that a pair of vertices s and t are strongly connected or mutually reachable if there is a directed path from s to t and a directed path from t to s. (Our convention that each vertex is reachable from itself implies that each vertex is strongly connected to itself.) A digraph is strongly connected if and only if all pairs of vertices are strongly connected. The defining property of strongly connected digraphs is one that we take for granted in connected undirected graphs: If there is a path from s to t, then there is a path from t to s. In the case of undirected graphs, we know this fact because the same path fits the bill, traversed in the other direction; in digraphs, it must be a different path. Another way of saying that a pair of vertices is strongly connected is that they lie on some directed cyclic path. Recall that we use the term cyclic path instead of cycle to indicate that the path does not need to be simple. For example, in Screenshot, 5 and 6 are strongly connected because 6 is reachable from 5 via the directed path 5-4-2-0-6 and 5 is reachable from 6 via the directed path 6-4-3-5; and these paths imply that 5 and 6 lie on the directed cyclic path 5-4-2-0-6-4-3-5, but they do not lie on any (simple) directed cycle. Note that no DAG that contains more than one vertex is strongly connected. Like simple connectivity in undirected graphs, this relation is transitive: If s is strongly connected to t, and t is strongly connected to u, then s is strongly connected to u. Strong connectivity is an equivalence relation that divides the vertices into equivalence classes containing vertices that are strongly connected to each other. (See for a detailed discussion of equivalence relations.) Again, strong connectivity provides a property for digraphs that we take for granted with respect to connectivity in undirected graphs. Property 19.1 A digraph that is not strongly connected comprises a set of strongly connected components (strong components, for short), which are maximal strongly connected subgraphs, and a set of directed edges that go from one component to another. Proof: Like components in undirected graphs, strong components in digraphs are induced subgraphs of subsets of vertices: Each vertex is in exactly one strong component. To prove this fact, we first note that every vertex belongs to at least one strong component, which contains (at least) the vertex itself. Then we note that every vertex belongs to at most one strong component: If a vertex were to belong to two different components, then there would be paths through that vertex connecting vertices in those components to each other, in both directions, which contradicts the maximality of both components.
For example, a digraph that consists of a single directed cycle has just one strong component. At the other extreme, each vertex in a DAG is a strong component, so each edge in a DAG goes from one component to another. In general, not all edges in a digraph are in the strong components. This situation is in contrast to the analogous situation for connected components in undirected graphs, where every vertex and every edge belongs to some connected component, but similar to the analogous situation for edge-connected components in undirected graphs. The strong components in a digraph are connected by edges that go from a vertex in one component to a vertex in another but do not go back again. Property 19.2 Given a digraph D, define another digraph K (D) with one vertex corresponding to each strong component of D and one edge in K (D) corresponding to each edge in D that connects vertices in different strong components (connecting the vertices in K that correspond to the strong components that it connects in D). Then, K(D) is a DAG (which we call the kernel DAG of D). Proof: If K(D) were to have a directed cycle, then vertices in two different strong components of D would fall on a directed cycle, and that would be a contradiction.
Screenshot shows the strong components and the kernel DAG for a sample digraph. We look at algorithms for finding strong components and building kernel DAGs in .
Screenshot Strong components and kernel DAG
This digraph (top) consists of four strong components, as identified (with arbitrary integer labels) by the vertex-indexed array id (center). Component 0 consists of the vertices 9, 10, 11 and 12; component 1 consists of the single vertex 1; component 2 consists of the vertices 0, 2, 3, 4, 5, and 6; and component 3 consists of the vertices 7 and 8. If we draw the graph defined by the edges between different components, we get a DAG (bottom)
From these definitions, properties, and examples, it is clear that we need to be precise when referring to paths in digraphs. We need to consider at least the following three situations: Connectivity We reserve the term connected for undirected graphs. In digraphs, we might say that two vertices are connected if they are connected in the undirected graph defined by ignoring edge directions, but we generally avoid such usage. Reachability In digraphs, we say that vertex t is reachable from vertex s if there is a directed path from s to t. We generally avoid the term reachable when referring to undirected graphs, although we might consider it to be equivalent to connected because the idea of one vertex being reachable from another is intuitive in certain undirected graphs (for example, those that represent mazes). Strong connectivity Two vertices in a digraph are strongly connected if they are mutually reachable; in undirected graphs, two connected vertices imply the existence of paths from each to the other. Strong connectivity in digraphs is similar in certain ways to edge connectivity in undirected graphs. We wish to support digraph ADT operations that take two vertices s and t as parameters and allow us to test whether
What resource requirements are we willing to expend for these operations? As we saw in , DFS provides a simple solution for connectivity in undirected graphs that takes time proportional to V, but if we are willing to invest preprocessing time proportional to V + E and space proportional to V, we can answer connectivity queries in constant time. Later in this chapter, we examine algorithms for strong connectivity that have these same performance characteristics. But our primary aim is to address the fact that reachability queries in digraphs are more difficult to handle than connectivity or strong connectivity queries. In this chapter, we examine classical algorithms that require preprocessing time proportional to VE and space proportional to V2, develop implementations that can achieve constant-time reachability queries with linear space and preprocessing time for some digraphs, and study the difficulty of achieving this optimal performance for all digraphs.