A substantial amount of nomenclature is associated with graphs. Most of the terms have straightforward definitions, and, for reference, it is convenient to consider them in one place: here. We have already used some of these concepts when considering basic algorithms in Part 1; others of them will not become relevant until we address associated advanced algorithms in s 18 through 22. Definition 17.1 A graph is a set of vertices and a set of edges that connect pairs of distinct vertices (with at most one edge connecting any pair of vertices). We use the names 0 through V-1 for the vertices in a V-vertex graph. The main reason that we choose this system is that we can access quickly information corresponding to each vertex, using array indexing. In , we consider a program that uses a symbol table to establish a 1–1 mapping to associate V arbitrary vertex names with the V integers between 0 and V – 1. With that program in hand, we can use indices as vertex names (for notational convenience) without loss of generality. We sometimes assume that the set of vertices is defined implicitly, by taking the set of edges to define the graph and considering only those vertices that are included in at least one edge. To avoid cumbersome usage such as "the ten-vertex graph with the following set of edges," we do not explicitly mention the number of vertices when that number is clear from the context. By convention, we always denote the number of vertices in a given graph by V, and denote the number of edges by E. We adopt as standard this definition of a graph (which we first encountered in Chapter 5), but note that it embodies two technical simplifications. First, it disallows duplicate edges (mathematicians sometimes refer to such edges as parallel edges, and a graph that can contain them as a multigraph). Second, it disallows edges that connect vertices to themselves; such edges are called self-loops. Graphs that have no parallel edges or self-loops are sometimes referred to as simple graphs. We use simple graphs in our formal definitions because it is easier to express their basic properties and because parallel edges and self-loops are not needed in many apps. For example, we can bound the number of edges in a simple graph with a given number of vertices. Property 17.1 A graph with V vertices has at most V (V – 1)/2 edges. Proof: The total of V2 possible pairs of vertices includes V self-loops and accounts twice for each edge between distinct vertices, so the number of edges is at most (V2 – V)/2 = V(V – 1)/2.
No such bound holds if we allow parallel edges: a graph that is not simple might consist of two vertices and billions of edges connecting them (or even a single vertex and billions of self-loops). For some apps, we might consider the elimination of parallel edges and self-loops to be a data-processing problem that our implementations must address. For other apps, ensuring that a given set of edges represents a simple graph may not be worth the trouble. Throughout the tutorial, whenever it is more convenient to address an app or to develop an algorithm by using an extended definition that includes parallel edges or self-loops, we shall do so. For example, self-loops play a critical role in a classical algorithm that we will examine in ; and parallel edges are common in the apps that we address in . Generally, it is clear from the context whether we intend the term "graph" to mean "simple graph" or "multigraph" or "multigraph with self-loops." Mathematicians use the words vertex and node interchangeably, but we generally use vertex when discussing graphs and node when discussing representations—for example, in Java data structures. We normally assume that a vertex can have a name and can carry other associated information. Similarly, the words arc, edge, and link are all widely used by mathematicians to describe the abstraction embodying a connection between two vertices, but we consistently use edge when discussing graphs and link when discussing references in Java data structures. When there is an edge connecting two vertices, we say that the vertices are adjacent to one another and that the edge is incident on both vertices. The degree of a vertex is the number of edges incident on it. We use the notation v-w to represent an edge that connects v and w; the notation w-v is an alternative way to represent the same edge. A subgraph is a subset of a graph's edges (and associated vertices) that constitutes a graph. Many computational tasks involve identifying subgraphs of various types. If we identify a subset of a graph's vertices, we call that subset, together with all edges that connect two of its members, the induced subgraph associated with those vertices. We can draw a graph by marking points for the vertices and drawing lines connecting them for the edges. A drawing gives us intuition about the structure of the graph; but this intuition can be misleading, because the graph is defined independently of the representation. For example, the two drawings in Screenshot and the list of edges represent the same graph, because the graph is only its (unordered) set of vertices and its (unordered) set of edges (pairs of vertices)—nothing more. Although it suffices to consider a graph simply as a set of edges, we examine other representations that are particularly suitable as the basis for graph data structures in .
Screenshot Three different representations of the same graph
A graph is defined by its vertices and its edges, not by the way that we choose to draw it. These two drawings depict the same graph, as does the list of edges (bottom), given the additional information that the graph has 13 vertices labeled 0 through 12.
Placing the vertices of a given graph on the plane and drawing them and the edges that connect them is known as graph drawing. The possible vertex placements, edge-drawing styles, and aesthetic constraints on the drawing are limitless. Graph-drawing algorithms that respect various natural constraints have been studied heavily and have many successful apps (see reference section). For example, one of the simplest constraints is to insist that edges do not intersect. A planar graph is one that can be drawn in the plane without any edges crossing. Determining whether or not a graph is planar is a fascinating algorithmic problem that we discuss briefly in . Being able to produce a helpful visual representation is a useful goal, and graph drawing is a fascinating field of study, but successful drawings are often difficult to realize. Many graphs that have huge numbers of vertices and edges are abstract objects for which no suitable drawing is feasible. For some apps, such as graphs that represent maps or circuits, a graph drawing can carry considerable information because the vertices correspond to points in the plane and the distances between them are relevant. We refer to such graphs as Euclidean graphs. For many other apps, such as graphs that represent relationships or schedules, the graphs simply embody connectivity information, and no particular geometric placement of vertices is ever implied. We consider examples of algorithms that exploit the geometric information in Euclidean graphs in s 20 and 21, but we primarily work with algorithms that make no use of any geometric information and stress that graphs are generally independent of any particular representation in a drawing or in a computer. Focusing solely on the connections themselves, we might wish to view the vertex labels as merely a notational convenience and to regard two graphs as being the same if they differ in only the vertex labels. Two graphs are isomorphic if we can change the vertex labels on one to make its set of edges identical to the other. Determining whether or not two graphs are isomorphic is a difficult computational problem (see Screenshot and Exercise 17.5). It is challenging because there are V! possible ways to label the vertices—far too many for us to try all the possibilities. Therefore, despite the potential appeal of reducing the number of different graph structures that we have to consider by treating isomorphic graphs as identical structures, we rarely do so.
Screenshot Graph isomorphism examples
The top two graphs are isomorphic because we can relabel the vertices to make the two sets of edges identical (to make the middle graph the same as the top graph, change 10 to 4, 7 to 3, 2 to 5, 3 to 1, 12 to 0, 5 to 2, 9 to 11, 0 to 12, 11 to 9, 1 to 7, and 4 to 10). The bottom graph is not isomorphic to the others because there is no way to relabel the vertices to make its set of edges identical to either.
As we saw with trees in Chapter 5, we are often interested in basic structural properties that we can deduce by considering specific sequences of edges in a graph. Definition 17.2 A path in a graph is a sequence of vertices in which each successive vertex (after the first) is adjacent to its predecessor in the path. In a simple path, the vertices and edges are distinct. A cycle is a path that is simple except that the first and final vertices are the same. We sometimes use the term cyclic path to refer to a path whose first and final vertices are the same (and is otherwise not necessarily simple); and we use the term tour to refer to a cyclic path that includes every vertex. An equivalent way to define a path is as the sequence of edges that connect the successive vertices. We emphasize this in our notation by connecting vertex names in a path in the same way as we connect them in an edge. For example, the simple paths in Screenshot include 3-4-6-0-2, and 9-12-11, and the cycles in the graph include 0-6-4-3-5-0 and 5-4-3-5. We define the length of a path or a cycle to be its number of edges. We adopt the convention that each single vertex is a path of length 0 (a path from the vertex to itself with no edges on it, which is different from a self-loop). Apart from this convention, in a graph with no parallel edges and no self-loops, a pair of vertices uniquely determines an edge, paths must have on them at least two distinct vertices, and cycles must have on them at least three distinct edges and three distinct vertices. We say that two simple paths are disjoint if they have no vertices in common other than, possibly, their endpoints. Placing this condition is slightly weaker than insisting that the paths have no vertices at all in common, and is useful because we can combine simple disjoint paths from s to t and t to u to get a simple disjoint path from s to u if s and u are different (and to get a cycle if s and u are the same). The term vertex disjoint is sometimes used to distinguish this condition from the stronger condition of edge disjoint, where we require that the paths have no edge in common. Definition 17.3 A graph is a connected graph if there is a path from every vertex to every other vertex in the graph. A graph that is not connected consists of a set of connected components, which are maximal connected subgraphs. The term maximal connected subgraph means that there is no path from a subgraph vertex to any vertex in the graph that is not in the subgraph. Intuitively, if the vertices were physical objects, such as knots or beads, and the edges were physical connections, such as strings or wires, a connected graph would stay in one piece if picked up by any vertex, and a graph that is not connected comprises two or more such pieces. Definition 17.4 An acyclic connected graph is called a tree (see Chapter 4). A set of trees is called a forest. A spanning tree of a connected graph is a subgraph that contains all of that graph's vertices and is a single tree. A spanning forest of a graph is a subgraph that contains all of that graph's vertices and is a forest. For example, the graph illustrated in Screenshot has three connected components and is spanned by the forest 7-8 9-10 9-11 9-12 0-1 0-2 0-5 5-3 5-4 4-6 (there are many other spanning forests). Screenshot highlights these and other features in a larger graph.
Screenshot Graph terminology
This graph has 55 vertices, 70 edges, and 3 connected components. One of the connected components is a tree (right). The graph has many cycles, one of which is highlighted in the large connected component (left). The diagram also depicts a spanning tree in the small connected component (center). The graph as a whole does not have a spanning tree, because it is not connected.
We explore further details about trees in Chapter 4 and look at various equivalent definitions. For example, a graph G with V vertices is a tree if and only if it satisfies any of the following four conditions:
Any one of these conditions is necessary and sufficient to prove the other three, and we can develop other combinations of facts about trees from them (see Exercise 17.1). Formally, we should choose one condition to serve as a definition; informally, we let them collectively serve as the definition and freely engage in usage such as the "acyclic connected graph" choice in Definition 17.4. Graphs with all edges present are called complete graphs (see Screenshot). We define the complement of a graph G by starting with a complete graph that has the same set of vertices as the original graph and then removing the edges of G. The union of two graphs is the graph induced by the union of their sets of edges. The union of a graph and its complement is a complete graph. All graphs that have V vertices are subgraphs of the complete graph that has V vertices. The total number of different graphs that have V vertices is 2V(V–1)/2 (the number of different ways to choose a subset from the V(V – 1)/2 possible edges). A complete subgraph is called a clique.
Screenshot Complete graphs
These complete graphs, with every vertex connected to every other vertex, have 10, 15, 21, 28, and 36 edges (bottom to top). Every graph with between 5 and 9 vertices (there are more than 68 billion such graphs) is a subgraph of one of these graphs.
Most graphs that we encounter in practice have relatively few of the possible edges present. To quantify this concept, we define the density of a graph to be the average vertex degree, or 2E/V. A dense graph is a graph whose average vertex degree is proportional to V; a sparse graph is a graph whose complement is dense. In other words, we consider a graph to be dense if E is proportional to V2 and sparse otherwise. This asymptotic definition is not necessarily meaningful for a particular graph, but the distinction is generally clear: A graph that has millions of vertices and tens of millions of edges is certainly sparse, and a graph that has thousands of vertices and millions of edges is certainly dense. We might contemplate processing a sparse graph with billions of vertices, but a dense graph with billions of vertices would have an overwhelming number of edges. Knowing whether a graph is sparse or dense is generally a key factor in selecting an efficient algorithm to process the graph. For example, for a given problem, we might develop one algorithm that takes about V2 steps and another that takes about E lg E steps. These formulas tell us that the second algorithm would be better for sparse graphs, whereas the first would be preferred for dense graphs. For example, a dense graph with millions of edges might have only thousands of vertices: in this case V2 and E would be comparable in value, and the V2 algorithm would be 20 times faster than the E lg E algorithm. On the other hand, a sparse graph with millions of edges also has millions of vertices, so the E lg E algorithm could be millions of times faster than the V2 algorithm. We could make specific tradeoffs on the basis of analyzing these formulas in more detail, but it generally suffices in practice to use the terms sparse and dense informally to help us understand fundamental performance characteristics. When analyzing graph algorithms, we assume that V/E is bounded above by a small constant, so that we can abbreviate expressions such as V(V + E) to VE. This assumption comes into play only when the number of edges is tiny in comparison to the number of vertices—a rare situation. Typically, the number of edges far exceeds the number of vertices (V/E is much less than 1). A bipartite graph is a graph whose vertices we can divide into two sets such that all edges connect a vertex in one set with a vertex in the other set. Screenshot gives an example of a bipartite graph. Bipartite graphs arise in a natural way in many situations, such as the matching problems described at the beginning of this chapter. Any subgraph of a bipartite graph is bipartite.
Screenshot A bipartite graph
All edges in this graph connect odd-numbered vertices with even-numbered ones, so it is bipartite. The bottom diagram makes the property obvious.
Graphs as defined to this point are called undirected graphs. In directed graphs, also known as digraphs, edges are one-way: we consider the pair of vertices that defines each edge to be an ordered pair that specifies a one-way adjacency where we think about having the ability to get from the first vertex to the second but not from the second vertex to the first. Many apps (for example, graphs that represent the Web, scheduling constraints, or telephone-call transactions) are naturally expressed in terms of digraphs. We refer to edges in digraphs as directed edges, though that distinction is generally obvious in context (some authors reserve the term arc for directed edges). The first vertex in a directed edge is called the source; the second vertex is called the destination. (Some authors use the terms tail and head, respectively, to distinguish the vertices in directed edges, but we avoid this usage because of overlap with our use of the same terms in data-structure implementations.) We draw directed edges as arrows pointing from source to destination and often say that the edge points to the destination. When we use the notation v-w in a digraph, we mean it to represent an edge that points from v to w; it is different from w-v, which represents an edge that points from w to v. We speak of the indegree and outdegree of a vertex (the number of edges where it is the destination and the number of edges where it is the source, respectively). Sometimes, we are justified in thinking of an undirected graph as a digraph that has two directed edges (one in each direction); other times, it is useful to think of undirected graphs simply in terms of connections. Normally, as discussed in detail in , we use the same representation for directed and undirected graphs (see Screenshot). That is, we generally maintain two representations of each edge for undirected graphs, one pointing in each direction, so that we can immediately answer questions such as "Which vertices are connected to vertex v?"
Screenshot Two digraphs
The drawing at the top is a representation of the example graph in Screenshot interpreted as a directed graph, where we take the edges to be ordered pairs and represent them by drawing an arrow from the first vertex to the second. It is also a DAG. The drawing at the bottom is a representation of the undirected graph from Screenshot that indicates the way that we usually represent undirected graphs: as digraphs with two edges corresponding to each connection (one in each direction).
is devoted to exploring the structural properties of digraphs; they are generally more complicated than the corresponding properties for undirected graphs. A directed cycle in a digraph is a cycle in which all adjacent vertex pairs appear in the order indicated by (directed) graph edges. A directed acyclic graph (DAG) is a digraph that has no directed cycles. A DAG (an acyclic digraph) is not the same as a tree (an acyclic undirected graph). Occasionally, we refer to the underlying undirected graph of a digraph, meaning the undirected graph defined by the same set of edges, but where these edges are not interpreted as directed. s 20 through 22 are generally concerned with algorithms for solving various computational problems associated with graphs in which other information is associated with the vertices and edges. In weighted graphs, we associate numbers (weights) with each edge, which generally represents a distance or cost. We also might associate a weight with each vertex, or multiple weights with each vertex and edge. In we work with weighted undirected graphs; in s 21 and 22 we study weighted digraphs, which we also refer to as networks. The algorithms in solve network problems that arise from a further extension of the concept known as flow networks. As was evident even in Chapter 1, the combinatorial structure of graphs is extensive. The extent of this structure is all the more remarkable because it springs forth from a simple mathematical abstraction. This underlying simplicity will be reflected in much of the code that we develop for basic graph processing. However, this simplicity sometimes masks complicated dynamic properties that require deep understanding of the combinatorial properties of graphs themselves. It is often far more difficult to convince ourselves that a graph algorithm works as intended than the compact nature of the code might suggest.