Previous Next 
Graph GeneratorsTo develop further appreciation for the diverse nature of graphs as combinatorial structures, we now consider detailed examples of the types of graphs that we use later to test the algorithms that we study. Some of these examples are drawn from apps. Others are drawn from mathematical models that are intended both to have properties that we might find in real graphs and to expand the range of input trials available for testing our algorithms. To make the examples concrete, we present them as clients of Program 17.1 so that we can put them to immediate use when we test implementations of the graph algorithms that we consider. In addition, we consider the implementation of the scan method from Program 17.4, which reads a sequence of pairs of arbitrary names from standard input and builds a graph with vertices corresponding to the names and edges corresponding to the pairs.
The implementations that we consider in this section are based upon the interface of Program 17.1, so they function properly, in theory, for any graph representation. In practice, however, some combinations of interface and representation can have unacceptably poor performance, as we shall see. As usual, we are interested in having "random problem instances," both to exercise our programs with arbitrary inputs and to get an idea of how the programs might perform in real apps. For graphs, the latter goal is more elusive than for other domains that we have considered, although it is still a worthwhile objective. We shall encounter various different models of randomness, starting with one for sparse graphs and another for dense graphs (see Screenshot). Screenshot Two random graphsBoth of these random graphs have 50 vertices. The sparse graph at the top has 50 edges, while the dense graph at the bottom has 500 edges. The sparse graph is not connected, with each vertex connected only to a few others; the dense graph is certainly connected, with each vertex connected to 20 others, on the average. These diagrams also indicate the difficulty of developing algorithms that can draw arbitrary graphs (the vertices here are placed in random position). Random edges This model is simple to implement, as indicated by the generator given in Program 17.12. For a given number of vertices V, we generate random edges by generating pairs of numbers between 0 and V – 1. The result is likely to be a random multigraph with selfloops. A given pair could have two identical numbers (hence, selfloops could occur); and any pair could be repeated multiple times (hence, parallel edges could occur). Program 17.12 generates edges until the graph is known to have E edges, leaving to the implementation the decision of whether to eliminate parallel edges. If parallel edges are eliminated, the number of edges generated is substantially higher than then number of edges used (E) for dense graphs (see Exercise 17.62); so this method is normally used for sparse graphs.
Random graph The classic mathematical model for random graphs is to consider all possible edges and to include each in the graph with a fixed probability p. If we want the expected number of edges in the graph to be E, we can choose p = 2E/V(V – 1). Program 17.13 is a method that uses this model to generate random graphs. This model precludes duplicate edges, but the number of edges in the graph is only equal to E on the average. This implementation is wellsuited for dense graphs, but not for sparse graphs, since it runs in time proportional to V(V – 1)/2 to generate just E = pV(V – 1)/2 edges. That is, for sparse graphs, the running time of Program 17.13 is quadratic in the size of the graph (see Exercise 17.68). These models are well studied and are not difficult to implement, but they do not necessarily generate graphs with properties similar to the ones that we see in practice. In particular, graphs that model maps, circuits, schedules, transactions, networks, and other practical situations are usually not only sparse but also exhibit a locality property—edges are much more likely to connect a given vertex to vertices in a particular set than to vertices that are not in the set. We might consider many different ways of modeling locality, as illustrated in the following examples. kneighbor graph The graph depicted at the top in Screenshot is drawn from a simple modification to a randomedges graph generator, where we randomly pick the first vertex v, then randomly pick the second from among those whose indices are within a fixed constant k of v (wrapping around from V – 1 to 0, when the vertices are arranged in a circle as depicted). Such graphs are easy to generate and certainly exhibit locality not found in random graphs. Screenshot Random neighbor graphsThese figures illustrate two models of sparse graphs. The neighbor graph at the top has 33 vertices and 99 edges, with each edge restricted to connect vertices whose indices differ by less than 10 (modulo V). The Euclidean neighbor graph at the bottom models the types of graphs that we might find in apps where vertices are tied to geometric locations. Vertices are random points in the plane; edges connect any pair of vertices within a specified distance d of each other. This graph is sparse (177 vertices and 1001 edges); by adjusting d, we can generate graphs of any desired density. Euclidean neighbor graph The graph depicted at the bottom in Screenshot is drawn from a generator that generates V points in the plane with random coordinates between 0 and 1 and then generates edges connecting any two points within distance d of one another. If d is small, the graph is sparse; if d is large, the graph is dense (see Exercise 17.74). This graph models the types of graphs that we might expect when we process graphs from maps, circuits, or other apps where vertices are associated with geometric locations. They are easy to visualize, exhibit properties of algorithms in an intuitive manner, and exhibit many of the structural properties that we find in such apps. One possible defect in this model is that the graphs are not likely to be connected when they are sparse; other difficulties are that the graphs are unlikely to have highdegree vertices and that they do not have any long edges. We can change the models to handle such situations, if desired, or we can consider numerous similar examples to try to model other situations (see, for example, Exercises 17.72 and 17.73). Or, we can test our algorithms on real graphs. In many apps, there is no shortage of problem instances drawn from actual data that we can use to test our algorithms. For example, huge graphs drawn from actual geographic data are easy to find; two more examples are listed in the next two paragraphs. The advantage of working with real data instead of a random graph model is that we can see solutions to real problems as algorithms evolve. The disadvantage is that we may lose the benefit of being able to predict the performance of our algorithms through mathematical analysis. We return to this topic when we are ready to compare several algorithms for the same task, at the end of . Transaction graph Screenshot illustrates a tiny piece of a graph that we might find in a telephone company's computers. It has a vertex defined for each phone number, and an edge for each pair i and j with the property that i made a telephone call to j within some fixed period. This set of edges represents a huge multigraph. It is certainly sparse, since each person places calls to only a tiny fraction of the available telephones. It is representative of many other apps. For example, a financial institution's credit card and merchant account records might have similar information. Screenshot Transaction graphA sequence of pairs of numbers like this one might represent a list of telephone calls in a local exchange, or financial transfers between accounts, or any similar situation involving transactions between entities with unique identifiers. The graphs are hardly random—some phones are far more heavily used than others and some accounts are far more active than others. Method invocation graph We can associate a graph with any computer program with methods as vertices and an edge connecting X and Y whenever method X invokes method Y. We can instrument the program to create such a graph (or have a compiler do it). Two completely different graphs are of interest: the static version, where we create edges at compile time corresponding to the method invocations that appear in the program text of each method; and a dynamic version, where we create edges at run time when the invocations actually happen. We use static method invocation graphs to study program structure and dynamic ones to study program behavior. These graphs are typically huge and sparse. In apps such as these, we face massive amounts of data, so we might prefer to study the performance of algorithms on real sample data rather than on random models. We might choose to try to avoid degenerate situations by randomly ordering the edges or by introducing randomness in the decision making in our algorithms, but that is a different matter from generating a random graph. Indeed, in many apps, learning the properties of the graph structure is a goal in itself. In several of these examples, vertices are natural named objects, and edges appear as pairs of named objects. For example, a transaction graph might be built from a sequence of pairs of telephone numbers, and a Euclidean graph might be built from a sequence of pairs of cities or towns. Program 17.14 is an implementation of the scan method in Program 17.4, which we can use to build a graph in this common situation. For the client's convenience, it takes the set of edges as defining the graph and deduces the set of vertex names from their use in edges. Specifically, the program reads a sequence of pairs of symbols from standard input, uses a symbol table to associate the vertex numbers 0 to V – 1 to the symbols (where V is the number of different symbols in the input), and builds a graph by inserting the edges, as in Programs 17.12 and 17.13. We could adapt any symboltable implementation to support the needs of Program 17.14; Program 17.15 is an example that uses ternary search trees (TSTs) (see Chapter 14). These programs make it easy for us to test our algorithms on real graphs that may not be characterized accurately by any probabilistic model.
Program 17.14 is also significant because it validates the assumption we have made in all of our algorithms that the vertex names are integers between 0 and V – 1. If we have a graph that has some other set of vertex names, then the first step in representing the graph is to use Program 17.15 to map the vertex names to integers between 0 and V – 1.
Some graphs are based on implicit connections among items. We do not focus on such graphs, but we note their existence in the next few examples and devote a few exercises to them. When faced with processing such a graph, we can certainly write a program to construct explicit graphs by enumerating all the edges; but there also may be solutions to specific problems that do not require that we enumerate all the edges and therefore can run in sublinear time. Degreesofseparation graph Consider a collection of subsets drawn from V items. We define a graph with one vertex corresponding to each element in the union of the subsets and edges between two vertices if both vertices appear in some subset (see Screenshot). If desired, the graph might be a multigraph, with edge labels naming the appropriate subsets. All items incident on a given item v are said to be 1 degree of separation from v. Otherwise, all items incident on any item that is i degrees of separation from v (that are not already known to be i or fewer degrees of separation from v) are (i+1) degrees of separation from v. This construction has amused people ranging from mathematicians (Erdös number) to movie buffs (separation from Kevin Bacon). Screenshot Degreesofseparation graphThe graph at the bottom is defined by the groups at the top, with one vertex for each person and an edge connecting a pair of people whenever they are in the same group. Shortest path lengths in the graph correspond to degrees of separation. For example, Frank is three degrees of separation from Alice and Bob. Interval graph Consider a collection of V intervals on the real line (pairs of real numbers). We define a graph with one vertex corresponding to each interval, with edges between vertices if the corresponding intervals intersect (have any points in common). de Bruijn graph Suppose that V is a power of 2. We define a digraph with one vertex corresponding to each nonnegative integer less than V, with edges from each vertex i to 2i and (2i + 1) mod lg V. These graphs are useful in the study of the sequence of values that can occur in a fixedlength shift register for a sequence of operations where we repeatedly shift all the bits one position to the left, throw away the leftmost bit, and fill the rightmost bit with 0 or 1. Screenshot depicts the de Bruijn graphs with 8, 16, 32, and 64 vertices. Screenshot de Bruijn graphsA de Bruijn digraph of order n has 2^{n} vertices with edges from i to 2i mod n and (2i+1) mod 2^{n}, for all i. Pictured here are the underlying undirected de Bruijn graphs of order 6, 5, 4, and 3 (top to bottom). The various types of graphs that we have considered in this section have a wide variety of different characteristics. However, they all look the same to our programs: They are simply collections of edges. As we saw in Chapter 1, learning even the simplest facts about them can be a computational challenge. In this tutorial, we consider numerous ingenious algorithms that have been developed for solving practical problems related to many types of graphs. Based just on the few examples presented in this section, we can see that graphs are complex combinatorial objects, far more complex than those underlying other algorithms that we studied in Parts 1 through 4. In many instances, the graphs that we need to consider in apps are difficult or impossible to characterize. Algorithms that perform well on random graphs are often of limited applicability because it is often difficult to be persuaded that random graphs have structural characteristics the same as those of the graphs that arise in apps. The usual approach to overcome this objection is to design algorithms that perform well in the worst case. While this approach is successful in some instances, it falls short (by being too conservative) in others. While we are often not justified in assuming that performance studies on graphs generated from one of the random graph models that we have discussed will give information sufficiently accurate to allow us to predict performance on real graphs, the graph generators that we have considered in this section are useful in helping us to test implementations and to understand our algorithms' performance. Before we even attempt to predict performance for apps, we must at least verify any assumptions that we might have made about the relationship between the app's data and whatever models or sample data we may have used. While such verification is wise when we are working in any apps domain, it is particularly important when we are processing graphs, because of the broad variety of types of graphs that we encounter. Exercises

Previous Next 