Graph-Processing Problems

Armed with the basic tools developed in this chapter, we consider in s 18 through 22 a broad variety of algorithms for solving graph-processing problems. These algorithms are fundamental ones and are useful in many apps, but they serve as only an introduction to the subject of graph algorithms. Many interesting and useful algorithms have been developed that are beyond the scope of this tutorial, and many interesting problems have been studied for which good algorithms have still not yet been invented. As is true in any domain, the first challenge that we face in addressing a new graph-processing problem is determining how difficult it is to solve. For graph processing, this decision can be far more difficult than we might imagine, even for problems that appear to be simple to solve. Moreover, our intuition is not always helpful in distinguishing easy problems from difficult or hitherto unsolved ones. In this section, we describe briefly important classical problems and the state of our knowledge of them. Given a new graph-processing problem, what type of challenge do we face in developing an implementation to solve it? The unfortunate truth is that there is no good method to answer this question for any problem that we might encounter, but we can provide a general description of the difficulty of solving various classical graph-processing problems. To this end, we will roughly categorize the problems according to the difficulty of solving them, as follows:

  • Easy
  • Tractable
  • Intractable
  • Unknown

These terms are intended to convey information relative to one another and to the current state of knowledge about graph algorithms. As indicated by the terminology, our primary reason for categorizing problems in this way is that there are many graph problems, such as the Hamilton tour problem, that no one knows how to solve efficiently. We will eventually see (in Part 8) how to make that statement meaningful in a precise technical sense; at this point, we can at least be warned that we face significant obstacles to writing an efficient program to solve these problems. We defer full context on many of the graph-processing problems until later in the tutorial. Here, we present brief statements that are easily understood in order to introduce the general issue of classifying the difficulty of graph-processing problems. An easy graph-processing problem is one that can be solved by the kind of elegant and efficient short programs to which we have grown accustomed in Parts 1 through 4. We often find the running time to be linear in the worst case, or bounded by a small-degree polynomial in the number of vertices or the number of edges. Generally, as we have done in many other domains, we can establish that a problem is easy by developing a brute-force solution that, although it may be too slow for huge graphs, is useful for small and even intermediate-sized problems. Then, once we know that the problem is easy, we look for efficient solutions that we might use in practice and try to identify the best among those. The Euler tour problem that we considered in is a prime example of such a problem, and we shall see many others in s 18 through 22 including, most notably, the following. Simple connectivity Is a given graph connected? That is, is there a path connecting every pair of vertices? Is there a cycle in the graph, or is it a forest? Given two vertices, are they on a cycle? We first considered these basic graph-processing question in Chapter 1. We consider numerous solutions to such problems in . Some are trivial to implement in linear time; others have rather sophisticated linear-time solutions that bear careful study. Strong connectivity in digraphs Is there a directed path connecting every pair of vertices in a digraph? Given two vertices, are they connected by directed paths in both directions (are they on a directed cycle)? Implementing efficient solutions to these problems is a much more challenging task than for the corresponding simple-connectivity problem in undirected graphs, and much of is devoted to studying them. Despite the clever intricacies involved in solving them, we classify the problems as easy because we can write a compact, efficient, and useful implementation. Transitive closure What set of vertices can be reached by following directed edges from each vertex in a digraph? This problem is closely related to strong connectivity and to other fundamental computational problems. We study classical solutions that amount to a few lines of code in . Minimum spanning tree In a weighted graph, find a minimum-weight set of edges that connects all the vertices. This is one of the oldest and best-studied graph-processing problems; is devoted to the study of various classical algorithms to solve it. Researchers still seek faster algorithms for this problem. Single-source shortest paths What are the shortest paths connecting a given vertex v with each other vertex in a weighted digraph (network)? is devoted to the study of this problem, which is extremely important in numerous apps. The problem is decidedly not easy if edge weights can be negative. A tractable graph-processing problem is one for which an algorithm is known whose time and space requirements are guaranteed to be bounded by a polynomial function of the size of the graph (V + E). All easy problems are tractable, but we make a distinction because many tractable problems have the property that developing an efficient practical program to solve is an extremely challenging, if not impossible, task. Solutions may be too complicated to present in this tutorial, because implementations might require hundreds or even thousands of lines of code. The following examples are two of the most important problems in this class. Planarity Can we draw a given graph without any of the lines that represent edges intersecting? We have the freedom to place the vertices anywhere, so we can solve this problem for many graphs, but it is impossible to solve for many other graphs. A remarkable classical result known as Kuratowski's theorem provides an easy test for determining whether a graph is planar: It says that the only graphs that cannot be drawn with no edge intersections are those that contain some subgraph that, after removing vertices of degree 2, is isomorphic to one of the graphs in Screenshot. A straightforward implementation of that test, even without taking the vertices of degree 2 into consideration, would be too slow for large graphs (see Exercise 17.110), but in 1974 R. Tarjan developed an ingenious (but intricate) algorithm for solving the problem in linear time, using a depth-first search scheme that extends those that we consider in . Tarjan's algorithm does not necessarily give a practical layout; it just certifies that a layout exists. As discussed in , developing a visually pleasing layout in apps where vertices do not necessarily relate directly to the physical world has turned out to be a challenging research problem.

Screenshot Forbidden subgraphs in planar graphs

Neither of these graphs can be drawn in the plane without intersecting edges, nor can any graph that contains either of these graphs as a subgraph (after we remove vertices of degree two); but all other graphs can be so drawn.

Java graphics 17fig24.gif


Matching Given a graph, what is the largest subset of its edges with the property that no two are connected to the same vertex? This classical problem is known to be solvable in time proportional to a polynomial function of V and E, but a fast algorithm that is suitable for huge graphs is still an elusive research goal. The problem is easier to solve when restricted in various ways. For example, the problem of matching students to available positions in selective institutions is an example of bipartite matching: We have two different types of vertices (students and institutions) and we are concerned with only those edges that connect a vertex of one type with a vertex of the other type. We see a solution to this problem in . The solutions to some tractable problems have never been written down as programs, or have running times so high that we could not contemplate using them in practice. The following example is in this class. It also demonstrates the capricious nature of the mathematical reality of the difficulty of graph processing. Even cycles in digraphs Does a given digraph have a cycle of even length? This question would seem simple to resolve because the corresponding question for undirected graphs is easy to solve (see ), as is the question of whether a digraph has a cycle of odd length. However, for many years, the problem was not sufficiently well understood for us even to know whether or not there exists an efficient algorithm for solving it (see reference section). A theorem establishing the existence of an efficient algorithm was proved in 1999, but the method is so complicated that no mathematician or programmer would attempt to implement it. One of the important themes of is that many tractable graph problems are best handled by algorithms that can solve a whole class of problems in a general setting. The shortest-paths algorithms of , the network-flow algorithms of , and the powerful network-simplex algorithm of are capable of solving many graph problems that otherwise might present a significant challenge. Examples of such problems include the following. Assignment This problem, also known as bipartite weighted matching, is to find a perfect matching of minimum weight in a bipartite graph. It is easily solved with network-flow algorithms. Specific methods that attack the problem directly are known, but they have been shown to be essentially equivalent to network-flow solutions. General connectivity What is the minimum number of edges whose removal will separate a graph into two disjoint parts (edge connectivity)? What is the minimum number of vertices whose removal will separate a graph into two disjoint parts (vertex connectivity)? As we see in , these problems, although difficult to solve directly, can both be solved with network-flow algorithms. Mail carrier Given a graph, find a tour with a minimal number of edges that uses every edge in the graph at least once (but is allowed to use edges multiple times). This problem is much more difficult than the Euler tour problem but much less difficult than the Hamilton tour problem. The step from convincing yourself that a problem is tractable to providing software that can be used in practical situations can be a large step, indeed. On the one hand, when proving that a problem is tractable, researchers tend to brush past numerous details that have to be dealt with in an implementation; on the other hand, they have to account for numerous potential situations even though they may not arise in practice. This gap between theory and practice is particularly acute when investigators are considering graph algorithms, both because mathematical research is filled with deep results describing a bewildering variety of structural properties that we may need to take into account when processing graphs, and because the relationships between those results and the properties of graphs that arise in practice are little understood. The development of general schemes such as the network-simplex algorithm has been an extremely effective approach to dealing with such problems. An intractable graph-processing problem is one for which there is no known algorithm that is guaranteed to solve the problem in a reasonable amount of time. Many such problems have the characteristic that we can use a brute-force method where we can try all possibilities to compute the solution—we consider them to be intractable because there are far too many possibilities to consider. This class of problems is extensive and includes many important problems that we would like to know how to solve. The term NP-hard describes the problems in this class. Most experts believe that no efficient algorithms exist for these problems. We consider the bases for this terminology and this belief in more detail in Part 8. The Hamilton path problem that we discussed in is a prime example of an NP-hard graph-processing problem, as are those on the following list. Longest path What is the longest simple path connecting two given vertices in a graph? Despite its apparent similarity to shortest-paths problems, this problem is a version of the Hamilton tour problem and is NP-hard. Colorability Is there a way to assign one of k colors to each of the vertices of a graph such that no edge connects two vertices of the same color? This classical problem is easy for k = 2 (see ), but it is NP-hard for k = 3. Independent set What is the size of the largest subset of the vertices of a graph with the property that no two are connected by an edge? Just as we saw when contrasting the Euler and Hamilton tour problems, this problem is NP-hard, despite its apparent similarity to the matching problem, which is solvable in polynomial time. Clique What is size of the largest clique (complete subgraph) in a given graph? This problem generalizes part of the planarity problem, because if the largest clique has more than four nodes, the graph is not planar. These problems are formulated as existence problems—we are asked to determine whether or not a particular subgraph exists. Some of the problems ask for the size of the largest subgraph of a particular type, which we can do by framing an existence problem where we test for the existence of a subgraph of size k that satisfies the property of interest, then use binary search to find the largest. In practice, we actually often want to find a complete solution, which is potentially much harder to do. For example, the famous four-color theorem tells us that it is possible to use just four colors to color all the vertices of a planar graph such that no edge connects two vertices of the same color. But the theorem does not tell us how to do so for a particular planar graph: Knowing that a coloring exists does not help us find a complete solution to the problem. Another famous example is the traveling salesperson problem, which asks us to find the minimum-length tour through all the vertices of a weighted graph. This problem is related to the Hamilton path problem, but it is certainly no easier: If we cannot find an efficient solution to the Hamilton path problem, we cannot expect to find one for the traveling salesperson problem. As a rule, when faced with difficult problems, we work with the simplest version that we cannot solve. Existence problems are within the spirit of this rule, but they also play an essential role in the theory, as we shall see in Part 8. The problems just listed are but a few of the thousands of NP-hard problems that have been identified. They arise in all types of computational apps, as we shall discuss in Part 8; they are particularly prevalent in graph processing, so we have to be aware of their existence throughout this tutorial. Note that we are insisting that our algorithms guarantee efficiency, in the worst case. Perhaps we should instead settle for algorithms that are efficient for typical inputs (but not necessarily in the worst case). Similarly, many of the problems involve optimization. Perhaps we should instead settle for a long path (not necessarily the longest) or a large clique (not necessarily the maximum). For graph processing, it might be easy to find a good answer for graphs that arise in practice, and we may not even be interested in looking for an algorithm that could find an optimal solution in fictional graphs that we will never see. Indeed, intractable problems can often be attacked with straightforward or general-purpose algorithms similar to Program 17.17 that, although they have exponential running time in the worst case, can quickly find a solution (or a good approximation) for specific problem instances that arise in practice. We would be reluctant to use a program that will crash or produce a bad answer for certain inputs, but we do sometimes find ourselves using programs that run in exponential time for certain inputs. We consider this situation in Part 8. There are also many research results proving that various intractable problems remain intractable even when we relax various restrictions. Moreover, there are many specific practical problems that we cannot solve because no one knows a sufficiently fast algorithm. In this part of the tutorial, we will label problems NP-hard when we encounter them and interpret this label as meaning, at the very least, that we are not going to expect to find efficient algorithms to solve them and that we are not going to attack them without using advanced techniques of the type discussed in Part 8 (except perhaps to use brute-force methods to solve tiny problems). There are some graph-processing problems whose difficulty is unknown. Neither is there an efficient algorithm known for them, nor are they known to be NP-hard. It is possible, as our knowledge of graph-processing algorithms and properties of graphs expands, that some of these problems will turn out to be tractable, or even easy. The following important natural problem, which we have already encountered (see Screenshot), is the best-known problem in this class.

Table 17.2. Difficulty of classical graph-processing problems

This table summarizes the discussion in the text about the relative difficulty of various classical graph-processing problems, comparing them in rough subjective terms. These examples indicate not only the range of difficulty of the problems but also that classifying a given problem can be a challenging task.

 

E

T

I

?

undirected graphs

connectivity

*

     

general connectivity

 

*

   

Euler tour

*

     

Hamilton tour

   

*

 

bipartite matching

*

     

maximum matching

 

*

   

planarity

 

*

   

maximum clique

   

*

 

2-colorability

*

     

3-colorability

   

*

 

shortest paths

*

     

longest paths

   

*

 

vertex cover

   

*

 

isomorphism

     

*

digraphs

transitive closure

*

     

strong connectivity

*

     

odd-length cycle

*

     

even-length cycle

 

*

   

weighted graphs

minimum spanning tree

*

     

traveling salesperson

   

*

 

networks

shortest paths (nonnegative weights)

*

     

shortest paths (negative weights)

   

*

 

maximum flow

*

     

assignment

 

*

   

minimum-cost flow

 

*

   

Key: E Easy—efficient classical algorithm known (see reference) T Tractable—solution exists (implementation difficult) I Intractable—no efficient solution known (NP-hard) ? Unknown

Graph isomorphism Can we make two given graphs identical by renaming vertices? Efficient algorithms are known for this problem for many special types of graphs, but the difficulty of the general problem remains open. The number of significant problems whose intrinsic difficulty is unknown is very small in comparison to the other categories that we have considered, because of intense research in this field over the past several decades. Certain problems in this class, such as graph isomorphism, are of immense practical interest; other problems in this class are of significance primarily by virtue of having resisted classification. For the class of easy algorithms, we are used to the idea of comparing algorithms with different worst-case performance characteristics and trying to predict performance through analysis and empirical tests. For graph processing, these tasks are particularly challenging because of the difficulty of characterizing the types of graphs that might arise in practice. Fortunately, many of the important classical algorithms have optimal or near-optimal worst-case performance, or have a running time that depends on only the number of vertices and edges, rather than on the graph structure; we thus can concentrate on streamlining implementations and still can predict performance with confidence. In summary, there is a wide spectrum of problems and algorithms known for graph processing. Table 17.2 summarizes some of the information that we have discussed. Every problem comes in different versions for different types of graphs (directed, weighted, bipartite, planar, sparse, dense), and there are thousands of problems and algorithms to consider. We certainly cannot expect to solve every problem that we might encounter, and some problems that appear to be simple are still baffling the experts. Despite a natural a priori expectation that we should have no problem distinguishing easy problems from intractable ones, the many examples that we have discussed illustrate that placing a problem even into these rough categories can turn into a significant research challenge. As our knowledge about graphs and graph algorithms expands, given problems may move among these categories. Despite a flurry of research activity in the 1970s and intensive work by many researchers since then, the possibility still remains that all the problems that we are discussing will someday be categorized as "easy" (solvable by an algorithm that is compact, efficient, and possibly ingenious). Having developed this context, we shall press on to consider numerous useful graph-processing algorithms. Problems that we can solve do arise often, the graph algorithms that we study serve well in a great variety of apps, and these algorithms serve as the basis for attacking numerous other problems that we need to handle even if we cannot guarantee efficient solutions.

Exercises

• 17.108 Prove that neither of the two graphs depicted in Screenshot is planar.

Write a graph ADT client that determines whether or not a graph contains one of the graphs depicted in Screenshot, using a brute-force algorithm where you test all possible subsets of five vertices for the clique and all possible subsets of six vertices for the complete bipartite graph. Note: This test does not suffice to show whether the graph is planar, because it ignores the condition that removing vertices of degree 2 in some subgraph might give one of the two forbidden subgraphs.

Give a drawing of the graph Java graphics 080equ01.gif


that has no intersecting edges, or prove that no such drawing exists.

Find a way to assign three colors to the vertices of the graph Java graphics 080equ02.gif


such that no edge connects two vertices of the same color, or show that it is not possible to do so.

Solve the independent-set problem for the graph Java graphics 080equ03.gif


What is the size of the largest clique in a de Bruijn graph of order n?



   
Comments