Prim's Algorithm and Priority-First Search
Prim's algorithm is perhaps the simplest MST algorithm to implement, and it is the method of choice for dense graphs. We maintain a cut of the graph that is comprised of tree vertices (those chosen for the MST) and nontree vertices (those not yet chosen for the MST). We start by putting any vertex on the MST, then put a minimal crossing edge on the MST (which changes its nontree vertex to a tree vertex) and repeat the same operation V – 1 times, to put all vertices on the tree. A brute-force implementation of Prim's algorithm follows directly from this description. To find the edge to add next to the MST, we could examine all the edges that go from a tree vertex to a nontree vertex, then pick the shortest of the edges found to put on the MST. We do not consider this implementation here because it is overly expensive (see Exercises 20.36 through 20.38). Adding a simple data structure to eliminate excessive recomputation makes the algorithm both simpler and faster. Adding a vertex to the MST is an incremental change: To implement Prim's algorithm, we focus on the nature of that incremental change. The key is to note that our interest is in the shortest distance from each nontree vertex to the tree. When we add a vertex v to the tree, the only possible change for each nontree vertex w is that adding v brings w closer than before to the tree. In short, we do not need to check the distance from w to all tree vertices—we just need to keep track of the minimum and check whether the addition of v to the tree necessitates that we update that minimum. To implement this idea, we need data structures that can give us the following information:
The simplest implementation for each of these data structures is a vertex-indexed array (we can use such an array for the tree edges by indexing on vertices as they are added to the tree). Program 20.6 is an implementation of Prim's algorithm for dense graphs. It uses the arrays mst, fr, and wt (respectively) for these three data structures. After adding a new edge (and vertex) to the tree, we have two tasks to accomplish:
The implementation in Program 20.6 accomplishes both of these tasks with a single scan through the nontree vertices, updating wt[w] and fr[w] if v-w brings w closer to the tree, then updating the current minimum if wt[w] (the length of fr[w]) indicates that w is closer to the tree than any other nontree vertex with a lower index). Property 20.6 Using Prim's algorithm, we can find the MST of a dense graph in linear time. Proof: It is immediately evident from inspection of the program that the running time is proportional to V2 and therefore is linear for dense graphs.
Program 20.6 is suitable only for dense graphs, since it uses the edge existence test. To ensure proper operation for all Graph representations, we could use GraphUtilities.densecopy to a private DenseGraph, as in Program 19.3. Also, for use by clients, the GraphMST class in Program 20.6 (and all of the other MST implementations in this chapter) needs to be augmented with one or more accessor methods and a toString method, as discussed at the end of . Screenshot shows an example MST construction with Prim's algorithm; Screenshot shows the evolving MST for a larger example.
Screenshot Prim's MST algorithm
The first step in computing the MST with Prim's algorithm is to add 0 to the tree. Then we find all the edges that connect 0 to other vertices (which are not yet on the tree) and keep track of the shortest (top left). The edges that connect tree vertices with nontree vertices (the fringe) are shadowed in gray and listed below each graph drawing. For simplicity in this figure, we list the fringe edges in order of their length so that the shortest is the first in the list. Different implementations of Prim's algorithm use different data structures to maintain this list and to find the minimum. The second step is to move the shortest edge 0-2 (along with the vertex that it takes us to) from the fringe to the tree (second from top, left). Third, we move 0-7 from the fringe to the tree, replace 0-1 by 7-1 and 0-6 by 7-6 on the fringe (because adding 7 to the tree brings 1 and 6 closer to the tree), and add 7-4 to the fringe (because adding 7 to the tree makes 7-4 an edge that connects a tree vertex with a nontree vertex) (third from top, left). Next, we move edge 7-1 to the tree (bottom, left). To complete the computation, we take 7-6, 7-4, 4-3, and 3-5 off the queue, updating the fringe after each insertion to reflect any shorter or new paths discovered (right, top to bottom). An oriented drawing of the growing MST is shown at the right of each graph drawing. The orientation is an artifact of the algorithm: We generally view the MST itself as a set of edges, unordered and unoriented.
Screenshot Prim's MST algorithm
This sequence shows how the MST grows as Prim's algorithm discovers 1/4, 1/2, 3/4, and all of the edges in the MST (top to bottom). An oriented representation of the full MST is shown at the right.
Program 20.6 is based on the observation that we can interleave the find the minimum and update operations in a single loop where we examine all the nontree edges. In a dense graph, the number of edges that we may have to examine to update the distance from the nontree vertices to the tree is proportional to V, so looking at all the nontree edges to find the one that is closest to the tree does not represent excessive extra cost. But in a sparse graph, we can expect to use substantially fewer than V steps to perform each of these operations. The crux of the strategy that we will use to do so is to focus on the set of potential edges to be added next to the MST—a set that we call the fringe. The number of fringe edges is typically substantially smaller than the number of nontree edges, and we can recast our description of the algorithm as follows. Starting with a self loop to a start vertex on the fringe and an empty tree, we perform the following operation until the fringe is empty:
From this formulation, it is clear that Prim's algorithm is nothing more than a generalized graph search (see ), where the fringe is a priority queue based on a remove the minimum operation (see Chapter 9). We refer to generalized graph searching with priority queues as priority-first search (PFS). With edge weights for priorities, PFS implements Prim's algorithm. This formulation encompasses a key observation that we made already in connection with implementing BFS in . An even simpler general approach is to simply keep on the fringe all of the edges that are incident upon tree vertices, letting the priority-queue mechanism find the shortest one and ignore longer ones (see Exercise 20.42). As we saw with BFS, this approach is unattractive because the fringe data structure becomes unnecessarily cluttered with edges that will never get to the MST. The size of the fringe could grow to be proportional to E (with whatever attendant costs having a fringe this size might involve), while the PFS approach just outlined ensures that the fringe will never have more than V entries. As with any implementation of a general algorithm, we have a number of available approaches for interfacing with priority-queue ADTs. One approach is to use a priority queue of edges, as in our generalized graph-search implementation of Program 18.10. Program 20.7 is an implementation that is essentially equivalent to Program 18.10 but uses a vertex-based approach so that it can use an indexed priority-queue, as described in Section 9.6. (We consider a complete implementation of the specific priority-queue interface used by Program 20.7 is at the end of this chapter, in Program 20.10.) We identify the fringe vertices, the subset of nontree vertices that are connected by fringe edges to tree vertices, and keep the same vertex-indexed arrays mst, fr, and wt as in Program 20.6. The priority queue contains the index of each fringe vertex, and that entry gives access to the shortest edge connecting the fringe vertex with the tree and the length of that edge, through the second and third arrays. The priority P in Program 20.7 is assigned a value such that this class implements Prim's MST algorithm; other priority value assignments give other algorithms. The first call to pfs in the constructor in Program 20.7 finds the MST in the connected component that contains vertex 0, and subsequent calls find MSTs in other components, so this class actually finds minimal spanning forests in graphs that are not connected (see Exercise 20.35). Property 20.7 Using a PFS implementation of Prim's algorithm that uses a heap for the priority-queue implementation, we can compute an MST in time proportional to E lg V. Proof: The algorithm directly implements the generic idea of Prim's algorithm (add next to the MST a minimal edge that connects a vertex on the MST with a vertex not on the MST). Each priority-queue operation requires less than lg V steps. Each vertex is chosen with a remove the minimum operation; and, in the worst case, each edge might require a change priority operation.
Priority-first search is a proper generalization of breadth-first and depth-first search because those methods also can be derived through appropriate priority settings. For example, we can (somewhat artificially) use a variable cnt to assign a unique priority cnt++ to each vertex when we put that vertex on the priority queue. If we define P to be cnt, we get preorder numbering and DFS because newly encountered nodes have the highest priority. If we define P to be V-cnt, we get BFS because old nodes have the highest priority. These priority assignments make the priority queue operate like a stack and a queue, respectively. This equivalence is purely of academic interest since the priority-queue operations are unnecessary for DFS and BFS. Also, as discussed in , a formal proof of equivalence would require a precise attention to replacement rules to obtain the same sequence of vertices as result from the classical algorithms. As we shall see, PFS encompasses not just DFS, BFS, and Prim's MST algorithm but also several other classical algorithms. The various algorithms differ only in their priority functions. Thus, the running times of all these algorithms depend on the performance of the priority-queue ADT. Indeed, we are led to a general result that encompasses not just the two implementations of Prim's algorithms that we have examined in this section but also a broad class of fundamental graph-processing algorithms. Property 20.8 For all graphs and all priority functions, we can compute a spanning tree with PFS in linear time plus time proportional to the time required for V insert, V delete the minimum, and E decrease key operations in a priority queue of size at most V. Proof: The proof of Property 20.7 establishes this more general result. We have to examine all the edges in the graph; hence the "linear time" clause. The algorithm never increases the priority (it changes the priority to only a lower value); by more precisely specifying what we need from the priority-queue ADT (decrease key, not necessarily change priority), we strengthen this statement about performance.
In particular, use of an unordered-array priority-queue implementation gives an optimal solution for dense graphs that has the same worst-case performance as the classical implementation of Prim's algorithm (Program 20.6). That is, Properties 20.6 and 20.7 are special cases of Property 20.8; throughout this tutorial we shall see numerous other algorithms that essentially differ in only their choice of priority function and their priority-queue implementation. Property 20.7 is an important general result: The time bound stated is a worst-case upper bound that guarantees performance within a factor of lg V of optimal (linear time) for a large class of graph-processing problems. Still, it is somewhat pessimistic for many of the graphs that we encounter in practice, for two reasons. First, the lg V bound for priority-queue operations holds only when the number of vertices on the fringe is proportional to V, and even then it is just an upper bound. For a real graph in a practical app, the fringe might be small (see Figures 20.10 and 20.11), and some priority-queue operations might take many fewer than lg V steps. Although noticeable, this effect is likely to account for only a small constant factor in the running time; for example, a proof that the fringe never has more than vertices on it would improve the bound by only a factor of 2. More important, we generally perform many fewer than E decrease key operations since we do that operation only when we find an edge to a fringe node that is shorter than the current best-known edge to a fringe node. This event is relatively rare: Most edges have no effect on the priority queue (see Exercise 20.41). It is reasonable to regard PFS as an essentially linear-time algorithm unless V lg V is significantly greater than E.
Screenshot PFS implementation of Prim's MST algorithm
With PFS, Prim's algorithm processes just the vertices and edges closest to the MST (in gray).
Screenshot Fringe size for PFS implementation of Prim's algorithm
The plot at the bottom shows the size of the fringe as the PFS proceeds for the example in Screenshot. For comparison, the corresponding plots for DFS, randomized search, and BFS from Screenshot are shown above in gray.
The priority-queue ADT and generalized graph-searching abstractions make it easy for us to understand the relationships among various algorithms. Since these abstractions (and software mechanisms to support their use) were developed many years after the basic methods, relating the algorithms to classical descriptions of them becomes an exercise for historians. However, knowing basic facts about the history is useful when we encounter descriptions of MST algorithms in the research literature or in other texts, and understanding how these few simple abstractions tie together the work of numerous researchers over a time span of decades is persuasive evidence of their value and power. Thus, we consider briefly here the origins of these algorithms. An MST implementation for dense graphs essentially equivalent to Program 20.6 was first presented by Prim in 1961 and, independently, by Dijkstra soon thereafter. It is usually referred to as Prim's algorithm, although Dijkstra's presentation was more general, so some scholars refer to the MST algorithm as a special case of Dijkstra's algorithm. But the basic idea was also presented by Jarnik in 1939, so some authors refer to the method as Jarnik's algorithm, thus characterizing Prim's (or Dijkstra's) role as finding an efficient implementation of the algorithm for dense graphs. As the priority-queue ADT came into use in the early 1970s, its app to finding MSTs of sparse graphs was straightforward; the fact that MSTs of sparse graphs could be computed in time proportional to E lg V became widely known without attribution to any particular researcher. Since that time, as we discuss in , many researchers have concentrated on finding efficient priority-queue implementations as the key to finding efficient MST algorithms for sparse graphs.