Comparisons and Improvements
Table 20.1 summarizes the running times of the basic MST algorithms that we have considered; Table 20.2 presents the results of an empirical study comparing them. From these tables, we can conclude that the adjacency-matrix implementation of Prim's algorithm is the method of choice for dense graphs, that all the other methods perform within a small constant factor of the best possible (the time that it takes to extract the edges) for graphs of intermediate density, and that Kruskal's method essentially reduces the problem to sorting for sparse graphs.

In short, we might consider the MST problem to be "solved" for practical purposes. For most graphs, the cost of finding the MST is only slightly higher than the cost of extracting the graph's edges. This rule holds except for huge graphs that are extremely sparse, but even in that case the available performance improvement over the best known algorithms is approximately a factor of 10 at best. The results in Table 20.2 are dependent on the model used to generate graphs, but they are borne out for many other graph models as well (see, for example, Exercise 20.81). Still, the theoretical results do not deny the existence of an algorithm that is guaranteed to run in linear time for all graphs; here we take a look at the extensive research on improved implementations of these methods.

First, much research has gone into developing better priority-queue implementations. The Fibonacci heap data structure, an extension of the binomial queue, achieves the theoretically optimal performance of taking constant time for decrease key operations and logarithmic time for remove the minimum operations, behavior that translates, by Property 20.8, to a running time proportional to E + V lg V for Prim's algorithm. Fibonacci heaps are more complicated than binomial queues and are somewhat unwieldy in practice, but some simpler priority-queue implementations have similar performance characteristics (see reference section).
Table 20.1. Cost of MST algorithms
This table summarizes the cost (worst-case running time) of various MST algorithms considered in this chapter. The formulas are based on the assumptions that an MST exists (which implies that E is no smaller than V – 1) and that there are X edges not longer than the longest edge in the MST (see Property 20.10). These worst-case bounds may be too conservative to be useful in predicting performance on real graphs. The algorithms run in near-linear time in a broad variety of practical situations.

algorithm                worst-case cost   comment

Prim (standard)          V^{2}             optimal for dense graphs
Prim (PFS, heap)         E lg V            conservative upper bound
Prim (PFS, d-heap)       E log_{d} V       linear unless extremely sparse
Kruskal                  E lg E            sort cost dominates
Kruskal (partial sort)   E + X lg V        cost depends on longest edge
Boruvka                  E lg V            conservative upper bound

One effective approach is to use radix methods for the priority-queue implementation. The performance of such methods is typically equivalent to that of radix sorting for Kruskal's method, or even to that of using a radix quicksort for the partial-sorting method that we discussed previously. Another simple early approach, proposed by D. Johnson in 1977, is one of the most effective: Implement the priority queue for Prim's algorithm with d-ary heaps, instead of with standard binary heaps (see Screenshot). Program 20.10 is a complete implementation of the priority-queue interface that we have been using that is based on this method. For this priority-queue implementation, decrease key takes less than log_{d} V steps, and remove the minimum takes time proportional to d log_{d} V. By Property 20.8, this behavior leads to a running time proportional to V d log_{d} V + E log_{d} V for Prim's algorithm, which is linear for graphs that are not sparse.
Screenshot: 2-, 3-, and 4-ary heaps
When we store a standard binary heap-ordered complete tree in an array (top), we use implicit links to take us from a node i down the tree to its children 2i and 2i + 1 and up the tree to its parent i/2. In a 3-ary heap (center), implicit links for i are to its children 3i – 1, 3i, and 3i + 1 and to its parent (i + 1)/3; and in a 4-ary heap (bottom), implicit links for i are to its children 4i – 2, 4i – 1, 4i, and 4i + 1 and to its parent (i + 2)/4. Increasing the branching factor in an implicit heap implementation can be valuable in applications, like Prim's algorithm, that require a significant number of decrease key operations.
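The implicit-link arithmetic generalizes cleanly when the root is stored at position 1: node k has children d(k – 1) + 2 through dk + 1 and parent (k + d – 2)/d, rounding down. The following minimal sketch (the class and method names are ours, not from the text) encodes these formulas; setting d = 2, 3, or 4 reproduces the cases illustrated above:

```java
public class DaryIndex {
    // For a 1-based array holding a d-ary heap, node k has children
    // d*(k-1)+2 through d*k+1 and parent (k+d-2)/d (integer division).
    static int parent(int d, int k)     { return (k + d - 2) / d; }
    static int firstChild(int d, int k) { return d * (k - 1) + 2; }

    public static void main(String[] args) {
        // d = 3: children of node 2 are 5, 6, 7; parent of 5 is 2
        System.out.println(firstChild(3, 2)); // 5
        System.out.println(parent(3, 5));     // 2
        // d = 2 recovers the familiar binary-heap formulas 2i and i/2
        System.out.println(firstChild(2, 3)); // 6
        System.out.println(parent(2, 6));     // 3
    }
}
```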
Table 20.2. Empirical study of MST algorithms
This table shows relative timings of various algorithms for finding the MST for random weighted graphs of various density. For low densities, Kruskal's algorithm is best because it amounts to a fast sort. For high densities, the classical implementation of Prim's algorithm is best because it does not incur list-processing overhead. For intermediate densities, the PFS implementation of Prim's algorithm runs within a small factor of the time that it takes to examine each graph edge.

        E        V     C     H     J     P     K    K*   e/E     B   e/E

density 2
    20000    10000     2    22    27           9    11  1.00    14   3.3
    50000    25000     8    69    84          24    31  1.00    38   3.3
   100000    50000    15   169   203          49    66  1.00    89   3.8
   200000   100000    30   389   478         108   142  1.00   189   3.6

density 20
    20000     1000     2     5     4    20     6     5   .20     9   4.2
    50000     2500    12    12    13   130    16    15   .28    25   4.6
   100000     5000    14    27    28          34    31   .30    55   4.6
   200000    10000    29    61    61          73    68   .35   123   5.0

density 100
   100000     1000    14    17    17    24    30    19   .06    51   4.6
   250000     2500    36    44    44   130    81    53   .05   143   5.2
   500000     5000    73    93    93         181   113   .06   312   5.5
  1000000    10000   151   204   198         377   218   .06   658   5.6

density V/2.5
   400000     1000    61    60    59    20   137    78   .02   188   4.5
  2500000     2500   597   409   400   128  1056   687   .01  1472   5.5

Key:
C   extract edges only
H   Prim's algorithm (adjacency lists/indexed heap)
J   Johnson's version of Prim's algorithm (d-heap priority queue)
P   Prim's algorithm (adjacency-matrix representation)
K   Kruskal's algorithm
K*  Partial-sort version of Kruskal's algorithm
B   Boruvka's algorithm
e   edges examined (union operations)

Property 20.12 Given a graph with V vertices and E edges, let d denote the density E/V. If d < 2, then the running time of Prim's algorithm is proportional to V lg V. Otherwise, we can improve the worst-case running time by a factor of lg(E/V) by using an ⌈E/V⌉-ary heap for the priority queue.

Proof: Continuing the discussion in the previous paragraph, the number of steps is V d log_{d} V + E log_{d} V, so the running time is at most proportional to E log_{d} V = (E lg V)/lg d.
When E is proportional to V^{1+ε}, Property 20.12 leads to a worst-case running time proportional to E/ε, and that value is linear for any constant ε. For example, if the number of edges is proportional to V^{3/2}, the cost is less than 2E; if the number of edges is proportional to V^{4/3}, the cost is less than 3E; and if the number of edges is proportional to V^{5/4}, the cost is less than 4E. For a graph with 1 million vertices, the cost is less than 6E unless the density is less than 10.

The temptation to minimize the bound on the worst-case running time in this way needs to be tempered with the realization that the V d log_{d} V part of the cost is not likely to be avoided (for remove the minimum, we have to examine d successors in the heap as we sift down), but the E lg d part is not likely to be achieved (since most edges will not require a priority-queue update, as we showed in the discussion following Property 20.8). For typical graphs such as those in the experiments in Table 20.2, decreasing d has no effect on the running time, and using a large value of d can slow down the implementation slightly. Still, the slight protection offered for worst-case performance makes the method worthwhile since it is so easy to implement. In principle, we could tune the implementation to pick the best value of d for certain types of graphs (choose the largest value that does not slow down the algorithm), but a small fixed value (such as 4, 5, or 6) will be fine except possibly for some particular huge classes of graphs that have atypical characteristics.

Using d-heaps is not effective for sparse graphs because d has to be an integer greater than or equal to 2, a condition that implies that we cannot bring the asymptotic running time lower than V lg V. If the density is a small constant, then a linear-time MST algorithm would have to run in time proportional to V. The goal of developing practical algorithms for computing the MST of sparse graphs in linear time remains elusive.
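To make the bound concrete: the factor multiplying E in the worst-case cost is lg V / lg d. The sketch below (the class name and sample densities are ours, not from the text) evaluates that factor for V = 10^6; at density 10 it yields the "less than 6E" figure cited above:

```java
public class DHeapBound {
    // Worst-case cost of Prim's algorithm with a d-ary heap is
    // E lg V / lg d, so the factor multiplying E is lg V / lg d
    // (the logarithm base cancels in the ratio).
    static double boundFactor(double V, double d) {
        return Math.log(V) / Math.log(d);
    }

    public static void main(String[] args) {
        double V = 1e6;
        for (double d : new double[] { 2, 10, 100 })
            System.out.printf("d = %.0f: cost < %.1f E%n", d, boundFactor(V, d));
        // At d = 10 the factor is 6.0: cost < 6E when density is at least 10.
    }
}
```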
A great deal of research has been done on variations of Boruvka's algorithm as the basis for nearly linear-time MST algorithms for extremely sparse graphs (see reference section). Such research still holds the potential to lead us eventually to a practical linear-time MST algorithm and has even shown the existence of a randomized linear-time algorithm. While these algorithms are generally quite complicated, simplified versions of some of them may yet be shown to be useful in practice. In the meantime, we can use the basic algorithms that we have considered here to compute the MST in linear time in most practical situations, perhaps paying an extra factor of lg V for some sparse graphs.
Multiway heap PQ implementation
This class uses multiway heaps to implement the indirect priority-queue interface that we use in this tutorial. It is based on changing Program 9.12 to have the constructor take a reference to an array of priorities, to implement getmin and lower instead of delmax and change, and to generalize sink and swim so that they maintain a d-way heap.
class doublePQi
{ private int N, d = 3;
  private double[] a; private int[] pq, qp;
  // pq holds the heap of indices; qp[v] is the heap position of index v
  private boolean less(int i, int j)
    { return a[pq[i]] < a[pq[j]]; }
  private void exch(int i, int j)
    { int t = pq[i]; pq[i] = pq[j]; pq[j] = t;
      qp[pq[i]] = i; qp[pq[j]] = j; }
  // In a d-ary heap, the parent of node k is (k+d-2)/d
  private void swim(int k)
    { while (k > 1 && less(k, (k+d-2)/d))
        { exch(k, (k+d-2)/d); k = (k+d-2)/d; } }
  // The children of node k are d*(k-1)+2 through d*k+1
  private void sink(int k, int N)
    { int j;
      while ((j = d*(k-1)+2) <= N)
        {
          for (int i = j+1; i < j+d && i <= N; i++)
            if (less(i, j)) j = i;
          if (!less(j, k)) break;
          exch(k, j); k = j;
        }
    }
  doublePQi(int maxN, double[] a)
    { this.a = a; this.N = 0;
      pq = new int[maxN+1]; qp = new int[maxN+1];
      for (int i = 0; i <= maxN; i++)
        { pq[i] = 0; qp[i] = 0; }
    }
  boolean empty() { return N == 0; }
  void insert(int v)
    { pq[++N] = v; qp[v] = N; swim(N); }
  int getmin()
    { exch(1, N); sink(1, N-1); return pq[N--]; }
  void lower(int k)
    { swim(qp[k]); }
}
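As a sanity check, a small driver (the class name PQDemo and the sample priorities are ours) can exercise the interface directly: insert a few indices, lower one priority, and drain the queue, which should yield the indices in increasing order of priority. This sketch embeds a copy of the class, with d = 3 and the index arithmetic written out:

```java
public class PQDemo {
    // Copy of the multiway-heap indirect priority queue, d = 3.
    static class doublePQi {
        private int N, d = 3;
        private double[] a; private int[] pq, qp;
        private boolean less(int i, int j) { return a[pq[i]] < a[pq[j]]; }
        private void exch(int i, int j)
        { int t = pq[i]; pq[i] = pq[j]; pq[j] = t;
          qp[pq[i]] = i; qp[pq[j]] = j; }
        private void swim(int k)
        { while (k > 1 && less(k, (k+d-2)/d))
            { exch(k, (k+d-2)/d); k = (k+d-2)/d; } }
        private void sink(int k, int N)
        { int j;
          while ((j = d*(k-1)+2) <= N)
          { for (int i = j+1; i < j+d && i <= N; i++)
              if (less(i, j)) j = i;
            if (!less(j, k)) break;
            exch(k, j); k = j; } }
        doublePQi(int maxN, double[] a)
        { this.a = a; N = 0;
          pq = new int[maxN+1]; qp = new int[maxN+1]; }
        boolean empty() { return N == 0; }
        int getmin() { exch(1, N); sink(1, N-1); return pq[N--]; }
        void insert(int v) { pq[++N] = v; qp[v] = N; swim(N); }
        void lower(int k) { swim(qp[k]); }
    }

    public static String runDemo() {
        double[] a = { 0.5, 0.2, 0.9, 0.1, 0.7 };  // priorities for indices 0..4
        doublePQi pq = new doublePQi(a.length, a);
        for (int v = 0; v < a.length; v++) pq.insert(v);
        a[2] = 0.05; pq.lower(2);                  // decrease key of index 2
        StringBuilder sb = new StringBuilder();
        while (!pq.empty()) sb.append(pq.getmin()).append(' ');
        return sb.toString().trim();               // indices in increasing priority
    }

    public static void main(String[] args) {
        System.out.println(runDemo());             // prints "2 3 1 0 4"
    }
}
```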

Exercises
20.72 [V. Vyssotsky] Develop an implementation of the algorithm discussed earlier in this chapter that builds the MST by adding edges one at a time and deleting the longest edges on the cycle formed (see Exercise 20.34). Use a parent-link representation of a forest of MST subtrees. Hint: Reverse links when traversing paths in trees.
20.73 Run empirical tests to compare the running time of your implementation in Exercise 20.72 with that of Kruskal's algorithm, for various weighted graphs (see Exercises 20.9–14). Check whether randomizing the order in which the edges are considered affects your results.
• 20.74 Describe how you would find the MST of a graph so large that only V edges can fit into main memory at once.
20.75 Develop a priority-queue implementation for which remove the minimum and find the minimum are constant-time operations, and for which decrease key takes time proportional to the logarithm of the priority-queue size. Compare your implementation with 4-heaps when you use Prim's algorithm to find the MST of sparse graphs, for various weighted graphs (see Exercises 20.9–14).
20.76 Run empirical studies to compare the performance of various priority-queue implementations when used in Prim's algorithm for various weighted graphs (see Exercises 20.9–14). Consider d-heaps for various values of d, binomial queues, balanced trees (Java's TreeMap class), and any other data structure that you think might be effective.
20.77 Develop an implementation that generalizes Boruvka's algorithm to maintain a generalized queue containing the forest of MST subtrees. (Using Program 20.9 corresponds to using a FIFO queue.) Experiment with other generalized-queue implementations, for various weighted graphs (see Exercises 20.9–14).
• 20.78 Develop a generator for random connected cubic graphs (each vertex of degree 3) that have random weights on the edges. Fine-tune for this case the MST algorithms that we have discussed, then determine which is the fastest.
20.79 For V = 10^{6}, plot the ratio of the upper bound on the cost for Prim's algorithm with d-heaps to E as a function of the density d, for d in the range from 1 to 100.
20.80 Table 20.2 suggests that the standard implementation of Kruskal's algorithm is significantly faster than the partial-sort implementation for low-density graphs. Explain this phenomenon.
• 20.81 Run an empirical study, in the style of Table 20.2, for random complete graphs that have Gaussian weights (see Exercise 20.19).
