Suppose that we are given N points in the plane and we want to find the shortest set of lines connecting all the points. This geometric problem is called the Euclidean MST problem (see Screenshot). One way to solve it is to build a complete graph with N vertices and N(N – 1) /2 edges—one edge connecting each pair of vertices weighted with the distance between the corresponding points. Then, we can use Prim's algorithm to find the MST in time proportional to N2.
Screenshot Euclidean MST
Given a set of N points in the plane (top), the Euclidean MST is the shortest set of lines connecting them together (bottom). This problem is not just a graph-processing problem, because we need to make use of global geometric information about the points to avoid having to process all N2 implicit edges connecting the points.
This solution is generally too slow. The Euclidean problem is somewhat different from the graph problems that we have been considering because all the edges are implicitly defined. The size of the input is just proportional to N, so the solution that we have sketched is a quadratic algorithm for the problem. Research has proved that it is possible to do better. The geometric structure makes most of the edges in the complete graph irrelevant to the problem, and we do not need to add most of them to the graph before we construct the MST. Property 20.13 We can find the Euclidean MST of N points in time proportional to NlogN. This fact is a direct consequence of two basic facts about points in the plane that we discuss in detail in Part 7. First, a graph known as the Delauney triangulation contains the MST, by definition. Second, the Delauney triangulation is a planar graph whose number of edges is proportional to N.
In principle, then, we could compute the Delauney triangulation in time proportional to N log N, then run either Kruskal's algorithm or the priority-first search method to find the Euclidean MST, in time proportional to N log N. But writing a program to compute the Delauney triangulation is a challenge for even an experienced programmer, so this approach may be overkill for this problem in practice. Other approaches derive from the geometric algorithms that we consider in Part 7. For randomly distributed points, we can divide up the plane into squares such that each square is likely to contain about lg N/2 points, as we did for the closest-point computation in Program 3.20. Then, even if we include in the graph only the edges connecting each point to the points in the neighboring squares, we are likely (but are not guaranteed) to get all the edges in the MST; in that case, we could use Kruskal's algorithm or the PFS implementation of Prim's algorithm to finish the job efficiently. The example that we have used in Screenshot, Screenshot, Screenshot, and similar figures was created in this way (see Screenshot). Or, we could develop a version of Prim's algorithm based on using near-neighbor algorithms to avoid updating distant vertices.
Screenshot Euclidean near-neighbor graphs
One way to compute the Euclidean MST is to generate a graph with edges connecting every pair of points within a distance d, as in the graph in Screenshot et al. However, this method yields too many edges if d is too large (top) and is not guaranteed to have edges connecting all the points if d is smaller than the longest edge in the MST (bottom).
With all the possible choices that we have for approaching this problem and with the possibility of linear algorithms for the general MST problem, it is important to note that there is a simple lower bound on the best that we could do. Property 20.14 Finding the Euclidean MST of N points is no easier than sorting N numbers. Proof: Given a list of numbers to be sorted, convert the list into a list of points where the x coordinate is taken from the corresponding number of the list and the y coordinate is 0. Find the MST of that list of points. Then (as we did for Kruskal's algorithm), put the points into a graph ADT and run DFS to produce a spanning tree, starting at the point with the lowest x coordinate. That spanning tree amounts to a linked-list sort of the numbers in order.
Precise interpretations of this lower bound are complicated because the basic operations used for the two problems (comparisons of coordinates for the sorting problem, distances for the MST problem) are different and because there is a possibility of using methods such as radix sort and grid methods. However, we may interpret the bound to mean that, as we do sorting, we should consider a Euclidean MST algorithm that uses N lg N comparisons to be optimal unless we exploit numerical properties of the coordinates, in which case we might expect it to be linear time (see reference section). It is interesting to reflect on the relationship between graph and geometric algorithms that is brought out by the Euclidean MST problem. Many of the practical problems that we might encounter could be formulated either as geometric problems or as graph problems. If the physical placement of objects is a dominating characteristic, then the geometric algorithms of Part 7 may be called for; but if interconnections between objects are of fundamental importance, then the graph algorithms of this section may be better. The Euclidean MST seems to fall at the interface between these two approaches (the input involves geometry and the output involves interconnections), and the development of simple, straightforward methods for the Euclidean MST remains an elusive goal. In , we see a similar problem that falls at this interface, but where a Euclidean approach admits substantially faster algorithms than do the corresponding graph problems.