Previous   Next

Performance Characteristics

How do we choose among randomized BSTs, splay BSTs, red–black BSTs, and skip lists for a particular app? We have concentrated on the differing nature of these algorithms' performance guarantees. Time and space are always primary considerations, but we must also consider a number of other factors. In this section, we shall briefly discuss implementation issues, empirical studies, estimates of running time, and space requirements.

All the tree-based algorithms depend on rotations; implementation of rotations along the search path is an essential ingredient of most balanced tree algorithms. We have used recursive implementations that implicitly save references to nodes on the search path in local variables on the recursion stack, but each of the algorithms can be implemented in a nonrecursive fashion, operating on a constant number of nodes and performing a constant number of link operations per node in one top-down pass through the tree.

Randomized BSTs are the simplest to implement of the three tree-based algorithms. The prime requirements are to have confidence in the random-number generator and to avoid spending too much time generating the random bits. Splay BSTs are slightly more complicated but are a straightforward extension to the standard root-insertion algorithm. Red–black BSTs involve slightly more code stillf in order to check and manipulate the color bits. One advantage of red–black trees over the other two is that the color bits can be used for both a consistency check for debugging and for a guarantee of a quick search at any time during the lifetime of the tree. There is no way to know from examining a splay BST whether or not the code that produced it made all the proper transformations; a bug might lead (only!) to performance problems. Similarly, a bug in the random-number generator for randomized BSTs or skip lists could lead to otherwise unnoticed performance problems.

Skip lists are easy to implement and are particularly attractive if a full range of symbol-table operations is to be supported, because search, insert, remove, join, select, and sort all have natural implementations that are easy to formulate. The inner loop for searching in skip lists is longer than that for trees (it involves an additional index into the link array or an additional recursive call to move down a level), so the time for search and insert is longer. Skip lists also put the programmer at the mercy of the random-number generator—debugging a program whose behavior is random is a challenge, and some programmers find it particularly unsettling to work with nodes having a random number of links.

Table 13.1 gives empirical data on both the performance of the four methods that we have discussed in this chapter and on the elementary BST implementations from , for keys that are random 32-bit integers. The information in this table confirms what we expect from the analytic results in Sections 13.2, 13.4, and 13.5. Red–black BSTs are much faster than the others for random keys. Paths in red– black BSTs are 35 percent shorter than in randomized or splay BSTs, and there is less work to do in the inner loop. Randomized trees and skip lists require that we generate at least one new random number for every insertion, and splay BSTs involve a rotation at every node for every insertion and every search. By contrast, the overhead for red–black BSTs is that we check the value of 2 bits at every node during insertion and occasionally need to do a rotation. For nonuniform access, splay BSTs may involve shorter paths, but this savings is likely to be offset by the fact that both search and insertion involve rotations at every node in the inner loop, except possibly in extreme cases.

Splay BSTs require no extra space for balance information, red– black BSTs require 1 extra bit, and randomized BSTs require a count field. For many apps, the count field is maintained for other reasons, so it may not represent an extra cost for randomized BSTs. Indeed, we might need to add this field if we use splay BSTs, red–black BSTs, or skip lists. If necessary, we can make red–black BSTs as space-efficient as splay BSTs by eliminating the color bit (see Exercise 13.65). In modern apps, space is less critical than it once was, but the careful programmer still needs to be vigilant against waste. For example, we need to be aware that some systems might use a whole 32-bit word for a small count field or a 1-bit color field in a node, and that some other systems might pack the fields in memory such that unpacking them requires a significant amount of extra time. If space is tight, skip lists with large t can reduce by nearly one-half the space for links, at the cost of a slower—but still logarithmic—search. With some programming, the tree-based methods can also be implemented with one link per node (see Exercise 12.68).

Table 13.1. Empirical study of balanced tree implementations

These relative timings for building and searching BSTs from random sequences of N 32-bit integers, for various values of N, indicate that all the methods have good performance, even for huge tables, but that red–black trees are significantly faster than are the other methods. All the methods use standard BST search, except splay BSTs, where we splay on search to bring frequently accessed keys near the top, and skip lists, which use essentially the same algorithm with a different underlying data structure.

 

construction

search misses

N

B

T

T*

S

R

L

B

T

T*

S

R

L

1250

20

28

28

10

14

15

11

10

10

10

10

16

2500

10

36

40

24

25

21

15

12

11

12

11

19

5000

22

33

65

145

42

35

26

26

26

27

19

46

12500

90

128

197

267

92

145

75

74

86

80

60

145

25000

167

569

492

215

181

385

175

180

212

195

158

355

50000

381

648

1105

1125

430

892

420

415

505

433

359

878

100000

1004

2593

2656

1148

1190

3257

1047

1041

1357

1113

861

2094

200000

2628

6121

6341

2784

2936

7493

2553

2573

2893

2649

2114

5109

Key:

B Standard BST(Program 12.15)

T BST built by root insertion (Program 12.19)

T* Randomized BST (Program 13.2)

S Splay BST(Exercise 13.33 and Program 13.5)

R Red–black BST (Program 13.6)

L Skip list (Programs 13.7 and 13.9)

In summary, all the methods that we have discussed in this chapter will provide good performance for typical apps, and each has its virtues for people interested in developing a high-performance symbol-table implementation. Splay BSTs will provide good performance as a self-organizing search method, particularly when frequent access to a small set of keys is a typical pattern; randomized BSTs are likely to be faster and easier to implement for a fully functional symbol table BST; skip lists are easy to understand and can provide logarithmic search with less space than the other methods, and red–black BSTs are attractive for symbol-table library implementations, because they provide guaranteed performance bounds in the worst case and the fastest search and insertion algorithms for random data.

Beyond specific uses in apps, this panoply of solutions to the problem of developing efficient implementations of the symbol-table ADT is important because it illustrates fundamental approaches to algorithm design that are available to us when we consider solutions to other problems. In our constant quest for simple, optimal algorithms, we often encounter useful near-optimal algorithms, such as the ones discussed here. Moreover, as we saw with sorting, comparison-based algorithms such as these are only the beginning of the story—by moving to a lower-level abstraction, where we can process pieces of keys, we can develop implementations that are even faster than the ones discussed in this chapter, as we shall see in Chapters 14 and 15.

Exercises

Develop a symbol-table implementation using randomized BSTs that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).

Develop a symbol-table implementation using skip lists that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).


Previous   Next
Comments