Priority Queues and Heapsort - Algorithms - Java Programming Language

Many apps require that we process records with keys in order, but not necessarily in full sorted order and not necessarily all at once. Often, we collect a set of records, then process the one with the largest key, then perhaps collect more records, then process the one with the current largest key, and so forth. An appropriate data structure in such an environment supports the operations of inserting a new element and deleting the largest element. Such a data structure is called a priority queue. Using priority queues is similar to using queues (remove the oldest) and stacks (remove the newest), but implementing them efficiently is more challenging. The priority queue is the most important example of the generalized queue ADT that we discussed in . In fact, the priority queue is a proper generalization of the stack and the queue, because we can implement these data structures with priority queues, using appropriate priority assignments (see Exercises 9.3 and 9.4).

Definition 9.1 A priority queue is a data structure of items with keys which supports two basic operations: insert a new item, and remove the item with the largest key.

apps of priority queues include simulation systems, where the keys might correspond to event times, to be processed in chronological order; job scheduling in computer systems, where the keys might correspond to priorities indicating which users are to be served first; and numerical computations, where the keys might be computational errors, indicating that the largest should be dealt with first.

We can use any priority queue as the basis for a sorting algorithm by inserting all the records, then successively removing the largest to get the records in reverse order. Later on in this tutorial, we shall see how to use priority queues as building blocks for more advanced algorithms. In Part 5, we shall see how priority queues are an appropriate abstraction for helping us understand the relationships among several fundamental graph-searching algorithms; and in Part 6, we shall develop a file-compression algorithm using routines from this chapter. These are but a few examples of the important role played by the priority queue as a basic tool in algorithm design.

In practice, priority queues are more complex than the simple definition just given, because there are several other operations that we may need to perform to maintain them under all the conditions that might arise when we are using them. Indeed, one of the main reasons that many priority-queue implementations are so useful is their flexibility in allowing client app programs to perform a variety of different operations on sets of records with keys. We want to build and maintain a data structure containing records with numerical keys (priorities) that supports some of the following operations:

Construct a priority queue from N given items.
Insert a new item.
Remove the maximum item.
Change the priority of an arbitrary specified item.
Remove an arbitrary specified item.
Join two priority queues into one large one.

If records can have duplicate keys, we take "maximum" to mean "any record with the largest key value." As with many data structures, we also need to add a standard test if empty operation and perhaps a copy (clone) operation to this set.

There is overlap among these operations, and it is sometimes convenient to define other, similar operations. For example, certain clients may need frequently to find the maximum item in the priority queue, without necessarily removing it. Or, we might have an operation to replace the maximum item with a new item. We could implement operations such as these using our two basic operations as building blocks: Find the maximum could be remove the maximum followed by insert, and replace the maximum could be either insert followed by remove the maximum or remove the maximum followed by insert. We normally get more efficient code, however, by implementing such operations directly, provided that they are needed and precisely specified. Precise specification is not always as straightforward as it might seem. For example, the two options just given for replace the maximum are quite different: the former always makes the priority queue grow temporarily by one item, and the latter always puts the new item on the queue. Similarly, the change priority operation could be implemented as a remove followed by an insert, and construct could be implemented with repeated uses of insert.

Basic priority-queue ADT

This interface defines operations for the simplest type of priority queue: initialize, test if empty, add a new item, remove the largest item. Elementary implementations of these methods using arrays and linked lists can require linear time in the worst case, but we shall see implementations in this chapter where all operations are guaranteed to run in time at most proportional to the logarithm of the number of items in the queue. The constructor's parameter specifies the maximum number of items expected in the queue and may be ignored by some implementations.

class PQ // ADT interface { // implementations and private members hidden PQ(int) boolean empty() void insert(ITEM) ITEM getmax() };

For some apps, it might be slightly more convenient to switch around to work with the minimum, rather than with the maximum. We stick primarily with priority queues that are oriented toward accessing the maximum key. When we do need the other kind, we shall refer to it (a priority queue that allows us to remove the minimum item) as a minimum-oriented priority queue.

The priority queue is a prototypical abstract data type (ADT) (see ): It represents a well-defined set of operations on data, and it provides a convenient abstraction that allows us to separate apps programs (clients) from various implementations that we will consider in this chapter. The interface given in Program 9.1 defines the most basic priority-queue operations; we shall consider a more com-plete interface in . Strictly speaking, different subsets of the various operations that we might want to include lead to different abstract data structures, but the priority queue is essentially characterized by the remove-the-maximum and insert operations, so we shall focus on them.

Different implementations of priority queues afford different performance characteristics for the various operations to be performed, and different apps need efficient performance for different sets of operations. Indeed, performance differences are, in principle, the only differences that can arise in the abstract-data-type concept. This situation leads to cost tradeoffs. In this chapter, we consider a variety of ways of approaching these cost tradeoffs, nearly reaching the ideal of being able to perform the remove the maximum operation in logarithmic time and all the other operations in constant time.

First, in , we illustrate this point by discussing a few elementary data structures for implementing priority queues. Next, in Sections 9.2 through 9.4, we concentrate on a classical data structure called the heap, which allows efficient implementations of all the operations but join. In , we also look at an important sorting algorithm that follows naturally from these implementations. In Sections 9.5 and 9.6, we look in more detail at some of the problems involved in developing complete priority-queue ADTs. Finally, in , we examine a more advanced data structure, called the binomial queue, that we use to implement all the operations (including join) in worst-case logarithmic time.

During our study of all these various data structures, we shall bear in mind both the basic tradeoffs dictated by linked versus sequential memory allocation (as introduced in ) and the problems involved with making packages usable by apps programs. In particular, some of the advanced algorithms that appear later in this tutorial are client programs that make use of priority queues.