Suppose that the keys are small integers (as is the case, for example, with Programs and ). In this case, the simplest search algorithm is based on storing the items in an array, indexed by the keys, as in the implementation given in . The code is straightforward: Operator new[] initializes all the entries with null; then we insert an item with key value k simply by storing it in st[k], and search for an item with key value k by looking in st[k]. To remove an item with key value k, we put null in st[k]. The select, sort, and count implementations in use a linear scan through the array, skipping null items. The implementation leaves to the client the tasks of handling items with duplicate keys and checking for conditions such as specifying remove for a key not in the table.

This implementation does not implement the interface of because it requires that keys be integers (it uses intkeyItem, not ITEM and KEY). We start with this implementation because it a simple one that exemplifies all of the symbol-table implementations that we consider in this chapter and in Chapters through . The indexing operation upon which key-indexed search is based is the same as the basic operation in the key-indexed counting sort method that we examined in . When it is applicable, key-indexed search is the method of choice, because search and insert could hardly be implemented more efficiently.

If there are no items at all (just keys), we can use a table of bits. The symbol table in this case is called an existence table, because we may think of the kth bit as signifying whether k exists among the set of keys in the table. (The Java BitSet class implements such a symbol table.) For example, we could use this method to determine quickly whether a given 4-digit number in a telephone exchange has already been assigned, using a table of 313 words on a 32-bit computer (see ).

Key-indexed-array-based symbol table

This implementation uses integer keys and further assumes that keys are positive and less than the constructor's parameter so that it can use them as indices into an array. The primary costs to watch are both the amount of space required when the array is large and the amount of time required for new to initialize all the array entries to null when the number of keys in the array is small relative to its size.

class ST { private intkeyItem[] st; ST(int M) { st = new intkeyItem[M]; } int count() { int N = 0; for (int i = 0; i < st.length; i++) if (st[i] != null) N++; return N; } void insert(intkeyItem x) { st[x.key()] = x; } void remove(int key) { st[key] = null; } intkeyItem search(int key) { return st[key]; } intkeyItem select(int k) { for (int i = 0; i < st.length; i++) if (st[i] != null && k-- == 0) return st[i]; return null; } public String toString() { String s = ""; for (int i = 0; i < st.length; i++) if (st[i] != null) s += st[i] + "\n"; return s; } } 

Property 12.1

If key values are positive integers less than M and items have distinct keys, then the symbol-table data type can be implemented with key-indexed arrays of items such that insert, search, and remove require constant time; and initialize, select, and sort require time proportional to M, whenever any of the operations are performed on an N-itemtable.

This fact is immediate from inspection of the code. Note that the conditions on the keys imply that N Screenshot M. Screenshot

does not handle duplicate keys, and it assumes that the key values are between 0 and M-1. We could use linked lists or one of the other approaches mentioned in to store any items with duplicate keys, and we could do simple transformations of the keys before using them as indices (see ); but we defer considering these cases in detail to , when we consider hashing, which uses this same approach to implement symbol tables for general keys, by transforming keys from a potentially large range such that they fall within a small range, then taking appropriate action for items with duplicate keys. For the moment, we assume that an old item with a key value equal to the key in an item to be inserted can be either silently ignored (as in ) or treated as an error condition.

The implementation of count in is a lazy approach where we do work only when the method count is called. An alternative (eager) approach is to maintain the count of nonempty table positions in a local variable, incrementing the variable if insert is into a table position that contains null, and decrementing it if remove is for a table position that does not contain null (see ). The lazy approach is the better of the two if the count operation is used rarely (or not at all) and the number of possible key values is small; otherwise, the eager approach is better. For a library routine, the eager approach is preferred, because it provides optimal worst-case performance at the cost of a small constant factor for insert and remove; for the inner loop in an app with a huge number of insert and remove operations but few count operations, the lazy approach is preferred, because it gives the fastest implementation of the common operations. This type of tradeoff is common in the design of ADTs that must support a varying mix of operations, as we have seen on several occasions.

There are various other design decisions that we also need to make in developing a general-purpose interface. For example, should the key range be the same for all objects, or be different for different objects? If the latter option is chosen, then it may be necessary to add parameters to the constructor and to have methods giving the client access to the key range.

Key-indexed arrays are useful for many apps, but they do not apply if keys do not fall into a small range. Indeed, we might think of this and the next several chapters as being concerned with designing solutions for the case where the keys are from such a large range that it is not feasible to have an indexed table with one potential place for each key.

Exercises

Java graphics icon01 12.10 Modify Programs and to be an interface and a client corresponding to the implementation of .

Java graphics icon01 12.11 Modify the implementation of to provide an eager implementation of count (by keeping track of the number of nonnull entries).

Implement a clonable symbol-table ADT (see ) that uses key-indexed arrays.

Modify your implementation from to provide an eager implementation of count (see ).

Develop a version of that assumes that KEY has a method h that converts keys to nonnegative integers less than M, with no two keys mapping to the same integer. (This improvement makes the implementation useful whenever keys are in a small range (not necessarily starting at 0) and in other simple cases.)

Develop a version of for the case when items are keys that are positive integers less than M (no associated information). In the implementation, use an array of about M/32 integers.

Use your implementation from for experiments to determine empirically the average and standard deviation for the number of distinct integers in a random sequence of N nonnegative integers less than N, for N close to the memory available to a program on your computer, expressed as a number of bits (see ).

Develop a solution to that uses a boolean array and compare its performance with your original solution for the task posed in .