Key-Indexed Search - Algorithms - Java Programming Language

Suppose that the keys are small integers (as is the case, for example, with Programs 12.6 and 12.4). In this case, the simplest search algorithm is based on storing the items in an array, indexed by the keys, as in the implementation given in Program 12.7. The code is straightforward: Operator new[] initializes all the entries with null; then we insert an item with key value k simply by storing it in st[k], and search for an item with key value k by looking in st[k]. To remove an item with key value k, we put null in st[k]. The select, sort, and count implementations in Program 12.7 use a linear scan through the array, skipping null items. The implementation leaves to the client the tasks of handling items with duplicate keys and checking for conditions such as specifying remove for a key not in the table.

This implementation does not implement the interface of Program 12.5 because it requires that keys be integers (it uses intkeyItem, not ITEM and KEY). We start with this implementation because it a simple one that exemplifies all of the symbol-table implementations that we consider in this chapter and in Chapters 13 through 15. The indexing operation upon which key-indexed search is based is the same as the basic operation in the key-indexed counting sort method that we examined in . When it is applicable, key-indexed search is the method of choice, because search and insert could hardly be implemented more efficiently.

If there are no items at all (just keys), we can use a table of bits. The symbol table in this case is called an existence table, because we may think of the kth bit as signifying whether k exists among the set of keys in the table. (The Java BitSet class implements such a symbol table.) For example, we could use this method to determine quickly whether a given 4-digit number in a telephone exchange has already been assigned, using a table of 313 words on a 32-bit computer (see Exercise 12.15).

Key-indexed-array-based symbol table

This implementation uses integer keys and further assumes that keys are positive and less than the constructor's parameter so that it can use them as indices into an array. The primary costs to watch are both the amount of space required when the array is large and the amount of time required for new to initialize all the array entries to null when the number of keys in the array is small relative to its size.

class ST { private intkeyItem[] st; ST(int M) { st = new intkeyItem[M]; } int count() { int N = 0; for (int i = 0; i < st.length; i++) if (st[i] != null) N++; return N; } void insert(intkeyItem x) { st[x.key()] = x; } void remove(int key) { st[key] = null; } intkeyItem search(int key) { return st[key]; } intkeyItem select(int k) { for (int i = 0; i < st.length; i++) if (st[i] != null && k-- == 0) return st[i]; return null; } public String toString() { String s = ""; for (int i = 0; i < st.length; i++) if (st[i] != null) s += st[i] + "\n"; return s; } }

Property 12.1

If key values are positive integers less than M and items have distinct keys, then the symbol-table data type can be implemented with key-indexed arrays of items such that insert, search, and remove require constant time; and initialize, select, and sort require time proportional to M, whenever any of the operations are performed on an N-itemtable.

This fact is immediate from inspection of the code. Note that the conditions on the keys imply that N Screenshot M.

Program 12.7 does not handle duplicate keys, and it assumes that the key values are between 0 and M-1. We could use linked lists or one of the other approaches mentioned in to store any items with duplicate keys, and we could do simple transformations of the keys before using them as indices (see Exercise 12.14); but we defer considering these cases in detail to , when we consider hashing, which uses this same approach to implement symbol tables for general keys, by transforming keys from a potentially large range such that they fall within a small range, then taking appropriate action for items with duplicate keys. For the moment, we assume that an old item with a key value equal to the key in an item to be inserted can be either silently ignored (as in Program 12.7) or treated as an error condition.

The implementation of count in Program 12.7 is a lazy approach where we do work only when the method count is called. An alternative (eager) approach is to maintain the count of nonempty table positions in a local variable, incrementing the variable if insert is into a table position that contains null, and decrementing it if remove is for a table position that does not contain null (see Exercise 12.11). The lazy approach is the better of the two if the count operation is used rarely (or not at all) and the number of possible key values is small; otherwise, the eager approach is better. For a library routine, the eager approach is preferred, because it provides optimal worst-case performance at the cost of a small constant factor for insert and remove; for the inner loop in an app with a huge number of insert and remove operations but few count operations, the lazy approach is preferred, because it gives the fastest implementation of the common operations. This type of tradeoff is common in the design of ADTs that must support a varying mix of operations, as we have seen on several occasions.

There are various other design decisions that we also need to make in developing a general-purpose interface. For example, should the key range be the same for all objects, or be different for different objects? If the latter option is chosen, then it may be necessary to add parameters to the constructor and to have methods giving the client access to the key range.

Key-indexed arrays are useful for many apps, but they do not apply if keys do not fall into a small range. Indeed, we might think of this and the next several chapters as being concerned with designing solutions for the case where the keys are from such a large range that it is not feasible to have an indexed table with one potential place for each key.

Exercises

12.10 Modify Programs 12.5 and 12.6 to be an interface and a client corresponding to the implementation of Program 12.7.
12.11 Modify the implementation of Program 12.7 to provide an eager implementation of count (by keeping track of the number of nonnull entries).
Implement a clonable symbol-table ADT (see Exercise 12.6) that uses key-indexed arrays.
Modify your implementation from Exercise 12.12 to provide an eager implementation of count (see Exercise 12.11).
Develop a version of Program 12.7 that assumes that KEY has a method h that converts keys to nonnegative integers less than M, with no two keys mapping to the same integer. (This improvement makes the implementation useful whenever keys are in a small range (not necessarily starting at 0) and in other simple cases.)
Develop a version of Program 12.7 for the case when items are keys that are positive integers less than M (no associated information). In the implementation, use an array of about M/32 integers.
Use your implementation from Exercise 12.15 for experiments to determine empirically the average and standard deviation for the number of distinct integers in a random sequence of N nonnegative integers less than N, for N close to the memory available to a program on your computer, expressed as a number of bits (see Program 12.6).
Develop a solution to Exercise 12.15 that uses a boolean array and compare its performance with your original solution for the task posed in Exercise 12.16.