Creation of a New ADT - Algorithms - Java Programming Language

s 4.2 through 4.4 present a complete example of Java code that captures one of our most important abstractions: the pushdown stack. The interface in defines the basic operations; client programs such as those in can use those operations without dependence on how the operations are implemented; and implemen-tations such as those in provide the necessary concrete representation and program code to realize the abstraction.

To design a new ADT, we often enter into the following process: Starting with the task of developing a client program to solve an apps problem, we identify operations that seem crucial-that is, what would we like to be able to do with our data? Then, we define an interface and write client code to test the hypothesis that the existence of the ADT would make it easier for us to implement the client program. Next, we consider the idea of whether or not we can implement the operations in the ADT with reasonable efficiency. If we cannot, we perhaps can seek to understand the source of the inefficiency and to modify the interface to include operations that are better suited to efficient implementation. These modifications affect the client program, and we modify it accordingly. After a few iterations, we have a working client program and a working implementation, so we freeze the interface by adopting a policy of not changing it. At this moment, the development of client programs and the development of implementations are separable: We can write other client programs that use the same ADT (perhaps we write some driver programs that allow us to test the ADT), we can write other implementations, and we can compare the performance of multiple implementations.

In other situations, we might define the ADT first. This approach might involve asking questions such as these: What basic operations would client programs want to perform on the data at hand? Which operations do we know how to implement efficiently? After we develop an implementation, we might test its efficacy on client programs. We might modify the interface and do more tests before eventually freezing the interface.

In , we considered a detailed example where thinking on an abstract level helped us to find an efficient algorithm for solving a complex problem. We consider next the use of the general approach that we are discussing in this chapter to encapsulate the specific abstract operations that we exploited in .

Program 4.11 defines the interface, in terms of two operations (in addition to construct) that seem to characterize the algorithms that we considered in for connectivity, at a high abstract level. Whatever the underlying algorithms and data structures, we want to be able both to check whether or not two nodes are known to be connected and to declare that two nodes are connected.

Equivalence-relations ADT interface

Our ADT interface mechanism makes it convenient for us to encode precisely our decision to consider the connectivity algorithm in terms of a class that suppports three abstract operations: initialize an abstract data structure to keep track of connections among the given number of nodes, find whether two given nodes are connected, and unite two given nodes to consider them connected henceforth.

class UF // ADT interface { // implementations and private members hidden UF(int) boolean find(int, int) void unite(int, int) }

Program 4.12 is a client program that uses the ADT defined in the interface of Program 4.11 to solve the connectivity problem. One benefit of using the ADT is that this program is easy to understand, because it is written in terms of abstractions that allow the computation to be expressed in a natural way.

Program 4.13 is an implementation of the union-find interface defined in Program 4.11 that uses a forest of trees represented by two arrays as the underlying representation of the known connectivity information, as described in . The different algorithms that we considered in represent different implementations of this ADT, and we can test them as such without changing the client program at all.

This ADT leads to programs that are slightly less efficient than those in for the connectivity app, because it does not take advantage of the property of that client that every union operation is immediately preceded by a find operation. We sometimes incur extra costs of this kind as the price of moving to a more abstract representation. In this case, there are numerous ways to remove the inefficiency, perhaps at the cost of making the interface or the implementation more complicated (see Exercise 4.30). In practice, the paths are extremely short (particularly if we use path compression), so the extra cost is likely to be negligible in this case.

Equivalence-relations ADT client

The ADT of Program 4.11 allows us to separate the connectivity algorithm of Program 1.1 from the union-find implementation, thereby making both more accessible. For example, using the ADT allows us to try various union-find implementations such as Programs 1.2 through 1.4 without changing the connectivity code at all.

class Equivalence { public static void main(String[] args) { int p, q, N = Integer.parseInt(args[0]); UF info = new UF(N); for (In.init(); !In.empty(); ) { p = In.getInt(); q = In.getInt(); if (!info.find(p, q)) { info.unite(p, q); Out.println(p + "-" + q); } } } }

Programs 4.12 and 4.13 are equivalent to Program 1.3, but splitting the program into two parts is a better approach because it

Separates the task of solving the high-level (connectivity) problem from the task of solving the low-level (union-find) problem, allowing us to work on the two problems independently
Gives us a natural way to compare different algorithms and data structures for solving the problem
Defines, through the interface, a way to check that the software is operating as expected
Provides a mechanism that allows us to upgrade to new representations (new data structures or new algorithms) without changing the client program at all
Gives us an abstraction that we can use to build other algorithms

Equivalence-relations ADT implementation

This code for the weighted-quick-union code from implements the interface of Program 4.11, packaging the code in a form that makes it convenient for use in other apps.

class UF { private int[] id, sz; private int find(int x) { while (x != id[x]) x = id[x]; return x; } UF(int N) { id = new int[N]; sz = new int[N]; for (int i = 0; i < N; i++) { id[i] = i; sz[i] = 1; } } boolean find(int p, int q) { return (find(p) == find(q)); } void unite(int p, int q) { int i = find(p), j = find(q); if (i == j) return; if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } else { id[j] = i; sz[i] += sz[j]; } } }

These benefits are widely applicable to many tasks that we face when developing computer programs, so the basic tenets underlying ADTs are widely used.

How do we switch from one implementation to another? The simplest method is to rename files. Java always expects the implementation of, say, the UF class to be in a file named UF.java, so we can substitute a different implementation by renaming it to be UF.java (after, presumably, giving the old one some different name or saving it somewhere). However, moving around multiple files with the same name can be confusing and cumbersome, and this task is so common that many Java environments provide a specific mechanism to support it: a programmer can specify a class path sequence that tells the Java interpreter which directories it should check to look for the code that implements the classes. Thus, we can keep one implementation in one directory and another in another directory, then choose between them by specifying an appropriate class path.

Abstract class for equivalence-relations ADT

This Java code constitutes an interface for the equivalence-relations ADT that provides complete separation between client and implementation (see text).

interface uf { int find(int x); boolean find(int p, int q); void unite(int p, int q); }

Our code in Program 4.13 mixes the interface with the implementation and therefore does not provide the full separation of client, interface, and implementation that we would prefer to have for ADTs. With private, we can keep client programs from any dependence on the data representation, but if we make any changes in the data representation, we have to recompile all the clients. In many software engineering scenarios, we may have no information about the clients, so this would be an onerous requirement. In other scenarios, this arrangement does make sense. For a huge and complex ADT, we might settle on a data representation and an interface, then have multiple programmers working on different parts of the implementation. In this case, the public part of the interface serves as a contract between the programmers and the clients, and the private part of the interface serves as a contract among programmers. This strategy also often provides precisely what we need in this tutorial, when we want to consider different algorithms that use a particular data structure. In this way, we might be able to realize performance gains by changing a small part of a huge system.

The Java language does provide a mechanism that is specifically designed to allow us to write programs with a well-defined interface that completely separates clients from implementations. It is based on the Java inheritance mechanism. Through inheritance, we can add members to an existing class or redefine any of its methods; including abstract in a method declaration means that the method must be redefined in some extended class. A class that has any abstract methods is known as an abstract class. No implementation is provided for an abstract method in an abstract class (only its signature is needed): some extended class will provide the implementation. An abstract class whose public methods are all abstract is similar to what we refer to as an ADT interface.

Java also has an interface mechanism that is similar to an abstract class whose methods are all abstract. Any class that extends an interface must define all of the methods in an interface; so, in our terminology, it is an implementation. Clients can use the interface and the Java system can enforce the contract between clients and implementations, even when clients and implementations are compiled separately. For example, Program 4.14 shows an interface uf for equivalence relations; changing the first line of Program 4.13 to

class UF implements uf

would indicate that UF defines (at least) all of the methods in uf-that is, it is an implementation of interface uf.

Unfortunately, using an abstract class or a Java interface incurs significant runtime costs, because every call to an abstract method requires (at least) following a reference through a table of references to methods. Since the algorithms and data structures that we consider in this tutorial are often in performance-critical parts of systems, we may not wish to pay these penalties to gain the flexibility that abstract classes and interfaces provide. Moreover, there is not an exact match between the Java interface mechanism and the ADT interface that we have been discussing: for example, constructors cannot be part of a Java interface, but a proper ADT interface needs to specify just which constructors clients can use and implementations must include. Also, a Java interface cannot have static methods, but we might need them in an ADT interface.

Another reason that we use an informal mechanism to define interfaces rather than a language construct is that when we use inheritance to extend a class, we are defining an implicit interface whose scope we may not even wish to explicitly articulate. For example, the methods equals, hashCode, clone, getClass, andfinalize are defined for all Java objects through inheritance, but we do not list them in every interface that we design. The convention that we use is very similar to the standard used to document Java class libraries: to describe what a client can expect from a class, we list the signatures of its public methods.

The kind of flexibility that we can achieve with interfaces and inheritance still leaves open the possibility that the implied contract between clients and implementations about what an ADT is to be may be broken, perhaps unwittingly, at some future time. All of these mechanisms ensure that client programs and implementations connect up properly, but they also depend on one another to do things, in ways that we generally cannot specify formally. For example, suppose that some uninformed programmer finds our weighted quick-union algorithm too difficult to understand and decides to replace it with a quick-find algorithm (or worse, an implementation that does not even give the right answer). We have insisted on allowing such a change to be made easily, but in this case it might cripple a client in a critical app that depends upon the implementation's having good performance for huge problems. Programming lore is filled with tales of such problems, and it is quite difficult to protect against them.

Such considerations are drawing us into considering properties of languages, compilers, interpreters, and virtual machines, and are rather far afield from algorithms. Accordingly, we will most often stick to our simple two-file convention where we implement ADTs with Java classes, the public methods constituting the interface. Our primary reason for doing so is to provide is a convenient and compact expression of our data structures and algorithms. If, for a particular app, we need the extra flexibility afforded by the other approaches just mentioned, we can restructure our classes along these lines.

Exercises

Give a class that implements the same interface as Program 4.13, but uses path compression by halving.
Remove the inefficiency mentioned in the text by adding an operation to Program 4.11 that combines union and find, providing an implementation in Program 4.13 and modifying Program 4.12 accordingly.
Modify our equivalence-relations interface (Program 4.11) and implementation (Program 4.13) to provide a method that will return the number of nodes known to be connected to a given node.
Modify Program 4.13 to use an array of objects instead of parallel arrays for the underlying data structure.
Build a solution to the postfix-expression evaluation problem that uses a Java interface for the stack-of-integers ADT interface. Make sure that your client program (your version of Program 4.5) can be compiled separately from your stack implementation (your version of Program 4.7).
4.34 Create a full implementation of the equivalence-relations ADT based on a Java interface, and compare its performance against Program 4.13 on huge connectivity problems, in the style of Table 1.1.