Abstract Data Types - Algorithms - Java Programming Language

Developing abstract models for our data and for the ways in which our programs process those data is an essential ingredient in the process of solving problems with a computer. We see examples of this principle at a low level in everyday coding (for example, when we use arrays and linked lists, as discussed in ) and at a high level in problem-solving (as we saw in , when we used union-find forests to solve the connectivity problem). In this chapter, we consider abstract data types (ADTs), which allow us to build programs that use high-level abstractions. With abstract data types, we can separate the conceptual transformations that our programs perform on our data from any particular data-structure representation and algorithm implementation.

All computer systems are based on layers of abstraction: We adopt the abstract model of a bit that can take on a binary 0-1 value from certain physical properties of silicon and other materials; then we adopt the abstract model of a machine from dynamic properties of the values of a certain set of bits; then we adopt the abstract model of a coding language that we realize by controlling the machine with a machine-language program; then we adopt the abstract notion of an algorithm implemented as a Java language program. Abstract data types allow us to take this process further to develop abstract mechanisms for certain computational tasks at a higher level than provided by the Java system, to develop app-specific abstract mechanisms that are suitable for solving problems in numerous apps areas, and to build higher-level abstract mechanisms that use these basic mechanisms. Abstract data types give us an ever-expanding set of tools that we can use to attack new problems.

On the one hand, our use of abstract mechanisms frees us from detailed concern about how they are implemented; on the other hand, when performance matters in a program, we need to be cognizant of the costs of basic operations. We use many basic abstractions that are built into the computer hardware and provide the basis for machine instructions; we implement others in software; and we use still others that are provided in previously written systems software. Often we build higher-level abstract mechanisms in terms of more primitive ones. The same basic principle holds at all levels: We want to identify the critical operations in our programs and the critical characteristics of our data, to define both precisely at an abstract level, and to develop efficient concrete mechanisms to support them. We consider many examples of this principle in this chapter.

To develop a new layer of abstraction, we need to define the abstract objects that we want to manipulate and the operations that we perform on them; we need to represent the data in some data structure and to implement the operations; and (the point of the exercise) we want to ensure that the objects are convenient to use to solve an apps problem. Moreover, we want to separate the client from the implementation so that a single client can use multiple implementations and so that a single implementation can be used by many clients, without having to change any code. In Java systems, this concept is familiar: the Java virtual machine is a layer of abstraction that separates Java programs from virtual machine implementations so that we know that we can run a Java program on a variety of different computers without having to change any code. In this chapter, we consider how to adapt the basic Java class mechanism to achieve this same kind of separation in layers of abstraction built with Java code.

Definition 4.1 An abstract data type (ADT) is a data type (a set of values and a collection of operations on those values) that is accessed only through an interface. We refer to a program that uses an ADT as a client, and a program that specifies the data type as an implementation.

The key distinction that makes a data type abstract is drawn by the word only: with an ADT, client programs do not access any data values except through the operations provided in the interface. The representation of the data and the methods that implement the operations are in the implementation and are completely separated from the client, by the interface. We say that the interface is opaque: the client cannot see the implementation through the interface. In Java programs, we normally draw a slightly finer distinction, because the simplest way to set up an interface involves including the data representation in the interface but specifying that client programs are not allowed to access data directly. That is, client programmers may know the data representation, but they have no way to use it.

As an example, we start with the data type for points (Program 3.2) that we considered in , which explicitly declares that points are represented as classes with pairs of floating-point numbers, with data fields named x and y. Indeed, this use of data types is common in large software systems: we develop a set of conventions for how data is to be represented (and define a number of associated operations) and make those conventions available in an interface for use by client programs that comprise a large system. The data type ensures that all parts of the system are in agreement on the representation of core system-wide data structures. While valuable, this strategy has a flaw: if we need to change the data representation, we then need to change all the client programs. Program 3.2 again provides a simple example: one reason for developing the data type is to make it convenient for client programs to manipulate points, and we expect that clients will access the individual coordinates when needed. But we cannot change to a different representation (polar coordinates, say, or three dimensions, or even different data types for the individual coordinates) without changing all the client programs.

Program 4.1 is a more elaborate version of Program 3.2 (which we can still use with Program 3.7) that includes two additional methods and some modifications to make it an ADT implementation. The key distinction that makes its data type abstract has to do with access to information, as specified by the keyword private. A private class member can be referred to only within the class; a member that is not private can be referred to by other classes. Private members can be either fields or methods: in Program 4.1 only the fields are private, but we shall see numerous examples of classes that use private methods as well.

Point class implementation

This Class defines a data type consisting of the set of values "pairs of floating-point numbers" (which are presumably interpreted as points in the Cartesian plane). The class includes eight methods: two constructors, two accessor methods that return the values of the data fields, two methods for converting to polar coordinates, a method for computing the distance to another Point, and a toString method. The data representation is private and can be accessed or modified only by the class methods, but the methods can be used by any client.

class Point { private double x, y; Point() { x = Math.random(); y = Math.random(); } Point(double x, double y) { this.x = x; this.y = y; } double x() { return x; } double y() { return y; } double r() { return Math.sqrt(x*x + y*y); } double theta() { return Math.atan2(y, x); } double distance(Point p) { double dx = this.x() - p.x(); double dy = this.y() - p.y(); return Math.sqrt(dx*dx + dy*dy); } public String toString() { return "(" + x + ","+y+")"; } }

For example, no client program that uses the Point class of Program 4.1 can refer directly to the fields p.x, q.y, and so forth (as can any client of the Point class of Program 3.2) because the x and y fields are private. All that a client can do is use the public methods to process points. Those methods do have direct access to the members of any object in the class. For example, when a client invokes the distance method in Program 4.1 with the call p.distance(q), the name x within the distance method refers to the x field in the point p (because distance was invoked as a member of the instance p), but the name p.x refers to the x field in the point q (because q is the actual parameter corresponding to the formal parameter p). We can also say this.x instead of x to eliminate possible ambiguity or confusion-the keyword this refers to the object for which a method is invoked.

Each member in a Java class has one of four possible access control states: private, protected, andpublic and no modifier (default). Informally, throughout this tutorial we use the term private to mean private or protected and the term public to mean default or public. We do so solely for economy in presenting our programs: most of our class members have no access control modifier (default); those that need to be hidden are private; and some (like main and toString) are public so as to adhere to Java system conventions.

The two added methods x() and y() in Program 4.1 provide client programs with the capability to read the values of data fields (since they cannot access them directly). These methods do not modify the fields of the object on which they are called; they just provide their values. Such methods are known as accessor methods. We often include methods of this kind in Java classes. Note that we do not provide a way to change the values of the data fields once the object is constructed: objects that cannot be changed are said to be immutable and are common in Java. To define a different kind of Point that can move we could add a method move with the same code as the two-parameter constructor, which allows clients to change the values of the x and y fields.

Program 4.2 illustrates the fundamental reason that we go to the trouble of carefully defining ADTs: By not allowing clients to access directly the data representation, we are free to change it! In this case, we switch to polar coordinates to represent points, but any client program should perform the same computation with one implementation as with the other.

Why would we want to make such a change? The most common reason to do so is to improve the implementation. Suppose that we have a large system with numerous client programs using an ADT and we discover a bug in the implementation of one of the methods, which causes it to give an incorrect answer in some circumstances. We have no way of knowing how the existence of that bug affects clients, but we can go ahead and fix it without having to change the clients at all. Or, in the case of the Point class, perhaps we notice that clients use the r() method frequently and the other methods infrequently. Then we can improve performance for these clients by switching to Program 4.2. In this case, the performance gain is relatively modest, but we shall see many examples in this tutorial where we can reap huge savings. Indeed, we will investigate many situations where the potential exists for performance gains that can vastly expand the scope of applicability of client programs, without having to change them at all.

Point class (alternate implementation)

This implementation of the data type for points uses polar coordinates for its internal representation. In principle, clients such as Programs 3.7 and 3.18 should be able to substitute this implementation for Program 4.1 without noticing the difference, except possibly for performance characteristics.

class Point { private double r, theta; Point() { double x = Math.random(), y = Math.random(); this = new Point(x, y); } Point(double x, double y) { r = Math.sqrt(x*x + y*y); theta = Math.atan2(y, x); } double r() { return r; } double theta() { return theta; } double x() { return r*Math.cos(theta); } double y() { return r*Math.sin(theta); } double distance(Point p) { double dx = x() - p.x(); double dy = y() - p.y(); return Math.sqrt(dx*dx + dy*dy); } public String toString() { return "(" + x() + ", " + y() + ")"; } }

For many apps, the ability to change implementations is a requirement. For example, suppose that we are developing soft-ware for a company that needs to process mailing lists of potential customers. With a Java class, we can define methods that allow client programs to manipulate the data without directly accessing it. With an ADT, we instead provide methods that return the data of interest. For example, we might provide client programs with an interface defining operations such as extract customer name or add customer record. The most important implication of this arrangement is that we can use the same client programs even if we need to change the format used for the mailing lists. That is, we can change the data representation and the implementation of the methods that access the data without having to change the client programs.

We also can take advantage of the flexibility afforded by ADTs in implementing methods themselves-for example, since we use method invocations like p.x() instead of field references like p.x in the implementation of distance in Program 4.1 we do not have to change that code when we change the data representation. By making these methods private, we can give ourselves this flexibility even in classes where we do not want the client to have access to the data

How does the Java class concept relate to the client-interfaceimplementation paradigm and ADTs? It provides direct language support but is sufficiently general that there are a number of different approaches that we could take. We shall usually adopt a simple convention: the signatures of the methods that are not private in a class comprise its interface. In other words, class members that are not part of its interface are private. That is, we keep the data representation hidden, where it cannot be accessed by programs that use the class (client programs). All the client programs "know" about the class is the non-private information about its methods (name, type of return value, and types of parameters).

In this tutorial, to emphasize the nature of the interface defined by a class, we will often consider interfaces before examining implementations. The essential point is that this arrangement makes it easy for us to consider other implementations, with different data representations and different implementations of the methods, and to test and compare implementations without changing the client programs at all. The convention that we use to define interfaces is illustrated in Program 4.3: we derive an interface definition from a class implementation by deleting the private members, the implementations of the public methods, and the parameter names, leaving only the signatures of the public methods. Doing so for any two classes that implement the interface should result in the same interface. The order in which the method signatures appear might differ, but that order is immaterial. These interface definitions are not Java code, but each serves as a concise and complete description of the contract between clients and implementations that is the basis for an effective ADT design.

Point ADT interface

By convention, we derive the interface associated with a class ADT implementation by removing the private members and by replacing methods with their signatures. This interface is derived in this way from Programs 4.1 and 4.2 (which implement the same interface). We can use different implementations that have the same interface without changing any code in the client programs that use the ADT.

class Point // ADT interface { // implementations and private members hidden Point() Point(double, double) double x() double y() double r() double theta() double distance(Point) public String toString() }

Our use of Java classes to implement ADTs with the convention that the public method signatures comprise the interface is not a perfect arrangement because interface and implementation are not completely separate, and we use a convention, not Java code, to define interfaces. In , we briefly discuss Java's interface facility, which does not fully support ADTs as we have defined them. For example, we cannot put constructors in a Java interface, so clients access constructors without going through the interface, which violates Definition 4.1.

Class data-type implementations of this sort are sometimes called concrete data types. This terminology is ironic because a data type that adheres to these conventions actually meets our definition of an abstract data type (Definition 4.1)-the distinction is a matter of precisely defining words like "access," "refer," and "specify," which is tricky business that we shall leave to programming-language theorists. Indeed, Definition 4.1 does not specify what an interface is or how the data type and the operations are to be described. This imprecision is necessary because specifying such information in full generality requires a formal mathematical language and eventually leads to difficult mathematical questions. This question is central in coding language design. We shall discuss the specification issue further after we consider examples of ADTs.

ADTs have emerged as an effective mechanism supporting modular programming as an organizing principle for large modern software systems. They provide a way to limit the size and complexity of the connections between (potentially complicated) algorithms and associated data structures and (a potentially large number of) programs that use the algorithms and data structures. This arrangement makes it easier to understand a large apps program as a whole. Moreover, unlike simple data types, ADTs provide the flexibility necessary to make it convenient to change or improve the fundamental data structures and algorithms in the system. Most important, the ADT interface defines a contract between users and implementors that provides a precise means of communicating what each can expect of the other.

With a carefully designed ADT, we can make use of the separation between client and implementations in many interesting ways. For example, we commonly use driver programs when developing or debugging ADT implementations. Similarly, we often use incomplete implementations of ADTs, called stubs, as placeholders while building systems in order to learn properties of clients.

We examine ADTs in detail in this chapter because they also play an important role in the study of data structures and algorithms. Indeed, the essential motivation behind the development of nearly all the algorithms that we consider in this tutorial is to provide efficient implementations of the basic operations for certain fundamental ADTs that play a critical role in many computational tasks. Designing an ADT is only the first step in meeting the needs of apps programs-we also need to develop viable implementations of the associated opera-tions and underlying data structures that enable them. Those tasks are the topic of this tutorial. Moreover, we use abstract models directly to develop and to compare the performance characteristics of algorithms and data structures, as in the example in : Typically, we develop an apps program that uses an ADT to solve a problem, then develop multiple implementations of the ADT and compare their effectiveness. In this chapter, we consider this general process in detail, with numerous examples.

Java programmers use data types and ADTs regularly. At a low level, when we process integers using only the operations provided by Java for integers, we are essentially using a system-defined abstraction for integers. The integers could be represented and the operations implemented some other way on some new machine, but a program that uses only the operations specified for integers will work properly on the new machine. In this case, the various Java operations for integers constitute the interface, our programs are the clients, and the Java virtual machine provides the implementation. We can run our program on a different computer with, say, different representations for integers or floating-point numbers, without having to change our programs at all.

Most Java classes are examples of data-type implementations, though they exist in a more complex context than we have sketched and do not always conform completely to the standards outlined here. They allow us to not only use different implementations of the methods, but also base them upon different underlying data structures. Again, the key distinction that characterizes ADTs is that we can make such changes without modifying any client programs at all, because of the requirement that the data type be accessed only through the interface. In Java, we enforce this restriction by making private everything but the public methods that constitute the interface.

We shall see numerous examples throughout this chapter of using Java classes to build ADT implementations. After we have developed a feel for the concept, we shall return at the end of the chapter to a discussion of philosophical and practical implications.