Collection Concepts

During the early days of object-oriented programming, one of its main deficiencies was its inability to manage collections of objects. For years, C++ suffered because the collections management that was available was not standardized. To attack this problem, various vendors designed packages of collection classes that could be downloaded and implemented. However, even these packages did not solve the portability problem among cooperating vendors. For example, if my company had bought Rogue Wave Tools.h++, all of your business partners would have needed to make a similar purchase to extend your software. To fix this fundamental deficiency, Sun introduced standard collection classes in the earliest version of the JDK. This decision, along with the other common class libraries in the JDK, contributed to the rapid rise of Java technology. Now, instead of my partner companies having to make a separate and often expensive purchase, they can simply use the collections in the JDK.

An Interface-Based Approach

The JDK collection classes can be found in the java.util package. They are structured using an interface-implementation model. For each type of collection, there is one interface and various implementations. For example, the List interface is implemented by the AbstractList, ArrayList, LinkedList, and Vector classes.

Don't worry about these implementation classes for now; we will discuss them later in the chapter.

Interfaces are a special approach to object-oriented engineering. While classes implement the concept, the interface defines the concept. For example, an ArrayList and a LinkedList are both conceptually lists. The interfaces of the java.util package are the only components that should be exposed to the users of your classes, unless you have no other choice. This follows from the idea of encapsulation, which dictates that the programmer should not expose implementation details to the user of a class, but only the interface to that class. For example, the following code is an example of a good interface-based technique:

public class SomeClass {
 HashSet someSet = new HashSet( );
 protected Set someMethod( ) {
 return this.someSet;
 }
}

In this example, the programmer exposes only the interface of someSet to the users and does not let them know which implementation is being used. Code programmed in this manner is very flexible and bug-resistant. For example, if you decide that HashSet does not perform well enough, you can always change the implementation without affecting the users of the class. By contrast, consider the following code, which does not use interfaces:

public class SomeClass {
 HashSet someSet = new HashSet( );
 protected HashSet someBadMethod( ) {
 return this.someSet;
 }
}

Chasing Bugs

In corporate programming, resources are often wasted when bugs are chased throughout the code. This process starts when a user reports a bug on a particular functionality. The programmers find the bug and fix it. However, the fix causes three other bugs, which appear to be completely unrelated to the previous bug. The programmers quickly fix these as well, but end up with bugs in other parts of the code. The code is undoubtedly extremely fragile. This fragility is caused by extremely tight integration of various components within the code base and circular dependencies in the code. For example, Package A often depends on Package B, which in turn depends on Package A. Finding these bugs is very expensive. The way to avoid such traps is to build walls between code components using various techniques, which include using dependency analysis tools to check for circular dependencies and using interfaces in the java.util package. Bugs cannot easily pass through interfaces, so if one component talks to another using only interfaces, then the two components are far less susceptible to bug migration.

In this example, if you found a bug that occurred because you were using a HashSet, you could not easily change the implementation used for the set. If you did, you would then have to change all of the code that is using your class. This could echo around your code base for quite a while as you chase bugs. You should expose the underlying collection implementation in your code only when you have no other choice, such as when the users of your code need access to a specific functionality of the implementation. In fact, this is a good policy to follow within classes as well. The example would have been much better written as:

public class SomeClass {
 Set someOtherSet = new HashSet( );
 protected Set someOtherMethod( ) {
 return this.someOtherSet;
 }
}

In the revised class, you leverage the usage of interfaces for the instance variable type and for external communication. This gives you even more protection from migrating bugs that may result from an implementation change. If another method in the class uses someOtherSet as a Set and not a HashSet, then you can easily change to another Set implementation as the need arises.

Collection Types

Now that you have mastered the interface approach, you should examine the actual interfaces in the java.util package and the differences between them. There are two types of general collections that form the basis of all collection classes:

java.util.Map: This interface represents collection types that store values in a lookup table. Each value is stored by using a key that can later fetch that value from the map. Maps can take many forms; one recognizable form is a dictionary. A dictionary class stores words and their definitions. The java.util.Map classes work the same way.
java.util.Collection: This interface represents generic collections other than those that store values in a key-based lookup table. These collections have a variety of access mechanisms.

These two interfaces form the conceptual base for the entire java.util package. Other collections that extend the conceptual basis are derived from these interfaces.

Maps

The java.util.Map class defines a map that is unsorted and iterated in a fairly unpredictable manner. In fact, one of the warnings for the Map class in the Javadoc is that the iteration order is not predictable. This is not a problem in many situations, such as storing event listeners that listen for specific events. In this case, the key could be the event type and the value store could be a set of listeners; since the map would never be iterated in order, having a defined iteration order is not needed. However, there may be times when you want more control over the iteration order of a Map. For example, if you are creating a tree control to show the files on a user's computer, you may want to display these objects in alphabetical order for easy reading and interaction. For this purpose, you would need a specialized kind of map:

java.util.SortedMap: This interface extends a map to specify the iteration order of the keys in the map. This map has a restriction: it can only use keys that it can sort. You must give the class a Comparator, which is an implementation of the java.util.Comparator to implement the sorting, or you must use keys that implement the java.lang.Comparable interfaces.

Other than the difference in iteration order, the main difference between SortedMaps and Maps is that SortedMaps are generally slower when it comes to inserting and removing items because the map must rearrange itself to accommodate new values. Regular Maps don't have to worry about these issues. For this reason, you should use only a SortedMap if you must have a fixed iteration order. One thing that Maps and SortedMaps have in common is that they are unable to contain the same key more than once. Each key has a defined and assigned value. If you add a new element to a map with a key that the map is already using, the map implementation will simply replace the old value with the new one.

Collections

The Collection interface has several variants that are used to extend the Collection functionality. The Collection class itself defines only collections of objects and says nothing about their iteration order, method of access, or ability to contain duplicates. However, it is fairly uncommon for developers to directly use a Collection to store anything. Usually, you need to be more specific about the constraints of your Collection. For this purpose, the JDK provides you with three types of Collections:

java.util.Set: Sets are collections that do not guarantee the order in which they will be iterated. However, they do guarantee that no duplicates will be found inside the set.
java.util.List: Lists are collections that are in a specific order and can be accessed through a numerical index. They can also contain duplicate elements. Note that just because the elements are arrayed in an order doesn't necessarily mean they are sorted. The elements in a list are usually in the same order in which they were inserted. They may or may not be sorted, depending on the programmer's needs.
java.util.SortedSet: Sorted sets maintain the order of the objects within the set, depending on one of two criteria. Either the objects are sorted according to their natural sorting order, as defined by their implementation of Comparable, or they are sorted using a supplied comparator. We will discuss the specifics of both procedures later in the chapter.

Now that you have all of the nifty features of these subtypes, the question becomes, "What good is the Collection interface?" The first reason it exists is because it provides a conceptual basis for all collections. The second reason is to give the collections an easy way to convert from one type of collection to another. In this manner, you can copy lists to sets, and so on.

Previous Next

Comments