Collection Gotchas

Collections provide you with a powerful tool for storing and traversing data. However, they have their share of gotchas that the savvy developer needs to beware of.

Storage by Reference

One of the most prevalent gotchas has to do with how collections store data. Collections store references to data, not copies of the data itself. This is important to remember because the collection holding the data is not necessarily the only object that can access the underlying value. Since the variables for all constructed types in Java are references, and these references are passed by value, another part of your program could hold a reference to a data object in your collection and change it without the collection's knowledge.
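The effect can be seen in a few lines. The following is only a minimal sketch, written in the same pre-generics style as the rest of this chapter; the class name ReferenceStorageDemo and its method are hypothetical and exist purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceStorageDemo {
    /** Returns the text of the list's first element after it is changed from outside the list. */
    static String mutateThroughOutsideReference() {
        List names = new ArrayList();
        StringBuffer name = new StringBuffer("Jason");
        names.add(name);        // the list stores a reference to name, not a copy
        name.append(" Smith");  // change the object through the reference kept outside the list
        return names.get(0).toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughOutsideReference()); // prints "Jason Smith"
    }
}
```

The list never had add( ) or set( ) called after the first insertion, yet its content changed, because the list and the local variable refer to the same object.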

Failure to Override hashCode( )

When you place objects in a hash-based collection, the collection puts objects whose hash codes map to the same index into the same bucket. This can be a problem if you forget to override the hashCode( ) method in your collected objects. If you don't override hashCode( ), the collection will use the default implementation in the java.lang.Object class. This implementation typically derives the hash code from the identity of the object, such as its memory location, rather than from its contents. If you create a lot of objects, they may end up close to each other in memory, and depending on the virtual machine the result could be a HashMap with most of the objects crowded into a few buckets and few, if any, in the others. Such an unbalanced HashMap would behave poorly; in extreme conditions, it could degrade from O(1) to O(n) efficiency. The solution to this problem is to make sure you override the hashCode( ) method to give your data an even distribution inside the buckets of the hash-based collection. Instead of calculating based on location, you should calculate based on the data in the object. Don't forget to override equals( ) whenever you override hashCode( ), so that objects that are equal always have equal hash codes.
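As a rough illustration of a data-based hash code, the sketch below combines the fields of a small value class using the common multiply-by-31 idiom. The class PurchaseKey, its fields, and the helper foundByEqualKey( ) are hypothetical names invented for this example; treat it as one reasonable approach rather than the only one.

```java
import java.util.HashSet;
import java.util.Set;

class PurchaseKey {
    private final String itemName;
    private final int priceInCents;

    PurchaseKey(final String itemName, final int priceInCents) {
        if (itemName == null) {
            throw new NullPointerException();
        }
        this.itemName = itemName;
        this.priceInCents = priceInCents;
    }

    /** Two keys are equal when both fields match. */
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PurchaseKey)) {
            return false;
        }
        final PurchaseKey other = (PurchaseKey) obj;
        return itemName.equals(other.itemName) && priceInCents == other.priceInCents;
    }

    /** Computed from the same fields that equals( ) compares, not from object identity. */
    public int hashCode() {
        int result = 17;
        result = 31 * result + itemName.hashCode();
        result = 31 * result + priceInCents;
        return result;
    }

    /** Looks up a key using a distinct but equal instance. */
    static boolean foundByEqualKey() {
        Set keys = new HashSet();
        keys.add(new PurchaseKey("Widget", 1999));
        return keys.contains(new PurchaseKey("Widget", 1999));
    }
}
```

Because hashCode( ) and equals( ) are based on the same fields, a second PurchaseKey holding the same data lands in the same bucket and is recognized as a match.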

The Jakarta Commons Lang library contains utilities that make creating high-quality hash codes easy. I highly recommend you check it out at http://jakarta.apache.org/.


Lack of Type Safety

One of the most persistent problems with the Java collection classes is the lack of type safety. Since the collection classes store Object instances, a user can insert anything into a collection. This can easily lead to a ClassCastException being thrown when an element is later cast to the type you expected. It also introduces problems with quality data validation, which we will discuss elsewhere in this book. In JDK 1.5, Sun plans to introduce the concept of parameterized types, which will provide this type safety. The subject of parameterized types is far outside the scope of this chapter; we will cover them separately. Until JDK 1.5 hits the market, you need to realize that just because you think the collection contains only one type doesn't necessarily mean there isn't a rogue object in the collection. The only solution to this quandary is vigilance and good code management through exhaustive testing and checking.
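The sketch below shows both halves of the problem: a rogue element slipping into an untyped Set, and one possible way to check for such elements at runtime. The class RogueElementDemo and its methods are hypothetical helpers written for this illustration, not part of any library.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class RogueElementDemo {
    /** Counts the elements that are not instances of the expected type (null counts as rogue). */
    static int countRogues(final Set items, final Class expectedType) {
        int rogues = 0;
        for (Iterator iter = items.iterator(); iter.hasNext();) {
            if (!expectedType.isInstance(iter.next())) {
                rogues++;
            }
        }
        return rogues;
    }

    /** Returns true if casting every element to Integer fails with a ClassCastException. */
    static boolean castFails() {
        Set prices = new HashSet();
        prices.add(Integer.valueOf(1999));
        prices.add("Fred"); // compiles without complaint: an untyped Set accepts any Object
        int total = 0;
        try {
            for (Iterator iter = prices.iterator(); iter.hasNext();) {
                Integer price = (Integer) iter.next(); // fails when it reaches "Fred"
                total += price.intValue();
            }
            return false;
        } catch (ClassCastException ex) {
            return true;
        }
    }
}
```

A check like countRogues( ) fits naturally into unit tests or into a setter that validates its argument before storing it.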

Collecting Problems

There are many things that you can do with Java collections. However, like any power tool, you can do a great deal of harm if you aren't careful. See Example 4-2.

Example 4-2. A bean with a collection hole
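package oracle.hcj.review;

import java.beans.PropertyChangeSupport;
import java.beans.PropertyVetoException;
import java.beans.VetoableChangeSupport;
import java.io.Serializable;
import java.util.Set;

public class Customer extends Object implements Serializable {
    private final PropertyChangeSupport propertyChangeSupport = new PropertyChangeSupport(this);
    private final VetoableChangeSupport vetoableChangeSupport = new VetoableChangeSupport(this);
    private Set purchases;

    public void setPurchases(final Set purchases) throws PropertyVetoException {
        final Set oldPurchases = this.purchases;
        // Offer the proposed value to the veto listeners before committing it.
        vetoableChangeSupport.fireVetoableChange("purchases", oldPurchases, purchases);
        this.purchases = purchases;
        propertyChangeSupport.firePropertyChange("purchases", oldPurchases, this.purchases);
    }

    public Set getPurchases( ) {
        return this.purchases;
    }
}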


This is almost exactly how my IDE generated my bean property. The only thing I did was add the keyword final to the parameter declaration and to the oldPurchases variable. The class looks pretty good, but it has a huge, gaping hole. The getter returns a reference to the Set when someone asks for it. The problem with returning this reference is shown in the usage of your property:

package oracle.hcj.review;

import java.util.HashSet;
import java.util.Set;

public class BeanCollections {
    public static void someFunction(final Customer customer) {
        if (customer == null) {
            throw new NullPointerException( );
        }
        Set purchs = customer.getPurchases( );
        Set names = new HashSet( ); // going to use to store customer names.
        names.add(new String("Jason"));
        purchs.add(new String("Fred")); // typo; he meant names, not purchs.
    }
}


In the above code, a String was added to a Set that isn't meant to contain String objects. After this code runs, a String object will be inside the purchases Set that is meant to contain only Purchase objects. Since adding the String to the purchases Set bypasses the setter, the internals of the Customer object were changed while all type-checking code was being bypassed! The defective Set is not detected, and the code compiles, deploys, and still doesn't break. Down the line, more code is written for your system:

public void makeCustomerReport( ) {
    Set purchases = someObject.getCustomer(12345).getPurchases( );
    for (Iterator iter = purchases.iterator( ); iter.hasNext( );) {
        Purchase purchase = (Purchase)iter.next( ); // ClassCastException
        reportingObject.add(purchase);
    }
}


Because there is a String object in a Set that is supposed to contain only Purchase objects, a ClassCastException is thrown in this piece of code, far away from the line that actually caused the problem. This is one of those mysterious bugs that can baffle you for two hours and then make you want to break something when you figure it out. Worse, if this bug occurs only intermittently, you have an evil problem to deal with. However, you can prevent this headache before it even starts:

package oracle.hcj.review;
public class Customer extends Object implements Serializable {
    private Set purchases2;

    public void setPurchases2(final Set purchases2) throws PropertyVetoException {
        final Set newPurchases2;
        if (purchases2 != null) {
            newPurchases2 = Collections.unmodifiableSet(purchases2);
        } else {
            newPurchases2 = null;
        }
        final Set oldpurchases2 = this.getPurchases2( );
        vetoableChangeSupport.fireVetoableChange("purchases2", oldpurchases2, newPurchases2);
        // Copy the Set itself; new HashSet(null) would throw a NullPointerException.
        this.purchases2 = (purchases2 == null) ? null : new HashSet(purchases2);
        propertyChangeSupport.firePropertyChange("purchases2", oldpurchases2, getPurchases2( ));
    }

    public Set getPurchases2( ) {
        if (this.purchases2 == null) {
            return null;
        }
        return Collections.unmodifiableSet(this.purchases2);
    }
}


The new version of the Customer class encapsulates its data much more effectively. When setting the purchases property, instead of copying the reference, you actually copy the Set itself. When getting the purchases property, you give the caller an unmodifiable set. The end result is a minor performance hit but superior code from a debugging and maintenance perspective. With this technique, if the user tries to add a String object to the returned Set, he will get an UnsupportedOperationException at the line where he tries to add the String object:

 purchs.add(new String("Fred")); // <= UnsupportedOperationException
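To see the copy-on-set and unmodifiable-view behavior in isolation, consider the stripped-down sketch below. PurchaseHolder is a hypothetical stand-in for Customer with the veto and property-change events left out, so it only approximates the class discussed here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class PurchaseHolder {
    private Set purchases = new HashSet();

    void setPurchases(final Set purchases) {
        // Defensive copy: later changes to the caller's Set do not leak in.
        this.purchases = new HashSet(purchases);
    }

    Set getPurchases() {
        // Read-only view: callers cannot change the internal Set through it.
        return Collections.unmodifiableSet(this.purchases);
    }

    /** Returns true if add( ) on the returned view is rejected and the internal Set is untouched. */
    static boolean addIsRejected() {
        PurchaseHolder holder = new PurchaseHolder();
        Set source = new HashSet();
        source.add("book");
        holder.setPurchases(source);
        source.add("rogue"); // affects only the caller's Set, not the holder's copy
        try {
            holder.getPurchases().add("Fred");
            return false;
        } catch (UnsupportedOperationException ex) {
            return holder.getPurchases().size() == 1;
        }
    }
}
```

Both halves matter: without the copy in the setter, the caller could still change the internals through the Set it passed in; without the unmodifiable view, it could do so through the Set it got back.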


This prevents the user of the Customer class from changing the internals of the Customer object without going through the setter. More importantly, no one can bypass the property veto listeners and checking code in the Customer class. You have traded a few clock cycles for several man-hours. If this technique is adopted as a general policy in your office, the savings could be measured in thousands of man-hours.

One of the ugly aspects of the setPurchases2( ) method is that all of the checks have to account for null sets. Every time you call the getter of the purchases property, you have to check for null. These checks can become a real hassle and make the code difficult to read. However, with a coding standard, you can avoid all of this unpleasantness. Example 4-3 shows the optimal code for your Customer class.

Example 4-3. Optimal structure for collection-based properties
package oracle.hcj.review;
public class Customer extends Object implements Serializable {
    private Set purchases3 = new HashSet( );

    public void setPurchases3(final Set purchases3) throws PropertyVetoException {
        if (purchases3 == null) {
            throw new NullPointerException( );
        }
        final Set oldPurchases3 = this.getPurchases3( );
        final Set newPurchases3 = Collections.unmodifiableSet(purchases3);
        vetoableChangeSupport.fireVetoableChange("purchases3", oldPurchases3,
                                                 newPurchases3);
        this.purchases3 = new HashSet(purchases3);
        propertyChangeSupport.firePropertyChange("purchases3", oldPurchases3, getPurchases3( ));
    }

    public Set getPurchases3( ) {
        return Collections.unmodifiableSet(this.purchases3);
    }
}


In the optimal version of your Customer class, the property purchases3 can never be null. You have implemented a coding standard in which the property will always be a Set. It can be an empty Set if there are no purchases for a particular customer, but it will always be an initialized object. This makes your life a lot easier and your code much cleaner. To illustrate, see Example 4-4, which operates on a data model that allows nulls in Sets.

Example 4-4. Dealing with collection properties that can be null
package oracle.hcj.review;
public static void makeGroupReport(final Set customers) {
    if (customers == null) {
        throw new NullPointerException( );
    }
    Iterator purchaseIter = null;
    Iterator customerIter = null;
    Set purchases = null;
    Customer customer = null;
    Purchase purch = null;
    customerIter = customers.iterator( );
    while (customerIter.hasNext( )) {
        customer = (Customer)customerIter.next( );
        System.out.println("Purchases for " + customer.getName( ));
        purchases = customer.getPurchases3( );
        if (purchases != null) { // null check: this data model allows a null Set
            purchaseIter = purchases.iterator( );
            while (purchaseIter.hasNext( )) {
                purch = (Purchase)purchaseIter.next( );
                System.out.println(purch.getItemName( ) + "\t" + purch.getPrice( ));
            }
        }
        System.out.print("Total Purchases = ");
        if (purchases != null) { // null check, again
            System.out.println(purchases.size( ));
        } else {
            System.out.println(0);
        }
        System.out.println( );
    }
}


Notice how many checks for null this code has to make: purchases is tested once before iterating and again before printing the total. Let's compare this to the code in Example 4-5, which does the same job with a data model that does not allow sets to be null.

Example 4-5. Dealing with collection properties that cannot be null
package oracle.hcj.review;
public static void makeGroupReportBetter(final Set customers) {
    if (customers == null) {
        throw new NullPointerException( );
    }
    Iterator purchaseIter = null;
    Iterator customerIter = null;
    Set purchases = null;
    Customer customer = null;
    Purchase purch = null;
    customerIter = customers.iterator( );
    while (customerIter.hasNext( )) {
        customer = (Customer)customerIter.next( );
        System.out.println("Purchases for " + customer.getName( ));
        purchases = customer.getPurchases3( );
        purchaseIter = purchases.iterator( );
        while (purchaseIter.hasNext( )) {
            purch = (Purchase)purchaseIter.next( );
            System.out.println(purch.getItemName( ) + "\t" + purch.getPrice( ));
        }
        System.out.println("Total Purchases = " + purchases.size( ));
        System.out.println( );
    }
}


This code is much cleaner and shorter. Since you never have to test for null, you can simply grab an Iterator directly. If a customer has no purchases, the inner while loop will merely exit when the call to purchaseIter.hasNext( ) returns false. Example 4-5 is far superior to Example 4-4 in terms of code maintenance, readability, and speed. The benefits of non-null sets also extend to other data structures. Collections, maps, and arrays should never be null, but they should be empty if they don't have data.
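The same standard applies to methods that look data up on demand. The sketch below returns an empty Set instead of null when nothing is found, so callers can iterate without any checks. PurchaseIndex and its methods are hypothetical names used only for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class PurchaseIndex {
    private final Map purchasesByCustomer = new HashMap();

    /** Records an item under the given customer name. */
    void record(final String customer, final String item) {
        Set items = (Set) purchasesByCustomer.get(customer);
        if (items == null) {
            items = new HashSet();
            purchasesByCustomer.put(customer, items);
        }
        items.add(item);
    }

    /** Never returns null: an unknown customer yields an empty, read-only Set. */
    Set purchasesFor(final String customer) {
        final Set items = (Set) purchasesByCustomer.get(customer);
        if (items == null) {
            return Collections.EMPTY_SET;
        }
        return Collections.unmodifiableSet(items);
    }

    /** Iterates the result directly; no null check is needed by the caller. */
    static int countItems(final PurchaseIndex index, final String customer) {
        int count = 0;
        for (Iterator iter = index.purchasesFor(customer).iterator(); iter.hasNext();) {
            iter.next();
            count++;
        }
        return count;
    }
}
```

The single null check lives inside purchasesFor( ), where the data is produced, instead of being repeated at every place the data is consumed.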

      