CMP Versus BMP

The EJB container handles persistence for entities with CMP, requiring the developer only to implement any logic and define the bean properties to be persisted. In EJB 2.0, the container can also manage relationships and finders (specified in a special query language - EJB QL - used in the deployment descriptor). The developer is required only to write abstract methods defining persistent properties and relationships, and provide the necessary information in the deployment descriptor to allow the container to generate the implementing code. The developer doesn't need to write any code specific to the data store using APIs such as JDBC. On the negative side, the developer usually can't control the persistence code generated. The container may generate less efficient SQL queries than the developer would write (although some containers allow generated SQL queries to be tuned).
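The division of labor described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class names `CmpUserBean` and `GeneratedCmpUserBean` are hypothetical, the `javax.ejb.EntityBean` interface and the deployment descriptor are omitted so the sketch stands alone, and the "generated" subclass is a hand-written stand-in for whatever code a real container would produce.

```java
import java.util.HashMap;
import java.util.Map;

// Written by the developer: persistent properties are declared as abstract
// accessors, with no data-store code. (A real EJB 2.0 bean would also
// implement javax.ejb.EntityBean; omitted here to keep the sketch standalone.)
abstract class CmpUserBean {
    public abstract String getName();
    public abstract void setName(String name);
    public abstract String getPostcode();
    public abstract void setPostcode(String postcode);

    // Business logic can use the abstract accessors freely.
    public String getDisplayLabel() {
        return getName() + " (" + getPostcode() + ")";
    }
}

// Hypothetical stand-in for the subclass a container would generate at
// deployment time. A real container would back these accessors with
// generated persistence code rather than an in-memory map.
class GeneratedCmpUserBean extends CmpUserBean {
    private final Map<String, String> fields = new HashMap<>();

    @Override public String getName() { return fields.get("name"); }
    @Override public void setName(String name) { fields.put("name", name); }
    @Override public String getPostcode() { return fields.get("postcode"); }
    @Override public void setPostcode(String postcode) { fields.put("postcode", postcode); }
}
```

The point of the sketch is that the developer's class contains no SQL or JDBC at all; how the accessors reach the data store is entirely the container's decision.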


The following discussion refers to a relational database as an example. However, the points made about how data must be loaded apply to all types of persistence store.

In the case of entities with BMP, the developer is completely responsible for handling persistence, usually by implementing the ejbLoad() and ejbStore() callback methods to load state from and write state to persistent storage. The developer must also implement all finder methods to return a Collection of primary key objects for the matching entities, as well as ejbCreate() and ejbRemove() methods. This is a lot more work, but gives the developer greater control over how the persistence is managed. As no container can offer CMP implementations for all conceivable data sources, BMP may be the only choice for entity beans when there are unusual persistence requirements.

The CMP versus BMP issue is another quasi-religious debate in the J2EE community. Many developers believe that BMP will perform better than CMP, because of the greater control it promises. However, the opposite is usually true in practice. The BMP entity bean lifecycle - in which data must either be loaded in the ejbLoad() method and updated in the ejbStore() method, or loaded in individual property getters and updated in individual property setters - makes it very difficult to generate SQL statements that efficiently meet the app's data usage patterns. For example, if we want to implement lazy loading, or want to retrieve and update a subset of the bean's persistent fields as a group to reflect usage patterns, we'll need to put in a lot of effort. An EJB container's CMP implementation, on the other hand, can easily generate the code necessary to support such optimizations (WebLogic, for example, supports both). It is much easier to write efficient SQL when implementing a DAO used by a session bean or ordinary Java object than when implementing BMP entity beans.

The "control" promised by BMP is completely illusory in one crucial area. The developer can choose how to read data from and write data to the persistent store, but not when to do so. The result is a very serious performance problem: the n+1 query finder problem. This problem arises because the contract for BMP entities requires developers to implement finders to return entity bean primary keys, not entities. Consider the following example, based on a real case from a leading UK web site. A User entity ran against a table like this, which contained three million users:










This entity was used both when users accessed their accounts (when one entity was loaded at a time) and by workers on the site's helpdesk. Helpdesk users frequently needed to access multiple user accounts (for example, when looking up forgotten passwords). Occasionally, they needed to perform queries that resulted in very large resultsets. For example, querying all users with certain post codes, such as North London's N1, returned thousands of entities, which caused BMP finder methods to time out. Let's look at why this occurred. The finder method implemented by the developer of the User entity returned 5,000 primary keys from the following perfectly reasonable SQL query:

 SELECT PK FROM USERS WHERE POSTCODE LIKE 'N1%'


Even though there was no index on the POSTCODE column, because such searches didn't happen frequently enough to justify it, this didn't take too long to run in the Oracle database. The catch was in what happened next. The EJB container created or reused 5,000 User entities, populating them with data from 5,000 separate queries based on each primary key:

 SELECT PK, NAME, <other required columns> FROM USERS WHERE PK = <first match>
 ...
 SELECT PK, NAME, <other required columns> FROM USERS WHERE PK = <5000th match>

This meant a total of n+1 SELECT statements, where n is the number of entities returned by a finder. In this (admittedly extreme) case, n is 5,000. Long before this part of the site reached production, the development team realized that BMP entity beans wouldn't solve this problem. Clearly this is appallingly inefficient SQL, and being forced to use it demonstrates the limits of the "control" BMP actually gives us. Any decent CMP implementation, on the other hand, will offer the option of preloading the rows, using a single, efficient query such as:

 SELECT PK, NAME, <other required columns> FROM USERS WHERE POSTCODE LIKE 'N1%'

This is still overkill if we only want the first few rows, but it will run far faster than the BMP example. In WebLogic's CMP implementation, for example, preloading happens by default and this finder will execute in a reasonable time.
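The difference in statement counts can be made concrete with a small simulation. This is a minimal sketch, not container code: `CountingUserTable` and `FinderStrategies` are hypothetical names, the "table" is an in-memory map, and each method call stands in for one SELECT statement sent to the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the USERS table. It counts how many
// "SELECT statements" (method calls) are issued against it.
class CountingUserTable {
    private final Map<Integer, String[]> rows = new LinkedHashMap<>(); // pk -> {name, postcode}
    private int queryCount = 0;

    void insert(int pk, String name, String postcode) {
        rows.put(pk, new String[] {name, postcode});
    }

    // Stands in for: SELECT PK FROM USERS WHERE POSTCODE LIKE '<prefix>%'
    List<Integer> selectKeysByPostcodePrefix(String prefix) {
        queryCount++;
        List<Integer> keys = new ArrayList<>();
        for (Map.Entry<Integer, String[]> e : rows.entrySet()) {
            if (e.getValue()[1].startsWith(prefix)) keys.add(e.getKey());
        }
        return keys;
    }

    // Stands in for: SELECT PK, NAME, ... FROM USERS WHERE PK = <pk>
    String[] selectRowByKey(int pk) {
        queryCount++;
        return rows.get(pk);
    }

    // Stands in for: SELECT PK, NAME, ... FROM USERS WHERE POSTCODE LIKE '<prefix>%'
    List<String[]> selectRowsByPostcodePrefix(String prefix) {
        queryCount++;
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows.values()) {
            if (row[1].startsWith(prefix)) result.add(row);
        }
        return result;
    }

    int getQueryCount() { return queryCount; }
}

class FinderStrategies {
    // BMP-style: the finder returns keys only, then each entity is loaded
    // with its own query. Returns the number of entities loaded.
    static int loadBmpStyle(CountingUserTable table, String prefix) {
        int loaded = 0;
        for (int pk : table.selectKeysByPostcodePrefix(prefix)) {
            if (table.selectRowByKey(pk) != null) loaded++;
        }
        return loaded;
    }

    // Preloading: one query fetches all matching rows.
    static int loadPreloaded(CountingUserTable table, String prefix) {
        return table.selectRowsByPostcodePrefix(prefix).size();
    }

    // Builds a table with the given numbers of matching (N1) and other rows.
    static CountingUserTable sampleTable(int matching, int nonMatching) {
        CountingUserTable table = new CountingUserTable();
        int pk = 0;
        for (int i = 0; i < matching; i++) table.insert(pk++, "user" + pk, "N1 1AA");
        for (int i = 0; i < nonMatching; i++) table.insert(pk++, "user" + pk, "SW1 1AA");
        return table;
    }
}
```

With 5,000 matching rows, `loadBmpStyle()` issues one key query plus one query per key, while `loadPreloaded()` issues a single query. The simulation ignores network round trips and entity creation costs, which only widen the gap in a real deployment.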


Although CMP performance will be much better with large resultsets, entity beans are usually a poor choice in such situations, because of the high overhead of creating and populating this number of entity beans.

There is no satisfactory solution to the n+1 finder problem in BMP entities. Using coarse-grained entities doesn't avoid it, as there won't necessarily be fewer instances of a coarse-grained entity than a fine-grained entity. The coarse-grained entity is just used as a gateway to associated objects that would otherwise be modeled as entities in their own right. This app used fine-grained entities related to the User entity, such as Address and SavedSearch, but making the User entity coarse-grained wouldn't have produced any improvement in this situation.

The so-called "Fat Key" pattern has been proposed to evade the problem. This works by holding the entire bean's data in the primary key object. This allows finders to perform a normal SELECT, which populates the "fat" objects with all entity data, while the bean implementation's ejbLoad() method simply obtains data from the "fat" key. This strategy does work, and doesn't violate the entity bean contract, but is basically a hack. There's something wrong with any technology that requires such a devious approach to deliver adequate performance. See for a discussion of the "Fat Key" pattern.
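A rough sketch of the "Fat Key" idea follows. The names `FatUserKey` and `FatKeyUserBean` are hypothetical, the bean does not implement `javax.ejb.EntityBean`, and the key is handed to the bean through a plain setter rather than through the container's entity context, so this shows only the shape of the trick rather than a deployable bean.

```java
import java.io.Serializable;

// Hypothetical "fat" primary key: identity is the pk alone, but the object
// also carries the row data fetched by the finder's single SELECT.
final class FatUserKey implements Serializable {
    private final int pk;
    private final String name;      // piggy-backed entity data
    private final String postcode;  // piggy-backed entity data

    FatUserKey(int pk, String name, String postcode) {
        this.pk = pk;
        this.name = name;
        this.postcode = postcode;
    }

    int getPk() { return pk; }
    String getName() { return name; }
    String getPostcode() { return postcode; }

    // Equality and hashing use only the real key value, so two keys for the
    // same row are equal regardless of the data they carry.
    @Override public boolean equals(Object o) {
        return o instanceof FatUserKey && ((FatUserKey) o).pk == pk;
    }
    @Override public int hashCode() { return Integer.hashCode(pk); }
}

// Sketch of the bean side: ejbLoad() copies state from the key instead of
// issuing its own SELECT.
class FatKeyUserBean {
    private FatUserKey key; // a real bean would obtain this from its entity context
    private String name;
    private String postcode;

    void setKey(FatUserKey key) { this.key = key; }

    void ejbLoad() {
        // No query here: the finder already fetched the data.
        name = key.getName();
        postcode = key.getPostcode();
    }

    String getName() { return name; }
    String getPostcode() { return postcode; }
}
```

A real implementation would presumably also need a fallback query in ejbLoad() for keys that arrive without data, which is part of what makes the approach feel like a hack.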


Why does the BMP contract force the finders to return primary keys and not entities when it leads to this problem? The specification requires this to allow containers to implement entity bean caches. The container can choose to look in its cache to see if it already has an up-to-date instance of the entity bean with the given primary key before loading all the data from the persistent store. We'll discuss caching later. However, permitting the container to perform caching is no consolation in the large result set situation we've just described. Caching entities for all users for a populous London postcode following such a search would simply waste server resources, as hardly any of these entities would be accessed before they were evicted from the cache.
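The cache lookup that the key-returning contract makes possible might look roughly like this. It is a simplified sketch, not how any particular container works: `EntityCacheSketch` is a hypothetical name, the loader function stands in for ejbLoad() against the data store, and the check for whether a cached instance is still up to date is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical container-side entity cache keyed by primary key.
class EntityCacheSketch<K, E> {
    private final Map<K, E> cache = new HashMap<>();
    private int loads = 0;

    // Resolves the keys returned by a finder, going to the data store
    // (via the loader) only for entities that are not already cached.
    List<E> resolve(List<K> keysFromFinder, Function<K, E> loader) {
        List<E> result = new ArrayList<>();
        for (K key : keysFromFinder) {
            E entity = cache.get(key);
            if (entity == null) {
                entity = loader.apply(key); // one data store hit per cache miss
                loads++;
                cache.put(key, entity);
            }
            result.add(entity);
        }
        return result;
    }

    // Number of data store loads performed so far.
    int getLoads() { return loads; }
}
```

The sketch also shows why the cache is no help for the postcode search: on a first search every key is a miss, so every entity is still loaded individually, and the cache then holds thousands of entries that are unlikely to be requested again.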

One of the few valid arguments in favor of using BMP is that BMP entities are more portable than CMP entities; there is less reliance on the container, so behavior and performance can be expected to be similar across different app servers. This is a consideration in rare apps that are required to run on multiple servers.

However, BMP entities are usually much less maintainable than CMP entities. While it's possible to write efficient and maintainable data-access code using JDBC in a helper class used by a session bean, the rigidity of the BMP contract is likely to make data-access code less maintainable. There are few valid reasons to use BMP with a relational database. If BMP entity beans have any legitimate use, it's to work with legacy data stores. Using BMP against a relational database makes it impossible to use the batch functionality that relational databases are designed for.


Don't use entity beans with BMP. Use persistence from stateless session beans instead. This is discussed in the next chapter. Using BMP entity beans adds little value and much complexity, compared with performing data access in a layer of DAOs.