O/R Mapping and the "Impedance Mismatch"

The fact that most J2EE apps access data from relational databases has both positive and negative implications for J2EE architects and developers. On the positive side, RDBMSs have over twenty years of experience behind them, and the best are proven to work very well. On the negative side, mapping between an object model and an RDBMS schema is difficult. Much effort has been put into Object-Relational (O/R) mapping in Java and other OO languages, with mixed results.

Important

O/R mapping is the attempt to map the state of Java objects onto data in an RDBMS, providing transparent persistence.

The relational database and object-oriented models of the world differ markedly. Relational databases are based on mathematical concepts for storing and retrieving data. The goal of relational database design is to normalize data (eliminate data redundancy). The goal of OO design is to model a business process by breaking it into objects with identity, state, and behavior. Relational databases do not support object concepts such as classes, inheritance, encapsulation or polymorphism. A modern RDBMS is not merely a bucket of data, but can also hold rules guaranteeing data integrity and operations acting on data. However, this does not amount to the OO inclusion of behavior as part of object definition. The challenges these different models pose for O/R mapping are often collectively termed the Object-Relational impedance mismatch. Some of the key problems are:

How do we convert between column values in a SQL query result and Java objects?
How do we efficiently issue SQL updates when the state of mapped Java objects changes?
How do we model object relationships?
How do we model inheritance in Java objects mapped to the database?
How do we model Java objects whose data spans multiple tables in the RDBMS?
What caching strategies should we use in the object layer to try to reduce the number of calls to the RDBMS?
How do we perform aggregate functions?
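The first question in the list above, converting column values into Java objects, can be pictured with a minimal hand-written mapper. This is a sketch only: the `Customer` class, the `CustomerRowMapper` helper, and the column names are hypothetical, and a `Map` stands in for one row of a query result so that the example runs without a database.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain class; the name and fields are illustrative only.
final class Customer {
    private final long id;
    private final String name;
    private final BigDecimal creditLimit; // null when the column is SQL NULL

    Customer(long id, String name, BigDecimal creditLimit) {
        this.id = id;
        this.name = name;
        this.creditLimit = creditLimit;
    }

    long getId() { return id; }
    String getName() { return name; }
    BigDecimal getCreditLimit() { return creditLimit; }
}

final class CustomerRowMapper {
    // A Map stands in for one row of a query result (an assumption made so
    // the sketch is self-contained); real code would read a java.sql.ResultSet.
    static Customer mapRow(Map<String, Object> row) {
        // Numeric columns may arrive as Integer, Long, or BigDecimal
        // depending on the driver, so we go through Number.
        long id = ((Number) row.get("CUSTOMER_ID")).longValue();
        String name = (String) row.get("NAME");
        // A nullable column maps onto an object reference, not a primitive.
        BigDecimal creditLimit = (BigDecimal) row.get("CREDIT_LIMIT");
        return new Customer(id, name, creditLimit);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("CUSTOMER_ID", 42);
        row.put("NAME", "Acme Ltd");
        row.put("CREDIT_LIMIT", null);
        Customer c = mapRow(row);
        System.out.println(c.getId() + " " + c.getName() + " " + c.getCreditLimit());
    }
}
```

Even this trivial case has to decide how numeric types and NULL values are represented. Relationships, inheritance, and multi-table objects add further decisions on top of it.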

Few solutions meet all – or even most – of these challenges. O/R mapping solutions typically map each object onto a single row of data, usually in one table, but occasionally resulting from a join. (It may be possible to use a view to simplify the mapping if the RDBMS supports updateable views.) Typically, O/R mapping solutions allow this mapping to be done without custom coding, hiding the low-level data access from the programmer. The mapping is normally held in metadata outside the mapped classes.

O/R mapping works very well in some situations, but is probably oversold. The assumption that O/R mapping is the solution for all J2EE apps that access relational databases goes largely unchallenged. I believe that this assumption is questionable. O/R mapping has drawbacks as well as advantages, which means that we should think carefully before using it.

The central value propositions of O/R mapping are that it removes the need for developers to write low-level data access code (which can deliver large productivity gains in some apps); ensures that app code deals exclusively with objects; and can lead to the creation of a domain object model that can support multiple use cases.

However, there is a risk that O/R mapping doesn't so much reduce total complexity as move it elsewhere. The result may be complex deployment descriptors, such as those necessary for entity bean CMP, and the price for transparent data access is reduced control over that access.

Efficiency is also questionable. O/R mapping solutions typically assume that RDBMSs are intended to operate on individual rows and columns. This is a fallacy: RDBMSs operate best on sets of tuples. For example, we can update many rows in a single SQL operation much faster than each row individually. O/R mapping solutions deliver excellent performance if it's feasible to cache data in the object layer; if this is impossible or when aggregate updates are required, O/R mapping usually adds significant overhead. Really sophisticated O/R mapping solutions allow us to enjoy O/R mapping benefits without some of these drawbacks.
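The difference between row-at-a-time and set-based updates can be seen by comparing the statements each approach sends to the database. The sketch below only builds the SQL text and does not execute it. The `ORDERS` table, its columns, and the `UpdateShapes` class are hypothetical, and the row-at-a-time version shows what a naive mapping layer might issue, not what any particular product does.

```java
import java.util.ArrayList;
import java.util.List;

final class UpdateShapes {
    // Row-at-a-time: one UPDATE per mapped object whose state has changed.
    // Literals are inlined for readability; real JDBC code would bind them
    // as PreparedStatement parameters.
    static List<String> perRowUpdates(List<Long> orderIds, String newStatus) {
        List<String> statements = new ArrayList<>();
        for (Long id : orderIds) {
            statements.add("UPDATE ORDERS SET STATUS = '" + newStatus
                    + "' WHERE ORDER_ID = " + id);
        }
        return statements;
    }

    // Set-based: a single UPDATE whose WHERE clause selects every affected row.
    static String setBasedUpdate(String oldStatus, String newStatus) {
        return "UPDATE ORDERS SET STATUS = '" + newStatus
                + "' WHERE STATUS = '" + oldStatus + "'";
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(101L, 102L, 103L);
        System.out.println(perRowUpdates(ids, "SHIPPED").size() + " statements, row at a time");
        System.out.println("1 statement, set-based: " + setBasedUpdate("PACKED", "SHIPPED"));
    }
}
```

The row-at-a-time version grows with the number of objects touched, and each statement typically costs a round trip to the database. The set-based version stays at one statement however many rows match.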

Important

Don't assume that O/R mapping is the best solution to all data access problems. It works very well in some situations, but sometimes adds little value.

The following are indications that an O/R mapping solution is not fulfilling a useful role:

In the case of object-driven modeling, it results in an unnatural RDBMS schema, which limits performance and is useless to other processes. Indications of an unnatural RDBMS schema include the need for complex joins in common data retrieval operations; inability of the RDBMS to enforce referential integrity; and the need to issue many individual updates where a better schema could have permitted efficient use of an aggregate operation.
In the case of data-driven modeling, it produces a layer of objects with a one-to-one relationship to the tables in the RDBMS. Unless the tables were produced from the object model, these are probably not true objects, and working with them is likely to prove unnatural and inefficient. Should the schema ever change, all the code that works with those objects will also need to change.
It results in inefficient queries or updates. (It's a good idea to examine the queries running in the database as a result of using any O/R mapping layer.)
Some tasks that could be performed easily and efficiently inside the database using relational operations may require substantial Java coding to accomplish in the J2EE server, or may lead to the unnecessary creation of many Java objects.

In such cases, there are legitimate alternatives to O/R mapping, as we'll see.
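The last indication in the list above can be illustrated with an aggregate. In this sketch, which assumes a hypothetical `ORDERS` table mapped onto a hypothetical `OrderRecord` class, the Java code must materialize one object per row before it can compute per-customer totals. The SQL string shows the relational equivalent for comparison and is never executed here.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mapped object: one instance per row of an ORDERS table.
final class OrderRecord {
    final long customerId;
    final BigDecimal amount;

    OrderRecord(long customerId, BigDecimal amount) {
        this.customerId = customerId;
        this.amount = amount;
    }
}

final class OrderTotals {
    // The same result expressed relationally; shown for comparison only.
    static final String SQL_EQUIVALENT =
            "SELECT CUSTOMER_ID, SUM(AMOUNT) FROM ORDERS GROUP BY CUSTOMER_ID";

    // Object-layer version: every order must already exist as a Java object.
    static Map<Long, BigDecimal> totalsByCustomer(List<OrderRecord> orders) {
        Map<Long, BigDecimal> totals = new TreeMap<>();
        for (OrderRecord order : orders) {
            // Adds the amount to the running total for this customer.
            totals.merge(order.customerId, order.amount, BigDecimal::add);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<OrderRecord> orders = List.of(
                new OrderRecord(1L, new BigDecimal("10.00")),
                new OrderRecord(2L, new BigDecimal("5.50")),
                new OrderRecord(1L, new BigDecimal("2.25")));
        System.out.println(totalsByCustomer(orders));
        System.out.println(SQL_EQUIVALENT);
    }
}
```

With the relational form, the database returns one row per customer. With the object form, the number of Java objects created grows with the number of orders, even though only the totals are wanted.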

Note

O/R mapping solutions are often a good choice in OLTP (On-Line Transaction Processing) systems, in which users typically perform operations on a small dataset, and which are often based on simple queries. However, they are seldom a good choice where there are OLAP (On-Line Analytic Processing) or data warehousing requirements. OLAP involves the manipulation of very large data sets and the execution of complex queries. These are best handled using relational operations.
