Friday, October 24, 2008

Where Do You Generate Primary Keys?

Every once in a while I like to throw out the preconceptions and biases I have amassed over the years and start with a fresh perspective. After all, how do you know whether what you take for granted is hard-earned knowledge or simply a case of "That's the way I've always done it"?

A colleague and I have long debated the best approach to assigning primary keys on domain objects. He takes the stance that primary keys should be assigned prior to persistence (at instantiation), and, mostly to be contrary, I take the opposite stance: let the database assign the primary key, especially if it supports an IDENTITY or AUTO_INCREMENT column type. To be truthful, I can find little fault in his approach. In fact, assigning a UUID as a primary key helps:

  1. When you are adding transient entities to a collection prior to persistence.
  2. To simplify equals and hashCode implementations, which can be quite a problem at times.
  3. By keeping keys unique across all rows in the database, which makes it much easier for the DBA to merge tables as schemas evolve.
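
To make the first two points concrete, here is a minimal sketch of an entity that assigns its own UUID at instantiation. It is not my colleague's actual code; the class name UuidKeyedUser is made up for illustration, and no persistence framework is involved.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidKeyedUser {
    // Assigned at instantiation, so it is never null, even before persistence.
    private final UUID id = UUID.randomUUID();
    private final String name;
    private final String email;

    public UuidKeyedUser(String name, String email) {
        this.name = name;
        this.email = email;
    }

    public UUID getId() { return id; }
    public String getName() { return name; }
    public String getEmail() { return email; }

    // Identity is based only on the key, which never changes.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UuidKeyedUser)) return false;
        return id.equals(((UuidKeyedUser) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    public static void main(String[] args) {
        // Transient (unsaved) entities can go straight into a hash-based collection.
        Set<UuidKeyedUser> users = new HashSet<>();
        UuidKeyedUser fred = new UuidKeyedUser("fred flintstone", "fred@bedrock.com");
        UuidKeyedUser barney = new UuidKeyedUser("barney rubble", "barney@bedrock.com");
        users.add(fred);
        users.add(barney);
        users.add(fred); // same key, so the set ignores it
        System.out.println(users.size()); // prints 2
    }
}
```

Because the key exists from the moment the object does, equals and hashCode never have to cope with a null id, and an entity's hash does not change when it is later saved.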

Perhaps I rely on the database mostly out of paranoia. His approach usually results in CHAR primary keys, which are generally less performant than INT/LONG keys because they are larger to store, index, and compare. It is possible to reduce a UUID to an INT/LONG if you are willing to accept a higher chance of a primary key collision.
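
One way to do that reduction, as a rough sketch: fold the two 64-bit halves of the UUID together with XOR. This is just one possible scheme, not a recommendation, and the class and method names (UuidToLong, toLong) are invented for the example.

```java
import java.util.UUID;

public class UuidToLong {
    // Folds the two 64-bit halves of a UUID into a single long using XOR.
    // Half of the UUID's 128 bits are discarded, so collisions become more likely.
    public static long toLong(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println(uuid + " -> " + toLong(uuid));
    }
}
```

The cost is easy to demonstrate: new UUID(1L, 1L) and new UUID(2L, 2L) are different UUIDs, yet both fold to 0. Squeezing the value further into a 32-bit INT would make collisions more likely still.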

The Hibernate guys, Gavin King and Christian Bauer, have advocated the use of database-driven primary keys, and those guys seem to have a lot of practical experience and study in the field. Because of this, I decided to use an AUTO_INCREMENT INT field for my primary keys in MySQL.

Test Driven Development

On my newest project, I threw out my personal biases and began with some unit tests. A typical unit test I write looks like this:

@Test
public void testCreate() {
 User user =
     new User("fred flintstone", "fred@bedrock.com");
 final Integer pk = userDao.insert(user);
 Assert.assertNotNull(pk);
}

It's fairly common for me to want the primary key of my newly inserted object. This can cause an issue for those of us who choose to let the database assign the primary key. At the time the insert method is called, the user's id value is null.

No problem: we just push responsibility for getting the primary key to the DAO tier.

In the DAO

I'm using JPA in my DAO tier. The JPA implementation of my DAO's insert method looks like this:

// In UserDaoImpl
public void insert(final User user) {
 getEntityManager().persist(user);
}

// In javax.persistence.EntityManager
/**
 * Make an entity instance managed and persistent.
 */
public void persist(Object entity);

So here we can see that the persist() method, as mandated by the JPA specification, does not return a value; it is declared void. After this call returns, the id on our user object can still be null.

So what options do we have at this point? I mean, besides ditching JPA and going straight to JDBC, which doesn't suffer from a handicapped abstraction layer?

One approach might be to retrieve the recently inserted object based on some unique properties of the object. In my particular case this is plausible, since the email address has a unique constraint in my database. Ironically, however, if your object doesn't have any uniquely distinguishing set of properties, you might find yourself adding a dummy UUID to your class just to use as a unique identifier.

Re-fetch the Object

Even the benign-sounding solution of fetching the recently inserted object from the database is not so clear-cut. Even if there are unique properties to load the object by, the object may not have been inserted in the database yet. I'm using Hibernate as my ORM, but I believe this issue would apply to most approaches that use a session cache as Hibernate does.

Often during a transaction, dozens of objects can be persisted or updated. By default, Hibernate defers the flush of SQL until the transaction ends, or until some other event forces an earlier flush. What this means is that when calling getEntityManager().persist(user), no SQL is typically sent to the database immediately.

The ORM has to be very smart about when it actually writes to the database, because when I follow the insert up with a selectByEmail(user.getEmail()) call, the ORM has to know that it is now time to execute the insert SQL. After all, it's the only way the database is going to generate the primary key for the object.
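
The deferred-write behavior described above can be imitated with a toy model. To be clear, this is not how Hibernate is implemented; every class here (FakeDatabase, Session, User) is a hypothetical stand-in that only shows the ordering problem: persist queues the insert, and a query has to flush the queue before it can find the row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriteBehindSketch {
    static class User {
        Integer id;            // stays null until the fake database assigns it
        final String email;
        User(String email) { this.email = email; }
    }

    // Stand-in for a table with an auto-increment primary key.
    static class FakeDatabase {
        private final Map<String, User> rowsByEmail = new HashMap<>();
        private int nextId = 1;

        void insert(User user) {
            user.id = nextId++;
            rowsByEmail.put(user.email, user);
        }

        User selectByEmail(String email) {
            return rowsByEmail.get(email);
        }
    }

    // Stand-in for an ORM session that defers its writes.
    static class Session {
        private final FakeDatabase db;
        private final List<User> pendingInserts = new ArrayList<>();

        Session(FakeDatabase db) { this.db = db; }

        // Only queues the insert; nothing reaches the database yet.
        void persist(User user) { pendingInserts.add(user); }

        // Sends every queued insert to the database.
        void flush() {
            for (User user : pendingInserts) db.insert(user);
            pendingInserts.clear();
        }

        // A query must flush first, or it would miss the pending rows.
        User selectByEmail(String email) {
            flush();
            return db.selectByEmail(email);
        }
    }

    public static void main(String[] args) {
        Session session = new Session(new FakeDatabase());
        User fred = new User("fred@bedrock.com");
        session.persist(fred);
        System.out.println(fred.id);    // prints null: the insert is still queued
        User fetched = session.selectByEmail(fred.email);
        System.out.println(fetched.id); // prints 1: the query forced the flush
    }
}
```

In this toy the fetched object is the same instance that was persisted, which is also roughly what a session cache gives you; the point is only that the key does not exist until something forces the write.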

Where to Fetch

So now the question becomes: where does the fetch of the object belong? Do you fetch the object in the DAO in order to return the primary key, or do you force the client to fetch the object based on something other than the primary key?

// In UserDaoImpl
public Integer insert(final User user) {
 getEntityManager().persist(user);
 final User fetchUser =
         selectByEmail(user.getEmail());
 return fetchUser.getUserId();
}

// In Client
final User user =
 new User("fred flintstone", "fred@bedrock.com");
userDao.insert(user);
final User fetchUser =
 userDao.selectByEmail(user.getEmail());
Assert.assertNotNull(fetchUser);

Conclusion

In the end, the choice will obviously be yours. I'm interested to hear from others about your approach to this recurring problem. As for my colleague, his wisdom on this matter is proven by his lack of concern regarding any of these problems. He simply doesn't have to deal with them. I've chosen to follow his example.

1 comment:

  1. Typically, I leave primary key generation up to the database, regardless of whether I'm using a GUID/UUID or an IDENTITY column. As far as I'm concerned, the only time to generate your PK before getting to the database is when you're doing some heavy-duty distributed data storage and you're sending records to different servers based on some magic involving a checksum of the PK.
