Saturday, October 25, 2008

If a Test Case Falls in the Woods...

I've long been a fan of the Spring Framework and the base test classes they provide. A few years ago, a fellow consultant on a project introduced me to the concept of rollback testing.

Rollback testing implies that a transaction begins and ends with your test method. At the end of your test method, the transaction rolls back and nothing is persisted to the database. This certainly eliminates the problem of cleaning up state that is left hanging around at the end of an integration test. You simply have to do...nothing!

Spring also provides some base classes that you may extend to get this functionality with no additional work on your end. The AbstractTransactionalTestNGSpringContextTests and AbstractTransactionalJUnit4SpringContextTests classes provide convenient superclasses that your test classes can extend to benefit from this feature.

I'm a user of TestNG, so I used that flavor of Spring abstract test class as my starting point. From there, it is very simple to add a test that will validate that the create method works the way it is supposed to. I am using JPA with Hibernate as my JPA provider.

@ContextConfiguration(locations = {"classpath*:META-INF/spring.xml"})
public class TestDomain 
       extends AbstractTransactionalTestNGSpringContextTests {

    @Autowired(required = true)
    private DomainDao domainDao;

    @Test
    public void testCreate() {
        final Integer pk;
        {
            Domain domain = new Domain("md", "mccoy");
            pk = domainDao.insert(domain);
            Assert.assertNotNull(pk);
        }

        {
            final Domain domain = domainDao.select(pk);
            Assert.assertEquals(domain.getDomainId(), pk);
            Assert.assertEquals(domain.getDomainName(), "mccoy");
            Assert.assertEquals(domain.getTld(), "md");
            Assert.assertEquals(
                domain.getStatus(), Domain.Status.CANDIDATE);
        }
    }
}

You can imagine how pleased I was to see my integration test passed the first time I ran it. Damn, I'm good.

I've had the pleasure of working with a friend who has never met an ORM he trusts, and he has taught me many times to look at the log files and examine the SQL that is being executed. He does this to understand better how the ORM is approaching a problem. When I looked at the logs, I was surprised to see a lot of logging, but no SQL statements. I checked to make sure I had told Hibernate to log SQL and I had. In fact, I saw SQL statements in some other places, but none associated with this particular unit test.

Timber!

So why was no SQL executed against my database? My DAO classes define a transactional boundary, so any call to them (like insert and select in my example) will open a transaction and close it when the call completes. Closing a transaction signals Hibernate to flush its pending SQL statements and commit.

The reason I see no SQL is that the transaction begins and ends with the test case, not with my DAO. The transactional context surrounding my DAO calls recognizes that a transaction is already in progress and simply participates in it. When using Spring's base test classes, that transaction rolls back at the end of the test case. This means that unless something forced Hibernate to flush, no SQL is ever written to the database.
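Here is a hedged sketch of the kind of DAO boundary I'm describing, using Spring's annotation-driven transactions. The class and method names mirror my example, but this is illustrative, not my actual code:

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// With REQUIRED propagation (the default), these methods join the
// transaction the test framework opened instead of creating their own,
// so the commit that would normally trigger a flush never happens.
@Repository
public class DomainDaoImpl implements DomainDao {

    @PersistenceContext
    private EntityManager entityManager;

    // Propagation.REQUIRED is the default; shown here for emphasis.
    @Transactional(propagation = Propagation.REQUIRED)
    public Integer insert(Domain domain) {
        entityManager.persist(domain);
        return domain.getDomainId();
    }

    @Transactional(readOnly = true)
    public Domain select(Integer pk) {
        return entityManager.find(Domain.class, pk);
    }
}
```

Called standalone, each method would get its own transaction and commit; called from a rollback test, they quietly participate in the test's transaction instead.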

Is This A Problem?

Some people might argue about whether this is a problem. After all, you are testing your integration code, not Hibernate's ability to persist your objects. In the past, I have bought that argument. However, by closely watching my log files, it is easy to see that some of my code can pass muster even though it is woefully wrong.

This rule sounds a bit obvious:

When you change the typical transactional boundaries of your application to surround the test case rather than the service or DAO tier, you may no longer be testing the database's impact on your application.

Even though the rule is obvious, the impact may be hard to spot while you are writing code. For example, the testCreate() method above will pass even if I annotate my domain class with a non-existent database table. I can specify @Table(name = "he_hate_me") and my test will pass. I can also supply the wrong schema to this annotation, or in my persistence.xml config, and the test will still pass.
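To see just how wrong the mapping can be while the test stays green, consider this sketch. The bogus table name comes from the example above; the rest of the mapping is illustrative:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// This entity maps to a table that does not exist. In a rollback test
// without an explicit flush(), the INSERT is never sent to the
// database, so testCreate() passes anyway.
@Entity
@Table(name = "he_hate_me")
public class Domain {

    @Id
    @GeneratedValue
    private Integer domainId;

    private String tld;
    private String domainName;

    // constructors, getters, and the Status enum omitted for brevity
}
```

Only when something forces a flush does Hibernate discover that "he_hate_me" isn't a real table.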

As you might expect, if you have a table with a unique index (other than the primary key) and a unit test that performs multiple inserts attempting to trip the constraint violation, that test will also pass.

Workarounds

For the reasons I have shown, I would consider blind use of rollback testing to be a bad thing. Most JPA providers behave the same way as Hibernate, flushing statements only when the provider deems it necessary.

Of course, Hibernate can be explicitly told to flush, and there isn't a real downside to doing this in an integration test. But where and when should you flush? I find that a call to flush is warranted any time you make a call to a service method or DAO where a transactional context would normally be started if one wasn't already in progress.

Here is the previous example modified to show the call to flush().

@ContextConfiguration(locations = {"classpath*:META-INF/spring.xml"})
public class TestDomain 
       extends AbstractTransactionalTestNGSpringContextTests {

    @Autowired(required = true)
    private DomainDao domainDao;

    @Test
    public void testCreate() {
        final Integer pk;
        {
            Domain domain = new Domain("md", "mccoy");
            pk = domainDao.insert(domain);
            flush();
            Assert.assertNotNull(pk);
        }

        {
            final Domain domain = domainDao.select(pk);
            flush();
            Assert.assertEquals(domain.getDomainId(), pk);
            Assert.assertEquals(domain.getDomainName(), "mccoy");
            Assert.assertEquals(domain.getTld(), "md");
            Assert.assertEquals(
                domain.getStatus(), Domain.Status.CANDIDATE);
        }
    }
}

Flush Your Problems Away

Both Hibernate's Session and JPA's EntityManager interfaces have a flush() method. It can be a bit tricky to obtain the EntityManager in order to call flush() on it, so I use a base class in my integration tests that includes a flush() method. The implementation looks like this:

@ContextConfiguration(locations = {"classpath*:META-INF/spring.xml"})
public class BaseTest 
       extends AbstractTransactionalTestNGSpringContextTests {

    protected EntityManager sharedEntityManager;

    protected void flush() {
        sharedEntityManager.flush();
    }

    @Autowired(required = true)
    public void setEntityManagerFactory(EntityManagerFactory emf) {
        sharedEntityManager = SharedEntityManagerCreator.
            createSharedEntityManager(emf);
    }
}

Some knowledgeable Spring users will probably point out that there is already an AbstractJpaTests class that exposes a sharedEntityManager. They would be right, and it also takes care of the load-time weaving cases that can make integration testing of JPA difficult. Unfortunately for me, the good Spring folks tied this abstract class to JUnit (there is no TestNG version), and I'm using Hibernate, which doesn't require load-time weaving.

Friday, October 24, 2008

Where Do You Generate Primary Keys?

Every once in a while I like to throw out the preconceptions and biases I have amassed over the years and start with a fresh perspective. After all, how do you know whether what you take for granted is hard-earned knowledge or simply a case of "That's the way I've always done it"?

A colleague and I have long debated the best approach to assigning primary keys on domain objects. He takes the stance that primary keys should be assigned prior to persistence (at instantiation), and mostly to be contrary, I take the opposite stance. Let the database assign the primary key, especially if it supports an IDENTITY or AUTOINCREMENT column type. To be truthful, I can find little fault in his approach. In fact, assigning a UUID as a primary key helps:

  1. When you are adding transient entities to a collection prior to persistence.
  2. To simplify equals and hash code implementations which can be quite a problem at times.
  3. Keys are unique across all rows in the database, which makes it much easier for the DBA to merge tables as schemas evolve.
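Point 2 in the list above deserves a quick illustration. With a client-assigned UUID, equals and hashCode can be based on the identifier alone, so they behave the same before and after persistence. A minimal sketch in plain Java (the Customer class is hypothetical, not from any project of mine):

```java
import java.util.UUID;

// Illustrative entity: identity is fixed at construction, so
// equals/hashCode are stable before and after the object is
// persisted, and transient instances behave well in Sets and Maps.
class Customer {
    private final UUID id = UUID.randomUUID();
    private String name;

    Customer(String name) {
        this.name = name;
    }

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        return id.equals(((Customer) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

Contrast this with database-assigned keys, where two unsaved instances both have a null id and equals/hashCode must fall back on mutable business fields.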

Perhaps I rely on the database mostly out of paranoia. His approach usually results in CHAR primary keys, which have been shown to perform worse than INT/LONG keys. It is possible to reduce a UUID to an INT/LONG if you are willing to slightly increase your chances of a primary key collision.
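A hedged sketch of that reduction (the class name is mine, not from any library): fold the UUID's two 64-bit halves into one long. Collisions go from effectively impossible to merely very unlikely, becoming plausible only around a few billion keys.

```java
import java.util.UUID;

// Folds a 128-bit UUID into a 64-bit long by XOR-ing its halves.
// Unlike the full UUID, collisions are now possible (roughly the
// birthday bound of 2^32 keys), but the key fits a BIGINT column.
final class KeyReducer {
    static long toLong(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }
}
```

The result is deterministic for a given UUID, so the reduced key can be computed at instantiation time just like the UUID itself.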

The Hibernate guys, Gavin King and Christian Bauer, have advocated the use of database-driven primary keys, and those guys have a lot of practical experience and study in the field. Because of this, I decided to use an AUTOINCREMENT INT field for my primary keys in MySQL.
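In JPA mapping terms, that decision looks something like this sketch (entity and field names follow the User example that appears later; the mapping itself is illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// IDENTITY delegates key generation to the database: MySQL fills in
// the AUTO_INCREMENT column at INSERT time, so userId remains null
// until the row is actually written.
@Entity
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer userId;

    private String name;
    private String email;

    public Integer getUserId() {
        return userId;
    }
}
```

The catch, as the rest of this post shows, is that "at INSERT time" and "when persist() is called" are not the same moment.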

Test Driven Development

On my newest project, I threw out my personal biases and began with some unit tests. A typical unit test I write looks like this:

@Test
public void testCreate() {
    User user =
        new User("fred flintstone", "fred@bedrock.com");
    final Integer pk = userDao.insert(user);
    Assert.assertNotNull(pk);
}

It's fairly common for me to want the primary key of my newly inserted object. This can cause an issue for those of us who choose to let the database assign the primary key. At the time the insert method is called, the user's id value is null.

No problem, we just push responsibility for getting the primary key to the DAO tier.

In the DAO

I'm using JPA in my DAO tier. The JPA implementation of my DAO's insert method looks like this:

public void insert(final User user) {
    getEntityManager().persist(user);
}

Compare that with the signature of persist() on the EntityManager interface:

/**
 * Make an entity instance managed and persistent.
 */
public void persist(Object entity);

So here we can see that the persist() method, as mandated by the JPA specification, does not return a value; it is declared void. After issuing this call, our user object's primary key still has a value of null.

So what options do we have at this point? I mean, besides ditching JPA and going straight to JDBC which doesn't suffer from a handicapped abstraction layer?

One approach might be to make a call to retrieve the recently inserted object based on some unique properties of the object. In my particular case this is plausible, since the email address is a unique constraint in my database. Ironically however, if your object doesn't have any uniquely distinguishing set of properties, you might find yourself adding a dummy UUID to your class just to use as a unique identifier.

Re-fetch the Object

Even the benign-sounding solution of fetching the recently inserted object from the database is not so clear-cut. Although the object may have unique properties to load it by, it may not even be in the database yet. I'm using Hibernate as my ORM, but I believe this issue applies to most ORMs that use a session cache as Hibernate does.

Often during a transaction, dozens of objects can be persisted or updated. By default, Hibernate defers the flush of SQL until the transaction ends, or until some other event forces a premature flush. What this means is that when calling getEntityManager().persist(user), no SQL is typically sent to the database immediately.

The ORM has to be very smart about when it actually writes to the database, because when I follow the insert with a call like selectByEmail(user.getEmail()), the ORM has to know that it is now time to execute the INSERT. After all, that's the only way the database is going to generate the primary key for the object.

Where to Fetch

So now the question becomes: where does the fetch of the object belong? Do you fetch the object in the DAO in order to return the primary key, or do you force the client to fetch the object based on something other than the primary key?

// In UserDaoImpl
public Integer insert(final User user) {
    getEntityManager().persist(user);
    final User fetchUser =
        selectByEmail(user.getEmail());
    return fetchUser.getUserId();
}

// In Client
final User user =
    new User("fred flintstone", "fred@bedrock.com");
userDao.insert(user);
final User fetchUser =
    userDao.selectByEmail(user.getEmail());
Assert.assertNotNull(fetchUser);

Conclusion

In the end, the choice will obviously be yours. I'm interested to hear from others about your approach to this recurring problem. As for my colleague, his wisdom on this matter is proven by his lack of concern regarding any of these problems. He simply doesn't have to deal with them. I've chosen to heed his example.

Friday, October 17, 2008

Who Needs Workflow?

Surprisingly enough, maybe you?

Workflow is one of those terms with which you have probably played buzzword bingo. If not the over-generalized category name, perhaps you have a bingo card covered in the acronyms that proliferate in this space, like the BP brothers -- BPM, BPEL, BPMN and BPDM. And, don't forget their webified cousins, WS-BPEL, WS-CDL and WSFL. Uh, uhh, BINGO!

For all the players in this space, it is surprising how few developers have ever laid hands on a workflow engine. Many companies shy away from these offerings because they are part of larger SOA initiatives that take a great amount of time and resources to roll out. There are also very few quality open source players in this area.

I've been meaning to dig in to see what makes a workflow engine tick, and a recent project afforded me the chance. It has a small workflow of twenty-odd steps, with a few manual stages, credit card processing, and email events. I could have written the process manually, but I wouldn't have learned anything new.

So now I can share with you some of my stumbles along the way and how workflow can function in a real world application. The first step is to choose your foil.

Keep in mind, I am not very knowledgeable about the players in the workflow arena; as I started this journey, I gained hands-on experience with only a few. Also note that I am developing personal software, so price was a factor. :)

The Players

There are a boatload of players in the workflow market, and many have come and gone over the years. When I look at some of the players in this field, I sometimes get the feeling that programmers created a workflow engine for their in-house project and then open sourced it on the world. For many of these engines, there are probably only one or two applications using them.

There are dozens of sites that contain a long list of commercial and open source workflow engines written in Java. Notably, OpenWFE was one of the more popular Java workflow engines and is now being developed as a Ruby application.

Open Source Workflow Engines Written in Java (www.manageability.org)

Commercial Workflow/BPM Vendors - Part I and Part II (www.waria.com)

While reviewing some of the different open source products, it dawned on me that the world of big standards belongs to the big commercial vendors, as evidenced by the following chart from Gartner. That's fine with me, because my needs are much simpler.

Gartner Magic Quadrant - BPMS

If anyone knows, I would be interested to hear whether any of these products are free or open source, and whether they are embeddable. Most that I inspected were commercial and either standalone or integrated into an SOA or J2EE application server.

My Needs

I am modeling an order workflow, but I will probably create additional processes once I have handled my first priority. I need a workflow engine that I can embed in my application. I do not wish to run the workflow engine as a standalone server whose sole purpose in life is to process workflow lifecycles. Those products have their place when enterprise scale demands them, but my requirements are much more modest. It may still be important for the engine I pick to have this capability, or at least be able to cluster, if scale becomes important.

I do want to be able to kick off tasks (execution processes in workflow-speak) and have the workflow engine shepherd the task along its path. At certain points along the workflow process I need the engine to delegate control to custom tasks that I have written in Java or Groovy. These tasks are very short and focus on a small part of the overall process like authorizing a credit card or checking the availability of a product.

I know there will be situations when the workflow has to pause for human intervention, as might happen if a credit card was authorized successfully but failed the capture stage. Or perhaps a human will manually review orders over a certain dollar amount to ensure the validity of the order. When these cases occur, the workflow is essentially paused while a user reacts to each situation on a case-by-case basis. This means I will need to somehow track when these issues arise and bring them to the attention of the appropriate user. Once the employee handles the issue, control returns to the workflow engine, which proceeds based on the outcome of the manual intervention.

I also want an audit log that helps me visualize the path taken by the customer's order. This is helpful when dealing with customer service problems. I usually use the Spring Framework, and although I don't need Spring to wrap the engine for me, it would be nice if my Java extensions to the workflow process could be configured within Spring.

My needs are simple. I don't need a graphical tool for manipulating the workflow, but neither will I turn one down if it is offered. I don't need identity support which allows the workflow engine to route tasks to users based on their roles in the system. I have one employee and that person is me.

Also note that I don't need any of the acronyms mentioned at the beginning of this post. I need a workflow engine, and I don't particularly care if it supports BPEL, BPMN or OSSAD. I don't have anything against standards, but they are still in a state of flux after nearly 10 years of formulation. Most workflow engines have a low-level model similar to a finite state machine, and the business language du jour is just a layer above this. Many platforms designed this way can support a variety of the business process modeling languages existing today. BPEL seems to be the de facto standard in use among most of the market leaders right now.

The Search

Like most Java developers, I start my search for a library by searching the web for current articles and who the players are in the open source area. Unlike technologies like ORM where one or two libraries stand out from the rest, the open source Java workflow market is huge, which makes research difficult. Contrast this with the .NET market where there is one player. I wonder who?

The closest thing I found to an objective comparison of workflow engines' capabilities was a report on the Workflow Patterns website. The report confirmed that no one dominates this space, but it at least highlighted three decent players: JBoss jBPM, OpenWFE, and Enhydra Shark. I had not heard of Shark, but I have great respect for Enhydra from my very early days in Java, when they competed with JBoss in the application server space.

A recent article on TheServerSide featured a new finite state machine from the Wazee Group called Physhun. It is configured entirely with Spring, so it seemed like a potential fit.

Enhydra Shark had no documentation available on its website. The latest news said that version 2 was recently released, but the logs show that happened in May 2007. As mentioned earlier, OpenWFE is now OpenWFEru, where the "ru" stands for Ruby. I didn't look at it, but I would bet that its process definition language benefits from Ruby's DSL-friendly syntax. Physhun is very nice and only sparsely documented, but it includes several good examples that are easy to follow. I find Physhun works really well for what it is intended to be: a finite state machine. It doesn't have the layers to handle tasks requiring human intervention, though it seems like a decent base to build those capabilities on if you were so inclined. In the end, an FSM is not the same thing as a workflow engine.

JBoss jBPM

Like most things developed by JBoss, their workflow engine is meant to complement their application server, but it can be used independently as well. jBPM version 3 has been out for three years and, based on the forum posts, seems to have a decent user base. Support by the actual developers has waned over the past two years for whatever reason, so this is a bit of a concern. Ronald van Kuijk is a member of the community and seems to be a one-man army when it comes to helping people with jBPM. jBPM 4 is being developed, so perhaps the product will continue to improve; maybe that's where the core developers are spending their time. jBPM has some relationship to the commercial JBoss product offering from Red Hat, so I would assume it is an important component of their product suite and is not going away anytime soon.

To help with my needs, jBPM provides a single jar that is easily embeddable in my application. Although jBPM supports the BPEL standard, my needs are sufficiently handled by jPDL, an XML-based, graph-oriented programming model. (That's JBoss-speak for an XML configuration.)
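To give a flavor of how custom tasks plug into jPDL, here is a sketch of a jBPM 3 action handler. The class name, process variables, and the authorize() call are all illustrative assumptions, not code from my project:

```java
import org.jbpm.graph.def.ActionHandler;
import org.jbpm.graph.exe.ExecutionContext;

// Hypothetical jBPM 3 action: jPDL invokes execute() when the process
// reaches the node this action is attached to.
public class AuthorizeCreditCardAction implements ActionHandler {

    public void execute(ExecutionContext context) throws Exception {
        String orderId = (String) context.getVariable("orderId");

        // Call out to an (illustrative) payment gateway wrapper.
        boolean authorized = authorize(orderId);

        // Record the outcome as a process variable so a later decision
        // node can route the order accordingly.
        context.setVariable("ccAuthorized", Boolean.valueOf(authorized));
    }

    private boolean authorize(String orderId) {
        // payment gateway integration omitted
        return true;
    }
}
```

A jPDL node would reference this class by name in its action element, which is also where the Spring Modules integration mentioned below comes into play.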

You may read somewhere that jBPM does not require a database; in practice, to do anything useful, such as supporting tasks or recovering state across server restarts, a database is required. Somewhat unfortunately, it uses Hibernate to persist its internal model. Not a deal breaker, but those who weren't already using Hibernate will be now, and those already using it may have to take steps to get two different Hibernate session factories to play nicely in the same application. This could involve JTA transaction management if you cannot share session factories. It would be nice if persistence could be extracted from the engine in the future, with a Hibernate implementation as the default.

On the plus side, jBPM is a workflow engine supported by the Spring Modules project, which provides an object factory integration so that jBPM actions can be Spring beans. It also makes creating a Hibernate SessionFactory and sharing it with jBPM substantially easier.

What's Next

In the next installment, we will get into the code and some more concepts behind the workflow engine.