Friday, February 18, 2011

EJB transactions: going deeper

Disclaimer

While I still endorse the majority of what is said in this article, please consider that it is several years old (and written against the EJB 3.0 specification), and thus written by a several years younger me with several years less working experience. This article is going to be replaced sooner rather than later by an up-to-date Java EE 7 version, built around a slightly more realistic code example. Until then, I've cleaned up a few falsehoods in this older version which were either partly wrong because of additional tooling that I neglected to mention, or were basically my personal opinion rather than fact.


Introduction

EJB technology is received with mixed feelings. Some people love it (like me), other people can't seem to find the benefit of it. Yet others are frustrated because they can't seem to get the hang of it.

The trap with EJBs, especially since version 3.0 of the specification, is that you are given a false sense of security. Read any article written when EJB 3.0 was just released. Pick up any book. You'll get the same message: EJB 3.X is sooooooo easy! Really, you basically don't have to do anything, just slap some annotations on there and you're good to go!

In other words: you are meant to believe that the technology will do the work for you.

Of course you know better and you'll just instantly dismiss such claims as you rightfully should. The technology most certainly does not do the work for you. But, if you know how to use it, it CAN help you to make your job a whole lot easier. But you are still the captain of the ship and you cannot and should not let go of that wheel until you're safe in the harbor.

In this article I would like to go a little beyond the average article you can find on the net and instead incorporate a few clues you'll otherwise only find in obscure forums after you have gotten yet another vague error message from your application server. I want to talk about EJB transaction management, the core of EJB technology: not only about how it works, but also about how you really apply it.

Note that in this article I'll deal only with container managed transactions. For completeness I'll discuss the material from start to finish, but this is not a tutorial on how to write EJBs or how to use JPA. I expect you to know at least the foundations of both.


What is a managed transaction?

Managed transactions. First off, let's get something out of the way: an EJB (or to be more precise: container) managed transaction is not a database transaction. Part of an EJB transaction might be a database transaction (or multiple database transactions!), but it goes far beyond the datasource: an EJB transaction models, or is supposed to model, an actual transaction that can take place in the real world. After all, when you build an enterprise system, you are trying to solve a real world problem.

Take a money transaction. That is not only changing some numbers around in one or more databases. There is also administration, notification, confirmation and validation going on.

What if the money transaction fails? Then we enter a failure path, which will among other things include restoring the system to its original unbroken state and more notification (a call to systems management and the client, for example). The person overseeing the transfer may already be on the phone to confirm the transfer to a client; that call will be broken off so the failure path can be entered.

Many steps that can either succeed or fail. Sometimes failure is acceptable, other times the transaction step needs to be delayed, but most of the time when something breaks you'll want to undo the damage already done.

Of course we deal in software here and our software isn't going to call anybody. But it does have the capacity to notify through the all-powerful JMS. Sending and processing messages can be part of a managed transaction.


Commit and rollback

Back to database terms. When all steps in a transaction succeed (transfer, confirm, notify, etc.), you'll want to commit it. After committing the transaction, it is permanent and you can't easily undo it anymore. Generally upon commit the transaction is over, and will be cleaned up.

When things go sour, you will want to rollback the transaction. This means that any mutation that was part of the active transaction will be undone. The system must go back to its virgin state as if nothing went wrong. Of course something did go wrong, but your failure handling routines should be able to cope with that.

Since the transactions are managed by the container, in general you also want the container to manage when a transaction is committed and rolled back. The rule is quite easy.

- when your code succeeds, commit
- when your code throws an exception (that bubbles out of the EJB method), roll back

But of course we don't want the server to have full control. You can control whether a rollback is automatic or not; it only happens when the exception is either:

- a runtime exception that is not annotated as an application exception (a so-called system exception)
- an EJBException (which is itself a runtime exception)
- any other exception annotated with @ApplicationException(rollback=true)

Checked exceptions without that annotation do not trigger a rollback. With the annotation you can fully control whether an automatic rollback should happen on your own exceptions. For example, you can declare a runtime exception that does not roll back the transaction:


@ApplicationException(rollback=false)
public class MyException extends RuntimeException {

  ...
}
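To make these rules concrete, here is a small plain-Java model of the decision the container makes when an exception escapes a business method. It is a sketch, not container code: `AppException` is a hypothetical stand-in for `javax.ejb.ApplicationException` (so the example compiles without the EJB API on the classpath), and `RollbackRules.shouldRollback()` is an illustrative helper, not a real API. Corner cases such as java.rmi.RemoteException are left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for javax.ejb.ApplicationException.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AppException {
    boolean rollback() default false;
}

// A runtime exception that should NOT roll back, like MyException above.
@AppException(rollback = false)
class MyException extends RuntimeException {
    MyException(String message) { super(message); }
}

// A checked exception that SHOULD roll back.
@AppException(rollback = true)
class PaymentRejectedException extends Exception {
    PaymentRejectedException(String message) { super(message); }
}

// A plain checked exception without any annotation.
class PlainCheckedException extends Exception {
    PlainCheckedException(String message) { super(message); }
}

class RollbackRules {

    // Mimics the container's decision when an exception escapes a business method.
    static boolean shouldRollback(Throwable t) {
        AppException marker = t.getClass().getAnnotation(AppException.class);
        if (marker != null) {
            // Annotated exceptions are application exceptions: the attribute decides.
            return marker.rollback();
        }
        // Unannotated runtime exceptions are system exceptions and roll back;
        // unannotated checked exceptions are application exceptions and do not.
        return t instanceof RuntimeException;
    }

    public static void main(String[] args) {
        System.out.println(shouldRollback(new IllegalStateException("boom"))); // true
        System.out.println(shouldRollback(new MyException("tolerated")));     // false
        System.out.println(shouldRollback(new PaymentRejectedException("no"))); // true
        System.out.println(shouldRollback(new PlainCheckedException("meh")));  // false
    }
}
```

The annotation lookup only checks the exception's own class, which matches the EJB 3.0 behavior of the annotation not being inherited by subclasses.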



The first EJB


Let's begin with a little practical knowledge now to let the theory sink in. It is all well and good that the container can manage transactions for us, but how and when does it happen?


@Local
public interface MyFirstEjb {

  public void helloWorld();
}

@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  public void helloWorld(){
    
    System.out.println("Hello world!");
  }
}



Here we have a stateless EJB with a single business method. Note that this is my specific naming convention for EJBs, it may not match your own. Do what you feel is best.

So where is the transaction? Right here:


public void helloWorld(){ // transaction starts here
    
    System.out.println("Hello world!");
  } // transaction ends here


It's as simple as that. It is, after all, a container managed transaction; you don't have to do anything to create one. It is "just there". Later on we'll see how you can exert influence here.
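Conceptually, the container wraps every call that comes in through the business interface. The sketch below imitates that with a plain java.lang.reflect.Proxy. `ToyContainer.wrap()`, `HelloBean` and `HelloBeanImpl` are hypothetical names, and "begin", "commit" and "rollback" are just log entries standing in for a real JTA transaction; it only illustrates where the boundaries fall, not how any real server is built.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface HelloBean {
    void helloWorld();
    void explode();
}

class HelloBeanImpl implements HelloBean {
    public void helloWorld() { System.out.println("Hello world!"); }
    public void explode() { throw new IllegalStateException("Blowing up!"); }
}

class ToyContainer {
    // Log of transaction events, standing in for a real transaction manager.
    static final List<String> log = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("begin " + method.getName());          // transaction starts here
            try {
                Object result = method.invoke(target, args);
                log.add("commit " + method.getName());     // method succeeded
                return result;
            } catch (InvocationTargetException e) {
                log.add("rollback " + method.getName());   // method threw
                throw e.getCause();                        // rethrow the bean's exception
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        HelloBean bean = wrap(HelloBean.class, new HelloBeanImpl());
        bean.helloWorld();
        try {
            bean.explode();
        } catch (IllegalStateException expected) {
            // the exception reaches the caller after the rollback
        }
        System.out.println(log);
        // [begin helloWorld, commit helloWorld, begin explode, rollback explode]
    }
}
```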

Adding a persistence unit

Of course most of the time you'll be doing transactional work that involves datastore mutations. Part of the EJB 3 specification is JPA, which has a mode in which JPA transactions are also managed by the container. All you have to do is declare your persistence unit like this in META-INF/persistence.xml:


<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
    http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0">

  <persistence-unit name="your-pu" transaction-type="JTA">
    <!-- JNDI name of the (JTA enabled) datasource -->
    <jta-data-source>java:/YourDS</jta-data-source>
    <!-- JPA provider configuration here -->
  </persistence-unit>
</persistence>



Transaction type "JTA" means "Java Transaction API", an API that standardizes the way transactions are managed. The nice thing about JTA is that transactions can span different technologies (JPA and JMS, for example), because they all take part in the same container managed, JTA based transaction.

Now that you have your JTA persistence unit set up, you can gain access to it through a simple annotation:


public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  ...
}


Voila, a container managed entity manager. This means that when you enter a business method in this EJB that has a managed transaction, you are guaranteed that the entity manager also takes part in that transaction, and that the transaction will be either committed or rolled back depending on whether your business method succeeds.


public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  public void saveCustomer(String name, String address, String city){

    Customer customer = new Customer(name, address, city);
    em.persist(customer);

    // nah, I don't care what you think. All our customers are named bill
    customer.setName("bill");
  }
}


A not too realistic example that requires little imagination. Customer is a JPA entity and it has some basic customer properties.

I chose this example to see JPA transaction management at work. First the customer is persisted. From that point on the customer is managed by JPA, so according to the rules of JPA any change we make to the entity is automatically written to the database when the active transaction is flushed or committed.

Thus we can change the name without another call to persist() or merge(). The container commits the transaction for us, and thus the name change is committed automatically. In fact the JPA provider is even smart enough to flush changes to the database halfway through the EJB call when that is necessary, for example before running a query whose results could be affected by them. Neat huh?
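Providers implement this automatic change detection in different ways. One common approach, which the toy below imitates, is to compare each managed entity against a snapshot of its state. `ToyPersistenceContext` and this simplified `Customer` are hypothetical classes written only for illustration, and the exact SQL a real provider issues may differ (some insert at persist() time and update afterwards).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified entity (not the article's JPA Customer).
class Customer {
    private String name;
    private final String city;

    Customer(String name, String city) { this.name = name; this.city = city; }

    void setName(String name) { this.name = name; }

    // The state the toy context compares and "writes".
    String state() { return name + "," + city; }
}

class ToyPersistenceContext {
    // Managed entity -> state last written to the database (null = not inserted yet).
    // Customer does not override equals(), so keys are compared by identity.
    private final Map<Customer, String> managed = new LinkedHashMap<>();
    final List<String> statements = new ArrayList<>();

    // Makes a new entity managed; nothing is written yet.
    void persist(Customer c) { managed.put(c, null); }

    // Simulates loading an existing row: the entity is managed and a snapshot is taken.
    Customer load(String name, String city) {
        Customer c = new Customer(name, city);
        managed.put(c, c.state());
        return c;
    }

    // At commit (or flush), write new entities and entities whose state changed.
    void commit() {
        for (Map.Entry<Customer, String> entry : managed.entrySet()) {
            String current = entry.getKey().state();
            if (entry.getValue() == null) {
                statements.add("INSERT " + current);
            } else if (!current.equals(entry.getValue())) {
                statements.add("UPDATE " + current);
            }
            entry.setValue(current); // the written state becomes the new snapshot
        }
    }

    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        Customer steve = new Customer("steve", "Springfield");
        ctx.persist(steve);
        steve.setName("bill");          // no second persist() needed
        Customer joe = ctx.load("joe", "Shelbyville");
        ctx.load("ann", "Ogdenville");  // loaded but never changed: nothing written
        joe.setName("bill");
        ctx.commit();
        System.out.println(ctx.statements); // [INSERT bill,Springfield, UPDATE bill,Shelbyville]
    }
}
```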

Similarly, when you fetch entities through JPA these will also become managed entities for the duration of our transaction.


public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }

  public void saveEmployee(String name, String address, String city, String bossname){

    Employee bill = new Employee(name, address, city);
    em.persist(bill);

    Employer boss = findBoss(bossname);
    bill.setEmployer(boss);
  }



As you should know, relational mappings in JPA only work reliably if you put managed entities in there. Because you are doing this in an EJB call, the boss instance will be managed. This also works for the findBoss() method, because it is called within the scope of the saveEmployee() call and thus simply shares the transaction of saveEmployee(). How and why will be explained later; there is also a trap here that I will make you aware of once we've covered a little more ground.


Transaction attributes

So far you've seen the default behavior of the EJB 3 specification.

- the EJB has container managed transactions
- each business method has an active transaction

The fun thing about transactions is that they can span multiple EJBs. Or not! That's basically up to you. For the next part, let's define unrealistic example number three.


@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb other;

  public void firstEjb(){

    other.secondEjb();
  }
}


@Stateless
public class MySecondEjbBean implements MySecondEjb {

  public void secondEjb(){

  }
}


I chose these names so that it stays clear what is being called.

Alright then. As said each business method, unless specified otherwise, has a managed transaction. This happens because by default business methods are assigned the transaction attribute REQUIRED. You could also enforce it yourself, like this:


@TransactionAttribute(TransactionAttributeType.REQUIRED)
  public void firstEjb(){
     ...
  }


You don't have to be paranoid however. The default values are declared in the EJB specification itself; every server that is EJB 3.X compliant must follow the same rules. So you can safely leave out the TransactionAttribute annotation in most cases.

Attribute REQUIRED basically means "you must have a transaction, so if there isn't one already, create one". This is important when we follow the code. Let's say we call MyFirstEjb.firstEjb(). A transaction is created for us. In this business method we make a call to MySecondEjb.secondEjb(). Because we do this through the business interface (other) that we inject into MyFirstEjb, the secondEjb() call is a container managed invocation. In other words: an EJB invocation, not a local method invocation.

This is an important distinction, because it determines what will happen to our transaction. We are now in secondEjb() and the container has some work to do. Will a new transaction be created for secondEjb()?

The answer is: no. Because REQUIRED says "create a new transaction if none exists already". But one does exist already, the transaction created in firstEjb(). secondEjb() will now adopt the transaction of firstEjb()!

What does that tell us? Let's revisit an earlier example.


public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb other;

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  public void saveEmployee(String name, String address, String city, String bossname){

    Employee bill = new Employee(name, address, city);
    em.persist(bill);

    Employer boss = other.findBoss(bossname);
    bill.setEmployer(boss);
  }

}

@Stateless
public class MySecondEjbBean implements MySecondEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }
}



The employer/employee example, adapted to be split among two different EJBs. The code still works: the boss is fetched through JPA in MySecondEjb and returned to MyFirstEjb, and because both EJB methods share the same transaction, the entity returned to saveEmployee() is still managed at that point.

At this point you might shrug and think "is that so special?". It's easy to neglect all the magic that is being done for you, isn't it? All you are doing is invoking some bean methods and passing around some objects. But in the background the container keeps an eye on things to make sure your transactional state is in order; it can even propagate the transaction across JVMs in the case of a remote EJB call.

And not a single line of actual database transaction related code!

Transaction attributes: going deeper

REQUIRED is not the only choice you have. Let's go through them all.


NOT_SUPPORTED
This means that the EJB call will have no transaction at all.

In the case of secondEjb() in the example above, this means the transaction created in firstEjb() is suspended, and reactivated as soon as secondEjb() finishes its work.

NOT_SUPPORTED may seem like baggage at first, but it serves a few purposes.

- documentation. The annotation instantly tells you that the method does nothing transactional. Or should do nothing transactional, an important message to other people that may need to touch the code.
- resources. There is always a cost in managing a transaction, so if the container doesn't have to, give it a break.
- decreased whoops factor. Let's face it, you are going to make mistakes. By being precise with the transaction attributes you'll catch transaction mistakes far sooner in your development cycle, because code that depends on a transaction it should not have will fail loudly instead of working by accident.

NEVER
NEVER is more drastic than NOT_SUPPORTED. If a transaction is active when this EJB is called, the container will throw an exception.

If you are dealing with a complex transaction management setup, NEVER can be a useful tool to catch programming mistakes early on. There is a gotcha however: NEVER suggests that there will never be any kind of transaction during the runtime of the EJB method. That is not entirely true; when you call another EJB from it, that EJB may safely create its own isolated transaction. Be aware of that: if you make lots of EJB calls, NEVER may actually become a performance hog because of the many mini-transactions being created, possibly without your knowledge.

SUPPORTS
This will make the container lazy. "If a transaction exists then fine, I'll adopt it. If none exist then I'm not going to make the effort to create one."

SUPPORTS is not particularly useful for any specific solution; if there is a real reason to use it, you can probably (and likely should) redesign your code so it isn't needed anymore. In fact you should take care using it, as it can lead to transaction issues that are really nasty to pinpoint. I see it as a "don't care" type of deal: you have a method that does not need an active transaction. You could mark it as NOT_SUPPORTED, but then the container will put a running transaction to sleep. When you use SUPPORTS, the transaction remains active, leading to less bookkeeping overhead for the container. If you have some sort of utility EJB method that gets called a lot from an EJB context, it can be a small optimization to give it the SUPPORTS transaction attribute instead of NOT_SUPPORTED. And that would be my recommendation for its use: stick to NOT_SUPPORTED, but when you do performance optimizations, identify methods that could benefit from SUPPORTS and only then apply it.

REQUIRES_NEW
The most interesting of the bunch. REQUIRES_NEW will always create a new transaction, even if one already exists. You might be tempted to call this a nested transaction, but it isn't one in the strict sense: the new transaction is completely independent of the outer one. The outer transaction is put to sleep until the inner EJB call finishes, at which point the inner transaction is committed or rolled back on its own; rolling back the outer transaction afterwards will not undo what the inner one committed. So there is still only one active transaction at a time. Note that the inner transaction does not share the managed entities of the outer transaction; they are completely isolated.

This is then also a source of many programming mistakes. Because you create a new transaction, any entities managed inside it become detached when the EJB call finishes, even if you return an instance to the outer EJB method and its transaction! You'd have to do a find() (or a merge()) on the entity in the outer EJB call to get a managed instance again.

MANDATORY
The opposite of NEVER; when the EJB is called there must be an active transaction already. Within container based transaction management you will be basically saying "call this business method only from another container managed resource such as an EJB, MDB or webservice". You may not realize it yet, but MANDATORY is a wickedly powerful tool that can help you to make your transactional code so much more robust.

For example, when I have a DAO class I like to mark storage DAO methods that accept (managed) entities as a parameter as MANDATORY. This way I don't have to add any code that makes the parameter entities managed before I slap them in entities I want to persist, I just dumbly assume that they already are because they come from a transacted environment. If they are not: well that is your own fault, but 99/100 times entities will actually already be managed at that point in time.
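To compare the six attributes side by side, here is a plain-Java model of the decision the container makes when a call arrives. It is a simplification under stated assumptions: `Attr` mirrors the constant names of `TransactionAttributeType` but is a local enum, `Tx` is a hypothetical transaction object, and `Propagation.resolve()` is an illustrative helper that returns the transaction the called method would run in (null meaning none).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Local mirror of the javax.ejb.TransactionAttributeType constant names.
enum Attr { REQUIRED, REQUIRES_NEW, SUPPORTS, NOT_SUPPORTED, MANDATORY, NEVER }

// A hypothetical transaction, identified only by a name such as "T1".
final class Tx {
    private static final AtomicInteger counter = new AtomicInteger();
    final String name;
    private Tx(String name) { this.name = name; }
    static Tx begin() { return new Tx("T" + counter.incrementAndGet()); }
    @Override public String toString() { return name; }
}

final class Propagation {

    /**
     * Returns the transaction the called method runs in, given the caller's
     * transaction (null if none). A caller transaction that is not returned
     * would be suspended for the duration of the call.
     */
    static Tx resolve(Attr attr, Tx callerTx) {
        switch (attr) {
            case REQUIRED:
                return callerTx != null ? callerTx : Tx.begin();
            case REQUIRES_NEW:
                return Tx.begin();
            case SUPPORTS:
                return callerTx;
            case NOT_SUPPORTED:
                return null;
            case MANDATORY:
                if (callerTx == null) {
                    // a real container throws javax.ejb.EJBTransactionRequiredException here
                    throw new IllegalStateException("MANDATORY: no active transaction");
                }
                return callerTx;
            case NEVER:
                if (callerTx != null) {
                    // a real container throws javax.ejb.EJBException here
                    throw new IllegalStateException("NEVER: transaction " + callerTx + " is active");
                }
                return null;
            default:
                throw new AssertionError(attr);
        }
    }

    public static void main(String[] args) {
        Tx t1 = resolve(Attr.REQUIRED, null);   // outer method: T1 created
        Tx t2 = resolve(Attr.REQUIRES_NEW, t1); // middle method: T2 created, T1 suspended
        Tx t3 = resolve(Attr.MANDATORY, t2);    // inner method: T2 adopted
        System.out.println(t1 + " " + t2 + " " + t3); // T1 T2 T2
    }
}
```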


Let's set up an example, shall we? To properly demonstrate this, we'll have to make our code fail on purpose.


public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb ejb2;

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;


  public void createEmployee(String name, String address, String city, String bossname){

    Employee steve = new Employee(name, address, city);
    em.persist(steve);

    Employer bill = ejb2.passAlongBoss(bossname);
    steve.setEmployer(bill);
  }
}


EJB number one is responsible for creating our employee.



@Stateless
public class MySecondEjbBean implements MySecondEjb {
  
  @EJB
  private MyThirdEjb ejb3;

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public Employer passAlongBoss(String name){
    return ejb3.findBoss(name);
  }
}


Our second EJB is a middle-man; our first EJB demands to know the boss for our employee; our second EJB supplies it by asking it of our third EJB.


@Stateless
public class MyThirdEjbBean implements MyThirdEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  @TransactionAttribute(TransactionAttributeType.MANDATORY)
  public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }
}


And our third EJB delivers. Now let's line up the transaction attributes for a moment:

Method            Attribute       Transaction
createEmployee    REQUIRED        T1 created
passAlongBoss     REQUIRES_NEW    T2 created (T1 suspended)
findBoss          MANDATORY       T2 adopted

This code has a problem. Can you spot where?

The trouble surfaces when createEmployee() finishes and transaction T1 is committed. Because even though we were so clever, we have messed up the transactions here. Let's see what happens at the entity level.

Method            Entity                                  State
createEmployee    Employee created                        managed in T1
passAlongBoss     Employer received from findBoss()       managed in T2
findBoss          Employer fetched                        managed in adopted T2

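Our Employer entity is passed all the way from findBoss() to passAlongBoss() to createEmployee(). The trouble is that it is managed in T2, not in T1. So as soon as passAlongBoss() ends, T2 is wrapped up and the Employer entity becomes detached. The end result: you are setting a detached entity reference into the managed Employee entity, which the JPA provider may not be able to persist properly. (Strictly speaking, the JPA specification says the relationship change is still synchronized when Employee is the owning side of the relationship, and leaves the behavior undefined otherwise; either way you are relying on luck rather than design.)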
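The entity-level table can be imitated with one toy persistence context per transaction. `ToyContext`, its `find()`, `contains()` and `close()` methods and this simplified `Employer` are hypothetical; the real counterparts would be the transaction-scoped EntityManager with its find() and contains() methods. The sketch also shows the usual repair: look the boss up again by id in the outer transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified entity.
class Employer {
    final long id;
    final String name;
    Employer(long id, String name) { this.id = id; this.name = name; }
}

// Toy stand-in for the persistence context of one transaction.
class ToyContext {
    // Simulated database table shared by all contexts: id -> name.
    static final Map<Long, String> DATABASE = new HashMap<>();

    // Identity map: within one context the same row always yields the same instance.
    private final Map<Long, Employer> identityMap = new HashMap<>();

    Employer find(long id) {
        return identityMap.computeIfAbsent(id, key -> new Employer(key, DATABASE.get(key)));
    }

    // Like EntityManager.contains(): is this exact instance managed here?
    boolean contains(Employer entity) {
        return identityMap.get(entity.id) == entity;
    }

    // The transaction ends: every entity of this context becomes detached.
    void close() { identityMap.clear(); }

    public static void main(String[] args) {
        DATABASE.put(1L, "bill");
        ToyContext t1 = new ToyContext();        // createEmployee(): REQUIRED
        ToyContext t2 = new ToyContext();        // passAlongBoss(): REQUIRES_NEW
        Employer boss = t2.find(1L);             // findBoss(): MANDATORY, adopts T2
        t2.close();                              // passAlongBoss() returns, T2 is wrapped up
        System.out.println(t1.contains(boss));   // false: detached as far as T1 is concerned
        Employer managedBoss = t1.find(boss.id); // the repair: look it up again in T1
        System.out.println(t1.contains(managedBoss)); // true
    }
}
```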


The local method trap

With the knowledge of transaction attributes fresh in your mind, let me throw a common mistake at you.


public class MyFirstEjbBean implements MyFirstEjb {

  public void businessMethod1(){
    
    businessMethod2();
  }

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public void businessMethod2(){
    // do some stuff
  }
}


Two business methods in the same EJB. Assuming we call businessMethod1(), how many transactions do you think will be created in total?

Answer: 1.

What? businessMethod2() is supposed to get its own personal transaction because of REQUIRES_NEW, right? You'd be right... if businessMethod2() were invoked through an EJB interface. But it is not. From the perspective of the container, businessMethod2() is simply a local method call inside businessMethod1() and is not instrumented accordingly. Remember: EJBs are at their core still plain old Java classes.
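The effect is easy to reproduce without a server if you assume that the container intercepts calls through a proxy (or a generated wrapper) implementing the business interface. The sketch below makes that assumption and uses java.lang.reflect.Proxy; `TwoMethods`, `SelfInvocationDemo` and `containerView()` are hypothetical names. Only calls that pass through the proxy are intercepted, so the inner call to businessMethod2() never gets a chance to have its attribute applied.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface TwoMethods {
    void businessMethod1();
    void businessMethod2();
}

class TwoMethodsBean implements TwoMethods {
    public void businessMethod1() {
        businessMethod2(); // plain Java call on "this": it never passes through the proxy
    }

    public void businessMethod2() {
        // do some stuff
    }
}

class SelfInvocationDemo {
    // Names of the methods the "container" got to see.
    static final List<String> intercepted = new ArrayList<>();

    static TwoMethods containerView(TwoMethods bean) {
        return (TwoMethods) Proxy.newProxyInstance(
                TwoMethods.class.getClassLoader(),
                new Class<?>[] {TwoMethods.class},
                (proxy, method, args) -> {
                    // This is where a container would apply the transaction attribute.
                    intercepted.add(method.getName());
                    return method.invoke(bean, args);
                });
    }

    public static void main(String[] args) {
        TwoMethods viaContainer = containerView(new TwoMethodsBean());
        viaContainer.businessMethod1();
        System.out.println(intercepted); // [businessMethod1]
    }
}
```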

The easy solution would be to use two EJBs in this case, as has been demonstrated earlier. Alternatively, the EJB stack is flexible enough to let you programmatically obtain a reference to the bean's own business interface (for example through SessionContext.getBusinessObject()), so you can call the method through the container.

Personally I consider this kind of self-referencing bean lookup a code smell that might indicate that your system architecture needs some redesigning. Does businessMethod2() really belong in MyFirstEjbBean, or is it perhaps better suited to another EJB class? I believe the latter can always be made true. But it is a personal thing; if you are using the programmatic lookup trick and you think your code is readable and maintainable, then by all means, don't change it.


Multiple persistence units

You know how it is with manuals, articles and tutorials. Everything is fine when the basics are covered. But then you go out into the real world and you want to actually apply the material. Then you take the next step, pushing the technology beyond the scope of the manual, the articles and the tutorials because the more difficult problems are always forgotten by the authors as if you will never face them. But of course you do, programming is not easy.

The interesting topic when it comes to transactions is when you go beyond one persistence unit into multiple persistence units, that may target different databases. JTA can certainly handle that, and thus so can EJB technology. It doesn't just work out of the box though.

First of all, you need a very specific type of datasource: an XA datasource. To keep it short and simple: an XA datasource is specifically designed to take part in a "two-phase commit", the protocol that lets a single transaction span multiple resources. Most established databases support XA datasources through their drivers.
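The two-phase commit idea can be sketched as a toy coordinator. This only illustrates the protocol under simplifying assumptions: `Participant`, `ToyResource` and `ToyCoordinator` are hypothetical, while a real JTA transaction manager talks to XA resources through javax.transaction.xa.XAResource and also logs its decisions so it can recover after a crash, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical participant in a distributed transaction (think: one XA datasource).
interface Participant {
    boolean prepare(); // phase 1: vote to commit (true) or to abort (false)
    void commit();     // phase 2 when every participant voted yes
    void rollback();   // phase 2 when any participant voted no
}

class ToyResource implements Participant {
    final String name;
    private final boolean canCommit;
    String outcome = "pending";

    ToyResource(String name, boolean canCommit) { this.name = name; this.canCommit = canCommit; }

    public boolean prepare() { return canCommit; }
    public void commit() { outcome = "committed"; }
    public void rollback() { outcome = "rolled back"; }
}

class ToyCoordinator {
    // Returns true if the transaction was committed on every participant.
    static boolean twoPhaseCommit(List<? extends Participant> participants) {
        // Phase 1: ask everyone to prepare; a single "no" vote aborts the whole transaction.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: apply the same outcome everywhere.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        ToyResource first = new ToyResource("first-pu", true);
        ToyResource second = new ToyResource("second-pu", false);
        boolean committed = twoPhaseCommit(Arrays.asList(first, second));
        System.out.println(committed + ": " + first.outcome + ", " + second.outcome);
        // false: rolled back, rolled back
    }
}
```

The point of the first phase is that no resource commits anything until every resource has promised it can; that is what keeps two databases consistent with each other.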

With the XA datasources in place, JTA is all set up to deal with you throwing multiple persistence units at it. What I would advise against, however, is trying to force two persistence units onto the same EJB. Instead, separate them.


@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="first-pu")
  private EntityManager em;

  public void doSomething(){

  }
}



@Stateless
public class MySecondEjbBean implements MySecondEjb {
  
  @PersistenceContext(unitName="second-pu")
  private EntityManager em;

  public void doSomething(){

  }
}



So far so good: two EJBs with two different persistence units. What will work with XA datasources, and fail without them, is this:


@Stateless
public class MyThirdEjbBean implements MyThirdEjb {
  
  @EJB
  private MyFirstEjb first;

  @EJB
  private MySecondEjb second;


  public void reallyDoSomething(){
    first.doSomething();
    second.doSomething();
  }
}


Notice how in one business method call a single transaction spans both persistence units, and thus possibly two different databases. With XA datasources and JTA, this can work.

Just note that as soon as you get into the realm of XA datasources and distributed transactions, when something blows up you'll probably get the most vague exceptions you'll ever encounter with exceptions yelling TWO PHASE COMMIT FAILURE ABORT errors at you. Shrug it off and don't be intimidated however, the truth is usually as simple as a query borking somewhere and the root cause will be hidden somewhere in the logged stacktraces. But it may seem like the container is on the brink of destruction when you first have to deal with this.


Dealing with exceptions

Let's say that an exception occurs in an EJB method. How would you deal with that?


@Stateless
public class MySecondEjbBean implements MySecondEjb {

  public void suicideMission(){
    throw new IllegalStateException("Blowing up!");
  }
}


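suicideMission() throws a RuntimeException that it does not handle, so the container treats it as a system exception and rolls its transaction back. When it is called from another EJB, it shares the transaction of that EJB (if any), so it is the caller's transaction that gets marked for rollback. The caller does not receive the IllegalStateException itself, but an EJBException wrapping it (an EJBTransactionRolledbackException when the transaction was shared).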


@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb ejb2;

  public void doingSomething(){
    ejb2.suicideMission(); // bang
  }

}


Now let's say you choose to deal with that and you want to do some error handling by storing the error in a database.



public void doingSomething(){
    try{
      ejb2.suicideMission(); // bang
    } catch(Throwable t){
      ErrorLog log = new ErrorLog(t.getMessage());
      em.persist(log); // bang number 2
    }
  }


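Sorry amigo, this isn't going to work. The container has already marked the transaction for rollback, so you will be notified that you cannot do any more mutations on it: depending on the JPA provider, either the persist() call fails right away or the commit at the end of doingSomething() does. It makes no sense anyway, because all your new changes would be rolled back!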

To incorporate error handling in your service layer, you'll have to make clever use of transaction boundaries using the attributes. For example you could give suicideMission() its own private transaction with REQUIRES_NEW to blow up so the transaction of doingSomething() remains intact (as long as you deal with the exception), or you could let the error handling be managed by another EJB method that has its own private transaction. When you do the latter, remember the local method call trap discussed earlier.


@Stateless
public class MySecondEjbBean implements MySecondEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;


  public void suicideMission(){
    throw new IllegalStateException("Blowing up!");
  }

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public void handleError(Throwable t){
     ErrorLog log = new ErrorLog(t.getMessage());
     em.persist(log); // okay, done in a private transaction
  }
}




public void doingSomething(){

    try{
      ejb2.suicideMission(); // bang
    } catch(Throwable t){
      ejb2.handleError(t); // no bang
    }
  }
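The rollback-only mechanics of both versions can be imitated in plain Java. The model is a simplification: `ToyTx`, `callInCallerTx()` and `callInNewTx()` are hypothetical stand-ins for a JTA transaction, a call to a REQUIRED business method and a call to a REQUIRES_NEW business method, and unlike some real JPA providers the toy rejects the write immediately instead of failing at commit time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical transaction that collects pending writes.
class ToyTx {
    final List<String> pending = new ArrayList<>();
    boolean rollbackOnly;

    void persist(String row) {
        if (rollbackOnly) {
            throw new IllegalStateException("transaction is marked for rollback");
        }
        pending.add(row);
    }

    // What actually reaches the database when the transaction ends.
    List<String> complete() {
        return rollbackOnly ? new ArrayList<String>() : pending;
    }
}

class ErrorHandlingDemo {

    // Like a REQUIRED business method joining the caller's transaction.
    static void callInCallerTx(ToyTx callerTx, Runnable businessMethod) {
        try {
            businessMethod.run();
        } catch (RuntimeException e) {
            callerTx.rollbackOnly = true; // system exception: the shared transaction is doomed
            throw e; // a real container would wrap it in an EJBTransactionRolledbackException
        }
    }

    // Like a REQUIRES_NEW business method: its own transaction, completed on return.
    static List<String> callInNewTx(Consumer<ToyTx> businessMethod) {
        ToyTx own = new ToyTx();
        businessMethod.accept(own);
        return own.complete();
    }

    public static void main(String[] args) {
        Runnable suicideMission = () -> { throw new IllegalStateException("Blowing up!"); };

        // Version 1: log the error in the caller's own (doomed) transaction.
        ToyTx outer = new ToyTx();
        try {
            callInCallerTx(outer, suicideMission);
        } catch (RuntimeException bang) {
            try {
                outer.persist("ErrorLog: " + bang.getMessage());
            } catch (IllegalStateException bang2) {
                System.out.println("bang number 2: " + bang2.getMessage());
            }
        }

        // Version 2: log the error in a transaction of its own.
        ToyTx outer2 = new ToyTx();
        try {
            callInCallerTx(outer2, suicideMission);
        } catch (RuntimeException bang) {
            List<String> logged = callInNewTx(tx -> tx.persist("ErrorLog: " + bang.getMessage()));
            System.out.println("logged: " + logged); // logged: [ErrorLog: Blowing up!]
        }
    }
}
```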



Of course you can always let the exception bubble all the way back to the root caller, which will most likely be a piece of code that does not have a container managed transaction like a servlet, a webservice or a JSF backing bean. This is usually the best strategy to follow. But the overall message is: Be careful with exception handling in the service layer, there are many surprises if you don't keep in mind what the state of a transaction is.


Long running tasks

When working with EJB technology you may have run into a problem: the dreaded transaction timeout. Like any other transaction, an EJB transaction will have a certain timeout bound to it; if a transaction takes longer than the timeout, the transaction is aborted.

Sometimes you will have EJB code that will be long running however. When processing data in files of several gigabytes on an external SFTP server you may have a runtime of several hours for example. Heavy database stuff may also be a culprit.

There are of course ways to cope with that, and it requires precise transaction management. Here are a few tricks.

Batching
A good way to deal with large volumes of data is to process it in batches. This also helps to keep the transaction size small. Imagine using JPA to store 1 million entities, for example; even if you create the entities on the fly, they will all go into the persistence store, which is likely to give you memory issues. Not only that, but the database transaction would become huge with such volumes of data.

So instead of doing the entire set at once, split it into smaller batches of, say, 10000. You would create two EJBs to handle this: EJB1 has a "master" method in which NO transaction is active (TransactionAttributeType.NEVER); this method deals with the large volume of data and splits up the transactional part into smaller batches. Each batch is then passed on to EJB2, which has a "support" method that does create a transaction.

Using this setup you will have a new transaction created per batch, which will each be short-lived and small scoped transactions. Voila, no time out and no memory issues.
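The master/worker split can be sketched in plain Java. `BatchMaster`, `processAll()` and `BatchWorker` are hypothetical names; in the real application `BatchWorker` would be EJB2's business interface, and the container would wrap each `processBatch()` call in its own short transaction. The batch size of 10000 is the example figure from the text, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EJB2's "support" method. In the real setup this is a
// business method that creates a transaction, so every call gets its own one.
interface BatchWorker<T> {
    void processBatch(List<T> batch);
}

class BatchMaster {

    // EJB1's "master" method: runs without a transaction and hands out one batch per call.
    static <T> int processAll(List<T> items, int batchSize, BatchWorker<T> worker) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // In the real setup this call goes through EJB2's business interface,
            // so the container starts and commits one small transaction per batch.
            worker.processBatch(new ArrayList<>(items.subList(from, to)));
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            records.add(i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        int calls = processAll(records, 10_000, batch -> batchSizes.add(batch.size()));
        System.out.println(calls + " " + batchSizes); // 3 [10000, 10000, 5000]
    }
}
```

Each batch is copied before it is handed over, so the worker gets an independent list rather than a view on the full data set.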


Use bean-managed transactions
I don't call this a real 'solution', as the two-EJB setup I described earlier can make this work without a problem using container managed transactions. But the fact of the matter is that bean managed transactions give you control over when a transaction starts and ends and when it times out (UserTransaction.setTransactionTimeout()), making batching setups possible within only one EJB method. If you don't mind using both management types in the same application layer, by all means go for it. It saves you from having to add yet another class.


Increase timeout
Not really a stable solution as runtime speed is determined by many factors, including machine load. So you can never predict how long a certain operation is going to take, it is always going to vary. But if you can say with certainty that a single transaction is going to take AT LEAST 10 minutes, you could always increase the timeout to support such runtimes. As long as you don't start setting timeouts of hours or days; in such cases you really need to fix the problem at the root.

How to do that is server specific however, you'll have to look it up in your server's documentation.

A word on manual transaction management

I don't want to end this article leaving this topic untouched. Whether you use JPA in a client application or bean managed transactions in an EJB environment, there will be times in your career when you have to manage transactions yourself.

Managing transactions is fairly painless (one call to start it, one call to commit it or roll it back). What you should be wary of is the persistence store (JPA calls it the persistence context), an invisible store of the entities you persist or load. Every entity that becomes managed is added to the store, which will grow and grow until it can become so big that your application runs out of memory! Before that happens you'll find that persisting new entities becomes slower and slower as the store grows.

The biggest job you have while performing manual transaction management is to manage that persistence store; you will want to keep it as small as possible to keep resource usage low and keep things as speedy as possible. JPA gives you multiple tools to do just that.

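a) EntityManager.clear() will empty the persistence store and detach all entities that were in it, even when a transaction is still active. Changes that were not flushed yet are lost, so call flush() first.
b) EntityManager.remove() will remove the entity from the store... but also from the database!
c) EntityManager.close() should be obvious.
d) persisting outside of a transaction is sometimes suggested as a way to keep entities unmanaged, but don't count on it: a container managed entity manager throws a TransactionRequiredException when you try, and an application managed entity manager still makes the entity managed; it is simply written to the database when the next transaction commits.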

Out of personal experience I can tell you that, with an application managed entity manager, committing a transaction will not detach entities and will not remove them from the persistence store; they remain managed. The best strategy you can follow when working with large volumes of entities (say in a batched insert) is to have a dedicated entity manager for each transaction you create. So create the entity manager before starting the transaction and close it after committing or rolling back the transaction. This way you closely mimic what the container does during a container managed transaction.


Conclusion

Of course, I haven't covered everything there is to know about transactions; whole books have been filled with this very subject. I aimed to cover what is useful, going beyond the bare basics that most articles restrict themselves to. For a more complete picture you should read some books. My two favorite books on the subjects touched upon in this article are Enterprise JavaBeans and Pro JPA 2. With those two on your desk you'll be up and running with EJB and JPA technology very quickly indeed.