Java Concurrency (&c): April 2009

Today, I'll discuss a little tweak I made to java.util.logging that doubled my logging throughput. I want to use it as an illustration that it often isn't very difficult to improve the performance of concurrent code with changes that are actually pretty easy to make.

So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using java.util.logging.Logger. When I say "I have", I actually mean "someone else has", because if "I" had to log a megabyte a second, there is no way I would use java.util.logging.Logger to do it. Still, we all make our choices.

When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.

Anyway, the profiler showed that we were spending a LOT of time in Logger.log(), on lines that look fairly harmless, but aren't:


    public void log(LogRecord record) {
        if (record.getLevel().intValue() < levelValue || levelValue == offValue) {
            return;
        }
        synchronized (this) {
            if (filter != null && !filter.isLoggable(record)) {
                return;
            }
        }

        // Post the LogRecord to all our Handlers, and then to
        // our parents' handlers, all the way up the tree.

        Logger logger = this;
        while (logger != null) {
            Handler targets[] = logger.getHandlers();

            if (targets != null) {
                for (int i = 0; i < targets.length; i++) {
                    targets[i].publish(record);
                }
            }

            if (!logger.getUseParentHandlers()) {
                break;
            }

            logger = logger.getParent();
        }
    }

There are no fewer than four lock acquisitions in this method. There is the obvious synchronized (this) block around the filter check. logger.getHandlers() is a synchronized method, as are getUseParentHandlers() and getParent(), and those three are called on every pass through the loop as the record travels up the logger tree. Acquiring and releasing these locks was killing our throughput!

It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:

The filter field is protected by the lock on this. That lock is really only protecting one field; the lock is held while one line of code writes to it, and one line of code reads from it. If you want something that does the same thing (from the perspective of concurrency), you can make that field volatile. If you are worried about the filter field changing between the null check and the call to isLoggable(), you can read it into a local variable first:
```
Filter theFilter = filter;  // filter is volatile
if (theFilter != null && !theFilter.isLoggable(record)) {
   return;
}
```
There is a global lock protecting the handlers. It turns out that the only thing this is protecting is a rarely-updated array of Handlers. If you have a rarely updated concurrent array, you should be using CopyOnWriteArrayList instead: reads and iteration take no lock, and only the (rare) writes pay for copying the array.
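As a rough illustration of the handler change, here is a minimal sketch. It is not the JDK code: SimpleHandler and HandlerList are hypothetical stand-ins invented for this example, and only the handler bookkeeping is modeled.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for java.util.logging.Handler, so the sketch is self-contained.
interface SimpleHandler {
    void publish(String message);
}

// Hypothetical logger fragment: only the list of handlers is modeled.
class HandlerList {
    // Iteration takes no lock; every add/remove copies the backing array.
    private final List<SimpleHandler> handlers = new CopyOnWriteArrayList<>();

    // Rare path: called when the logging configuration changes.
    void addHandler(SimpleHandler h) {
        handlers.add(h);
    }

    void removeHandler(SimpleHandler h) {
        handlers.remove(h);
    }

    // Hot path: the for-each loop iterates over a snapshot of the array,
    // so a concurrent add/remove cannot disturb it.
    void publishToAll(String message) {
        for (SimpleHandler h : handlers) {
            h.publish(message);
        }
    }
}
```

The trade-off is that every addHandler() or removeHandler() call copies the whole array, which is only acceptable because handlers change so rarely compared to how often records are published.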

getUseParentHandlers() is synchronized for the exact same reason as the filter, and can be replaced with a volatile in the same way.
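Putting the volatile changes together, the hot path might look something like the sketch below. SketchLogger and RecordFilter are hypothetical names made up for this example, a counter stands in for publishing to handlers, and treating the parent reference as volatile too is my assumption rather than something described above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, stripped-down logger: only the fields discussed above are modeled.
class SketchLogger {
    // Hypothetical stand-in for java.util.logging.Filter.
    interface RecordFilter {
        boolean isLoggable(String message);
    }

    private volatile RecordFilter filter;               // formerly guarded by synchronized (this)
    private volatile boolean useParentHandlers = true;  // formerly read via a synchronized getter
    private volatile SketchLogger parent;               // assumption: handled the same way
    private final AtomicInteger published = new AtomicInteger();

    SketchLogger(SketchLogger parent) {
        this.parent = parent;
    }

    void setFilter(RecordFilter f) {
        filter = f;
    }

    void setUseParentHandlers(boolean b) {
        useParentHandlers = b;
    }

    int publishedCount() {
        return published.get();
    }

    void log(String message) {
        // Read the volatile once so the null check and the call see the same filter.
        RecordFilter theFilter = filter;
        if (theFilter != null && !theFilter.isLoggable(message)) {
            return;
        }
        SketchLogger logger = this;
        while (logger != null) {
            logger.published.incrementAndGet();  // stands in for publishing to handlers
            if (!logger.useParentHandlers) {
                break;
            }
            logger = logger.parent;
        }
    }
}
```

This only works because each field is read and written on its own. If two of these fields had to change together, volatile would not be enough and you would be back to needing a lock.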

These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.

I should rush to say that I don't blame the original author for not making these changes; java.util.logging was added in JDK 1.4, before the addition of java.util.concurrent and before a rigorous definition of volatile. Plus, it really isn't designed for throughput.

Why am I blogging about pretty simple improvements? There are a few simple morals here:

Learn your libraries. Don't have a synchronized array that is almost never updated, for example.

Know when you can use tricky language features. Knowing that volatile can be used for simple, single-variable updates is very useful. I've written about volatile frequently; go read those posts.

It is actually worth it to participate in OpenJDK. We (or I, at least) have a tendency to assume that JDK code is really high quality and well-optimized. This is probably true in a lot of areas (java.util.concurrent, or java.lang.MostStuff, or java.util.YourFavoriteDataStructure), but there is still plenty of work to be done. If you get involved, you not only help yourself, but you also help everyone.

I don't know, low-hanging fruit == good?

There is, by the way, a very large section of the definitely-recommended book Java Concurrency in Practice devoted to clever things you can do to speed up your multithreaded logging.

Java Concurrency (&c)

Saturday, April 4, 2009

Faster Logging with Faster Logger Classes
