Thursday, December 10, 2009

Allocation Instrumenter for Java

In brief: We've open sourced a tool that allows you to provide a callback every time your program performs an allocation. The Java Allocation Instrumenter can be found here. Give it a whirl, if you are interested.

One thing that crops up a lot at my employer is the need to take an action on every allocation. This can happen in a lot of different contexts:
  1. The programmer has a task, and wants to know how much memory the task allocates, so wants to increment a counter on every allocation.
  2. The programmer wants to keep a histogram of most frequently accessed call sites.
  3. The programmer wants to prevent a task from allocating too much memory, so it keeps a counter on every allocation and throws an exception when the counter reaches a certain value.

Because of the demand for this, a few of us put together a tool that instruments your code and invokes a callback on every allocation. The Allocation Instrumenter is a Java agent written using the java.lang.instrument API and ASM. Each allocation in your Java program is instrumented; a user-defined callback is invoked on each allocation.

The easiest way to explain this is with an example. Assume you have a program that creates 10 strings, and you want to instrument it:

public class Test {
public static void main(String [] args) throws Exception {
for (int i = 0 ; i < 10; i++) {
new String("foo");
}
}
}
To do this, you create an instance of the interface Sampler:

import com.google.monitoring.runtime.instrumentation.AllocationRecorder;
import com.google.monitoring.runtime.instrumentation.Sampler;

public class Test {
public static void main(String [] args) throws Exception {
AllocationRecorder.addSampler(new Sampler() {
public void sampleAllocation(int count, String desc,
Object newObj, long size) {
System.out.println("I just allocated the object " + newObj +
" of type " + desc + " whose size is " + size);
if (count != -1) { System.out.println("It's an array of size " + count); }
}
});
for (int i = 0 ; i < 10; i++) {
new String("foo");
}
}
}

You can then compile and run the program:

% javac -classpath path/to/allocation.jar Test.java
% java -javaagent:path/to/allocation.jar Test

The output will look something like this:

I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24
I just allocated the object foo of type java/lang/String whose size is 24

So, by my standards, it is really pretty easy to use. If you find it useful, please let me know!

Edited to add I noticed this on Twitter: Cool, even if it uses Ant (so probably I will never try it). This is funny, because I only added an ant buildfile so more people would try it. You can download the source and compile it with javac in about one line.

Monday, September 7, 2009

Cliff Click on Java-vs-C Performance

Cliff Click has a terrific post on Java-vs-native-code performance. Recommended.

Cliff is the Chief JVM Architect at Azul Systems, was the lead designer behind Hotspot's server JIT compiler, and is an all-around smart guy. His main drawback is that he (apparently) gets dragged into flamewars on YouTube's comment section.

Tuesday, July 7, 2009

How Hotspot Decides to Clear SoftReferences

I got asked about this twice in one day, and I didn't know the answer, so I sat down and puzzled it out a bit.

A SoftReference is a reference that the garbage collector can decide to clear if it is the only reference left to an object, and the GC decides (through some undefined process) that there is enough memory pressure to warrant clearing it. SoftReferences are generally used to implement memory-sensitive caches. A mistake many people make is to confuse it with a WeakReference, which is a reference that, if it is the only reference left to the object, the garbage collector will aggressively clear.

I got asked how Hotspot's GC actually decides whether to clear SoftReferences. Twice. On the same day. (Hotspot is the Sun / OpenJDK JVM). The answer was that I had no idea, so I had to go look it up. Here is what I found out; I hope it is a useful reference for someone (pun intended).

First, there is a global clock variable that is set with the current time (in millis) every time a garbage collection occurs. Every SoftReference has a timestamp field that is set to the current value of clock when it is accessed (when it is constructed or the get()) method is called. This gives a very coarse ordering over the SoftReferences; the timestamp indicates the last GC before they were accessed.

When a garbage collection occurs, the decision to clear a SoftReference is based on two factors:
  1. How old the reference's timestamp is, and
  2. How much free space there is in memory.
The calculation is pretty simple. If:
  • free_heap is the amount of free heap space in MB,
  • interval is the time between the last GC's clock and the timestamp of the ref we are currently examining, and
  • ms_per_mb is a constant number of milliseconds to keep around a SoftReference for each free megabyte in the heap
Then the decision is made by:
interval <= free_heap * ms_per_mb
To take an example, let's say that we have a SoftReference with a timestamp of 2000ms, the last GC's clock time was 5000ms, the ms_per_mb is 1000 and the free space is 1MB. We then test whether:
5000 - 2000 <= 1 * 1000
This is false (3000 > 1000), so we clear the reference.

Now let's say there is more free space — say, 4MB — and so less reason to clear the SoftReferences. The calculation is now
5000 - 2000 <= 4 * 1000
This is true (3000 <= 4000), so we don't clear the reference.

One thing to notice about this is that it implies that SoftReferences will always be kept for at least one GC after their last access. Why is that? Well, for the interval, we are using the clock value of the last garbage collection, not the current one. As a result, if a SoftReference has been accessed since the last garbage collection, it will have the same timestamp as that garbage collection, and the interval will be 0. 0 <= free_heap * 1000 for any amount of free_heap, so any SoftReference accessed since the last garbage collection is guaranteed to be kept. This is actually how this question came up; some of my colleagues notices their SoftReferences weren't being cleared during a garbage collection, and they didn't know why.

ETA: The above paragraph originally said "one Full GC", not "one GC". In our case, it was one full GC, because the objects being allocated were too big to be allocated in the young generation. In most cases, it will be one GC of the generation in which the object was allocated, which will usually be the new generation / TLAB.

Another thing to notice is that the value of 1000 for ms_per_mb is fairly arbitrary. It can be adjusted with the JVM flag -XX:SoftRefLRUPolicyMSPerMB=n. If you adjust it down, then free_heap * ms_per_mb will be smaller, and so SoftReferences are more likely to be cleared. If you adjust it up, you get the opposite effect.

Wednesday, June 17, 2009

Volatile Arrays in Java

I get asked a lot about how the volatile keyword interacts with arrays, so it is probably worth a blog post on the subject.

Those of you who have read my posts on volatile (Volatile Fields and Synchronization, Volatile Does Not Mean Atomic and, most importantly, What Volatile Means in Java) will have a pretty good idea of what volatile means, but it is probably worth it to provide a reminder.

Basically, if you write to a volatile field, and then you have a later read that sees that write, then the actions that happened before that write are guaranteed to be ordered before and visible to the actions that happen after the read. In practice, what this means is that the compiler and the processor can't do any sneaky reordering to move actions that come before the write to after it, or actions that come after the write to before it. See my post on What Volatile Means in Java for more detail.

With that out of the way, let's go through some examples of what you can do with volatile arrays:
volatile int [] arr = new int[SIZE];

arr = arr;
int x = arr[0];
arr[0] = 1;
The first lesson to learn, which will guide us here, is that arr is a volatile reference to an array, not a reference to an array of volatile elements! As a result, the write to arr[0] is not a volatile write. If you write to arr[0], you will get none of the ordering or visibility guarantees that you want from a volatile write.

What examples are there of a volatile write in the code above? Well, both of the writes to arr — the self-assignment and the write of new int[SIZE] — are volatile writes, because they are writing to arr, not one of its elements.

That explains where the volatile writes are. Where are the volatile reads in our example? It turns out that each of the lines after the declaration contains a volatile read:
arr = arr
This one is easy. The volatile read is on the right hand side of the assignment statement.
int x = arr[0];
This one is slightly more subtle. The volatile read is not the read of the array element. The right hand side of that assignment is a two step process. First, you read the array reference, then you read the 0th element of that array. The volatile read is the read of the array reference, not the read of the 0th element.
arr[0] = 1;
The previous example should give you a hint of where the volatile read is on this line. As in that example, the left-hand side is a two step process. First, you read the array reference, then you assign to the 0th element of that array. As odd as it seems, the read of the array reference is a volatile read.

The astute reader will notice that there is no actual way to get volatile write semantics by writing to an element of an array referenced by a volatile field. The easiest way to get volatile array semantics is to use the Atomic[Reference/Long/Integer]Array classes in java.util.concurrent.atomic, which provide volatile semantics for reads and writes of individual elements.

(Why not Float/Short/Double Array? With APIs, you never ask "why not", you ask "why". Meanwhile, you have 32- and 64-bit bit patterns, so the Float.floatToIntBits and Float.intBitsToFloat family of functions are your friends.)

These classes are somewhat problematic, though. If nothing else, you are endlessly boxing and unboxing values, which may make access expensive. Ugh — I really do know better than this, really! As a result, there is more to this story.

You may have noticed that I did provide a way of getting a volatile write above with just arrays: by writing out a self-reference. I have been asked if that technique can be leveraged to provide volatile access to array elements. Here's what that would look like:
// I wouldn't do this if I were you.
volatile int [] arr = new int[SIZE];

arr[0] = 1;
arr = arr;
This definitely does provide the volatile write. However, what good does it do you? The virtue of a volatile write is that a corresponding read can detect that it happened, and do something based on the new value. For example, you can use a volatile flag to force one thread to loop indefinitely until another one sets the flag. In this case, you can't actually detect that another thread performed the write, because it is writing the same value to the variable.

You can use sun.misc.Unsafe to emulate the behavior of a volatile array. But not everyone is working on a Sun VM. And they are trying to discourage the adoption of sun.misc.Unsafe, so I'm not going to put the code here.

Don't despair too much, though — Doug Lea and the clever folks involved in JSR-166 are working on better solutions for Java 7. More news as events warrant.

Tuesday, June 9, 2009

Welcome!

Mailinator's Paul Tyma linked to me. If you are following from that link, the relevant blog entry you are looking for is probably this one, specifically, the entry labeled "visibility".

Saturday, April 4, 2009

Faster Logging with Faster Logger Classes

Today, I'll discuss a little tweak I made to java.util.Logging that made my logging throughput double. I want to use it as an illustration that it often isn't very difficult to improve the performance of concurrent code by doing things that are actually pretty easy to do.

So, "I" have an application that is running a couple of hundred threads on an 8-core machine, and it wants to log about 2MB a second using java.util.Logger. When I say "I have", I actually mean "someone else has", because if "I" had to log a megabyte a second, there is no way I would use java.util.Logger to do it. Still, we all make our choices.

When I came to this code, it was already doing sensible things like buffering its output. It wasn't doing something more complicated, like using a dedicated logging thread. It could log about 1MB a second, and was chewing through CPU pretty rapidly. I just decided to run our profiler on the code and see where the bottlenecks were. Our profiler is based on the undocumented AsyncGetCallTrace profiling call in HotSpot, and can actually profile your code without interfering with its performance characteristics. This is nice, because you don't end up optimizing the wrong things. But I digress.

Anyway, the profiler showed that we were spending a LOT of time in Logger.log(), on lines that look fairly harmless, but aren't:

public void log(LogRecord record) {
if (record.getLevel().intValue() < levelValue || levelValue == offValue) {
return;
}
synchronized (this) {
if (filter != null && !filter.isLoggable(record)) {
return;
}
}

// Post the LogRecord to all our Handlers, and then to
// our parents' handlers, all the way up the tree.

Logger logger = this;
while (logger != null) {
Handler targets[] = logger.getHandlers();

if (targets != null) {
for (int i = 0; i < targets.length; i++) {
targets[i].publish(record);
}
}

if (!logger.getUseParentHandlers()) {
break;
}

logger = logger.getParent();
}
}
There are no fewer than four locks being acquired in this method. There is the obvious call to synchronized(this). logger.getHandlers() is also a synchronized method. getUseParentHandlers() is synchronized. getParent() is also synchronized. Acquiring and releasing these locks was killing our throughput!

It turns out that there are some very simple things you can do to eliminate the locks in this code without exposing yourself to any correctness issues:

  1. The filter field is protected by the lock on this. That lock is really only protecting one field; the lock is held while one line of code writes to it, and one line of code reads from it. If you want to make something that does the same thing (from the perspective of concurrency), you can make that field into a volatile. If you are worried about the filter variable changing in between when you read it and when you write it, you can read it to a local variable first:
    Filter theFilter = filter;  // filter is volatile
    if (theFilter != null && !theFilter.isLoggable(record)) {
    return;
    }
  2. There is a global lock protecting the handlers. It turns out that the only thing this is protecting is a rarely-updated array of Handlers. If you have a rarely updated concurrent array, you should be using the non-blocking CopyOnWriteArrayList instead.

  3. getUseParentHandlers() is synchronized for the exact same reason as the filter, and can be replaced with a volatile in the same way.


These are all pretty simple improvements, but they doubled my throughput. The changes are going to be incorporated into JDK7, and are, in fact, already in the downloads you can get from OpenJDK.

I should rush to say that I don't blame the original author for not making these changes; java.util.logger was added in JDK 1.4, before the addition of java.util.concurrent and before a rigorous definition of volatile. Plus, it really isn't designed for throughput.

Why am I blogging about pretty simple improvements? There are a few simple morals here:

  1. Learn your libraries. Don't have a synchronized array that is almost never updated, for example.

  2. Know when you can use tricky language features. Knowing that volatile can be used for simple, single-variable updates is a very useful thing to know. I've written about volatile frequently, go read those posts.

  3. It is actually worth it to participate in OpenJDK. We (or I, at least) have a tendency to assume that JDK code is really high quality and well-optimized. This is probably true in a lot of areas (java.util.concurrent, or java.lang.MostStuff, or java.util.YourFavoriteDataStructure), but there is still plenty of work to be done. If you get involved, you not only help yourself, but you also help everyone.

  4. I don't know, low-hanging fruit == good?

There is, by the way, a very large section of the definitely-recommended book Java Concurrency in Practice (lgt amazon) devoted to clever things you can do to speed up your multithreaded logging.

Saturday, February 28, 2009

Small Language Changes for JDK7

For those who haven't been paying attention, there is a new project from Sun to attract proposals for small language changes to JDK7. Even in the first day of open calls for proposals, the entries have been very interesting:
  • Strings in Switch, a proposal from Joe Darcy to be able to use Strings in switch statements.

  • Automated Resource Blocks, a proposal from Josh Bloch to be able to say things like:

    try (BufferedReader br = new BufferedReader(new FileReader(path)) {
    return br.readLine();
    }
    instead of:

    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
    return br.readLine();
    } finally {
    br.close();
    }
    based on having BufferedReader implement a Disposable interface.

  • Block expressions, a proposal from Neal Gafter to be allow things like:
    double pi2 = (double pi = Math.PI ; pi*pi)**; // pi squared*

    Basically, the principle here is that the (parenthesis-delimited) block would return a value.

  • Exception handling improvements, another proposal from Neal Gafter to allow things like:
    try {
    doWork(file);
    } catch (final IOException | SQLException ex) {
    logger.log(ex);
    throw ex;
    }

  • Improved Type Inference, a proposal from me (although I can't claim credit for the idea) to be able to replace things like this:
    Map<String, List<String>> anagrams = new HashMap<String, List<String>>();
    with things like this:
    Map<String, List<String>> anagrams = new HashMap<>();

  • A new syntax for wildcard variance, again from the prolific Neal Gafter, allowing the user to replace "? extends" with "out" and "? super" with "in". (Can you tell that Neal's been working on C#?)

I don't love all of these proposals, but I do love some of them, and I think it is great to see that the pent-up demand for work on the Java language finally has an outlet.

(For those of you following the various Closures proposals: Closures don't count as a small change, and are therefore not in scope for this JSR.)

ETA: The "use Scala instead" and "where is BGGA?" comments have been made, so please read the comments and feedback before reiterating them.