Saturday, March 31, 2007

Safe Construction and the JMM

Another question from a viewer. We have this code:

class C {
  private int x;

  public C(int v) {
    this.x = v;
  }

  public synchronized int get() {
    return this.x;
  }

  public synchronized void set(int v) {
    this.x = v;
  }
}

(For the actual question as it was worded, the short answer is, "yes")

The question was one I get a lot. He asked how one thread can safely read the initial write of v to x. That is to say, if we have:


Thread 1:
globalRef = new C(5);

Thread 2:
C ref = globalRef;
if (ref != null) {
  int r1 = ref.x;
}


How can we change this code to guarantee that the read in Thread 2 will see the value 5 for x (assuming that ref is not null)?

Let's back up for a minute, and talk about why, in the code as written, the read in Thread 2 will not see the value 5 for x. Under the Java memory model, for one thread to be guaranteed to see an update made to a variable by another thread, there must be a happens-before relationship between the update and the read. This means that sometime after the write, there must be an action that performs what we call a "release", and sometime before the read, there must be an action that performs a corresponding "acquire".

The easiest way to get a release and a corresponding acquire is to use locking on the same variable. In this case, we would do something like this:


Thread 1:
synchronized (C.class) {
  globalRef = new C(5);
}

Thread 2:
synchronized (C.class) {
  C ref = globalRef;
  if (ref != null) {
    int r1 = ref.x;
  }
}


There is a release when the lock is released at the end of Thread 1, and an acquire when the lock is subsequently acquired at the beginning of Thread 2. This provides the guarantee that the updates that happen-before the release will be seen by the reads that happen-after the acquire.

For the really wonky -- what could happen? Well, we have


globalRef = new C(5);


which is really:


localRef = <allocate>;
localRef.x = 5;
globalRef = localRef;


There's no dependency between the second and third lines there, so we could easily have:


localRef = <allocate>;
globalRef = localRef;
localRef.x = 5;


And another thread could read globalRef and see the reference to the object before x is assigned the value 5.

Next post -- why this explanation is bad, bad, bad! Any guesses?

Tuesday, March 27, 2007

Volatile Fields and Synchronization

A question from a viewer. In my talk, I had a class that looked like this:

class Future {
 private volatile boolean ready;
 private Obj data;
 ...
 public synchronized void setOnce(Obj o) {
  if (ready) throw ...;
  data = o;
  ready = true;
 }
 public Object get() {
  if (!ready)
   return null;
  return data;
 }
}


The setOnce method is executed by one thread, and get is executed by another. The point of this example is to examine what difference the volatile modifier on ready makes. In this case, if ready is not marked volatile, the compiler can reorder the writes to data and ready. This has the result that the thread that invokes get can see the value true when it reads ready, even if data has not been set yet. Messy.

The questioner asked why setOnce is synchronized. This is somewhat orthogonal to the example, which is why I didn't mention it in the talk. However, I thought it was important enough to include it on the slide. If this method is not synchronized, and multiple threads call it, then those threads can interfere with each other. The take away message here is that volatile is not a magic wand that can take away your concurrency issues with no additional cost.

Why is it in there, if I specified a single writer when I talked about it? What happened here was that I wrote this, said, "Do people really want to write this code without the synchronized block?", answered "No," and put in the synchronized block.

Monday, March 26, 2007

JMM Talk at Google


This fellow is rather handsome, dashing and well-informed
. If you would like a primer on the Java memory model, this is a good place to start. I was running out of time toward the end, though, so the last 15 minutes may be a little rushed. Perhaps I'll do a two-part followup at some point.

One thing I forgot to say, at the very beginning: The Java memory model defines how threads interact through memory. That's why it's called the Java memory model. It has nothing to do with garbage collection or the object model (except where multithreading is important to those things).

If the mood strikes, please feel free to ask questions about the memory model, or anything I said in the talk. I put a note in the talk to ask questions at:

http://jeremymanson.blogspot.com/2007/03/jmm-questions-go-here.html.

Thursday, March 22, 2007

HashMap Behavior in JDK6

It is important to read the documentation. Lots of people don't seem to do this, and they get bitten by it.

The iterator for java.util.HashSet is documented as follows:

Returns an iterator over the elements in this set. The elements are returned in no particular order.

Of course, I've now seen code that ignores this, so I thought I would draw a few underlines for the kind of folks who miss it. I wish to state for the record that I did not write this code.

It turns out that HashSets (and HashMaps) don't just use the user-defined hashCode as the hash value for objects they are passed. They rehash the number that method returns with their own hash function, to defend against poor quality hash functions. This makes a lot of sense, as many people write bad hash functions.

Anyway, this means that when they change this function, objects will hash to different places. Then, when iterators visit these items, they will be visited in a different order.

I wouldn't be posting this unless they had changed it between JDK5 and JDK6, of course. For a quick illustration:

import java.util.HashSet;

public class Test {
  public static void main(String [] args) {
   HashSet<string> set = new HashSet<string>();
   set.add("axons");
   set.add("bandrils");
   set.add("chumblies");
   System.out.println(set);
  }
}

Run that in JDK5, and it will print "[chumblies, bandrils, axons]". Run it in JDK6, and it will print "[axons, bandrils, chumblies]".

If you are having trouble switching from JDK5 to JDK6, and this comes up in your code, smack your forehead with your palm, say, "duh!", and move on with your life.

Wednesday, March 21, 2007

Loop Invariant Code Motion and the Java Memory Model

I gave a talk on the Java memory model this afternoon, and there was a question from the audience which I didn't answer to my own satisfaction, so I am answering it again, for my vast readership of none.

At issue was the following bit of code:

class Animator implements Runnable {
 private volatile boolean stop = false;
 public void stop() { stop = true; }
 public void run() {
  while (!stop) {
   oneStep();
   try { Thread.sleep(100); } …;
  }
 }
 private void oneStep() { /*...*/ }
}

This is a pretty common idiom. One thread calls run(), and another thread (eventually) calls stop(), which sets the stop flag to true, and halts the loop. This is fairly intuitive.

What is not intuitive about this code is that if the variable is not declared volatile, then regardless of whether another thread calls stop(), this code is not guaranteed to terminate. A compiler (in this case, likely Java's JIT compiler) can:
  1. Look at the run() method,
  2. Notice that stop is never updated, and therefore
  3. Hoist the read of stop outside the loop.
The resulting code would look something like this:

public void run() {
 if (!stop) {
  while (true) {
   oneStep();
   try { Thread.sleep(100); } …;
  }
 }
}

... which would likely run forever. But only if stop is not declared volatile.

Now, the questioner asked if I knew of any compilers that were likely to do this. Pretty much every modern compiler does loop invariant code motion -- basically, if it decides that some program instruction will evaluate the same on every loop iteration (as the read of stop does), then it will arrange the code so that program instruction is only executed once, before the loop.

What I said was that I didn't know if any particular compiler did this. What I didn't mention was that this is because, in this case, the compiler has a little extra work to do. It has to determine that the actions in the onestep() method don't affect the value of stop, and it also has to make the decision to do loop invariant code motion for the loop guard (there is no reason not to do code motion for the loop guard).

You are actually pretty likely to run into this sort of thing in practice. Whether you will on this particular example is a question for debate.

I feel a little better having gotten that off my chest.

JMM questions go here

I just gave a talk in which I said that people could post questions on the Java memory model here. So feel free to post JMM questions below!

Tuesday, March 13, 2007

C++ Is Getting More Like Java Every Day

Lawrence Crowl gave an interesting talk at Google last week. It would seem that the next revision of the C++ standard is getting:
  1. A new memory model,
  2. Garbage collection,
  3. Concepts (which are more-or-less type checking for templates. This roughly boils down to "generics")
  4. Unicode support,
  5. A regular expression library,
  6. A for-each loop!
This is all starting to sound terribly familiar. I'm starting to like that language! Now all they have to do is get rid of multiple inheritance and their awful operator overloading semantics, and replace the STL, and I'll be sold.

The new standard will be C++0x. They haven't decided whether x is 8 or 9 yet. Knowing the standards process, I imagine it will be 9. I hope they keep x to single digits...

Anyway, in addition to all of this, it is getting move semantics for rvalues, lots of new syntax for initializers, and buckets of other things. Check out the talk for more.