Wednesday, March 21, 2007

Loop Invariant Code Motion and the Java Memory Model

I gave a talk on the Java memory model this afternoon, and there was a question from the audience which I didn't answer to my own satisfaction, so I am answering it again, for my vast readership of none.

At issue was the following bit of code:

class Animator implements Runnable {
  private volatile boolean stop = false;
  public void stop() { stop = true; }
  public void run() {
    while (!stop) {
      oneStep();
      try { Thread.sleep(100); } …;
    }
  }
  private void oneStep() { /*...*/ }
}

This is a pretty common idiom. One thread calls run(), and another thread (eventually) calls stop(), which sets the stop flag to true, and halts the loop. This is fairly intuitive.

What is not intuitive about this code is that if the variable is not declared volatile, then regardless of whether another thread calls stop(), this code is not guaranteed to terminate. A compiler (in this case, likely Java's JIT compiler) can:
  1. Look at the run() method,
  2. Notice that stop is never updated, and therefore
  3. Hoist the read of stop outside the loop.
The resulting code would look something like this:

public void run() {
  if (!stop) {
    while (true) {
      oneStep();
      try { Thread.sleep(100); } …;
    }
  }
}

... which would likely run forever. But only if stop is not declared volatile.

Now, the questioner asked if I knew of any compilers that were likely to do this. Pretty much every modern compiler performs loop invariant code motion -- basically, if it determines that some instruction will evaluate to the same value on every loop iteration (as the read of stop does), it will rearrange the code so that the instruction is executed only once, before the loop.

What I said was that I didn't know if any particular compiler did this. What I didn't mention was that this is because, in this case, the compiler has a little extra work to do. It has to determine that the actions in the oneStep() method don't affect the value of stop, and it also has to make the decision to apply loop invariant code motion to the loop guard (there is no reason not to do so).

You are actually pretty likely to run into this sort of thing in practice. Whether you will run into it with this particular example is open to debate.

I feel a little better having gotten that off my chest.

14 comments:

Felipe Coury said...

Jeremy,

One question - consider the following (changed) snippet:

public void run() {
  while (true) {
    if (stop) {
      return;
    }
    oneStep();
    try { Thread.sleep(100); } …;
  }
}

I agree that declaring stop volatile wouldn't hurt even in this situation, but I'd like to ask anyway: would this work even if stop is not volatile, or could this code also end up being optimized by the compiler?

Jeremy Manson said...

This is the same code, more or less, so the exact same argument applies. The stop variable is loop-invariant, so it can be hoisted out of the loop.

The take away message here is that if you are going to communicate values between threads, you have to use some explicit form of communication, whether it be in the form of a synchronized block, a volatile field modifier, or something else entirely.
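For example, a synchronized variant of the Animator idiom from the post might look something like the sketch below (the class name and the interrupt handling are mine; it is one way to do it, not the only one):

class StoppableAnimator implements Runnable {  // hypothetical name, not from the post
  private boolean stop = false;  // not volatile; guarded by the lock on 'this' instead

  public synchronized void stop() { stop = true; }
  private synchronized boolean stopped() { return stop; }

  public void run() {
    // The read of stop happens inside a synchronized method that is entered on
    // every iteration, so it cannot be hoisted out of the loop, and the unlock
    // in stop() happens-before the next lock acquisition in stopped().
    while (!stopped()) {
      oneStep();
      try { Thread.sleep(100); } catch (InterruptedException e) { return; }
    }
  }

  private void oneStep() { /*...*/ }
}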

pveentjer said...

Hi Jeremy,

In addition to the reordering problem, the code is also subject to a visibility problem (when a normal variable is used instead of a volatile one). The 'victim' thread is not required to see the most recently written value, and could therefore keep seeing the initial value, which effectively turns the loop into an infinite one.

ps: I add this comment only to make the post more complete; I know you understand the JMM.

Jeremy Manson said...

Peter -- this is absolutely true. The real problem here from the point of view of Java semantics is the lack of a happens-before relationship, which I am sure is defined in another blog entry somewhere.
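To make that concrete, here is a minimal sketch (the class and field names are mine, not from the post) of the happens-before edge that a volatile write/read pair creates:

class HappensBeforeExample {
  private volatile boolean flag = false;
  private int data = 0;  // deliberately not volatile

  void writer() {   // runs in thread A
    data = 42;      // (1) ordinary write
    flag = true;    // (2) volatile write
  }

  void reader() {   // runs in thread B
    if (flag) {     // (3) volatile read that sees (2)
      // (2) happens-before (3), and program order gives (1) -> (2) and (3) -> (4),
      // so this read is guaranteed to see 42. Without volatile there is no such
      // edge, and this read could legally see 0.
      System.out.println(data);  // (4)
    }
  }
}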

Anonymous said...

Hey Jeremy,

The JMM ensures that code within a synchronized block will not be reordered, but is that true for code within a try block as well?

pveentjer said...

If instructions can't cause exceptions (like a simple assignment), I don't see a problem with moving them from outside the try block to inside it, and vice versa, as long as doing so doesn't violate the within-thread as-if-serial semantics of the code.

Jeremy Manson said...

Mostly what Peter said. If you have:

class Foo {
  Object o = new Object();
  void Thread1() {
    o = null;
  }
  void Thread2() {
    if (o == null) return;
    try {
      int x = o.hashCode();
    } catch (NullPointerException e) {
    }
  }
}

You still have to catch the NPE if Thread1 sets o to null after the if statement executes.

Anonymous said...

Thank you guys.

Anonymous said...

So what this means is

try {
  int a = 0;
  int b = 1;
} catch (Exception e) {
}

The compiler is free to reorder the above.

There is a chance that one thread might see the above as
int a = 0; int b = 1;

whereas another might see it as
int b = 1; int a = 0;

pveentjer said...

That is correct.

ps: if the variables are local variables rather than instance variables, other threads cannot see them at all, so your example is not going to lead to reordering/visibility problems.
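To illustrate the difference, here is a rough sketch (the class name and values are mine, adapted from the example above) of what can happen when the writes target instance fields and there is no synchronization:

class FieldReorderingExample {
  int a = 0;
  int b = 0;

  void writer() {  // runs in one thread
    try {
      a = 1;
      b = 2;
    } catch (Exception e) {
    }
  }

  void reader() {  // runs in another thread
    // With no synchronization or volatile, the JMM allows this thread to
    // see b == 2 while a is still 0, as if the writes in the try block
    // had been reordered; the try/catch does nothing to prevent that.
    if (b == 2 && a == 0) {
      System.out.println("saw the writes out of order");
    }
  }
}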

Anonymous said...

Thanks. Had this doubt lingering in my mind for sometime.

Anonymous said...

Well said.

Unknown said...

Perhaps you've answered this in some other blog entry, but I'm trying to figure out what barriers are involved with volatile loads/stores:

T0:
store(volatile_var, 1)
load(volatile_var)
.....
....
load(volatile_var)

T1:
store(volatile_var, 30)

Assuming the interleaving goes as follows: T1's store happens before T0's last load. I would assume that either the volatile variable is not stored in write-back (WB) memory, or, if it is, something invalidates it before the load is performed; otherwise that load would still see 1, no? The JSR-133 cookbook seems to imply we would use a LoadLoad barrier before the last volatile load, which IIRC is lfence on x86, and that is a no-op for WB memory. So is a volatile never placed in WB memory?

Jeremy Manson said...

@bank kus - LoadLoad barriers on x86 are a no-op because the architecture's memory consistency guarantees already make them unnecessary, not because that level of consistency is unavailable. Specifically, any two loads on x86 behave as if there is an intervening LoadLoad barrier. The JSR-133 cookbook is right - basically, no explicit barriers are needed for volatile reads, and a StoreLoad is needed after volatile writes.
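Roughly summarized, the cookbook's conservative recipe looks like the comments in this sketch (my paraphrase of the cookbook, not a quote from it; on x86, everything except the StoreLoad ends up being a no-op):

class BarrierSketch {  // hypothetical names, for illustration only
  volatile int v;
  int plain;

  void write() {
    plain = 1;
    // [StoreStore] before the volatile store: earlier stores complete first
    v = 2;                   // volatile store
    // [StoreLoad] after the volatile store: the only barrier that costs anything on x86
  }

  void read() {
    int r = v;               // volatile load
    // [LoadLoad] + [LoadStore] after the volatile load: later memory
    // operations cannot move above it
    int s = plain;
  }
}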