One of the code optimization techniques performed by the HotSpot compiler (C2) is Loop Invariant Code Motion. Loop invariants are values that do not change during execution of a loop, and so can be moved out of the loop, speeding up loop execution. For e.g., in the following loop, the expression (x + y) doesn't change during execution of the loop. Hence, its a loop invariant:

int x,y;

for (i = 0; i < SIZE; i++) {
    a[i] = x + y;
}
Compilers can safely move the expression (x + y) out of the loop or in other words, hoist the expression above the loop as shown here:
int x,y;

int temp = x + y;
for (i = 0; i < SIZE; i++) {
    a[i] = temp;
}
Now, consider the following example.
public class BusyTask implements Runnable {
	private boolean stop;

	public void shutdown() {
		this.stop = true;
	}

	public void run() {
		while (!stop) {
		}
		System.out.println("Busy task stopped");
	}
}
public class ShutdownDriver {

	public static void main(String[] s) {
		BusyTask task = new BusyTask();
		new Thread(task).start();

		try {
			Thread.sleep(2000);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}

		task.shutdown();
		System.out.println("BusyTask should stop now.");
	}
}

Let's compile and run. I'll redirect the output to a file called before.log.

vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/javac -d bin src/*.java
vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/java -server -XX:CompileCommand=print,*BusyTask.run -cp bin ShutdownDriver > before.log 2>/dev/null

As Jeremy Manson explains, this program is not guaranteed to terminate. An examination of code after compiler is done with optimization, shows why. The following is the JIT'd BusyTask.run() method:

000   N79: #	B1 <- BLOCK HEAD IS JUNK   Freq: 1
000   	INT3
      	NOP 	# 3 bytes pad for loops and calls

008   B1: #	B7 B2 <- BLOCK HEAD IS JUNK   Freq: 1
008   	# stack bang
	PUSHL  EBP
	SUB    ESP,24	# Create frame
016   	MOV    EBP,[ECX]
018   	MOV    [ESP + #0],ECX
01b   	CALL_LEAF,runtime  OSR_migration_end
        No JVM State Info
        #
020   	MOV    ECX,[EBP + #4]
023   	NullCheck EBP
023
023   B2: #	B5 B3 <- B1  Freq: 0.999999
023   	CMPu   ECX,precise klass BusyTask: 0x0a0116c0:Constant:exact *
029   	Jne,us B5  P=0.000001 C=-1.000000
029
02b   B3: #	B6 B4 <- B2  Freq: 0.999998
02b   	#checkcastPP of EBP
02b   	MOVZX8 ECX,[EBP + #8]	# ubyte -> int ! Field BusyTask.stop
02f   	TEST   ECX,ECX
031   	Jne,s  B6  P=0.000000 C=416502.000000
031
033   B4: #	B4 <- B3 B4  top-of-loop Freq: 1e-35
033   	TSTL   #polladdr,EAX	! Safepoint: poll for GC        # BusyTask::run @ bci:7  L[0]=EBP
        # OopMap{ebp=Oop off=51}
039   	JMP,s  B4
039
03b   B5: #	N79 <- B2  Freq: 9.99999e-07
03b   	MOV    ECX,#-75
040   	NOP 	# 3 bytes pad for loops and calls
043   	CALL,static  wrapper for: uncommon_trap(reason='unreached' action='reinterpret')
        # BusyTask::run @ bci:0  L[0]=EBP
        # OopMap{ebp=Oop off=72}
048   	INT3   ; ShouldNotReachHere
048
04d   B6: #	N79 <- B3  Freq: 4.99999e-07
04d   	MOV    ECX,#22
052   	NOP 	# 1 bytes pad for loops and calls
053   	CALL,static  wrapper for: uncommon_trap(reason='unloaded' action='reinterpret' index='22')
        # BusyTask::run @ bci:10  L[0]=_
        # OopMap{off=88}
058   	INT3   ; ShouldNotReachHere

I've hilighted the while loop code. The loop begins at label B4 and after 3 lines we jump back to label B4 - it's actually an infinite loop! Apart from safepoint polling, there's nothing else in the loop. What happened to reading the value of stop as we wrote in BusyTask.run() method?

The answer is at label B3 (above B4). We read the value of stop and check if the value is true or false by performing an AND operation. If the value is false, we jump to label B6 and call System.out.println. We revert back to interpreter, if the call to static method (System.out.println) doesn't succeed. That's what the uncommon trap refers to.

In Java, the code at labels B3 and B4 would be something like this:

        int temp = stop;                          // Label B3: load 'stop' field (stop == false)
        if(temp1 & temp1 == 1) {                  // is it true or false? (temp == false)
            while (true) {                        // Label B4:
                                                  // Safepoint polling added
            }
        }
        System.out.println("Busy task stopped!"); // Label B6: revert back to interpreter for execution

The visibility problem

Why did the compiler reorder code this way? Because, this reordered code behaves the same way as the original version in a single-threaded environment and C2 doesn't know that we intend to change the value of stop by means of another thread. In a single-threaded program, the value of stop isn't going to change - it's a loop invariant! Hence, it can be safely moved out of the loop. So the compiler optimizes the loop by removing the need to check the value of stop each time. But by reordering code, C2 has altered the semantics of our program. The resulting program never terminates.

Formally, if an action in one thread is visible to another thread, then the result of that action can be observed by the second thread. In order to guarantee that the results of one action are observable to a second action, the first must happen before the second. Because there is no happens-before relationship between the two threads, BusyTask never sees the update to stop performed by ShutdownDriver. The compiler detects that no writes are performed to stop in BusyTask and hoists the reading of stop out of the loop, transforming it into an infinite loop.

The Fix

The keyword volatile ensures happens-before relationship between the two threads. All actions on volatiles happen in a single order, and each write to a volatile field happens before any read of that volatile that occurs later in that order. Since JSR 133, the Java Memory Model ensures this guarantee for volatile. It also places restrictions on the scope of reorderings in the presence of synchronization, and this is how we guarantee that threads can have a consistent view of variables shared across more than one thread.

So the fix is to declare stop as volatile. The other way is to synchronize all reads and writes on stop (using a common lock).

public class BusyTask implements Runnable {
	private volatile boolean stop;

	public void shutdown() {
		this.stop = true;
	}

	public void run() {
		while (!stop) {
		}
		System.out.println("Busy task stopped");
	}
}

Let's compile and run. I'll redirect the output to a file called after.log.

vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/javac -d bin src/*.java
vkandy@www:~/workspace/Hoisting$ $DEBUG_JAVA_HOME/bin/java -server -XX:CompileCommand=print,*BusyTask.run -cp bin ShutdownDriver > after.log 2>/dev/null

Following is BusyTask.run() code after making stop as volatile:

000   N91: #	B1 <- BLOCK HEAD IS JUNK   Freq: 1
000   	INT3
      	NOP 	# 3 bytes pad for loops and calls

008   B1: #	B7 B2 <- BLOCK HEAD IS JUNK   Freq: 1
008   	# stack bang
	PUSHL  EBP
	SUB    ESP,24	# Create frame
016   	MOV    EBX,[ECX]
018   	MOV    [ESP + #0],ECX
01b   	CALL_LEAF,runtime  OSR_migration_end
        No JVM State Info
        #
020   	MOV    ECX,[EBX + #4]
023   	NullCheck EBX
023
023   B2: #	B6 B3 <- B1  Freq: 0.999999
023   	CMPu   ECX,precise klass BusyTask: 0x082ea6c0:Constant:exact *
029   	Jne,us B6  P=0.000001 C=-1.000000
029
02b   B3: #	B5 B4 <- B2  Freq: 0.999998
02b   	#checkcastPP of EBX
02b   	LEA    ECX,[EBX + #8]
02e   	MOVZX8 EAX,[ECX]	# ubyte -> int ! Field  VolatileBusyTask.stop
031   	MEMBAR-acquire ! (empty encoding)
031   	TEST   EAX,EAX
033   	Jne,s  B5  P=0.000000 C=384208.000000
      	NOP 	# 11 bytes pad for loops and calls

040   B4: #	B4 B5 <- B3 B4 	Loop: B4-B4 inner  Freq: 999998
040   	TSTL   #polladdr,EAX	! Safepoint: poll for GC        # BusyTask::run @ bci:7  L[0]=EBX
        # OopMap{ecx=Derived_oop_ebx ebx=Oop off=64}
046   	MOVZX8 EDX,[ECX]	# ubyte -> int ! Field  VolatileBusyTask.stop
049   	MEMBAR-acquire ! (empty encoding)
049   	TEST   EDX,EDX
04b   	Je,s  B4  P=1.000000 C=384208.000000
04b
04d   B5: #	N91 <- B4 B3  Freq: 0.999998
04d   	MOV    ECX,#22
052   	NOP 	# 1 bytes pad for loops and calls
053   	CALL,static  wrapper for: uncommon_trap(reason='unloaded' action='reinterpret' index='22')
        # BusyTask::run @ bci:10  L[0]=_
        # OopMap{off=88}
058   	INT3   ; ShouldNotReachHere

Observations

I've hilighted the while loop which begins at label B4. Observe that the field stop is now read (MOVZX8 EDX,[ECX]) and checked (TEST EDX,EDX) in the loop. If the value is false we jump back to label B4. Here's a slightly cleaned up version:

040   B4:                            ; Loop begins here
040   	TSTL   #polladdr,EAX         ; Safepoint polling
046   	MOVZX8 EDX,[ECX]             ; Load value of stop to a temp variable
049   	TEST   EDX,EDX               ; check the value of this temp variable
04b   	Je,s  B4                     ; If value == true, continue looping

By making stop as volatile, the compiler enforces happens-before relationship between the two threads so the writes performed on stop by ShutdownDriver are observed in Busytask.

One other thing. Take a look at the code starting at label B3 (after a slight cleanup):

02b   B3:                            ;
02b   	LEA    ECX,[EBX + #8]        ; Read value of stop
02e   	MOVZX8 EAX,[ECX]             ; Load value of stop to a temp variable
031   	TEST   EAX,EAX               ; check the value of temp variable
033   	Jne,s  B5                    ; if value == false, jump to B5
      	NOP

040   B4:                            ; loop begins here
040   	TSTL   #polladdr,EAX         ; Safepoint polling added
046   	MOVZX8 EDX,[ECX]             ; Load stop to a temp variable
049   	TEST   EDX,EDX               ; if value == true continue looping
04b   	Je,s  B4

04d   B5:
04d   	MOV    ECX,#22                    ; At this label, execution goes back
052   	NOP                               ; to interpreter.
053   	CALL,static                       ; call to out.println()
        # BusyTask::run @ bci:10  L[0]=_  ; Values needed to restore interpreter frame
058   	INT3                              ; Can't come back here from interpreter

Notice that there is still a test condition before the loop is executed. Also, in the loop (at label B4), the TEST instruction is executed just before the jump back to B4. In Java, this is how this logic would look:

        int temp1 = stop;                         // Label B3: load 'stop' field (stop == false)
        if(temp1 & temp1 == 1) {                  // is it true or false? (temp == false)
            do {
                                                  // Safepoint polling added
                int temp2 = stop;                 // Label B4:
            } while (temp2 & temp2 == 0)          // Compare and loop again ...
        }
        System.out.println("Busy task stopped!"); // Label B5: revert back to interpreter for execution

The compiler transforms the while loop into an if block containing a do...while loop. This optimization technique is called loop inversion.

If you have any comments, suggestions or corrections please feel free to let me know.

Download Code and Logs

Source code and logs