As explained in "Java theory and practice: Synchronization optimizations in Mustang" by Brian Goetz, lock coarsening is the process of merging adjacent synchronized blocks that lock on the same object. It is one of the optimization techniques available in the HotSpot VM and is on by default. It can be turned off with the -XX:-EliminateLocks option.

To demonstrate this feature, I will use 2 simple classes, Driver and FavoriteChars. The myFavorites() method in FavoriteChars invokes the synchronized getVowel(int) method 3 times. We will see that with -XX:+EliminateLocks, instead of generating code to acquire and release the lock 3 times, once for each invocation of getVowel(int), the HotSpot Server compiler (C2) merges the 3 invocations into a single synchronized block.

public class FavoriteChars {

	private final char[] VOWELS = new char[] { 'a', 'e', 'i', 'o', 'u' };

	public char[] myFavorites() {
		char first = getVowel(0);
		char second = getVowel(1);
		char third = getVowel(2);
		return new char[] { first, second, third };
	}

	public synchronized char getVowel(int index) {
		return VOWELS[index];
	}
}

Finally, the driver class. Driver calls FavoriteChars.myFavorites() enough times that C2 compiles the method into native code, with the calls to getVowel(int) inlined. To print out this native code, I will use a debug build of the JVM.

public class Driver {
	public static void main(String[] args) {
		FavoriteChars demo = new FavoriteChars();
		for (int i = 0; i < 100000; i++) {
			System.err.println(demo.myFavorites());
		}
	}
}

The main method prints the char[] returned by FavoriteChars.myFavorites() to System.err for two reasons: (1) to ensure the call is not optimized away and (2) so that the printed array can be redirected to /dev/null and doesn't interfere with the -XX:+PrintOptoAssembly output, which is sent to System.out.

Lock coarsening disabled

First let's see the code with lock coarsening disabled. Here's my platform info:
vkandy@ksi:~/Optimizations$ uname -a
Linux ksi 2.6.31-15-generic #50-Ubuntu SMP Tue Nov 10 14:54:29 UTC 2009 i686 GNU/Linux
vkandy@ksi:~/Optimizations$ $DEBUG_JAVA_HOME/bin/java -server -Xinternalversion
Java HotSpot(TM) Server VM (16.0-b12-fastdebug) for linux-x86 JRE (1.6.0_18-ea-fastdebug-b05), built on Nov 18 2009 02:05:36 by "java_re" with gcc 3.2.1-7a (J2SE release)

vkandy@ksi:~/Optimizations$ $DEBUG_JAVA_HOME/bin/javac -d bin src/*.java
vkandy@ksi:~/Optimizations$ $DEBUG_JAVA_HOME/bin/java -server -XX:-EliminateLocks -XX:CompileCommand=print,*FavoriteChars.myFavorites -cp bin Driver >-el.log 2>/dev/null

I am only interested in the code for FavoriteChars.myFavorites(), so this command redirects the JIT-compiled myFavorites() method to -el.log. Following is the fast path code of the method, that is, the code executed by the biased thread. See -el.log:

000   N660: #	B1 <- BLOCK HEAD IS JUNK   Freq: 1
000   	CMP    EAX,[ECX+4]	# Inline cache check
	JNE    SharedRuntime::handle_ic_miss_stub
	NOP
	NOP
	NOP

000
00c   B1: #	B24 B2 <- BLOCK HEAD IS JUNK   Freq: 1
00c   	# stack bang
	PUSHL  EBP
	SUB    ESP,40	# Create frame
01a   	MOV    EBX,ECX
01c   	MOV    EAX,[ECX]	# int
01e   	MOV    EBP,EAX
020   	AND    EBP,#7
023   	MOV    ECX, Thread::current()
02f   	CMP    EBP,#5
032   	Jne    B24  P=0.000001 C=-1.000000
032
038   B2: #	B27 B3 <- B1  Freq: 0.999999
038   	MOV    EDI,precise klass FavoriteChars: 0x098f1870:Constant:exact *
03d   	MOV    EBP,[EDI + #104]	# int
040   	MOV    EDX,EBP
042   	OR     EDX,ECX
044   	MOV    ESI,EDX
046   	XOR    ESI,EAX
048   	TEST   ESI,#-121
04e   	Jne    B27  P=0.000001 C=-1.000000
04e
054   B3: #	B60 B4 <- B25 B24 B2 B31  Freq: 1
054   	MEMBAR-acquire (prior CMPXCHG in FastLock so empty encoding)
054   	MOV    EAX,[EBX + #8] ! Field FavoriteChars.VOWELS
057   	MOV    EBP,[EAX + #8]
05a   	NullCheck EAX
05a
05a   B4: #	B26 B5 <- B3  Freq: 0.999999
05a   	TESTu  EBP,EBP
05c   	Jbe,u  B26  P=0.000001 C=-1.000000
05c
062   B5: #	B41 B6 <- B4  Freq: 0.999998
062   	MOVZX  EDI,[EAX + #12]	# ushort/char -> int
066   	MEMBAR-release ! (empty encoding)
066   	MOV    EBP,#7
06b   	AND    EBP,[EBX]
06d   	CMP    EBP,#5
070   	Jne    B41  P=0.000001 C=-1.000000
070
076   B6: #	B33 B7 <- B42 B41 B5  Freq: 0.999998
076   	MOV    EAX,[EBX]	# int
078   	MOV    EDX,EAX
07a   	AND    EDX,#7
07d   	CMP    EDX,#5
080   	Jne    B33  P=0.000001 C=-1.000000
080
086   B7: #	B35 B8 <- B6  Freq: 0.999997
086   	MOV    EBP,precise klass FavoriteChars: 0x098f1870:Constant:exact *
08b   	MOV    EBP,[EBP + #104]	# int
08e   	MOV    EDX,EBP
090   	OR     EDX,ECX
092   	MOV    ESI,EDX
094   	XOR    ESI,EAX
096   	TEST   ESI,#-121
09c   	Jne    B35  P=0.000001 C=-1.000000
09c
0a2   B8: #	B61 B9 <- B40 B33 B7 B38  Freq: 0.999998
0a2   	MEMBAR-acquire (prior CMPXCHG in FastLock so empty encoding)
0a2   	MOV    EBP,[EBX + #8] ! Field FavoriteChars.VOWELS
0a5   	MOV    EDX,[EBP + #8]
0a8   	NullCheck EBP
0a8
0a8   B9: #	B43 B10 <- B8  Freq: 0.999997
0a8   	CMPu   EDX,#1
0ab   	Jbe,u  B43  P=0.000001 C=-1.000000
0ab
0b1   B10: #	B47 B11 <- B9  Freq: 0.999996
0b1   	MOVZX  EBP,[EBP + #14]	# ushort/char -> int
0b5   	MEMBAR-release ! (empty encoding)
0b5   	MOV    EDX,#7
0ba   	AND    EDX,[EBX]
0bc   	CMP    EDX,#5
0bf   	Jne    B47  P=0.000001 C=-1.000000
0bf
0c5   B11: #	B12 <- B10  Freq: 0.999995
0c5   	MOV    [ESP + #8],ECX
0c9   	MOV    [ESP + #12],EDI
0cd   	MOV    [ESP + #16],EBP
0cd
0d1   B12: #	B45 B13 <- B58 B48 B11  Freq: 0.999996
0d1   	MOV    EAX,[EBX]	# int
0d3   	MOV    EDX,EBX
0d5   	MOV    ECX,EAX
0d7   	AND    ECX,#7
0da   	CMP    ECX,#5
0dd   	Jne    B45  P=0.000001 C=-1.000000
0dd
0e3   B13: #	B50 B14 <- B12  Freq: 0.999995
0e3   	MOV    EBX,precise klass FavoriteChars: 0x098f1870:Constant:exact *
0e8   	MOV    EDI,[EBX + #104]	# int
0eb   	MOV    ECX,EDI
0ed   	MOV    EBX,[ESP + #8]
0f1   	OR     ECX,EBX
0f3   	MOV    EBX,ECX
0f5   	XOR    EBX,EAX
0f7   	TEST   EBX,#-121
0fd   	Jne    B50  P=0.000001 C=-1.000000
0fd
103   B14: #	B15 <- B13  Freq: 0.999994
103   	MOV    EBX,EDX
105   	MOV    EDI,[ESP + #8]
105
109   B15: #	B62 B16 <- B55 B46 B14 B53  Freq: 0.999996
109   	MEMBAR-acquire (prior CMPXCHG in FastLock so empty encoding)
109   	MOV    EBP,[EBX + #8] ! Field FavoriteChars.VOWELS
10c   	MOV    EAX,[EBP + #8]
10f   	NullCheck EBP
10f
10f   B16: #	B49 B17 <- B15  Freq: 0.999995
10f   	CMPu   EAX,#2
112   	Jbe,u  B49  P=0.000001 C=-1.000000
112
118   B17: #	B56 B18 <- B16  Freq: 0.999994
118   	MOVZX  EBP,[EBP + #16]	# ushort/char -> int
11c   	MEMBAR-release ! (empty encoding)
11c   	MOV    ECX,#7
121   	AND    ECX,[EBX]
123   	CMP    ECX,#5
126   	Jne    B56  P=0.000001 C=-1.000000
126
12c   B18: #	B21 B19 <- B57 B56 B17  Freq: 0.999994
12c   	MOV    EAX,[EDI + #68]
12f   	LEA    EBX,[EAX + #24]
132   	CMPu   EBX,[EDI + #76]
135   	Jnb,us B21  P=0.000100 C=-1.000000
135
137   B19: #	B20 <- B18  Freq: 0.999894
137   	MOV    [EDI + #68],EBX
13a   	PREFETCHNTA [EBX + #256]	! Prefetch into non-temporal cache for write
141   	MOV    [EAX],0x00000001
147   	PREFETCHNTA [EBX + #288]	! Prefetch into non-temporal cache for write
14e   	MOV    [EAX + #4],precise klass [C: 0x096dfa30:Constant:exact *
155   	PREFETCHNTA [EBX + #320]	! Prefetch into non-temporal cache for write
15c   	MOV    [EAX + #8],#3
163   	MOV    [EAX + #12],#0
16a   	XOR    ECX.lo,ECX.lo
	XOR    ECX.hi,ECX.hi
16e   	MOV    [EAX + #16],ECX.lo
	MOV    [EAX + #16]+4,ECX.hi
16e
174   B20: #	N660 <- B22 B19  Freq: 0.999994
174   	MOV    ECX,[ESP + #12]
178   	MOV16  [EAX + #12],ECX
17c
17c   	#checkcastPP of EAX
17c   	MOV    EBX,[ESP + #16]
180   	MOV16  [EAX + #14],EBX
184   	MOV16  [EAX + #16],EBP
188   	ADD    ESP,40	# Destroy frame
	POPL   EBP
	TEST   PollPage,EAX	! Poll Safepoint

192   	RET

Observations

Firstly, you can see that the 3 invocations of getVowel(int), the critical sections for which the lock is needed, are inlined at labels B3, B8 and B15. See the instructions between highlighted lines 76-88, 111-123 and 157-169. There are no calls to the getVowel(int) method; instead we see 3 MOVZX loads which get the job done (not considering loading the array): MOVZX EDI,[EAX + #12], MOVZX EBP,[EBP + #14] and MOVZX EBP,[EBP + #16]. Note that when you print the bytecode of FavoriteChars.class, javap may show 3 invokevirtual 3 <getVowel> <(I)C> instructions, but at runtime getVowel(int) is compiled to native code and inlined into the FavoriteChars.myFavorites() method.

Secondly, note the conditional jump instructions (JNE) at the end of B1 and B2, just above label B3 (the first critical section). Similarly, there are conditional jump instructions just above B8 and B15, the other 2 critical sections. The instructions in labels B1 and B2 are biased locking code: they check whether the object's header is already biased toward the current thread. The header is updated with the biased thread's information only on the slow path (the CMPXCHG in B23). Threads other than the bias-holding thread are made to jump to the slow path:

00c   B1: #	B24 B2 <- BLOCK HEAD IS JUNK   Freq: 1
00c   	# stack bang
	PUSHL  EBP
	SUB    ESP,40	# Create frame
01a   	MOV    EBX,ECX
01c   	MOV    EAX,[ECX]	# int
01e   	MOV    EBP,EAX
020   	AND    EBP,#7
023   	MOV    ECX, Thread::current()
02f   	CMP    EBP,#5
032   	Jne    B24  P=0.000001 C=-1.000000
032
038   B2: #	B27 B3 <- B1  Freq: 0.999999
038   	MOV    EDI,precise klass FavoriteChars: 0x08472d08:Constant:exact *
03d   	MOV    EBP,[EDI + #104]	# int
040   	MOV    EDX,EBP
042   	OR     EDX,ECX
044   	MOV    ESI,EDX
046   	XOR    ESI,EAX
048   	TEST   ESI,#-121
04e   	Jne    B27  P=0.000001 C=-1.000000
1a6   B23: #	B24 <- B27  Freq: 9.99999e-13
1a6   	CMPXCHG [EBX],EBP	# If EAX==[EBX] Then store EBP into [EBX]
235   B27: #	B23 B28 <- B2  Freq: 9.99999e-07
235   	TEST   ESI,#7
23b   	Jne    B23  P=0.000001 C=-1.000000

The biased thread acquires the lock and goes on to execute the critical section in B3, whereas other threads have to execute CMPXCHG, a CAS operation, on the slow path (and probably more) to acquire the lock before entering the critical section in B3.
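The two checks in B1 and B2 can be read as plain bit tests on the object's header word. Following is a minimal sketch in Java, not HotSpot code. The constants 7, 5 and -121 are taken from the listing above; the class name, method names and the header, prototype and thread values are all made up for illustration, and treating the masked-out bits as "ignored during the comparison" is my reading of the TEST instruction.

```java
// Sketch only: mirrors the bit tests in B1 and B2 with hypothetical names and values.
final class BiasCheckSketch {
    static final int LOW_BITS_MASK = 0x7;   // AND EBP,#7
    static final int BIASED_PATTERN = 0x5;  // CMP EBP,#5
    static final int COMPARE_MASK = -121;   // TEST ESI,#-121, i.e. 0xFFFFFF87

    // B1: do the low 3 bits of the header carry the biased pattern?
    static boolean hasBiasedPattern(int header) {
        return (header & LOW_BITS_MASK) == BIASED_PATTERN;
    }

    // B2: does the header match (class prototype header | current thread),
    // ignoring the bits that COMPARE_MASK clears?
    static boolean isBiasedToward(int header, int prototypeHeader, int threadBits) {
        int expected = prototypeHeader | threadBits; // OR  EDX,ECX
        int diff = expected ^ header;                // XOR ESI,EAX
        return (diff & COMPARE_MASK) == 0;           // TEST ESI,#-121
    }

    // Both checks pass: fall through to B3 without any atomic instruction.
    static boolean takesFastPath(int header, int prototypeHeader, int threadBits) {
        return hasBiasedPattern(header)
                && isBiasedToward(header, prototypeHeader, threadBits);
    }

    public static void main(String[] args) {
        int prototype = 0x5;  // hypothetical prototype header: biased pattern only
        int owner = 0x1000;   // hypothetical thread bits of the bias holder
        int other = 0x2000;   // hypothetical thread bits of another thread
        int header = owner | prototype | 0x18; // 0x18 sets bits the mask ignores

        System.out.println(takesFastPath(header, prototype, owner)); // true
        System.out.println(takesFastPath(header, prototype, other)); // false
        System.out.println(takesFastPath(0x1, prototype, owner));    // false
    }
}
```

Only the second and third cases correspond to the JNE branches being taken; what happens after the jump (revocation, CMPXCHG and so on) is not modelled here.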

At this point we know that the current thread holds the lock for this object, so we can move on to executing the critical section in B3. All that the getVowel(int) method does is load the array and read an element at a given index, so that's what is wrapped between the MEMBAR-acquire and MEMBAR-release statements. No instructions are generated for MEMBAR-acquire and MEMBAR-release (the instruction addresses do not advance, so their size is zero). The annotation in the log gives the reason for the acquire side: the lock acquisition code (FastLock) already contains a CMPXCHG. At this point, the thread executing this code is either the biased thread or the thread that won the lock:

054   B3: #	B60 B4 <- B25 B24 B2 B31  Freq: 1
054   	MEMBAR-acquire (prior CMPXCHG in FastLock so empty encoding)
054   	MOV    EAX,[EBX + #8] ! Field FavoriteChars.VOWELS
057   	MOV    EBP,[EAX + #8]
05a   	NullCheck EAX
05a
05a   B4: #	B26 B5 <- B3  Freq: 0.999999
05a   	TESTu  EBP,EBP
05c   	Jbe,u  B26  P=0.000001 C=-1.000000
05c
062   B5: #	B41 B6 <- B4  Freq: 0.999998
062   	MOVZX  EDI,[EAX + #12]	# ushort/char -> int
066   	MEMBAR-release ! (empty encoding)
066   	MOV    EBP,#7
06b   	AND    EBP,[EBX]
06d   	CMP    EBP,#5
070   	Jne    B41  P=0.000001 C=-1.000000

This is repeated 2 more times for the remaining 2 invocations of getVowel(int), and finally an array is constructed and returned. In summary, threads other than the biased thread acquire and release the same lock 3 times, once for each critical section.

Lock coarsening enabled

Now, let's enable lock coarsening and run the program again.
vkandy@ksi:~/Optimizations$ $DEBUG_JAVA_HOME/bin/java -server -XX:+EliminateLocks -XX:CompileCommand=print,*FavoriteChars.myFavorites -cp bin Driver >+el.log 2>/dev/null

Following is the output of the fast path. The first 2 labels, B1 and B2 (biased locking code), are similar to what we saw before. But notice the instructions in labels B3 to B5 in +el.log:

00c   B1: #	B12 B2 <- BLOCK HEAD IS JUNK   Freq: 1
00c   	# stack bang
	PUSHL  EBP
	SUB    ESP,24	# Create frame
01a   	MOV    EBX,ECX
01c   	MOV    EAX,[ECX]	# int
01e   	MOV    ECX,EAX
020   	AND    ECX,#7
023   	MOV    EDX, Thread::current()
02f   	CMP    ECX,#5
032   	Jne    B12  P=0.000001 C=-1.000000
032
038   B2: #	B15 B3 <- B1  Freq: 0.999999
038   	MOV    ECX,precise klass FavoriteChars: 0x08e05b88:Constant:exact *
03d   	MOV    EBP,[ECX + #104]	# int
040   	MOV    ECX,EBP
042   	OR     ECX,EDX
044   	MOV    EDI,ECX
046   	XOR    EDI,EAX
048   	TEST   EDI,#-121
04e   	Jne    B15  P=0.000001 C=-1.000000
04e
054   B3: #	B22 B4 <- B13 B12 B2 B19  Freq: 1
054   	MEMBAR-acquire (prior CMPXCHG in FastLock so empty encoding)
054   	MOV    EAX,[EBX + #8] ! Field FavoriteChars.VOWELS
057   	MOV    EBP,[EAX + #8]
05a   	NullCheck EAX
05a
05a   B4: #	B14 B5 <- B3  Freq: 0.999999
05a   	CMPu   EBP,#2
05d   	Jbe,u  B14  P=0.000001 C=-1.000000
05d
063   B5: #	B20 B6 <- B4  Freq: 0.999998
063   	MOVZX  EDI,[EAX + #16]	# ushort/char -> int
067   	MOVZX  ESI,[EAX + #14]	# ushort/char -> int
06b   	MOVZX  EBP,[EAX + #12]	# ushort/char -> int
06f   	MEMBAR-release ! (empty encoding)
06f   	MOV    EAX,#7
074   	AND    EAX,[EBX]
076   	CMP    EAX,#5
079   	Jne    B20  P=0.000001 C=-1.000000

The critical sections, the 3 reads from the array FavoriteChars.VOWELS, are now grouped together between a single MEMBAR-acquire and MEMBAR-release. Again, there are no instructions for MEMBAR-acquire and MEMBAR-release because the fast path code is for the biased thread. However, threads other than the bias-holding thread now have to acquire the lock just once to read the 3 chars from FavoriteChars.VOWELS: see the 3 MOVZX loads in label B5. This means that threads other than the bias-holding thread execute the expensive lock acquisition code just once to enter and execute all 3 critical sections. In other words, C2, when run with -XX:+EliminateLocks, merged 3 synchronized blocks that lock on the same object into 1 (relatively) larger block, thereby reducing locking overhead.
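At the source level, the effect of the transformation can be pictured as follows. This is only an illustrative sketch: C2 performs coarsening on the object's monitor inside compiled code, not on java.util.concurrent locks. CountingLock is a hypothetical stand-in for the monitor, used here only so that the number of acquisitions can be observed; all names in the sketch are made up.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: a counting lock stands in for the object's monitor.
final class CoarseningSketch {
    private static final char[] VOWELS = { 'a', 'e', 'i', 'o', 'u' };

    // Hypothetical helper that records how often the lock is acquired.
    static final class CountingLock {
        private final ReentrantLock lock = new ReentrantLock();
        int acquisitions;

        void lock() {
            lock.lock();
            acquisitions++; // updated while holding the lock
        }

        void unlock() {
            lock.unlock();
        }
    }

    // Roughly what myFavorites() looks like after inlining, without coarsening:
    // three adjacent critical sections on the same lock.
    static char[] separateSections(CountingLock lock) {
        char first, second, third;
        lock.lock();
        try { first = VOWELS[0]; } finally { lock.unlock(); }
        lock.lock();
        try { second = VOWELS[1]; } finally { lock.unlock(); }
        lock.lock();
        try { third = VOWELS[2]; } finally { lock.unlock(); }
        return new char[] { first, second, third };
    }

    // Roughly what it looks like after coarsening: one critical section
    // covering all three reads.
    static char[] coarsenedSection(CountingLock lock) {
        lock.lock();
        try {
            return new char[] { VOWELS[0], VOWELS[1], VOWELS[2] };
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        CountingLock a = new CountingLock();
        CountingLock b = new CountingLock();
        System.out.println(Arrays.toString(separateSections(a)) + " acquisitions=" + a.acquisitions);
        System.out.println(Arrays.toString(coarsenedSection(b)) + " acquisitions=" + b.acquisitions);
    }
}
```

Both methods return the same 3 chars; the first acquires the lock 3 times and the second only once, which is the saving that the coarsened listing shows for threads that do not hold the bias.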

If you have any comments, suggestions or corrections please feel free to let me know.

Code

Source code and logs