Java synchronized
synchronized --- Intrinsic Lock (Monitor)
Every Java object has an intrinsic lock. synchronized acquires it:
```java
// Method-level
synchronized void increment() { count++; }

// Block-level (preferred --- finer granularity)
void increment() {
    synchronized (this) { count++; }
}

// Static --- locks the Class object
static synchronized void staticMethod() { /* locks MyClass.class */ }
```
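To see the intrinsic lock actually doing its job, here is a minimal sketch (class and method names are illustrative): two threads hammer a shared counter through a synchronized method, so no increment is lost.

```java
// Two threads increment a shared counter 100,000 times each.
// Because increment() is synchronized, each ++ is atomic and no update is lost.
public class SyncCounter {
    private int count = 0;

    synchronized void increment() { count++; } // locks `this`

    synchronized int getCount() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.getCount()); // 200000; without synchronized, updates are lost
    }
}
```

Removing `synchronized` from `increment()` typically makes the final count come up short, because `count++` is a read-modify-write of three steps.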
Bytecode: monitorenter / monitorexit
The compiler translates synchronized into monitorenter and monitorexit bytecodes:
```java
// Java source
synchronized (obj) {
    doWork();
}
```

```
// Compiled bytecode (javap -c)
 0: aload_1          // push obj reference
 1: dup
 2: astore_2         // save for monitorexit
 3: monitorenter     // ← acquire lock
 4: invokevirtual #2 // doWork()
 7: aload_2
 8: monitorexit      // ← release lock (normal exit)
 9: goto 17
12: astore_3         // exception handler starts here
13: aload_2
14: monitorexit      // ← release lock (exception exit)
15: aload_3
16: athrow           // rethrow exception
17: return
```
Two monitorexit instructions --- one on the normal path (offset 8) and one in the exception handler (offset 14). That's how synchronized guarantees the lock is always released, even when an exception is thrown.
For synchronized methods there are no monitorenter/monitorexit instructions in the bytecode. Instead, the method's flags include ACC_SYNCHRONIZED and the JVM acquires and releases the monitor at method invocation:
```java
synchronized void doWork() { /* ... */ }
// Bytecode: no monitorenter/monitorexit
// Method flags: ACC_PUBLIC, ACC_SYNCHRONIZED
// JVM auto-acquires monitor on entry, releases on exit
```
Object Header: Mark Word (64-bit)
Every Java object has a header. The first 64 bits are the "mark word" --- this is where lock state lives:
```
Unlocked:    [unused:25] [hashcode:31] [unused:1] [age:4] [biased:0] [lock:01]
Biased:      [thread:54] [epoch:2]                [age:4] [biased:1] [lock:01]
Lightweight: [ptr to lock record:62]                                [lock:00]
Heavyweight: [ptr to ObjectMonitor:62]                              [lock:10]
GC marked:   [                                                    ] [lock:11]
```
Last 2 bits (lock) tell the JVM which state:
- `01` = unlocked or biased (bit 3 distinguishes: 0 = unlocked, 1 = biased)
- `00` = lightweight locked
- `10` = heavyweight locked (inflated to ObjectMonitor)
- `11` = GC mark (used during garbage collection)
Note: calling hashCode() on a biased object forces bias revocation --- the hashcode needs the bits that the thread ID occupies.
Lock Escalation (Biased → Lightweight → Heavyweight)
```
Biased Lock     →  Lightweight Lock (CAS)  →  Heavyweight Lock (OS mutex)
(no contention)    (brief contention)         (sustained contention)
```
Escalation is one-way --- once inflated to heavyweight, it stays heavyweight for that object (until GC).
1. Biased Lock (no contention, ~1 ns)
Most locks are only ever acquired by one thread. The first thread "biases" the object by writing its thread ID into the mark word. Subsequent lock/unlock by the same thread is nearly free (just check thread ID, no CAS):
```
First encounter (anonymous biased object):
    mark word = [no thread ID | biased:1 | 01]
    → simple store: write Thread-0's ID into mark word (no CAS, no atomic op)

Thread-0 locks again:
    → check: mark word thread ID == my ID? → YES → enter (~1 ns, just a compare)

Thread-0 unlocks:
    → do nothing (mark word stays biased to Thread-0)
```
The entire point of biased locking is zero atomic operations. CAS only appears when revoking bias (escalating to lightweight).
Bias revocation happens at a JVM safepoint (all threads paused) --- see "Safepoint and Bias Revocation" below.
Note: biased locking was deprecated in Java 15 (-XX:-UseBiasedLocking) and removed in Java 18 --- modern CPUs made CAS cheap enough that the safepoint cost wasn't worth it.
Why Unlock Does Nothing (Biased Lock)
The JVM doesn't clear the thread ID on unlock. The object stays "biased to Thread-0" permanently:
```
Before lock:  mark word = [Thread-0 ID | biased:1 | 01]
During lock:  mark word = [Thread-0 ID | biased:1 | 01]  (same!)
After unlock: mark word = [Thread-0 ID | biased:1 | 01]  (still same!)
```
No visible difference between locked and unlocked --- the JVM doesn't track whether the biased lock is currently held. Compare with other lock levels:
```
Lightweight unlock: CAS restores original mark word  ← actual work
Heavyweight unlock: ObjectMonitor._owner = null      ← actual work
Biased unlock:      nothing                          ← zero work
```
That's why biased locking is so fast --- unlock is literally zero work. Next lock is just "is my thread ID still there?" → yes → enter.
Bias Is Permanent Until Explicitly Revoked
The bias stays in the mark word indefinitely --- NOT removed by GC safepoints or time:
```
t=0:  Thread-0 locks obj → bias set to Thread-0
t=1:  Thread-0 unlocks → mark word unchanged
t=5:  GC safepoint → all threads pause → GC runs → resume
      → obj's bias? UNTOUCHED (GC doesn't revoke bias)
t=10: Thread-0 locks again → check thread ID → match → enter (~1 ns)
t=20: Thread-1 tries to lock obj → TRIGGERS revocation safepoint
      → bias revoked → escalated to lightweight → bias gone forever
```
Bias is revoked ONLY when:
- Another thread tries to lock this object
- `hashCode()` is called (needs the bits the thread ID occupies)
- `wait()` / `notify()` is called (needs an ObjectMonitor)
- The bulk revocation threshold is hit (see below)
Bulk Revocation
If too many objects of the same class get their bias revoked, the JVM disables biased locking for that entire class:
```
obj1 of class Foo: bias revoked (Thread-1 contended)
obj2 of class Foo: bias revoked (Thread-2 contended)
... ~20 revocations ...
JVM: "class Foo has too much contention"
→ disable bias for ALL Foo objects
→ new Foo() starts as unlocked (no bias), skips straight to lightweight
```
This prevents repeated expensive safepoints for classes that are inherently contended.
Safepoint and Bias Revocation
When Thread-1 tries to acquire a lock biased to Thread-0, the JVM must revoke the bias. This requires a safepoint --- a stop-the-world pause where all Java threads are suspended:
```
Normal execution:
Thread-0: [running]──[running]──[running]
Thread-1: [running]──[running]──[running]

Safepoint triggered (bias revocation):
Thread-0: [running]──[poll]──|STOPPED|──────|resume|──[running]
Thread-1: [running]──[poll]──|STOPPED|──────|resume|──[running]
                                 ↑              ↑
                           all stopped      all resume
                        JVM revokes bias
```
The JVM inserts "safepoint polls" at method returns, loop back-edges, etc. When a safepoint is requested, each thread hits its next poll and pauses.
Revocation steps:
```
Thread-1 wants to lock obj (biased to Thread-0):
1. JVM requests safepoint → ALL threads pause
2. JVM inspects Thread-0's stack:
   → Thread-0 inside synchronized(obj)? → escalate to lightweight (set up lock record)
   → Thread-0 NOT inside?               → clear bias (set mark word to unlocked)
3. ALL threads resume
4. Thread-1 CASes for the lightweight lock
```
Other things that trigger safepoints: GC, thread dump (jstack), code deoptimization, class redefinition.
Why Biased Locking Was Removed (Java 18)
Biased lock tradeoff:
```
No contention ever:      save ~10 ns per lock/unlock (skip CAS)
Contention happens once: pay ~1-10 ms (safepoint for revocation)
Break-even:              ~100,000 uncontended lock/unlock operations per revocation
```
A CAS (~10-20 ns) is way cheaper than a safepoint (~1-10 ms). The optimization only pays off if contention literally never happens. Modern CPUs made CAS fast enough that the complexity and safepoint cost weren't worth it.
2. Lightweight Lock (brief contention, ~10-20 ns)
When bias is revoked, the lock escalates to lightweight. Each thread creates a "lock record" on its stack and tries to CAS the mark word:
```
Thread-1 stack:                      Object header:
┌──────────────┐                     ┌──────────────────────┐
│ Lock Record  │ ←───────────────────│ [ptr to lock record] │
│ (saved mark) │                     │ [00] lightweight tag │
└──────────────┘                     └──────────────────────┘
```
The lock record saves the original mark word (displaced mark) so it can be restored on unlock. Lock and unlock are each a single CAS:
```
Lock:   CAS(mark word, unlocked value, ptr to my lock record)
Unlock: CAS(mark word, ptr to my lock record, saved mark)
```
If the unlock CAS fails, it means the lock was inflated to heavyweight while held --- fall through to heavyweight unlock.
3. Heavyweight Lock (sustained contention, ~1-10 μs)
When lightweight CAS keeps failing (or adaptive spinning is exhausted), the JVM inflates to an ObjectMonitor --- a C++ structure backed by OS mutex:
```
ObjectMonitor:
┌──────────────────────────────────────────────────────┐
│ _owner:      Thread-0       (who holds the lock)     │
│ _recursions: 1              (reentrant depth)        │
│ _count:      3              (number of waiters)      │
│                                                      │
│ _cxq:        [T-3] → [T-2]  (contention queue,       │
│                              LIFO, CAS push)         │
│ _EntryList:  [T-1]          (entry list, moved       │
│                              from _cxq on unlock)    │
│ _WaitSet:    [T-4] → [T-5]  (threads that called     │
│                              wait())                 │
└──────────────────────────────────────────────────────┘
```
Adaptive Spinning
Before escalating from lightweight to heavyweight, the JVM spins --- but how long? Modern JVMs use adaptive spinning:
```
Previous spin succeeded on this lock? → spin longer this time
Previous spin failed?                 → spin shorter or skip entirely
Lock holder is running on a CPU?      → spin (likely to release soon)
Lock holder is parked/blocked?        → don't spin (waste of CPU)
```
The JVM profiles lock behavior at runtime and adjusts. This is why synchronized performance can be hard to predict --- it depends on runtime contention patterns.
What monitorenter Does (Full Flow)
```
monitorenter(obj):
│
├── Mark word biased to me?
│     → enter (no CAS, ~1 ns)
│
├── Mark word unlocked?
│     → CAS mark word → ptr to my lock record
│        → SUCCESS: lightweight lock (~10-20 ns)
│        → FAIL: spin (adaptive) → still locked? → inflate
│
├── Mark word lightweight locked by someone else?
│     → spin (adaptive)
│     → still locked? → inflate to ObjectMonitor
│        → CAS ObjectMonitor._owner
│           → FAIL: push onto _cxq → park (~1-10 μs)
│
└── Mark word heavyweight (ObjectMonitor)?
      → CAS ObjectMonitor._owner
         → SUCCESS: enter
         → FAIL: push onto _cxq → park
```
What monitorexit Does (Full Flow)
```
monitorexit(obj):
│
├── Biased lock?
│     → do nothing (mark word stays biased)
│
├── Lightweight lock?
│     → CAS restore saved mark from lock record
│        → SUCCESS: unlocked
│        → FAIL: inflated while held → heavyweight unlock
│
└── Heavyweight (ObjectMonitor)?
      → _recursions > 0? → decrement, still held (reentrant)
      → _recursions == 0?
         → _owner = null
         → _cxq not empty? → move to _EntryList
         → unpark head of _EntryList
```
ObjectMonitor: Entry Queue vs Wait Set
Entry Queue (_cxq + _EntryList)
"I want the lock but someone else has it."
Two sub-queues for performance --- separating the "arrive" path from the "wake" path to avoid contention:
```
_cxq (contention queue):
→ WHERE threads arrive (enqueue)
→ LIFO stack (push to head via CAS)
→ lock-free --- arriving threads only CAS the _cxq head pointer

_EntryList:
→ WHERE threads get woken from (dequeue)
→ FIFO list
→ only touched by the unlocking thread (no contention)
```
Why Two Queues?
If arriving threads pushed directly onto _EntryList, they'd contend with the unlocking thread:
```
Bad: both touch _EntryList
Thread-B (arrive): CAS append to _EntryList tail
Thread-A (unlock): read _EntryList head to unpark
→ both need atomic access to the same pointer → CONTENTION

Good: separate paths
Thread-B (arrive): CAS push to _cxq head  (different pointer)
Thread-A (unlock): read _EntryList head   (different pointer)
→ no contention
```
Same principle as CLH queue's "enqueue at tail, dequeue at head" (see aqs.md).
How They Work Together
```
Thread-B arrives:
CAS push onto _cxq head (lock-free, ~10 ns)
_cxq: [T-B] → [T-C] → [T-D]

Thread-A unlocks:
1. Grab entire _cxq (one CAS: _cxq = null)
2. Reverse it (LIFO → FIFO) and append to _EntryList
3. Unpark head of _EntryList
_cxq: (empty)
_EntryList: [T-D] → [T-C] → [T-B]   ← reversed for fairness
             ↑
        unpark this one
```
Detailed Timeline
```
Time 0: Thread-0 holds lock
        _cxq: (empty)            _EntryList: (empty)

Time 1: Thread-1 arrives → CAS push _cxq
        _cxq: [T-1]              _EntryList: (empty)

Time 2: Thread-2 arrives → CAS push _cxq
        _cxq: [T-2]→[T-1]        _EntryList: (empty)   ← LIFO: T-2 at head

Time 3: Thread-3 arrives → CAS push _cxq
        _cxq: [T-3]→[T-2]→[T-1]  _EntryList: (empty)

Time 4: Thread-0 unlocks:
        grab _cxq → reverse → _EntryList = [T-1]→[T-2]→[T-3]
        unpark(T-1)
        _cxq: (empty)            _EntryList: [T-1]→[T-2]→[T-3]

Time 5: Thread-4 arrives while T-1 holds lock
        _cxq: [T-4]              _EntryList: [T-2]→[T-3]   ← new arrivals go to _cxq

Time 6: T-1 unlocks:
        grab _cxq [T-4] → append to _EntryList → [T-2]→[T-3]→[T-4]
        unpark(T-2)
```
Why LIFO for _cxq?
CAS push to head is the simplest possible lock-free enqueue --- one atomic operation:
```java
// _cxq push --- one CAS (HotSpot-style pseudocode)
do {
    node.next = _cxq;                  // point to current head
} while (!CAS(_cxq, node.next, node)); // swap head to me
```
LIFO means the most recent arrival is at the head. The reversal step on unlock converts to FIFO for fairness. Without reversal, the last thread to arrive would be woken first (unfair).
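The same lock-free LIFO push can be sketched in plain Java with `AtomicReference` --- this is a Treiber stack, the textbook version of the `_cxq` enqueue. The `Node` class here is illustrative, not the JVM's internal type:

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free LIFO stack --- the same shape as the _cxq push above.
public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        do {
            node.next = head.get();                     // point at current head
        } while (!head.compareAndSet(node.next, node)); // swap head to me
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) return null;                 // empty stack
        } while (!head.compareAndSet(h, h.next));
        return h.value;
    }
}
```

Pop returns the most recent push first --- which is exactly why the JVM has to reverse `_cxq` before moving it to `_EntryList` if it wants FIFO fairness.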
Wait Set (_WaitSet)
"I had the lock but called wait()."
```java
synchronized (obj) {
    while (!condition)
        obj.wait(); // release lock → wait set → park
}
```
wait() atomically: releases the lock → moves thread to _WaitSet → parks.
notify() / notifyAll() --- Transfer Between Queues
notify() moves ONE thread from wait set to entry queue. notifyAll() moves ALL:
```
Before notify():
_EntryList: [T-1] → [T-2]
_WaitSet:   [T-3] → [T-4]

After notify():
_EntryList: [T-1] → [T-2] → [T-3]   ← moved
_WaitSet:   [T-4]

After notifyAll():
_EntryList: [T-1] → [T-2] → [T-3] → [T-4]
_WaitSet:   (empty)
```
notifyAll() Wasteful Cycle
All threads move to entry queue, compete one by one. Threads whose condition isn't met go right back:
```
notifyAll() → move ALL from wait set to entry queue
Consumer-1: acquires lock → queue.isEmpty()? NO  → removes item → done ✅
Consumer-2: acquires lock → queue.isEmpty()? YES → wait() → back to wait set 🔄
Consumer-3: acquires lock → queue.isEmpty()? YES → wait() → back to wait set 🔄
```
Only 1 thread does useful work, others round-trip. This is why ReentrantLock with multiple conditions is better --- signal() wakes exactly the right type (see lock.md).
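A minimal sketch of that alternative: a bounded buffer with two `Condition` queues on one `ReentrantLock`, so producers wait on `notFull` and consumers on `notEmpty`, and `signal()` wakes exactly one thread of the right kind (class and field names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Bounded buffer with separate condition queues per waiter type.
public class BoundedBuffer<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition(); // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await(); // guard, not if
            items.add(item);
            notEmpty.signal(); // wake one consumer, never a producer
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();
            T item = items.remove();
            notFull.signal(); // wake one producer
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

With one intrinsic wait set this split is impossible --- `notify()` might wake a producer when a consumer is needed, which is why `notifyAll()` is the safe (wasteful) default for `synchronized`.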
wait() / notify() / notifyAll() Details
Must be called inside synchronized block on the same object (otherwise IllegalMonitorStateException):
```java
// Producer
synchronized (queue) {
    while (queue.isFull())
        queue.wait();      // releases lock, thread sleeps
    queue.add(item);
    queue.notifyAll();     // wake up consumers
}

// Consumer
synchronized (queue) {
    while (queue.isEmpty())
        queue.wait();      // releases lock, thread sleeps
    item = queue.remove();
    queue.notifyAll();     // wake up producers
}
```
Why `while`, not `if`? Three reasons:
- Spurious wakeups --- a thread can wake without `notify()` (the JVM/OS allows this)
- `notifyAll()` wakes everyone --- the condition may not be met for this particular thread
- Between `notify()` and re-acquiring the lock, another thread may have changed the state
wait() Internals
```
obj.wait():
1. Check current thread owns obj's monitor (else IllegalMonitorStateException)
2. Save recursion count
3. Set _recursions = 0, _owner = null (fully release)
4. Move thread to _WaitSet
5. park()
   ... sleeping ...
6. notify()/notifyAll() → move to _EntryList
7. Re-acquire monitor (compete in entry queue)
8. Restore recursion count
9. Continue after wait()
```
notify() vs notifyAll() --- When to Use Which
```
notify():
✅ Efficient --- wakes only 1 thread
❌ Risky --- might wake the wrong type (producer vs consumer)
Use when: all waiters are equivalent (same condition)

notifyAll():
✅ Safe --- guarantees the right thread eventually runs
❌ Wasteful --- wakes all, most go back to sleep
Use when: different types of waiters share one wait set (default choice)
```
In practice, almost always use notifyAll() with synchronized. Or better: use ReentrantLock with separate Condition queues and signal().
sleep() vs wait() --- Lock Behavior
Thread.sleep() keeps the lock. Object.wait() releases it:
```java
synchronized (obj) {
    Thread.sleep(5000); // sleeps 5s, HOLDS the lock
}

synchronized (obj) {
    obj.wait(5000);     // sleeps 5s, RELEASES the lock
}
```
```
Thread.sleep() inside synchronized:
Thread-0: [===== holding lock, sleeping 5s =====]
Thread-1: [blocked][blocked][blocked][blocked]     → can't enter for 5s

obj.wait() inside synchronized:
Thread-0: [lock]→[released, sleeping]→[re-acquire]→[continue]
Thread-1:        [enters synchronized]→[works]→[exits]
```
| | Thread.sleep() | Object.wait() |
|---|---|---|
| Releases lock | No | Yes |
| Needs synchronized | No | Yes (else IllegalMonitorStateException) |
| Woken by | Timeout or interrupt | notify(), notifyAll(), timeout, or interrupt |
| Thread state | TIMED_WAITING | WAITING or TIMED_WAITING |
| Use case | Just pause | Coordinate between threads |
Never use sleep() inside synchronized for coordination --- use wait().
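The thread-state row of the table is directly observable. A small sketch (timings are generous to keep it stable; names are illustrative):

```java
// A sleeping thread is TIMED_WAITING; a thread parked in wait() with no
// timeout is WAITING. Both are visible via Thread.getState().
public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) {}
        });
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
            }
        });

        sleeper.start();
        waiter.start();
        Thread.sleep(500); // give both threads time to park

        System.out.println(sleeper.getState()); // TIMED_WAITING (sleep has a timeout)
        System.out.println(waiter.getState());  // WAITING (wait() with no timeout)

        sleeper.interrupt(); // clean up
        waiter.interrupt();
    }
}
```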
synchronized vs ReentrantLock Comparison
```
synchronized:
monitorenter → Biased? (~1 ns) → Lightweight CAS? (~10-20 ns) → Heavyweight park (~1-10 μs)

ReentrantLock.lock():
CAS(state, 0, 1) (~10-20 ns) → fail? → CLH queue → park (~1-10 μs)
```
| Scenario | synchronized | ReentrantLock.lock() |
|---|---|---|
| No contention, single thread | ~1 ns (biased) | ~10-20 ns (CAS) |
| No contention, multiple threads | ~10-20 ns (lightweight) | ~10-20 ns (CAS) |
| Contention | ~1-10 μs (OS park) | ~1-10 μs (AQS park) |
| Feature | synchronized | ReentrantLock |
|---|---|---|
| Syntax | Block/method keyword | Explicit lock()/unlock() |
| Release | Automatic | Manual (finally required) |
| tryLock | No | Yes |
| Timed/interruptible | No | Yes |
| Fair ordering | No | Optional |
| Multiple conditions | No (1 wait set) | Yes (N condition queues) |
| JVM optimization | Biased/lightweight | N/A (always CAS) |
Default to synchronized. Upgrade to ReentrantLock when you need tryLock, timeout, fairness, or multiple conditions.
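One of those upgrade triggers, tryLock with a timeout, looks like this in a minimal sketch (class and method names are illustrative) --- it is the ability `synchronized` simply cannot express: attempt the lock, then give up instead of blocking forever:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// tryLock: bounded wait for the lock instead of blocking indefinitely.
public class TryLockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean updateIfAvailable() throws InterruptedException {
        if (lock.tryLock(100, TimeUnit.MILLISECONDS)) { // wait at most 100 ms
            try {
                // ... critical section ...
                return true;
            } finally {
                lock.unlock(); // manual release --- always in finally
            }
        }
        return false; // lock not acquired in time; caller decides what to do
    }
}
```

The `finally` block is the price of explicit locking: forget it and the lock leaks, whereas `monitorexit` is emitted for you on every exit path of a `synchronized` block.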
Reentrancy
synchronized is reentrant --- the same thread can enter the same lock multiple times without deadlocking:
```java
synchronized (lock) {         // count = 1
    synchronized (lock) {     // count = 2 (same thread, no deadlock)
        synchronized (lock) { // count = 3
        }                     // count = 2
    }                         // count = 1
}                             // count = 0 → released
```
This is tracked differently at each lock level:
- Biased: no counter needed (the thread ID check is enough; the JVM tracks depth via stack frames)
- Lightweight: recursive lock records stacked on the thread's stack (each nested entry pushes a lock record with a null displaced mark)
- Heavyweight: the `ObjectMonitor._recursions` counter

Lightweight reentrant locking:
```
Thread-0 stack:
┌──────────────────┐
│ Lock Record #3   │ ← 3rd synchronized (displaced mark = null, recursive)
│ (null)           │
├──────────────────┤
│ Lock Record #2   │ ← 2nd synchronized (displaced mark = null, recursive)
│ (null)           │
├──────────────────┤
│ Lock Record #1   │ ← 1st synchronized (displaced mark = original mark word)
│ (saved mark)     │
└──────────────────┘

Unlock: pop lock records one by one
#3: null → recursive, do nothing
#2: null → recursive, do nothing
#1: has saved mark → CAS restore original mark word → unlocked
```
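Reentrancy is easy to observe with ReentrantLock, which exposes the hold count that `synchronized` only tracks internally (`_recursions` / stacked lock records). A small sketch with illustrative names:

```java
import java.util.concurrent.locks.ReentrantLock;

// Recursively re-acquire the same lock; the same thread never deadlocks,
// the hold count just goes up and comes back down.
public class ReentrancyDemo {
    private final ReentrantLock lock = new ReentrantLock();

    // Returns the deepest hold count reached.
    public int maxHold(int depth) {
        lock.lock(); // same thread: increments the hold count, no blocking
        try {
            int here = lock.getHoldCount();
            return depth > 1 ? Math.max(here, maxHold(depth - 1)) : here;
        } finally {
            lock.unlock(); // each lock() is paired with one unlock()
        }
    }
}
```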
Memory Semantics (Happens-Before)
synchronized provides a happens-before guarantee --- everything before monitorexit is visible to any thread that subsequently executes monitorenter on the same object:
```java
// Thread A
x = 42;                        // (1)
synchronized (lock) {          // (2) monitorenter
    sharedFlag = true;         // (3)
}                              // (4) monitorexit --- flushes (1), (3) to main memory

// Thread B
synchronized (lock) {          // (5) monitorenter --- invalidates cache, reads from main memory
    if (sharedFlag) {          // (6) sees true
        System.out.println(x); // (7) guaranteed to see 42
    }
}
```
monitorenter acts as a load barrier (invalidate cache), monitorexit acts as a store barrier (flush to main memory). Together they guarantee visibility across threads --- same as volatile but for an entire critical section.
```
monitorexit (Thread A):
── StoreStore barrier ── all writes before this are flushed
── StoreLoad barrier  ── subsequent loads see the flushed writes

monitorenter (Thread B):
── LoadLoad barrier  ── invalidate cached values
── LoadStore barrier ── don't reorder loads past later stores
```
JIT Optimizations
The JVM's JIT compiler can optimize synchronized blocks at runtime:
Lock Elimination
If escape analysis proves the lock object never escapes the current thread, the JIT removes the lock entirely:
```java
void doWork() {
    Object lock = new Object(); // lock is local, never escapes
    synchronized (lock) {       // JIT removes this --- no other thread can see lock
        compute();
    }
}

// After JIT optimization:
void doWork() {
    compute(); // synchronized removed entirely
}
```
Lock Coarsening
If the JIT sees adjacent synchronized blocks on the same object, it merges them into one:
```java
// Before JIT:
synchronized (lock) { step1(); }
synchronized (lock) { step2(); }
synchronized (lock) { step3(); }

// After JIT coarsening:
synchronized (lock) {
    step1();
    step2();
    step3();
}
```
This avoids repeated lock/unlock overhead. The JIT also coarsens locks inside loops:
```java
// Before:
for (int i = 0; i < 100; i++) {
    synchronized (lock) { list.add(i); } // 100 lock/unlock cycles
}

// After JIT coarsening:
synchronized (lock) {
    for (int i = 0; i < 100; i++) {
        list.add(i);                     // 1 lock/unlock cycle
    }
}
```
Nested Lock Elimination
If the JIT detects nested locks on the same object, it eliminates the inner lock:
```java
synchronized (lock) {
    synchronized (lock) { // JIT knows we already hold this lock
        doWork();         // inner synchronized removed
    }
}
```
These optimizations are why synchronized can be surprisingly fast in practice --- the JIT adapts to actual runtime behavior.
Common Pitfalls
1. Synchronizing on wrong object
```java
// BAD --- boxed Integers are broken locks: small values come from a shared
// JVM-wide cache (unrelated code may lock the very same object), larger
// values box to a fresh object each time (threads may not share a lock at all)
synchronized (Integer.valueOf(1)) { }

// BAD --- String literals are interned, shared across classes
synchronized ("lock") { }

// GOOD --- dedicated lock object
private final Object lock = new Object();
synchronized (lock) { }
```
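The Integer half of this pitfall is easy to verify. Under default JVM settings, `Integer.valueOf` caches -128..127 (guaranteed by the JLS), while larger values box to fresh objects:

```java
// Why a boxed Integer is a broken lock, shown with identity comparisons.
public class BoxedLockPitfall {
    public static void main(String[] args) {
        // Small values: one cached object shared JVM-wide.
        System.out.println(Integer.valueOf(1) == Integer.valueOf(1));       // true
        // Outside the cache: every boxing creates a new object.
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000)); // false
        // Either way the lock is wrong: shared with strangers, or not shared at all.
    }
}
```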
2. Synchronizing on this in public class
```java
// BAD --- external code can also synchronize on your instance
public class MyService {
    public synchronized void doWork() { } // locks on `this`
}

// External code can interfere:
synchronized (myService) { Thread.sleep(Long.MAX_VALUE); } // blocks all doWork() calls

// GOOD --- private lock object, external code can't interfere
public class MyService {
    private final Object lock = new Object();
    public void doWork() {
        synchronized (lock) { /* ... */ }
    }
}
```
3. Holding lock during I/O or long operations
```java
// BAD --- blocks all other threads during the network call
synchronized (lock) {
    result = httpClient.call(url); // 500ms+ network latency
}

// GOOD --- minimize the synchronized scope
Data data;
synchronized (lock) {
    data = sharedState.copy();       // fast, just copy
}
result = httpClient.call(url, data); // outside the lock
synchronized (lock) {
    sharedState.update(result);      // fast, just update
}
```
4. Nested synchronized --- deadlock risk
```java
// Thread A: synchronized(X) → synchronized(Y)
// Thread B: synchronized(Y) → synchronized(X)
// → DEADLOCK

// Fix: always acquire locks in a consistent order,
// or use ReentrantLock.tryLock() for deadlock prevention (see lock.md)
```
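The consistent-order fix can be sketched with a helper that picks a canonical order --- here by `System.identityHashCode` (an assumption for illustration; a real version needs a tie-breaker lock for the rare hash collision):

```java
// Acquire two monitors in a canonical order so every thread nests them
// the same way and no lock cycle can form.
public class OrderedLocks {
    public static void withBoth(Object a, Object b, Runnable critical) {
        Object first = a, second = b;
        if (System.identityHashCode(a) > System.identityHashCode(b)) {
            first = b;            // swap so all callers lock in the same order
            second = a;
        }
        synchronized (first) {
            synchronized (second) {
                critical.run();
            }
        }
    }
}
```

Even if Thread A calls `withBoth(X, Y, ...)` and Thread B calls `withBoth(Y, X, ...)`, both end up locking in the same order internally, so the A-waits-for-B-waits-for-A cycle cannot occur.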