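scala
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.{Lock, ReentrantLock}

import kafka.utils.Logging
import kafka.utils.timer.TimerTask
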
/**
 * An operation whose processing needs to be delayed for at most the given delayMs. For example
 * a delayed produce operation could be waiting for specified number of acks; or
 * a delayed fetch operation could be waiting for a given number of bytes to accumulate.
 *
 * The logic upon completing a delayed operation is defined in onComplete() and will be called exactly once.
 * Once an operation is completed, isCompleted() will return true. onComplete() can be triggered by either
 * forceComplete(), which forces calling onComplete() after delayMs if the operation is not yet completed,
 * or tryComplete(), which first checks if the operation can be completed or not now, and if yes calls
 * forceComplete().
 *
 * A subclass of DelayedOperation needs to provide an implementation of both onComplete() and tryComplete().
 */
abstract class DelayedOperation(override val delayMs: Long,
                                lockOpt: Option[Lock] = None)
  extends TimerTask with Logging {

  private val completed = new AtomicBoolean(false)
  private val tryCompletePending = new AtomicBoolean(false)
  // Visible for testing
  private[server] val lock: Lock = lockOpt.getOrElse(new ReentrantLock)

  /*
   * Force completing the delayed operation, if not already completed.
   * This function can be triggered when
   *
   * 1. The operation has been verified to be completable inside tryComplete()
   * 2. The operation has expired and hence needs to be completed right now
   *
   * Return true iff the operation is completed by the caller: note that
   * concurrent threads can try to complete the same operation, but only
   * the first thread will succeed in completing the operation and return
   * true, others will still return false
   */
  def forceComplete(): Boolean = {
    if (completed.compareAndSet(false, true)) {
      // cancel the timeout timer
      cancel()
      onComplete()
      true
    } else {
      false
    }
  }

  /**
   * Check if the delayed operation is already completed
   */
  def isCompleted: Boolean = completed.get()

  /**
   * Call-back to execute when a delayed operation gets expired and hence forced to complete.
   */
  def onExpiration(): Unit

  /**
   * Process for completing an operation; This function needs to be defined
   * in subclasses and will be called exactly once in forceComplete()
   */
  def onComplete(): Unit

  /**
   * Try to complete the delayed operation by first checking if the operation
   * can be completed by now. If yes execute the completion logic by calling
   * forceComplete() and return true iff forceComplete returns true; otherwise return false
   *
   * This function needs to be defined in subclasses
   */
  def tryComplete(): Boolean

  /**
   * Thread-safe variant of tryComplete() that attempts completion only if the lock can be acquired
   * without blocking.
   *
   * If threadA acquires the lock and performs the check for completion before completion criteria is met
   * and threadB satisfies the completion criteria, but fails to acquire the lock because threadA has not
   * yet released the lock, we need to ensure that completion is attempted again without blocking threadA
   * or threadB. `tryCompletePending` is set by threadB when it fails to acquire the lock and at least one
   * of threadA or threadB will attempt completion of the operation if this flag is set. This ensures that
   * every invocation of `maybeTryComplete` is followed by at least one invocation of `tryComplete` until
   * the operation is actually completed.
   */
  private[server] def maybeTryComplete(): Boolean = {
    var retry = false
    var done = false
    do {
      if (lock.tryLock()) {
        try {
          tryCompletePending.set(false)
          done = tryComplete()
        } finally {
          lock.unlock()
        }
        // While we were holding the lock, another thread may have invoked `maybeTryComplete` and set
        // `tryCompletePending`. In this case we should retry.
        retry = tryCompletePending.get()
      } else {
        // Another thread is holding the lock. If `tryCompletePending` is already set and this thread failed to
        // acquire the lock, then the thread that is holding the lock is guaranteed to see the flag and retry.
        // Otherwise, we should set the flag and retry on this thread since the thread holding the lock may have
        // released the lock and returned by the time the flag is set.
        retry = !tryCompletePending.getAndSet(true)
      }
    } while (!isCompleted && retry)
    done
  }

  /*
   * run() method defines a task that is executed on timeout
   */
  override def run(): Unit = {
    if (forceComplete())
      onExpiration()
  }
}

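This code is the implementation of DelayedOperation, a core abstract class in Apache Kafka (written in Scala). It defers an operation until a specific condition is met or a timeout expires, and it underlies DelayedProduce, DelayedFetch, DelayedHeartbeat, and similar operations.
Below I'll help you understand it in depth by walking through the design idea, the key mechanisms, the thread-safety strategy, and how it is used: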
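I. Core purpose: what is a "delayed operation"?
Delayed operation = do not finish now; complete once some condition is met (or the timeout expires).
Typical examples:
- DelayedFetch: a client fetches messages, but the broker does not yet have enough data (say the request asks for at least 10 KB). The request is parked until enough bytes have accumulated, then answered.
- DelayedProduce: the producer asks for acknowledgement from the in-sync replicas (for example with acks=all), but so far only the leader has written the record, so the broker has to wait for the followers to replicate it.
Such operations cannot complete immediately, and they cannot wait forever either, so they need to:
- wait asynchronously for the condition to be met
- set a maximum wait time (delayMs)
- run the onComplete() callback as soon as the condition is met or the timeout expires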
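II. Key components
1. Status flag
scala
private val completed = new AtomicBoolean(false)
- Marks whether the operation has completed.
- Together with the compareAndSet call in forceComplete(), it guarantees that onComplete() is called exactly once.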
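To make the "exactly once" guarantee concrete, here is a minimal, self-contained sketch of the same compare-and-set idea outside Kafka. OnceOnly and OnceOnlyDemo are hypothetical names used only for illustration; the real class also cancels its timer task inside forceComplete().

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for DelayedOperation: only the "complete once" part.
class OnceOnly(onComplete: () => Unit) {
  private val completed = new AtomicBoolean(false)

  // Returns true only for the single caller that wins the CAS.
  def forceComplete(): Boolean =
    if (completed.compareAndSet(false, true)) { onComplete(); true }
    else false

  def isCompleted: Boolean = completed.get()
}

object OnceOnlyDemo {
  // Races `threads` callers against one operation.
  // Returns (number of times the callback ran, number of callers that saw true).
  def race(threads: Int): (Int, Int) = {
    val calls = new AtomicInteger(0)
    val winners = new AtomicInteger(0)
    val op = new OnceOnly(() => { calls.incrementAndGet(); () })
    val workers = (1 to threads).map { _ =>
      new Thread(() => { if (op.forceComplete()) winners.incrementAndGet(); () })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    (calls.get(), winners.get())
  }
}
```

However many threads race, OnceOnlyDemo.race should report one callback run and one winner, because only the first compareAndSet(false, true) can succeed.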
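2. Core abstract methods (subclasses must implement them)
| Method | Purpose |
|---|---|
| onComplete(): Unit | Where the actual work happens (for example, returning the response to the client) |
| tryComplete(): Boolean | Checks whether the completion criteria are met right now and, if so, calls forceComplete() |
onExpiration(): Unit is declared abstract as well; it is a hook that runs only when the timeout was what completed the operation.
✅ This is the template method pattern: the parent class controls the flow, and subclasses supply the check and the completion logic.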
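3. Completion mechanism
- forceComplete(): force completion (thread-safe; the CAS guarantees the completion logic runs only once)
  - cancels the timer task (cancel())
  - calls onComplete()
- tryComplete(): attempt completion (the subclass implements the check)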
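4. Timeout handling
scala
override def run(): Unit = {
  if (forceComplete()) onExpiration()
}
- The class extends TimerTask, so once it has been scheduled on a timer, run() fires when delayMs has elapsed.
- On timeout, forceComplete() runs onComplete() first; onExpiration() is called afterwards, and only if this call was the one that completed the operation. Subclasses use it for timeout-specific bookkeeping (for example logging or metrics), not for producing the result.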
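The ordering of the two callbacks on timeout can be observed with a small stand-alone sketch. It assumes a plain ScheduledExecutorService in place of Kafka's own timer, and TimedOp and TimedOpDemo are hypothetical names. Unlike the real class, this sketch does not cancel the scheduled task on early completion; the timer still fires and simply loses the CAS.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for DelayedOperation that records which callbacks ran.
class TimedOp extends Runnable {
  private val completed = new AtomicBoolean(false)
  private var log = Vector.empty[String]
  val timerFired = new CountDownLatch(1)

  private def record(event: String): Unit = synchronized { log = log :+ event }
  def events: Vector[String] = synchronized { log }

  def forceComplete(): Boolean =
    if (completed.compareAndSet(false, true)) { record("onComplete"); true }
    else false

  def onExpiration(): Unit = record("onExpiration")

  // Same shape as DelayedOperation.run(): expire only if the timer won the race.
  override def run(): Unit = {
    if (forceComplete()) onExpiration()
    timerFired.countDown()
  }
}

object TimedOpDemo {
  // Schedules the operation; optionally completes it before the timer fires.
  def events(delayMs: Long, completeEarly: Boolean): Vector[String] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    try {
      val op = new TimedOp
      if (completeEarly) op.forceComplete()
      scheduler.schedule(op, delayMs, TimeUnit.MILLISECONDS)
      op.timerFired.await(5, TimeUnit.SECONDS)
      op.events
    } finally {
      scheduler.shutdownNow()
    }
  }
}
```

With completeEarly = false the recorded events should be onComplete followed by onExpiration; with completeEarly = true only onComplete is recorded, because the timer's forceComplete() returns false.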
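III. Thread-safety design: the subtlety of maybeTryComplete()
This is the most complex and most interesting part of the class. It solves a classic concurrency problem:
Several threads may try to complete the same delayed operation at the same time. How do you avoid contention and still make sure a completion attempt is never missed?
Scenario:
- Thread A holds the lock and is checking the criteria (not yet met).
- Thread B satisfies the criteria at that moment but cannot acquire the lock, so it cannot call tryComplete().
Left unhandled, nobody re-checks after the criteria become true, and the operation would sit there until the next trigger or until it times out.
Solution: the tryCompletePending flag
scala
private val tryCompletePending = new AtomicBoolean(false)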
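Logic flow (simplified):
- Try to acquire the lock without blocking:
  - Success → clear tryCompletePending, call tryComplete(), release the lock, then re-read the flag
  - Failure → set tryCompletePending = true (meaning "someone wanted to complete but could not get the lock"); retry on this thread only if the flag was not already set
- Retry loop:
  - If tryCompletePending was set while the lock was held, another thread asked for a check, so the lock holder tries once more
  - The loop ends when the operation has completed or no retry is needed
✅ This design ensures that every call to maybeTryComplete() is followed by at least one call to tryComplete() until the operation completes, so no attempt is lost.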
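The contended path is easier to follow in a runnable form. The sketch below is a trimmed-down copy of the loop, written with a plain while loop; SketchOp, SketchOpDemo, and the ready flag (standing in for the real completion criteria) are hypothetical. The demo plays thread A by holding the lock while thread B makes the operation ready and calls maybeTryComplete().

```scala
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.ReentrantLock

// Hypothetical, trimmed-down operation; `ready` stands in for the completion criteria.
class SketchOp {
  val lock = new ReentrantLock
  val ready = new AtomicBoolean(false)
  val tryCompletePending = new AtomicBoolean(false)
  private val completed = new AtomicBoolean(false)

  def isCompleted: Boolean = completed.get()
  def forceComplete(): Boolean = completed.compareAndSet(false, true)
  def tryComplete(): Boolean = if (ready.get()) forceComplete() else false

  def maybeTryComplete(): Boolean = {
    var retry = true
    var done = false
    while (retry) {
      if (lock.tryLock()) {
        try {
          tryCompletePending.set(false)
          done = tryComplete()
        } finally {
          lock.unlock()
        }
        // Someone may have failed to get the lock while we held it.
        retry = tryCompletePending.get()
      } else {
        // Retry here only if we are the first to raise the flag.
        retry = !tryCompletePending.getAndSet(true)
      }
      retry = retry && !isCompleted
    }
    done
  }
}

object SketchOpDemo {
  // Returns (result seen by thread B, flag state after B, result of the follow-up call).
  def contended(): (Boolean, Boolean, Boolean) = {
    val op = new SketchOp
    var resultB = true
    op.lock.lock() // the caller plays thread A, in the middle of its check
    try {
      val b = new Thread(() => { op.ready.set(true); resultB = op.maybeTryComplete() })
      b.start()
      b.join()
    } finally {
      op.lock.unlock()
    }
    val pendingAfterB = op.tryCompletePending.get()
    // In the real loop thread A re-reads the flag after unlocking and retries;
    // here that retry is modelled by a second call.
    val resultA = op.maybeTryComplete()
    (resultB, pendingAfterB, resultA)
  }
}
```

Thread B returns false without blocking but leaves the flag set, and the follow-up attempt by the lock holder is the one that completes the operation.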
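IV. Usage pattern (how do subclasses extend it?)
Take DelayedFetch as an example (simplified; hasEnoughData, sendResponse, and recordExpiration are placeholder helpers, not Kafka APIs):
scala
class DelayedFetch(delayMs: Long, ...) extends DelayedOperation(delayMs) {
  override def tryComplete(): Boolean = {
    // Check: has enough data accumulated?
    if (hasEnoughData()) {
      forceComplete() // triggers onComplete()
    } else {
      false
    }
  }
  override def onComplete(): Unit = {
    // Build the response and send it to the client. This runs on both paths:
    // criteria met, or timeout (then with whatever data is available).
    sendResponse(...)
  }
  override def onExpiration(): Unit = {
    // Runs after onComplete() when the timer completed the operation.
    // The response has already been sent, so only record the timeout here.
    recordExpiration(...)
  }
}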
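Trigger points:
- Active check: whenever new data arrives, Kafka calls maybeTryComplete() on the pending fetch requests that are waiting on it.
- Passive timeout: when the timer expires it calls run(), which calls forceComplete() (and therefore onComplete()) and then onExpiration().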
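The active-check path can be sketched as a small registry that maps a key to the operations waiting on it. This is a single-threaded illustration under stated assumptions: MiniPurgatory, FetchWaiter, Watched, and the "topic-0" key are hypothetical names, and Kafka's actual bookkeeping is considerably more involved.

```scala
import scala.collection.mutable

// Hypothetical minimal interface of a watched operation.
trait Watched {
  def maybeTryComplete(): Boolean
  def isCompleted: Boolean
}

// Completes once at least `minBytes` are available (single-threaded sketch, no locking).
class FetchWaiter(minBytes: Int, available: () => Int) extends Watched {
  private var completed = false
  def isCompleted: Boolean = completed
  def maybeTryComplete(): Boolean =
    if (!completed && available() >= minBytes) { completed = true; true }
    else false
}

// Hypothetical registry: operations are parked under a key and re-checked on events.
class MiniPurgatory {
  private val watchers = mutable.Map.empty[String, mutable.ListBuffer[Watched]]

  def watch(key: String, op: Watched): Unit =
    watchers.getOrElseUpdate(key, mutable.ListBuffer.empty[Watched]) += op

  // Called when something changed for `key`; returns how many operations completed now.
  def checkAndComplete(key: String): Int =
    watchers.get(key) match {
      case None => 0
      case Some(ops) =>
        val completedNow = ops.count(op => op.maybeTryComplete())
        watchers.update(key, ops.filterNot(_.isCompleted)) // drop finished operations
        completedNow
    }
}

object MiniPurgatoryDemo {
  // Returns the completion counts of two successive "data arrived" events.
  def run(): (Int, Int) = {
    var bytes = 0
    val purgatory = new MiniPurgatory
    purgatory.watch("topic-0", new FetchWaiter(10, () => bytes))
    bytes = 4
    val first = purgatory.checkAndComplete("topic-0")
    bytes = 12
    val second = purgatory.checkAndComplete("topic-0")
    (first, second)
  }
}
```

The first event leaves the fetch parked because only 4 of the required 10 bytes are there; the second event completes it and removes it from the watch list.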
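V. Relationship to Java Lock/Condition
The class uses a ReentrantLock by default (callers can pass their own lock through lockOpt), but not in the usual block-and-wait way: within this class it is only taken with tryLock(), and no Condition is involved. Its job is to:
- make the check-then-complete sequence in tryComplete() atomic
- keep multiple threads from running that sequence on the same operation at once
🔒 The lock coordinates who gets to attempt completion; the completed flag itself is guarded by the CAS in forceComplete(), not by the lock.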
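VI. Summary: the design philosophy of DelayedOperation
| Property | Description |
|---|---|
| Deferred execution | Does not respond immediately; waits for a condition or a timeout |
| Completes once | onComplete() is guaranteed to run exactly once |
| Thread-safe | CAS + lock + pending flag prevent races and missed attempts |
| Extensible | Subclasses only implement tryComplete / onComplete / onExpiration |
| Resource-friendly | Operations are force-completed on timeout, so requests do not pile up |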
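💡 An analogy
Imagine you order a steak grilled to order in a restaurant:
- tryComplete() = the chef checks whether the steak is done
- onComplete() = the steak is ready, and a waiter brings it to you
- onExpiration() = you have waited too long (say 30 minutes), so the steak is served as it is and the manager comes over to apologize
- maybeTryComplete() = several waiters want to check on the steak, but only one fits in the kitchen at a time; the others leave a note saying "check again!", so the steak never goes unattended
This pattern is very common in highly concurrent systems (Netty's Promise, Redis's blocking commands, and so on), and understanding it helps a great deal with middleware such as Kafka and RocketMQ.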