## Preface
Whenever MQ shows up in business development, our conditioned reflex is to think about avoiding duplicate message consumption and keeping consumption idempotent.

Even RocketMq, a mature message middleware in the industry, cannot completely avoid duplicate consumption in real production environments. So in this article let's walk through the situations in which RocketMq may end up consuming a message more than once.
## When duplicate consumption can happen
### 1. The Producer retries a failed send
`org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendKernelImpl`
In synchronous send mode, network jitter or similar issues can lead to a situation where the message is actually delivered, but the Producer does not receive the response in time and the request times out.

The Producer then assumes the send failed and retries it, by default up to 2 more times.
This involves the messageQueue selection strategy: when picking a messageQueue for a retry, RocketMq prefers a messageQueue on a different broker than the one chosen last time, which raises the chance that the retry succeeds.
So the retry is very likely to succeed, which means the message is sent twice, and the two copies even carry different msgIds, making duplicate consumption of the business payload very likely.
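To make the producer side concrete, here is a minimal sketch (not from the quoted source; the group name, namesrv address, topic and the order-id key are made-up example values). The synchronous-send retry count is controlled by setRetryTimesWhenSendFailed, and attaching a business key to the message gives the consumer something stable to deduplicate on, because the msgIds of the original and retried send differ.

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;

public class RetryAwareProducer {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("127.0.0.1:9876");
        // Default is 2: a send that actually succeeded but timed out
        // may be transparently re-sent up to 2 more times.
        producer.setRetryTimesWhenSendFailed(2);
        producer.start();

        // Carry a business key (e.g. an order id) so consumers can deduplicate on it;
        // msgId is useless for that, since a retried copy gets a different msgId.
        Message msg = new Message("DemoTopic", "TagA", "ORDER-20240101-0001",
                "order created".getBytes());
        SendResult result = producer.send(msg); // synchronous send, may retry on timeout
        System.out.printf("status=%s msgId=%s%n", result.getSendStatus(), result.getMsgId());

        producer.shutdown();
    }
}
```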
### 2. The Consumer throws an exception while consuming
```java
// org.apache.rocketmq.client.impl.consumer.DefaultMQPushConsumerImpl#pullMessage
public void pullMessage(final PullRequest pullRequest) {
// ......
PullCallback pullCallback = new PullCallback() {
@Override
public void onSuccess(PullResult pullResult) {
if (pullResult != null) {
pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult, subscriptionData);
switch (pullResult.getPullStatus()) {
case FOUND:
// todo: messages were pulled successfully
long prevRequestOffset = pullRequest.getNextOffset();
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
long pullRT = System.currentTimeMillis() - beginTimestamp;
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullRT);
long firstMsgOffset = Long.MAX_VALUE;
if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
} else {
firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());
boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());
// todo: submit a consume request, handing the messages to the Consumer
DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
pullResult.getMsgFoundList(),
processQueue,
pullRequest.getMessageQueue(),
dispatchToConsume);
// ......
break;
// ......
default:
break;
}
}
}
// ......
};
// ......
}
```

After the Consumer client has successfully pulled messages, it submits them for consumption via submitConsumeRequest:
```java
// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService.ConsumeRequest#run
class ConsumeRequest implements Runnable {
// messages waiting to be consumed
private final List<MessageExt> msgs;
public void run() {
// ......
ConsumeConcurrentlyStatus status = null;
try {
// .....
// todo: call back into the Consumer's listener to run the actual business logic
status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);
} catch (Throwable e) {
log.warn(String.format("consumeMessage exception: %s Group: %s Msgs: %s MQ: %s",
RemotingHelper.exceptionSimpleDesc(e),
ConsumeMessageConcurrentlyService.this.consumerGroup,
msgs,
messageQueue), e);
hasException = true;
}
// ...... status is not modified here
if (null == status) {
log.warn("consumeMessage return null, Group: {} Msgs: {} MQ: {}",
ConsumeMessageConcurrentlyService.this.consumerGroup,
msgs,
messageQueue);
// todo RECONSUME_LATER: consume again later
status = ConsumeConcurrentlyStatus.RECONSUME_LATER;
}
// ......
if (!processQueue.isDropped()) {
// todo: handle the consume result
ConsumeMessageConcurrentlyService.this.processConsumeResult(status, context, this);
} else {
log.warn("processQueue is dropped without process consume result. messageQueue={}, msgs={}", messageQueue, msgs);
}
}
}
```

From the code above we can see that when listener.consumeMessage throws an exception, status stays null, and a null status is then reset to RECONSUME_LATER.

Next, in processConsumeResult, where the consume result is handled 👇🏻
```java
public void processConsumeResult(
final ConsumeConcurrentlyStatus status,
final ConsumeConcurrentlyContext context,
final ConsumeRequest consumeRequest
) {
// defaults to Integer.MAX_VALUE
int ackIndex = context.getAckIndex();
if (consumeRequest.getMsgs().isEmpty())
return;
switch (status) {
case CONSUME_SUCCESS: // consumed successfully
if (ackIndex >= consumeRequest.getMsgs().size()) {
ackIndex = consumeRequest.getMsgs().size() - 1;
}
int ok = ackIndex + 1;
int failed = consumeRequest.getMsgs().size() - ok;
this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
break;
case RECONSUME_LATER: // consume again later
// reset ackIndex to -1
ackIndex = -1;
this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
consumeRequest.getMsgs().size());
break;
default:
break;
}
switch (this.defaultMQPushConsumer.getMessageModel()) {
case BROADCASTING:
for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
MessageExt msg = consumeRequest.getMsgs().get(i);
log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
}
break;
case CLUSTERING: // cluster mode
List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
// todo: if consumption succeeded, ackIndex = consumeRequest.getMsgs().size() - 1, so the for loop below is skipped
// if consumption failed, ackIndex = -1, so ackIndex + 1 = 0 and the loop sends every message back
for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
MessageExt msg = consumeRequest.getMsgs().get(i);
// todo: send the message back to the broker, to be consumed again later
boolean result = this.sendMessageBack(msg, context);
if (!result) {
msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
msgBackFailed.add(msg);
}
}
if (!msgBackFailed.isEmpty()) {
consumeRequest.getMsgs().removeAll(msgBackFailed);
this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
}
break;
default:
break;
}
// .....
}
```

As the source shows, when status is RECONSUME_LATER, ackIndex is set to -1.

In cluster mode the consumer then iterates over all of the pulled messages and sends each one back to the broker to be consumed again later.

The problem in this scenario is that messages are pulled in batches, and a single pull can return several messages. If an exception happens while consuming the 2nd, 3rd, ... message, the whole batch is sent back and re-consumed, so the messages that had already been consumed successfully get consumed again.

Now return to the submitConsumeRequest mentioned earlier, shown in the source below. Although each pull can return several messages, submitConsumeRequest splits them according to a key parameter: consumeBatchSize.

consumeBatchSize controls how many messages are passed to the listener per consume request and defaults to 1, i.e. only one message is consumed per call. With the default value the batch scenario described above does not occur, so this particular cause of duplicate consumption disappears.
```java
// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService#submitConsumeRequest
public void submitConsumeRequest(
final List<MessageExt> msgs,
final ProcessQueue processQueue,
final MessageQueue messageQueue,
final boolean dispatchToConsume) {
// todo: defaults to 1
final int consumeBatchSize = this.defaultMQPushConsumer.getConsumeMessageBatchMaxSize();
if (msgs.size() <= consumeBatchSize) {
ConsumeRequest consumeRequest = new ConsumeRequest(msgs, processQueue, messageQueue);
try {
this.consumeExecutor.submit(consumeRequest);
} catch (RejectedExecutionException e) {
this.submitConsumeRequestLater(consumeRequest);
}
} else {
// consumeBatchSize < msgs.size()
for (int total = 0; total < msgs.size(); ) {
List<MessageExt> msgThis = new ArrayList<MessageExt>(consumeBatchSize);
// consume consumeBatchSize messages per request
for (int i = 0; i < consumeBatchSize; i++, total++) {
if (total < msgs.size()) {
msgThis.add(msgs.get(total));
} else {
break;
}
}
// with the default, msgThis holds only one message
ConsumeRequest consumeRequest = new ConsumeRequest(msgThis, processQueue, messageQueue);
try {
this.consumeExecutor.submit(consumeRequest);
} catch (RejectedExecutionException e) {
for (; total < msgs.size(); total++) {
msgThis.add(msgs.get(total));
}
this.submitConsumeRequestLater(consumeRequest);
}
}
}
}
```
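As a consumer-side illustration, here is a hedged sketch (group name, namesrv address, topic and the handleBusiness helper are placeholders, not the article's code). It keeps consumeMessageBatchMaxSize at the default of 1 and shows how the listener's return value feeds the status handling above: returning RECONSUME_LATER, or letting an exception escape (which the framework turns into RECONSUME_LATER), sends the messages back for redelivery.

```java
import java.util.List;

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;

public class DemoConsumer {
    public static void main(String[] args) throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
        consumer.setNamesrvAddr("127.0.0.1:9876");
        consumer.subscribe("DemoTopic", "*");
        // Keep the default of 1 message per consume call; raising this widens the
        // "whole batch is re-consumed on one failure" window described above.
        consumer.setConsumeMessageBatchMaxSize(1);

        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, context) -> {
            for (MessageExt msg : msgs) {
                try {
                    handleBusiness(msg); // hypothetical, idempotent business handler
                } catch (Exception e) {
                    // ask the broker to redeliver the messages later
                    return ConsumeConcurrentlyStatus.RECONSUME_LATER;
                }
            }
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });

        consumer.start();
    }

    private static void handleBusiness(MessageExt msg) {
        // placeholder: business logic keyed on the producer-set business key
        System.out.printf("consume keys=%s msgId=%s%n", msg.getKeys(), msg.getMsgId());
    }
}
```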
### 3. The Consumer fails to commit the offset

If you are not yet familiar with RocketMq's offSet mechanism, read up on it first -> RocketMq offset management mechanism.

Continuing from the code above: still inside processConsumeResult, after the messages have been handled, updateOffset is eventually called:
```java
// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService#processConsumeResult
public void processConsumeResult(
final ConsumeConcurrentlyStatus status,
final ConsumeConcurrentlyContext context,
final ConsumeRequest consumeRequest
) {
// ......
long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
// todo: update the offset
this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
}
}
```

Broadcast mode uses LocalFileOffsetStore, cluster mode uses RemoteBrokerOffsetStore. Let's take cluster mode as the example.
```java
public class RemoteBrokerOffsetStore implements OffsetStore {
// key: messageQueue, value: its current offset
private ConcurrentMap<MessageQueue, AtomicLong> offsetTable =
new ConcurrentHashMap<MessageQueue, AtomicLong>();
@Override
public void updateOffset(MessageQueue mq, long offset, boolean increaseOnly) {
if (mq != null) {
// todo: get the current offset of this mq from offsetTable
AtomicLong offsetOld = this.offsetTable.get(mq);
if (null == offsetOld) {
// if absent, initialize it
offsetOld = this.offsetTable.putIfAbsent(mq, new AtomicLong(offset));
}
if (null != offsetOld) {
if (increaseOnly) {
// todo: update the offset
MixAll.compareAndIncreaseOnly(offsetOld, offset);
} else {
offsetOld.set(offset);
}
}
}
}
}
```

As you can see, the offset ends up being updated only in the in-memory offsetTable.

When the Consumer starts, it schedules a task that syncs the in-memory offsetTable to the broker every 5s (by default):
```java
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
// todo: sync the in-memory offsetTable to the broker every 5s
MQClientInstance.this.persistAllConsumerOffset();
} catch (Exception e) {
log.error("ScheduledTask persistAllConsumerOffset exception", e);
}
}
}, 1000 * 10, this.clientConfig.getPersistConsumerOffsetInterval(), TimeUnit.MILLISECONDS);
```

This opens a window for a problem!

After processing messages, the Consumer does not report the offset to the broker in real time, only via this scheduled task. If the service crashes before the latest offset has been synced to the broker, then after a restart consumption can only resume from the broker's older offset, and the messages in between are consumed again.
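This window cannot be closed entirely, but the sync interval itself is configurable. A small, hedged sketch (the 2000 ms value is illustrative): shortening persistConsumerOffsetInterval reduces how much progress can be lost on a crash, at the cost of more offset traffic; it does not remove the need for idempotent consumption.

```java
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;

public class OffsetSyncTuning {
    public static void main(String[] args) {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
        // Default is 5000 ms: the scheduled task above reports offsets to the broker every 5s.
        // Lowering it shrinks (but never eliminates) the window of offsets lost on a crash.
        consumer.setPersistConsumerOffsetInterval(2000);
        System.out.println("persist interval = " + consumer.getPersistConsumerOffsetInterval() + " ms");
    }
}
```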
### 4. The Broker fails to persist the offset

Continuing from the Consumer syncing its offset to the broker above:
```java
public class ConsumerOffsetManager extends ConfigManager {
// todo key: topic@group, value: a map of <messageQueueId, offset>
protected ConcurrentMap<String/* topic@group */, ConcurrentMap<Integer, Long>> offsetTable =
new ConcurrentHashMap<String, ConcurrentMap<Integer, Long>>(512);
}
```

After the broker receives the Consumer's offset-sync request, it likewise only updates its local in-memory offsetTable.

It also runs a scheduled task, every 5s by default, that persists the in-memory offsetTable to a file:
```java
// org.apache.rocketmq.broker.BrokerController#initialize
public boolean initialize() throws CloneNotSupportedException {
// ......
this.scheduledExecutorService.scheduleAtFixedRate(() -> {
try {
BrokerController.this.consumerOffsetManager.persist();
} catch (Throwable e) {
log.error("schedule persist consumerOffset error.", e);
}
}, 1000 * 10, this.brokerConfig.getFlushConsumerOffsetInterval(), TimeUnit.MILLISECONDS);
// ......
}
```

The duplicate-consumption scenario here is also easy to see: if the broker crashes, the latest offsets may not have been persisted yet, so up to 5s worth of offset progress is lost. After the broker restarts, the "latest" offset the Consumer reads from it is actually stale, which again leads to duplicate consumption.
### 5. Master-slave offset synchronization fails

As we all know, RocketMq supports a master-slave deployment, and a master-slave deployment necessarily involves data synchronization between the nodes.

In RocketMq the slave sends a request to the master every 10s by default to sync data, and that data includes the offsets:
```java
// org.apache.rocketmq.broker.BrokerController#handleSlaveSynchronize
private void handleSlaveSynchronize(BrokerRole role) {
// if this broker is a slave
if (role == BrokerRole.SLAVE) {
if (null != slaveSyncFuture) {
slaveSyncFuture.cancel(false);
}
this.slaveSynchronize.setMasterAddr(null);
slaveSyncFuture = this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
// todo: sync data from the master
BrokerController.this.slaveSynchronize.syncAll();
} catch (Throwable e) {
log.error("ScheduledTask SlaveSynchronize syncAll error.", e);
}
}
}, 1000 * 3, 1000 * 10, TimeUnit.MILLISECONDS);
} else {
//handle the slave synchronise
if (null != slaveSyncFuture) {
slaveSyncFuture.cancel(false);
}
this.slaveSynchronize.setMasterAddr(null);
}
}
```

By the same logic, if the master goes down, the slave may be missing up to 10s of the latest offsets. If the slave is then promoted to master, the "latest" offset the Consumer pulls from it is stale, which again causes duplicate consumption.
### 6. Consumer rebalancing

In RocketMq a topic usually has multiple messageQueues, and a ConsumerGroup usually has multiple Consumers. The process of distributing those messageQueues sensibly among the consumers of a ConsumerGroup is called rebalancing.

After a Consumer has consumed its messages, it normally needs to update the offset:
```java
public void processConsumeResult(
final ConsumeConcurrentlyStatus status,
final ConsumeConcurrentlyContext context,
final ConsumeRequest consumeRequest
) {
// ......
//
long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
// update the offset only if offset >= 0 and the processQueue has not been dropped
if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
}
}
```

But while a rebalance is in progress, the Consumer is still consuming messages. So it can happen that **just as the Consumer finishes consuming and goes to update the offset, the processQueue has already been marked dropped; the latest offset is therefore never written, and those messages may later be consumed again**.
### 7. Minimum offset commit

In RocketMq the Consumer pulls a batch of messages from the Broker and, by default, submits them to the consume thread pool one at a time.

For example: the Consumer pulls 3 messages and submits them to the thread pool, where thread1 consumes msg1, thread2 consumes msg2 and thread3 consumes msg3.

thread3 is fast and finishes before thread1 and thread2. After finishing, it has to remove its message from the processQueue and call updateOffset.

Should the offset be updated to msg3's offset at this point?

To make sure no message is lost, the offset that actually gets committed is still msg1's offset; this is the minimum offset commit.

Given the example above, we can see that under the minimum-offset mechanism thread3 commits msg1's offset after consuming successfully. If the client restarts at that moment, the offset it fetches from the broker again is still msg1's, so msg3, although already consumed, will be consumed once more.
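The "minimum offset" behaviour can be shown with a small standalone sketch (not RocketMq code, just the idea: the real ProcessQueue keeps pending messages in an offset-sorted TreeMap and, after a removal, reports the smallest offset still pending):

```java
import java.util.TreeMap;

public class MinOffsetCommitDemo {
    public static void main(String[] args) {
        // pending messages keyed by their queue offset, like ProcessQueue's msgTreeMap
        TreeMap<Long, String> pending = new TreeMap<>();
        pending.put(1L, "msg1");
        pending.put(2L, "msg2");
        pending.put(3L, "msg3");

        // thread3 finishes msg3 first and removes it
        pending.remove(3L);

        // the offset reported to the broker is the smallest offset still pending,
        // i.e. msg1's offset, so a restart here re-delivers msg3 as well
        System.out.println("offset to commit = " + pending.firstKey()); // prints 1
    }
}
```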
## Summary

In this article we looked at 7 scenarios in RocketMq that can lead to duplicate consumption. Keeping message consumption idempotent remains a long road.
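Since none of the seven scenarios can be fully eliminated, the practical defence is business-level deduplication. Here is a minimal, hedged sketch (in-memory only, for illustration; in production this would typically be a unique database constraint or a Redis SETNX on a business key such as Message#getKeys):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.rocketmq.common.message.MessageExt;

public class DedupGuard {
    // seen business keys; a real implementation would use durable, shared storage
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given business key is seen. */
    public boolean tryProcess(MessageExt msg) {
        String key = msg.getKeys(); // business key set by the producer
        if (key == null || !processedKeys.add(key)) {
            return false; // duplicate (or unkeyed) message, skip the business logic
        }
        // ... run the actual business logic here ...
        return true;
    }
}
```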
I'm Code皮皮虾, and I'll keep learning and improving together with you in the days ahead! If you found this article helpful, you can follow me on 掘金 (Juejin) so you won't miss future technical posts.