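RocketMQ: A Source-Code Walkthrough of 7 Duplicate-Consumption Scenarios

Preface

In business development, as soon as an MQ is involved, we instinctively think about avoiding duplicate message consumption and keeping consumption idempotent.

Even RocketMQ, a mature message middleware in the industry, cannot fully avoid duplicate consumption in real production environments. This article walks through the situations in which RocketMQ may deliver a message more than once.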


When duplicate consumption occurs

1. Producer retries after a failed send

org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendKernelImpl

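In synchronous send mode, network jitter and similar problems can cause a message to be stored successfully on the broker while the Producer does not receive the response in time, which results in a timeout.

The Producer then treats the send as failed and retries. By default it retries 2 times.

This is where the messageQueue selection strategy comes in. When choosing a messageQueue for a retry, RocketMQ picks one that belongs to a different broker than the previously chosen messageQueue, which raises the chance that the send succeeds.

So the retry is very likely to succeed, and the message ends up being sent twice. The two copies are stored as separate messages with different broker-assigned msgIds, so the same business content can easily be consumed twice.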
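Because the duplicated copies cannot be reliably told apart by msgId, a common defence is to deduplicate on a business key instead. The following is a minimal sketch under stated assumptions: `BusinessKeyDeduplicator` and `processOnce` are hypothetical names, not RocketMQ APIs, and the in-memory set stands in for a shared store such as a database unique index.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory deduplicator keyed by a business identifier (for example an order number). */
final class BusinessKeyDeduplicator {
    // Keys that have already been processed. This is an assumption for illustration:
    // a real deployment would need a store shared by all consumer instances.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Runs the handler only the first time a key is seen; returns true if the handler ran. */
    boolean processOnce(String businessKey, Runnable handler) {
        if (!processed.add(businessKey)) {
            return false; // duplicate delivery: skip the business logic
        }
        try {
            handler.run();
            return true;
        } catch (RuntimeException e) {
            processed.remove(businessKey); // let a later redelivery try again
            throw e;
        }
    }
}
```

Calling `processOnce("order-1", handler)` twice runs the handler once and returns `false` the second time, regardless of which msgId each copy carries.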

Further reading: Exploring | RocketMQ's messageQueue selection strategy when sending messages


2. Consumer throws an exception while consuming

// org.apache.rocketmq.client.impl.consumer.DefaultMQPushConsumerImpl#pullMessage
public void pullMessage(final PullRequest pullRequest) {

  // ......

  PullCallback pullCallback = new PullCallback() {
    @Override
    public void onSuccess(PullResult pullResult) {
      if (pullResult != null) {
        pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult, subscriptionData);

        switch (pullResult.getPullStatus()) {
          case FOUND:
            // Messages were pulled successfully
            long prevRequestOffset = pullRequest.getNextOffset();
            pullRequest.setNextOffset(pullResult.getNextBeginOffset());
            long pullRT = System.currentTimeMillis() - beginTimestamp;
            DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
                                                                               pullRequest.getMessageQueue().getTopic(), pullRT);

            long firstMsgOffset = Long.MAX_VALUE;
            if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
              DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
            } else {
              firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();

              DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
                                                                                  pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());

              boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());
              
              // Submit a consume request so the Consumer can process the messages
              DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
                pullResult.getMsgFoundList(),
                processQueue,
                pullRequest.getMessageQueue(),
                dispatchToConsume);

            }

            // ......

            break;
          // ......
          default:
            break;
        }
      }
    }

    // ......
  };

  // ......
}

After the Consumer client pulls messages successfully, it submits a consume request via submitConsumeRequest.

// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService.ConsumeRequest#run
class ConsumeRequest implements Runnable {

  // Messages to be consumed
  private final List<MessageExt> msgs;

  public void run() {
    // ......

    ConsumeConcurrentlyStatus status = null;
    try {
      // .....

      // Call back into the Consumer's listener to run the actual business logic
      status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);
    } catch (Throwable e) {
      log.warn(String.format("consumeMessage exception: %s Group: %s Msgs: %s MQ: %s",
                             RemotingHelper.exceptionSimpleDesc(e),
                             ConsumeMessageConcurrentlyService.this.consumerGroup,
                             msgs,
                             messageQueue), e);
      hasException = true;
    }

    // ...... status is not modified here

    if (null == status) {
      log.warn("consumeMessage return null, Group: {} Msgs: {} MQ: {}",
               ConsumeMessageConcurrentlyService.this.consumerGroup,
               msgs,
               messageQueue);
      // RECONSUME_LATER: consume again later
      status = ConsumeConcurrentlyStatus.RECONSUME_LATER;
    }

    // ......
    
    if (!processQueue.isDropped()) {
      // Handle the consume result
      ConsumeMessageConcurrentlyService.this.processConsumeResult(status, context, this);
    } else {
      log.warn("processQueue is dropped without process consume result. messageQueue={}, msgs={}", messageQueue, msgs);
    }
  }
}

As the code above shows, when listener.consumeMessage throws an exception, status stays null, and when null == status, status is reset to RECONSUME_LATER.

Then, in processConsumeResult, where the consume result is handled:

public void processConsumeResult(
  final ConsumeConcurrentlyStatus status,
  final ConsumeConcurrentlyContext context,
  final ConsumeRequest consumeRequest
) {

  // Defaults to Integer.MAX_VALUE
  int ackIndex = context.getAckIndex();

  if (consumeRequest.getMsgs().isEmpty())
    return;

  switch (status) {
    case CONSUME_SUCCESS: // consumed successfully
      if (ackIndex >= consumeRequest.getMsgs().size()) {
        ackIndex = consumeRequest.getMsgs().size() - 1;
      }
      int ok = ackIndex + 1;
      int failed = consumeRequest.getMsgs().size() - ok;
      this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
      this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
      break;
    case RECONSUME_LATER: // consume again later
      // ackIndex is reset to -1
      ackIndex = -1;
      this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
                                                         consumeRequest.getMsgs().size());
      break;
    default:
      break;
  }

  switch (this.defaultMQPushConsumer.getMessageModel()) {
    case BROADCASTING:
      for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
        MessageExt msg = consumeRequest.getMsgs().get(i);
        log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
      }
      break;
    case CLUSTERING: // clustering mode
      List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());

      // On success, ackIndex = consumeRequest.getMsgs().size() - 1, so the loop body never runs.
      // On failure, ackIndex = -1, so ackIndex + 1 = 0 and the loop sends back every message in the batch.
      for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
        MessageExt msg = consumeRequest.getMsgs().get(i);
        // Send the message back to the broker so it is consumed again later
        boolean result = this.sendMessageBack(msg, context);
        if (!result) {
          msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
          msgBackFailed.add(msg);
        }
      }

      if (!msgBackFailed.isEmpty()) {
        consumeRequest.getMsgs().removeAll(msgBackFailed);

        this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
      }
      break;
    default:
      break;
  }

  // .....
}

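As the source shows, when status is RECONSUME_LATER, ackIndex is set to -1.

In clustering mode, the code then iterates over every message in the consume request and sends them back one by one so that they are consumed again later.

The problem in this scenario is that messages are pulled in batches, and each batch can contain several messages. If an exception is thrown while consuming the 2nd or 3rd message, the whole batch is sent back and consumed again, so the messages that had already been consumed successfully are consumed a second time.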
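The effect of ackIndex on which messages get sent back can be modeled without any RocketMQ dependency. The sketch below is a simplified, hypothetical model of the two switch statements shown earlier, restricted to clustering mode; `AckIndexModel` and `indexesSentBack` are illustrative names, not RocketMQ APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of how status and ackIndex decide which messages are sent back. */
final class AckIndexModel {
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    /** Returns the batch positions that would be sent back for re-consumption. */
    static List<Integer> indexesSentBack(Status status, int ackIndex, int batchSize) {
        if (status == Status.CONSUME_SUCCESS) {
            // Clamp to the last position, as the first switch does.
            if (ackIndex >= batchSize) {
                ackIndex = batchSize - 1;
            }
        } else {
            // RECONSUME_LATER: nothing in the batch counts as acknowledged.
            ackIndex = -1;
        }
        List<Integer> sentBack = new ArrayList<>();
        for (int i = ackIndex + 1; i < batchSize; i++) {
            sentBack.add(i);
        }
        return sentBack;
    }
}
```

With a batch of 3 and the default ackIndex of Integer.MAX_VALUE, RECONSUME_LATER yields positions 0, 1 and 2, while CONSUME_SUCCESS yields none. If ackIndex were 0 with CONSUME_SUCCESS, only positions 1 and 2 would be sent back.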

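Now back to submitConsumeRequest, mentioned earlier; its source is shown below.

Although each pull can return several messages, submitConsumeRequest has one key parameter: consumeBatchSize.

consumeBatchSize controls how many messages are handed to the listener per consume call, and it defaults to 1. In other words, by default each call consumes a single message, so with the default setting the scenario described above does not arise.
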
// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService#submitConsumeRequest
public void submitConsumeRequest(
  final List<MessageExt> msgs,
  final ProcessQueue processQueue,
  final MessageQueue messageQueue,
  final boolean dispatchToConsume) {
  // Defaults to 1
  final int consumeBatchSize = this.defaultMQPushConsumer.getConsumeMessageBatchMaxSize();
  if (msgs.size() <= consumeBatchSize) {
    ConsumeRequest consumeRequest = new ConsumeRequest(msgs, processQueue, messageQueue);
    try {
      this.consumeExecutor.submit(consumeRequest);
    } catch (RejectedExecutionException e) {
      this.submitConsumeRequestLater(consumeRequest);
    }
  } else {
    // consumeBatchSize < msgs.size()
    for (int total = 0; total < msgs.size(); ) {
      List<MessageExt> msgThis = new ArrayList<MessageExt>(consumeBatchSize);
      
      // Each request consumes at most consumeBatchSize messages
      for (int i = 0; i < consumeBatchSize; i++, total++) {
        if (total < msgs.size()) {
          msgThis.add(msgs.get(total));
        } else {
          break;
        }
      }

      // With the default batch size, msgThis holds a single message
      ConsumeRequest consumeRequest = new ConsumeRequest(msgThis, processQueue, messageQueue);
      try {
        this.consumeExecutor.submit(consumeRequest);
      } catch (RejectedExecutionException e) {
        for (; total < msgs.size(); total++) {
          msgThis.add(msgs.get(total));
        }

        this.submitConsumeRequestLater(consumeRequest);
      }
    }
  }
}
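
The splitting performed by submitConsumeRequest can be illustrated in isolation. This is a minimal sketch, assuming only that a pulled list is cut into chunks of at most consumeBatchSize; `BatchSplitter` and `split` are hypothetical names, and the thread pool and rejection handling of the real method are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Cuts a pulled message list into consume batches of at most consumeBatchSize elements. */
final class BatchSplitter {
    static <T> List<List<T>> split(List<T> msgs, int consumeBatchSize) {
        if (consumeBatchSize <= 0) {
            throw new IllegalArgumentException("consumeBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < msgs.size(); start += consumeBatchSize) {
            int end = Math.min(start + consumeBatchSize, msgs.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(msgs.subList(start, end)));
        }
        return batches;
    }
}
```

With a batch size of 1, three pulled messages become three single-message batches, so a failure in one of them does not force the other two to be sent back. With a batch size of 2 they become one batch of two and one batch of one.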

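3. Consumer fails to commit the offset

If you are not familiar with RocketMQ's offset mechanism, you may want to read this first -> RocketMQ offset management mechanism

Continuing from the previous section: still in processConsumeResult, once the messages have been handled, the method finally calls updateOffset.
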
// org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService#processConsumeResult
public void processConsumeResult(
  final ConsumeConcurrentlyStatus status,
  final ConsumeConcurrentlyContext context,
  final ConsumeRequest consumeRequest
) {
  // ......

  long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
  if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
    // Update the offset
    this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
  }
}

Broadcasting mode uses LocalFileOffsetStore, and clustering mode uses RemoteBrokerOffsetStore.

Let's take clustering mode as the example.

public class RemoteBrokerOffsetStore implements OffsetStore {

  // key: messageQueue, value: its offset
  private ConcurrentMap<MessageQueue, AtomicLong> offsetTable =
    new ConcurrentHashMap<MessageQueue, AtomicLong>();

  @Override
  public void updateOffset(MessageQueue mq, long offset, boolean increaseOnly) {
    if (mq != null) {
      // Look up the current offset of mq in offsetTable
      AtomicLong offsetOld = this.offsetTable.get(mq);
      if (null == offsetOld) {
        // Not present yet, so initialize it
        offsetOld = this.offsetTable.putIfAbsent(mq, new AtomicLong(offset));
      }

      if (null != offsetOld) {
        if (increaseOnly) {
          // Update the offset, never moving it backwards
          MixAll.compareAndIncreaseOnly(offsetOld, offset);
        } else {
          offsetOld.set(offset);
        }
      }
    }
  }
}

As you can see, the offset ends up being written only to the in-memory offsetTable.

When the Consumer starts, it schedules a timer task that syncs the in-memory offsetTable to the broker every 5s.

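
The increaseOnly branch relies on an update that can only move the offset forward. Below is a sketch of how such an increase-only update can be written with a compare-and-set loop; it is an illustration of the idea, not a copy of RocketMQ's implementation, and `IncreaseOnly.increaseOnly` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Increase-only update of an AtomicLong using a compare-and-set retry loop. */
final class IncreaseOnly {
    /** Sets target to candidate only if candidate is larger; returns true if target changed. */
    static boolean increaseOnly(AtomicLong target, long candidate) {
        long current = target.get();
        while (candidate > current) {
            if (target.compareAndSet(current, candidate)) {
                return true;
            }
            // Another thread changed the value first; re-read and re-check.
            current = target.get();
        }
        return false;
    }
}
```

This matters because consume threads finish in arbitrary order: a thread that reports an older offset late must not overwrite a newer one already recorded.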
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {

  @Override
  public void run() {
    try {
      // Every 5s, sync the in-memory offsetTable to the broker
      MQClientInstance.this.persistAllConsumerOffset();
    } catch (Exception e) {
      log.error("ScheduledTask persistAllConsumerOffset exception", e);
    }
  }
}, 1000 * 10, this.clientConfig.getPersistConsumerOffsetInterval(), TimeUnit.MILLISECONDS);

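This is where a problem can arise.

After the Consumer finishes a message, it does not sync the offset to the broker immediately; the sync happens through the timer task.

So if the service crashes before the latest offset has been synced, then after a restart the Consumer can only resume from the older offset the broker has, and the messages in between are consumed again.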
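
The gap between the in-memory offset and the synced offset can be made concrete with a toy model. This is a hypothetical sketch (`OffsetSyncModel` is not a RocketMQ class) that assumes offsets advance by one per consumed message and that a restart resumes from the last synced value.

```java
/** Toy model of a consumer whose offset reaches the broker only when a timer fires. */
final class OffsetSyncModel {
    private long inMemoryOffset; // advanced after every consumed message
    private long brokerOffset;   // advanced only by the periodic sync

    /** Consumes count messages; only the in-memory offset moves. */
    void consume(int count) {
        inMemoryOffset += count;
    }

    /** What the periodic task does: copy the in-memory offset to the broker. */
    void persistToBroker() {
        brokerOffset = inMemoryOffset;
    }

    /** Simulates a crash followed by a restart; returns how many messages are delivered again. */
    long crashAndRestart() {
        long redelivered = inMemoryOffset - brokerOffset;
        inMemoryOffset = brokerOffset; // in-memory state is lost, resume from the broker's view
        return redelivered;
    }
}
```

Consuming 100 messages, syncing, then consuming 30 more before a crash leaves 30 messages to be delivered again after the restart.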


4. Broker fails to persist the offset

Continuing from the Consumer syncing its offsets to the broker:

public class ConsumerOffsetManager extends ConfigManager {

  // key: topic@group, value: map of <queueId, offset>
  protected ConcurrentMap<String/* topic@group */, ConcurrentMap<Integer, Long>> offsetTable =
    new ConcurrentHashMap<String, ConcurrentMap<Integer, Long>>(512);
  
}

When the broker receives the Consumer's offset sync request, it too only updates its local memory.

It also runs a timer task, once every 5s, that persists the in-memory offsetTable to a file.

// org.apache.rocketmq.broker.BrokerController#initialize
public boolean initialize() throws CloneNotSupportedException {
  
  // ......
  
  this.scheduledExecutorService.scheduleAtFixedRate(() -> {
    try {
      BrokerController.this.consumerOffsetManager.persist();
    } catch (Throwable e) {
      log.error("schedule persist consumerOffset error.", e);
    }
  }, 1000 * 10, this.brokerConfig.getFlushConsumerOffsetInterval(), TimeUnit.MILLISECONDS);
  
  // ......
  
}

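The duplicate-consumption scenario here is just as clear. If the broker crashes, the latest offsets may not have been persisted yet, so up to 5s of offset progress is lost. After the broker restarts, the "latest" offset the Consumer reads from the broker is a stale one, which leads to duplicate consumption.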


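5. Master-slave offset sync fails

RocketMQ supports a master-slave mode, and a master-slave setup necessarily involves data synchronization between the master and slave nodes.

By default, a RocketMQ slave node sends a sync request to the master node every 10s, and the synced data includes the offsets.
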
// org.apache.rocketmq.broker.BrokerController#handleSlaveSynchronize
private void handleSlaveSynchronize(BrokerRole role) {
  // If this broker is a slave
  if (role == BrokerRole.SLAVE) {
    if (null != slaveSyncFuture) {
      slaveSyncFuture.cancel(false);
    }
    
    this.slaveSynchronize.setMasterAddr(null);
    slaveSyncFuture = this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        try {
          // Sync data from the master
          BrokerController.this.slaveSynchronize.syncAll();
        }
        catch (Throwable e) {
          log.error("ScheduledTask SlaveSynchronize syncAll error.", e);
        }
      }
    }, 1000 * 3, 1000 * 10, TimeUnit.MILLISECONDS);
  } else {
    //handle the slave synchronise
    if (null != slaveSyncFuture) {
      slaveSyncFuture.cancel(false);
    }
    this.slaveSynchronize.setMasterAddr(null);
  }
}

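Likewise, if the master node goes down, the slave node may be missing up to the last 10s of offsets. If the slave is then promoted to master, the "latest" offset the Consumer gets is stale, which again causes duplicate consumption.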


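6. Consumer rebalance

In RocketMQ, a topic usually has several messageQueues and a ConsumerGroup usually has several Consumers. The process of distributing those messageQueues sensibly among the Consumers of a ConsumerGroup is called rebalancing.

After a Consumer finishes consuming a message, it normally needs to update the offset.
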
public void processConsumeResult(
  final ConsumeConcurrentlyStatus status,
  final ConsumeConcurrentlyContext context,
  final ConsumeRequest consumeRequest
) {
  // ......
  
  // Remove the consumed messages from the processQueue and get the offset to commit
  long offset = consumeRequest.getProcessQueue().removeMessage(consumeRequest.getMsgs());
  
  // Update the offset only if offset >= 0 and the processQueue has not been dropped
  if (offset >= 0 && !consumeRequest.getProcessQueue().isDropped()) {
    this.defaultMQPushConsumerImpl.getOffsetStore().updateOffset(consumeRequest.getMessageQueue(), offset, true);
  }
}

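However, the Consumer keeps consuming messages while a rebalance is in progress. So it is possible that, just as the Consumer finishes a message and is about to update the offset, the processQueue has been marked as dropped (dropped = true). The latest offset is then never recorded, and the messages may be consumed again later by whichever Consumer takes over the queue.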
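
The race described above can be reduced to a small model of the guard in processConsumeResult. This is a hypothetical sketch (`DroppedQueueModel` and its methods are illustrative names) that assumes the rebalance thread sets the dropped flag just before the consume thread tries to record its offset.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a process queue whose offset updates are skipped once it is dropped. */
final class DroppedQueueModel {
    private volatile boolean dropped;
    private final AtomicLong committedOffset = new AtomicLong(0);

    /** What a rebalance does when this queue is reassigned to another consumer. */
    void markDropped() {
        dropped = true;
    }

    /** Mirrors the guard: record the offset only while the queue is still owned. */
    boolean tryUpdateOffset(long offset) {
        if (offset >= 0 && !dropped) {
            committedOffset.accumulateAndGet(offset, Math::max); // increase only
            return true;
        }
        return false; // the message was consumed, but its progress is not recorded
    }

    long committedOffset() {
        return committedOffset.get();
    }
}
```

If offset 5 is recorded, the queue is then dropped, and a consume thread finishes and tries to record offset 8, the committed offset stays at 5. The new owner of the queue resumes from 5 and consumes those messages again.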


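7. Minimum-offset commit

In RocketMQ, the Consumer pulls a batch of messages from the Broker and, by default, submits them to a thread pool one at a time for consumption.

Example: the Consumer pulls 3 messages and submits them to the thread pool, where thread1 consumes msg1, thread2 consumes msg2, and thread3 consumes msg3.

thread3 is faster and finishes before thread1 and thread2. Once it is done, it must remove its message from the processQueue and call updateOffset.

Should the offset now be updated to msg3's offset?

To make sure no message is lost, the offset that gets recorded is still the one for msg1. This is the minimum-offset commit.

Taking the example above, with the minimum-offset mechanism thread3 finishes successfully but commits msg1's offset. If the client restarts at this point, the offset it fetches from the broker is still msg1's, so msg3 is consumed again.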
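
The minimum-offset behaviour can be reproduced with a sorted map of in-flight messages. This is a simplified sketch, not RocketMQ's ProcessQueue: `MinOffsetModel` is a hypothetical class, and the rule that an empty map yields the highest seen offset plus one is an assumption of the model.

```java
import java.util.TreeMap;

/** Toy model of a process queue that commits the smallest offset still in flight. */
final class MinOffsetModel {
    // In-flight messages ordered by queue offset.
    private final TreeMap<Long, String> inFlight = new TreeMap<>();
    private long maxOffset = -1;

    void put(long offset, String msg) {
        inFlight.put(offset, msg);
        maxOffset = Math.max(maxOffset, offset);
    }

    /** Removes a finished message and returns the offset that would be committed. */
    long remove(long offset) {
        inFlight.remove(offset);
        // Assumption of this model: with nothing in flight, commit just past the last message.
        return inFlight.isEmpty() ? maxOffset + 1 : inFlight.firstKey();
    }
}
```

With messages at offsets 1, 2 and 3 in flight, finishing offset 3 first still commits 1. Only when 1 and 2 are also finished does the committed offset move past 3.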


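Summary

In this article we looked at 7 scenarios in which RocketMQ may cause duplicate consumption. Guaranteeing message idempotency is a long road.

I'm Code皮皮虾, and I'll keep learning and improving together with you. If you found this article useful, follow me on 掘金 (Juejin) so you don't miss future technical posts.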
