RocketMQ Message Storage: The Storage Flow

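Starting from the request code RequestCode.SEND_MESSAGE, we can trace in the code how the broker handles message sending. The broker wraps the handling of each kind of request in a Processor and binds it to its own thread pool. Message sending is handled by SendMessageProcessor, and the request is processed in its processRequest method.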
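The following minimal sketch makes the dispatch model concrete by binding request codes to a processor and a dedicated thread pool. The names `RequestProcessor` and `ProcessorRegistry`, and the constant value `10`, are hypothetical stand-ins, not RocketMQ's remoting classes. In the real broker, processors are registered with the remoting server at startup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical processor interface: handles one request and returns a response body.
interface RequestProcessor {
    String processRequest(int code, String body);
}

// Hypothetical registry: request code -> (processor, thread pool bound to that processor).
class ProcessorRegistry {
    static final int SEND_MESSAGE = 10; // illustrative constant only

    private final Map<Integer, RequestProcessor> processors = new HashMap<>();
    private final Map<Integer, ExecutorService> executors = new HashMap<>();

    void register(int code, RequestProcessor processor, ExecutorService executor) {
        processors.put(code, processor);
        executors.put(code, executor);
    }

    // Looks up the processor for the code and runs it on the pool bound to that code.
    Future<String> dispatch(int code, String body) {
        RequestProcessor p = processors.get(code);
        ExecutorService pool = executors.get(code);
        if (p == null || pool == null) {
            throw new IllegalArgumentException("no processor for code " + code);
        }
        return pool.submit(() -> p.processRequest(code, body));
    }
}
```

Giving each request type its own pool keeps slow handlers of one type from starving the others.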

1. SendMessageProcessor

In processRequest, a send-message request falls through to the default branch. It first parses the request header, then builds a SendMessageContext, and then runs the send logic.

```java
//SendMessageProcessor#processRequest
public RemotingCommand processRequest(ChannelHandlerContext ctx,
                                      RemotingCommand request) throws RemotingCommandException {
    SendMessageContext mqtraceContext;
    switch (request.getCode()) {
        case RequestCode.CONSUMER_SEND_MSG_BACK:
            return this.consumerSendMsgBack(ctx, request);
        default:
            // Parse the request header
            SendMessageRequestHeader requestHeader = parseRequestHeader(request);
            if (requestHeader == null) {
                return null;
            }
            // Build the send-message context, then run the "before" hooks
            mqtraceContext = buildMsgContext(ctx, requestHeader);
            this.executeSendMessageHookBefore(ctx, request, mqtraceContext);

            RemotingCommand response;
            if (requestHeader.isBatch()) {
                response = this.sendBatchMessage(ctx, request, mqtraceContext, requestHeader);
            } else {
                // Single-message send logic
                response = this.sendMessage(ctx, request, mqtraceContext, requestHeader);
            }

            this.executeSendMessageHookAfter(response, mqtraceContext);
            return response;
    }
}
```

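sendMessage first builds the response object. It then calls super.msgCheck to validate the request. This mainly checks topic permissions, auto-creates the topic when allowed, and verifies that the queue id is valid. It also handles the retry and dead-letter queue (DLQ) cases: once a message exceeds the maximum number of retries, it is moved to the DLQ. Next it builds the broker-internal message and calls MessageStore's putMessage to write it. Finally it processes the write result.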

```java
//SendMessageProcessor
private RemotingCommand sendMessage(final ChannelHandlerContext ctx,
                                    final RemotingCommand request,
                                    final SendMessageContext sendMessageContext,
                                    final SendMessageRequestHeader requestHeader) throws RemotingCommandException {

    // Build the response object
    final RemotingCommand response = RemotingCommand.createResponseCommand(SendMessageResponseHeader.class);
    final SendMessageResponseHeader responseHeader = (SendMessageResponseHeader) response.readCustomHeader();

    response.setOpaque(request.getOpaque());
    response.addExtField(MessageConst.PROPERTY_MSG_REGION, this.brokerController.getBrokerConfig().getRegionId());
    response.addExtField(MessageConst.PROPERTY_TRACE_SWITCH, String.valueOf(this.brokerController.getBrokerConfig().isTraceOn()));
    response.setCode(-1);
    // Validate the message, mainly the topic; msgCheck sets a response code on failure
    super.msgCheck(ctx, requestHeader, response);
    if (response.getCode() != -1) {
        return response;
    }

    final byte[] body = request.getBody();

    // Target queue id and topic config
    int queueIdInt = requestHeader.getQueueId();
    TopicConfig topicConfig = this.brokerController.getTopicConfigManager().selectTopicConfig(requestHeader.getTopic());
    //...

    // Build the broker-internal message
    MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
    msgInner.setTopic(requestHeader.getTopic());
    msgInner.setQueueId(queueIdInt);

    // Handle the retry and dead-letter queue cases
    if (!handleRetryAndDLQ(requestHeader, response, request, msgInner, topicConfig)) {
        return response;
    }
    // Fill in the rest of msgInner
    msgInner.setBody(body);
    msgInner.setFlag(requestHeader.getFlag());
    //...
    PutMessageResult putMessageResult = null;
    Map<String, String> oriProps = MessageDecoder.string2messageProperties(requestHeader.getProperties());
    String traFlag = oriProps.get(MessageConst.PROPERTY_TRANSACTION_PREPARED);
    //... (transactional prepared messages, where traFlag is "true", take a separate path; elided)
    // Hand the message to the message store
    putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
    // Turn the store result into the response
    return handlePutMessageResult(putMessageResult, response, request, msgInner, responseHeader, sendMessageContext, ctx, queueIdInt);
}
```
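The retry/DLQ decision inside handleRetryAndDLQ can be sketched roughly as follows. This is a simplified model, not the real method. It assumes the `%RETRY%` and `%DLQ%` topic prefixes that RocketMQ uses for per-consumer-group retry and dead-letter topics. The maximum retry count (assumed here to default to 16 in the subscription group config) is passed in as a parameter. `RetryDlqRouter` is a hypothetical name.

```java
// Hypothetical, simplified retry/DLQ routing; not RocketMQ's handleRetryAndDLQ.
class RetryDlqRouter {
    static final String RETRY_PREFIX = "%RETRY%"; // retry topic = prefix + consumer group
    static final String DLQ_PREFIX = "%DLQ%";     // dead-letter topic = prefix + consumer group

    // Returns the topic the message should actually be stored under.
    static String resolveTopic(String topic, int reconsumeTimes, int maxReconsumeTimes) {
        if (topic.startsWith(RETRY_PREFIX) && reconsumeTimes >= maxReconsumeTimes) {
            String group = topic.substring(RETRY_PREFIX.length());
            return DLQ_PREFIX + group; // retries exhausted: route to the dead-letter topic
        }
        return topic; // normal topic, or retries remaining
    }
}
```

A message sent back on `%RETRY%cg` after 16 attempts (with a limit of 16) ends up on `%DLQ%cg`. Ordinary topics pass through unchanged.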

2. DefaultMessageStore

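The default implementation of MessageStore is DefaultMessageStore. At the start of putMessage it checks the broker's state. Messages cannot be written in any of the following cases:

  • The broker is shut down: when the broker shuts down, shutdown is set to true.
  • The broker is a SLAVE: in version 4.4, a RocketMQ slave cannot accept writes. Under normal conditions, messages are not sent to a slave anyway.
  • The broker is not writable: for example, once disk usage exceeds 90% (the default), writes are rejected. This is why disk usage is a key metric to monitor for RocketMQ.
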
```java
//DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Is the broker shut down?
if (this.shutdown) {
    log.warn("message store has shutdown, so putMessage is forbidden");
    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
// Is the broker a slave?
if (BrokerRole.SLAVE == this.messageStoreConfig.getBrokerRole()) {
    long value = this.printTimes.getAndIncrement();
    if ((value % 50000) == 0) {
        log.warn("message store is slave mode, so putMessage is forbidden ");
    }

    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
// Is the broker writable?
if (!this.runningFlags.isWriteable()) {
    long value = this.printTimes.getAndIncrement();
    if ((value % 50000) == 0) {
        log.warn("message store is not writeable, so putMessage is forbidden " + this.runningFlags.getFlagBits());
    }

    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
} else {
    this.printTimes.set(0);
}
```
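The three state checks above, including the rate-limited warning (`value % 50000 == 0`), can be modeled in isolation like this. `WriteGate` and its fields are hypothetical. `writeable` stands in for `runningFlags.isWriteable()`, and a counter replaces the actual `log.warn` call.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified version of the state checks in DefaultMessageStore#putMessage.
class WriteGate {
    enum Role { ASYNC_MASTER, SYNC_MASTER, SLAVE }

    volatile boolean shutdown = false;
    volatile boolean writeable = true;     // stands in for runningFlags.isWriteable()
    volatile Role role = Role.ASYNC_MASTER;

    private final AtomicLong printTimes = new AtomicLong();
    int warningsLogged = 0;                // counts would-be log.warn calls, for illustration

    boolean canPut() {
        if (shutdown) {
            return false;                  // SERVICE_NOT_AVAILABLE
        }
        if (role == Role.SLAVE || !writeable) {
            // Rate-limited warning: only every 50000th rejection is logged
            if (printTimes.getAndIncrement() % 50000 == 0) {
                warningsLogged++;
            }
            return false;                  // SERVICE_NOT_AVAILABLE
        }
        printTimes.set(0);                 // healthy again: the next rejection logs immediately
        return true;
    }
}
```

Rate-limiting matters here because a non-writable broker may reject every incoming message. Logging each rejection would flood the log exactly when the disk is already under pressure.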

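Next, it checks whether the topic is longer than 127 characters and whether the message properties string is longer than 32767 characters. A message that exceeds either limit is rejected.

On the client side, the topic length limit for sending was 255 characters, but the broker's limit is the one that applies. In version 4.6.1, both limits were unified to 127 characters.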

```java
//DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Topic longer than 127 characters (Byte.MAX_VALUE)
if (msg.getTopic().length() > Byte.MAX_VALUE) {
    log.warn("putMessage message topic length too long " + msg.getTopic().length());
    return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, null);
}
// Properties longer than 32767 characters (Short.MAX_VALUE)
if (msg.getPropertiesString() != null && msg.getPropertiesString().length() > Short.MAX_VALUE) {
    log.warn("putMessage message properties length too long " + msg.getPropertiesString().length());
    return new PutMessageResult(PutMessageStatus.PROPERTIES_SIZE_EXCEEDED, null);
}
```
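The following standalone version of the two length checks assumes the same `String.length()` comparisons, so the limits count UTF-16 characters rather than bytes. `MessageLimits` and `PutStatus` are illustrative names, not RocketMQ types.

```java
// Illustrative status values mirroring the two rejections above
enum PutStatus { OK, MESSAGE_ILLEGAL, PROPERTIES_SIZE_EXCEEDED }

class MessageLimits {
    // Same comparisons as DefaultMessageStore#putMessage: lengths are String.length() values
    static PutStatus check(String topic, String propertiesString) {
        if (topic.length() > Byte.MAX_VALUE) {            // more than 127 characters
            return PutStatus.MESSAGE_ILLEGAL;
        }
        if (propertiesString != null && propertiesString.length() > Short.MAX_VALUE) { // more than 32767
            return PutStatus.PROPERTIES_SIZE_EXCEEDED;
        }
        return PutStatus.OK;
    }
}
```

The two limits line up with the 1-byte topic length and 2-byte properties length fields in the CommitLog record format.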

It also checks whether the OS page cache is busy. How is that determined?

```java
//DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Is the operating system's page cache busy?
if (this.isOSPageCacheBusy()) {
    return new PutMessageResult(PutMessageStatus.OS_PAGECACHE_BUSY, null);
}
```

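When a message is actually written, a lock is taken and the time the lock was acquired is recorded. If the current time minus that lock time exceeds osPageCacheBusyTimeOutMills (default 1000 ms), the page cache is considered busy. Why the page cache? By default, the files are memory-mapped, so writing a message really means writing to memory, that is, to the page cache. A slow write therefore means a slow page cache write, which explains the method's name. With concurrent asynchronous sends and little free memory, "page cache busy" errors become likely. Tune osPageCacheBusyTimeOutMills based on load-test results.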

```java
//DefaultMessageStore#isOSPageCacheBusy
public boolean isOSPageCacheBusy() {
    long begin = this.getCommitLog().getBeginTimeInLock();
    long diff = this.systemClock.now() - begin;

    return diff < 10000000
            && diff > this.messageStoreConfig.getOsPageCacheBusyTimeOutMills();
}
```
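The busy check can be exercised without a broker. As in CommitLog, the sketch below assumes that `beginTimeInLock` is reset to 0 after each append. The `diff < 10000000` guard then makes an idle store report "not busy", because `diff` equals the current epoch time in milliseconds. `PageCacheBusyCheck` is a hypothetical name.

```java
// Hypothetical standalone version of DefaultMessageStore#isOSPageCacheBusy
class PageCacheBusyCheck {
    // beginTimeInLock: time the current append took the lock, or 0 when no append is in progress
    static boolean isBusy(long nowMillis, long beginTimeInLock, long busyTimeoutMillis) {
        long diff = nowMillis - beginTimeInLock;
        // diff >= 10,000,000 only happens when beginTimeInLock == 0, i.e. the store is idle
        return diff < 10_000_000 && diff > busyTimeoutMillis;
    }
}
```

With the default 1000 ms timeout, an append that has held the lock for 1.5 s makes new sends fail fast with OS_PAGECACHE_BUSY instead of queueing behind it.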

3. CommitLog

Writing the message is delegated to CommitLog. Let's look at CommitLog's putMessage method.

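If the message's delay level is greater than 0 (a level above the maximum is clamped to the maximum), the message's original topic and queue id are saved in its properties. The message's topic and queue are then replaced with the delay topic SCHEDULE_TOPIC_XXXX and a queue id derived from the delay level. This is a key step for retries under concurrent consumption, because message retry is implemented with delayed messages.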

```java
//CommitLog#putMessage(final MessageExtBrokerInner msg)
if (msg.getDelayTimeLevel() > 0) {
    if (msg.getDelayTimeLevel() > this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel()) {
        msg.setDelayTimeLevel(this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel());
    }

    topic = ScheduleMessageService.SCHEDULE_TOPIC;
    queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());

    // Backup real topic, queueId
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
    msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));

    msg.setTopic(topic);
    msg.setQueueId(queueId);
}
```
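The following sketch shows how a delay level becomes a queue id of SCHEDULE_TOPIC_XXXX. It assumes the commonly documented default `messageDelayLevel` list of 18 levels and the mapping `queueId = delayLevel - 1` used by `ScheduleMessageService.delayLevel2QueueId`. `DelayLevels` itself is hypothetical, and the real broker reads the level list from its configuration.

```java
// Hypothetical helper modeling delay levels; assumes the default 18-level messageDelayLevel.
class DelayLevels {
    static final String[] LEVELS =
        "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h".split(" ");

    static int maxDelayLevel() {
        return LEVELS.length;
    }

    // Clamp to the maximum level (as CommitLog#putMessage does), then map level -> level - 1
    static int queueIdFor(int delayLevel) {
        int level = Math.min(delayLevel, maxDelayLevel());
        return level - 1;
    }

    // Delay in milliseconds for a 1-based level, e.g. level 3 -> "10s" -> 10000
    static long delayMillis(int delayLevel) {
        String spec = LEVELS[Math.min(delayLevel, maxDelayLevel()) - 1];
        long value = Long.parseLong(spec.substring(0, spec.length() - 1));
        switch (spec.charAt(spec.length() - 1)) {
            case 's': return value * 1000L;
            case 'm': return value * 60_000L;
            case 'h': return value * 3_600_000L;
            case 'd': return value * 86_400_000L;
            default: throw new IllegalArgumentException(spec);
        }
    }
}
```

Because each level has its own queue, messages within one queue share the same delay. A scheduler can therefore scan each queue in order and stop at the first message that is not yet due.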

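Next, it gets the CommitLog file currently being written and takes a lock before writing. This means writes to the CommitLog are serialized. By default (useReentrantLockWhenPutMessage=false), a spin lock is used; this setting can be changed to use a ReentrantLock instead. The lock is released once the message has been written.

Because the lock is a spin lock, the thread pool the broker binds to SendMessageProcessor has only one thread by default (sendMessageThreadPoolNums=1). With a single sending thread, no other threads sit spinning for the lock and wasting CPU.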

```java
//CommitLog#putMessage(final MessageExtBrokerInner msg)
MappedFile unlockMappedFile = null;
MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile();

putMessageLock.lock(); //spin or ReentrantLock ,depending on store config
// ... the append happens under the lock; putMessageLock.unlock() runs in a finally block (elided)
```
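A spin lock like the one selected by `useReentrantLockWhenPutMessage=false` can be built on a single `AtomicBoolean` with compare-and-set. This is a minimal sketch in that spirit, not the source of RocketMQ's lock class.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal CAS-based spin lock sketch
class SpinLock {
    // true = lock is free
    private final AtomicBoolean free = new AtomicBoolean(true);

    void lock() {
        // Busy-wait until we flip the flag from free (true) to held (false)
        while (!free.compareAndSet(true, false)) {
            // spin: no parking, the thread keeps the CPU
        }
    }

    void unlock() {
        free.compareAndSet(false, true);
    }
}
```

Waiting threads burn CPU in the loop instead of parking. That is cheap when the critical section is a short memory copy and there is essentially one writer, which matches the single default send thread.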

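If mappedFile is null, the commitlog directory contains no files yet, which means this is the very first message. In that case the first file is created with starting offset 0. The same call also creates a new file when the current one is full, in which case the new file starts right after the last one. If file creation fails, the method logs an error and returns CREATE_MAPEDFILE_FAILED.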

```java
//CommitLog#putMessage(final MessageExtBrokerInner msg)

if (null == mappedFile || mappedFile.isFull()) {
    mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
}
if (null == mappedFile) {
    log.error("create mapped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
    beginTimeInLock = 0;
    return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, null);
}
```
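CommitLog files are named after the physical offset of their first byte, zero-padded to 20 digits. The sketch below assumes the default 1 GB CommitLog file size (`mappedFileSizeCommitLog`). `CommitLogFiles` is a hypothetical helper.

```java
// Hypothetical helper: which CommitLog file holds a given physical offset
class CommitLogFiles {
    static final long FILE_SIZE = 1024L * 1024 * 1024; // assumed default: 1 GB per file

    // The file name is its starting offset, zero-padded to 20 digits
    static String fileNameFor(long physicalOffset) {
        long fileFromOffset = physicalOffset - (physicalOffset % FILE_SIZE);
        return String.format("%020d", fileFromOffset);
    }
}
```

This naming is what makes msgId lookups cheap: the offset stored in a msgId directly identifies the file (by name) and the position inside it.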

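The message is then appended to the MappedFile. First, the MappedFile's current write position (wrotePosition) is read. If currentPos is greater than or equal to the file size, the file is full, and the method logs an error and returns UNKNOWN_ERROR. Otherwise, slice() creates a buffer that shares memory with the MappedFile, and its position is set to currentPos. AppendMessageCallback then performs the actual write.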

```java
//MappedFile#appendMessagesInner
// Current write position of the MappedFile
int currentPos = this.wrotePosition.get();
// If there is room, slice() a buffer that shares memory with the MappedFile and move it to currentPos
if (currentPos < this.fileSize) {
    ByteBuffer byteBuffer = writeBuffer != null ? writeBuffer.slice() : this.mappedByteBuffer.slice();
    byteBuffer.position(currentPos);
    AppendMessageResult result = null;
    if (messageExt instanceof MessageExtBrokerInner) {
        // Single message
        result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBrokerInner) messageExt);
    } else if (messageExt instanceof MessageExtBatch) {
        // Batch of messages
        result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBatch) messageExt);
    } else {
        return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
    }
    this.wrotePosition.addAndGet(result.getWroteBytes());
    this.storeTimestamp = result.getStoreTimestamp();
    return result;
}
// File already full (error logging elided)
return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
```
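The slice-and-position pattern can be tried on a heap `ByteBuffer`. In this hypothetical `TinyMappedFile`, `ByteBuffer.allocate` stands in for the memory-mapped buffer. The real method hands the remaining space to AppendMessageCallback and lets it decide, but this sketch simply rejects data that does not fit.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical mini MappedFile: append via a slice so the shared buffer's position never moves
class TinyMappedFile {
    private final ByteBuffer buffer;   // stands in for the MappedByteBuffer
    private final int fileSize;
    private final AtomicInteger wrotePosition = new AtomicInteger(0);

    TinyMappedFile(int fileSize) {
        this.fileSize = fileSize;
        this.buffer = ByteBuffer.allocate(fileSize);
    }

    // Returns the offset the data was written at, or -1 if it does not fit
    int append(byte[] data) {
        int currentPos = wrotePosition.get();
        if (currentPos + data.length > fileSize) {
            return -1;
        }
        ByteBuffer slice = buffer.slice(); // shares content with buffer, has its own position
        slice.position(currentPos);
        slice.put(data);
        wrotePosition.addAndGet(data.length);
        return currentPos;
    }

    byte get(int index) {
        return buffer.get(index);
    }

    int wrotePosition() {
        return wrotePosition.get();
    }
}
```

Writing through a slice leaves the shared buffer's own position untouched. Other code, such as flush and read paths, can then work with the same buffer without coordinating positions.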

4. AppendMessageCallback

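A globally unique message id is created first. The msgId is 16 bytes long, but for readability the msgId returned to the application is a string. UtilAll.bytes2string turns the msgId byte array into a string, and UtilAll.string2bytes restores the string into the 16-byte array. Because the commitlog offset can be extracted from it, a msgId can be used to locate the message content quickly.

The msgId consists of a 4-byte IP, a 4-byte port, and an 8-byte commitlog offset. These 16 bytes are hex-encoded into a 32-character string.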

```java
//CommitLog$DefaultAppendMessageCallback#doAppend

long wroteOffset = fileFromOffset + byteBuffer.position();

this.resetByteBuffer(hostHolder, 8);
// 4-byte IP | 4-byte port | 8-byte commitlog offset, encoded as a string; the offset can be decoded back out of the msgId
String msgId = MessageDecoder.createMessageId(this.msgIdMemory, msgInner.getStoreHostBytes(hostHolder), wroteOffset);
```
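Encoding and decoding a msgId can be sketched with a plain `ByteBuffer` and hex formatting. The sketch assumes an IPv4 store host and uppercase hex output. `MsgIds` is a hypothetical class standing in for `MessageDecoder.createMessageId` and `UtilAll.string2bytes`.

```java
import java.nio.ByteBuffer;

// Hypothetical msgId codec: 4-byte IPv4 | 4-byte port | 8-byte commitlog offset, hex-encoded
class MsgIds {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String create(byte[] ipv4, int port, long commitLogOffset) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("expected a 4-byte IPv4 address");
        }
        ByteBuffer buf = ByteBuffer.allocate(16);          // big-endian by default
        buf.put(ipv4).putInt(port).putLong(commitLogOffset);
        StringBuilder sb = new StringBuilder(32);
        for (byte b : buf.array()) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    // Decode the hex string and read the 8-byte offset starting at byte 8
    static long offsetOf(String msgId) {
        byte[] bytes = new byte[16];
        for (int i = 0; i < 16; i++) {
            bytes[i] = (byte) Integer.parseInt(msgId.substring(2 * i, 2 * i + 2), 16);
        }
        return ByteBuffer.wrap(bytes).getLong(8);
    }
}
```

For example, host 127.0.0.1, port 10911, and offset 4096 encode to `7F00000100002A9F0000000000001000`. Decoding the last 16 hex digits recovers offset 4096.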

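Next, it gets the message's offset within its message queue. CommitLog keeps the next offset to be written for every message queue in topicQueueTable. Once this message is written, that offset is incremented by 1 for the next message.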

```java
//CommitLog$DefaultAppendMessageCallback#doAppend
keyBuilder.setLength(0);
keyBuilder.append(msgInner.getTopic());
keyBuilder.append('-');
keyBuilder.append(msgInner.getQueueId());
String key = keyBuilder.toString();
Long queueOffset = CommitLog.this.topicQueueTable.get(key);
if (null == queueOffset) {
    queueOffset = 0L;
    CommitLog.this.topicQueueTable.put(key, queueOffset);
}
```
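The `topic-queueId` bookkeeping in topicQueueTable reduces to a get-or-default followed by an increment. The sketch below is a hypothetical, single-threaded version. In the broker this code runs under the put-message lock, which is why a plain map is enough there.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded model of topicQueueTable
class QueueOffsetTable {
    private final Map<String, Long> topicQueueTable = new HashMap<>();
    private final StringBuilder keyBuilder = new StringBuilder();

    // Returns the logical offset for the next message of topic/queueId and advances it by one
    long nextOffset(String topic, int queueId) {
        keyBuilder.setLength(0);
        keyBuilder.append(topic).append('-').append(queueId);
        String key = keyBuilder.toString();
        long queueOffset = topicQueueTable.getOrDefault(key, 0L);
        topicQueueTable.put(key, queueOffset + 1);
        return queueOffset;
    }
}
```

Each queue numbers its messages 0, 1, 2, and so on independently, which is exactly the "array index" that consumers later use to fetch from a consume queue.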

The total message length is computed from the body length, topic length, and properties length according to the storage format.

```java
//CommitLog#calMsgLength
protected static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
    final int msgLen = 4 //TOTALSIZE: total length of the message entry, 4 bytes
        + 4 //MAGICCODE: magic number, fixed at 0xdaa320a7, 4 bytes
        + 4 //BODYCRC: CRC of the message body, 4 bytes
        + 4 //QUEUEID: consume queue id, 4 bytes
        + 4 //FLAG: message flag, not processed by RocketMQ, for application use, 4 bytes
        + 8 //QUEUEOFFSET: offset in the consume queue, 8 bytes
        + 8 //PHYSICALOFFSET: offset of the message in the CommitLog, 8 bytes
        + 4 //SYSFLAG: system flag, 4 bytes
        + 8 //BORNTIMESTAMP: time the producer called the send API, 8 bytes
        + 8 //BORNHOST: producer IP + port, 8 bytes
        + 8 //STORETIMESTAMP: store timestamp, 8 bytes
        + 8 //STOREHOSTADDRESS: broker IP + port, 8 bytes
        + 4 //RECONSUMETIMES: number of retries, 4 bytes
        + 8 //Prepared Transaction Offset, 8 bytes
        + 4 + (bodyLength > 0 ? bodyLength : 0) //BODY: 4-byte body length, then the body
        + 1 + topicLength //TOPIC: 1-byte topic length (topics are capped at 127, Byte.MAX_VALUE), then the topic
        + 2 + (propertiesLength > 0 ? propertiesLength : 0) //PROPERTIES: 2-byte length (capped at 32767, Short.MAX_VALUE), then the properties
        + 0;
    return msgLen;
}
```
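As a worked example of the layout above: the fixed-size fields add up to 84 bytes and the three length prefixes add 7 more. Every message therefore carries 91 bytes of overhead on top of its body, topic, and properties. Here is a compact restatement under the hypothetical class name `MsgLength`:

```java
// Hypothetical restatement of calMsgLength, grouping fields by size
class MsgLength {
    // TOTALSIZE, MAGICCODE, BODYCRC, QUEUEID, FLAG (5 x 4)
    // QUEUEOFFSET, PHYSICALOFFSET (2 x 8), SYSFLAG (4)
    // BORNTIMESTAMP, BORNHOST, STORETIMESTAMP, STOREHOSTADDRESS (4 x 8)
    // RECONSUMETIMES (4), Prepared Transaction Offset (8)
    static final int FIXED_FIELDS = 5 * 4 + 2 * 8 + 4 + 4 * 8 + 4 + 8; // 84
    // Length prefixes: body (4), topic (1), properties (2)
    static final int LENGTH_PREFIXES = 4 + 1 + 2;                       // 7

    static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
        return FIXED_FIELDS + LENGTH_PREFIXES
            + Math.max(bodyLength, 0) + topicLength + Math.max(propertiesLength, 0);
    }
}
```

For instance, a 100-byte body on topic `TopicTest` (9 bytes) with no properties occupies 91 + 100 + 9 = 200 bytes in the CommitLog.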

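If the message length plus END_FILE_MIN_BLANK_LENGTH (8 bytes) exceeds the free space left in the CommitLog file, END_OF_FILE is returned, and the broker creates a new CommitLog file to store the message. This means every CommitLog file ends with at least 8 bytes of blank space: the first 4 bytes store the file's remaining space, and the next 4 bytes store the magic code 0xcbd43194.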

```java
//CommitLog$DefaultAppendMessageCallback#doAppend
// Determines whether there is sufficient free space
if ((msgLen + END_FILE_MIN_BLANK_LENGTH) > maxBlank) {
    this.resetByteBuffer(this.msgStoreItemMemory, maxBlank);
    // 1 TOTALSIZE
    this.msgStoreItemMemory.putInt(maxBlank);
    // 2 MAGICCODE
    this.msgStoreItemMemory.putInt(CommitLog.BLANK_MAGIC_CODE);
    // 3 The remaining space may be any value
    // Here the length of the specially set maxBlank
    final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
    byteBuffer.put(this.msgStoreItemMemory.array(), 0, maxBlank);
    return new AppendMessageResult(AppendMessageStatus.END_OF_FILE, wroteOffset, maxBlank, msgId, msgInner.getStoreTimestamp(),
        queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
}
```
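The end-of-file marker can be reproduced on a heap buffer. `EndOfFile` is a hypothetical helper. The real code writes `maxBlank` bytes copied from `msgStoreItemMemory`; this sketch writes only the 8-byte header and then moves the position to the end, because readers stop when they see the blank magic code.

```java
import java.nio.ByteBuffer;

// Hypothetical helper reproducing the END_OF_FILE blank marker
class EndOfFile {
    static final int END_FILE_MIN_BLANK_LENGTH = 4 + 4;
    static final int BLANK_MAGIC_CODE = 0xCBD43194;

    // If msgLen does not fit (keeping 8 spare bytes), write the blank marker and return true
    static boolean fillBlankIfNoRoom(ByteBuffer file, int msgLen) {
        int maxBlank = file.remaining();
        if (msgLen + END_FILE_MIN_BLANK_LENGTH <= maxBlank) {
            return false;                  // message fits; caller writes it normally
        }
        file.putInt(maxBlank);             // TOTALSIZE = every byte left in the file
        file.putInt(BLANK_MAGIC_CODE);     // marks the tail as blank
        file.position(file.limit());       // rest of the tail is ignored by readers
        return true;
    }
}
```

Reserving those 8 bytes guarantees there is always room for the marker, so a reader scanning the file can tell "end of data" apart from a truncated record.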

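The message is then serialized in memory, in the same field order as in calMsgLength, so the details are not repeated here.

Finally, the message bytes are put into the ByteBuffer and the result is built. At this point the message has only been written to the MappedFile's buffer (the memory-mapped buffer by default); it has not yet been flushed to disk.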

```java
//CommitLog$DefaultAppendMessageCallback#doAppend
final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
// Write messages to the queue buffer
byteBuffer.put(this.msgStoreItemMemory.array(), 0, msgLen);

AppendMessageResult result = new AppendMessageResult(AppendMessageStatus.PUT_OK, wroteOffset, msgLen, msgId,
    msgInner.getStoreTimestamp(), queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
```

The append result, AppendMessageResult, looks like this:

```java
//AppendMessageResult fields
// Return code: result of the append
private AppendMessageStatus status;
// Where to start writing: physical offset of the message
private long wroteOffset;
// Write Bytes: number of bytes written
private int wroteBytes;
// Message ID
private String msgId;
// Message storage timestamp
private long storeTimestamp;
// Consume queue's offset (step by one): logical offset in the consume queue, like an array index
private long logicsOffset;
private long pagecacheRT = 0; // currently unused

private int msgNum = 1; // number of messages (for batch sends)

public enum AppendMessageStatus {
    PUT_OK,                   // appended successfully
    END_OF_FILE,              // not enough space left in the file
    MESSAGE_SIZE_EXCEEDED,    // message exceeds the maximum allowed size
    PROPERTIES_SIZE_EXCEEDED, // properties exceed the maximum allowed size
    UNKNOWN_ERROR,            // unknown error
}
```

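After the message is written, the consume queue's logical offset is incremented by 1 (for non-transactional and committed transactional messages).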

```java
//CommitLog$DefaultAppendMessageCallback#doAppend
case MessageSysFlag.TRANSACTION_NOT_TYPE:
case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
    // The next update ConsumeQueue information
    CommitLog.this.topicQueueTable.put(key, ++queueOffset);
    break;
```
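Only non-transactional and committed messages advance the queue offset. Prepared (half) and rolled-back messages are stored in the CommitLog but never enter a consume queue, and, as far as the doAppend source goes, their queue offset is recorded as 0. Below is a hypothetical sketch of that rule. It uses an illustrative `TranType` enum instead of the `MessageSysFlag` bits.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative transaction types; RocketMQ encodes these as bits of MessageSysFlag
enum TranType { NOT_TYPE, PREPARED, COMMIT, ROLLBACK }

// Hypothetical model of how the transaction type affects topicQueueTable
class TxAwareQueueOffsets {
    private final Map<String, Long> topicQueueTable = new HashMap<>();

    // Returns the queue offset recorded for the message; advances the table only when appropriate
    long onAppend(String key, TranType type) {
        long queueOffset = topicQueueTable.getOrDefault(key, 0L);
        switch (type) {
            case PREPARED:
            case ROLLBACK:
                // Not consumable: does not enter the consume queue, offset is not advanced
                return 0L;
            default:
                // NOT_TYPE and COMMIT: the next message gets the following slot
                topicQueueTable.put(key, queueOffset + 1);
                return queueOffset;
        }
    }
}
```

Skipping half messages keeps consume-queue offsets dense: consumers never see gaps left by transactions that were still pending or were rolled back.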

5. CommitLog Result Handling

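So far the message has only been written to the memory mapping, not flushed to disk. Next, flushing and HA (replication) are handled according to the configuration.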

```java
//CommitLog#putMessage(final MessageExtBrokerInner msg)
handleDiskFlush(result, putMessageResult, msg);
handleHA(result, putMessageResult, msg);

return putMessageResult;
```

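flushDiskType determines the flush strategy:

  • SYNC_FLUSH: if the message does not have the WAIT property, or WAIT is true, the send waits until the flush succeeds. The default flush timeout (syncFlushTimeout) is 5 s; on timeout, the status is set to FLUSH_DISK_TIMEOUT. Otherwise, the flush thread is only woken up.
  • ASYNC_FLUSH: wakes up the background flush thread (or the commit thread when transientStorePoolEnable is on) to flush asynchronously.
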
```java
//CommitLog#handleDiskFlush
public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    // Synchronization flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        if (messageExt.isWaitStoreMsgOK()) {
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
            service.putRequest(request);
            boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            if (!flushOK) {
                putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
            }
        } else {
            service.wakeup();
        }
    }
    // Asynchronous flush
    else {
        if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            flushCommitLogService.wakeup();
        } else {
            commitLogService.wakeup();
        }
    }
}
```
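The synchronous path in handleDiskFlush boils down to "submit a request, then block until the flush thread signals or the timeout expires". Below is a minimal sketch with `CountDownLatch`, assuming a request object similar in spirit to `GroupCommitRequest`. `FlushRequest` and `complete` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical flush request: the writer waits, the flush thread signals
class FlushRequest {
    private final long nextOffset;                 // data up to here must be on disk
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean flushOK = false;

    FlushRequest(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    long nextOffset() {
        return nextOffset;
    }

    // Called by the flush thread once data up to nextOffset has been flushed (or failed)
    void complete(boolean ok) {
        flushOK = ok;
        latch.countDown();
    }

    // Called by the writer; false if the flush failed or did not finish in time
    boolean waitForFlush(long timeoutMillis) {
        try {
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;                      // maps to FLUSH_DISK_TIMEOUT
            }
            return flushOK;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A timeout does not undo the write: the message is already in the CommitLog's buffer, and only the durability confirmation is missing. This is why the status is reported as FLUSH_DISK_TIMEOUT rather than as a failure to store.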

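brokerRole determines how messages are replicated to the SLAVE:

  • SYNC_MASTER: again, the master waits only if the message has no WAIT property or WAIT is true. It first uses HAService.isSlaveOK to check whether a slave is available; internally, this checks whether a slave is connected and whether its replication lag is within an acceptable range. If so, it submits a request, wakes the replication threads, and waits until the slave has caught up to this message; a timeout sets FLUSH_SLAVE_TIMEOUT. If no slave is available, the status is set to SLAVE_NOT_AVAILABLE.
  • ASYNC_MASTER and SLAVE: nothing is done here; the slave fetches the data on its own.
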
```java
//CommitLog#handleHA
public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
        HAService service = this.defaultMessageStore.getHaService();
        if (messageExt.isWaitStoreMsgOK()) {
            // Determine whether to wait
            if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
                GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                service.getWaitNotifyObject().wakeupAll();
                boolean flushOK =
                    request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
                }
            }
            // Slave problem
            else {
                // Tell the producer, slave not available
                putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
            }
        }
    }
}
```
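The isSlaveOK check described above can be sketched as a pure function. The parameter names follow HAService's slave connection count and `push2SlaveMaxOffset`. The 256 MB threshold is an assumed default for `haSlaveFallbehindMax`; treat both the names and the threshold as assumptions, not a copy of the source.

```java
// Hypothetical pure-function version of the slave availability check
class SlaveCheck {
    static final long HA_SLAVE_FALLBEHIND_MAX = 256L * 1024 * 1024; // assumed default: 256 MB

    // A slave is usable if one is connected and it lags the master by less than the threshold
    static boolean isSlaveOK(int connectionCount, long masterPutWhere, long push2SlaveMaxOffset) {
        return connectionCount > 0
            && (masterPutWhere - push2SlaveMaxOffset) < HA_SLAVE_FALLBEHIND_MAX;
    }
}
```

Refusing to wait for a slave that is far behind avoids stalling every SYNC_MASTER send on a long catch-up. The producer gets SLAVE_NOT_AVAILABLE immediately instead.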

6. SendMessageProcessor Result Handling

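Back in SendMessageProcessor, handlePutMessageResult processes the write result. PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, and SLAVE_NOT_AVAILABLE are all treated as a successful send, and msgId, queueId, and queueOffset are set on the response header. The response is then sent to the client unless the send is oneway; oneway sends receive no response.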

```java
//SendMessageProcessor#handlePutMessageResult
responseHeader.setMsgId(putMessageResult.getAppendMessageResult().getMsgId());
responseHeader.setQueueId(queueIdInt);
responseHeader.setQueueOffset(putMessageResult.getAppendMessageResult().getLogicsOffset());

doResponse(ctx, request, response);
```
```java
//AbstractSendMessageProcessor#doResponse
protected void doResponse(ChannelHandlerContext ctx, RemotingCommand request,
    final RemotingCommand response) {
    if (!request.isOnewayRPC()) {
        try {
            ctx.writeAndFlush(response);
        } catch (Throwable e) {
            //...
        }
    }
}
```
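The success rule and the oneway rule from this section are summarized in the small sketch below. `StoreStatus` is a hypothetical, trimmed-down mirror of `PutMessageStatus`, and `SendResultPolicy` is not a RocketMQ class.

```java
// Hypothetical, trimmed-down mirror of the store statuses discussed above
enum StoreStatus {
    PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE,
    SERVICE_NOT_AVAILABLE, MESSAGE_ILLEGAL, OS_PAGECACHE_BUSY
}

class SendResultPolicy {
    // In these four cases the message was appended to the CommitLog;
    // only the flush or replication wait (if any) did not succeed
    static boolean isSendOK(StoreStatus status) {
        switch (status) {
            case PUT_OK:
            case FLUSH_DISK_TIMEOUT:
            case FLUSH_SLAVE_TIMEOUT:
            case SLAVE_NOT_AVAILABLE:
                return true;
            default:
                return false;
        }
    }

    // Mirrors doResponse: oneway requests never get a response written back
    static boolean shouldWriteResponse(boolean onewayRpc) {
        return !onewayRpc;
    }
}
```

A producer that needs strict durability should therefore inspect the returned status, not just whether the send "succeeded".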
