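Starting from the request code RequestCode.SEND_MESSAGE, we can trace the broker's message-send handling through the code. The broker wraps the handling of each kind of request into a Processor and assigns it a thread pool. Message sending is handled by SendMessageProcessor, whose processRequest method processes the request.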
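The "processor plus thread pool per request code" arrangement can be pictured as a lookup table. The sketch below is a simplified model built on stated assumptions. `RequestProcessor`, `ProcessorRegistry` and the code value 10 used in the test are hypothetical stand-ins, loosely inspired by the remoting server's `registerProcessor(requestCode, processor, executor)` idea rather than copied from RocketMQ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical processor interface: handles one request and returns a response string.
interface RequestProcessor {
    String processRequest(int code, String body);
}

// Hypothetical model of "request code -> (processor, thread pool)" dispatch.
class ProcessorRegistry {
    // Pairs a processor with the executor that runs it.
    static final class Binding {
        final RequestProcessor processor;
        final ExecutorService executor;

        Binding(RequestProcessor processor, ExecutorService executor) {
            this.processor = processor;
            this.executor = executor;
        }
    }

    private final Map<Integer, Binding> table = new HashMap<>();

    void register(int requestCode, RequestProcessor processor, ExecutorService executor) {
        table.put(requestCode, new Binding(processor, executor));
    }

    // Looks up the binding and runs the processor on the pool bound to this code.
    Future<String> dispatch(int requestCode, String body) {
        Binding b = table.get(requestCode);
        if (b == null) {
            throw new IllegalArgumentException("no processor for code " + requestCode);
        }
        Callable<String> task = () -> b.processor.processRequest(requestCode, body);
        return b.executor.submit(task);
    }

    // Convenience wrapper that blocks for the result.
    String dispatchAndWait(int requestCode, String body) {
        try {
            return dispatch(requestCode, body).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        for (Binding b : table.values()) {
            b.executor.shutdown();
        }
    }
}
```

Binding a dedicated pool to each processor keeps slow request types from starving the others. That is also why the send path can be tuned separately through its own thread-pool size.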
1. SendMessageProcessor
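In processRequest, a send-message request falls through to the default branch. It first parses the request header, then builds a SendMessageContext, and finally executes the send logic: sendBatchMessage for batch messages, sendMessage otherwise.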
```java
// SendMessageProcessor#processRequest
public RemotingCommand processRequest(ChannelHandlerContext ctx,
                                      RemotingCommand request) throws RemotingCommandException {
    SendMessageContext mqtraceContext;
    switch (request.getCode()) {
        case RequestCode.CONSUMER_SEND_MSG_BACK:
            return this.consumerSendMsgBack(ctx, request);
        default:
            // Parse the request header
            SendMessageRequestHeader requestHeader = parseRequestHeader(request);
            if (requestHeader == null) {
                return null;
            }
            // Build the context
            mqtraceContext = buildMsgContext(ctx, requestHeader);
            this.executeSendMessageHookBefore(ctx, request, mqtraceContext);
            RemotingCommand response;
            if (requestHeader.isBatch()) {
                response = this.sendBatchMessage(ctx, request, mqtraceContext, requestHeader);
            } else {
                // Execute the message-send logic
                response = this.sendMessage(ctx, request, mqtraceContext, requestHeader);
            }
            this.executeSendMessageHookAfter(response, mqtraceContext);
            return response;
    }
}
```
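sendMessage first builds the response object. It then calls super.msgCheck, which mainly checks topic permissions, auto-creates the topic when that is allowed, and validates the queue id. It also handles the retry and dead-letter queue cases: a message that has exceeded the maximum number of retries is moved to the dead-letter queue. Next it builds the internal message and calls MessageStore.putMessage to write it. Finally it processes the write result.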
```java
// SendMessageProcessor
private RemotingCommand sendMessage(final ChannelHandlerContext ctx,
                                    final RemotingCommand request,
                                    final SendMessageContext sendMessageContext,
                                    final SendMessageRequestHeader requestHeader) throws RemotingCommandException {
    // Build the response object
    final RemotingCommand response = RemotingCommand.createResponseCommand(SendMessageResponseHeader.class);
    final SendMessageResponseHeader responseHeader = (SendMessageResponseHeader) response.readCustomHeader();
    response.setOpaque(request.getOpaque());
    response.addExtField(MessageConst.PROPERTY_MSG_REGION, this.brokerController.getBrokerConfig().getRegionId());
    response.addExtField(MessageConst.PROPERTY_TRACE_SWITCH, String.valueOf(this.brokerController.getBrokerConfig().isTraceOn()));
    response.setCode(-1);
    // Message check, mainly on the topic
    super.msgCheck(ctx, requestHeader, response);
    if (response.getCode() != -1) {
        return response;
    }
    final byte[] body = request.getBody();
    // ... (queueIdInt and topicConfig are resolved in code elided from this excerpt)
    // Build the internal message
    MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
    msgInner.setTopic(requestHeader.getTopic());
    msgInner.setQueueId(queueIdInt);
    // Handle retry and dead-letter queues
    if (!handleRetryAndDLQ(requestHeader, response, request, msgInner, topicConfig)) {
        return response;
    }
    // Fill in the rest of msgInner
    msgInner.setBody(body);
    msgInner.setFlag(requestHeader.getFlag());
    // ... (oriProps and putMessageResult are declared in elided code)
    String traFlag = oriProps.get(MessageConst.PROPERTY_TRANSACTION_PREPARED);
    // traFlag marks a prepared transactional message; that branch is elided here
    // Write through the message store
    putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
    // Handle the write result
    return handlePutMessageResult(putMessageResult, response, request, msgInner, responseHeader, sendMessageContext, ctx, queueIdInt);
}
```
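The retry/DLQ decision inside handleRetryAndDLQ boils down to a small routing rule. The sketch below makes three assumptions: retry topics are named `%RETRY%<group>`, dead-letter topics are named `%DLQ%<group>`, and a message moves to the DLQ once its reconsume count reaches the maximum (a `>=` comparison). `RetryRouting` and `route` are hypothetical. The real method also checks permissions, creates the DLQ topic config and picks the queue id.

```java
// Hypothetical sketch of the retry -> dead-letter routing rule (not RocketMQ's actual code).
final class RetryRouting {
    // Assumed topic prefixes for retry and dead-letter topics.
    static final String RETRY_PREFIX = "%RETRY%";
    static final String DLQ_PREFIX = "%DLQ%";

    private RetryRouting() {}

    /**
     * Returns the topic the message should actually be stored under.
     * Non-retry topics are left untouched. A retry message that has been
     * reconsumed maxReconsumeTimes times or more goes to the group's DLQ topic.
     */
    static String route(String topic, int reconsumeTimes, int maxReconsumeTimes) {
        if (!topic.startsWith(RETRY_PREFIX)) {
            return topic;
        }
        String group = topic.substring(RETRY_PREFIX.length());
        if (reconsumeTimes >= maxReconsumeTimes) {
            return DLQ_PREFIX + group;
        }
        return topic;
    }
}
```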
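2. DefaultMessageStore
The default implementation of MessageStore is DefaultMessageStore. putMessage first checks the broker state. Messages cannot be written in any of the following cases:
- The broker is shut down: when the broker shuts down, shutdown is set to true.
- The broker is a SLAVE: in 4.4, a RocketMQ slave cannot accept writes. Under normal circumstances messages are not sent to a slave anyway.
- The store is not writeable: for example, when disk usage exceeds 90% (the default), writes are rejected. Disk usage therefore deserves close monitoring in RocketMQ.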
```java
// DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Is the broker shut down?
if (this.shutdown) {
    log.warn("message store has shutdown, so putMessage is forbidden");
    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
// Is the broker a slave?
if (BrokerRole.SLAVE == this.messageStoreConfig.getBrokerRole()) {
    long value = this.printTimes.getAndIncrement();
    if ((value % 50000) == 0) {
        log.warn("message store is slave mode, so putMessage is forbidden ");
    }
    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
}
// Is the store not writeable?
if (!this.runningFlags.isWriteable()) {
    long value = this.printTimes.getAndIncrement();
    if ((value % 50000) == 0) {
        log.warn("message store is not writeable, so putMessage is forbidden " + this.runningFlags.getFlagBits());
    }
    return new PutMessageResult(PutMessageStatus.SERVICE_NOT_AVAILABLE, null);
} else {
    this.printTimes.set(0);
}
```
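runningFlags is a set of bit flags, so "not writeable" can mean several error conditions, not only a full disk. The sketch below is a simplified model that assumes particular bit assignments and assumes which bits block writes. `StoreFlags`, `updateDiskUsage` and `makeLogicsQueueError` are hypothetical names.

```java
// Hypothetical bit-flag model of the store's running state.
final class StoreFlags {
    // Illustrative bit assignments; each bit records one condition.
    static final int NOT_READABLE_BIT = 1;
    static final int NOT_WRITEABLE_BIT = 1 << 1;
    static final int WRITE_LOGICS_QUEUE_ERROR_BIT = 1 << 2;
    static final int WRITE_INDEX_FILE_ERROR_BIT = 1 << 3;
    static final int DISK_FULL_BIT = 1 << 4;

    // Any of these bits makes the store reject writes (NOT_READABLE_BIT does not).
    private static final int WRITE_BLOCKING_BITS = NOT_WRITEABLE_BIT
            | WRITE_LOGICS_QUEUE_ERROR_BIT | WRITE_INDEX_FILE_ERROR_BIT | DISK_FULL_BIT;

    private volatile int flagBits = 0;

    boolean isWriteable() {
        return (flagBits & WRITE_BLOCKING_BITS) == 0;
    }

    // Models a periodic disk check: set DISK_FULL above the warning ratio, clear it otherwise.
    synchronized void updateDiskUsage(double usedRatio, double warningRatio) {
        if (usedRatio > warningRatio) {
            flagBits |= DISK_FULL_BIT;
        } else {
            flagBits &= ~DISK_FULL_BIT;
        }
    }

    // Models a consume-queue write failure, which also blocks writes.
    synchronized void makeLogicsQueueError() {
        flagBits |= WRITE_LOGICS_QUEUE_ERROR_BIT;
    }

    int getFlagBits() {
        return flagBits;
    }
}
```

This is why the warning above logs getFlagBits(): the bit pattern tells you which condition made the store unwritable.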
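Next it checks whether the topic is longer than 127 characters and whether the properties string is longer than 32767 characters. A message that exceeds either limit is rejected.
The client-side limit on topic length used to be 255 characters, but the broker's limit is the one that applies. In version 4.6.1 both were unified at 127 characters.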
```java
// DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Topic longer than 127 characters
if (msg.getTopic().length() > Byte.MAX_VALUE) {
    log.warn("putMessage message topic length too long " + msg.getTopic().length());
    return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, null);
}
// Properties longer than 32767 characters
if (msg.getPropertiesString() != null && msg.getPropertiesString().length() > Short.MAX_VALUE) {
    log.warn("putMessage message properties length too long " + msg.getPropertiesString().length());
    return new PutMessageResult(PutMessageStatus.PROPERTIES_SIZE_EXCEEDED, null);
}
```
It also checks whether the OS page cache is busy. How is that decided?
```java
// DefaultMessageStore#putMessage(MessageExtBrokerInner msg)
// Check whether the OS page cache is busy
if (this.isOSPageCacheBusy()) {
    return new PutMessageResult(PutMessageStatus.OS_PAGECACHE_BUSY, null);
}
```
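When a message is actually written, the broker takes a lock and records the time it was acquired. If the current time minus that lock time exceeds osPageCacheBusyTimeOutMills (default 1000 ms), the page cache is considered busy.

Why the page cache? By default, files are written through a memory mapping. Writing a message therefore means writing to memory, that is, to the page cache, so a slow write can be read as a slow page-cache write. That is where the method name comes from.

With concurrent asynchronous sends and little free memory, page cache busy errors occur easily. osPageCacheBusyTimeOutMills can be tuned according to test results.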
```java
// DefaultMessageStore#isOSPageCacheBusy
public boolean isOSPageCacheBusy() {
    long begin = this.getCommitLog().getBeginTimeInLock();
    long diff = this.systemClock.now() - begin;
    return diff < 10000000
        && diff > this.messageStoreConfig.getOsPageCacheBusyTimeOutMills();
}
```
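The upper bound `diff < 10000000` matters because beginTimeInLock is reset to 0 whenever no write holds the lock. In that state `now - 0` is a huge number, and the bound keeps an idle store from being reported as busy. The sketch below reproduces just this logic under two assumptions: the clock is passed in explicitly for testability, and the timeout is the 1000 ms default. `LockTimer` is a hypothetical class.

```java
// Hypothetical model of the "page cache busy" check.
final class LockTimer {
    // Mirrors the 10000000 upper bound in isOSPageCacheBusy.
    private static final long MAX_REASONABLE_DIFF = 10_000_000L;

    private final long busyTimeoutMillis;
    // 0 means "no write currently holds the lock".
    private volatile long beginTimeInLock = 0;

    LockTimer(long busyTimeoutMillis) {
        this.busyTimeoutMillis = busyTimeoutMillis;
    }

    // Called right after the put-message lock is acquired.
    void onLockAcquired(long nowMillis) {
        beginTimeInLock = nowMillis;
    }

    // Called just before the lock is released.
    void onLockReleased() {
        beginTimeInLock = 0;
    }

    // Busy only if a write has held the lock longer than the timeout.
    boolean isBusy(long nowMillis) {
        long diff = nowMillis - beginTimeInLock;
        return diff < MAX_REASONABLE_DIFF && diff > busyTimeoutMillis;
    }
}
```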
3. CommitLog
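Writing is delegated to CommitLog, so let's look at CommitLog.putMessage.

If the message's delay level is greater than 0 (levels above the maximum are clamped to it), the original topic and queue id are saved in the message properties. The message's topic and queue are then replaced by the delay topic SCHEDULE_TOPIC_XXXX and a queue id derived from the delay level. This is the key step for retrying concurrently consumed messages, because consumer retry is itself implemented with delayed messages.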
```java
// CommitLog#putMessage(final MessageExtBrokerInner msg)
if (msg.getDelayTimeLevel() > 0) {
    if (msg.getDelayTimeLevel() > this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel()) {
        msg.setDelayTimeLevel(this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel());
    }
    topic = ScheduleMessageService.SCHEDULE_TOPIC;
    queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());
    // Backup real topic, queueId
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
    msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));
    msg.setTopic(topic);
    msg.setQueueId(queueId);
}
```
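Delay levels start at 1 while queue ids start at 0, so level N should land in queue N-1 of SCHEDULE_TOPIC_XXXX. The sketch below makes two assumptions: the `level - 1` mapping, and the commonly cited default level table `1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h` (18 levels, configurable via messageDelayLevel). `DelayLevels` is a hypothetical helper, not ScheduleMessageService itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses a delay-level table and maps levels to schedule-topic queues.
final class DelayLevels {
    static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";

    // delayMillis.get(i) is the delay for level i + 1.
    private final List<Long> delayMillis = new ArrayList<>();

    DelayLevels(String levelString) {
        for (String token : levelString.trim().split("\\s+")) {
            char unit = token.charAt(token.length() - 1);
            long value = Long.parseLong(token.substring(0, token.length() - 1));
            long factor;
            switch (unit) {
                case 's': factor = 1_000L; break;
                case 'm': factor = 60_000L; break;
                case 'h': factor = 3_600_000L; break;
                case 'd': factor = 86_400_000L; break;
                default: throw new IllegalArgumentException("bad unit in " + token);
            }
            delayMillis.add(value * factor);
        }
    }

    int maxDelayLevel() {
        return delayMillis.size();
    }

    // Levels above the maximum are clamped, as in CommitLog#putMessage.
    int clamp(int level) {
        return Math.min(level, maxDelayLevel());
    }

    // Level N is stored in queue N - 1 of the schedule topic (assumed mapping).
    static int delayLevel2QueueId(int level) {
        return level - 1;
    }

    long delayMillisOf(int level) {
        return delayMillis.get(clamp(level) - 1);
    }
}
```

With the default table, a message sent with level 3 waits in queue 2 for 10 s. A level of 20 is clamped to 18, which means queue 17 and a 2 h delay.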
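Next it gets the CommitLog file currently being written and acquires a lock before writing, which means messages are stored into the CommitLog serially. By default (useReentrantLockWhenPutMessage=false) a spin lock is used, and a configuration change switches it to a ReentrantLock. The lock is released once the message has been written.
Because it is a spin lock, the thread pool bound to SendMessageProcessor has only one thread by default (sendMessageThreadPoolNums=1). With a single writer there is almost no contention, so the spin lock does not waste CPU.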
```java
// CommitLog#putMessage(final MessageExtBrokerInner msg)
MappedFile unlockMappedFile = null;
MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile();
putMessageLock.lock(); // spin or ReentrantLock, depending on store config
```
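A spin lock of this kind is typically built on a compare-and-set flag. The sketch below is a minimal version in the spirit of the default put-message lock. It assumes an AtomicBoolean-based implementation, and `SpinLock` and `SpinLockDemo` are hypothetical. It only pays off when the critical section is very short and contention is low, which matches the single-send-thread default.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal CAS spin lock.
final class SpinLock {
    // true = free, false = held.
    private final AtomicBoolean free = new AtomicBoolean(true);

    void lock() {
        // Busy-wait until this thread flips the flag from free to held.
        while (!free.compareAndSet(true, false)) {
            // spin
        }
    }

    void unlock() {
        free.set(true);
    }
}

// Hypothetical demo: several threads "append" under the lock, so updates are serialized.
final class SpinLockDemo {
    static long run(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        long[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++; // critical section: stands in for appending one message
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```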
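If mappedFile is null, the commitlog directory contains no files yet, which means this is the very first message. The first file is then created starting at offset 0. A new file is likewise created when the last one is full. If file creation fails, CREATE_MAPEDFILE_FAILED is returned.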
```java
// CommitLog#putMessage(final MessageExtBrokerInner msg)
if (null == mappedFile || mappedFile.isFull()) {
    mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
}
if (null == mappedFile) {
    log.error("create mapped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
    beginTimeInLock = 0;
    return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, null);
}
```
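CommitLog files are named after the global offset of their first byte, so creating "the first file at offset 0" also fixes its name. The sketch below makes three assumptions: a 1 GB CommitLog file size (the usual default for mappedFileSizeCommitLog), 20-digit zero-padded file names, and alignment of the start offset down to a file boundary when no file exists. `CommitLogNaming` is hypothetical.

```java
// Hypothetical helper: start offset and name of the next CommitLog file.
final class CommitLogNaming {
    static final long DEFAULT_FILE_SIZE = 1024L * 1024 * 1024; // 1 GB (assumed default)

    private CommitLogNaming() {}

    /**
     * @param lastFileFromOffset start offset of the current last file, or -1 if there is none
     * @param startOffset        offset to start from when no file exists (0 on the first send)
     */
    static long nextFileOffset(long lastFileFromOffset, long startOffset, long fileSize) {
        if (lastFileFromOffset < 0) {
            // No file yet: align the start offset down to a file boundary.
            return startOffset - (startOffset % fileSize);
        }
        // Last file is full: the next one starts right after it.
        return lastFileFromOffset + fileSize;
    }

    // File names are the start offset, zero-padded to 20 digits.
    static String fileName(long fromOffset) {
        return String.format("%020d", fromOffset);
    }
}
```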
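The message is then appended to the MappedFile. First the MappedFile's current write pointer is read. If currentPos is greater than or equal to the file size, the file is full and an error result (UNKNOWN_ERROR) is returned. Otherwise, slice() creates a buffer that shares memory with the MappedFile, and its position is set to currentPos. AppendMessageCallback then performs the actual write.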
```java
// MappedFile#appendMessagesInner
// Current write pointer of the MappedFile
int currentPos = this.wrotePosition.get();
// Pointer still inside the file: slice() a view sharing the MappedFile's memory and position it at the pointer
if (currentPos < this.fileSize) {
    ByteBuffer byteBuffer = writeBuffer != null ? writeBuffer.slice() : this.mappedByteBuffer.slice();
    byteBuffer.position(currentPos);
    AppendMessageResult result = null;
    if (messageExt instanceof MessageExtBrokerInner) {
        // Single message
        result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBrokerInner) messageExt);
    } else if (messageExt instanceof MessageExtBatch) {
        // Batch of messages
        result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, (MessageExtBatch) messageExt);
    } else {
        return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
    }
    this.wrotePosition.addAndGet(result.getWroteBytes());
    this.storeTimestamp = result.getStoreTimestamp();
    return result;
}
// (currentPos >= fileSize: the file is full; the error path is elided)
```
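slice() returns a view that shares content with the underlying buffer but has its own position and limit. The callback can therefore write at currentPos without moving the shared buffer's position. The sketch below shows the pattern on a heap buffer, assuming ByteBuffer.allocate in place of a real MappedByteBuffer. `MiniMappedFile` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical miniature of the MappedFile append path, using a heap buffer.
final class MiniMappedFile {
    private final int fileSize;
    private final ByteBuffer buffer; // stands in for mappedByteBuffer
    private final AtomicInteger wrotePosition = new AtomicInteger(0);

    MiniMappedFile(int fileSize) {
        this.fileSize = fileSize;
        this.buffer = ByteBuffer.allocate(fileSize);
    }

    /** Appends data and returns the number of bytes written, or -1 if it does not fit. */
    int append(byte[] data) {
        int currentPos = wrotePosition.get();
        if (currentPos >= fileSize || data.length > fileSize - currentPos) {
            return -1; // not enough room (the real code returns an error result instead)
        }
        // slice() shares content with buffer but has an independent position/limit.
        ByteBuffer view = buffer.slice();
        view.position(currentPos);
        view.put(data);
        wrotePosition.addAndGet(data.length);
        return data.length;
    }

    int getWrotePosition() {
        return wrotePosition.get();
    }

    // Appends never move the shared buffer's own position.
    int sharedBufferPosition() {
        return buffer.position();
    }

    byte byteAt(int index) {
        return buffer.get(index);
    }
}
```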
4. AppendMessageCallback
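Next, a globally unique message id is created. A msgId is 16 bytes long, but for readability the msgId returned to the application is a string. UtilAll.bytes2string converts the msgId byte array into a string, and UtilAll.string2bytes restores the string to the 16-byte array. The commit log offset can then be extracted from those bytes, so the message content can be located quickly from its msgId.
The msgId consists of a 4-byte IP, a 4-byte port and an 8-byte commit log offset, 16 bytes in total, encoded as a string.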
```java
// CommitLog$DefaultAppendMessageCallback#doAppend
long wroteOffset = fileFromOffset + byteBuffer.position();
this.resetByteBuffer(hostHolder, 8);
// 4-byte IP | 4-byte port | 8-byte commit log offset, converted to a string; the offset can be decoded back from the msgId
String msgId = MessageDecoder.createMessageId(this.msgIdMemory, msgInner.getStoreHostBytes(hostHolder), wroteOffset);
```
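The msgId layout is simple enough to encode and decode by hand. The sketch below assumes an IPv4 store host, big-endian byte order (ByteBuffer's default), and uppercase hex as the string form. `MsgIdCodec` is hypothetical and only mirrors the idea behind createMessageId, not its exact code.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the 16-byte msgId: 4B IPv4 | 4B port | 8B commit log offset.
final class MsgIdCodec {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private MsgIdCodec() {}

    static String encode(byte[] ipv4, int port, long commitLogOffset) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(ipv4, 0, 4);
        buf.putInt(port);
        buf.putLong(commitLogOffset);
        StringBuilder sb = new StringBuilder(32);
        for (byte b : buf.array()) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    static byte[] toBytes(String msgId) {
        byte[] out = new byte[msgId.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(msgId.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // The last 8 bytes are the physical offset, which locates the message directly.
    static long offsetOf(String msgId) {
        return ByteBuffer.wrap(toBytes(msgId)).getLong(8);
    }

    static int portOf(String msgId) {
        return ByteBuffer.wrap(toBytes(msgId)).getInt(4);
    }
}
```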
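Next, the message's offset within its consume queue is obtained. CommitLog keeps the next offset to be written for every message queue in topicQueueTable, keyed by topic-queueId. After this message is written, the offset for the next message becomes the current value plus 1.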
```java
// CommitLog$DefaultAppendMessageCallback#doAppend
keyBuilder.setLength(0);
keyBuilder.append(msgInner.getTopic());
keyBuilder.append('-');
keyBuilder.append(msgInner.getQueueId());
String key = keyBuilder.toString();
Long queueOffset = CommitLog.this.topicQueueTable.get(key);
if (null == queueOffset) {
    queueOffset = 0L;
    CommitLog.this.topicQueueTable.put(key, queueOffset);
}
```
The total message length is computed from the body length, topic length and properties length according to the storage format.
```java
// CommitLog#calMsgLength
protected static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
    final int msgLen = 4 // TOTALSIZE: total length of the entry, 4 bytes
        + 4 // MAGICCODE: magic number, 4 bytes, fixed at 0xdaa320a7
        + 4 // BODYCRC: CRC of the message body, 4 bytes
        + 4 // QUEUEID: consume queue id, 4 bytes
        + 4 // FLAG: message flag, not processed by RocketMQ, for application use, 4 bytes
        + 8 // QUEUEOFFSET: offset in the consume queue, 8 bytes
        + 8 // PHYSICALOFFSET: offset of the message in the commitlog, 8 bytes
        + 4 // SYSFLAG: system flag, 4 bytes
        + 8 // BORNTIMESTAMP: time the producer called the send API, 8 bytes
        + 8 // BORNHOST: producer IP + port, 8 bytes
        + 8 // STORETIMESTAMP: store timestamp, 8 bytes
        + 8 // STOREHOSTADDRESS: broker IP + port, 8 bytes
        + 4 // RECONSUMETIMES: retry count, 4 bytes
        + 8 // Prepared Transaction Offset, 8 bytes
        + 4 + (bodyLength > 0 ? bodyLength : 0) // BODY: 4-byte body length, then the body
        + 1 + topicLength // TOPIC: 1-byte length (putMessage enforces at most 127), then the topic
        + 2 + (propertiesLength > 0 ? propertiesLength : 0) // PROPERTIES: 2-byte length (putMessage enforces at most 32767), then the properties
        + 0;
    return msgLen;
}
```
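Adding up the fixed-size fields gives 84 bytes. With the 4-, 1- and 2-byte length prefixes that makes 91 bytes of overhead per message, so the total is 91 + body + topic + properties. The standalone copy of the formula below is for experimenting. `MsgLength` is hypothetical, and the layout assumes IPv4 addresses (8-byte BORNHOST/STOREHOSTADDRESS) as in the listing above.

```java
// Hypothetical standalone copy of the length formula.
final class MsgLength {
    // Fixed-size fields from TOTALSIZE through Prepared Transaction Offset.
    static final int FIXED_FIELDS = 4 + 4 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 8 + 8 + 4 + 8;
    // Length prefixes for BODY (int), TOPIC (byte) and PROPERTIES (short).
    static final int LENGTH_PREFIXES = 4 + 1 + 2;

    private MsgLength() {}

    static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
        return FIXED_FIELDS + LENGTH_PREFIXES
                + Math.max(bodyLength, 0) + topicLength + Math.max(propertiesLength, 0);
    }
}
```

For example, a 100-byte body, the topic TopicTest (9 bytes) and 20 bytes of properties take 91 + 100 + 9 + 20 = 220 bytes in the CommitLog.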
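If the message length plus END_FILE_MIN_BLANK_LENGTH (8 bytes) is greater than the free space left in the CommitLog file, END_OF_FILE is returned, and the broker creates a new CommitLog file to store the message. So every CommitLog file ends with at least 8 free bytes. The first 4 store the remaining space in the file, and the next 4 store the magic number 0xcbd43194.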
```java
// CommitLog$DefaultAppendMessageCallback#doAppend
// Determines whether there is sufficient free space
if ((msgLen + END_FILE_MIN_BLANK_LENGTH) > maxBlank) {
    this.resetByteBuffer(this.msgStoreItemMemory, maxBlank);
    // 1 TOTALSIZE
    this.msgStoreItemMemory.putInt(maxBlank);
    // 2 MAGICCODE
    this.msgStoreItemMemory.putInt(CommitLog.BLANK_MAGIC_CODE);
    // 3 The remaining space may be any value
    // Here the length of the specially set maxBlank
    final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
    byteBuffer.put(this.msgStoreItemMemory.array(), 0, maxBlank);
    return new AppendMessageResult(AppendMessageStatus.END_OF_FILE, wroteOffset, maxBlank, msgId, msgInner.getStoreTimestamp(),
        queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
}
```
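The end-of-file check and its 8-byte blank marker can be reproduced on a plain ByteBuffer. The sketch below makes two assumptions. First, only the first 8 bytes of the blank area are written; the original copies maxBlank bytes from a scratch buffer, and the rest may be any value. Second, BLANK_MAGIC_CODE is 0xcbd43194 as stated above. `FileTail` is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the "fits, or write the end-of-file marker" decision.
final class FileTail {
    static final int END_FILE_MIN_BLANK_LENGTH = 4 + 4;
    static final int BLANK_MAGIC_CODE = 0xcbd43194;

    enum Status { PUT_OK, END_OF_FILE }

    private FileTail() {}

    /**
     * @param buf    buffer positioned at the current write pointer; at least 8 bytes
     *               always remain because every earlier write kept that much free
     * @param msgLen total length of the serialized message
     */
    static Status check(ByteBuffer buf, int msgLen) {
        int maxBlank = buf.remaining();
        if (msgLen + END_FILE_MIN_BLANK_LENGTH > maxBlank) {
            // Mark the tail as blank: TOTALSIZE = remaining space, then the blank magic code.
            buf.putInt(maxBlank);
            buf.putInt(BLANK_MAGIC_CODE);
            return Status.END_OF_FILE;
        }
        return Status.PUT_OK;
    }
}
```

A reader scanning the file can stop as soon as it sees the blank magic code, because the first int tells it how many bytes to skip to reach the end of the file.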
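Next, the message is serialized in memory field by field, in the same order as in calMsgLength, so we won't repeat the details here.
The serialized bytes are then copied into the ByteBuffer and the result is built. At this point the message is only in the MappedFile's memory-mapped buffer (or in its write buffer when transientStorePoolEnable is on); it has not been flushed to disk.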
```java
// CommitLog$DefaultAppendMessageCallback#doAppend
final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
// Write messages to the queue buffer
byteBuffer.put(this.msgStoreItemMemory.array(), 0, msgLen);
AppendMessageResult result = new AppendMessageResult(AppendMessageStatus.PUT_OK, wroteOffset, msgLen, msgId,
    msgInner.getStoreTimestamp(), queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
```
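To check that the write order matches calMsgLength, the sketch below serializes a message field by field and verifies that exactly msgLen bytes are written. If the two disagreed, the fixed-size allocation would overflow. Assumptions: `MsgEncoder` and its parameters are hypothetical, hosts are IPv4 (8 bytes each), BODYCRC uses java.util.zip.CRC32 masked to a non-negative int as a stand-in for the broker's CRC helper, and FLAG, SYSFLAG, RECONSUMETIMES and the transaction offset are written as 0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical field-by-field encoder following the calMsgLength order.
final class MsgEncoder {
    static final int MESSAGE_MAGIC_CODE = 0xdaa320a7;
    // Fixed-size fields from TOTALSIZE through Prepared Transaction Offset (see calMsgLength).
    static final int FIXED_FIELDS = 84;

    private MsgEncoder() {}

    static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) (crc.getValue() & 0x7FFFFFFF);
    }

    static ByteBuffer encode(String topic, int queueId, byte[] body, String properties,
                             long queueOffset, long physicalOffset,
                             byte[] bornHost8, byte[] storeHost8,
                             long bornTimestamp, long storeTimestamp) {
        byte[] topicData = topic.getBytes(StandardCharsets.UTF_8);
        byte[] propsData = properties.getBytes(StandardCharsets.UTF_8);
        int msgLen = FIXED_FIELDS + 4 + body.length + 1 + topicData.length + 2 + propsData.length;
        ByteBuffer buf = ByteBuffer.allocate(msgLen);
        buf.putInt(msgLen);                     // TOTALSIZE
        buf.putInt(MESSAGE_MAGIC_CODE);         // MAGICCODE
        buf.putInt(crc32(body));                // BODYCRC
        buf.putInt(queueId);                    // QUEUEID
        buf.putInt(0);                          // FLAG
        buf.putLong(queueOffset);               // QUEUEOFFSET
        buf.putLong(physicalOffset);            // PHYSICALOFFSET
        buf.putInt(0);                          // SYSFLAG
        buf.putLong(bornTimestamp);             // BORNTIMESTAMP
        buf.put(bornHost8, 0, 8);               // BORNHOST (IPv4 + port)
        buf.putLong(storeTimestamp);            // STORETIMESTAMP
        buf.put(storeHost8, 0, 8);              // STOREHOSTADDRESS
        buf.putInt(0);                          // RECONSUMETIMES
        buf.putLong(0L);                        // Prepared Transaction Offset
        buf.putInt(body.length);                // BODY length
        buf.put(body);                          // BODY
        buf.put((byte) topicData.length);       // TOPIC length
        buf.put(topicData);                     // TOPIC
        buf.putShort((short) propsData.length); // PROPERTIES length
        buf.put(propsData);                     // PROPERTIES
        buf.flip();
        return buf;
    }
}
```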
The append result, AppendMessageResult, looks like this:
```java
// Return code: result of the append
private AppendMessageStatus status;
// Where to start writing: physical offset of the message
private long wroteOffset;
// Write Bytes
private int wroteBytes;
// Message ID
private String msgId;
// Message storage timestamp
private long storeTimestamp;
// Consume queue's offset (step by one): logical offset in the consume queue, like an array index
private long logicsOffset;
private long pagecacheRT = 0; // currently unused
private int msgNum = 1; // number of messages; greater than 1 for batch sends

public enum AppendMessageStatus {
    PUT_OK, // append succeeded
    END_OF_FILE, // not enough space left in the file
    MESSAGE_SIZE_EXCEEDED, // message exceeds the maximum allowed size
    PROPERTIES_SIZE_EXCEEDED, // properties exceed the maximum allowed size
    UNKNOWN_ERROR, // unknown error
}
```
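After the message is written, the consume queue's logical offset is advanced by 1. As the case labels show, this happens for non-transactional and committed transactional messages: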
```java
// CommitLog$DefaultAppendMessageCallback#doAppend
case MessageSysFlag.TRANSACTION_NOT_TYPE:
case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
    // The next update ConsumeQueue information
    CommitLog.this.topicQueueTable.put(key, ++queueOffset);
    break;
```
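Putting the lookup and the increment together: the queue offset comes from topicQueueTable keyed by topic-queueId, starts at 0 for a new queue, and is advanced only for messages that will enter the consume queue. The sketch assumes that prepared and rollback transactional messages neither receive nor advance a queue offset. That is our reading of the NOT_TYPE/COMMIT_TYPE branch above, not a quote of the source. `QueueOffsetTable` and `TxType` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of topicQueueTable bookkeeping during append.
// Called under the put-message lock, so a plain HashMap is enough here.
final class QueueOffsetTable {
    enum TxType { NOT_TYPE, PREPARED_TYPE, COMMIT_TYPE, ROLLBACK_TYPE }

    private final Map<String, Long> topicQueueTable = new HashMap<>();

    /** Returns the queue offset assigned to this message, advancing the table when appropriate. */
    long assign(String topic, int queueId, TxType txType) {
        String key = topic + "-" + queueId;
        Long queueOffset = topicQueueTable.get(key);
        if (queueOffset == null) {
            queueOffset = 0L;
            topicQueueTable.put(key, queueOffset);
        }
        switch (txType) {
            case PREPARED_TYPE:
            case ROLLBACK_TYPE:
                // Not consumable: no consume-queue slot, offset not advanced.
                return 0L;
            default:
                // NOT_TYPE / COMMIT_TYPE: take the current slot and advance by one.
                topicQueueTable.put(key, queueOffset + 1);
                return queueOffset;
        }
    }
}
```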
5. CommitLog result handling
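So far the message has only been written into the memory mapping and has not been flushed to disk. Flushing and HA are handled next, according to the configuration.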
```java
// CommitLog#putMessage(final MessageExtBrokerInner msg)
handleDiskFlush(result, putMessageResult, msg);
handleHA(result, putMessageResult, msg);
return putMessageResult;
```
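The flush strategy depends on flushDiskType:
- SYNC_FLUSH: if the message has no WAIT property, or WAIT is true, the sender waits until the message has been flushed. The default timeout is 5 s (syncFlushTimeout), and a timeout sets the status to FLUSH_DISK_TIMEOUT. Otherwise the flush service is only woken up.
- ASYNC_FLUSH: wakes up the background thread and returns without waiting. That thread is flushCommitLogService, or commitLogService when transientStorePoolEnable is true.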
```java
public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    // Synchronization flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        if (messageExt.isWaitStoreMsgOK()) {
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
            service.putRequest(request);
            boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            if (!flushOK) {
                putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
            }
        } else {
            service.wakeup();
        }
    }
    // Asynchronous flush
    else {
        if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            flushCommitLogService.wakeup();
        } else {
            commitLogService.wakeup();
        }
    }
}
```
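Synchronous flushing is a hand-off between the sending thread and a flush thread. The sender submits a request carrying the offset it needs flushed and blocks until the flusher completes it or the timeout expires. The simplified sketch below assumes a CountDownLatch-based request. `FlushRequest` and `MiniFlushService` are hypothetical, and "flushing" is reduced to advancing a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical request: the sender waits until the flushed offset reaches nextOffset.
final class FlushRequest {
    final long nextOffset;
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean flushOK = false;

    FlushRequest(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    // Called by the flush thread once it has decided the outcome.
    void wakeupCustomer(boolean ok) {
        this.flushOK = ok;
        latch.countDown();
    }

    /** Returns false if the flush did not complete successfully within the timeout. */
    boolean waitForFlush(long timeoutMillis) {
        try {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flushOK;
    }
}

// Hypothetical flush service: collects requests and completes them after a "flush".
final class MiniFlushService {
    private final List<FlushRequest> pending = new ArrayList<>();
    private long flushedWhere = 0;

    synchronized void putRequest(FlushRequest r) {
        pending.add(r);
    }

    // One flush round: pretend everything up to writtenOffset has reached disk.
    synchronized void flushUpTo(long writtenOffset) {
        flushedWhere = Math.max(flushedWhere, writtenOffset);
        for (FlushRequest r : pending) {
            r.wakeupCustomer(flushedWhere >= r.nextOffset);
        }
        pending.clear();
    }
}
```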
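How the message reaches the SLAVE depends on brokerRole:
- SYNC_MASTER: again, this applies only when the message has no WAIT property or WAIT is true. HAService.isSlaveOK first checks whether a slave is usable: a slave must be connected, and its replicated offset must be within an acceptable range of the master's. If so, a request is submitted and the master waits, up to syncFlushTimeout, for the slave to catch up to this message; on timeout the status becomes FLUSH_SLAVE_TIMEOUT. If no slave is usable, the status becomes SLAVE_NOT_AVAILABLE.
- ASYNC_MASTER, SLAVE: nothing is done here; the slave pulls the data on its own.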
```java
public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
        HAService service = this.defaultMessageStore.getHaService();
        if (messageExt.isWaitStoreMsgOK()) {
            // Determine whether to wait
            if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
                GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                service.getWaitNotifyObject().wakeupAll();
                boolean flushOK =
                    request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
                }
            }
            // Slave problem
            else {
                // Tell the producer, slave not available
                putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
            }
        }
    }
}
```
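The slave check and the resulting status fit into two small functions. The sketch assumes that isSlaveOK means "at least one connection, and the master's write position minus the slave's acknowledged offset is below a lag limit" (haSlaveFallbehindMax). It also takes the 256 MB value used here as that setting's default, which is an assumption. `SlaveHealth` and `HaStatus` are hypothetical.

```java
// Hypothetical model of the SYNC_MASTER replication checks.
final class SlaveHealth {
    static final long DEFAULT_FALL_BEHIND_MAX = 256L * 1024 * 1024; // assumed default lag limit

    enum HaStatus { PUT_OK, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    private SlaveHealth() {}

    /**
     * @param connectionCount     number of connected slaves
     * @param masterPutWhere      master offset after this message (wroteOffset + wroteBytes)
     * @param push2SlaveMaxOffset highest offset acknowledged by a slave
     */
    static boolean isSlaveOK(int connectionCount, long masterPutWhere,
                             long push2SlaveMaxOffset, long fallBehindMax) {
        return connectionCount > 0 && (masterPutWhere - push2SlaveMaxOffset) < fallBehindMax;
    }

    // Status for a write that waits for replication.
    static HaStatus outcome(boolean slaveOk, boolean ackedInTime) {
        if (!slaveOk) {
            return HaStatus.SLAVE_NOT_AVAILABLE;
        }
        return ackedInTime ? HaStatus.PUT_OK : HaStatus.FLUSH_SLAVE_TIMEOUT;
    }
}
```

The lag check lets the master fail fast with SLAVE_NOT_AVAILABLE instead of waiting out the full timeout for a slave that is disconnected or hopelessly behind.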
6. SendMessageProcessor result handling
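Finally, back in SendMessageProcessor, handlePutMessageResult processes the write result. The statuses PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT and SLAVE_NOT_AVAILABLE all count as a successful send. The response header is then filled with msgId, queueId and queueOffset. The response is written back to the client only when the send is not oneway; oneway sends never receive a response.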
```java
// SendMessageProcessor#handlePutMessageResult
responseHeader.setMsgId(putMessageResult.getAppendMessageResult().getMsgId());
responseHeader.setQueueId(queueIdInt);
responseHeader.setQueueOffset(putMessageResult.getAppendMessageResult().getLogicsOffset());
doResponse(ctx, request, response);
```
```java
// AbstractSendMessageProcessor#doResponse
protected void doResponse(ChannelHandlerContext ctx, RemotingCommand request,
                          final RemotingCommand response) {
    if (!request.isOnewayRPC()) {
        try {
            ctx.writeAndFlush(response);
        } catch (Throwable e) {
            //...
        }
    }
}
```
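The success rule in handlePutMessageResult can be written as a simple lookup. The sketch assumes a status list covering the PutMessageStatus values mentioned in this article. `SendOutcome` and `PutStatus` are hypothetical. The sketch only decides whether the send counts as OK and whether a response is written at all.

```java
// Hypothetical mapping from store status to "send succeeded?" and "reply to client?".
final class SendOutcome {
    enum PutStatus {
        PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE,
        SERVICE_NOT_AVAILABLE, CREATE_MAPEDFILE_FAILED, MESSAGE_ILLEGAL,
        PROPERTIES_SIZE_EXCEEDED, OS_PAGECACHE_BUSY, UNKNOWN_ERROR
    }

    private SendOutcome() {}

    // For these four statuses the message was already appended to the CommitLog,
    // so the send counts as successful even if flushing or replication lagged.
    static boolean isSendOK(PutStatus status) {
        switch (status) {
            case PUT_OK:
            case FLUSH_DISK_TIMEOUT:
            case FLUSH_SLAVE_TIMEOUT:
            case SLAVE_NOT_AVAILABLE:
                return true;
            default:
                return false;
        }
    }

    // Oneway requests never get a response, whatever the outcome.
    static boolean shouldRespond(boolean onewayRPC) {
        return !onewayRPC;
    }
}
```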
7. References
- Topic naming rules: blog.csdn.net/zhangjun039...
- RocketMQ source code, 4.4.0 branch
- 《RocketMQ技术内幕》 (RocketMQ Internals)