RocketMQ Message Storage: ConsumeQueue

This article describes the file structure of ConsumeQueue, how its data is written, and when it is flushed to disk. It then looks at how ConsumeQueue data is used when messages are pulled.

1. ConsumeQueue file structure

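RocketMQ implements consumption on a topic-subscription model: a consumer cares about all messages under one topic. Messages of the same topic are not stored contiguously in the commitlog file, however, so scanning the commitlog to find them would be very inefficient. The consume queue file was designed to serve this lookup need.

To speed up lookups and save disk space, a ConsumeQueue does not store full messages. Each entry has the following format:

8-byte physical offset in the CommitLog + 4-byte message size + 8-byte tag hash code

commitlog offset (8 bytes) | size (4 bytes) | tag hashcode (8 bytes)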

By default a single ConsumeQueue file holds 300,000 entries, so its size is 300,000 × 20 bytes = 6,000,000 bytes, about 5.7 MB.
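
To make the 20-byte layout concrete, here is a minimal sketch that packs and unpacks one entry with java.nio.ByteBuffer and computes the default file size. The class and method names (CqUnitSketch, encode, decode) are made up for illustration and are not part of RocketMQ; only the field order and sizes come from the format above.

```java
import java.nio.ByteBuffer;

/** Illustrative only: packs and unpacks one 20-byte ConsumeQueue entry. */
final class CqUnitSketch {
    static final int UNIT_SIZE = 8 + 4 + 8;     // offset + size + tag hash code
    static final int UNITS_PER_FILE = 300_000;  // default number of entries per file

    static byte[] encode(long commitLogOffset, int msgSize, long tagsCode) {
        ByteBuffer buf = ByteBuffer.allocate(UNIT_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(msgSize);
        buf.putLong(tagsCode);
        return buf.array();
    }

    /** Returns {commitLogOffset, msgSize, tagsCode}. */
    static long[] decode(byte[] unit) {
        ByteBuffer buf = ByteBuffer.wrap(unit);
        return new long[] {buf.getLong(), buf.getInt(), buf.getLong()};
    }

    public static void main(String[] args) {
        byte[] unit = encode(1024L, 256, "TagA".hashCode());
        long[] fields = decode(unit);
        System.out.println(unit.length + " bytes: offset=" + fields[0]
            + ", size=" + fields[1] + ", tagsCode=" + fields[2]);
        System.out.println("file size = " + (long) UNIT_SIZE * UNITS_PER_FILE + " bytes");
    }
}
```

Because every entry has the same size, the n-th entry always starts at byte n × 20, which is what makes lookup by logical offset cheap.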

2. Building the data

2.1 Writing data

DefaultMessageStore starts a ReputMessageService, which runs on its own thread and builds the files periodically.

//DefaultMessageStore#start
public void start() throws Exception {
    // isDuplicationEnable is false by default (messages may not be duplicated)
    if (this.getMessageStoreConfig().isDuplicationEnable()) {
        this.reputMessageService.setReputFromOffset(this.commitLog.getConfirmOffset());
    } else {
        this.reputMessageService.setReputFromOffset(this.commitLog.getMaxOffset());
    }
    // Start the service; internally this starts a thread
    this.reputMessageService.start();

}

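The thread calls doReput every 1 ms, so the commitlog is converted into ConsumeQueue entries in near real time. doReput loops internally for as long as the current progress is smaller than the maximum offset of the CommitLog.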

//DefaultMessageStore$ReputMessageService#run
public void run() {
   
    while (!this.isStopped()) {
        try {
            Thread.sleep(1);
            this.doReput();
        } catch (Exception e) {
         
        }
    }
}

Next, it reads data from the CommitLog at that offset, parses the message according to the storage format, wraps it in a DispatchRequest, and calls doDispatch.

//DefaultMessageStore$ReputMessageService#doReput
SelectMappedBufferResult result = DefaultMessageStore.this.commitLog.getData(reputFromOffset);
DispatchRequest dispatchRequest =
                 DefaultMessageStore.this.commitLog.checkMessageAndReturnSize(result.getByteBuffer(), false, false);

DefaultMessageStore.this.doDispatch(dispatchRequest);

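Besides dispatching messages, doReput also notifies long-polling consumers so that messages can be consumed quickly; this is covered later.

doDispatch simply iterates over the CommitLogDispatcher list. The dispatcher for ConsumeQueue is CommitLogDispatcherBuildConsumeQueue, and the one for Index is CommitLogDispatcherBuildIndex.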

//DefaultMessageStore#doDispatch
public void doDispatch(DispatchRequest req) {
    for (CommitLogDispatcher dispatcher : this.dispatcherList) {
        dispatcher.dispatch(req);
    }
}
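
The fan-out in doDispatch can be pictured with a small standalone sketch. Everything here (DispatchSketch, the Dispatcher interface, the string request) is a simplified stand-in rather than RocketMQ's actual types; it only shows how one request is handed to several independent builders in turn.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one request fanned out to several independent builders. */
final class DispatchSketch {
    interface Dispatcher {
        void dispatch(String request);
    }

    private final List<Dispatcher> dispatchers = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    DispatchSketch() {
        // Stand-ins for the ConsumeQueue builder and the Index builder.
        dispatchers.add(req -> log.add("consumequeue:" + req));
        dispatchers.add(req -> log.add("index:" + req));
    }

    void doDispatch(String request) {
        for (Dispatcher d : dispatchers) {
            d.dispatch(request);
        }
    }

    public static void main(String[] args) {
        DispatchSketch sketch = new DispatchSketch();
        sketch.doDispatch("msg-1");
        System.out.println(sketch.log);
    }
}
```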

The dispatch method of CommitLogDispatcherBuildConsumeQueue calls putMessagePositionInfo, which finds the corresponding ConsumeQueue object and then calls its putMessagePositionInfoWrapper method.

//DefaultMessageStore#putMessagePositionInfo
public void putMessagePositionInfo(DispatchRequest dispatchRequest) {
    ConsumeQueue cq = this.findConsumeQueue(dispatchRequest.getTopic(), dispatchRequest.getQueueId());
    cq.putMessagePositionInfoWrapper(dispatchRequest);
}

That method contains some writability checks and retry logic, but its core is the call to putMessagePositionInfo. On success, the message's store timestamp is recorded in the checkpoint.

//ConsumeQueue#putMessagePositionInfoWrapper
boolean result = this.putMessagePositionInfo(request.getCommitLogOffset(),
    request.getMsgSize(), tagsCode, request.getConsumeQueueOffset());
if (result) {
    this.defaultMessageStore.getStoreCheckpoint().setLogicsMsgTimestamp(request.getStoreTimestamp());
    return;
}

The actual putMessagePositionInfo puts the offset, message size and tagsCode into an in-memory buffer, finds the corresponding MappedFile, and writes the data to it. Some offset sanity checks are omitted here.

//ConsumeQueue#putMessagePositionInfo
private boolean putMessagePositionInfo(final long offset, final int size, final long tagsCode,
    final long cqOffset) {

    if (offset <= this.maxPhysicOffset) {
        return true;
    }

    this.byteBufferIndex.flip();
    this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
    this.byteBufferIndex.putLong(offset);
    this.byteBufferIndex.putInt(size);
    this.byteBufferIndex.putLong(tagsCode);
 
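    // Byte position of this entry in the logical queue (cqOffset is counted in entries)
    final long expectLogicOffset = cqOffset * CQ_STORE_UNIT_SIZE;
    MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile(expectLogicOffset);

    return mappedFile.appendMessage(this.byteBufferIndex.array());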
}

appendMessage performs the write through FileChannel's write method. At this point the data has only reached the operating system's page cache; no flush has happened yet. So when is the flush performed?

//MappedFile#appendMessage
public boolean appendMessage(final byte[] data) {
    int currentPos = this.wrotePosition.get();

    if ((currentPos + data.length) <= this.fileSize) {
        try {
            this.fileChannel.position(currentPos);
            this.fileChannel.write(ByteBuffer.wrap(data));
        } catch (Throwable e) {
            log.error("Error occurred when append message to mappedFile.", e);
        }
        this.wrotePosition.addAndGet(data.length);
        return true;
    }

    return false;
}

2.2 Flushing to disk

Flushing of ConsumeQueue is handled by FlushConsumeQueueService.

FlushConsumeQueueService also runs on its own thread. Its run method calls doFlush periodically, by default once every 1 s.

//DefaultMessageStore$FlushConsumeQueueService#run
public void run() {
    DefaultMessageStore.log.info(this.getServiceName() + " service started");

    while (!this.isStopped()) {
        try {
            int interval = DefaultMessageStore.this.getMessageStoreConfig().getFlushIntervalConsumeQueue();
            this.waitForRunning(interval);
            this.doFlush(1);
        } catch (Exception e) {
            DefaultMessageStore.log.warn(this.getServiceName() + " service has exception. ", e);
        }
    }

    this.doFlush(RETRY_TIMES_OVER);
}

doFlush iterates over all ConsumeQueues and calls flush on each. Under certain conditions it also flushes the checkpoint.

//DefaultMessageStore$FlushConsumeQueueService#doFlush

ConcurrentMap<String, ConcurrentMap<Integer, ConsumeQueue>> tables = DefaultMessageStore.this.consumeQueueTable;

for (ConcurrentMap<Integer, ConsumeQueue> maps : tables.values()) {
    for (ConsumeQueue cq : maps.values()) {
        boolean result = false;
        for (int i = 0; i < retryTimes && !result; i++) {
            result = cq.flush(flushConsumeQueueLeastPages);
        }
    }
}

ConsumeQueue's flush delegates to the flush method of MappedFileQueue.

//ConsumeQueue#flush
public boolean flush(final int flushLeastPages) {
    boolean result = this.mappedFileQueue.flush(flushLeastPages);
    if (isExtReadEnable()) {
        result = result & this.consumeQueueExt.flush(flushLeastPages);
    }

    return result;
}

MappedFileQueue locates the MappedFile, calls its flush method, and then records the new flush position (and, when flushLeastPages is 0, the store timestamp).

//MappedFileQueue#flush
public boolean flush(final int flushLeastPages) {
    boolean result = true;
    MappedFile mappedFile = this.findMappedFileByOffset(this.flushedWhere, this.flushedWhere == 0);
    if (mappedFile != null) {
        long tmpTimeStamp = mappedFile.getStoreTimestamp();
        int offset = mappedFile.flush(flushLeastPages);
        long where = mappedFile.getFileFromOffset() + offset;
        result = where == this.flushedWhere;
        this.flushedWhere = where;
        if (0 == flushLeastPages) {
            this.storeTimestamp = tmpTimeStamp;
        }
    }

    return result;
}

MappedFile first checks whether a flush is needed and then performs it. Because ConsumeQueue writes go through FileChannel, as shown earlier, the flush ends up calling FileChannel.force.

//MappedFile#flush
public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                //We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}

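A flush is allowed in three cases: the file is full; the write position is at least flushLeastPages pages (page size 4 KB) ahead of the last flushed position, which with the default of 2 pages means roughly 8 KB of new data; or, when flushLeastPages is 0, the write position is simply ahead of the flushed position.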

//MappedFile#isAbleToFlush
private boolean isAbleToFlush(final int flushLeastPages) {
    int flush = this.flushedPosition.get();
    int write = getReadPosition();
    // The file is full
    if (this.isFull()) {
        return true;
    }
    // The write position is at least flushLeastPages pages (OS_PAGE_SIZE, 4 KB) ahead of the flushed position
    if (flushLeastPages > 0) {
        return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
    }
    // The write position is ahead of the flushed position
    return write > flush;
}
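
The same decision can be written as a pure function, which makes the page arithmetic easy to try out. This is a sketch under stated assumptions: FlushCheckSketch and its parameters are hypothetical names, and "file is full" is modelled simply as the write position reaching the file size.

```java
/** Illustrative only: the flush decision expressed as a pure function. */
final class FlushCheckSketch {
    static final int OS_PAGE_SIZE = 4 * 1024;

    static boolean isAbleToFlush(int flushed, int written, int fileSize, int flushLeastPages) {
        if (written == fileSize) {
            return true;                // the file is full
        }
        if (flushLeastPages > 0) {
            // Page indexes are compared, not raw byte counts.
            return (written / OS_PAGE_SIZE) - (flushed / OS_PAGE_SIZE) >= flushLeastPages;
        }
        return written > flushed;       // forced flush: any unflushed byte counts
    }

    public static void main(String[] args) {
        int fileSize = 6_000_000;
        System.out.println(isAbleToFlush(0, 8191, fileSize, 2)); // false: write position is still in page 1
        System.out.println(isAbleToFlush(0, 8192, fileSize, 2)); // true: write position reached page 2
        System.out.println(isAbleToFlush(0, 20, fileSize, 0));   // true: 20 unflushed bytes
    }
}
```

Because the comparison uses integer page indexes, the "8 KB" threshold is approximate: what matters is how many page boundaries lie between the two positions, not the exact byte difference.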

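3. Pulling messages

The whole purpose of the ConsumeQueue file is to make pulling messages easy, so how does a pull actually use it? The entry point is PullMessageProcessor, which calls DefaultMessageStore to fetch messages.

It first finds the ConsumeQueue by topic and queueId, and then reads data from the ConsumeQueue at the requested offset.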

//DefaultMessageStore#getMessage
ConsumeQueue consumeQueue = findConsumeQueue(topic, queueId);

SelectMappedBufferResult bufferConsumeQueue = consumeQueue.getIndexBuffer(offset);

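startIndex is the logical offset. Multiplying startIndex by 20 gives the byte offset within the consume queue. If that offset is smaller than minLogicOffset, the entry has already been deleted and null is returned. Otherwise the offset is used to find the physical file, and the offset modulo the file size gives the position within that file, which is where reading starts.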

//ConsumeQueue#getIndexBuffer
public SelectMappedBufferResult getIndexBuffer(final long startIndex) {
    int mappedFileSize = this.mappedFileSize;
    // CQ_STORE_UNIT_SIZE is 20
    long offset = startIndex * CQ_STORE_UNIT_SIZE;
    if (offset >= this.getMinLogicOffset()) {
        MappedFile mappedFile = this.mappedFileQueue.findMappedFileByOffset(offset);
        if (mappedFile != null) {
            SelectMappedBufferResult result = mappedFile.selectMappedBuffer((int) (offset % mappedFileSize));
            return result;
        }
    }
    return null;
}
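
The offset arithmetic above can be checked with a short standalone sketch. CqLocateSketch and locate are hypothetical names, and the file size is assumed to be the default 300,000 entries × 20 bytes; the real code resolves the file through MappedFileQueue rather than computing its start offset directly.

```java
/** Illustrative only: maps a logical queue index to a file and a position inside it. */
final class CqLocateSketch {
    static final int UNIT_SIZE = 20;
    static final int FILE_SIZE = 300_000 * UNIT_SIZE; // 6,000,000 bytes per file

    /** Returns {fileStartOffset, positionInFile}, or null if the entry was already deleted. */
    static long[] locate(long startIndex, long minLogicOffset) {
        long offset = startIndex * UNIT_SIZE;
        if (offset < minLogicOffset) {
            return null;
        }
        long positionInFile = offset % FILE_SIZE;
        return new long[] {offset - positionInFile, positionInFile};
    }

    public static void main(String[] args) {
        long[] hit = locate(300_001, 0);
        // Index 300,001 is the second entry of the second file.
        System.out.println("file starts at " + hit[0] + ", position " + hit[1]);
        System.out.println(locate(0, 20) == null ? "deleted" : "present");
    }
}
```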

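The returned ConsumeQueue data is then iterated. Since every entry is exactly 20 bytes, the loop advances in 20-byte steps. For each entry it reads the message's physical offset and size, reads the message from the CommitLog, and adds it to the result set. Along the way it checks whether the batch is already full, and breaks out of the loop if so.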

//DefaultMessageStore#getMessage

int i = 0;
final int maxFilterMessageCount = Math.max(16000, maxMsgNums * ConsumeQueue.CQ_STORE_UNIT_SIZE);
final boolean diskFallRecorded = this.messageStoreConfig.isDiskFallRecorded();
ConsumeQueueExt.CqExtUnit cqExtUnit = new ConsumeQueueExt.CqExtUnit();
// Iterate over the ConsumeQueue data to fetch messages.
for (; i < bufferConsumeQueue.getSize() && i < maxFilterMessageCount; i += ConsumeQueue.CQ_STORE_UNIT_SIZE) {
    // Read the physical offset
    long offsetPy = bufferConsumeQueue.getByteBuffer().getLong();
    // Read the message size
    int sizePy = bufferConsumeQueue.getByteBuffer().getInt();
    long tagsCode = bufferConsumeQueue.getByteBuffer().getLong();

    maxPhyOffsetPulling = offsetPy;


    boolean isInDisk = checkInDiskByCommitOffset(offsetPy, maxOffsetPy);
    // Break out if this batch is full; the check covers message count and in-memory/on-disk size limits.
    if (this.isTheBatchFull(sizePy, maxMsgNums, getResult.getBufferTotalSize(), getResult.getMessageCount(),
        isInDisk)) {
        break;
    }
    // Read the message from the CommitLog at the physical offset
    SelectMappedBufferResult selectResult = this.commitLog.getMessage(offsetPy, sizePy);

    this.storeStatsService.getGetMessageTransferedMsgCount().incrementAndGet();
    // Add it to the result set
    getResult.addMessage(selectResult);
    status = GetMessageStatus.FOUND;
    nextPhyFileStartOffset = Long.MIN_VALUE;
}
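
Stripped of filtering and statistics, the loop is a fixed-stride scan over a byte buffer. The sketch below shows just that part; CqScanSketch and scan are hypothetical names, the batch limit is reduced to a simple message count, and the CommitLog read is replaced by collecting the (offset, size) pairs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: walks a buffer of 20-byte entries and collects CommitLog positions. */
final class CqScanSketch {
    static final int UNIT_SIZE = 20;

    /** Returns {commitLogOffset, size} pairs, stopping after maxMsgNums entries. */
    static List<long[]> scan(ByteBuffer cqBuffer, int maxMsgNums) {
        List<long[]> hits = new ArrayList<>();
        for (int i = 0; i + UNIT_SIZE <= cqBuffer.limit() && hits.size() < maxMsgNums; i += UNIT_SIZE) {
            long offsetPy = cqBuffer.getLong(i);     // absolute reads leave the buffer position untouched
            int sizePy = cqBuffer.getInt(i + 8);
            hits.add(new long[] {offsetPy, sizePy}); // the real code reads the CommitLog here
        }
        return hits;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * UNIT_SIZE);
        for (int n = 0; n < 3; n++) {
            buf.putLong(100L * n).putInt(10 + n).putLong(0L);
        }
        buf.flip();
        for (long[] hit : scan(buf, 2)) {
            System.out.println("offset=" + hit[0] + ", size=" + hit[1]);
        }
    }
}
```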

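3.1 Pull optimization

When the result is ResponseCode.PULL_NOT_FOUND, the broker suspends the request and responds once a message arrives, so the consumer gets new messages as quickly as possible. On the first pull brokerAllowSuspend is true, and hasSuspendFlag is passed from the client and is also true by default. suspendTimeoutMillisLong is likewise taken from the header of the client's request and defaults to 15 s. If the broker has long polling disabled, the wait is shortened to 1 s; longPollingEnable defaults to true, though, so the wait is the 15 s sent by the client.

The request is then wrapped in a PullRequest and parked via PullRequestHoldService, which also runs on its own thread.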

//PullMessageProcessor#processRequest
if (brokerAllowSuspend && hasSuspendFlag) {
    long pollingTimeMills = suspendTimeoutMillisLong;
    if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
        pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
    }

    String topic = requestHeader.getTopic();
    long offset = requestHeader.getQueueOffset();
    int queueId = requestHeader.getQueueId();
    PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
        this.brokerController.getMessageStore().now(), offset, subscriptionData, messageFilter);
    this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
    response = null;
    break;
}

suspendPullRequest simply stores the request. The request is answered later when one of two conditions holds: it times out, or a message arrives.

//PullRequestHoldService#suspendPullRequest
public void suspendPullRequest(final String topic, final int queueId, final PullRequest pullRequest) {
    String key = this.buildKey(topic, queueId);
    ManyPullRequest mpr = this.pullRequestTable.get(key);
    if (null == mpr) {
        mpr = new ManyPullRequest();
        ManyPullRequest prev = this.pullRequestTable.putIfAbsent(key, mpr);
        if (prev != null) {
            mpr = prev;
        }
    }

    mpr.addPullRequest(pullRequest);
}

Because PullRequestHoldService runs on its own thread, it checks the parked requests in a loop. Long polling is enabled on the broker by default, so checkHoldRequest runs every 5 s.

//PullRequestHoldService#run
public void run() {
    log.info("{} service started", this.getServiceName());
    while (!this.isStopped()) {
        try {
            if (this.brokerController.getBrokerConfig().isLongPollingEnable()) {
                this.waitForRunning(5 * 1000);
            } else {
                this.waitForRunning(this.brokerController.getBrokerConfig().getShortPollingTimeMills());
            }
            long beginLockTimestamp = this.systemClock.now();
            this.checkHoldRequest();
            long costTime = this.systemClock.now() - beginLockTimestamp;
            if (costTime > 5 * 1000) {
                log.info("[NOTIFYME] check hold request cost {} ms.", costTime);
            }
        } catch (Throwable e) {
           
        }
    }

  
}

checkHoldRequest iterates over the keys of the parked requests and calls notifyMessageArriving for each.

//PullRequestHoldService#checkHoldRequest
private void checkHoldRequest() {
    for (String key : this.pullRequestTable.keySet()) {
        String[] kArray = key.split(TOPIC_QUEUEID_SEPARATOR);
        if (2 == kArray.length) {
            String topic = kArray[0];
            int queueId = Integer.parseInt(kArray[1]);
            final long offset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
            try {
                this.notifyMessageArriving(topic, queueId, offset);
            } catch (Throwable e) {
                log.error("check hold request failed. topic={}, queueId={}", topic, queueId, e);
            }
        }
    }
}

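notifyMessageArriving iterates over the PullRequests and checks whether the broker now has an offset larger than the one requested; if so, it tries to re-execute the pull. It also checks for timeout, and a timed-out request is re-executed as well.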

//PullRequestHoldService#notifyMessageArriving
for (PullRequest request : requestList) {
    long newestOffset = maxOffset;
    if (newestOffset <= request.getPullFromThisOffset()) {
        newestOffset = this.brokerController.getMessageStore().getMaxOffsetInQueue(topic, queueId);
    }

    // The broker's max offset is larger than the requested offset
    if (newestOffset > request.getPullFromThisOffset()) {
        boolean match = request.getMessageFilter().isMatchedByConsumeQueue(tagsCode,
            new ConsumeQueueExt.CqExtUnit(tagsCode, msgStoreTime, filterBitMap));
        // match by bit map, need eval again when properties is not null.
        if (match && properties != null) {
            match = request.getMessageFilter().isMatchedByCommitLog(null, properties);
        }
        // On a match, execute the pull
        if (match) {
            try {
                    this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                    request.getRequestCommand());
            } catch (Throwable e) {
                log.error("execute request when wakeup failed.", e);
            }
            continue;
        }
    }
    // The suspend timeout has elapsed
    if (System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())) {
        try {
            this.brokerController.getPullMessageProcessor().executeRequestWhenWakeup(request.getClientChannel(),
                request.getRequestCommand());
        } catch (Throwable e) {
            log.error("execute request when wakeup failed.", e);
        }
        continue;
    }
    // Requests that satisfy neither condition are put back into the cache
    replayList.add(request);
}
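
The two wake-up conditions (a newer offset, or a timeout) can be shown in a small self-contained sketch. HoldSketch, Held and the "topic@queueId" key format are illustrative assumptions, not RocketMQ's classes, and tag matching is left out so that only the parking logic remains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative only: park pull requests until a newer offset arrives or they time out. */
final class HoldSketch {
    static final class Held {
        final long pullFromOffset;  // offset the consumer asked for
        final long suspendAt;       // when the request was parked (ms)
        final long timeoutMillis;   // how long it may stay parked

        Held(long pullFromOffset, long suspendAt, long timeoutMillis) {
            this.pullFromOffset = pullFromOffset;
            this.suspendAt = suspendAt;
            this.timeoutMillis = timeoutMillis;
        }
    }

    private final Map<String, List<Held>> table = new HashMap<>();

    synchronized void suspend(String topic, int queueId, Held request) {
        table.computeIfAbsent(topic + "@" + queueId, k -> new ArrayList<>()).add(request);
    }

    /** Removes and returns the requests that should be re-executed now. */
    synchronized List<Held> notifyArriving(String topic, int queueId, long maxOffset, long now) {
        List<Held> woken = new ArrayList<>();
        List<Held> parked = table.get(topic + "@" + queueId);
        if (parked == null) {
            return woken;
        }
        for (Iterator<Held> it = parked.iterator(); it.hasNext(); ) {
            Held h = it.next();
            boolean hasNewMessage = maxOffset > h.pullFromOffset;
            boolean timedOut = now >= h.suspendAt + h.timeoutMillis;
            if (hasNewMessage || timedOut) {
                it.remove();
                woken.add(h);
            }
        }
        return woken;
    }

    public static void main(String[] args) {
        HoldSketch hold = new HoldSketch();
        hold.suspend("TopicA", 0, new Held(10, 0, 15_000));
        System.out.println(hold.notifyArriving("TopicA", 0, 10, 1_000).size()); // 0: nothing new, not timed out
        System.out.println(hold.notifyArriving("TopicA", 0, 11, 1_000).size()); // 1: max offset passed 10
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, is a choice made here only to keep the sketch deterministic.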

executeRequestWhenWakeup submits a task to PullMessageProcessor's thread pool. The task's Runnable is an anonymous inner class defined inside this method.

//PullMessageProcessor#executeRequestWhenWakeup
this.brokerController.getPullMessageExecutor().submit(new RequestTask(run, channel, request));

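The Runnable simply calls PullMessageProcessor's processRequest, but with the third argument, brokerAllowSuspend, set to false. Once the pull produces a response, the result is written straight back to the channel.

//PullMessageProcessor$Runnable#run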
final RemotingCommand response = PullMessageProcessor.this.processRequest(channel, request, false);

if (response != null) {
    response.setOpaque(request.getOpaque());
    response.markResponseType();
    try {
        channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) throws Exception {
                if (!future.isSuccess()) {

                }
            }
        });
    } catch (Throwable e) {
      
    }
}

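The above is the periodic check on parked requests. A notification is also fired whenever a new entry is written to the ConsumeQueue, inside ReputMessageService's doReput. After doDispatch has run, that is, after the ConsumeQueue entry has been written to memory, the following check is made:

If this broker is not a SLAVE (only the MASTER handles message writes) and long polling is enabled on the broker (the default), the arriving method of NotifyMessageArrivingListener is called.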

//DefaultMessageStore$ReputMessageService#doReput
if (BrokerRole.SLAVE != DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole()
    && DefaultMessageStore.this.brokerConfig.isLongPollingEnable()) {
    DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
        dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
        dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
        dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
}

The arriving method in turn just calls PullRequestHoldService's notifyMessageArriving, which was described above.

//NotifyMessageArrivingListener#arriving
public void arriving(String topic, int queueId, long logicOffset, long tagsCode,
    long msgStoreTime, byte[] filterBitMap, Map<String, String> properties) {
    this.pullRequestHoldService.notifyMessageArriving(topic, queueId, logicOffset, tagsCode,
        msgStoreTime, filterBitMap, properties);
}

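3.2 How the tag hash is used

We saw earlier that the ConsumeQueue stores a tag hash code. How is it used?

First, where does the stored tag hash code come from? When the DispatchRequest is built, the TAGS property is read from the message and the string is converted directly to its hash code.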

//CommitLog#checkMessageAndReturnSize
String tags = propertiesMap.get(MessageConst.PROPERTY_TAGS);
if (tags != null && tags.length() > 0) {
    tagsCode = MessageExtBrokerInner.tagsString2tagsCode(MessageExt.parseTopicFilterType(sysFlag), tags);
}

//MessageExtBrokerInner#tagsString2tagsCode
public static long tagsString2tagsCode(final TopicFilterType filter, final String tags) {
    if (null == tags || tags.length() == 0) { return 0; }

    return tags.hashCode();
}

A message can carry a tags property when it is sent. In the Java client, Message accepts a tags argument and stores it internally as the TAGS property. Despite the plural name, only a single tag string can be set.

//Message
public Message(String topic, String tags, String keys, int flag, byte[] body, boolean waitStoreMsgOK) {
    this.topic = topic;
    this.flag = flag;
    this.body = body;

    if (tags != null && tags.length() > 0)
        this.setTags(tags);

    if (keys != null && keys.length() > 0)
        this.setKeys(keys);

    this.setWaitStoreMsgOK(waitStoreMsgOK);
}

//Message#setTags
public void setTags(String tags) {
    this.putProperty(MessageConst.PROPERTY_TAGS, tags);
}

Now let us see how the tag hash code is used when pulling messages.

When a consumer starts on the client side, it normally subscribes to a topic with a subExpression, commonly *, which is turned into a SubscriptionData internally.

//DefaultMQPushConsumer#subscribe
public void subscribe(String topic, String subExpression) throws MQClientException {
    this.defaultMQPushConsumerImpl.subscribe(topic, subExpression);
}

Internally, FilterAPI builds the SubscriptionData, which is then stored in RebalanceImpl.

//DefaultMQPushConsumerImpl#subscribe
public void subscribe(String topic, String subExpression) throws MQClientException {
    try {
        SubscriptionData subscriptionData = FilterAPI.buildSubscriptionData(this.defaultMQPushConsumer.getConsumerGroup(),
            topic, subExpression);
        this.rebalanceImpl.getSubscriptionInner().put(topic, subscriptionData);
        if (this.mQClientFactory != null) {
            this.mQClientFactory.sendHeartbeatToAllBrokerWithLock();
        }
    } catch (Exception e) {
        throw new MQClientException("subscription exception", e);
    }
}

In buildSubscriptionData, if subExpression is unset or is *, subString is set to *. Otherwise the expression is split on || into individual tags, which are recorded in TagsSet and CodeSet; CodeSet holds the hash codes of the tag strings.

//FilterAPI#buildSubscriptionData
public static SubscriptionData buildSubscriptionData(final String consumerGroup, String topic,
    String subString) throws Exception {
    SubscriptionData subscriptionData = new SubscriptionData();
    subscriptionData.setTopic(topic);
    subscriptionData.setSubString(subString);

    if (null == subString || subString.equals(SubscriptionData.SUB_ALL) || subString.length() == 0) {
        subscriptionData.setSubString(SubscriptionData.SUB_ALL);
    } else {
        String[] tags = subString.split("\\|\\|");
        if (tags.length > 0) {
            for (String tag : tags) {
                if (tag.length() > 0) {
                    String trimString = tag.trim();
                    if (trimString.length() > 0) {
                        subscriptionData.getTagsSet().add(trimString);
                        subscriptionData.getCodeSet().add(trimString.hashCode());
                    }
                }
            }
        } else {
            throw new Exception("subString split error");
        }
    }

    return subscriptionData;
}

The pull request carries the SubscriptionData's subString, its expressionType (TAG by default), and its subVersion.

//DefaultMQPushConsumerImpl#pullMessage
final SubscriptionData subscriptionData = this.rebalanceImpl.getSubscriptionInner().get(pullRequest.getMessageQueue().getTopic());

 this.pullAPIWrapper.pullKernelImpl(
                pullRequest.getMessageQueue(),
                subExpression,
                subscriptionData.getExpressionType(),
                subscriptionData.getSubVersion(),
                pullRequest.getNextOffset(),
                this.defaultMQPushConsumer.getPullBatchSize(),
                sysFlag,
                commitOffsetValue,
                BROKER_SUSPEND_MAX_TIME_MILLIS,
                CONSUMER_TIMEOUT_MILLIS_WHEN_SUSPEND,
                CommunicationMode.ASYNC,
                pullCallback
            );

In the broker, PullMessageProcessor rebuilds a SubscriptionData from the subscription parameters in the request. When expressionType is TAG, it is built the same way as on the client, through FilterAPI's buildSubscriptionData method.

//PullMessageProcessor#processRequest
subscriptionData = FilterAPI.build(
    requestHeader.getTopic(), requestHeader.getSubscription(), requestHeader.getExpressionType()
);

//FilterAPI#build
public static SubscriptionData build(final String topic, final String subString,
    final String type) throws Exception {
    if (ExpressionType.TAG.equals(type) || type == null) {
        return buildSubscriptionData(null, topic, subString);
    }
    //...
}

PullMessageProcessor also wraps the subscriptionData and related information in a MessageFilter, which by default is an ExpressionMessageFilter.

//PullMessageProcessor#processRequest
messageFilter = new ExpressionMessageFilter(subscriptionData, consumerFilterData,
    this.brokerController.getConsumerFilterManager());

When DefaultMessageStore pulls messages, it calls MessageFilter's isMatchedByConsumeQueue to test each entry. Entries that do not match are skipped.

//DefaultMessageStore#getMessage
if (messageFilter != null
    && !messageFilter.isMatchedByConsumeQueue(isTagsCodeLegal ? tagsCode : null, extRet ? cqExtUnit : null)) {
    if (getResult.getBufferTotalSize() == 0) {
        status = GetMessageStatus.NO_MATCHED_MESSAGE;
    }
    continue;
}

With tag filtering, isMatchedByConsumeQueue matches everything if subString is *. Otherwise it checks whether the entry's tag hash code is contained in CodeSet and filters accordingly.

//ExpressionMessageFilter#isMatchedByConsumeQueue
public boolean isMatchedByConsumeQueue(Long tagsCode, ConsumeQueueExt.CqExtUnit cqExtUnit) {
    if (null == subscriptionData) {
        return true;
    }
    if (subscriptionData.isClassFilterMode()) {
        return true;
    }
    // by tags code.
    if (ExpressionType.isTagType(subscriptionData.getExpressionType())) {

        if (tagsCode == null) {
            return true;
        }

        if (subscriptionData.getSubString().equals(SubscriptionData.SUB_ALL)) {
            return true;
        }

        return subscriptionData.getCodeSet().contains(tagsCode.intValue());
    }
    //...
}
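
The tag path of this check boils down to a set lookup on hash codes. The following sketch rebuilds it outside RocketMQ; TagFilterSketch and its methods are hypothetical names, and the code set is rebuilt on every call purely to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: tag filtering by hash code, as done against ConsumeQueue entries. */
final class TagFilterSketch {
    static final String SUB_ALL = "*";

    static Set<Integer> buildCodeSet(String subExpression) {
        Set<Integer> codes = new HashSet<>();
        for (String tag : subExpression.split("\\|\\|")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                codes.add(trimmed.hashCode());
            }
        }
        return codes;
    }

    static boolean matches(String subExpression, long tagsCode) {
        if (subExpression == null || subExpression.isEmpty() || SUB_ALL.equals(subExpression)) {
            return true;
        }
        return buildCodeSet(subExpression).contains((int) tagsCode);
    }

    public static void main(String[] args) {
        long stored = "TagA".hashCode();
        System.out.println(matches("TagA || TagB", stored)); // true
        System.out.println(matches("TagC", stored));         // false
        System.out.println(matches("*", stored));            // true
        // Different strings can share a hash code, e.g. "Aa" and "BB" in Java.
        System.out.println(matches("Aa", "BB".hashCode()));  // true
    }
}
```

The last line shows that a hash match is necessary but not sufficient: two different tags can collide, so a check on hash codes alone can let an unwanted message through.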

The pull also checks the CommitLog content by calling MessageFilter's isMatchedByCommitLog. Messages that do not match are skipped here as well.

//DefaultMessageStore#getMessage
if (messageFilter != null
    && !messageFilter.isMatchedByCommitLog(selectResult.getByteBuffer().slice(), null)) {
    if (getResult.getBufferTotalSize() == 0) {
        status = GetMessageStatus.NO_MATCHED_MESSAGE;
    }
    // release...
    selectResult.release();
    continue;
}

For tag filtering, isMatchedByCommitLog simply returns true, so every message is treated as a match at this stage.

//ExpressionMessageFilter#isMatchedByCommitLog
public boolean isMatchedByCommitLog(ByteBuffer msgBuffer, Map<String, String> properties) {
    if (subscriptionData == null) {
        return true;
    }

    if (subscriptionData.isClassFilterMode()) {
        return true;
    }

    if (ExpressionType.isTagType(subscriptionData.getExpressionType())) {
        return true;
    }
    //...
}

4. References

blog.csdn.net/jjhfen00/ar...
RocketMQ source code, 4.4.0 branch
《RocketMQ技术内幕》 (a book on RocketMQ internals)