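RocketMQ supports three modes for sending messages: synchronous, asynchronous, and oneway. The core logic is similar in all three: message data is sent to the broker. This article focuses on synchronous and asynchronous sending and also explains how retries work.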
1. Synchronous send
The following example sends a message. The analysis of the underlying send path starts from it.
java
public static void simpleTest() throws MQClientException, RemotingException, InterruptedException, MQBrokerException, UnsupportedEncodingException {
DefaultMQProducer producer = new DefaultMQProducer("please_rename_unique_group_name");
producer.setNamesrvAddr("127.0.0.1:9876");
producer.start();
Message msg = new Message("TopicTest", "TagA" , ("Hello RocketMQ ").getBytes(RemotingHelper.DEFAULT_CHARSET));
SendResult sendResult = producer.send(msg);
producer.shutdown();
}
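By default, sending is delegated to the internal implementation class DefaultMQProducerImpl. It has several overloaded send methods, all of which eventually reach DefaultMQProducerImpl's sendDefaultImpl method.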
java
//DefaultMQProducer#send
public SendResult send( Message msg) {
return this.defaultMQProducerImpl.send(msg);
}
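In sendDefaultImpl, communicationMode is CommunicationMode.SYNC and sendCallback is null. The timeout is configured on the Producer and defaults to 3000 ms.
Before sending, the method confirms the Producer's state and validates the message: the message must not be null, its body must not be empty (length 0), the body must not exceed the maximum size (4 MB by default), and the topic is checked as well. It then looks up the topic's route information.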
java
//DefaultMQProducerImpl#sendDefaultImpl
private SendResult sendDefaultImpl(
Message msg,
final CommunicationMode communicationMode,
final SendCallback sendCallback,
final long timeout
) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
//confirm the producer is in a usable state
this.makeSureStateOK();
//validate the message, e.g. its size
Validators.checkMessage(msg, this.defaultMQProducer);
final long invokeID = random.nextLong();
long beginTimestampFirst = System.currentTimeMillis();
long beginTimestampPrev = beginTimestampFirst;
long endTimestamp = beginTimestampFirst;
//look up the topic's route information
TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
//... the rest of the method is omitted here
}
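The checks described above can be sketched in plain Java. This is a simplified stand-in, not RocketMQ's Validators class: the class name MessageChecks, the method check, and the constant MAX_BODY_BYTES are invented for illustration, and the topic check is reduced to a non-blank test.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the validation step; not RocketMQ's Validators class.
final class MessageChecks {
    // Assumed limit, mirroring the 4 MB default mentioned in the text.
    static final int MAX_BODY_BYTES = 4 * 1024 * 1024;

    // Throws IllegalArgumentException when the topic or body fails a basic check.
    static void check(String topic, byte[] body) {
        if (topic == null || topic.trim().isEmpty()) {
            throw new IllegalArgumentException("the topic is blank");
        }
        if (body == null) {
            throw new IllegalArgumentException("the message body is null");
        }
        if (body.length == 0) {
            throw new IllegalArgumentException("the message body length is zero");
        }
        if (body.length > MAX_BODY_BYTES) {
            throw new IllegalArgumentException("the message body is larger than " + MAX_BODY_BYTES + " bytes");
        }
    }

    public static void main(String[] args) {
        check("TopicTest", "Hello RocketMQ".getBytes(StandardCharsets.UTF_8)); // passes silently
        try {
            check("TopicTest", new byte[0]);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the client side avoids a network round trip for messages the broker would reject anyway.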
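If the topic information is empty or unusable, an exception is thrown.
- If no nameserver address has been configured, the message is:
No name server address
- Otherwise the message is:
No route info of this topic
The error message therefore gives a rough indication of where the problem lies.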
java
//DefaultMQProducerImpl#sendDefaultImpl
List<String> nsList = this.getmQClientFactory().getMQClientAPIImpl().getNameServerAddressList();
if (null == nsList || nsList.isEmpty()) {
throw new MQClientException(
"No name server address, please set it." + FAQUrl.suggestTodo(FAQUrl.NAME_SERVER_ADDR_NOT_EXIST_URL), null).setResponseCode(ClientErrorCode.NO_NAME_SERVER_EXCEPTION);
}
throw new MQClientException("No route info of this topic, " + msg.getTopic() + FAQUrl.suggestTodo(FAQUrl.NO_TOPIC_ROUTE_INFO),
null).setResponseCode(ClientErrorCode.NOT_FOUND_TOPIC_EXCEPTION);
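Next, let's look at how the topic information is looked up.
There are two main steps. If the local topicPublishInfoTable has no usable entry, the producer first asks the nameserver for the requested topic and returns the result directly if it is found. If it is not found, the producer asks the nameserver for the default topic instead, which is tied to automatic topic creation.
For more on automatic topic creation, see another article of mine, "Notes on a RocketMQ Topic Auto-Creation Issue".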
java
//DefaultMQProducerImpl#tryToFindTopicPublishInfo
private TopicPublishInfo tryToFindTopicPublishInfo(final String topic) {
TopicPublishInfo topicPublishInfo = this.topicPublishInfoTable.get(topic);
if (null == topicPublishInfo || !topicPublishInfo.ok()) {
this.topicPublishInfoTable.putIfAbsent(topic, new TopicPublishInfo());
//try to fetch this topic's route from the nameserver
this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic);
topicPublishInfo = this.topicPublishInfoTable.get(topic);
}
if (topicPublishInfo.isHaveTopicRouterInfo() || topicPublishInfo.ok()) {
return topicPublishInfo;
} else {
//fall back to the default topic's route on the nameserver
this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic, true, this.defaultMQProducer);
topicPublishInfo = this.topicPublishInfoTable.get(topic);
return topicPublishInfo;
}
}
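The lookup order above (local cache, then the topic itself, then the default topic) can be sketched without any RocketMQ types. Everything here is hypothetical: RouteCache, its find method, the nameServer function, and the topic name "DefaultTopic" are placeholders, and a route is reduced to a list of queue names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of "local cache, then name server, then default topic".
final class RouteCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Assumed lookup function; returns null when the name server does not know the topic.
    private final Function<String, List<String>> nameServer;
    private final String defaultTopic;

    RouteCache(Function<String, List<String>> nameServer, String defaultTopic) {
        this.nameServer = nameServer;
        this.defaultTopic = defaultTopic;
    }

    List<String> find(String topic) {
        List<String> queues = cache.get(topic);
        if (queues != null && !queues.isEmpty()) {
            return queues; // served from the local cache
        }
        queues = nameServer.apply(topic); // first try: the topic itself
        if (queues == null || queues.isEmpty()) {
            queues = nameServer.apply(defaultTopic); // second try: the default topic's route
        }
        if (queues == null || queues.isEmpty()) {
            return Collections.emptyList(); // the caller would report "no route info"
        }
        cache.put(topic, queues);
        return queues;
    }

    public static void main(String[] args) {
        Map<String, List<String>> known = new HashMap<>();
        known.put("DefaultTopic", Arrays.asList("broker-a:0", "broker-a:1"));
        RouteCache routes = new RouteCache(known::get, "DefaultTopic");
        System.out.println(routes.find("TopicTest")); // falls back to the default topic's queues
    }
}
```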
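The single-argument updateTopicRouteInfoFromNameServer delegates to an internal overload with isDefault=false and defaultMQProducer=null. It requests the topic information from the nameserver and then processes the returned TopicRouteData. That processing is skipped here.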
java
public boolean updateTopicRouteInfoFromNameServer(final String topic, boolean isDefault,
DefaultMQProducer defaultMQProducer) {
topicRouteData = this.mQClientAPIImpl.getTopicRouteInfoFromNameServer(topic, 1000 * 3);
}
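Once a TopicPublishInfo has been found, the producer attempts to send the message.
For synchronous sends, the default is two retries (three attempts in total), and the retry count is configurable. For asynchronous sends, this loop makes only one attempt (asynchronous retries are handled elsewhere, see section 3.4).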
java
//DefaultMQProducerImpl#sendDefaultImpl
//this.defaultMQProducer.getRetryTimesWhenSendFailed() == 2
int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
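Before sending, the producer chooses which MessageQueue to send to. Because no MessageQueueSelector argument is used here, and as long as no retry is in progress, queues are chosen in round-robin order.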
java
//DefaultMQProducerImpl#sendDefaultImpl
MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
java
//TopicPublishInfo#selectOneMessageQueue
public MessageQueue selectOneMessageQueue() {
int index = this.sendWhichQueue.getAndIncrement();
int pos = Math.abs(index) % this.messageQueueList.size();
if (pos < 0)
pos = 0;
return this.messageQueueList.get(pos);
}
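The `pos < 0` guard in the excerpt looks redundant at first, but Math.abs(Integer.MIN_VALUE) is still negative, so the remainder can be negative once the counter wraps around. A small sketch of the same arithmetic follows. RoundRobin and its start parameter are hypothetical and use an AtomicInteger counter, which may differ from the type RocketMQ uses for sendWhichQueue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin picker over queue names; mirrors the index arithmetic above.
final class RoundRobin {
    private final AtomicInteger counter;
    private final List<String> queues;

    RoundRobin(List<String> queues, int start) {
        this.queues = queues;
        this.counter = new AtomicInteger(start);
    }

    String next() {
        int index = counter.getAndIncrement();
        int pos = Math.abs(index) % queues.size();
        if (pos < 0) {
            pos = 0; // possible only for index == Integer.MIN_VALUE, where Math.abs stays negative
        }
        return queues.get(pos);
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(Arrays.asList("q0", "q1", "q2"), 0);
        System.out.println(rr.next() + " " + rr.next() + " " + rr.next() + " " + rr.next());
        // prints "q0 q1 q2 q0"
    }
}
```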
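Once a concrete MessageQueue has been obtained, sendKernelImpl is called to send the message.
It first looks up the broker's address by the broker name. If no address is found, an exception with the message The broker xxx not exist is thrown.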
java
//DefaultMQProducerImpl#sendKernelImpl
String brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(mq.getBrokerName());
if (null == brokerAddr) {
tryToFindTopicPublishInfo(mq.getTopic());
brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(mq.getBrokerName());
}
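Next, the request header for the send is built. The focus here is on the normal send path, so some retry-related cases are ignored. Parameters such as sysFlag are explained when they come up.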
java
//DefaultMQProducerImpl#sendKernelImpl
SendMessageRequestHeader requestHeader = new SendMessageRequestHeader();
requestHeader.setProducerGroup(this.defaultMQProducer.getProducerGroup());
requestHeader.setTopic(msg.getTopic());
requestHeader.setDefaultTopic(this.defaultMQProducer.getCreateTopicKey());
requestHeader.setDefaultTopicQueueNums(this.defaultMQProducer.getDefaultTopicQueueNums());
requestHeader.setQueueId(mq.getQueueId());
requestHeader.setSysFlag(sysFlag);
requestHeader.setBornTimestamp(System.currentTimeMillis());
requestHeader.setFlag(msg.getFlag());
requestHeader.setProperties(MessageDecoder.messageProperties2String(msg.getProperties()));
requestHeader.setReconsumeTimes(0);
requestHeader.setUnitMode(this.isUnitMode());
requestHeader.setBatch(msg instanceof MessageBatch);
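Once the request has been built, the lower-level MQClientAPIImpl is called to send it. Internally it wraps the request header and the message into RemotingCommand, the structure used for network transport, and then makes a synchronous call through the network client RemotingClient.
The concrete RemotingClient implementation is NettyRemotingClient. It obtains the Channel (provided by Netty) for the target address, runs the before and after RPC hooks, and accounts for the time already spent. The real work happens in invokeSyncImpl.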
java
//NettyRemotingClient#invokeSync
public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis)
throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
long beginStartTime = System.currentTimeMillis();
final Channel channel = this.getAndCreateChannel(addr);
if (channel != null && channel.isActive()) {
try {
doBeforeRpcHooks(addr, request);
long costTime = System.currentTimeMillis() - beginStartTime;
if (timeoutMillis < costTime) {
throw new RemotingTimeoutException("invokeSync call timeout");
}
RemotingCommand response = this.invokeSyncImpl(channel, request, timeoutMillis - costTime);
doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(channel), request, response);
return response;
} catch (RemotingSendRequestException e) {
log.warn("invokeSync: send request exception, so close the channel[{}]", addr);
this.closeChannel(addr, channel);
throw e;
} catch (RemotingTimeoutException e) {
if (nettyClientConfig.isClientCloseSocketIfTimeout()) {
this.closeChannel(addr, channel);
log.warn("invokeSync: close socket because of timeout, {}ms, {}", timeoutMillis, addr);
}
log.warn("invokeSync: wait response timeout exception, the channel[{}]", addr);
throw e;
}
} else {
this.closeChannel(addr, channel);
throw new RemotingConnectException(addr);
}
}
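The time accounting in invokeSync follows a simple pattern: measure what has been spent so far, fail if the budget is already exceeded, and pass only the remainder to the next step. A minimal sketch, with the hypothetical helper TimeBudget and an unchecked exception standing in for RemotingTimeoutException:

```java
// Hypothetical helper showing how elapsed time is deducted from the caller's timeout.
final class TimeBudget {
    // Returns the time left for the next step, or throws when the budget is already spent.
    static long remaining(long timeoutMillis, long startMillis, long nowMillis) {
        long costTime = nowMillis - startMillis;
        if (timeoutMillis < costTime) {
            throw new IllegalStateException("call timeout after " + costTime + " ms");
        }
        return timeoutMillis - costTime;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // ... connection setup and "before" hooks would run here ...
        long left = remaining(3000, start, System.currentTimeMillis());
        System.out.println("time left for the network call: " + left + " ms");
    }
}
```

Passing the clock readings in as arguments keeps the helper easy to test with fixed values.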
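Internally, Channel's writeAndFlush sends the data to the broker, and the calling thread then waits for the response. Each request is identified by a unique opaque value. The broker's response carries the same opaque, so the matching ResponseFuture can be found in responseTable and the waiting thread released. Waiting and releasing are implemented with a CountDownLatch.
This design is instructive. If we ever need to implement RPC ourselves, we can follow similar logic: network operations are inherently asynchronous, but callers sometimes want synchronous behavior. In that case, keep a table of pending requests and block the caller. When the peer replies, look up the matching entry and release the wait. Asynchronous callbacks work the same way: store the callback information, then invoke it when the response arrives.
This is the value of studying open-source frameworks: they offer good implementation ideas to borrow.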
java
//NettyRemotingAbstract#invokeSyncImpl
public RemotingCommand invokeSyncImpl(final Channel channel, final RemotingCommand request,
final long timeoutMillis)
throws InterruptedException, RemotingSendRequestException, RemotingTimeoutException {
final int opaque = request.getOpaque();
try {
final ResponseFuture responseFuture = new ResponseFuture(channel, opaque, timeoutMillis, null, null);
this.responseTable.put(opaque, responseFuture);
final SocketAddress addr = channel.remoteAddress();
channel.writeAndFlush(request).addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture f) throws Exception {
if (f.isSuccess()) {
responseFuture.setSendRequestOK(true);
return;
} else {
responseFuture.setSendRequestOK(false);
}
responseTable.remove(opaque);
responseFuture.setCause(f.cause());
responseFuture.putResponse(null);
log.warn("send a request command to channel <" + addr + "> failed.");
}
});
RemotingCommand responseCommand = responseFuture.waitResponse(timeoutMillis);
if (null == responseCommand) {
if (responseFuture.isSendRequestOK()) {
throw new RemotingTimeoutException(RemotingHelper.parseSocketAddressAddr(addr), timeoutMillis,
responseFuture.getCause());
} else {
throw new RemotingSendRequestException(RemotingHelper.parseSocketAddressAddr(addr), responseFuture.getCause());
}
}
return responseCommand;
} finally {
this.responseTable.remove(opaque);
}
}
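The same idea can be reduced to a few dozen lines of plain Java. The sketch below is not RocketMQ code: PendingCalls, Slot, call, and onResponse are hypothetical names, a String stands in for RemotingCommand, and a Runnable stands in for the channel write.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a blocking call layered over an asynchronous transport.
final class PendingCalls {
    // One slot per in-flight request, released by the thread that receives the reply.
    static final class Slot {
        private final CountDownLatch latch = new CountDownLatch(1);
        private volatile String response;

        void complete(String value) {
            response = value;
            latch.countDown();
        }

        String await(long timeoutMillis) throws InterruptedException {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return response; // still null if the latch timed out
        }
    }

    private final Map<Integer, Slot> table = new ConcurrentHashMap<>();

    // Caller side: register the slot, hand the request to the transport, then block.
    String call(int opaque, Runnable transport, long timeoutMillis) throws InterruptedException {
        Slot slot = new Slot();
        table.put(opaque, slot);
        try {
            transport.run(); // stands in for writing the request to the channel
            return slot.await(timeoutMillis);
        } finally {
            table.remove(opaque);
        }
    }

    // Receiver side: match the reply to its request by opaque and wake the caller.
    void onResponse(int opaque, String body) {
        Slot slot = table.get(opaque);
        if (slot != null) {
            slot.complete(body);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PendingCalls calls = new PendingCalls();
        // The fake transport answers from another thread, as a network peer would.
        String reply = calls.call(7, () -> new Thread(() -> calls.onResponse(7, "pong")).start(), 1000);
        System.out.println(reply);
    }
}
```

Registering the slot before handing the request to the transport matters: a reply that arrives before the slot exists would find nothing to wake.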
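2. Asynchronous send
The overall logic is similar to the synchronous send. Only the handling of the network send differs. The underlying method for asynchronous sends is invokeAsyncImpl, provided by NettyRemotingAbstract, the parent class of NettyRemotingClient.
Before sending, a permit must be acquired. Permits are provided by a Semaphore with a default of 65535, which caps the number of in-flight asynchronous requests so that messages are not sent too fast. The permit is released once the response to the message arrives.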
java
//NettyRemotingAbstract#invokeAsyncImpl
boolean acquired = this.semaphoreAsync.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
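A rough sketch of this permit pattern follows. AsyncPermits and tryBegin are hypothetical names. The release handle guards against a double release with an AtomicBoolean, which is one common way to make sure a permit is returned only once even if both the response path and the timeout path try to release it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical limiter for in-flight asynchronous requests.
final class AsyncPermits {
    private final Semaphore permits;

    AsyncPermits(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight, true);
    }

    // Returns a handle that releases the permit at most once, or null if none was free in time.
    Runnable tryBegin(long timeoutMillis) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return null;
        }
        AtomicBoolean released = new AtomicBoolean(false);
        return () -> {
            if (released.compareAndSet(false, true)) {
                permits.release();
            }
        };
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncPermits limiter = new AsyncPermits(1);
        Runnable done = limiter.tryBegin(100);             // takes the only permit
        System.out.println(limiter.tryBegin(100) == null); // prints "true": no permit left
        done.run();                                        // response arrived: return the permit
        done.run();                                        // a second release is ignored
        System.out.println(limiter.available());           // prints "1"
    }
}
```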
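The request and related information are wrapped in a ResponseFuture and put into responseTable, and the method then returns. An asynchronous call needs a callback, and here the InvokeCallback that is passed in wraps the handling of the SendCallback. So how is the response to an asynchronous message processed, and how does SendCallback get executed?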
java
//NettyRemotingAbstract#invokeAsyncImpl
final ResponseFuture responseFuture = new ResponseFuture(channel, opaque, timeoutMillis - costTime, invokeCallback, once);
this.responseTable.put(opaque, responseFuture);
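Following the usages of responseTable leads to two places, both in NettyRemotingAbstract: the processResponseCommand method and the scanResponseTable method. processResponseCommand handles normal broker responses, while scanResponseTable scans for send requests that have timed out.
In processResponseCommand, the ResponseFuture previously written to responseTable is looked up by the request's opaque value. If it carries an InvokeCallback, the call is asynchronous and the callback is executed. If not, the call is synchronous, and putResponse invokes countDown on the CountDownLatch, which releases the await in the synchronous send described earlier.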
java
//NettyRemotingAbstract#processResponseCommand
public void processResponseCommand(ChannelHandlerContext ctx, RemotingCommand cmd) {
final int opaque = cmd.getOpaque();
final ResponseFuture responseFuture = responseTable.get(opaque);
if (responseFuture != null) {
responseFuture.setResponseCommand(cmd);
responseTable.remove(opaque);
if (responseFuture.getInvokeCallback() != null) {
executeInvokeCallback(responseFuture);
} else {
responseFuture.putResponse(cmd);
responseFuture.release();
}
} else {
log.warn("receive response, but not matched any request, " + RemotingHelper.parseChannelRemoteAddr(ctx.channel()));
log.warn(cmd.toString());
}
}
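executeInvokeCallback decides whether the callback runs on the callback executor or on the current thread. In NettyRemotingClient, the callback executor is the publicExecutor thread pool, whose threads are named with the prefix NettyClientPublicExecutor_. The callback itself invokes InvokeCallback's operationComplete method.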
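The "executor if possible, otherwise the current thread" decision can be sketched as follows. CallbackDispatch and dispatch are hypothetical names, and the fallback here is triggered only by RejectedExecutionException, which is an assumption of this sketch rather than a statement about what RocketMQ catches.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical dispatcher: prefer the executor, fall back to the calling thread.
final class CallbackDispatch {
    static void dispatch(ExecutorService executor, Runnable callback) {
        boolean runInThisThread = false;
        if (executor != null) {
            try {
                executor.submit(callback);
            } catch (RejectedExecutionException e) {
                runInThisThread = true; // executor is saturated or already shut down
            }
        } else {
            runInThisThread = true; // no executor configured
        }
        if (runInThisThread) {
            callback.run();
        }
    }

    public static void main(String[] args) {
        dispatch(null, () -> System.out.println("ran on " + Thread.currentThread().getName()));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        dispatch(pool, () -> System.out.println("ran on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

Running user callbacks on a separate pool keeps slow callback code from blocking the thread that reads responses off the network.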
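scanResponseTable iterates over responseTable, collects the entries that have timed out, and runs executeInvokeCallback for each. It is driven by a Timer named ClientHouseKeepingService that fires every 1000 ms, which is also why the timeout check in the code adds an extra 1000 ms.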
java
//NettyRemotingAbstract#scanResponseTable
public void scanResponseTable() {
final List<ResponseFuture> rfList = new LinkedList<ResponseFuture>();
Iterator<Entry<Integer, ResponseFuture>> it = this.responseTable.entrySet().iterator();
while (it.hasNext()) {
Entry<Integer, ResponseFuture> next = it.next();
ResponseFuture rep = next.getValue();
if ((rep.getBeginTimestamp() + rep.getTimeoutMillis() + 1000) <= System.currentTimeMillis()) {
rep.release();
it.remove();
rfList.add(rep);
log.warn("remove timeout request, " + rep);
}
}
for (ResponseFuture rf : rfList) {
try {
executeInvokeCallback(rf);
} catch (Throwable e) {
log.warn("scanResponseTable, operationComplete Exception", e);
}
}
}
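The scan can be illustrated with a self-contained sweep over a map of pending requests. TimeoutSweep, Pending, and GRACE_MILLIS are hypothetical names. The current time is passed in as an argument so the behavior is easy to check with fixed values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sweep over a table of pending requests, keyed by request id.
final class TimeoutSweep {
    static final long GRACE_MILLIS = 1000; // assumed to match the scan period

    static final class Pending {
        final long beginMillis;
        final long timeoutMillis;

        Pending(long beginMillis, long timeoutMillis) {
            this.beginMillis = beginMillis;
            this.timeoutMillis = timeoutMillis;
        }
    }

    // Removes expired entries and returns their ids so callbacks can be fired afterwards.
    static List<Integer> sweep(Map<Integer, Pending> table, long nowMillis) {
        List<Integer> expired = new ArrayList<>();
        Iterator<Map.Entry<Integer, Pending>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Pending> entry = it.next();
            Pending p = entry.getValue();
            if (p.beginMillis + p.timeoutMillis + GRACE_MILLIS <= nowMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Map<Integer, Pending> table = new ConcurrentHashMap<>();
        table.put(1, new Pending(0, 3000));
        table.put(2, new Pending(0, 10000));
        System.out.println(sweep(table, 4000)); // prints "[1]"
        System.out.println(table.keySet());     // prints "[2]"
    }
}
```

Collecting the expired ids first and firing callbacks afterwards, as the excerpt above does with rfList, keeps callback code out of the iteration over the shared table.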
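Back to the execution of InvokeCallback's operationComplete: the InvokeCallback is actually constructed in MQClientAPIImpl's sendMessageAsync method.
When a SendCallback is set and a response has arrived, the response is first processed into a SendResult, and then SendCallback's onSuccess is called. If there is no response, or an exception is thrown while processing the response, SendCallback's onException is called instead.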
java
//MQClientAPIImpl#sendMessageAsync
SendResult sendResult = MQClientAPIImpl.this.processSendResponse(brokerName, msg, response);
assert sendResult != null;
if (context != null) {
context.setSendResult(sendResult);
context.getProducer().executeSendMessageHookAfter(context);
}
try {
sendCallback.onSuccess(sendResult);
} catch (Throwable e) {
}
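The branching described above (success, processing error, no response) can be sketched with a tiny callback interface. CallbackBridge, its Callback interface, and the "OK:" response format are all invented for this illustration. The real code works with RemotingCommand, SendResult, and SendCallback.

```java
// Hypothetical bridge from a raw response to a user callback.
final class CallbackBridge {
    interface Callback {
        void onSuccess(String result);
        void onException(Throwable e);
    }

    // Assumed parser: responses starting with "OK:" are successes, anything else is an error.
    static String parse(String response) {
        if (!response.startsWith("OK:")) {
            throw new IllegalStateException("broker error: " + response);
        }
        return response.substring(3);
    }

    // A null response models "no response" (send failure or timeout).
    static void complete(String response, Callback callback) {
        if (response == null) {
            callback.onException(new IllegalStateException("no response"));
            return;
        }
        String result;
        try {
            result = parse(response);
        } catch (RuntimeException e) {
            callback.onException(e);
            return;
        }
        try {
            callback.onSuccess(result);
        } catch (Throwable ignored) {
            // errors thrown by user code are swallowed, as in the excerpt above
        }
    }

    public static void main(String[] args) {
        Callback printer = new Callback() {
            @Override public void onSuccess(String result) { System.out.println("success: " + result); }
            @Override public void onException(Throwable e) { System.out.println("failure: " + e.getMessage()); }
        };
        complete("OK:msgId-1", printer);
        complete("BUSY", printer);
        complete(null, printer);
    }
}
```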
3. Retries
3.1 Retry parameters
First, the retry-related parameters. retryTimesWhenSendFailed and retryAnotherBrokerWhenNotStoreOK apply to synchronous sends, and retryTimesWhenSendAsyncFailed applies to asynchronous sends.
java
//DefaultMQProducer
//retry count for synchronous sends
private int retryTimesWhenSendFailed = 2;
//retry count for asynchronous sends
private int retryTimesWhenSendAsyncFailed = 2;
//whether to try another broker when the send result is not SEND_OK
private boolean retryAnotherBrokerWhenNotStoreOK = false;
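3.2 Exception types
- RemotingException: thrown for connection-related errors such as a failed send (Channel's writeAndFlush fails), a request timeout, or a failure to decode the response.
- MQClientException: thrown when the broker address (brokerAddr) cannot be found.
- MQBrokerException: thrown when the returned ResponseCode is not SUCCESS.
- InterruptedException: a synchronous send has to wait for the result. If the waiting thread is interrupted, this exception is thrown (it comes from CountDownLatch's await).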
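3.3 Synchronous send
Retries happen only when no MessageQueue has been specified explicitly.
- If the broker returns a result whose status is not SEND_OK and retryAnotherBrokerWhenNotStoreOK is enabled, the send is retried.
- RemotingException and MQClientException trigger a retry. An MQBrokerException also triggers a retry when its ResponseCode is TOPIC_NOT_EXIST, SERVICE_NOT_AVAILABLE, SYSTEM_ERROR, NO_PERMISSION, NO_BUYER_ID, or NOT_IN_CURRENT_UNIT.
- No retry happens in any other case.
On a retry, the previous brokerName is passed in to obtain a new MessageQueue. When there are multiple brokers, the selection skips the broker used in the previous attempt.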
java
//DefaultMQProducerImpl#sendDefaultImpl
//total attempts: 2 retries by default + 1 initial send = 3
int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
int times = 0;
String[] brokersSent = new String[timesTotal];
//loop up to 3 times with the default settings
for (; times < timesTotal; times++) {
String lastBrokerName = null == mq ? null : mq.getBrokerName();
MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
if (mqSelected != null) {
mq = mqSelected;
brokersSent[times] = mq.getBrokerName();
try {
sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
switch (communicationMode) {
case ASYNC:
return null;
case ONEWAY:
return null;
case SYNC:
//the returned status is not SEND_OK
if (sendResult.getSendStatus() != SendStatus.SEND_OK) {
if (this.defaultMQProducer.isRetryAnotherBrokerWhenNotStoreOK()) {
continue;
}
}
return sendResult;
default:
break;
}
} catch (RemotingException e) {
continue;
} catch (MQClientException e) {
continue;
} catch (MQBrokerException e) {
switch (e.getResponseCode()) {
case ResponseCode.TOPIC_NOT_EXIST:
case ResponseCode.SERVICE_NOT_AVAILABLE:
case ResponseCode.SYSTEM_ERROR:
case ResponseCode.NO_PERMISSION:
case ResponseCode.NO_BUYER_ID:
case ResponseCode.NOT_IN_CURRENT_UNIT:
continue;
default:
if (sendResult != null) {
return sendResult;
}
throw e;
}
} catch (InterruptedException e) {
throw e;
}
} else {
break;
}
}
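The overall shape of this loop, including the "skip the broker used last time" selection, can be sketched in isolation. RetryLoop, Sender, pick, and sendWithRetry are hypothetical names. As a deliberate simplification, this sketch retries on every exception, whereas the code above retries only for specific exception types and response codes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical retry loop; unlike the real code it retries on every exception.
final class RetryLoop {
    interface Sender {
        String send(String broker) throws Exception;
    }

    // Round-robin pick that skips lastBroker whenever another broker is available.
    static String pick(List<String> brokers, String lastBroker, int attempt) {
        for (int i = 0; i < brokers.size(); i++) {
            String candidate = brokers.get((attempt + i) % brokers.size());
            if (!candidate.equals(lastBroker)) {
                return candidate;
            }
        }
        return brokers.get(attempt % brokers.size()); // only one broker: reuse it
    }

    static String sendWithRetry(List<String> brokers, int retryTimes, Sender sender) throws Exception {
        int timesTotal = 1 + retryTimes;
        String lastBroker = null;
        Exception lastFailure = null;
        for (int times = 0; times < timesTotal; times++) {
            String broker = pick(brokers, lastBroker, times);
            lastBroker = broker;
            try {
                return sender.send(broker);
            } catch (Exception e) {
                lastFailure = e; // remember the failure and move on to the next attempt
            }
        }
        throw lastFailure != null ? lastFailure : new IllegalStateException("no attempt was made");
    }

    public static void main(String[] args) throws Exception {
        List<String> tried = new ArrayList<>();
        String result = sendWithRetry(Arrays.asList("broker-a", "broker-b"), 2, broker -> {
            tried.add(broker);
            if (broker.equals("broker-a")) {
                throw new IllegalStateException("broker-a is down");
            }
            return "sent via " + broker;
        });
        System.out.println(result + " after trying " + tried);
    }
}
```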
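3.4 Asynchronous send
Asynchronous sends are retried too, mainly when the request gets no response, for example:
- The data could not be sent (Channel's writeAndFlush failed).
- The request timed out (detected by the internal periodic scan when the broker does not respond).
As long as the broker responds, there is no retry regardless of the result. For example, a Broker busy response is not retried.
The relevant code is spread across several callbacks and is hard to excerpt, so only the conclusion is given here.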
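As a rough illustration of the rule rather than the actual implementation, the decision can be written as a small pure function. AsyncRetryRule, Outcome, and the counting convention (attemptsSoFar includes the attempt that just finished) are assumptions made for this sketch.

```java
// Hypothetical decision rule; names and the counting convention are assumptions.
final class AsyncRetryRule {
    enum Outcome { WRITE_FAILED, TIMED_OUT, BROKER_RESPONDED }

    // attemptsSoFar counts sends already made, including the one that just finished.
    static boolean shouldRetry(Outcome outcome, int attemptsSoFar, int retryTimes) {
        if (outcome == Outcome.BROKER_RESPONDED) {
            return false; // any broker reply, success or error, ends the send
        }
        return attemptsSoFar <= retryTimes;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(Outcome.TIMED_OUT, 1, 2));        // true
        System.out.println(shouldRetry(Outcome.WRITE_FAILED, 3, 2));     // false: retries used up
        System.out.println(shouldRetry(Outcome.BROKER_RESPONDED, 1, 2)); // false: e.g. "Broker busy"
    }
}
```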
4. References
- RocketMQ source code, version 4.4.0