A Study of a High-Performance Library for Big-Data, High-Concurrency Scenarios: Aliyun Log Java Producer

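This article studies Aliyun Log Java Producer, a library billed as high-performance for big-data, high-concurrency scenarios. It is a Java client that writes log data to Log Service (SLS) in batches.

The goal is to read the source, learn from its architecture, and apply those ideas in day-to-day work.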

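📑 1. Introducing Aliyun Log Java Producer

From the official documentation:

With massive data volumes and limited resources, a writer needs complex control logic to reach its target throughput, including multithreading, caching strategies, and batched sending. It must also fully account for failure and retry scenarios.

Asynchronous and non-blocking
Thread-safe
Graceful shutdown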
......

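These features are well worth studying.

📒 2. Design Ideas in the Library

Let's first understand a few design ideas, then dig into the code details.

2.1 Batched sending

If every single log entry triggered its own HTTP call, performance would not hold up under massive log volumes. By accumulating entries and sending them once a threshold is reached, the library accepts a small delay in exchange for better overall performance.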
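
The batching idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: SimpleBatcher, maxCount, and the Consumer sink are hypothetical names, and the sink stands in for the HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of count-based batching; not part of Aliyun Log Java Producer.
class SimpleBatcher {
  private final int maxCount;                 // flush once this many entries are buffered
  private final Consumer<List<String>> sink;  // stands in for one HTTP request
  private List<String> buffer = new ArrayList<>();

  SimpleBatcher(int maxCount, Consumer<List<String>> sink) {
    this.maxCount = maxCount;
    this.sink = sink;
  }

  // Buffer one entry; send the whole buffer as a single call when it is full.
  synchronized void add(String entry) {
    buffer.add(entry);
    if (buffer.size() >= maxCount) {
      flush();
    }
  }

  // Send whatever is buffered (also needed on timeout or shutdown).
  synchronized void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    List<String> toSend = buffer;
    buffer = new ArrayList<>();
    sink.accept(toSend);
  }

  public static void main(String[] args) {
    List<Integer> callSizes = new ArrayList<>();
    SimpleBatcher batcher = new SimpleBatcher(3, batch -> callSizes.add(batch.size()));
    for (int i = 0; i < 7; i++) {
      batcher.add("log-" + i);
    }
    batcher.flush(); // push out the entry that never reached the threshold
    System.out.println(callSizes); // sizes of the calls actually made
  }
}
```

Seven entries lead to three sink calls instead of seven. The final flush() also hints at why a count threshold alone is not enough: something has to push out entries that never fill a batch.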

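2.2 Asynchronous execution

Callers do not wait for each send to finish, which raises throughput.

2.3 Watermark

Batching caches log data in memory. To keep that cache from exhausting resources, the library sets a maximum watermark.

2.4 Divide and conquer

Successful and failed send results go into separate queues. Handling different kinds of work with dedicated threads and queues is a sensible approach.

A little background is useful before entering the source.

📖 3. Background

3.1 SettableFuture

A Guava extension of the JDK Future that lets you set the result on the Future manually.

The library makes heavy use of this capability.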

@GwtCompatible
public final class SettableFuture<V> extends AbstractFuture.TrustedFuture<V> {
  .......
  // Set the result
  @CanIgnoreReturnValue
  @Override
  public boolean set(@Nullable V value) {
    return super.set(value);
  }
  ......
}
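
To show what "setting the result by hand" means without pulling in Guava, here is a small sketch that uses the JDK's CompletableFuture as a stand-in for SettableFuture. The swap is deliberate and the class and method names (ManualFutureDemo, submit) are hypothetical; the point is only that the producer side creates the future and some other thread completes it later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrative only: CompletableFuture plays the role that SettableFuture plays in the library.
class ManualFutureDemo {
  // Returns a future immediately; a background thread sets its value later.
  static CompletableFuture<String> submit(String payload) {
    CompletableFuture<String> future = new CompletableFuture<>();
    Thread worker = new Thread(() -> {
      // Pretend some I/O happened, then set the result by hand.
      future.complete("sent:" + payload);
    });
    worker.start();
    return future;
  }

  public static void main(String[] args) throws Exception {
    CompletableFuture<String> f = submit("hello");
    // Register a callback instead of blocking.
    f.thenAccept(result -> System.out.println("callback got " + result));
    // Or block with a timeout when the result is needed right away.
    System.out.println(f.get(1, TimeUnit.SECONDS));
  }
}
```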

From the caller's side, the flow looks like this:

ListenableFuture<Result> f =
        producer.send(project, logStore, buildLogItem());

Result result = f.get();

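Once the log is submitted, send returns a ListenableFuture immediately. The caller does not have to wait synchronously for the result, which raises log-sending throughput. Note that f.get() in the snippet does block until the result arrives; it only shows how the result can be retrieved.

Using a Callback directly is even simpler.

📜 4. Understanding the Source

First, a brief introduction to a few key classes: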

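Class	Role
LogAccumulator	Batch container; controls the watermark for cached data
RetryQueue	Holds failed ProducerBatch instances that are waiting to be retried
BatchHandler	Handles batches that were sent successfully or failed; fires the callbacks
Mover	Loops over LogAccumulator and RetryQueue and picks up their timed-out batches for processing
ProducerBatch	The batch object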

4.1 Batched sending

Relevant code: com.aliyun.openservices.aliyun.log.producer.internals.LogAccumulator#doAppend

// 1. Build the key that identifies the batch
GroupKey groupKey = new GroupKey(project, logStore, topic, source, shardHash);
// 2. Get the holder for that batch
ProducerBatchHolder holder = getOrCreateProducerBatchHolder(groupKey);
// 3. Append the log data to the batch; send it if the send condition is met
synchronized (holder) {
  return appendToHolder(groupKey, logItems, callback, sizeInBytes, holder);
}

appendToHolder accumulates logs into the batch and sends it once the condition is met. The core of the code is:

private ListenableFuture<Result> appendToHolder(....) {
    if (holder.producerBatch != null) {
        // Try to append the logs to the current batch
        ListenableFuture<Result> f = holder.producerBatch.tryAppend(logItems, sizeInBytes, callback);
        // f is non-null when the append succeeded; null means the batch has no room left
        if (f != null) {
            if (holder.producerBatch.isMeetSendCondition()) {
                // Send condition met: hand the batch over to a thread for sending
                holder.transferProducerBatch(....);
            }
            return f;
        } else {
            // The batch cannot take these logs: hand it over for sending as is
            holder.transferProducerBatch();
        }
    }
    // After every transfer holder.producerBatch is set to null, so a new ProducerBatch must be created here.
    // A batch that exceeds the maximum time it may stay cached is moved to the expired queue.
    holder.producerBatch = new ProducerBatch(.....);

    // Same logic as above; because the ProducerBatch is brand new, there is no need to worry about f being null
    ListenableFuture<Result> f = holder.producerBatch.tryAppend(logItems, sizeInBytes, callback);
    batchCount.incrementAndGet();
    if (holder.producerBatch.isMeetSendCondition()) {
        holder.transferProducerBatch(......);
    }
    return f;
}

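4.2 Asynchronous and non-blocking

The key points behind the asynchronous, non-blocking behavior:

Use SettableFuture
Wrap the callback and the future into a Thunk
Process the Thunks after the batch has been sent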

public ListenableFuture<Result> tryAppend(
    List<LogItem> items, int sizeInBytes, Callback callback) {
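  // Check the per-batch capacity limit: is there room for these items?
  if (!hasRoomFor(sizeInBytes, items.size())) {
    return null;
  } else {
    // Create the SettableFuture that is handed back to the caller
    SettableFuture<Result> future = SettableFuture.create();
    logItems.addAll(items);
    // Record the callback/future pair in the ProducerBatch's thunk list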
    thunks.add(new Thunk(callback, future));
    curBatchCount += items.size();
    curBatchSizeInBytes += sizeInBytes;
    return future;
  }
}
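
The thunk pattern can be tried out in isolation. The sketch below is a hypothetical miniature (MiniBatch and its members are illustrative names, and CompletableFuture again stands in for SettableFuture): each append call gets its own future, and all of them are completed together once the batch has been sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical miniature of the thunk pattern; not thread-safe on its own,
// callers are assumed to hold a lock, as the library does with synchronized (holder).
class MiniBatch {
  private final int maxCount;
  private final List<String> items = new ArrayList<>();
  private final List<CompletableFuture<String>> thunks = new ArrayList<>();

  MiniBatch(int maxCount) {
    this.maxCount = maxCount;
  }

  // Returns null when the batch has no room, mirroring the tryAppend contract above.
  CompletableFuture<String> tryAppend(List<String> newItems) {
    if (items.size() + newItems.size() > maxCount) {
      return null;
    }
    CompletableFuture<String> future = new CompletableFuture<>();
    items.addAll(newItems);
    thunks.add(future); // one thunk per append call, not per log item
    return future;
  }

  // Called once after the batch is sent: every caller receives the same batch-level result.
  void complete(String batchResult) {
    for (CompletableFuture<String> thunk : thunks) {
      thunk.complete(batchResult);
    }
  }

  public static void main(String[] args) {
    MiniBatch batch = new MiniBatch(3);
    CompletableFuture<String> first = batch.tryAppend(Arrays.asList("a", "b"));
    CompletableFuture<String> second = batch.tryAppend(Arrays.asList("c"));
    System.out.println(batch.tryAppend(Arrays.asList("d")) == null); // the batch is full
    batch.complete("OK");
    System.out.println(first.join() + " " + second.join());
  }
}
```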

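That covers the sending logic. Next comes asynchronous result handling.

4.3 Result handling on a separate thread

Depending on whether the send succeeded, the result is added to a different queue; see com.aliyun.openservices.aliyun.log.producer.LogProducer#LogProducer.

The handling logic is as follows: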

private void loopHandleBatches() {
  while (!closed) {
    try {
      // Take the next batch
      ProducerBatch b = batches.take();
      handle(b);
    } catch (InterruptedException e) {
      LOGGER.info("The batch handler has been interrupted");
    }
  }
}
// Fire the callbacks and set the futures
private void handle(ProducerBatch batch) {
  try {
    batch.fireCallbacksAndSetFutures();
  } catch (Throwable t) {
    LOGGER.error("Failed to handle batch, batch={}, e=", batch, t);
  } finally {
    batchCount.decrementAndGet();
    memoryController.release(batch.getCurBatchSizeInBytes());
  }
}

Note: a failed ProducerBatch is retried only once.

The ListenableFuture returned by the send method carries the Result of the whole batch.

ListenableFuture<Result> f =
        producer.send(project, logStore, buildLogItem());

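4.4 Handling expired batches

The logic that triggers a batch send does not take time into account. If the send condition is never met, the log data would stay cached indefinitely, so a dedicated thread (the Mover thread) takes care of such batches.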

// Time is not part of the send condition
public boolean isMeetSendCondition() {
  return curBatchSizeInBytes >= batchSizeThresholdInBytes || curBatchCount >= batchCountThreshold;
}

The Mover thread keeps scanning for batches whose time has run out and hands them off for sending.

public ExpiredBatches expiredBatches() {
  .......
  // Iterate over every cached holder
  for (Map.Entry<GroupKey, ProducerBatchHolder> entry : batches.entrySet()) {
    ProducerBatchHolder holder = entry.getValue();
    synchronized (holder) {
      if (holder.producerBatch == null) {
        continue;
      }
      long curRemainingMs = holder.producerBatch.remainingMs(nowMs, producerConfig.getLingerMs());
      if (curRemainingMs <= 0) {
        // Expired: move the batch into expiredBatches
        holder.transferProducerBatch(expiredBatches);
      } else {
        remainingMs = Math.min(remainingMs, curRemainingMs);
      }
    }
  }
  .....
}
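
The scan above boils down to simple time arithmetic, which the following sketch isolates. It is a simplified model under stated assumptions: LingerScan, remainingMs, and scan are hypothetical names, and the formula lingerMs - (nowMs - createdMs) is an assumed definition of "remaining time", not copied from the library. The returned value is the time until the next batch is due, which a scanning thread could use as its sleep time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of linger-based expiry; not the library's implementation.
class LingerScan {
  // Time left before a batch created at createdMs has to be sent.
  static long remainingMs(long createdMs, long nowMs, long lingerMs) {
    return lingerMs - (nowMs - createdMs);
  }

  // Collects expired keys into 'expired' and returns how long the scanner may wait
  // before the next batch is due (capped at lingerMs).
  static long scan(Map<String, Long> createdMsByKey, long nowMs, long lingerMs, List<String> expired) {
    long waitMs = lingerMs;
    for (Map.Entry<String, Long> entry : createdMsByKey.entrySet()) {
      long remaining = remainingMs(entry.getValue(), nowMs, lingerMs);
      if (remaining <= 0) {
        expired.add(entry.getKey());
      } else {
        waitMs = Math.min(waitMs, remaining);
      }
    }
    return waitMs;
  }

  public static void main(String[] args) {
    Map<String, Long> created = new LinkedHashMap<>();
    created.put("batch-old", 0L);    // created 2000 ms ago
    created.put("batch-new", 1500L); // created 500 ms ago
    List<String> expired = new ArrayList<>();
    long waitMs = scan(created, 2000L, 2000L, expired);
    System.out.println(expired + " wait=" + waitMs);
  }
}
```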

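4.5 Graceful shutdown

The framework puts real effort into graceful shutdown and accounts for many details. It ensures that all data cached by the producer has been handled by the time the close method returns.

Queues and threads are closed one after another, following the direction of data flow, to achieve a graceful and safe exit.

The close() logic of each thread is roughly:

Set closed = true so that no new tasks are accepted
Call join(timeoutMs) to wait for the thread's work to finish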

// The closed flag; once it is set, no new tasks are accepted
private volatile boolean closed;

// Start the shutdown; this sets closed = true
mover.close();

// Wait for the thread to finish
mover.join(timeoutMs);

public void close() {
  this.closed = true;
  interrupt();
}
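
The closed flag, interrupt, and join steps fit together as in the sketch below. DrainingWorker is a hypothetical class written for this article; the drain loop after the main loop is an assumption about what "all cached data gets handled" requires, not a copy of the library's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker illustrating the closed-flag + interrupt + join pattern.
class DrainingWorker extends Thread {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger handled = new AtomicInteger();
  private volatile boolean closed;

  void submit(String task) {
    if (closed) {
      throw new IllegalStateException("worker is closed");
    }
    queue.add(task);
  }

  @Override
  public void run() {
    while (!closed) {
      try {
        handle(queue.take()); // blocks while idle; close() interrupts it
      } catch (InterruptedException e) {
        // fall through: the loop condition re-checks the closed flag
      }
    }
    // Drain what was queued before close() so that nothing is lost.
    String task;
    while ((task = queue.poll()) != null) {
      handle(task);
    }
  }

  private void handle(String task) {
    handled.incrementAndGet();
  }

  void close() {
    closed = true; // stop accepting new tasks
    interrupt();   // wake the thread if it is blocked in take()
  }

  int handledCount() {
    return handled.get();
  }

  public static void main(String[] args) throws InterruptedException {
    DrainingWorker worker = new DrainingWorker();
    worker.start();
    for (int i = 0; i < 5; i++) {
      worker.submit("task-" + i);
    }
    worker.close();
    worker.join(1000); // wait up to one second for the drain to finish
    System.out.println(worker.handledCount());
  }
}
```

Setting the flag before calling interrupt() matters: the woken thread must see closed == true when it re-checks the loop condition.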

The thread pool is shut down like this:

private long closeIOThreadPool(long timeoutMs) throws InterruptedException, ProducerException {
  long startMs = System.currentTimeMillis();
  ioThreadPool.shutdown();
  // Wait for the tasks already submitted to the pool to finish
  if (ioThreadPool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
    LOGGER.debug("The ioThreadPool is terminated");
  } else {
    LOGGER.warn("The ioThreadPool is not fully terminated");
    throw new ProducerException("the ioThreadPool is not fully terminated");
  }
......
}
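
When several components are closed in sequence under one overall timeout, each step needs to know how much time is left. The sketch below shows one way to do that with the standard ExecutorService API. PoolCloser and closePool are hypothetical names, IllegalStateException stands in for the library's ProducerException, and returning the remaining budget is an assumption about the part of the method that the excerpt elides.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: shut a pool down and report how much of the timeout budget is left.
class PoolCloser {
  static long closePool(ExecutorService pool, long timeoutMs) throws InterruptedException {
    long startMs = System.currentTimeMillis();
    pool.shutdown(); // reject new tasks, let already submitted ones run
    if (!pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
      throw new IllegalStateException("pool did not terminate within " + timeoutMs + " ms");
    }
    long elapsedMs = System.currentTimeMillis() - startMs;
    return Math.max(0, timeoutMs - elapsedMs); // budget left for the next component
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < 4; i++) {
      pool.execute(done::incrementAndGet);
    }
    long remainingMs = closePool(pool, 1000);
    System.out.println(done.get() + " tasks done, remaining budget within bounds: "
        + (remainingMs >= 0 && remainingMs <= 1000));
  }
}
```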

For the details, read the close method: com.aliyun.openservices.aliyun.log.producer.Producer#close()

4.6 Additional notes

A Semaphore controls how much memory the cached, not-yet-sent data may occupy:

private final Semaphore memoryController;
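
A Semaphore works as a memory watermark if each permit represents one byte of budget. The sketch below illustrates the idea with the standard java.util.concurrent.Semaphore API; MemoryWatermark, reserve, and maxBlockMs are hypothetical names, and the release call corresponds to the memoryController.release(...) seen in the batch handler earlier.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a memory watermark; names are illustrative.
class MemoryWatermark {
  private final Semaphore permits; // one permit per byte of cache budget

  MemoryWatermark(int maxBytes) {
    this.permits = new Semaphore(maxBytes);
  }

  // Try to reserve room; returns false if the budget stays exhausted for maxBlockMs.
  boolean reserve(int sizeInBytes, long maxBlockMs) throws InterruptedException {
    return permits.tryAcquire(sizeInBytes, maxBlockMs, TimeUnit.MILLISECONDS);
  }

  // Give the bytes back once the batch has been handled.
  void release(int sizeInBytes) {
    permits.release(sizeInBytes);
  }

  int availableBytes() {
    return permits.availablePermits();
  }

  public static void main(String[] args) throws InterruptedException {
    MemoryWatermark watermark = new MemoryWatermark(100);
    System.out.println(watermark.reserve(80, 0)); // fits under the watermark
    System.out.println(watermark.reserve(30, 0)); // would exceed it, so it is rejected
    watermark.release(80);
    System.out.println(watermark.reserve(30, 0)); // fits again after the release
  }
}
```

With a positive maxBlockMs the caller is held back for a while instead of being rejected at once, which turns the watermark into backpressure on producers.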

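Batched log data is cached in ProducerBatchHolder instances, and the code that manages them relies on ConcurrentHashMap#putIfAbsent. If the key is absent, the key-value pair is added to the map and null is returned. If the key already exists, the existing value is not overwritten and is returned directly.

private final ConcurrentMap<GroupKey, ProducerBatchHolder> batches;

This, together with the synchronized (holder) blocks shown earlier, is how thread safety is achieved.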
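
The get-or-create idiom built on putIfAbsent, combined with locking on the holder, can be sketched as follows. HolderRegistry, Holder, and appended are hypothetical names used only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the get-or-create idiom built on putIfAbsent.
class HolderRegistry {
  static final class Holder {
    int appended; // guarded by synchronized (holder)
  }

  private final ConcurrentMap<String, Holder> holders = new ConcurrentHashMap<>();

  Holder getOrCreate(String key) {
    Holder holder = holders.get(key);
    if (holder != null) {
      return holder;
    }
    Holder created = new Holder();
    Holder previous = holders.putIfAbsent(key, created);
    // null: our holder was stored; non-null: another thread's holder was already in place
    return previous == null ? created : previous;
  }

  void append(String key) {
    Holder holder = getOrCreate(key);
    synchronized (holder) { // all threads using this key lock the same object
      holder.appended++;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    HolderRegistry registry = new HolderRegistry();
    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          registry.append("project/logstore");
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println(registry.getOrCreate("project/logstore").appended);
  }
}
```

Because putIfAbsent never overwrites, every thread ends up with the same Holder for a given key, so the synchronized block actually serializes them.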

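4.7 Avoiding lock contention from busy-spinning

If the Mover thread spins without work, it increases lock contention on ProducerBatchHolder. The library avoids this by having the thread sleep for a while between scans, which is a well-considered detail.

4.8 Shortcomings

The result of an asynchronous send is the result of the whole batch, which is a bit coarse. Passing a requestId through to each caller's result would be an improvement.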

  private void setFutures(Result result) {
    for (Thunk thunk : thunks) {
      try {
        if (result.isSuccessful()) {
          thunk.future.set(result);
        } else {
          thunk.future.setException(new ResultFailedException(result));
        }
      } catch (Exception e) {
        LOGGER.error("Failed to set future, groupKey={}, e=", groupKey, e);
      }
    }
  }
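
One possible shape for the requestId pass-through suggested above is sketched here. Everything in it is hypothetical (TaggedBatch, TaggedResult, and the requestId field do not exist in the library), and CompletableFuture stands in for SettableFuture; it only shows that the same loop can hand each caller a result tagged with its own id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the suggested improvement; none of these types exist in the library.
class TaggedBatch {
  // Batch-level outcome plus the id of the individual request it answers.
  static final class TaggedResult {
    final String requestId;
    final boolean successful;

    TaggedResult(String requestId, boolean successful) {
      this.requestId = requestId;
      this.successful = successful;
    }
  }

  // One pending caller: its request id and the future it is waiting on.
  private static final class Thunk {
    final String requestId;
    final CompletableFuture<TaggedResult> future = new CompletableFuture<>();

    Thunk(String requestId) {
      this.requestId = requestId;
    }
  }

  private final List<Thunk> thunks = new ArrayList<>();

  CompletableFuture<TaggedResult> append(String requestId) {
    Thunk thunk = new Thunk(requestId);
    thunks.add(thunk);
    return thunk.future;
  }

  // Same loop shape as setFutures above, but each caller gets a result carrying its own id.
  void setFutures(boolean batchSuccessful) {
    for (Thunk thunk : thunks) {
      thunk.future.complete(new TaggedResult(thunk.requestId, batchSuccessful));
    }
  }

  public static void main(String[] args) {
    TaggedBatch batch = new TaggedBatch();
    CompletableFuture<TaggedResult> a = batch.append("req-1");
    CompletableFuture<TaggedResult> b = batch.append("req-2");
    batch.setFutures(true);
    System.out.println(a.join().requestId + " " + b.join().requestId);
  }
}
```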

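That concludes the analysis.

✒️ 5. Closing Thoughts

This project is very well done and thoughtful in many respects. It is well worth studying.

🤔 Apply what you learn to your own projects, and move from imitating to surpassing!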
