RocksDB db_bench Source Code (Part 3): Using the Histogram


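When stress-testing RocksDB, getting the latency of each operation as it happens is easy: print the elapsed time right after the operation finishes. But what if you want the minimum, maximum, average, and percentile latencies over all past operations, or even every latency ever recorded?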

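The brute-force answer is to store every latency, and that is indeed the most precise approach. In practice nobody needs that much precision, so the common approach stores ranges rather than individual samples. For example, suppose that among 1000 operations, 5 have latencies of 1.12, 1.13, 1.14, 1.17, and 1.15. They are so close together that there is no point storing each one; it is enough to record the coarse fact that the range [1.1, 1.2] contains 5 points.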

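This is exactly what a Histogram does. It partitions the data into fine-grained buckets and records only how many points fall into each bucket, not the value of each point. Because the buckets are kept in ascending order together with their counts, percentile information can be computed quickly.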
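As a minimal sketch of the "store counts, not samples" idea, here is a fixed-width histogram. The class name `FixedWidthHistogram` and its methods are hypothetical and are not part of RocksDB, whose buckets grow in width as shown later in this post. The sketch only assumes that bucket `i` covers `[i * width, (i + 1) * width)`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width histogram: bucket i covers
// [i * width, (i + 1) * width). Only per-bucket counts are kept.
class FixedWidthHistogram {
 public:
  FixedWidthHistogram(double width, std::size_t num_buckets)
      : width_(width), counts_(num_buckets, 0) {
    assert(width > 0.0 && num_buckets > 0);
  }

  // Records one sample. Only a counter changes; the value is dropped.
  void Add(double value) {
    const double raw = std::floor(value / width_);
    std::size_t index = 0;  // negative values fall into bucket 0
    if (raw >= static_cast<double>(counts_.size())) {
      index = counts_.size() - 1;  // clamp overflow into the last bucket
    } else if (raw > 0.0) {
      index = static_cast<std::size_t>(raw);
    }
    ++counts_[index];
  }

  std::size_t CountAt(std::size_t index) const { return counts_.at(index); }

 private:
  double width_;
  std::vector<std::size_t> counts_;
};
```

With a width of 0.1, the five latencies from the example above all land in bucket 11, so the histogram stores a single counter with value 5 instead of five samples.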

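In short, a Histogram just counts and orders. This post gives a rough tour of the source code of RocksDB's built-in Histogram and shows how to use it, taking latency statistics as the use case.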

Histogram source code walkthrough

RocksDB's Histogram code lives in monitoring/histogram.cc and monitoring/histogram.h. The top-level wrapper class is HistogramImpl, which has just two members:


class HistogramImpl : public Histogram {
 // ...
 private:
  HistogramStat stats_;
  std::mutex mutex_;
};

So everything important lives in HistogramStat. Here are its members:


struct HistogramStat {
  // ...
  // To be able to use HistogramStat as thread local variable, it
  // cannot have dynamic allocated member. That's why we're
  // using manually values from BucketMapper
  std::atomic_uint_fast64_t min_;
  std::atomic_uint_fast64_t max_;
  std::atomic_uint_fast64_t num_;
  std::atomic_uint_fast64_t sum_;
  std::atomic_uint_fast64_t sum_squares_;
  std::atomic_uint_fast64_t buckets_[109]; // 109==BucketMapper::BucketCount()
  const uint64_t num_buckets_;
};

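These fields are easy to understand: they track the minimum, the maximum, and so on. buckets_ is the array of buckets described above, and each element is the number of points that fell into that bucket. 109 is the number of buckets, as determined by BucketMapper::BucketCount(), and num_buckets_ holds the same value. bucketMapper, of type HistogramBucketMapper, is the key piece: it decides which bucket a point maps to.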

Back to HistogramImpl. To add a point to the statistics, call Add. That Add is only a thin wrapper; the real work happens in HistogramStat's Add:


void HistogramStat::Add(uint64_t value) {
  // This function is designed to be lock free, as it's in the critical path
  // of any operation. Each individual value is atomic and the order of updates
  // by concurrent threads is tolerable.
  const size_t index = bucketMapper.IndexForValue(value);
  assert(index < num_buckets_);
  buckets_[index].store(buckets_[index].load(std::memory_order_relaxed) + 1,
                        std::memory_order_relaxed);

  uint64_t old_min = min();
  if (value < old_min) {
    min_.store(value, std::memory_order_relaxed);
  }

  uint64_t old_max = max();
  if (value > old_max) {
    max_.store(value, std::memory_order_relaxed);
  }

  num_.store(num_.load(std::memory_order_relaxed) + 1,
             std::memory_order_relaxed);
  sum_.store(sum_.load(std::memory_order_relaxed) + value,
             std::memory_order_relaxed);
  sum_squares_.store(
      sum_squares_.load(std::memory_order_relaxed) + value * value,
      std::memory_order_relaxed);
}

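The code is simple. IndexForValue maps value to its bucket, that bucket's counter is incremented, and then the maximum, minimum, count, sum, and so on are updated. Note that value itself is never stored. The key is IndexForValue, that is, how value gets mapped to a bucket.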
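One reason the running aggregates are worth keeping is that the mean and standard deviation can be recovered from num, sum, and sum_squares alone. The sketch below is a hypothetical single-threaded stand-in (the struct `RunningStats` is not a RocksDB type and uses no atomics); it assumes the population form of the standard deviation, sqrt(E[x^2] - E[x]^2).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical, single-threaded stand-in for the aggregate fields of
// HistogramStat (no atomics, no buckets).
struct RunningStats {
  std::uint64_t num = 0;
  std::uint64_t sum = 0;
  std::uint64_t sum_squares = 0;

  // Updates the aggregates; the sample itself is not stored.
  void Add(std::uint64_t value) {
    num += 1;
    sum += value;
    sum_squares += value * value;
  }

  // Requires at least one sample.
  double Average() const {
    assert(num > 0);
    return static_cast<double>(sum) / static_cast<double>(num);
  }

  // Population standard deviation: sqrt(E[x^2] - E[x]^2).
  double StandardDeviation() const {
    assert(num > 0);
    const double n = static_cast<double>(num);
    const double mean = static_cast<double>(sum) / n;
    const double variance = static_cast<double>(sum_squares) / n - mean * mean;
    // Guard against a tiny negative result caused by rounding.
    return variance > 0.0 ? std::sqrt(variance) : 0.0;
  }
};
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the aggregates are num = 8, sum = 40, and sum_squares = 232, which gives a mean of 5 and a standard deviation of 2.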

Before looking at the mapping, we need to see how the buckets are initialized:


HistogramBucketMapper::HistogramBucketMapper() {
  // If you change this, you also need to change
  // size of array buckets_ in HistogramImpl
  bucketValues_ = {1, 2};
  valueIndexMap_ = {{1, 0}, {2, 1}};
  double bucket_val = static_cast<double>(bucketValues_.back());
  while ((bucket_val = 1.5 * bucket_val) <= static_cast<double>(port::kMaxUint64)) {
    bucketValues_.push_back(static_cast<uint64_t>(bucket_val));
    // Extracts two most significant digits to make histogram buckets more
    // human-readable. E.g., 172 becomes 170.
    uint64_t pow_of_ten = 1;
    while (bucketValues_.back() / 10 > 10) {
      bucketValues_.back() /= 10;
      pow_of_ten *= 10;
    }
    bucketValues_.back() *= pow_of_ten;
    valueIndexMap_[bucketValues_.back()] = bucketValues_.size() - 1;
  }
  maxBucketValue_ = bucketValues_.back();
  minBucketValue_ = bucketValues_.front();
}

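bucketValues_ holds the limit of each bucket, and the mapping code that follows treats it as an inclusive upper limit. The first limits are 1, 2, 3, 4, 6, 10, so values 0 and 1 land in bucket 0, while values 5 and 6 land in the bucket whose limit is 6. valueIndexMap_ is a map from each limit to its bucket index. The limits grow by a factor of roughly 1.5 per bucket, and the constructor also records the smallest and largest limits to make the mapping easier.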
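To see which values the constructor actually produces, the growth and rounding rules can be replayed in a standalone function. This is a sketch: `BuildBucketLimits` is a hypothetical name, and `std::numeric_limits<std::uint64_t>::max()` stands in for `port::kMaxUint64`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Replays the constructor's rules: start from {1, 2}, multiply by 1.5,
// truncate to an integer, then zero out the low-order digits.
std::vector<std::uint64_t> BuildBucketLimits() {
  std::vector<std::uint64_t> limits = {1, 2};
  double bucket_val = static_cast<double>(limits.back());
  const double max_val =
      static_cast<double>(std::numeric_limits<std::uint64_t>::max());
  while ((bucket_val = 1.5 * bucket_val) <= max_val) {
    std::uint64_t limit = static_cast<std::uint64_t>(bucket_val);
    std::uint64_t pow_of_ten = 1;
    // Same loop condition as the original, e.g. 172 -> 17 -> 170.
    while (limit / 10 > 10) {
      limit /= 10;
      pow_of_ten *= 10;
    }
    limits.push_back(limit * pow_of_ten);
  }
  assert(limits.back() > limits.front());
  return limits;
}
```

The sequence starts 1, 2, 3, 4, 6, 10, 15, 22, 34, 51, 76, 110, 170, and the loop stops after 109 limits, which matches the size of the buckets_ array.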

With that in place, IndexForValue is straightforward:


size_t HistogramBucketMapper::IndexForValue(const uint64_t value) const {
  if (value >= maxBucketValue_) {
    return bucketValues_.size() - 1;
  } else if ( value >= minBucketValue_ ) {
    std::map<uint64_t, uint64_t>::const_iterator lowerBound =
      valueIndexMap_.lower_bound(value);
    if (lowerBound != valueIndexMap_.end()) {
      return static_cast<size_t>(lowerBound->second);
    } else {
      return 0;
    }
  } else {
    return 0;
  }
}

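lower_bound finds the first bucket limit that is not less than value, and the function returns that bucket's index. Values below the smallest limit go to bucket 0, and values at or above the largest limit go to the last bucket.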
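The same lookup can be sketched without the map by running std::lower_bound over a sorted vector of limits. This is an illustrative equivalent under that assumption, not the RocksDB implementation; the free function `IndexForValue` and its `limits` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first limit that is >= value, that is, the bucket
// whose inclusive upper limit covers value. limits must be sorted, non-empty.
std::size_t IndexForValue(const std::vector<std::uint64_t>& limits,
                          std::uint64_t value) {
  assert(!limits.empty());
  if (value >= limits.back()) {
    return limits.size() - 1;  // everything too large goes to the last bucket
  }
  // For values below limits.front() this returns begin(), i.e. index 0.
  auto it = std::lower_bound(limits.begin(), limits.end(), value);
  return static_cast<std::size_t>(it - limits.begin());
}
```

With limits {1, 2, 3, 4, 6, 10, 15}, the values 5 and 6 both map to index 4, while 7 maps to index 5.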

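To get percentile information, just call Percentile. Since the buckets are in ascending order and each one records its point count, a single pass from the first bucket is enough: as soon as the cumulative count reaches the requested percentage of the total, the percentile lies in that bucket. A more precise value inside the bucket is then estimated proportionally, by linear interpolation: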


double HistogramStat::Percentile(double p) const {
  double threshold = num() * (p / 100.0);  // num() is the total point count (num_)
  uint64_t cumulative_sum = 0;
  for (unsigned int b = 0; b < num_buckets_; b++) {
    uint64_t bucket_value = bucket_at(b);
    cumulative_sum += bucket_value;
    // Step 1: find the bucket that contains the percentile
    if (cumulative_sum >= threshold) {
      // Step 2: pick a value inside that bucket
      // Scale linearly within this bucket
      uint64_t left_point = (b == 0) ? 0 : bucketMapper.BucketLimit(b-1);
      uint64_t right_point = bucketMapper.BucketLimit(b);
      uint64_t left_sum = cumulative_sum - bucket_value;
      uint64_t right_sum = cumulative_sum;
      double pos = 0;
      uint64_t right_left_diff = right_sum - left_sum;
      if (right_left_diff != 0) {
        pos = (threshold - left_sum) / right_left_diff;
      }
      double r = left_point + (right_point - left_point) * pos;
      uint64_t cur_min = min();
      uint64_t cur_max = max();
      if (r < cur_min) r = static_cast<double>(cur_min);
      if (r > cur_max) r = static_cast<double>(cur_max);
      return r;
    }
  }
  return static_cast<double>(max());
}
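The interpolation is easier to follow on concrete numbers. The sketch below reimplements the same two steps as a free function over plain vectors. The function name and its parameters (`limits`, `counts`, `min_seen`, `max_seen`) are hypothetical stand-ins for the mapper, the bucket counters, and the tracked minimum and maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// limits[b] is the upper limit of bucket b, counts[b] its number of points.
double Percentile(const std::vector<std::uint64_t>& limits,
                  const std::vector<std::uint64_t>& counts,
                  std::uint64_t min_seen, std::uint64_t max_seen, double p) {
  assert(limits.size() == counts.size());
  const std::uint64_t total =
      std::accumulate(counts.begin(), counts.end(), std::uint64_t{0});
  const double threshold = static_cast<double>(total) * (p / 100.0);
  std::uint64_t cumulative = 0;
  for (std::size_t b = 0; b < counts.size(); ++b) {
    cumulative += counts[b];
    // Step 1: the first bucket whose cumulative count reaches the threshold.
    if (static_cast<double>(cumulative) >= threshold) {
      // Step 2: scale linearly between the bucket's two limits.
      const double left = (b == 0) ? 0.0 : static_cast<double>(limits[b - 1]);
      const double right = static_cast<double>(limits[b]);
      const double left_sum = static_cast<double>(cumulative - counts[b]);
      double pos = 0.0;
      if (counts[b] != 0) {
        pos = (threshold - left_sum) / static_cast<double>(counts[b]);
      }
      double r = left + (right - left) * pos;
      // Clamp to the observed range, as the original does.
      if (r < static_cast<double>(min_seen)) r = static_cast<double>(min_seen);
      if (r > static_cast<double>(max_seen)) r = static_cast<double>(max_seen);
      return r;
    }
  }
  return static_cast<double>(max_seen);
}
```

Take limits {1, 2, 3, 4, 6, 10} with 5 points in the bucket ending at 6 and 5 points in the bucket ending at 10. For p = 75 the threshold is 7.5, which falls in the last bucket at position (7.5 - 5) / 5 = 0.5, so the result is 6 + (10 - 6) * 0.5 = 8. The estimate is only as fine as the bucket boundaries allow, which is the price of not storing samples.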

That covers the gist of the Histogram source.

Getting percentile latencies

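To collect latency statistics with Histogram, call Add once after each operation in db_bench completes. To tell different operation types apart, keep a map from operation type to HistogramImpl, created lazily the first time each type is seen: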


void FinishedOps(uint64_t micros, int64_t num_ops,
                  enum OperationType op_type = kOthers) {
  if (kWrite == op_type) {
    writedone_ += num_ops;
    if (hist_.find(kWriteHiccup) == hist_.end()) {
      auto hist_tmp_write_hiccup = std::make_shared<HistogramImpl>();
      hist_.insert({kWriteHiccup, std::move(hist_tmp_write_hiccup)});
    }
    hist_[kWriteHiccup]->Add(micros);
  } else if (kRead == op_type) {
    readdone_ += num_ops;
    if (hist_.find(kReadHiccup) == hist_.end()) {
      auto hist_tmp_read_hiccup = std::make_shared<HistogramImpl>();
      hist_.insert({kReadHiccup, std::move(hist_tmp_read_hiccup)});
    }
    hist_[kReadHiccup]->Add(micros);
  } else if (kSeek == op_type) {
    seekdone_ += num_ops;
    if (hist_.find(kSeekHiccup) == hist_.end()) {
      auto hist_tmp_seek_hiccup = std::make_shared<HistogramImpl>();
      hist_.insert({kSeekHiccup, std::move(hist_tmp_seek_hiccup)});
    }
    hist_[kSeekHiccup]->Add(micros);
  }
  // ...
}
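The three branches above repeat the same find-or-create pattern. As a sketch of how it could be factored out, the helper below relies on std::map::operator[] inserting an empty shared_ptr on first access. All names here (`OpType`, `CountingHistogram`, `LatencyRecorder`) are hypothetical; `CountingHistogram` is only a stand-in so the example compiles without RocksDB headers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

enum OpType { kWriteOp, kReadOp, kSeekOp };  // hypothetical operation types

// Stand-in for HistogramImpl: only counts and sums what it is given.
struct CountingHistogram {
  std::uint64_t num = 0;
  std::uint64_t sum = 0;
  void Add(std::uint64_t value) {
    ++num;
    sum += value;
  }
};

class LatencyRecorder {
 public:
  // Creates the histogram for op on first use, then records the latency.
  void Record(OpType op, std::uint64_t micros) {
    std::shared_ptr<CountingHistogram>& hist = hists_[op];  // may insert null
    if (!hist) {
      hist = std::make_shared<CountingHistogram>();
    }
    hist->Add(micros);
  }

  bool Has(OpType op) const { return hists_.find(op) != hists_.end(); }

  // Requires that at least one latency was recorded for op.
  const CountingHistogram& For(OpType op) const {
    auto it = hists_.find(op);
    assert(it != hists_.end());
    return *it->second;
  }

 private:
  std::map<OpType, std::shared_ptr<CountingHistogram>> hists_;
};
```

A caller would then write `recorder.Record(kWriteOp, micros)` in each branch. Like the original snippet, this sketch does not synchronize access to the map, so it assumes each recorder is used from a single thread.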

Histogram keeps the latencies in ascending buckets, so producing a report is just a matter of calling its accessors:


// min() and max() return uint64_t, so print them with PRIu64 (from <cinttypes>).
fprintf(fp_op_hiccup_report,
        "write %8" PRIu64 "   %8" PRIu64
        "   %8.0f   %8.0f   %8.0f   %8.0f   %8.0f   "
        "%8.0f   %8.0f\n",
        hist_[kWriteHiccup]->min(), hist_[kWriteHiccup]->max(),
        hist_[kWriteHiccup]->Average(),
        hist_[kWriteHiccup]->Percentile(25.0),
        hist_[kWriteHiccup]->Percentile(50.0),
        hist_[kWriteHiccup]->Percentile(75.0),
        hist_[kWriteHiccup]->Percentile(90.0),
        hist_[kWriteHiccup]->Percentile(99.0),
        hist_[kWriteHiccup]->Percentile(99.9));
fflush(fp_op_hiccup_report);

That's it.