esrally: a performance testing tool for Elasticsearch

Introduction

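esrally is an open-source benchmarking tool for Elasticsearch (ES), written in Python 3 and maintained by Elastic. The source is at https://github.com/elastic/rally. The nightly benchmark results published on the official site are produced by running this tool every night. It makes it easy to check how your own code changes or configuration tweaks affect performance.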

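esrally's main features:

Automatically creates, benchmarks, and tears down ES clusters
Manages benchmark data and plans per ES version
Presents benchmark results in detail, supports comparing results across runs, and can store the data in a designated ES instance for further analysis
Collects detailed JVM information, such as memory and GC data, to help pinpoint performance problems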

Deployment

Python 3.8+

pip3

git 1.9+

jdk 1.8+

esrally 2.3+

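Install dependencies

Install the bzip2 and openssl development packages

These packages are required; without them esrally fails at runtime complaining that _bz2 is missing. Install them before building Python, because the _bz2 module is only compiled when the bzip2 headers are present.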

yum -y install bzip2-devel openssl openssl-devel libffi-devel
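
One way to confirm that the development packages took effect, once Python 3 has been built, is to try importing the modules that depend on them. The following is a minimal sketch: the module_status helper is a hypothetical name introduced here, and the loop assumes the interpreter is available as python3 on the PATH.

```shell
# module_status INTERPRETER MODULE
# Prints "ok" if the interpreter can import the module, "missing" otherwise.
module_status() {
  if "$1" -c "import $2" >/dev/null 2>&1; then
    echo ok
  else
    echo missing
  fi
}

# bz2 needs bzip2-devel, ssl needs openssl-devel, ctypes needs libffi-devel.
for mod in bz2 ssl ctypes; do
  echo "$mod: $(module_status python3 "$mod")"
done
```

If any module reports missing, install the matching development package and rebuild Python before installing esrally.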

Install Python 3

(Omitted.)

Install the JDK

1. Install Java

yum install java -y

2. Configure the Java environment variable

[root@tdocker Python-3.6.7]# which java
/usr/bin/java
[root@tdocker Python-3.6.7]# ll /usr/bin/java
lrwxrwxrwx 1 root root 22 10月 28 13:16 /usr/bin/java -> /etc/alternatives/java
[root@tdocker Python-3.6.7]# ll /etc/alternatives/java
lrwxrwxrwx 1 root root 73 10月 28 13:16 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre/bin/java
[root@tdocker Python-3.6.7]# echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre' >> /etc/profile
[root@tdocker Python-3.6.7]# source /etc/profile
[root@tdocker Python-3.6.7]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
Note: setting JAVA_HOME is required; esrally relies on it later.
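
The symlink chasing in the session above can be scripted instead of followed by hand. This is a sketch under two assumptions: java_home_from_path is a hypothetical helper name, and readlink -f (GNU coreutils, as on CentOS) is available. It only prints the export line, so nothing is changed until you append that line to /etc/profile yourself.

```shell
# java_home_from_path PATH
# Strips the trailing /bin/java from a fully resolved java binary path.
java_home_from_path() {
  printf '%s\n' "${1%/bin/java}"
}

# Resolve the symlink chain only when java is actually installed.
if command -v java >/dev/null 2>&1; then
  resolved=$(readlink -f "$(command -v java)")
  echo "export JAVA_HOME=$(java_home_from_path "$resolved")"
fi
```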

Install Git

1. Install build dependencies

Git is built from source here, so install the build dependencies first:

yum -y install curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc perl-ExtUtils-MakeMaker

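2. Download the Git source

Pass --no-check-certificate here; otherwise wget fails with "Issued certificate has expired". Note that this flag disables TLS certificate verification for the download.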

[root@tdocker]# wget https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.7.5.tar.gz --no-check-certificate

3. Extract, build, and install

[root@tdocker src]# tar xf git-2.7.5.tar.gz
[root@tdocker src]# ls
git-2.7.5  git-2.7.5.tar.gz 
[root@tdocker src]# cd git-2.7.5
[root@tdocker git-2.7.5]# make prefix=/usr/local/git all
[root@tdocker git-2.7.5]# make prefix=/usr/local/git install

4. Remove the old Git

Remove the older Git that was pulled in automatically when the build dependencies were installed:

[root@tdocker git-2.7.5]# rpm -qa | grep -w git
git-1.8.3.1-23.el7_8.x86_64
[root@tdocker git-2.7.5]#  rpm -e git-1.8.3.1-23.el7_8.x86_64 --nodeps

5. Configure the Git environment variables

[root@tdocker git-2.7.5]# echo 'export GIT2_HOME=/usr/local/git' >> /etc/profile
[root@tdocker git-2.7.5]# echo 'export PATH=$PATH:$GIT2_HOME/bin' >> /etc/profile
[root@tdocker git-2.7.5]# source /etc/profile
[root@tdocker git-2.7.5]# git --version
git version 2.7.5
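
To check an installed version against the git 1.9+ requirement without eyeballing it, a small comparison function is enough. The sketch below is an illustration only: version_ge is a hypothetical helper, and it assumes purely numeric dotted versions such as 1.8.3.1 or 2.7.5.

```shell
# version_ge A B
# Returns 0 when dotted version A is greater than or equal to B.
version_ge() {
  a=$1 b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    # Take the leading component of each version.
    x=${a%%.*}; y=${b%%.*}
    # Drop that component, or empty the string if it was the last one.
    if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
    if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
    # Missing components count as 0, so 1.9 equals 1.9.0.
    x=${x:-0}; y=${y:-0}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0
}

# The two versions that appear in this walkthrough.
for v in 1.8.3.1 2.7.5; do
  if version_ge "$v" 1.9; then
    echo "git $v satisfies 1.9+"
  else
    echo "git $v is too old"
  fi
done
```

For a live check, the installed version can be read with git --version | awk '{print $3}' and passed as the first argument.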

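Install esrally

1. Get the package

Get the latest installation package from the official esrally project on GitHub.

2. Extract and install

Upload the package to the server and extract it: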

[root@tdocker src]# tar xf esrally-dist-linux-2.3.0.tar.gz
[root@tdocker src]# ls
esrally-dist-2.3.0  esrally-dist-linux-2.3.0.tar.gz  git-2.7.5  git-2.7.5.tar.gz

Run the installer:

[root@tdocker src]# cd esrally-dist-2.3.0/
[root@tdocker esrally-dist-2.3.0]# ls
bin  install.sh
[root@tdocker esrally-dist-2.3.0]# ./install.sh

Installation is complete.

Configure esrally:

esrally configure

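esrally parameters explained

esrally terminology and parameters

Rally refers to car rally racing, so its terminology is borrowed from that sport.

track: the race track, meaning the sample data and the benchmark strategy used in a test. List the available tracks with esrally list tracks. The tracks that ship with Rally are at https://github.com/elastic/rally-tracks; each track's directory contains a README.md that describes the data and the parameters in detail. If no track is specified, the geonames track is used by default;
target-hosts: the IP and port of the remote Elasticsearch, given as ip:port;
pipeline: a benchmark workflow. List them with esrally list pipelines. The benchmark-only pipeline leaves managing ES to the user and uses Rally only to generate load; use it to benchmark an existing ES cluster;
track-params: overrides the track's default parameters;
user-tag: a tag that labels this run;
client-options: client connection options, such as the user name and password.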
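
When a cluster has several nodes, the target-hosts value is a comma-separated list of ip:port pairs. The sketch below assembles that value and prints the resulting command as a dry run rather than executing it. The join_hosts helper is a hypothetical name, and the addresses are the sample ones used elsewhere in this article.

```shell
# join_hosts HOST...
# Joins host:port arguments with commas, the form --target-hosts expects.
join_hosts() {
  joined=
  for host in "$@"; do
    joined="${joined:+$joined,}$host"
  done
  printf '%s\n' "$joined"
}

hosts=$(join_hosts 192.168.1.20:9200 192.168.1.21:9200 192.168.1.22:9200)

# Dry run: print the command instead of executing it.
echo "esrally race --track=geonames --target-hosts=$hosts --pipeline=benchmark-only"
```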

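track.json is the file that defines an esrally benchmark plan. It contains the following parts:

indices: index definitions

templates: index templates, rarely used

corpora: definitions of the data set files

operations: the concrete operations; optional, since they can be defined inline in schedule or challenges

schedule: the load under which the operations run

challenges: separate test scenarios, such as append and update, so that their statistics are reported separately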
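
To make the parts above concrete, the following sketch writes a skeleton track.json into a scratch directory. Every name and number in it (demo-index, demo-corpus, the document counts, the bulk size) is a placeholder invented for illustration, not taken from a real track. It uses a top-level schedule, which, as far as I know, is the form for a track with a single scenario, and it leaves out templates and challenges.

```shell
# Write a skeleton track.json into a scratch directory (all names are placeholders).
track_dir=$(mktemp -d)
cat > "$track_dir/track.json" <<'EOF'
{
  "version": 2,
  "description": "Hypothetical demo track",
  "indices": [
    { "name": "demo-index", "body": "index.json" }
  ],
  "corpora": [
    {
      "name": "demo-corpus",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 1000,
          "uncompressed-bytes": 100000
        }
      ]
    }
  ],
  "schedule": [
    { "operation": { "operation-type": "delete-index" } },
    { "operation": { "operation-type": "create-index" } },
    {
      "operation": { "operation-type": "bulk", "bulk-size": 500 },
      "clients": 2
    }
  ]
}
EOF
echo "wrote $track_dir/track.json"
```

A track like this also needs the index.json and documents.json files it references before Rally can run it; a directory of this kind is passed with --track-path instead of --track.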

Benchmarking ES with esrally

Official datasets and tracks

The tracks that ship with esrally: https://github.com/elastic/rally-tracks

The main ones are:

1. Geonames: for evaluating the performance of structured data.

2. Geopoint: for evaluating the performance of geo queries.

3. Percolator: for evaluating the performance of percolation queries.

4. PMC: for evaluating the performance of full text search.

5. NYC taxis: for evaluating the performance for highly structured data.

6. Nested: for evaluating the performance for nested documents.

7. Logging: for evaluating the performance of (Web) server logs.

8. noaa: for evaluating the performance of range fields.

Our tests mainly used the following tracks; pick others as your needs dictate:

Geonames: structured data
PMC: full text search
Nested: nested documents
Logging: (Web) server logs
noaa: range fields

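Let's run an example based on the official documentation.
Note: run esrally as a non-root user.

Start a local test ES node and benchmark it:

Benchmarking another cluster

Usage examples:
Example 1: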

esrally race \
  --track=geonames \
  --target-hosts=192.168.1.20:9200 \
  --pipeline=benchmark-only \
  --track-params="number_of_shards:3, number_of_replicas:1" \
  --user-tag="version:ARM_4C16G_1T*3" \
  --client-options="basic_auth_user:'elastic', basic_auth_password:'your_password'"

Example 2:

esrally race \
  --track=pmc \
  --target-hosts=192.168.1.20:9200,192.168.1.21:9200,192.168.1.22:9200 \
  --pipeline=benchmark-only \
  --report-format=csv \
  --report-file=~/benchmarks/result.csv

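Report example

Report analysis

The data Rally exports has four columns: Metric, Task, Value, and Unit.

How to read it:

Min/Median/Max Throughput: the minimum, median, and maximum throughput for the task, in ops/s; higher is better.
50th/90th/99th/100th percentile latency: the time between submitting a request and receiving the complete response; lower is better.
50th/90th/99th/99.9th/100th percentile service time: the time between the start of request processing and receiving the complete response; lower is better.
error rate: the ratio of error responses to the total number of responses. Any exception raised by the Elasticsearch Python client counts as an error response (for example HTTP 4xx or 5xx responses, or network errors such as an unreachable network).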
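
A report written with --report-format=csv can be filtered with standard tools. The sketch below is illustrative only: it assumes the CSV header is Metric,Task,Value,Unit (adjust the column positions if your Rally version differs) and that no field contains a comma. The sample rows and the task name example-task are made up for the demonstration and are not real measurements; metric_rows is a hypothetical helper.

```shell
# Hypothetical sample rows; a real file comes from --report-file.
report=$(mktemp)
cat > "$report" <<'EOF'
Metric,Task,Value,Unit
Median Throughput,example-task,1000,ops/s
99th percentile latency,example-task,250,ms
error rate,example-task,0,%
EOF

# metric_rows TEXT FILE
# Prints "task: metric = value unit" for rows whose Metric column contains TEXT.
metric_rows() {
  awk -F, -v want="$1" \
    'NR > 1 && index($1, want) { print $2 ": " $1 " = " $3 " " $4 }' "$2"
}

metric_rows "Throughput" "$report"
metric_rows "latency" "$report"
```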

Metric details
Cumulative indexing time of primary shards
Min cumulative indexing time across primary shards
Median cumulative indexing time across primary shards
Max cumulative indexing time across primary shards

Cumulative indexing throttle time of primary shards
Min cumulative indexing throttle time across primary shards
Median cumulative indexing throttle time across primary shards
Max cumulative indexing throttle time across primary shards

Cumulative merge time of primary shards
Cumulative merge count of primary shards
Min cumulative merge time across primary shards
Median cumulative merge time across primary shards
Max cumulative merge time across primary shards

Cumulative merge throttle time of primary shards
Min cumulative merge throttle time across primary shards
Median cumulative merge throttle time across primary shards
Max cumulative merge throttle time across primary shards

Cumulative refresh time of primary shards
Cumulative refresh count of primary shards
Min cumulative refresh time across primary shards
Median cumulative refresh time across primary shards
Max cumulative refresh time across primary shards

Cumulative flush time of primary shards
Cumulative flush count of primary shards
Min cumulative flush time across primary shards
Median cumulative flush time across primary shards
Max cumulative flush time across primary shards

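The metrics can be grouped by the operation they describe. A basic grouping, from top to bottom:

Indexing time
Indexing throttle time
Merge time
Merge throttle time
Refresh time
Flush time

Start with the indexing group, which has four metrics:

Cumulative indexing time of primary shards: the total time spent indexing, summed over the primary shards
Cumulative indexing time across primary shards: the same time per primary shard, reported as Min, Median, and Max
Cumulative indexing throttle time of primary shards: the total time indexing was throttled, summed over the primary shards
Cumulative indexing throttle time across primary shards: the throttle time per primary shard, reported as Min, Median, and Max

These four metrics describe the indexing time Elasticsearch needs to process the data, so shorter is better.

Next, the merge time data:

Cumulative merge throttle time of primary shards: the total time merges were throttled, summed over the primary shards
Min cumulative merge throttle time across primary shards: the minimum merge throttle time of any primary shard
Median cumulative merge throttle time across primary shards: the median merge throttle time across primary shards
Max cumulative merge throttle time across primary shards: the maximum merge throttle time of any primary shard

The merge group reads like the indexing group, except that it measures merge time. As with indexing, shorter times are better, and a higher merge count is better.

The node-stats group holds the results for the node-stats command. Here higher throughput is better and lower latency is better.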
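
Since most of these metrics are judged as lower is better or higher is better, a quick way to evaluate a configuration change is the relative difference between a baseline run and a contender run. The sketch below uses made-up values and a hypothetical pct_change helper. Rally's own compare subcommand reports such differences between two recorded races.

```shell
# pct_change BASELINE CONTENDER
# Prints the relative change in percent, with one decimal place.
pct_change() {
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (c - b) / b * 100 }'
}

# Hypothetical values: a latency that dropped from 200 ms to 150 ms.
echo "latency change: $(pct_change 200 150)%"
```

A negative change is an improvement for time and latency metrics, and a regression for throughput.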