Earlier, the flame graph showed that the first request spends a long time parsing SQL with ANTLR, while the second request is much faster because the parse result is cached. The relevant code is as follows:
```java
private SQLStatement parse0(String sql, boolean useCache) {
    if (useCache) {
        Optional<SQLStatement> cachedSQLStatement = this.cache.getSQLStatement(sql);
        if (cachedSQLStatement.isPresent()) {
            return (SQLStatement) cachedSQLStatement.get();
        }
    }
    SQLStatement result = (new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse();
    if (useCache) {
        this.cache.put(sql, result);
    }
    return result;
}
```
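The gap is easy to observe in isolation. The sketch below is not part of the original project: it times two consecutive parses of the same statement, and the `getSQLParseEngine(databaseTypeName)` lookup and the `"MySQL"` key are assumptions inferred from the `ENGINES` map used later, so adjust them to your ShardingSphere version.

```java
// Minimal timing sketch (assumed API): the first parse pays the full ANTLR cost,
// the second parse of the same SQL text is served from the cache.
private void measureParseCost() {
    SQLParseEngine engine = SQLParseEngineFactory.getSQLParseEngine("MySQL"); // assumed lookup method
    String sql = "SELECT * FROM t_order WHERE order_id = ?";
    long t0 = System.nanoTime();
    engine.parse(sql, true);   // cold: full ANTLR parse, result then cached
    long t1 = System.nanoTime();
    engine.parse(sql, true);   // warm: cache hit
    long t2 = System.nanoTime();
    log.info("cold parse {} ms, cached parse {} ms", (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
}
```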
To eliminate the long latency of the first request (and thereby support smoother releases), the natural idea is to warm things up at startup. With the goal set, the next step is the approach, which raises two questions:
- How can custom code hook into the library's source?
- What should be preloaded: the SQL text or the parsed SQLStatement?
For the first point, the obvious approach is inheritance, but a look at the source shows that the core classes are declared final (a final class in Java cannot be subclassed), so that door appears closed (it could be forced, but only with far too much black magic). With inheritance ruled out, reflection is the fallback. After checking where the relevant objects are created, the following code was settled on:
```java
private static Map<String, SQLParseEngine> ENGINES;
private static Field cacheInSQLParseEngine;
private static Field cacheInSQLParseResultCache;
private static volatile boolean init = false;

static {
    try {
        Field _ENGINES = SQLParseEngineFactory.class.getDeclaredField("ENGINES");
        _ENGINES.setAccessible(true);
        ENGINES = (Map<String, SQLParseEngine>) _ENGINES.get(null);
        cacheInSQLParseEngine = SQLParseEngine.class.getDeclaredField("cache");
        cacheInSQLParseEngine.setAccessible(true);
        cacheInSQLParseResultCache = SQLParseResultCache.class.getDeclaredField("cache");
        cacheInSQLParseResultCache.setAccessible(true);
        init = true;
    } catch (Exception e) {
        log.warn("cannot get fields via reflection", e);
    }
}
```
With the hook in place, on to the second point. The first idea was to serialize each SQLStatement to storage and deserialize it when preloading, but that raises compatibility concerns. The line `(new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse()` offers a better hint: it is enough to store only the SQL text and re-parse it at startup. Combined with loading and saving that data, the code looks like this:
```java
@PostConstruct
public void load() {
    if (!init) return;
    long start = System.currentTimeMillis();
    log.info("loading at {}", start);
    ENGINES.keySet().forEach(databaseTypeName -> {
        Set<String> sqls = warmerStorage.getSQL(databaseTypeName);
        log.info("the amount of sql fetched from storage is {}", sqls.size());
        sqls.forEach(sql -> ENGINES.get(databaseTypeName).parse(sql, true));
        log.debug("the details of sql fetched from storage are {}", sqls);
    });
    long end = System.currentTimeMillis();
    log.info("loaded at {}, consumed {} milliseconds", end, end - start);
}

// exceptions are already handled by postProcessBeforeDestruction
@PreDestroy
public void save() {
    if (!init) return;
    long start = System.currentTimeMillis();
    log.info("saving at {}", start);
    ENGINES.keySet().forEach(databaseTypeName -> {
        Set<String> sqls = getAllSQL(databaseTypeName);
        log.info("the amount of sql going to be saved is {}", sqls.size());
        log.debug("the details of sql going to be saved are {}", sqls);
        warmerStorage.saveSQL(databaseTypeName, sqls);
    });
    long end = System.currentTimeMillis();
    log.info("saved at {}, consumed {} milliseconds", end, end - start);
}
```
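Here `warmerStorage` is the project's own storage abstraction (just `getSQL` / `saveSQL`), and `getAllSQL` is not shown in the original snippet. Based on the Fields captured in the static block above, a plausible sketch of `getAllSQL` might look like the following; the concrete type of the inner `cache` field varies across ShardingSphere versions, so the code hedges between a Guava Cache and a plain Map:

```java
// Hedged sketch (not the original implementation): read back the SQL texts that have
// accumulated in the engine's parse cache, using the reflected fields.
@SuppressWarnings("unchecked")
private Set<String> getAllSQL(String databaseTypeName) {
    try {
        SQLParseEngine engine = ENGINES.get(databaseTypeName);
        Object resultCache = cacheInSQLParseEngine.get(engine);        // the SQLParseResultCache instance
        Object backing = cacheInSQLParseResultCache.get(resultCache);  // its internal cache field
        Map<String, ?> asMap;
        if (backing instanceof com.google.common.cache.Cache) {
            asMap = ((com.google.common.cache.Cache<String, ?>) backing).asMap();
        } else {
            asMap = (Map<String, ?>) backing;
        }
        return new HashSet<>(asMap.keySet());                          // the cached SQL texts are the keys
    } catch (Exception e) {
        log.warn("cannot read sql from parse cache", e);
        return Collections.emptySet();
    }
}
```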
In the code above, load() fetches the stored SQL during startup (necessarily empty when the project first goes live) and parses each statement with useCache = true, which populates the engine's cache; before the container is destroyed, save() uses the reflected fields to read the SQL collected in the cache during this run and stores it for the next preload. Since smooth releases are the whole point, any error during the startup load is thrown immediately; and because the code runs only once at startup and once at shutdown, performance and concurrency are not a concern.

Next, wrk and pidstat were used to compare the experimental group against the control group, starting with the simple single-thread, single-connection case (wrk -t1 -c1 -d10s) and observing performance data, statistics, and logs (both groups share the same configuration and environment, and the numbers below were verified over three runs).
Experimental group
pidstat output:
```
14:49:04 UID PID %usr %system %guest %CPU CPU Command
14:49:24 0 1 1.00 0.00 0.00 1.00 15 java
14:49:25 0 1 40.00 2.00 0.00 42.00 15 java
14:49:26 0 1 164.00 3.00 0.00 167.00 15 java
14:49:27 0 1 152.00 10.00 0.00 162.00 15 java
14:49:28 0 1 55.00 6.00 0.00 61.00 15 java
14:49:29 0 1 57.00 4.00 0.00 61.00 15 java
14:49:30 0 1 55.00 3.00 0.00 58.00 15 java
14:49:31 0 1 76.00 4.00 0.00 80.00 15 java
14:49:32 0 1 51.00 5.00 0.00 56.00 15 java
14:49:33 0 1 54.00 3.00 0.00 57.00 15 java
14:49:34 0 1 59.00 5.00 0.00 64.00 15 java
14:49:35 0 1 42.00 3.00 0.00 45.00 15 java
14:49:36 0 1 4.00 0.00 0.00 4.00 15 java
```
Control group
pidstat output:
```
14:48:55 UID PID %usr %system %guest %CPU CPU Command
14:49:12 0 1 2.00 2.00 0.00 4.00 28 java
14:49:13 0 1 141.00 6.00 0.00 147.00 28 java
14:49:14 0 1 122.00 8.00 0.00 130.00 28 java
14:49:15 0 1 124.00 3.00 0.00 127.00 28 java
14:49:16 0 1 191.09 6.93 0.00 198.02 28 java
14:49:17 0 1 369.00 11.00 0.00 380.00 28 java
14:49:18 0 1 183.00 14.00 0.00 197.00 28 java
14:49:19 0 1 384.00 15.00 0.00 399.00 28 java
14:49:20 0 1 317.00 16.00 0.00 333.00 28 java
14:49:21 0 1 99.00 10.00 0.00 109.00 28 java
14:49:22 0 1 64.00 7.00 0.00 71.00 28 java
14:49:23 0 1 50.00 8.00 0.00 58.00 28 java
14:49:24 0 1 5.00 1.00 0.00 6.00 28 java
```
The pidstat output above shows a very clear improvement: CPU usage in the experimental group is roughly two to three times lower (the real gap is somewhat smaller, because wrk retries timed-out requests).
Now look at the wrk statistics and the log output.
Experimental group
wrk output:
```
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 172.73ms 28.68ms 332.40ms 91.38%
Req/Sec 5.74 2.51 10.00 62.07%
Latency Distribution
50% 165.57ms
75% 173.90ms
90% 181.68ms
99% 332.40ms
58 requests in 10.10s, 396.20KB read
Requests/sec: 5.74
Transfer/sec: 39.24KB
```
Log output:
```
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
```
Control group
wrk output:
```
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 322.54ms 292.45ms 1.10s 83.33%
Req/Sec 5.59 3.11 10.00 51.28%
Latency Distribution
50% 163.87ms
75% 509.89ms
90% 839.26ms
99% 1.10s
39 requests in 10.06s, 223.27KB read
Requests/sec: 3.88
Transfer/sec: 22.18KB
```
Log output:
```
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
```
The data above again shows a clear improvement. The experimental group still hits exceptions, but six times fewer than the control group (again, the real gap is somewhat smaller because wrk retries), so there is still room for further optimization.
Going a step further, here is the second test run for both groups (the two groups produce similar numbers, so only one set is shown):
pidstat output:
```
14:57:46 UID PID %usr %system %guest %CPU CPU Command
14:57:51 0 1 2.00 2.00 0.00 4.00 28 java
14:57:52 0 1 48.00 6.00 0.00 54.00 28 java
14:57:53 0 1 56.00 3.00 0.00 59.00 28 java
14:57:54 0 1 67.00 6.00 0.00 73.00 28 java
14:57:55 0 1 51.00 4.00 0.00 55.00 28 java
14:57:56 0 1 36.00 4.00 0.00 40.00 28 java
14:57:57 0 1 51.00 6.00 0.00 57.00 28 java
14:57:58 0 1 53.00 5.00 0.00 58.00 28 java
14:57:59 0 1 43.00 5.00 0.00 48.00 28 java
14:58:00 0 1 95.00 4.00 0.00 99.00 28 java
14:58:01 0 1 55.00 6.00 0.00 61.00 28 java
14:58:02 0 1 27.00 2.00 0.00 29.00 28 java
14:58:03 0 1 2.00 2.00 0.00 4.00 28 java
```
wrk output:
```
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 172.73ms 28.68ms 332.40ms 91.38%
Req/Sec 5.74 2.51 10.00 62.07%
Latency Distribution
50% 165.57ms
75% 173.90ms
90% 181.68ms
99% 332.40ms
58 requests in 10.10s, 396.20KB read
Requests/sec: 5.74
Transfer/sec: 39.24KB
```
The data above shows there is still room for further optimization (TO BE CONTINUED).
Final notes
Because ShardingSphere's cache is keyed by the SQL text, a statement such as `select * from ... where id in (?, ?, ?)` produces a new cache entry for every distinct number of IN elements; if the list size changes on every call, the cache fills with redundant entries. It is therefore recommended to partition list parameters into fixed-size chunks before querying, for example:
```java
Lists.partition(Collections.emptyList(), 10).forEach(chunk -> { /* query with a fixed-size IN list */ });
```
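As a more concrete (hypothetical) sketch of the idea: with a fixed chunk size of 10, at most ten distinct "IN (?, ...)" shapes can ever reach the parser, no matter how large the ID list grows. `orderMapper.selectByIds` and the `Order` type below are placeholders, not code from the original project.

```java
// Query in fixed-size chunks so the generated "IN (?, ?, ...)" SQL has a bounded
// number of shapes and the parse cache stays small.
private List<Order> findOrders(List<Long> ids) {
    List<Order> result = new ArrayList<>();
    for (List<Long> chunk : Lists.partition(ids, 10)) {
        result.addAll(orderMapper.selectByIds(chunk)); // each chunk has 1 to 10 placeholders
    }
    return result;
}
```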