关于 Shardingsphere SQL 预热优化项

之前从火焰图观测到,第一次请求使用 ANTLR 解析 SQL 耗时较长,而第二次请求由于缓存原因耗时很小,相关代码如下:

kotlin 复制代码
    private SQLStatement parse0(String sql, boolean useCache) {
        if (useCache) {
            Optional<SQLStatement> cachedSQLStatement = this.cache.getSQLStatement(sql);
            if (cachedSQLStatement.isPresent()) {
                return (SQLStatement)cachedSQLStatement.get();
            }
        }

        SQLStatement result = (new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse();
        if (useCache) {
            this.cache.put(sql, result);
        }

        return result;
    }

那么为了解决第一次请求耗时较长的原因 ( 以支持更平滑的发布 ),自然而然想到的思路便是启动预加载。目标有了,接下来就是方案,进一步思考会涉及以下几个问题:

  1. 自定义代码如何切入源码;
  2. 预加载的是什么?SQL 还是 SQLStatement;

针对第一点,最先想到的思路便是继承,但查看源码发现,核心类都被声明为 final ( Java 语法上 final class 禁止被继承 ),看起来继承路子堵死了 ( 真要搞也行,但太黑科技了 )。继承不行,那就反射来搞,通过查看相关对象的创建地方,最终确定了如下代码:

ini 复制代码
        private static Map<String, SQLParseEngine> ENGINES;
        private static Field cacheInSQLParseEngine;
        private static Field cacheInSQLParseResultCache;
        private static volatile boolean init = false;

        static {
            try {
                Field _ENGINES = SQLParseEngineFactory.class.getDeclaredField("ENGINES");
                _ENGINES.setAccessible(true);
                ENGINES = (Map<String, SQLParseEngine>) _ENGINES.get(null);
                cacheInSQLParseEngine = SQLParseEngine.class.getDeclaredField("cache");
                cacheInSQLParseEngine.setAccessible(true);
                cacheInSQLParseResultCache = SQLParseResultCache.class.getDeclaredField("cache");
                cacheInSQLParseResultCache.setAccessible(true);
                init = true;
            } catch (Exception e) {
                log.warn("cannot get fields because of !", e);
            }
        }

至此,逻辑的切点完成,那么看看第二点,一开始想到将 SQLStatement 序列化到存储中,预加载的时候反序列化,但考虑一些兼容性问题,加上 (new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse() 这行代码给了一个很好的提示,同时结合数据的加载和保存,如下代码:

ini 复制代码
        @PostConstruct
        public void load() {
            if (!init) return;
            long start = System.currentTimeMillis();
            log.info("loading at {}", start);
            ENGINES.keySet().forEach(databaseTypeName -> {
                Set<String> sqls = warmerStorage.getSQL(databaseTypeName);
                log.info("the amount of sql fetched from storage is {}", sqls.size());
                sqls.forEach(sql -> ENGINES.get(databaseTypeName).parse(sql, true));
                log.debug("the details of sql fetched from storage are {}", sqls);
            });
            long end = System.currentTimeMillis();
            log.info("loaded at {}, consumed {} milliseconds", end, end - start);
        }

        // 异常已由 postProcessBeforeDestruction 处理
        @PreDestroy
        public void save() {
            if (!init) return;
            long start = System.currentTimeMillis();
            log.info("saving at {}", start);
            ENGINES.keySet().forEach(databaseTypeName -> {
                Set<String> sqls = getAllSQL(databaseTypeName);
                log.info("the amount of sql going to be saved is {}", sqls.size());
                log.debug("the details of sql going to be saved are {}", sqls);
                warmerStorage.saveSQL(databaseTypeName, sqls);
            });
            long end = System.currentTimeMillis();
            log.info("saved at {}, consumed {} milliseconds", end, end - start);
        }

在上面代码中,启动过程中从存储捞取 SQL ( 当然项目初始上线必然为空 ) 进行 SQL 解析并使用反射填充缓存,而在容器销毁之前则同样通过反射将这个期间内收集到 SQL 保存作为后续的预加载。此外,考虑到平滑发布的必要性,启动过程加载及时抛错;同时只在启动和销毁的只执行一次,因此性能和并发并无问题。 接下来用 wrk & pidstat 针对实验组和对照组做测试,先来看简单的单线程单链接的情况 ( wrk -t1 -c1 -d10s ),并观察性能数据、统计数据和输出日志 ( 实验组和对照组配置和环境相同,同时以下数据经历三次测试验证 )。

实验组

pidstat 输出
makefile 复制代码
14:49:04      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:49:24        0         1    1.00    0.00    0.00    1.00    15  java
14:49:25        0         1   40.00    2.00    0.00   42.00    15  java
14:49:26        0         1  164.00    3.00    0.00  167.00    15  java
14:49:27        0         1  152.00   10.00    0.00  162.00    15  java
14:49:28        0         1   55.00    6.00    0.00   61.00    15  java
14:49:29        0         1   57.00    4.00    0.00   61.00    15  java
14:49:30        0         1   55.00    3.00    0.00   58.00    15  java
14:49:31        0         1   76.00    4.00    0.00   80.00    15  java
14:49:32        0         1   51.00    5.00    0.00   56.00    15  java
14:49:33        0         1   54.00    3.00    0.00   57.00    15  java
14:49:34        0         1   59.00    5.00    0.00   64.00    15  java
14:49:35        0         1   42.00    3.00    0.00   45.00    15  java
14:49:36        0         1    4.00    0.00    0.00    4.00    15  java

对照组

pidstat 输出
makefile 复制代码
14:48:55      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:49:12        0         1    2.00    2.00    0.00    4.00    28  java
14:49:13        0         1  141.00    6.00    0.00  147.00    28  java
14:49:14        0         1  122.00    8.00    0.00  130.00    28  java
14:49:15        0         1  124.00    3.00    0.00  127.00    28  java
14:49:16        0         1  191.09    6.93    0.00  198.02    28  java
14:49:17        0         1  369.00   11.00    0.00  380.00    28  java
14:49:18        0         1  183.00   14.00    0.00  197.00    28  java
14:49:19        0         1  384.00   15.00    0.00  399.00    28  java
14:49:20        0         1  317.00   16.00    0.00  333.00    28  java
14:49:21        0         1   99.00   10.00    0.00  109.00    28  java
14:49:22        0         1   64.00    7.00    0.00   71.00    28  java
14:49:23        0         1   50.00    8.00    0.00   58.00    28  java
14:49:24        0         1    5.00    1.00    0.00    6.00    28  java

从上面的 pidstat 输出可以看出效果非常明显 ( CPU 使用率降低了两三倍,实际上数值低一些,因为 wrk 会做重试 )。

再来看看 wrk 统计数据和输出日志

实验组

wrk 输出
matlab 复制代码
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   172.73ms   28.68ms 332.40ms   91.38%
    Req/Sec     5.74      2.51    10.00     62.07%
  Latency Distribution
     50%  165.57ms
     75%  173.90ms
     90%  181.68ms
     99%  332.40ms
  58 requests in 10.10s, 396.20KB read
Requests/sec:      5.74
Transfer/sec:     39.24KB
日志输出
ruby 复制代码
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu

对照组

wrk 输出
matlab 复制代码
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   322.54ms  292.45ms   1.10s    83.33%
    Req/Sec     5.59      3.11    10.00     51.28%
  Latency Distribution
     50%  163.87ms
     75%  509.89ms
     90%  839.26ms
     99%    1.10s 
  39 requests in 10.06s, 223.27KB read
Requests/sec:      3.88
Transfer/sec:     22.18KB
日志输出:
ruby 复制代码
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu

从上面的数据可以看到依旧效果明显,当然也可以看到实验组依旧有异常,但相对对照组来说少了六倍( 实际上数值低一些,因为 wrk 会做重试 ),还有进一步优化空间。

更近一步,在看看两组第二次测试性能 ( 两组数据相近,因此只展示其中一组数据 ):

pidstat 输出
makefile 复制代码
14:57:46      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:57:51        0         1    2.00    2.00    0.00    4.00    28  java
14:57:52        0         1   48.00    6.00    0.00   54.00    28  java
14:57:53        0         1   56.00    3.00    0.00   59.00    28  java
14:57:54        0         1   67.00    6.00    0.00   73.00    28  java
14:57:55        0         1   51.00    4.00    0.00   55.00    28  java
14:57:56        0         1   36.00    4.00    0.00   40.00    28  java
14:57:57        0         1   51.00    6.00    0.00   57.00    28  java
14:57:58        0         1   53.00    5.00    0.00   58.00    28  java
14:57:59        0         1   43.00    5.00    0.00   48.00    28  java
14:58:00        0         1   95.00    4.00    0.00   99.00    28  java
14:58:01        0         1   55.00    6.00    0.00   61.00    28  java
14:58:02        0         1   27.00    2.00    0.00   29.00    28  java
14:58:03        0         1    2.00    2.00    0.00    4.00    28  java
wrk 输出:
matlab 复制代码
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   172.73ms   28.68ms 332.40ms   91.38%
    Req/Sec     5.74      2.51    10.00     62.07%
  Latency Distribution
     50%  165.57ms
     75%  173.90ms
     90%  181.68ms
     99%  332.40ms
  58 requests in 10.10s, 396.20KB read
Requests/sec:      5.74
Transfer/sec:     39.24KB

从上面的数据,可以看出,还有进一步优化空间 ( TO BE CONTINUED )。

写在最后

根据 shardingsphere 缓存维度 -- sql 形如 select * from where id in ( ?, ? ,? ) 倘若元素数据每次不同,则会生成相当冗余的缓存,推荐统一对列表数据做划分再处理,如下:

less 复制代码
Lists.partition(Collections.emptyList(), 10).forEach( e -> { // do something} );
相关推荐
NoneCoder1 天前
CSS系列(26)-- 动画性能优化详解
前端·css·性能优化
苏三说技术1 天前
Redis 性能优化的18招
数据库·redis·性能优化
程序猿会指北1 天前
【鸿蒙(HarmonyOS)性能优化指南】内存分析器Allocation Profiler
性能优化·移动开发·harmonyos·openharmony·arkui·组件化·鸿蒙开发
程序猿会指北1 天前
【鸿蒙(HarmonyOS)性能优化指南】启动分析工具Launch Profiler
c++·性能优化·harmonyos·openharmony·arkui·启动优化·鸿蒙开发
彭亚川Allen2 天前
优化了2年的性能,没想到最后被数据库连接池坑了一把
数据库·后端·性能优化
MClink2 天前
Go怎么做性能优化工具篇之pprof
开发语言·性能优化·golang
京东零售技术2 天前
Taro小程序开发性能优化实践
性能优化·taro
理想不理想v2 天前
wepack如何进行性能优化
性能优化
fantasy_arch3 天前
CPU性能优化-磁盘空间和解析时间
网络·性能优化
码农老起3 天前
企业如何通过TDSQL实现高效数据库迁移与性能优化
数据库·性能优化