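On SQL Warm-Up Optimization for ShardingSphere

Earlier, flame graphs showed that the first request spent a long time parsing SQL with ANTLR, while the second request was much faster thanks to caching. The relevant code: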

    private SQLStatement parse0(String sql, boolean useCache) {
        if (useCache) {
            Optional<SQLStatement> cachedSQLStatement = this.cache.getSQLStatement(sql);
            if (cachedSQLStatement.isPresent()) {
                return (SQLStatement)cachedSQLStatement.get();
            }
        }

        SQLStatement result = (new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse();
        if (useCache) {
            this.cache.put(sql, result);
        }

        return result;
    }

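To fix the slow first request (and so make releases smoother), the natural idea is to preload at startup. With the goal set, the next step is the design, which raises two questions:

  1. How to hook custom code into the library source;
  2. What to preload: the SQL text or the SQLStatement?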

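For the first question, my first thought was inheritance, but the source shows that the core classes are all declared final (Java forbids subclassing a final class), so that route looked closed (it could still be forced, but only with hacks that go too far). With inheritance out, reflection it is. After tracing where the relevant objects are created, I settled on the following code: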

        private static Map<String, SQLParseEngine> ENGINES;
        private static Field cacheInSQLParseEngine;
        private static Field cacheInSQLParseResultCache;
        private static volatile boolean init = false;

        static {
            try {
                Field _ENGINES = SQLParseEngineFactory.class.getDeclaredField("ENGINES");
                _ENGINES.setAccessible(true);
                ENGINES = (Map<String, SQLParseEngine>) _ENGINES.get(null);
                cacheInSQLParseEngine = SQLParseEngine.class.getDeclaredField("cache");
                cacheInSQLParseEngine.setAccessible(true);
                cacheInSQLParseResultCache = SQLParseResultCache.class.getDeclaredField("cache");
                cacheInSQLParseResultCache.setAccessible(true);
                init = true;
            } catch (Exception e) {
                log.warn("cannot access parser cache fields via reflection; SQL warm-up is disabled", e);
            }
        }
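
The static initializer above only obtains the Field handles; how they are chained to reach the cached SQL is not shown in this post. Below is a minimal sketch of that chaining, using stand-in classes (Engine and ResultCache are hypothetical and only mimic the "final class with a private cache field" shape). In the real library the inner field may well be a Guava Cache rather than a Map, in which case something like asMap() would be needed; I have not verified that here.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class ReflectiveCacheReader {

    // Hypothetical stand-in for SQLParseResultCache: final, with a private map.
    public static final class ResultCache {
        private final Map<String, Object> cache = new ConcurrentHashMap<>();

        void put(String sql, Object statement) {
            cache.put(sql, statement);
        }
    }

    // Hypothetical stand-in for SQLParseEngine: final, holding a private ResultCache.
    public static final class Engine {
        private final ResultCache cache = new ResultCache();

        public void parse(String sql) {
            cache.put(sql, new Object()); // a real engine would store the parsed statement
        }
    }

    // Follows Engine.cache, then ResultCache.cache, and returns the cached SQL texts.
    public static Set<String> cachedSql(Engine engine) {
        try {
            Field outer = Engine.class.getDeclaredField("cache");
            outer.setAccessible(true);
            Object resultCache = outer.get(engine);

            Field inner = ResultCache.class.getDeclaredField("cache");
            inner.setAccessible(true);
            Map<?, ?> map = (Map<?, ?>) inner.get(resultCache);

            Set<String> keys = new TreeSet<>(); // sorted copy, detached from the live cache
            for (Object key : map.keySet()) {
                keys.add((String) key);
            }
            return keys;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read cache via reflection", e);
        }
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        engine.parse("select 2");
        engine.parse("select 1");
        System.out.println(cachedSql(engine)); // prints [select 1, select 2]
    }
}
```

Returning a copy rather than the live key set keeps a later save step from observing concurrent changes to the cache while it iterates.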

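With the hook point in place, on to the second question. My first idea was to serialize the SQLStatement objects into storage and deserialize them during preload, but that raises compatibility concerns, and the line (new SQLParseKernel(ParseRuleRegistry.getInstance(), this.databaseTypeName, sql)).parse() offers a good hint: it is enough to store the SQL text and parse it again at startup. Combined with loading and saving the data, the code looks like this: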

        @PostConstruct
        public void load() {
            if (!init) return;
            long start = System.currentTimeMillis();
            log.info("loading at {}", start);
            ENGINES.keySet().forEach(databaseTypeName -> {
                Set<String> sqls = warmerStorage.getSQL(databaseTypeName);
                log.info("the amount of sql fetched from storage is {}", sqls.size());
                sqls.forEach(sql -> ENGINES.get(databaseTypeName).parse(sql, true));
                log.debug("the details of sql fetched from storage are {}", sqls);
            });
            long end = System.currentTimeMillis();
            log.info("loaded at {}, consumed {} milliseconds", end, end - start);
        }

        // Exceptions are already handled by postProcessBeforeDestruction
        @PreDestroy
        public void save() {
            if (!init) return;
            long start = System.currentTimeMillis();
            log.info("saving at {}", start);
            ENGINES.keySet().forEach(databaseTypeName -> {
                Set<String> sqls = getAllSQL(databaseTypeName);
                log.info("the amount of sql going to be saved is {}", sqls.size());
                log.debug("the details of sql going to be saved are {}", sqls);
                warmerStorage.saveSQL(databaseTypeName, sqls);
            });
            long end = System.currentTimeMillis();
            log.info("saved at {}, consumed {} milliseconds", end, end - start);
        }
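
The warmerStorage used by load() and save() is not shown in this post. As a rough sketch of what such a store could look like, here is a file-based version with the same two method names; the class name FileWarmerStorage, the one-file-per-database layout, and the Base64 line encoding are all my assumptions for illustration, not the actual implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical file-backed store: one file per database type, one SQL per line.
public class FileWarmerStorage {

    private final Path dir;

    public FileWarmerStorage(Path dir) {
        this.dir = dir;
    }

    private Path fileFor(String databaseTypeName) {
        return dir.resolve(databaseTypeName + ".sql.b64");
    }

    // Returns the saved SQL texts, or an empty set when nothing was saved yet.
    public Set<String> getSQL(String databaseTypeName) {
        Set<String> sqls = new LinkedHashSet<>();
        Path file = fileFor(databaseTypeName);
        if (!Files.exists(file)) {
            return sqls; // first deployment: nothing to warm up
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    byte[] raw = Base64.getDecoder().decode(line);
                    sqls.add(new String(raw, StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sqls;
    }

    // Overwrites the file with the given SQL texts.
    public void saveSQL(String databaseTypeName, Set<String> sqls) {
        List<String> lines = new ArrayList<>();
        for (String sql : sqls) {
            // Base64 keeps multi-line SQL on a single line of the file.
            lines.add(Base64.getEncoder().encodeToString(sql.getBytes(StandardCharsets.UTF_8)));
        }
        try {
            Files.createDirectories(dir);
            Files.write(fileFor(databaseTypeName), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A local file only helps when the next instance starts on the same disk; with containers that are recreated on each release, a shared store would likely be needed instead.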

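In load() and save() above, startup fetches the stored SQL (empty, of course, the first time the project goes live) and parses it through the engines obtained by reflection, which fills the cache; before the container is destroyed, reflection is used again to collect the SQL cached during that run and save it for the next preload. Because a smooth release matters, errors during startup loading are raised immediately rather than swallowed. Both methods run only once, at startup and at shutdown respectively, so performance and concurrency are not a concern.

Next, I used wrk and pidstat to test an experiment group and a control group. First, the simple case of one thread and one connection ( wrk -t1 -c1 -d10s ), looking at the performance data, the statistics, and the log output (both groups use the same configuration and environment, and the data below was verified over three test runs).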

Experiment group

pidstat output
14:49:04      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:49:24        0         1    1.00    0.00    0.00    1.00    15  java
14:49:25        0         1   40.00    2.00    0.00   42.00    15  java
14:49:26        0         1  164.00    3.00    0.00  167.00    15  java
14:49:27        0         1  152.00   10.00    0.00  162.00    15  java
14:49:28        0         1   55.00    6.00    0.00   61.00    15  java
14:49:29        0         1   57.00    4.00    0.00   61.00    15  java
14:49:30        0         1   55.00    3.00    0.00   58.00    15  java
14:49:31        0         1   76.00    4.00    0.00   80.00    15  java
14:49:32        0         1   51.00    5.00    0.00   56.00    15  java
14:49:33        0         1   54.00    3.00    0.00   57.00    15  java
14:49:34        0         1   59.00    5.00    0.00   64.00    15  java
14:49:35        0         1   42.00    3.00    0.00   45.00    15  java
14:49:36        0         1    4.00    0.00    0.00    4.00    15  java

Control group

pidstat output
14:48:55      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:49:12        0         1    2.00    2.00    0.00    4.00    28  java
14:49:13        0         1  141.00    6.00    0.00  147.00    28  java
14:49:14        0         1  122.00    8.00    0.00  130.00    28  java
14:49:15        0         1  124.00    3.00    0.00  127.00    28  java
14:49:16        0         1  191.09    6.93    0.00  198.02    28  java
14:49:17        0         1  369.00   11.00    0.00  380.00    28  java
14:49:18        0         1  183.00   14.00    0.00  197.00    28  java
14:49:19        0         1  384.00   15.00    0.00  399.00    28  java
14:49:20        0         1  317.00   16.00    0.00  333.00    28  java
14:49:21        0         1   99.00   10.00    0.00  109.00    28  java
14:49:22        0         1   64.00    7.00    0.00   71.00    28  java
14:49:23        0         1   50.00    8.00    0.00   58.00    28  java
14:49:24        0         1    5.00    1.00    0.00    6.00    28  java

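The pidstat output above shows a very clear effect: CPU usage is roughly two to three times lower in the experiment group, peaking at 167% versus 399% in the control group (in practice the figure is somewhat lower, because wrk retries).

Now the wrk statistics and log output.

Experiment group

wrk output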
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   172.73ms   28.68ms 332.40ms   91.38%
    Req/Sec     5.74      2.51    10.00     62.07%
  Latency Distribution
     50%  165.57ms
     75%  173.90ms
     90%  181.68ms
     99%  332.40ms
  58 requests in 10.10s, 396.20KB read
Requests/sec:      5.74
Transfer/sec:     39.24KB
Log output
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu

Control group

wrk output
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   322.54ms  292.45ms   1.10s    83.33%
    Req/Sec     5.59      3.11    10.00     51.28%
  Latency Distribution
     50%  163.87ms
     75%  509.89ms
     90%  839.26ms
     99%    1.10s 
  39 requests in 10.06s, 223.27KB read
Requests/sec:      3.88
Transfer/sec:     22.18KB
Log output
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu
feign.RetryableException: Read timed out executing POST http://tx-goods-service/api/v1/tx/goods/spu-search/searchSpu

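The data above again shows a clear effect: average latency is 172.73ms versus 322.54ms, and 58 requests completed versus 39. The experiment group still logs timeouts, but one sixth as many as the control group (2 versus 12; in practice the figure is somewhat lower, because wrk retries), so there is still room for further optimization.

Going one step further, here is the second test run for both groups (the two groups produce similar data, so only one is shown):

pidstat output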
14:57:46      UID       PID    %usr %system  %guest    %CPU   CPU  Command
14:57:51        0         1    2.00    2.00    0.00    4.00    28  java
14:57:52        0         1   48.00    6.00    0.00   54.00    28  java
14:57:53        0         1   56.00    3.00    0.00   59.00    28  java
14:57:54        0         1   67.00    6.00    0.00   73.00    28  java
14:57:55        0         1   51.00    4.00    0.00   55.00    28  java
14:57:56        0         1   36.00    4.00    0.00   40.00    28  java
14:57:57        0         1   51.00    6.00    0.00   57.00    28  java
14:57:58        0         1   53.00    5.00    0.00   58.00    28  java
14:57:59        0         1   43.00    5.00    0.00   48.00    28  java
14:58:00        0         1   95.00    4.00    0.00   99.00    28  java
14:58:01        0         1   55.00    6.00    0.00   61.00    28  java
14:58:02        0         1   27.00    2.00    0.00   29.00    28  java
14:58:03        0         1    2.00    2.00    0.00    4.00    28  java
wrk output
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   172.73ms   28.68ms 332.40ms   91.38%
    Req/Sec     5.74      2.51    10.00     62.07%
  Latency Distribution
     50%  165.57ms
     75%  173.90ms
     90%  181.68ms
     99%  332.40ms
  58 requests in 10.10s, 396.20KB read
Requests/sec:      5.74
Transfer/sec:     39.24KB

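The data above shows there is still room for further optimization (TO BE CONTINUED).

Final notes

ShardingSphere keys its parse cache on the SQL text. For a statement such as select * from where id in ( ?, ?, ? ), if the list differs in length from call to call, each distinct placeholder count gets its own cache entry, which produces a lot of redundant entries. I recommend splitting the list into fixed-size batches before processing, for example: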

Lists.partition(Collections.emptyList(), 10).forEach(e -> { /* do something */ });
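
To make the effect concrete, here is a small sketch that counts how many distinct SQL texts are generated when lists of 1 to 100 ids are split into batches of 10. It uses a hand-written partition helper instead of Guava so that it is self-contained; the table name t_order and the class name InListBatching are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class InListBatching {

    // Splits a list into consecutive chunks of at most `size` elements.
    public static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < list.size(); i += size) {
            chunks.add(new ArrayList<>(list.subList(i, Math.min(i + size, list.size()))));
        }
        return chunks;
    }

    // Builds the parameterized SQL text for a chunk with the given number of ids.
    public static String inQuery(int placeholders) {
        String marks = String.join(", ", Collections.nCopies(placeholders, "?"));
        return "select * from t_order where id in (" + marks + ")";
    }

    public static void main(String[] args) {
        Set<String> batched = new TreeSet<>();
        Set<String> unbatched = new TreeSet<>();
        for (int n = 1; n <= 100; n++) {
            List<Integer> ids = new ArrayList<>(Collections.nCopies(n, 0));
            unbatched.add(inQuery(ids.size()));
            for (List<Integer> chunk : partition(ids, 10)) {
                batched.add(inQuery(chunk.size()));
            }
        }
        System.out.println(unbatched.size()); // prints 100
        System.out.println(batched.size());   // prints 10
    }
}
```

With a batch size of 10, every chunk holds between 1 and 10 ids, so the cache holds at most 10 variants of this statement however long the original list is.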