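芝法酱 Study Notes (2.3): Database and Table Sharding with ShardingSphere

1. Preface

In the previous examples we used a simplified sales-order report query to show some optimizations available at the index and column-type level for queries over large data volumes. Yet no matter how we optimized, a single query still took several seconds.

This is a real-world problem: once a system's users generate enough business and the system runs long enough, a single database table accumulates a huge amount of data. Beyond a certain volume, performance degrades no matter what you do. Is there a way out?

The most obvious answer is to shard databases and tables. In fact, Chapter 1 already gave a database-sharding scheme for this business: different users are placed in different databases. But a single user's data can still be large, and that is when table sharding is needed.

This section introduces ShardingSphere, the most mainstream sharding solution on the market.

2. Code

This section contains very little theory and mostly shows how to use ShardingSphere. The hard part of learning this framework is its configuration, and little theory is involved, so this time the code is presented up front; you can read the code first and then follow the explanations below.

3. ShardingSphere configuration

3.1 Version

The ShardingSphere dependency used in this section is: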

        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc</artifactId>
            <version>5.5.1</version>
        </dependency>

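3.2 YAML configuration

ShardingSphere has a fairly complex YAML configuration. Let's start from the description in the official documentation.

Honestly, the official documentation still looks complicated, and configuring from it alone feels uncertain, so we can read the source code alongside it. This example also involves the monitoring center and the enterprise center. The monitoring center is not sharded, so it simply uses Spring Boot's default data source, which means the ShardingSphere data source has to be configured manually. The core code for creating it is: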

 @Bean(name = "shardingSphereDataSource")
    public DataSource shardingSphereDataSource() throws SQLException, IOException {
        File file = new File(getClass().getClassLoader().getResource("shardingsphere.yml").getFile());
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(file);
        return dataSource;
    }

Stepping into YamlShardingSphereDataSourceFactory, we can see that the core configuration class is the following:

@Getter
@Setter
public final class YamlJDBCConfiguration implements YamlConfiguration {
    private String databaseName;
    private Map<String, Map<String, Object>> dataSources = new HashMap<>();
    private Collection<YamlRuleConfiguration> rules = new LinkedList<>();
    private YamlModeConfiguration mode;
    private YamlAuthorityRuleConfiguration authority;
    private YamlSQLParserRuleConfiguration sqlParser;
    private YamlTransactionRuleConfiguration transaction;
    private YamlGlobalClockRuleConfiguration globalClock;
    private YamlSQLFederationRuleConfiguration sqlFederation;
    private YamlSQLTranslatorRuleConfiguration sqlTranslator;
    private YamlLoggingRuleConfiguration logging;
    private Properties props = new Properties();

	......
 }

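The field names in this class are the top-level keys of our YAML file. This section only covers the keys used in this example; for the rest, please study the official site.

Field	Name	Purpose
databaseName	Database name	Name of the logical database; unless the data source is auto-configured, it has little practical effect
dataSources	Data sources	The individual data sources are configured under this node
mode	Mode	Standalone or cluster mode; this example uses Standalone. The repository type is also set here; this example uses JDBC
rules	Rules	The most important part: the sharding rules. It is a list, with one entry per rule type
props	Properties	Properties used by the ShardingSphere framework itself; used here to enable SQL logging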

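3.3 Rule configuration

Pressing Ctrl + H on the YamlRuleConfiguration class (the type hierarchy view in the IDE) lists the class for each rule type, which tells us exactly how each rule should be configured.

In ShardingSphere's rule configuration, the class that a list element maps to is selected by a ShardingSphere-specific YAML tag. For data sharding, for example, the entry starts with: - !SHARDING

The tag for each rule type can be found in the official documentation. The ones used in this example are:

Class	Rule type	YAML tag	Purpose
YamlShardingRuleConfiguration	Sharding rule	- !SHARDING	Describes how databases and tables are sharded
YamlSingleRuleConfiguration	Single-table rule	- !SINGLE	Scans which tables exist in a database; wildcards are supported
YamlBroadcastRuleConfiguration	Broadcast-table rule	- !BROADCAST	Declares tables that are not sharded themselves but are joined with sharded tables; a broadcast table is expected to exist in every data source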

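3.3.1 Sharding rule

To see how the sharding rule is configured, we can read the official documentation together with the source code.

Official documentation: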

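rules:
- !SHARDING
  tables: # Sharding table rules
    <logic_table_name> (+): # Logical table name
      actualDataNodes (?): # Data source name + table name (see the inline expression syntax)
      databaseStrategy (?): # Database sharding strategy; the default database strategy is used if omitted. Only one of the strategies below may be chosen
        standard: # Standard sharding for a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Sharding algorithm name
        complex: # Complex sharding for multiple sharding keys
          shardingColumns: # Sharding column names, separated by commas
          shardingAlgorithmName: # Sharding algorithm name
        hint: # Hint sharding strategy
          shardingAlgorithmName: # Sharding algorithm name
        none: # No sharding
      tableStrategy: # Table sharding strategy, same structure as the database strategy
      keyGenerateStrategy: # Distributed key generation strategy
        column: # Auto-increment column name; if omitted, no key generator is used
        keyGeneratorName: # Key generation algorithm name
      auditStrategy: # Sharding audit strategy
        auditorNames: # Sharding audit algorithm names
          - <auditor_name>
          - <auditor_name>
        allowHintDisable: true # Whether a hint may disable sharding audit
  autoTables: # Auto-sharding table rules
    t_order_auto: # Logical table name
      actualDataSources (?): # Data source names
      shardingStrategy: # Sharding strategy
        standard: # Standard sharding for a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Auto-sharding algorithm name
  bindingTables (+): # Binding table rules
    - <logic_table_name_1, logic_table_name_2, ...>
    - <logic_table_name_1, logic_table_name_2, ...>
  defaultDatabaseStrategy: # Default database sharding strategy
  defaultTableStrategy: # Default table sharding strategy
  defaultKeyGenerateStrategy: # Default distributed key generation strategy
  defaultShardingColumn: # Default sharding column name

  # Sharding algorithms
  shardingAlgorithms:
    <sharding_algorithm_name> (+): # Sharding algorithm name
      type: # Sharding algorithm type
      props: # Sharding algorithm properties
      # ...

  # Distributed key generation algorithms
  keyGenerators:
    <key_generate_algorithm_name> (+): # Key generation algorithm name
      type: # Key generation algorithm type
      props: # Key generation algorithm properties
      # ...
  # Sharding audit algorithms
  auditors:
    <sharding_audit_algorithm_name> (+): # Sharding audit algorithm name
      type: # Sharding audit algorithm type
      props: # Sharding audit algorithm properties
      # ...

- !BROADCAST
  tables: # Broadcast table list
    - <table_name>
    - <table_name>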

Source code:

@RepositoryTupleEntity("sharding")
@Getter
@Setter
public final class YamlShardingRuleConfiguration implements YamlRuleConfiguration {
    
    @RepositoryTupleField(type = Type.TABLE)
    private Map<String, YamlTableRuleConfiguration> tables = new LinkedHashMap<>();
    
    @RepositoryTupleField(type = Type.TABLE)
    private Map<String, YamlShardingAutoTableRuleConfiguration> autoTables = new LinkedHashMap<>();
    
    @RepositoryTupleField(type = Type.TABLE)
    @RepositoryTupleKeyListNameGenerator(ShardingBindingTableRepositoryTupleKeyListNameGenerator.class)
    private Collection<String> bindingTables = new LinkedList<>();
    
    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingStrategyConfiguration defaultDatabaseStrategy;
    
    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingStrategyConfiguration defaultTableStrategy;
    
    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlKeyGenerateStrategyConfiguration defaultKeyGenerateStrategy;
    
    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingAuditStrategyConfiguration defaultAuditStrategy;
    
    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();
    
    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> keyGenerators = new LinkedHashMap<>();
    
    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> auditors = new LinkedHashMap<>();
    
    @RepositoryTupleField(type = Type.OTHER)
    private String defaultShardingColumn;
    
    @RepositoryTupleField(type = Type.OTHER)
    private YamlShardingCacheConfiguration shardingCache;
    
    @Override
    public Class<ShardingRuleConfiguration> getRuleConfigurationType() {
        return ShardingRuleConfiguration.class;
    }
}

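The documentation is fairly clear here, so you can simply configure by following it.

The part worth explaining is the actualDataNodes expression.

We need to tell ShardingSphere which actual tables a logical table may map to, so that it can resolve the right tables for us in join queries. These values are also passed to the callbacks of a custom sharding algorithm (although you may not need them).

Take the configuration from this example: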

      consign:
        #logicTable: consign
        actualDataNodes: ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key

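The ${} form enumerates the values that may occur. Because of how ShardingSphere is designed, the values here must be numeric; the reason is explained later, in the pitfalls section. Several alternatives can be separated with ",".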
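To make the expression above concrete, here is a minimal sketch that expands it into the list of actual data nodes. It is not ShardingSphere's parser: the real inline syntax is richer, while this sketch only understands numeric ${a..b} ranges and top-level commas. The class name InlineRangeSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: handles numeric ${a..b} ranges, nothing else from the real inline syntax.
final class InlineRangeSketch {

    private static final Pattern RANGE = Pattern.compile("\\$\\{(\\d+)\\.\\.(\\d+)\\}");

    /** Expands a comma-separated list of expressions into concrete node names. */
    static List<String> expand(String expression) {
        List<String> result = new ArrayList<>();
        for (String part : expression.split(",")) {
            expandOne(part.trim(), result);
        }
        return result;
    }

    /** Replaces the leftmost range with each of its values, then recurses on the rest. */
    private static void expandOne(String text, List<String> out) {
        Matcher matcher = RANGE.matcher(text);
        if (!matcher.find()) {
            out.add(text);
            return;
        }
        int from = Integer.parseInt(matcher.group(1));
        int to = Integer.parseInt(matcher.group(2));
        for (int value = from; value <= to; value++) {
            expandOne(text.substring(0, matcher.start()) + value + text.substring(matcher.end()), out);
        }
    }

    public static void main(String[] args) {
        String nodes = "ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0";
        expand(nodes).forEach(System.out::println);
    }
}
```

For the consign table this yields 2 × 3 × 4 quarterly nodes plus the two consign_0 nodes, 26 in total, which is the set of actual tables that has to exist across ds0 and ds1.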

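Note: for joins to work correctly under sharding, every combination of tables that may be joined must be listed in bindingTables. Otherwise, joining two sharded tables produces a Cartesian product of their actual tables.

3.3.2 Single-table rule

This rule is mandatory. Without it, only the logical tables already listed in the sharding configuration can be queried, which is quite frustrating. The error message looks like this: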

Cause: org.apache.shardingsphere.infra.exception.kernel.metadata.TableNotFoundException: Table or view 'item' does not exist.

This rule accepts wildcards, for example:

  - !SINGLE
    tables:
      # Load all single tables
      - "ds0.*"

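3.3.3 Broadcast-table rule

If a sharded table is joined with a table that is not sharded, as when consign joins item in this example, and no database-sharding column is configured for that table, the following error is reported: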

### Cause: java.sql.SQLException: Unknown exception.
More details: java.lang.NullPointerException: Cannot invoke "String.equalsIgnoreCase(String)" because "shardingColumn" is null
; uncategorized SQLException; SQL state [HY000]; error code [30000]; Unknown exception.

In that case, adding item to the broadcast tables fixes it:

  - !BROADCAST
    tables: # Broadcast table list
      - item

3.4 Complete configuration for this example

mode:
  type: Standalone
  repository:
    type: JDBC
databaseName: mysql
dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://192.168.0.64:3306/study2024-class009-busy001?useUnicode=true&characterEncoding=utf-8&useSSL=false
    username: dbMgr
    password: qqhilvMgAl@7
  ds1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://192.168.0.64:3306/study2024-class009-busy002?useUnicode=true&characterEncoding=utf-8&useSSL=false
    username: dbMgr
    password: qqhilvMgAl@7
rules:
  - !SHARDING
    defaultDatabaseStrategy:
      standard:
        shardingAlgorithmName: enterprise-sharding
        shardingColumn: enp_id
    defaultTableStrategy:
      none:
    shardingAlgorithms:
      year-month-sharding:
        type: CUSTOM_YEAR_MONTH
      enterprise-sharding:
        type: ENTERPRISE-SHARDING
    tables:
      consign:
        #logicTable: consign
        actualDataNodes: ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key
      consign_header:
        #logicTable: consign_header
        actualDataNodes: ds${0..1}.consign_header_${2022..2024}${1..4},ds${0..1}.consign_header_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key
    bindingTables:
     - consign_header,consign
  - !SINGLE
    tables:
      # Load all single tables
      - "ds0.*"
  - !BROADCAST
    tables: # Broadcast table list
      - item
props:
  sql-show: true

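4. Custom sharding algorithms

In real projects we usually do not use the built-in algorithms; we write our own sharding rules.

4.1 Writing the algorithms

This example has two sharding algorithms. One shards tables by year and quarter. The other shards databases: based on the database information carried in the JWT, it tells the system which database to query. The code is shown first and explained afterwards.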

4.1.1 YearMonthTableShardingAlgorithm

public class YearMonthTableShardingAlgorithm implements StandardShardingAlgorithm<Long> {
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        String tableName = shardingValue.getLogicTableName();
        Long billTimeSecond = shardingValue.getValue();
        LocalDateTime localDateTime = CommonUtil.parseFromSecond(billTimeSecond);
        int year = localDateTime.getYear();
        int monVal = localDateTime.getMonthValue();
        int season = (monVal+2)/3;
        if(year < 2022){
            return tableName+"_0";
        }else{
            return tableName+"_"+year+season;
        }
    }

    @Override
    public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<Long> rangeShardingValue) {
        List<String> rtn = new ArrayList<String>();
        String tableName = rangeShardingValue.getLogicTableName();
        Long begTimeL = rangeShardingValue.getValueRange().lowerEndpoint();
        Long endTimeL = rangeShardingValue.getValueRange().upperEndpoint();
        LocalDateTime beginTime = CommonUtil.parseFromSecond(begTimeL);
        LocalDateTime endTime = CommonUtil.parseFromSecond(endTimeL);
        int yearBeg = beginTime.getYear();
        int yearEnd = endTime.getYear();
        int monBeg = beginTime.getMonthValue();
        int monEnd = endTime.getMonthValue();
        int seasonBeg = (monBeg+2)/3;
        int seasonEnd = (monEnd+2)/3;
        if(yearBeg < 2022){
            rtn.add(tableName+"_0");
            seasonBeg = 1;
            yearBeg = 2022;
        }

        for(int i = yearBeg; i <= yearEnd; i++){
            int curSeasonBeg = i > yearBeg ? 1:  seasonBeg;
            int curSeasonEnd = i < yearEnd ? 4 : seasonEnd;
            for(int j = curSeasonBeg; j <= curSeasonEnd; j++){
                rtn.add(tableName+"_"+i+""+j);
            }
        }
        return rtn;
    }

    @Override
    public String getType() {
        return "CUSTOM_YEAR_MONTH"; // Custom algorithm type name
    }

}

4.1.2 EnterpriseShardingAlgorithm

@Slf4j
public class EnterpriseShardingAlgorithm implements StandardShardingAlgorithm<Long> {

    @Override
    public String getType() {
        return "ENTERPRISE-SHARDING"; // Custom algorithm type name
    }

    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        ITokenUtil tokenUtil = SpringUtil.getBean(ITokenUtil.class);
        String prompt = DatasourceSetUtil.getDbPrompt();
        if(StringUtils.hasText(prompt)){
            return prompt;
        }
        if(tokenUtil.hasTokenObject()){
            AuthObject authObject = tokenUtil.getAuthObject();
            return authObject.getDbCode();
        }
        return "ds0";
    }

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Long> shardingValue) {
        String prompt = DatasourceSetUtil.getDbPrompt();
        if(StringUtils.hasText(prompt)){
            return List.of(prompt);
        }
        ITokenUtil tokenUtil = SpringUtil.getBean(ITokenUtil.class);
        if(tokenUtil.hasTokenObject()){
            AuthObject authObject = tokenUtil.getAuthObject();
            return List.of(authObject.getDbCode());
        }
        return List.of("ds0");
    }
}

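4.1.3 Explanation

Here we implement StandardShardingAlgorithm. ComplexKeysShardingAlgorithm and HintShardingAlgorithm can be implemented as well; see the official documentation for their usage. Only StandardShardingAlgorithm is covered in detail here.

The first callback, doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue), handles sharding for equality conditions (= and IN). The second, doSharding(Collection<String> collection, RangeShardingValue<Long> rangeShardingValue), handles sharding for range queries.

The getType callback returns the name of the algorithm, which ties it to the type value in the configuration.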
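The table-name arithmetic behind the two callbacks can be tried out without ShardingSphere on the classpath. The sketch below mirrors the logic of YearMonthTableShardingAlgorithm under two assumptions: CommonUtil.parseFromSecond is taken to interpret the value as epoch seconds in UTC (the original helper is not shown, so the time zone is a guess), and the class and method names are invented for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the same suffix rules as the algorithm above, minus the ShardingSphere types.
final class QuarterTableSketch {

    private static final int FIRST_SHARDED_YEAR = 2022;

    /** Assumption: sharding values are epoch seconds interpreted in UTC. */
    static LocalDateTime toTime(long epochSecond) {
        return LocalDateTime.ofEpochSecond(epochSecond, 0, ZoneOffset.UTC);
    }

    /** Months 1-3 map to quarter 1, 4-6 to 2, 7-9 to 3, 10-12 to 4. */
    static int quarterOf(LocalDateTime time) {
        return (time.getMonthValue() + 2) / 3;
    }

    /** Equality case: exactly one target table. */
    static String preciseTable(String logicTable, long epochSecond) {
        LocalDateTime time = toTime(epochSecond);
        if (time.getYear() < FIRST_SHARDED_YEAR) {
            return logicTable + "_0"; // everything before 2022 lives in the archive table
        }
        return logicTable + "_" + time.getYear() + quarterOf(time);
    }

    /** Range case: every quarterly table touched by [fromSecond, toSecond]. */
    static List<String> rangeTables(String logicTable, long fromSecond, long toSecond) {
        List<String> result = new ArrayList<>();
        LocalDateTime from = toTime(fromSecond);
        LocalDateTime to = toTime(toSecond);
        int yearBegin = from.getYear();
        int quarterBegin = quarterOf(from);
        if (yearBegin < FIRST_SHARDED_YEAR) {
            result.add(logicTable + "_0");
            yearBegin = FIRST_SHARDED_YEAR;
            quarterBegin = 1;
        }
        for (int year = yearBegin; year <= to.getYear(); year++) {
            int first = year > yearBegin ? 1 : quarterBegin;
            int last = year < to.getYear() ? 4 : quarterOf(to);
            for (int quarter = first; quarter <= last; quarter++) {
                result.add(logicTable + "_" + year + quarter);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long from = LocalDateTime.of(2021, 11, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
        long to = LocalDateTime.of(2022, 8, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
        System.out.println(preciseTable("consign", to));
        System.out.println(rangeTables("consign", from, to));
    }
}
```

Note that the range case assumes both endpoints are present. A condition with only one bound (for example a bare > comparison) would need extra handling in the real algorithm, since the range then has no lower or upper endpoint to read.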

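4.2 META-INF configuration

Writing the algorithms is not enough for the framework to recognize them. They are discovered through Java SPI, so the algorithm classes must be listed in a provider file under META-INF/services in the resources directory. The file is named after the SPI interface (org.apache.shardingsphere.sharding.spi.ShardingAlgorithm) and contains one implementation class per line: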

indi.zhifa.study2024.common.auth.sharding.YearMonthTableShardingAlgorithm
indi.zhifa.study2024.common.auth.sharding.EnterpriseShardingAlgorithm
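
The registration file works through the JDK's standard java.util.ServiceLoader mechanism. The sketch below shows that mechanism in isolation, as a rough illustration rather than ShardingSphere's actual loading code: DemoAlgorithm and DemoYearMonthAlgorithm are hypothetical stand-ins for the SPI interface and an implementation, and the provider file is written to a temporary directory only so that the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Illustrative only: shows how a META-INF/services file leads to provider instances.
final class SpiLookupSketch {

    /** Hypothetical stand-in for the framework's SPI interface. */
    public interface DemoAlgorithm {
        String getType();
    }

    /** Hypothetical provider; ServiceLoader needs a public no-argument constructor. */
    public static class DemoYearMonthAlgorithm implements DemoAlgorithm {
        @Override
        public String getType() {
            return "CUSTOM_YEAR_MONTH";
        }
    }

    /** Registers the provider in a temporary directory and returns the types ServiceLoader finds. */
    static List<String> discoverTypes() {
        try {
            // The file name is the interface's class name; its content is the provider's class name.
            Path root = Files.createTempDirectory("spi-demo");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(DemoAlgorithm.class.getName()),
                    DemoYearMonthAlgorithm.class.getName().getBytes(StandardCharsets.UTF_8));

            List<String> types = new ArrayList<>();
            // The temporary directory plays the role of src/main/resources on the classpath.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, SpiLookupSketch.class.getClassLoader())) {
                for (DemoAlgorithm each : ServiceLoader.load(DemoAlgorithm.class, loader)) {
                    types.add(each.getType());
                }
            }
            return types;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverTypes());
    }
}
```

If the provider file is missing or misnamed, the loop simply finds nothing, which is why a custom algorithm type that is not registered cannot be matched against the type value in the YAML.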

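5. Data source configuration

If the data source is configured manually and used together with MyBatis-Plus (mp), then configuring the SqlSessionFactory still requires the same series of steps described earlier, modeled on MyBatis-Plus's auto-configuration. That boilerplate is not reproduced in this post; please look it up in the reference code. Only the core part is shown here: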

@Bean(name = "shardingSphereDataSource")
    public DataSource shardingSphereDataSource() throws SQLException, IOException {
        File file = new File(getClass().getClassLoader().getResource("shardingsphere.yml").getFile());
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(file);
        return dataSource;
    }

    @Bean(name = "shardingSqlSessionFactory")
    public SqlSessionFactory shardingSqlSessionFactory(@Qualifier("shardingSphereDataSource") DataSource dataSource) throws Exception {
        MybatisSqlSessionFactoryBean factory = new MybatisSqlSessionFactoryBean();
        factory.setDataSource(dataSource);
        enableMpSqlSessionFactory(factory);
        return factory.getObject();
    }

    @Primary
    @Bean(name = "shardingTransactionManager")
    public PlatformTransactionManager shardingTransactionManager(
            @Qualifier("shardingSphereDataSource") DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

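6. Pitfalls

ShardingSphere has a pitfall when sharding tables: an actual table name must have the form logical name + _ + digits.

For example, use consign_20221 and never consign_2022_1; otherwise the validation of the bindingTables configuration fails.

The relevant code is in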

org.apache.shardingsphere.sharding.rule.checker.ShardingRuleChecker

private boolean isValidActualTableName(final ShardingTable sampleShardingTable, final ShardingTable shardingTable) {
        for (String each : sampleShardingTable.getActualDataSourceNames()) {
            Collection<String> sampleActualTableNames = sampleShardingTable.getActualTableNames(each).stream()
                    .map(actualTableName -> actualTableName.replace(sampleShardingTable.getTableDataNode().getPrefix(), "")).collect(Collectors.toSet());
            Collection<String> actualTableNames =
                    shardingTable.getActualTableNames(each).stream().map(optional -> optional.replace(shardingTable.getTableDataNode().getPrefix(), "")).collect(Collectors.toSet());
            if (!sampleActualTableNames.equals(actualTableNames)) {
                return false;
            }
        }
        return true;
    }
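
Why the underscore breaks this check can be reproduced in a few lines. The sketch below rests on one assumption that is not visible in the excerpt above: that getPrefix() is the first actual table name with its trailing digits stripped. Under that assumption, the check compares the sets of suffixes left after removing each table's prefix. BindingSuffixSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: approximates the binding-table name check under the stated assumption.
final class BindingSuffixSketch {

    /** Assumption: the prefix is the table name minus its trailing run of digits. */
    static String prefixOf(String actualTable) {
        return actualTable.replaceAll("\\d+$", "");
    }

    /** Strips the prefix (taken from the first table) from every actual table name. */
    static Set<String> suffixes(List<String> actualTables) {
        String prefix = prefixOf(actualTables.get(0));
        Set<String> result = new TreeSet<>();
        for (String each : actualTables) {
            result.add(each.replace(prefix, ""));
        }
        return result;
    }

    /** Two tables can be bound only if their suffix sets are identical. */
    static boolean canBind(List<String> sample, List<String> other) {
        return suffixes(sample).equals(suffixes(other));
    }

    public static void main(String[] args) {
        // Digits only: both sides reduce to the same suffixes.
        System.out.println(canBind(
                Arrays.asList("consign_20221", "consign_20231"),
                Arrays.asList("consign_header_20221", "consign_header_20231")));
        // Extra underscore: the prefix swallows the first year, so other years no longer match.
        System.out.println(canBind(
                Arrays.asList("consign_2022_1", "consign_2023_1"),
                Arrays.asList("consign_header_2022_1", "consign_header_2023_1")));
    }
}
```

With consign_2022_1 the derived prefix becomes consign_2022_, so consign_2023_1 keeps its full name as a "suffix" and can never equal the corresponding consign_header name, which would explain the validation failure.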

I find this design quite unreasonable, but there is nothing to be done about it, so I am recording it here.