Database and Table Sharding: Core Concepts and a Simple Hands-On with ShardingSphere-JDBC

Database and Table Sharding

Scenarios

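Database and table sharding is an architecture and optimization technique for coping with very large data volumes and high concurrency. Unless the business volume is genuinely large, though, avoid over-engineering and premature optimization.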

Basic Concepts

Database sharding means spreading the data of one database across several databases. It comes in two forms, vertical and horizontal:

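- Vertical database sharding: split by business domain, e.g. the mall database MallDB becomes OrderDB, UserDB, ProductDB, and CartDB;
- Horizontal database sharding: split the rows of the same data across databases with identical schemas, e.g. the order database OrderDB becomes OrderDB_1, OrderDB_2, and OrderDB_3.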

Table sharding likewise comes in two forms, vertical and horizontal:

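- Vertical table sharding: split one wide table into several tables by columns, e.g. the order table OrderTable becomes OrderTable (orders), OrderItemTable (order line items), and OrderStatusTable (order status);
- Horizontal table sharding: split a table's rows into several tables with the same structure, e.g. OrderTable becomes OrderTable_1, OrderTable_2, and OrderTable_3.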

When to Shard

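Excessive data volume: if the data does not exceed a few million rows, sharding is usually unnecessary.

For data-volume problems, first add disk capacity, or split the database by business function, moving the tables of different modules into different databases.

For performance problems: upgrade CPU/memory, add read/write splitting, tune the database configuration, optimize indexes and SQL, use partitioning, or split tables vertically.

Only if none of these solves the problem should horizontal sharding be considered.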

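Advantages and Disadvantages

Advantages:

- Better performance: spreading data across databases or tables reduces the load on any single database, speeds up queries, and lowers response time.
- Scalability: sharding makes horizontal scaling easier; growing data and load can be absorbed by adding database instances or tables.
- Higher concurrency: concurrent reads and writes are spread out, which raises throughput and reduces lock contention.
- Module isolation: data of different business modules can live in separate databases, which improves flexibility and maintainability and reduces coupling between modules.
- Lower single-point-of-failure risk: a failing database affects only the data on that shard, which improves overall availability.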

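Disadvantages:

- Added complexity: sharding brings database routing, cross-database transactions, data consistency, and similar problems, which require a more complex architecture and operations.
- Harder cross-shard queries: queries spanning several databases or tables become more complex and must be optimized to avoid performance problems.
- Distributed transactions: transactions across databases need a distributed-transaction solution, which increases development and maintenance effort.
- Migration and maintenance cost: migration, backup, and recovery become more complex and need careful planning.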

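Sharding Middleware

Sharding middleware currently available follows one of two implementation approaches:

- Client mode: a library embedded in the application routes and rewrites SQL inside the application process, as ShardingSphere-JDBC does;
- Proxy mode: a standalone proxy service sits between the application and the databases and looks like an ordinary database to the application, as MyCAT does.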

Common options include:

Cobar
MyCAT
Atlas
TDDL
ShardingSphere

Distributed Primary Keys and Sharding Algorithms

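There are many ways to implement distributed primary keys, and SnowFlake is the most common choice. They are needed because each physical table's own auto-increment counter would produce duplicate IDs across shards.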
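To show roughly how SnowFlake-style IDs are composed, here is a minimal Java sketch. It assumes the commonly described layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The custom epoch is arbitrary. The class `SnowflakeIdGenerator` is hypothetical and is not ShardingSphere's built-in SNOWFLAKE key generator.

```java
/**
 * Minimal SnowFlake-style ID generator (illustrative sketch, not a library API).
 * Layout: 1 sign bit | 41-bit timestamp (ms since EPOCH) | 10-bit worker ID | 12-bit sequence.
 */
final class SnowflakeIdGenerator {
    // Custom epoch 2020-01-01T00:00:00Z, an arbitrary choice for this sketch.
    private static final long EPOCH = 1577836800000L;
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1; // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // The clock moved backwards; refusing avoids handing out duplicate IDs.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 IDs already issued in this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(1);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId());
    }
}
```

The timestamp sits in the high bits, so IDs from one generator increase over time. This keeps primary-key inserts roughly sequential. The worker ID must be unique per process for the IDs to stay globally unique.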

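With the sharding key decided (the column whose value determines where a row goes, such as user_id), the next choice is the sharding algorithm: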

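Modulo sharding spreads data volume and request load evenly across databases. The drawback is that scaling out is cumbersome and requires a data migration: existing rows must have their hash recomputed and be reassigned to different databases or tables: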

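- Integer sharding key: modulo can be applied directly. For example, with four databases, user_id 10 goes to database 10 % 4 = 2 (counting from 0).
- String sharding key: first hash the string into an integer, then apply modulo. This helps spread string keys evenly.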
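The two rules above, and the migration cost of scaling out, can be sketched in a few lines of Java. `ModuloSharding` is a hypothetical helper. CRC32 is just one possible string hash, an assumption rather than what any particular middleware uses. Keys are assumed to be non-negative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative modulo sharding router (hypothetical helper, not a ShardingSphere API). */
final class ModuloSharding {

    /** Integer key: shard index = key % shardCount (key assumed non-negative). */
    static int shardFor(long key, int shardCount) {
        return (int) (key % shardCount);
    }

    /** String key: hash to a non-negative integer first (CRC32 here), then take the modulo. */
    static int shardFor(String key, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardCount); // getValue() is in [0, 2^32)
    }

    /** Counts how many of the keys 0..keyCount-1 change shard when oldCount shards become newCount. */
    static int movedKeys(int keyCount, int oldCount, int newCount) {
        int moved = 0;
        for (long k = 0; k < keyCount; k++) {
            if (shardFor(k, oldCount) != shardFor(k, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(shardFor(10L, 4));         // 2
        System.out.println(shardFor("user-abc", 4));  // some index in [0, 4)
        System.out.println(movedKeys(10_000, 4, 5));  // 8000
    }
}
```

Going from 4 to 5 shards moves 8,000 of the 10,000 sample keys (80%). This is why scaling out a modulo scheme usually means a large data migration.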

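Range sharding makes scaling out easy: prepare one database per month in advance, and when a new month begins, writes naturally go to the new database. The downside is that most requests access the newest data, so the current shard becomes a hot spot. Whether range sharding suits production depends on the scenario: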

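- Time range: if the sharding key is a time value, such as the order creation time, data can be sharded by time range. Data is then stored in different shards along the time dimension, which makes it easy to treat historical and real-time data differently.
- Other ranges: besides time, ranges such as geographic region or price can be used, to better fit specific queries or business needs.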
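Here is a sketch of month-based range routing in Java. It assumes one database per month named `OrderDB_yyyyMM`, a hypothetical naming convention, and uses the order creation date as the sharding key.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Illustrative month-range router; the "OrderDB_yyyyMM" naming is a hypothetical convention. */
final class MonthRangeSharding {
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    /** Maps an order's creation date to the database for its month. */
    static String databaseFor(LocalDate createDate) {
        return "OrderDB_" + YearMonth.from(createDate).format(MONTH);
    }

    /** Databases a query for "create date BETWEEN from AND to" has to visit, one per month. */
    static List<String> databasesBetween(LocalDate from, LocalDate to) {
        List<String> result = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            result.add("OrderDB_" + m.format(MONTH));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(databaseFor(LocalDate.of(2024, 1, 15))); // OrderDB_202401
        System.out.println(databasesBetween(LocalDate.of(2023, 12, 20), LocalDate.of(2024, 2, 3)));
        // [OrderDB_202312, OrderDB_202401, OrderDB_202402]
    }
}
```

A query over a date range touches only the months it covers. Adding capacity is just a matter of creating next month's database ahead of time. The flip side, as noted above, is that the current month takes most of the traffic.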

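Hash: beyond plain modulo or range sharding, some scenarios benefit from hashing the key first to randomize the distribution and avoid skew caused by hot data.
Consistent hashing: a common sharding algorithm in distributed systems. When nodes are added or removed, only the keys adjacent to the changed node on the hash ring move, which minimizes data migration and improves system stability.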
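Here is a compact Java sketch of a consistent-hash ring that makes the "minimal migration" claim concrete. The class name `ConsistentHashRing`, the virtual-node count, and the MD5-based 64-bit hash are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative consistent-hash ring with virtual nodes (sketch; node names are hypothetical). */
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places virtualNodes points for this node on the ring. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Removes all of this node's points; its keys fall through to the next point clockwise. */
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Walks clockwise from the key's hash to the first point, wrapping around at the end. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of MD5 as a long; any well-distributed hash would do. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Adding a fifth node only moves the keys that now fall on the new node's arcs (in expectation roughly one fifth), and every moved key goes to the new node. Modulo sharding, by contrast, reshuffles most keys.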

What happens if a query does not include the sharding key?

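Typically the middleware queries every physical table on every database, merges the partial results, and returns them. This full scan is expensive, so frequent queries should carry the sharding key.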
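Here is a simplified in-memory model of that scatter-gather step in Java. Each inner list stands in for one physical table, and `ScatterGather` is a hypothetical class, not a middleware API. It shows why `ORDER BY ... LIMIT n` without a sharding key has to fetch up to n rows from every shard before the merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative scatter-gather over in-memory "shards" (stand-ins for physical tables). */
final class ScatterGather {
    /**
     * With no sharding key the router cannot pick a single shard, so every shard is queried.
     * Each shard returns its own top `limit` rows; the merged list is re-sorted and cut to `limit`.
     */
    static <T> List<T> queryAll(List<List<T>> shards, Predicate<T> where,
                                Comparator<T> orderBy, int limit) {
        List<T> merged = new ArrayList<>();
        for (List<T> shard : shards) {
            merged.addAll(shard.stream()
                    .filter(where)
                    .sorted(orderBy)
                    .limit(limit)
                    .collect(Collectors.toList()));
        }
        merged.sort(orderBy);
        return merged.size() > limit ? new ArrayList<>(merged.subList(0, limit)) : merged;
    }
}
```

For example, shards `[1, 5, 9]`, `[2, 6, 10]`, and `[3, 7, 11]` with the filter `x > 1`, natural order, and limit 4 merge to `[2, 3, 5, 6]`. The rows come from all three shards, so no single shard could have answered the query alone.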

Local Transactions under Sharding

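In a sharded setup, it is essential that the multiple data operations within one logical operation form a local transaction. A local transaction guarantees that whatever goes wrong along the way, the system can roll back to a consistent state and no dirty data is left behind.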

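Take order creation as an example: it usually inserts records into both the order table and the order item table. If the two tables are sharded by their own IDs, the order record and its item records may land in different databases or tables. Because one order can contain several items, even the item records may be spread across different tables, so the inserts cannot form a single local transaction.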

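To keep such operations local, business tables are generally sharded by user_id. All orders and order items of the same user then land in the same database, so creating an order forms a single local transaction. This design improves consistency and maintainability.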
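Here is a small Java illustration of the difference. It assumes 4 databases and plain modulo routing, both hypothetical. The same order and its three items are routed once by their own IDs and once by user_id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToLongFunction;

/** Illustrative comparison: routing an order and its items by their own IDs vs. by user_id. */
final class OrderRouting {
    static final int DB_COUNT = 4; // hypothetical number of databases

    /** A row to insert: its own primary key plus the owning user's ID. */
    static final class Row {
        final long id;
        final long userId;

        Row(long id, long userId) {
            this.id = id;
            this.userId = userId;
        }
    }

    /** Returns the set of databases a batch of rows touches when routed by the given key. */
    static Set<Integer> databasesTouched(List<Row> rows, ToLongFunction<Row> shardingKey) {
        Set<Integer> dbs = new TreeSet<>();
        for (Row r : rows) {
            dbs.add((int) (shardingKey.applyAsLong(r) % DB_COUNT));
        }
        return dbs;
    }

    public static void main(String[] args) {
        long userId = 1001L;
        // One order (id 500) and three items (ids 501..503), all owned by the same user.
        List<Row> rows = Arrays.asList(new Row(500, userId), new Row(501, userId),
                new Row(502, userId), new Row(503, userId));
        System.out.println(databasesTouched(rows, r -> r.id));     // [0, 1, 2, 3]: spans 4 databases
        System.out.println(databasesTouched(rows, r -> r.userId)); // [1]: one local transaction suffices
    }
}
```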

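Why stress local transactions? In a distributed environment, cross-database operations face risks such as network failures and node crashes. If the work on each node forms a local transaction, a retry after a failure will not produce dirty data. Distributed-transaction solutions can then coordinate multiple local transactions to reach eventual consistency. Combining the two addresses many of the challenges of distributed systems while preserving data integrity and reliability.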

Hands-On (ShardingSphere-JDBC)

1. Table Sharding

Dependencies (Spring Boot 2.3.4.RELEASE, JDK 8):

<dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.27</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid-spring-boot-starter</artifactId>
            <version>1.1.23</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
            <version>4.0.0-RC1</version>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

Sharding rule:

Create the database course_db with two tables, course_1 and course_2;
if cid is even, the row goes into course_1; if cid is odd, into course_2.

create database course_db;
use course_db;
create table course_1 (
  cid bigint(20) primary key ,
  cname varchar(50) not null,
  user_id bigint(20) not null ,
  status varchar(10) not null
) engine = InnoDB;
create table course_2 (
  cid bigint(20) primary key ,
  cname varchar(50) not null,
  user_id bigint(20) not null ,
  status varchar(10) not null
) engine = InnoDB;

Other code:

Entity class

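import lombok.Data;

@Data
public class Course {
    // Course ID (primary key, filled in by the SNOWFLAKE key generator)
    private Long cid;
    // Course name
    private String cname;
    // User ID (used later as the database sharding key)
    private Long userId;
    // Status
    private String status;
}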

Mapper

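import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import org.apache.ibatis.annotations.Mapper;
import org.springframework.stereotype.Repository;

@Repository
@Mapper
public interface CourseMapper extends BaseMapper<Course> {
}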

Application class

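import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("com.qing.shardingdemo.mapper")
public class ShardingsphereJdbcDemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(ShardingsphereJdbcDemoApplication.class, args);
    }
}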

Configuration

# sharding-jdbc horizontal table sharding strategy
# Define the data source and give it an alias
spring.shardingsphere.datasource.names=m1
# One entity class maps to two tables; allow bean definitions to be overridden
spring.main.allow-bean-definition-overriding=true
# Data source details: connection pool, driver, URL, username, password
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/course_db?serverTimezone=GMT%2B8
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123456
# Where the logical course table lives: database m1, physical tables m1.course_1 and m1.course_2
spring.shardingsphere.sharding.tables.course.actual-data-nodes=m1.course_$->{1..2}
# Generation strategy for the course primary key cid: SNOWFLAKE
spring.shardingsphere.sharding.tables.course.key-generator.column=cid
spring.shardingsphere.sharding.tables.course.key-generator.type=SNOWFLAKE
# Table sharding strategy: even cid goes to course_1, odd cid goes to course_2
spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cid
spring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 2 + 1}
# Enable SQL logging
spring.shardingsphere.props.sql.show=true
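To check by hand where a row will land, the inline expression `course_$->{cid % 2 + 1}` can be mirrored in plain Java. ShardingSphere evaluates such inline expressions with Groovy. `CourseTableRouter` is a hypothetical helper for illustration, not part of ShardingSphere, and it assumes non-negative cid values such as SNOWFLAKE IDs.

```java
/** Mirrors the inline rule course_$->{cid % 2 + 1} in plain Java (illustrative only). */
final class CourseTableRouter {
    static String tableFor(long cid) {
        // Even cid -> course_1, odd cid -> course_2 (assumes cid >= 0, as SNOWFLAKE IDs are).
        return "course_" + (cid % 2 + 1);
    }

    public static void main(String[] args) {
        System.out.println(tableFor(10L)); // course_1
        System.out.println(tableFor(7L));  // course_2
    }
}
```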

Test class

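import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

// JUnit 4 style (matching the junit dependency): the class and test method must be public
@RunWith(SpringRunner.class)
@SpringBootTest
public class ShardingDemoApplicationTests {
    @Autowired
    private CourseMapper courseMapper;

    @Test
    public void contextLoads() {
        Course course = new Course();
        // cid is left unset; the SNOWFLAKE key generator configured above fills it in
        course.setCname("Java");
        course.setUserId(100L);
        course.setStatus("Normal");
        courseMapper.insert(course);
    }
}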

Result:

The generated cid ends in an odd digit, so the row was inserted into course_2.

2. Database and Table Sharding

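Sharding rules: two databases, edu_db_1 and edu_db_2, each containing two tables, course_1 and course_2;

Database rule: if user_id is even, the row goes into edu_db_1; if user_id is odd, into edu_db_2;

Table rule: if cid is even, the row goes into course_1; if cid is odd, into course_2.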

create database edu_db_1;
create database edu_db_2;
use edu_db_1;
create table course_1 (
   `cid` bigint(20) primary key,
   `cname` varchar(50) not null,
   `user_id` bigint(20) not null,
   `status` varchar(10) not null
);
create table course_2 (
   `cid` bigint(20) primary key,
   `cname` varchar(50) not null,
   `user_id` bigint(20) not null,
   `status` varchar(10) not null
);
use edu_db_2;
create table course_1 (
   `cid` bigint(20) primary key,
   `cname` varchar(50) not null,
   `user_id` bigint(20) not null,
   `status` varchar(10) not null
);
create table course_2 (
   `cid` bigint(20) primary key,
   `cname` varchar(50) not null,
   `user_id` bigint(20) not null,
   `status` varchar(10) not null
);

Other code (the entity, mapper, and application class are unchanged):

Configuration

# sharding-jdbc horizontal database and table sharding strategy
# Define the data sources and give each an alias
# Horizontal database sharding needs more than one data source
spring.shardingsphere.datasource.names=m1,m2
# One entity class maps to several tables; allow bean definitions to be overridden
spring.main.allow-bean-definition-overriding=true
# First data source: connection pool, driver, URL, username, password
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/edu_db_1?serverTimezone=GMT%2B8
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123456
# Second data source: connection pool, driver, URL, username, password
spring.shardingsphere.datasource.m2.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m2.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m2.url=jdbc:mysql://localhost:3306/edu_db_2?serverTimezone=GMT%2B8
spring.shardingsphere.datasource.m2.username=root
spring.shardingsphere.datasource.m2.password=123456
# Where the data lives: databases m1, m2 x tables course_1, course_2
spring.shardingsphere.sharding.tables.course.actual-data-nodes=m$->{1..2}.course_$->{1..2}
# Generation strategy for the course primary key cid: SNOWFLAKE
spring.shardingsphere.sharding.tables.course.key-generator.column=cid
spring.shardingsphere.sharding.tables.course.key-generator.type=SNOWFLAKE
# Database sharding strategy: even user_id goes to m1, odd user_id goes to m2
# Default form (applies to every table that does not define its own database strategy)
#spring.shardingsphere.sharding.default-database-strategy.inline.sharding-column=user_id
#spring.shardingsphere.sharding.default-database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}
# Form that applies only to the course table
spring.shardingsphere.sharding.tables.course.database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.course.database-strategy.inline.algorithm-expression=m$->{user_id % 2 + 1}
# Table sharding strategy: even cid goes to course_1, odd cid goes to course_2
spring.shardingsphere.sharding.tables.course.table-strategy.inline.sharding-column=cid
spring.shardingsphere.sharding.tables.course.table-strategy.inline.algorithm-expression=course_$->{cid % 2 + 1}
# Enable SQL logging
spring.shardingsphere.props.sql.show=true
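The two inline rules combine into one data node per row, written in the same `dataSource.table` form as `actual-data-nodes`. The Java sketch below mirrors `m$->{user_id % 2 + 1}` and `course_$->{cid % 2 + 1}`. `CourseShardRouter` and the sample cid values are hypothetical; real cid values come from the SNOWFLAKE generator.

```java
/** Mirrors m$->{user_id % 2 + 1} and course_$->{cid % 2 + 1} in plain Java (illustrative only). */
final class CourseShardRouter {
    static String dataSourceFor(long userId) {
        return "m" + (userId % 2 + 1);     // even user_id -> m1 (edu_db_1), odd -> m2 (edu_db_2)
    }

    static String tableFor(long cid) {
        return "course_" + (cid % 2 + 1);  // even cid -> course_1, odd cid -> course_2
    }

    /** Full data node in "dataSource.table" form. */
    static String dataNodeFor(long userId, long cid) {
        return dataSourceFor(userId) + "." + tableFor(cid);
    }

    public static void main(String[] args) {
        System.out.println(dataNodeFor(123L, 3L));  // m2.course_2
        System.out.println(dataNodeFor(124L, 10L)); // m1.course_1
    }
}
```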

Test

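    /*
     * Database and table sharding test
     */
    @Test
    public void addCourse() {
        Course course = new Course();
        // cid is generated by the SNOWFLAKE strategy configured above
        course.setCname("go");
        // The database is chosen by user_id: 123 is odd -> edu_db_2
        course.setUserId(123L);
        course.setStatus("Normal");
        courseMapper.insert(course);

        // Use a separate entity for the second row: 124 is even -> edu_db_1
        Course another = new Course();
        another.setCname("c#");
        another.setUserId(124L);
        another.setStatus("Normal");
        courseMapper.insert(another);
    }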

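Result:

user_id 123 is odd, so the first row is stored in edu_db_2; its generated cid ends in 3 (odd), so it goes into course_2.

user_id 124 is even, so the second row is stored in edu_db_1; its generated cid ends in 0 (even), so it goes into course_1.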