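MySQL Sharding in Practice: A Deep Dive into ShardingSphere

When a single database server can no longer carry the data volume, splitting data across multiple databases and tables (sharding) becomes the natural next step. This article covers the core concepts of sharding, sharding strategies, how ShardingSphere works, data migration, and distributed ID generation.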

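I. Sharding Overview

1.1 Why Shard

Single-server bottlenecks:
- A single table holds too much data
- Query performance degrades
- Indexes become less efficient
- Connections are exhausted
- Storage space runs out

Ways to address them:
- Vertical splitting: split by business domain (separate databases)
- Horizontal splitting: split by data rows (separate tables)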

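1.2 Vertical vs. Horizontal Splitting

Horizontal splitting (table sharding):
- Splits a table by rows
- The sharding key determines routing
- Data is distributed evenly
- Example: the order table
  - Order table 0 (user_id % 2 = 0)
  - Order table 1 (user_id % 2 = 1)

Vertical splitting (by database or by table):
- Splits a business table by columns
- Separates hot and cold data
- Splits by business module
- Example: the user table
  - Basic user information table
  - User details table
  - User extension table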

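1.3 Sharding Architecture

- Application layer: application services
- Sharding middleware: ShardingSphere (ShardingSphere-JDBC or ShardingSphere-Proxy)
- Data layer:
  - Database 1: table_0, table_1
  - Database 2: table_0, table_1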

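II. Sharding Strategies

2.1 Choosing a Sharding Key

Selection principles:
- Business relevance
  - Prefer fields that queries use frequently
  - Avoid cross-shard queries
- Even data distribution
  - Avoid concentrating hot data on one shard
  - The sharding key should have high cardinality

Common sharding keys:
- User ID: order queries by user
- Time: log queries by time
- Region: queries by geography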
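
To make the "even distribution" and "high cardinality" criteria concrete, the sketch below counts how many rows land on each shard for a candidate key under plain modulo routing. The class `ShardSkew`, its method names, and the sample key values are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical helper: measures how evenly a candidate sharding key spreads rows.
final class ShardSkew {

    // Counts rows per shard under "key % shardCount" routing.
    static long[] rowsPerShard(long[] keys, int shardCount) {
        long[] counts = new long[shardCount];
        for (long key : keys) {
            counts[(int) Math.floorMod(key, (long) shardCount)]++;
        }
        return counts;
    }

    // Ratio of the fullest shard to the average shard; 1.0 means perfectly even.
    static double skew(long[] counts) {
        long max = Arrays.stream(counts).max().orElse(0L);
        double avg = Arrays.stream(counts).average().orElse(0.0);
        return avg == 0.0 ? 0.0 : max / avg;
    }

    public static void main(String[] args) {
        long[] userIds = {1, 2, 3, 4, 5, 6, 7, 8};   // many distinct values
        long[] regionIds = {1, 1, 1, 1, 1, 1, 2, 2}; // few distinct values
        System.out.println(skew(rowsPerShard(userIds, 4)));   // 1.0
        System.out.println(skew(rowsPerShard(regionIds, 4))); // 3.0
    }
}
```

With only two distinct region values, two of the four shards stay empty and one holds three times the average, which is the hotspot the selection principles warn about.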

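2.2 Sharding Algorithms

Hash sharding:
- Hashes the sharding key and takes it modulo the shard count
- Example: user_id % 4 yields 0, 1, 2, or 3
- Data is distributed evenly
- A range query cannot be narrowed to one shard and has to scan all of them

Range sharding:
- Splits by value range
- IDs 1 to 1,000,000 -> table 0
- IDs 1,000,001 to 2,000,000 -> table 1
- Supports range queries
- Load can be uneven (hotspots)

Composite sharding:
- Combines multiple fields
- Example: user_id + order_time
- Flexible routing
- High complexity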

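2.3 Implementing Sharding Algorithms

// Note: PreciseShardingAlgorithm and RangeShardingAlgorithm are the ShardingSphere 4.x
// interfaces; the 5.x releases merge both into StandardShardingAlgorithm.
// Each public class below belongs in its own source file.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import com.google.common.collect.Range;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

// Hash sharding algorithm
public class HashShardingAlgorithm implements PreciseShardingAlgorithm<Long> {

    @Override
    public String doSharding(Collection<String> availableTargetNames,
                            PreciseShardingValue<Long> shardingValue) {
        long id = shardingValue.getValue();
        int shardingCount = availableTargetNames.size();

        // Modulo; floorMod keeps the index non-negative even for negative IDs
        int index = (int) Math.floorMod(id, (long) shardingCount);

        // Find the matching table; compare against "_<index>" so that
        // index 1 does not also match a name such as t_order_11
        for (String targetName : availableTargetNames) {
            if (targetName.endsWith("_" + index)) {
                return targetName;
            }
        }

        throw new UnsupportedOperationException("No target for sharding value " + id);
    }
}

// Range sharding algorithm (renamed so the class does not clash with the interface)
public class OrderRangeShardingAlgorithm implements RangeShardingAlgorithm<Long> {

    // Assumed layout: table i stores IDs i * 1,000,000 + 1 through (i + 1) * 1,000,000
    private static final long ROWS_PER_TABLE = 1_000_000L;

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                        RangeShardingValue<Long> shardingValue) {
        Range<Long> range = shardingValue.getValueRange();

        List<String> result = new ArrayList<>();
        for (String targetName : availableTargetNames) {
            long index = Long.parseLong(targetName.substring(targetName.lastIndexOf('_') + 1));
            Range<Long> tableRange = Range.closed(index * ROWS_PER_TABLE + 1, (index + 1) * ROWS_PER_TABLE);
            // Route to this table only if the query range overlaps the table's ID range
            if (range.isConnected(tableRange) && !range.intersection(tableRange).isEmpty()) {
                result.add(targetName);
            }
        }

        return result;
    }
}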
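
The composite strategy from section 2.2 (user_id + order_time) has no counterpart in the two classes above. A framework-free sketch of the idea could look as follows, assuming the database is chosen by user_id modulo the database count and the table by order month. The class `CompositeRouter`, the `ds-N` naming, and the monthly `t_order_yyyyMM` table names are illustrative assumptions, not ShardingSphere API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical composite router: user_id picks the database, order_time picks the table.
final class CompositeRouter {

    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    private final int databaseCount;

    CompositeRouter(int databaseCount) {
        this.databaseCount = databaseCount;
    }

    // Returns a data node such as "ds-1.t_order_202401".
    String route(long userId, LocalDate orderTime) {
        int db = (int) Math.floorMod(userId, (long) databaseCount);
        return "ds-" + db + ".t_order_" + orderTime.format(MONTH);
    }

    public static void main(String[] args) {
        CompositeRouter router = new CompositeRouter(4);
        System.out.println(router.route(9L, LocalDate.of(2024, 1, 15))); // ds-1.t_order_202401
    }
}
```

A query that supplies only one of the two fields can narrow either the database or the table but not both, which is where the extra complexity of composite sharding comes from.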

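III. ShardingSphere in Practice

3.1 ShardingSphere-JDBC Architecture

- Application layer: application code
- ShardingSphere-JDBC: SQL parsing -> routing engine -> SQL rewriting -> execution engine
- Database layer: database 1, database 2, database 3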

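3.2 Maven Dependencies

<dependencies>
    <!-- ShardingSphere-JDBC Spring Boot starter; the spring.shardingsphere.* keys used in this article rely on it -->
    <!-- 5.2.1 is the last release that ships this starter; from 5.3.0 on, configuration goes through ShardingSphereDriver -->
    <dependency>
        <groupId>org.apache.shardingsphere</groupId>
        <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
        <version>5.2.1</version>
    </dependency>
    <!-- The built-in sharding algorithms (INLINE and others) come with the artifact above; no separate dependency is required -->
</dependencies>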

3.3 Sharding Configuration

# application.yml
spring:
  shardingsphere:
    datasource:
      # Data source names
      names: ds-0, ds-1, ds-2, ds-3

      # Data source settings
      ds-0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://localhost:3306/order_db_0
        username: root
        password: root

      ds-1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://localhost:3306/order_db_1
        username: root
        password: root

      # ds-2 and ds-3 are declared in names above and need analogous entries here

    rules:
      sharding:
        # Sharding rules
        tables:
          t_order:
            # Actual data nodes: 4 databases x 4 tables
            actual-data-nodes: ds-$->{0..3}.t_order_$->{0..3}

            # Database sharding strategy
            database-strategy:
              standard:
                sharding-column: user_id
                sharding-algorithm-name: database-inline

            # Table sharding strategy
            table-strategy:
              standard:
                sharding-column: order_id
                sharding-algorithm-name: order-inline

            # Distributed key generation for this table
            key-generate-strategy:
              column: order_id
              key-generator-name: snowflake

        # Binding tables (avoid Cartesian-product joins);
        # t_order_item needs its own entry under tables with matching sharding rules
        binding-tables:
          - t_order,t_order_item

        # Default database sharding strategy
        default-database-strategy:
          standard:
            sharding-column: user_id
            sharding-algorithm-name: database-inline

        # Sharding algorithm definitions
        sharding-algorithms:
          database-inline:
            type: INLINE
            props:
              # Use $->{...}; Spring would treat ${...} as a property placeholder
              algorithm-expression: ds-$->{user_id % 4}

          order-inline:
            type: INLINE
            props:
              algorithm-expression: t_order_$->{order_id % 4}

        # Distributed ID generators
        key-generators:
          snowflake:
            type: SNOWFLAKE
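
To show what the inline expressions in this configuration evaluate to, the following sketch expands the data-node expression by hand and computes the target node for one row. It is a plain-Java illustration that assumes non-negative IDs; the class `InlineRouting` is hypothetical and does not call ShardingSphere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of what the inline expressions evaluate to.
final class InlineRouting {

    // Expands ds-$->{0..3}.t_order_$->{0..3} into its concrete data nodes.
    static List<String> dataNodes(int databases, int tablesPerDatabase) {
        List<String> nodes = new ArrayList<>();
        for (int db = 0; db < databases; db++) {
            for (int table = 0; table < tablesPerDatabase; table++) {
                nodes.add("ds-" + db + ".t_order_" + table);
            }
        }
        return nodes;
    }

    // Mirrors ds-$->{user_id % 4} and t_order_$->{order_id % 4} for non-negative IDs.
    static String route(long userId, long orderId) {
        return "ds-" + (userId % 4) + ".t_order_" + (orderId % 4);
    }

    public static void main(String[] args) {
        System.out.println(dataNodes(4, 4).size()); // 16
        System.out.println(route(5L, 10L));         // ds-1.t_order_2
    }
}
```

Because the database is chosen by user_id and the table by order_id, a query that filters only on user_id still has to visit all four tables inside the selected database.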

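3.4 Sharding Algorithm Configuration

# Using a custom sharding algorithm
spring:
  shardingsphere:
    rules:
      sharding:
        tables:
          t_order:
            key-generate-strategy:
              column: order_id
              key-generator-name: snowflake

        key-generators:
          snowflake:
            type: SNOWFLAKE
            props:
              # Whether worker-id is accepted here depends on the ShardingSphere release
              worker-id: 1

        sharding-algorithms:
          # Custom hash algorithm
          order-hash-algorithm:
            type: CLASS_BASED
            props:
              # CLASS_BASED expects the strategy type: STANDARD, COMPLEX, or HINT
              strategy: STANDARD
              algorithmClassName: com.example.OrderHashAlgorithm

          # Modulo algorithm
          order-mod-algorithm:
            type: INLINE
            props:
              algorithm-expression: t_order_$->{order_id % 4}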

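3.5 Read/Write Splitting Configuration

# application.yml
spring:
  shardingsphere:
    datasource:
      names: master-0, slave-0-0, slave-0-1

      master-0:
        # Primary settings...

      slave-0-0:
        # Replica settings...

      slave-0-1:
        # Replica settings...

    rules:
      # The 5.x rule key is readwrite-splitting. The nesting of the keys below
      # differs between 5.x releases, so check the documentation for your version.
      readwrite-splitting:
        data-sources:
          # Name of the read/write-splitting data source
          ds_master_slave:
            # Primary
            write-data-source-name: master-0
            # Replicas
            read-data-source-names: slave-0-0, slave-0-1
            # Load-balancing strategy
            load-balancer-name: round_robin

        # Load balancers
        load-balancers:
          round_robin:
            type: ROUND_ROBIN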
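
The routing behaviour this configuration asks for can be sketched in a few lines: writes always go to the primary, and reads rotate over the replicas. The class `ReadWriteRouter` below is a hypothetical stand-in for that idea, not ShardingSphere's own load balancer.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin router for one primary and several replicas.
final class ReadWriteRouter {

    private final String writeDataSource;
    private final List<String> readDataSources;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouter(String writeDataSource, List<String> readDataSources) {
        this.writeDataSource = writeDataSource;
        this.readDataSources = readDataSources;
    }

    // Writes go to the primary; reads take the next replica in turn.
    String route(boolean isWrite) {
        if (isWrite) {
            return writeDataSource;
        }
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), readDataSources.size());
        return readDataSources.get(index);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("master-0", List.of("slave-0-0", "slave-0-1"));
        System.out.println(router.route(true));  // master-0
        System.out.println(router.route(false)); // slave-0-0
        System.out.println(router.route(false)); // slave-0-1
    }
}
```

A production router usually also pins reads to the primary inside a transaction that has already written, so that a session does not read stale data from a lagging replica.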

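IV. Distributed IDs

4.1 Requirements for Distributed IDs

ID requirements:
- Globally unique
- Trend-increasing
- High performance
- High availability
- Information security

Common approaches:
- UUID: not increasing, takes more space
- Database auto-increment: requires a dedicated ID service
- Snowflake: trend-increasing, high performance
- Leaf: database + Snowflake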

4.2 The Snowflake Algorithm

Snowflake layout (64 bits):
- 1 bit: sign bit
- 41 bits: timestamp
- 10 bits: worker ID
- 12 bits: sequence number

Computation:
- timestamp = current time - epoch
- ID = (timestamp << 22) | (worker_id << 12) | sequence
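
As a worked example of this layout, the sketch below packs the three fields into a 64-bit value and unpacks them again. The class `SnowflakeLayout` and the epoch constant (2020-01-01 UTC) are assumptions for illustration.

```java
// Hypothetical helper that packs and unpacks the 41/10/12-bit Snowflake layout.
final class SnowflakeLayout {

    static final long EPOCH = 1577836800000L; // assumed custom epoch: 2020-01-01 00:00:00 UTC

    // (timestamp - epoch) << 22 | workerId << 12 | sequence
    static long compose(long timestampMillis, long workerId, long sequence) {
        return ((timestampMillis - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    static long timestampOf(long id) {
        return (id >> 22) + EPOCH;
    }

    static long workerOf(long id) {
        return (id >> 12) & 0x3FF; // 10 bits
    }

    static long sequenceOf(long id) {
        return id & 0xFFF; // 12 bits
    }

    public static void main(String[] args) {
        long id = compose(EPOCH + 1000, 5, 7);
        System.out.println(id);             // 4194324487
        System.out.println(workerOf(id));   // 5
        System.out.println(sequenceOf(id)); // 7
    }
}
```

One second after the epoch, worker 5 and sequence 7 give 1000 * 2^22 + 5 * 2^12 + 7 = 4194324487. Because the timestamp occupies the high bits, IDs generated later compare greater, which is what "trend-increasing" refers to.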

4.3 Snowflake Implementation

public class SnowflakeIdGenerator {

    // Custom epoch (2020-01-01 00:00:00 UTC)
    private final long twepoch = 1577836800000L;

    // Number of bits for the worker ID
    private final long workerIdBits = 10L;
    // Number of bits for the sequence number
    private final long sequenceBits = 12L;

    // Largest worker ID (1023)
    private final long maxWorkerId = ~(-1L << workerIdBits);
    // Largest sequence number (4095)
    private final long maxSequence = ~(-1L << sequenceBits);

    // Left shift for the worker ID
    private final long workerIdShift = sequenceBits;
    // Left shift for the timestamp
    private final long timestampLeftShift = workerIdShift + workerIdBits;

    private long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException("worker Id must be between 0 and "
                + maxWorkerId);
        }
        this.workerId = workerId;
    }
    
    public synchronized long nextId() {
        long timestamp = timeGen();
        
        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards");
        }
        
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & maxSequence;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        
        lastTimestamp = timestamp;
        
        return ((timestamp - twepoch) << timestampLeftShift)
            | (workerId << workerIdShift)
            | sequence;
    }
    
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }
    
    private long timeGen() {
        return System.currentTimeMillis();
    }
}

4.4 Integrating Snowflake with ShardingSphere

spring:
  shardingsphere:
    rules:
      sharding:
        tables:
          t_order:
            key-generate-strategy:
              column: order_id
              key-generator-name: snowflake

        key-generators:
          snowflake:
            type: SNOWFLAKE
            props:
              worker-id: 1
              max-tolerate-time-difference-milliseconds: 10
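
The property max-tolerate-time-difference-milliseconds points at the clock-rollback problem: a generator can wait out a small backwards jump and fail only on a large one. The sketch below shows one way to express that rule. It is an assumption about the general technique, not ShardingSphere's implementation, and the class `ClockRollbackGuard` with its injected clock is hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical guard that tolerates small backwards clock jumps.
final class ClockRollbackGuard {

    private final LongSupplier clock;
    private final long maxTolerateMillis;

    ClockRollbackGuard(LongSupplier clock, long maxTolerateMillis) {
        this.clock = clock;
        this.maxTolerateMillis = maxTolerateMillis;
    }

    // Returns a timestamp that is not earlier than lastTimestamp,
    // or throws if the clock went back further than the tolerance.
    long currentTimeAfter(long lastTimestamp) {
        long now = clock.getAsLong();
        if (now >= lastTimestamp) {
            return now;
        }
        long drift = lastTimestamp - now;
        if (drift > maxTolerateMillis) {
            throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
        }
        // Small rollback: poll the clock until it catches up
        while (now < lastTimestamp) {
            now = clock.getAsLong();
        }
        return now;
    }

    public static void main(String[] args) {
        ClockRollbackGuard guard = new ClockRollbackGuard(System::currentTimeMillis, 10);
        System.out.println(guard.currentTimeAfter(System.currentTimeMillis() + 5) > 0);
    }
}
```

Injecting the clock as a `LongSupplier` keeps the rule testable with a fake clock; a generator would call `currentTimeAfter(lastTimestamp)` where it would otherwise read the system time directly.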

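IV (continued). Distributed Transactions in Depth

4.4 How Seata AT Mode Works

Core components of AT mode:

TC (Transaction Coordinator)
- Coordinates transactions
- Manages global transactions
- Coordinates commit and rollback of branch transactions

TM (Transaction Manager)
- Manages transactions
- Defines the scope of a global transaction
- Initiates global commit or rollback

RM (Resource Manager)
- Manages resources
- Manages the resources of branch transactions
- Reports branch status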

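4.5 Seata AT Mode Execution Flow

Participants: application, TM (transaction manager), TC (transaction coordinator), RM-shard 1, RM-shard 2.

1. The application begins a global transaction and receives an XID.
2. The branch transaction on RM-shard 1 runs (INSERT order_0) and completes.
3. The branch transaction on RM-shard 2 runs (INSERT order_1) and completes.
4. The application commits the global transaction, and the TC is notified to commit.
5. RM-shard 1: pre-commit succeeded, so its undo log is deleted.
6. RM-shard 2: pre-commit succeeded, so its undo log is deleted.
7. The global transaction commit is complete.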

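4.6 Distributed Transactions Under Sharding

Problems with two-phase commit:
- XA transactions across shards perform poorly
- Every shard has to take part in the XA protocol
- Locks are held for a long time, until the second phase finishes

Advantages of Seata AT mode:
- Local transaction + undo log
- Each shard commits independently
- Database locks are released when the local transaction commits; write isolation relies on a global row lock kept by the TC
- Good performance

Undo log table (undo_log):
- branch_id: branch transaction ID
- xid: global transaction ID
- rollback_info: undo information
- log_status: status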
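
The "local transaction + undo log" idea can be illustrated with an in-memory model of a single shard: each write records a before image, a commit discards the log, and a rollback replays it in reverse. This is a simplified sketch with hypothetical names (`UndoLogShard`, `UndoEntry`), not Seata code, and it leaves out the after-image check and the global lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one shard under an AT-style branch transaction.
final class UndoLogShard {

    private static final class UndoEntry {
        final long key;
        final String beforeImage; // null when the row did not exist before the write

        UndoEntry(long key, String beforeImage) {
            this.key = key;
            this.beforeImage = beforeImage;
        }
    }

    private final Map<Long, String> rows = new HashMap<>();
    private final Deque<UndoEntry> undoLog = new ArrayDeque<>();

    // Phase one: record the before image, then apply the change locally.
    void write(long key, String value) {
        undoLog.push(new UndoEntry(key, rows.get(key)));
        rows.put(key, value);
    }

    // Phase two, commit: the data is already in place, so only the undo log is dropped.
    void commit() {
        undoLog.clear();
    }

    // Phase two, rollback: restore before images, newest first.
    void rollback() {
        while (!undoLog.isEmpty()) {
            UndoEntry entry = undoLog.pop();
            if (entry.beforeImage == null) {
                rows.remove(entry.key);
            } else {
                rows.put(entry.key, entry.beforeImage);
            }
        }
    }

    String read(long key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        UndoLogShard shard = new UndoLogShard();
        shard.write(1L, "NEW");
        shard.commit();
        shard.write(1L, "PAID");
        shard.rollback();
        System.out.println(shard.read(1L)); // NEW
    }
}
```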

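4.7 Saga Mode (Long-Running Transactions)

When to use Saga:
- Long-running transactions
- Transactions that span multiple services
- Cases where the blocking behaviour of XA is unacceptable

How Saga executes:
- Forward operations plus compensation: every operation has a matching compensating operation
- Transaction chain: A -> B -> C -> D
- If D fails, the completed steps are rolled back in reverse order: C -> B -> A

Compensation example:
- Forward: CreateOrder
- Forward: DeductStock
- Forward: Payment
- Failure!
- Compensate: RefundPayment
- Compensate: RestoreStock
- Compensate: CancelOrder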
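
The forward-then-compensate chain can be sketched as a small orchestrator that remembers completed steps on a stack and unwinds them in reverse when a later step fails. The class `Saga` and its `step` and `execute` methods are hypothetical names for illustration. In this example the Payment step itself fails, so only the two earlier steps are compensated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal saga orchestrator.
final class Saga {

    private static final class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    Saga step(Runnable action, Runnable compensation) {
        steps.add(new Step(action, compensation));
        return this;
    }

    // Runs the steps in order. On failure, compensates the completed steps
    // in reverse order and returns false.
    boolean execute() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = new Saga()
            .step(() -> log.add("CreateOrder"), () -> log.add("CancelOrder"))
            .step(() -> log.add("DeductStock"), () -> log.add("RestoreStock"))
            .step(() -> { throw new IllegalStateException("payment failed"); },
                  () -> log.add("RefundPayment"))
            .execute();
        System.out.println(ok + " " + log);
        // false [CreateOrder, DeductStock, RestoreStock, CancelOrder]
    }
}
```

Compensations have to be idempotent in a real system, since a crashed orchestrator may retry them after a restart.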

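IV (continued, part 2). The ShardingSphere SQL Parsing Engine

4.8 SQL Parsing Flow

Example SQL: SELECT * FROM t_order WHERE user_id = 1

Processing flow:
1. SQL parsing: generate the AST (abstract syntax tree)
2. SQL validation
3. Shard routing
4. SQL rewriting
5. Execution
6. Result merging

Example AST:
SELECTStmt
- Columns: *
- From: Table(t_order)
- Where: user_id = 1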

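4.9 Shard Routing Types

Routing types:
- Direct routing: sharding key = constant; resolves to a single table
- Range routing: sharding key with IN / BETWEEN; resolves to several tables
- Full routing: no sharding key; queries every shard

The Cartesian-product problem:
- Risk: t_order x t_order_item
- 4 shards x 4 shards = 16 queries
- Binding tables avoid this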
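
The three routing types can be sketched as three small functions over a table set sharded by modulo 4. The class `RouteSketch` is a hypothetical illustration; a real router derives these cases from the parsed WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of direct, multi-table, and full routing.
final class RouteSketch {

    static final int TABLES = 4;

    // Direct routing: sharding key = constant, exactly one table.
    static List<String> routeEquals(long key) {
        return List.of(table(key));
    }

    // IN routing: each listed value maps to a table; duplicates collapse.
    static List<String> routeIn(List<Long> keys) {
        TreeSet<String> targets = new TreeSet<>();
        for (long key : keys) {
            targets.add(table(key));
        }
        return new ArrayList<>(targets);
    }

    // Full routing: no sharding key in the query, so every table is visited.
    static List<String> routeAll() {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < TABLES; i++) {
            all.add("t_order_" + i);
        }
        return all;
    }

    private static String table(long key) {
        return "t_order_" + Math.floorMod(key, (long) TABLES);
    }

    public static void main(String[] args) {
        System.out.println(routeEquals(5L));              // [t_order_1]
        System.out.println(routeIn(List.of(1L, 5L, 6L))); // [t_order_1, t_order_2]
        System.out.println(routeAll().size());            // 4
    }
}
```

Under hash sharding a BETWEEN condition cannot be narrowed this way and falls back to full routing, whereas range sharding can map it to the tables whose ranges overlap.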

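4.10 SQL Rewrite Rules

Rewrite types:
- Sharded insert: generate the auto-increment ID; replace the logical table with the physical table
- Sharded query: add sharding conditions; supplement ORDER BY
- Sharded update: validate the sharding key; the sharding key must not be modified

Example:
INSERT INTO t_order VALUES (...)
Depending on the sharding key value, this is rewritten to one of:
- INSERT INTO t_order_0 VALUES (...)
- INSERT INTO t_order_1 VALUES (...)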
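
The "replace the logical table with the physical table" step can be approximated with a whole-word text substitution, as in the sketch below. The class `TableNameRewriter` is hypothetical. A real rewrite engine works on the parsed AST, whereas this text-based version would also touch matching words inside string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-based approximation of the table-name rewrite.
final class TableNameRewriter {

    // Replaces whole-word occurrences of the logical table with the physical table.
    // The word boundary keeps t_order from matching inside t_order_item.
    static String rewrite(String sql, String logicalTable, String physicalTable) {
        return sql.replaceAll(
            "\\b" + Pattern.quote(logicalTable) + "\\b",
            Matcher.quoteReplacement(physicalTable));
    }

    public static void main(String[] args) {
        System.out.println(rewrite("INSERT INTO t_order VALUES (1, 5)", "t_order", "t_order_1"));
        // INSERT INTO t_order_1 VALUES (1, 5)
    }
}
```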

V. Cross-Shard Queries

5.1 Broadcast Tables

# Broadcast table configuration
spring:
  shardingsphere:
    rules:
      sharding:
        # Broadcast tables (every shard stores the full data set)
        broadcast-tables:
          - t_dict
          - t_config

5.2 Binding Tables

# Binding table configuration
spring:
  shardingsphere:
    rules:
      sharding:
        # Binding tables (tables with identical sharding rules; avoids Cartesian-product joins)
        binding-tables:
          - t_order,t_order_item

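5.3 Cross-Shard Query Strategies

- Avoid cross-shard queries
  - Query by sharding key
  - JOIN through binding tables
- Parallel queries
  - Query all shards in parallel
  - Merge the results
- Aggregation
  - Pagination needs special handling
  - Merge first, then paginate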

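5.4 The Pagination Problem

-- Problem: pagination across shards
-- t_order_0 and t_order_1 each hold 100 rows
-- Logical query for page 2: ORDER BY create_time LIMIT 10, 10 should return global rows 11-20

-- Wrong: push the same offset down to every shard
(SELECT * FROM t_order_0 ORDER BY create_time LIMIT 10, 10)
UNION ALL
(SELECT * FROM t_order_1 ORDER BY create_time LIMIT 10, 10)
-- Each shard skips its own first 10 rows, so rows that belong to the global page can be lost

-- Right: fetch offset + size rows from every shard
(SELECT * FROM t_order_0 ORDER BY create_time LIMIT 0, 20)
UNION ALL
(SELECT * FROM t_order_1 ORDER BY create_time LIMIT 0, 20)
-- Then sort the merged rows in memory and return rows 11-20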
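
The in-memory step can be done as a k-way merge over the already sorted per-shard results, skipping the first offset rows and collecting the next size rows. The sketch below uses plain long values in place of create_time; the class `ShardPager` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge for cross-shard pagination.
final class ShardPager {

    // Each inner list is one shard's result, sorted ascending and
    // already limited to offset + size rows by the rewritten SQL.
    static List<Long> page(List<List<Long>> shardResults, int offset, int size) {
        // Heap entries: {value, shardIndex, positionInShard}
        PriorityQueue<long[]> heap =
            new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heap.add(new long[] {shardResults.get(s).get(0), s, 0});
            }
        }
        List<Long> page = new ArrayList<>();
        int seen = 0;
        while (!heap.isEmpty() && page.size() < size) {
            long[] head = heap.poll();
            if (seen++ >= offset) {
                page.add(head[0]);
            }
            // Advance within the shard the smallest row came from
            List<Long> shard = shardResults.get((int) head[1]);
            int next = (int) head[2] + 1;
            if (next < shard.size()) {
                heap.add(new long[] {shard.get(next), head[1], next});
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(List.of(1L, 3L, 5L, 7L), List.of(2L, 4L, 6L, 8L));
        System.out.println(page(shards, 2, 2)); // [3, 4]
    }
}
```

The cost grows with the offset: every shard must return offset + size rows, so deep pages are expensive, and paging by a "last seen" sort value is a common way around that.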

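VI. Data Migration

6.1 Migration Plan

Migration steps:
1. Data synchronization
2. Dual writes
3. Traffic cutover
4. Cleanup

Tooling:
- ShardingSphere-Scaling: online migration without downtime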

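6.2 Dual-Write Approach

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// Dual-write implementation
@Service
public class OrderService {

    @Autowired
    private OrderMapper orderMapper;

    // The mapper for the history table must be injected as well;
    // OrderHistoryMapper is an assumed type name
    @Autowired
    private OrderHistoryMapper historyMapper;

    @Autowired
    private DataSourceRouter router;

    public void createOrder(Order order) {
        // Write to the sharded table
        orderMapper.insert(order);

        // Also write to the history table while dual writing is switched on
        if (router.isWriteHistory()) {
            historyMapper.insert(order);
        }
    }
}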
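
Before traffic is cut over, the two stores have to be checked against each other. The sketch below models the dual-write switch with two in-memory maps and counts rows that still need backfilling. All names (`DualWriteStore`, `countMismatches`) are hypothetical and stand in for real tables and a real comparison job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a dual-write phase with a consistency check.
final class DualWriteStore {

    private final Map<Long, String> oldStore = new HashMap<>();
    private final Map<Long, String> newStore = new HashMap<>();
    private volatile boolean dualWriteEnabled;

    void setDualWriteEnabled(boolean enabled) {
        this.dualWriteEnabled = enabled;
    }

    void save(long id, String payload) {
        // The old store stays the source of truth until cutover
        oldStore.put(id, payload);
        if (dualWriteEnabled) {
            newStore.put(id, payload);
        }
    }

    // Rows that are missing or different in the new store and still need backfilling.
    long countMismatches() {
        return oldStore.entrySet().stream()
            .filter(e -> !e.getValue().equals(newStore.get(e.getKey())))
            .count();
    }

    public static void main(String[] args) {
        DualWriteStore store = new DualWriteStore();
        store.save(1L, "order-1");          // written before dual write was enabled
        store.setDualWriteEnabled(true);
        store.save(2L, "order-2");          // written to both stores
        System.out.println(store.countMismatches()); // 1
    }
}
```

The mismatch count reaching zero, after the synchronization step has backfilled older rows, is one reasonable signal that cutover can begin.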

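VII. Frequently Asked Interview Questions

7.1 What are the ways to split databases and tables?

1. Vertical database splitting
   - Split business modules into separate databases
   - Reduces the load on a single database

2. Vertical table splitting
   - Split a table by columns
   - Separates hot and cold data

3. Horizontal database splitting
   - Split rows across different databases
   - Every database has the same schema

4. Horizontal table splitting
   - Split rows across multiple tables in the same database
   - Every table has the same schema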

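7.2 How do you choose a sharding key?

Selection principles:
1. High business relevance
   - The field that queries use most often

2. Even data distribution
   - Avoid concentrating hot data on one shard

3. High cardinality
   - Ensures the data is spread out

Common sharding keys:
- User ID: order queries by user
- Time: log queries by time
- Region: queries by geography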

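7.3 What are the options for distributed IDs?

1. UUID
   - Pros: generated locally, highly available
   - Cons: unordered, takes more space

2. Database auto-increment
   - Pros: ordered, simple
   - Cons: needs a dedicated service; the database becomes a performance bottleneck

3. Snowflake
   - Pros: trend-increasing, high performance
   - Cons: depends on the system clock

4. Leaf
   - Pros: combines the database approach with Snowflake
   - Cons: more complex architecture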

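VIII. Summary

8.1 Key Points of Sharding

Key points:

1. Choose a suitable sharding key
2. Choose a suitable sharding algorithm
3. Handle cross-shard queries
4. Handle distributed IDs
5. Plan the data migration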

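8.2 Best Practices

Best practices:

1. Plan ahead
   - Estimate the data volume
   - Choose a suitable number of shards

2. Sharding strategy
   - Hash sharding: even data distribution
   - Range sharding: supports range queries

3. Cross-shard handling
   - Use broadcast tables
   - Use binding tables
   - Avoid cross-shard JOINs

4. Distributed IDs
   - Snowflake is recommended
   - Watch out for clock rollback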