Backend: Table Sharding with Spring Boot + JPA + ShardingSphere

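Preface

I recently had a business requirement to store and query data on the order of one billion rows, and used ShardingSphere table sharding to do it. The articles online are of very mixed quality: following their pom and yml settings, the project either refused to start for a long time or threw errors at runtime. This post records how I got it working and the pitfalls I hit along the way.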

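Implementation steps

Creating the tables

This time we only shard tables, not databases. So within a single database, first create one table with an auto-increment primary key named id, then make 100 copies of that table.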

As shown in the figure: rainbow_mobile is the main table, and rainbow_mobile_copy0 through rainbow_mobile_copy99 are the shard tables that actually store the data.

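One small detail: the 100 shard tables are not numbered 1 to 100 but 0 to 99. The reason becomes clear from the sharding expression further down.

So how do the main table (rainbow_mobile) and the shard tables (rainbow_mobile_copy0 through rainbow_mobile_copy99) relate, and what is each one for?

The main table only declares the table structure so that the backend framework has something to map the entity to; it stores no data. The shard tables are where the data actually lives.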
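Creating 100 identical tables by hand is tedious, so here is a minimal sketch that generates the DDL instead. It assumes MySQL's CREATE TABLE ... LIKE statement and the rainbow_mobile naming used in this post; the class and method names are hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: generates the DDL for the shard tables.
class ShardTableDdl {

    // Builds one "CREATE TABLE ... LIKE ..." statement per shard,
    // with suffixes 0 .. count-1 (so 0..99 for 100 shards).
    static List<String> createStatements(String mainTable, int count) {
        List<String> ddl = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ddl.add("CREATE TABLE IF NOT EXISTS " + mainTable + "_copy" + i
                    + " LIKE " + mainTable + ";");
        }
        return ddl;
    }

    public static void main(String[] args) {
        // Print the statements so they can be pasted into a MySQL client.
        for (String statement : createStatements("rainbow_mobile", 100)) {
            System.out.println(statement);
        }
    }
}
```

Running it prints 100 statements, from rainbow_mobile_copy0 to rainbow_mobile_copy99, which can be executed once against the database.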

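pom file

ShardingSphere has a lot of pitfalls here: the version numbers are confusing, and so are the combinations of starters. One mismatched version and startup fails. After a lot of trial and error, the set below worked for me and can be copied as-is.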


<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>2.6.0</version> <!-- set to match your Spring Boot version -->
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
        <version>2.6.0</version> <!-- set to match your Spring Boot version -->
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.26</version> <!-- set to match your MySQL connector version -->
    </dependency>
    <!-- Spring Boot Starter Test -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <version>2.6.0</version> <!-- set to match your Spring Boot version -->
        <scope>test</scope>
    </dependency>
    <!-- ShardingSphere -->
    <dependency>
        <groupId>org.apache.shardingsphere</groupId>
        <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
        <version>5.0.0-alpha</version>
    </dependency>
    <dependency>
        <groupId>cn.hutool</groupId>
        <artifactId>hutool-crypto</artifactId>
        <version>5.7.8</version> <!-- use whichever version you need -->
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.18</version>
    </dependency>

</dependencies>

application.yml configuration file


server:
  port: 8080
spring:
  jpa:
    properties:
      hibernate:
        hbm2ddl:
          auto: update
        dialect: org.hibernate.dialect.MySQL5Dialect
        show_sql: true
        ddl-auto: create-drop
  shardingsphere:
    datasource:  # data source configuration
      names: ds0
      common:
        driver-class-name: com.mysql.cj.jdbc.Driver
        type: com.zaxxer.hikari.HikariDataSource
      ds0:
        jdbc-url: jdbc:mysql://127.0.0.1:3306/md5rainbow?serverTimezone=UTC&useSSL=false
        username: root
        password: 123456
    rules:
      sharding:
        sharding-algorithms:  # sharding algorithm configuration
          table-inline:
            type: INLINE
            props: 
              algorithm-expression: rainbow_mobile_copy$->{ id % 100 }
        key-generators:  # primary key generation strategy
          snowflake:
            type: SNOWFLAKE
            props:
              worker-id: 1
        tables:  # table sharding strategy
          rainbow_mobile:
            actual-data-nodes: ds0.rainbow_mobile_copy$->{0..99}
            table-strategy:
              standard:
                sharding-column: id
                sharding-algorithm-name: table-inline
    props:
      sql-show: true

Apart from the routine settings, these three are the ones that matter most:

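Line 11 above, ddl-auto: create-drop: this setting works together with the hibernate_sequence table created at project startup to provide the auto-incrementing primary key.
Line 28 above, algorithm-expression: rainbow_mobile_copy$->{ id % 100 }: this is the routing algorithm, which takes id modulo 100. A remainder of 1 goes to rainbow_mobile_copy1, a remainder of 99 goes to rainbow_mobile_copy99, and a remainder of 0 goes to rainbow_mobile_copy0 (that is, whenever id is a multiple of 100). Since the remainder can never be 100, the database does not need a rainbow_mobile_copy100 table.
Line 36 above, actual-data-nodes: ds0.rainbow_mobile_copy$->{0..99}: this says the placeholder in the table names runs from 0 to 99. (rainbow_mobile_copy is my table; replace it with your own.)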
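To make the modulo routing concrete, here is a small plain-Java sketch of what the inline expression rainbow_mobile_copy$->{ id % 100 } computes. It is an illustration only, not ShardingSphere's own code; the class and method names are hypothetical, and it assumes ids are non-negative.

```java
// Hypothetical illustration of the inline sharding expression.
class InlineRouteSketch {

    static final int SHARD_COUNT = 100;

    // Mirrors "rainbow_mobile_copy$->{ id % 100 }" for non-negative ids:
    // the remainder (0..99) becomes the table suffix.
    static String targetTable(long id) {
        return "rainbow_mobile_copy" + (id % SHARD_COUNT);
    }

    public static void main(String[] args) {
        long[] sampleIds = {1L, 99L, 100L, 101L, 200L};
        for (long id : sampleIds) {
            System.out.println(id + " -> " + targetTable(id));
        }
    }
}
```

Ids 100 and 200 both land in rainbow_mobile_copy0, which is why the suffixes run from 0 to 99 rather than 1 to 100.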

Repository

Nothing much to say here; it is entirely ordinary.


import com.x.md5.entity.RainbowMobile;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface RainbowMobileRepository extends JpaRepository<RainbowMobile, Long> {

}

Entity

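This is the entity class, and here there is something worth explaining. In an ordinary JPA or Spring Boot project, we usually pick IDENTITY as the generation strategy on id. The options differ as follows: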

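AUTO: the JPA provider picks a primary key generation strategy suited to the underlying database. With Hibernate 5 on MySQL, as in this project, that is the hibernate_sequence table rather than the auto-increment column.
IDENTITY: uses the underlying database's auto-increment column to generate the primary key. Only applicable to databases that support auto-increment columns, such as MySQL and SQL Server.
SEQUENCE: uses a database sequence to generate the primary key. Only applicable to databases that support sequences, such as Oracle and PostgreSQL.
TABLE: uses a dedicated database table to store primary key values. It is a general-purpose strategy that works with any database that supports JDBC.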

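Because we have configured a key generation strategy and a sharding algorithm in the configuration file, we cannot use IDENTITY here and must use AUTO instead, leaving the choice to JPA.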


import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

import javax.persistence.*;

@Getter
@Setter
@ToString
@Table(name = "rainbow_mobile")
@Entity
public class RainbowMobile {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String mobile;

    private String mobileMd5;
}

Controller

Configuration is done, so let's write a test endpoint to try it out.

Insert 1000 rows into the database:


import cn.hutool.crypto.SecureUtil;
import com.x.md5.entity.RainbowMobile;
import com.x.md5.service.RainbowMobileService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("test")
public class TestController {

    // RainbowMobileService is not shown in this article.
    @Autowired
    RainbowMobileService rainbowMobileService;

    // POST /test/test: inserts 1000 rows, one save() call per row.
    @PostMapping(value = "test")
    public void test() {
        String mobilePrefix = "1531753";
        for (int i = 0; i < 1000; i++) {
            // Zero-padded 4-digit suffix: 0000 .. 0999
            String suffix = String.format("%04d", i);
            String mobile = mobilePrefix + suffix;
            RainbowMobile rainbowMobile = new RainbowMobile();
            rainbowMobile.setMobile(mobile);
            rainbowMobile.setMobileMd5(SecureUtil.md5(mobile));
            rainbowMobileService.save(rainbowMobile);
        }
    }
}

Run the project. The system automatically creates a hibernate_sequence table (used to advance the id); we give it an initial value:


update hibernate_sequence set next_val=1

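Then call the test endpoint above, check the database, and pick three tables to verify the inserts.

You can see that the data has been distributed across the tables as intended.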
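As a rough sanity check on what to expect in each table, the sketch below counts how many rows each shard would receive. It assumes the 1000 generated ids are consecutive and start at 1 (which depends on how hibernate_sequence hands out values in your setup); the class and method names are hypothetical.

```java
// Hypothetical check of the expected row distribution across shards.
class DistributionSketch {

    // Counts how many of the ids firstId .. firstId+rows-1 fall into each shard
    // under the "id % shards" rule. Assumes non-negative ids.
    static int[] rowsPerTable(long firstId, int rows, int shards) {
        int[] counts = new int[shards];
        for (long id = firstId; id < firstId + rows; id++) {
            counts[(int) (id % shards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = rowsPerTable(1L, 1000, 100);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("rainbow_mobile_copy" + i + ": " + counts[i] + " rows");
        }
    }
}
```

Under that assumption every table ends up with the same number of rows; if the ids are not consecutive, the per-table counts will differ.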

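Querying

The above covers writes. For queries, simply call a findByXXX method on the JPA Repository.

From the console output, querying for a single row automatically executes 100 select statements (the database has 100 tables, so it selects 100 times) and then returns the combined result to me. When querying large amounts of data, with every table being hit like this each time, will there really be any speed-up?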
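One way to reason about that question: with this configuration the only sharding column is id, so a query can be narrowed to a single table only when it filters on id with an equality condition. A lookup on any other column (for example mobile or mobileMd5) gives the router nothing to work with, which would explain the 100 selects. The sketch below models that decision in plain Java; it is an illustration under those assumptions, not ShardingSphere's routing code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which shard tables a query has to touch.
class QueryFanOutSketch {

    static final int SHARD_COUNT = 100;

    // id == null models a query without a condition on the sharding column
    // (e.g. a lookup by mobile); a non-null id models "WHERE id = ?".
    static List<String> tablesToQuery(Long id) {
        List<String> tables = new ArrayList<>();
        if (id != null) {
            // Sharding key known: exactly one table needs to be queried.
            tables.add("rainbow_mobile_copy" + (id % SHARD_COUNT));
            return tables;
        }
        // Sharding key unknown: every shard table has to be queried.
        for (int i = 0; i < SHARD_COUNT; i++) {
            tables.add("rainbow_mobile_copy" + i);
        }
        return tables;
    }

    public static void main(String[] args) {
        System.out.println("by id 12345: " + tablesToQuery(12345L));
        System.out.println("by other column: " + tablesToQuery(null).size() + " tables");
    }
}
```

So the benefit of sharding by id shows up mainly for lookups by id. If the dominant query is by another column, that column (or a value derived from it) is usually the better candidate for the sharding key.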