🚀 How Redisson Distributed Locks and Seata AT-Mode Distributed Transactions Work

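Distributed Transactions

Seata (Simple Extensible Autonomous Transaction Architecture) is an open-source distributed transaction solution that addresses data consistency across services in a distributed system. It aims to be fast and easy to use, supports several transaction modes, and integrates with mainstream databases and distributed storage systems.

Seata supports four transaction modes:

Seata AT mode
Seata TCC mode
Seata Saga mode
Seata XA mode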

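Seata's three modules:

TC (Transaction Coordinator): maintains the state of global and branch transactions and drives the global commit or rollback.
TM (Transaction Manager): defines the scope of a global transaction; it begins the global transaction and commits or rolls it back.
RM (Resource Manager): manages the resources that branch transactions work on, talks to the TC to register branch transactions and report their status, and drives the branch commit or rollback.

The TC is the server that we deploy ourselves; the TM and RM are clients embedded in the application.

This article focuses on Seata's default mode, AT mode.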

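AT Mode

Let's start with the configuration it needs.

Configuration

The Seata server here uses the db store mode, meaning it persists its session data in a database. The corresponding database script: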

-- -------------------------------- The script used when storeMode is 'db' --------------------------------
-- the table to store GlobalSession data
CREATE TABLE IF NOT EXISTS `global_table`
(
    `xid`                       VARCHAR(128) NOT NULL,
    `transaction_id`            BIGINT,
    `status`                    TINYINT      NOT NULL,
    `application_id`            VARCHAR(32),
    `transaction_service_group` VARCHAR(32),
    `transaction_name`          VARCHAR(128),
    `timeout`                   INT,
    `begin_time`                BIGINT,
    `application_data`          VARCHAR(2000),
    `gmt_create`                DATETIME,
    `gmt_modified`              DATETIME,
    PRIMARY KEY (`xid`),
    KEY `idx_status_gmt_modified` (`status` , `gmt_modified`),
    KEY `idx_transaction_id` (`transaction_id`)
    ) ENGINE = InnoDB
    DEFAULT CHARSET = utf8;

-- the table to store BranchSession data
CREATE TABLE IF NOT EXISTS `branch_table`
(
    `branch_id`         BIGINT       NOT NULL,
    `xid`               VARCHAR(128) NOT NULL,
    `transaction_id`    BIGINT,
    `resource_group_id` VARCHAR(32),
    `resource_id`       VARCHAR(256),
    `branch_type`       VARCHAR(8),
    `status`            TINYINT,
    `client_id`         VARCHAR(64),
    `application_data`  VARCHAR(2000),
    `gmt_create`        DATETIME(6),
    `gmt_modified`      DATETIME(6),
    PRIMARY KEY (`branch_id`),
    KEY `idx_xid` (`xid`)
    ) ENGINE = InnoDB
    DEFAULT CHARSET = utf8;

-- the table to store lock data
CREATE TABLE IF NOT EXISTS `lock_table`
(
    `row_key`        VARCHAR(128) NOT NULL,
    `xid`            VARCHAR(128),
    `transaction_id` BIGINT,
    `branch_id`      BIGINT       NOT NULL,
    `resource_id`    VARCHAR(256),
    `table_name`     VARCHAR(32),
    `pk`             VARCHAR(36),
    `status`         TINYINT      NOT NULL DEFAULT '0' COMMENT '0:locked ,1:rollbacking',
    `gmt_create`     DATETIME,
    `gmt_modified`   DATETIME,
    PRIMARY KEY (`row_key`),
    KEY `idx_status` (`status`),
    KEY `idx_branch_id` (`branch_id`)
    ) ENGINE = InnoDB
    DEFAULT CHARSET = utf8;

CREATE TABLE IF NOT EXISTS `distributed_lock`
(
    `lock_key`       CHAR(20) NOT NULL,
    `lock_value`     VARCHAR(20) NOT NULL,
    `expire`         BIGINT,
    primary key (`lock_key`)
    ) ENGINE = InnoDB
    DEFAULT CHARSET = utf8mb4;

INSERT INTO `distributed_lock` (lock_key, lock_value, expire) VALUES ('HandleAllSession', ' ', 0);

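Seata's application.yml configuration (if you are not running Seata in a virtual machine, look at registry.conf instead)

Next, let's walk through the AT mode flow.

AT Mode Flow

With the diagram above in mind, here is the rough flow:

1. The order service acts as the transaction manager (the global transaction starts there), and the product service acts as the resource manager (it is called by the order service).

2. AT mode has two phases:

Phase one:

The order service first registers a global transaction with the TC and receives an xid (the global transaction id).
The order service then calls the product service remotely.
The product service registers a branch transaction with the TC.
The product service parses the SQL statement of the invoked method, builds a query from its WHERE condition, and runs it to obtain the before image (the data before the modification).
With the before image in hand, it executes the SQL statement.
After the SQL statement runs, it uses the primary keys found in the before image to query the after image (the data after the modification).
It generates the row lock.
It writes the images to the undo log, commits, and reports the branch status to the TC.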

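Note: saving the before image, executing the SQL statement, saving the after image, and generating the row lock all happen inside a single local transaction of the MySQL DBMS (the DBMS opens a transaction first), which makes the phase-one work atomic.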
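The phase-one steps can be sketched with plain in-memory maps. This is only an illustration under stated assumptions, not Seata's implementation: the class `AtPhaseOneSketch`, the `UndoLog` record, the `product` table with a single `stock` column, and the `"product:" + pk` row-lock key format are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory illustration of AT-mode phase one for a single-row UPDATE; not Seata code. */
final class AtPhaseOneSketch {

    /** Hypothetical undo-log record holding the before and after images of one row. */
    static final class UndoLog {
        final long pk;
        final int beforeImage; // stock before the update
        final int afterImage;  // stock after the update

        UndoLog(long pk, int beforeImage, int afterImage) {
            this.pk = pk;
            this.beforeImage = beforeImage;
            this.afterImage = afterImage;
        }
    }

    // Stand-in for a "product" table: primary key -> stock.
    final Map<Long, Integer> productStock = new HashMap<>();
    // Stand-in for the undo log, keyed by xid.
    final Map<String, UndoLog> undoLogTable = new HashMap<>();
    // Stand-in for the row locks generated in phase one.
    final Set<String> rowLocks = new HashSet<>();

    /**
     * Mimics "UPDATE product SET stock = stock - qty WHERE id = pk".
     * The synchronized method plays the role of the single local transaction.
     */
    synchronized UndoLog deductStock(String xid, long pk, int qty) {
        Integer before = productStock.get(pk);          // 1. query the before image
        if (before == null) {
            throw new IllegalArgumentException("no such row: " + pk);
        }
        productStock.put(pk, before - qty);             // 2. execute the business SQL
        int after = productStock.get(pk);               // 3. query the after image by primary key
        rowLocks.add("product:" + pk);                  // 4. generate the row lock (illustrative key format)
        UndoLog log = new UndoLog(pk, before, after);
        undoLogTable.put(xid, log);                     // 5. store both images in the undo log
        return log;
    }
}
```

For a row with stock 10, `deductStock("xid-1", 1L, 3)` leaves stock at 7 and records a before image of 10 and an after image of 7.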

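Phase two: either commit or roll back

Commit:

The SQL statements were already executed in phase one, so when no exception occurred the TC only needs to tell each RM to delete the before/after images and the row locks saved in phase one.

Rollback:

When any RM fails (there is only one RM in this example), the TC tells every RM, failed or not, to restore its business data.
Each RM opens a transaction in MySQL and runs these steps in order: check for dirty writes, restore the data, delete the before/after images and the row locks.
Dirty-write check: compare the current data in the database with the after image. If they match, nothing else has written the row; if they differ, the rollback cannot be done automatically and has to be handled manually.
Restore the data: generate reverse SQL from the before image and execute it.
Delete the before/after images and the row locks.
When these steps are done, the RM commits the transaction and reports the branch status to the TC.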
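Here is a minimal in-memory sketch of the two phase-two outcomes, again only an illustration rather than Seata's code. The names `AtPhaseTwoSketch`, `UndoLog`, `productStock`, and `undoLogTable` are hypothetical; a `false` return from `rollback` stands for "dirty write detected, hand over to manual handling".

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory illustration of AT-mode phase two (commit or rollback); not Seata code. */
final class AtPhaseTwoSketch {

    /** Hypothetical undo-log record written in phase one. */
    static final class UndoLog {
        final long pk;
        final int beforeImage;
        final int afterImage;

        UndoLog(long pk, int beforeImage, int afterImage) {
            this.pk = pk;
            this.beforeImage = beforeImage;
            this.afterImage = afterImage;
        }
    }

    // Stand-in for the business table: primary key -> stock.
    final Map<Long, Integer> productStock = new HashMap<>();
    // Stand-in for the undo log, keyed by xid.
    final Map<String, UndoLog> undoLogTable = new HashMap<>();

    /** Commit: the data is already in place, so only the saved images are deleted. */
    synchronized void commit(String xid) {
        undoLogTable.remove(xid);
    }

    /**
     * Rollback: check for a dirty write, restore the before image, delete the images.
     * Returns false when the current value no longer matches the after image.
     */
    synchronized boolean rollback(String xid) {
        UndoLog log = undoLogTable.get(xid);
        if (log == null) {
            return true; // nothing recorded for this xid, nothing to undo
        }
        Integer current = productStock.get(log.pk);
        if (current == null || current != log.afterImage) {
            return false; // dirty write: someone else changed the row after phase one
        }
        productStock.put(log.pk, log.beforeImage); // the "reverse SQL"
        undoLogTable.remove(xid);
        return true;
    }
}
```

The dirty-write check is what makes the automatic restore safe: overwriting a row that another writer has changed since phase one would silently discard that writer's update.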

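That covers the AT mode flow in broad strokes.

Now let's turn to Redisson's distributed lock.

Distributed Locks

Distributed Locks with Redisson

Redisson is a Redis-based distributed framework for Java. It offers a rich set of features and tools for data sharing, concurrency control, and task scheduling in distributed systems. With Redisson, developers can work with Redis-backed distributed objects (sets, maps, queues, and so on), use a reliable distributed lock, and manage and schedule tasks and services across a distributed environment.

First, how do you build a distributed lock on plain Redis?

Acquire the lock with SETNX plus EXPIRE; RedisTemplate's setIfAbsent (the overload that takes a timeout) does both in one call.
Releasing the lock takes a GET (to check the owner) and a DEL. The two are not atomic together, so one client can end up deleting a lock that now belongs to another; a Lua script makes the check-and-delete atomic.
To avoid self-deadlock and allow recursive calls, record how many times the holder has acquired the lock, which makes the lock reentrant.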
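The first two points can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in (`SimpleRedisLockSketch` and its method names are made up for this example): `tryLock` behaves like a set-if-absent with an expiry, and `unlock` behaves like an atomic compare-and-delete. The `synchronized` keyword plays the role that a Lua script plays in Redis, and the clock is passed in as `now` so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a SETNX + EXPIRE lock with an owner-checked release; not Redis code. */
final class SimpleRedisLockSketch {

    private static final class Entry {
        final String owner;
        final long expiresAt;

        Entry(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    /** Set-if-absent with a TTL: succeeds only when the key is missing or already expired. */
    synchronized boolean tryLock(String key, String owner, long ttlMillis, long now) {
        Entry e = store.get(key);
        if (e != null && e.expiresAt > now) {
            return false; // someone still holds the lock
        }
        store.put(key, new Entry(owner, now + ttlMillis));
        return true;
    }

    /** Compare-and-delete in one atomic step: only the current owner may release. */
    synchronized boolean unlock(String key, String owner, long now) {
        Entry e = store.get(key);
        if (e == null || e.expiresAt <= now || !e.owner.equals(owner)) {
            return false; // missing, expired, or owned by someone else
        }
        store.remove(key);
        return true;
    }
}
```

If the owner check and the delete were two separate steps, client A could pass the check, have its lock expire, and then delete the lock that client B had just acquired.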

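Problem: suppose process A acquires the lock, but its business logic runs so long that the lock expires. Process B can then acquire the same lock, and both end up working on the shared resource at once. Why not just use a longer timeout? Because if the node holding the lock goes down before releasing it, the lock stays held for that entire timeout, which is effectively a deadlock.

Solution: keep refreshing the timeout while the business logic is still running. That avoids both the shared-resource problem and the deadlock problem, and this renewal feature is exactly what Redisson provides.

RLock

RLock is the core interface of Redisson's distributed lock. So how does RLock acquire a lock?

Creating the lock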

// Create the lock
public RLock getLock(String name) {
    return new RedissonLock(connectionManager.getCommandExecutor(), name);
}

public RedissonLock(CommandAsyncExecutor commandExecutor, String name) {
        super(commandExecutor, name);
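        // Command executor that runs commands asynchronously
        this.commandExecutor = commandExecutor;
        // Unique id of this Redisson client
        this.id = commandExecutor.getConnectionManager().getId();
        // Lock lease time, taken from lockWatchdogTimeout (30 s by default)
        this.internalLockLeaseTime = commandExecutor.getConnectionManager().getCfg().getLockWatchdogTimeout();
        // Concatenate the id and the lock name; this is the entry name used as the key in EXPIRATION_RENEWAL_MAP
        this.entryName = id + ":" + name;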
        this.pubSub = commandExecutor.getConnectionManager().getSubscribeService().getLockPubSub();
}

Acquiring the lock: RLock.lock() ends up in the method below.

// Acquire the lock
private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    // Id of the current thread
    long threadId = Thread.currentThread().getId();
    Long ttl = tryAcquire(-1, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return;
    }

    RFuture<RedissonLockEntry> future = subscribe(threadId);
    if (interruptibly) {
        commandExecutor.syncSubscriptionInterrupted(future);
    } else {
        commandExecutor.syncSubscription(future);
    }

    try {
        while (true) {
            ttl = tryAcquire(-1, leaseTime, unit, threadId);
            // lock acquired
            if (ttl == null) {
                break;
            }

            // waiting for message
            if (ttl >= 0) {
                try {
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    if (interruptibly) {
                        throw e;
                    }
                    future.getNow().getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                }
            } else {
                if (interruptibly) {
                    future.getNow().getLatch().acquire();
                } else {
                    future.getNow().getLatch().acquireUninterruptibly();
                }
            }
        }
    } finally {
        unsubscribe(future, threadId);
    }
//        get(lockAsync(leaseTime, unit));
}

private Long tryAcquire(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
        return get(tryAcquireAsync(waitTime, leaseTime, unit, threadId));
}

private <T> RFuture<Long> tryAcquireAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId) {
    if (leaseTime != -1) {
        return tryLockInnerAsync(waitTime, leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
    }
    RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(waitTime, internalLockLeaseTime,
                                                            TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG);
    
    // Asynchronous callback: runs once ttlRemainingFuture completes.
    // ttlRemainingFuture holds the result of the lock script: null if the lock was acquired, otherwise the key's remaining TTL.
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }

        // lock acquired
        if (ttlRemaining == null) {
            scheduleExpirationRenewal(threadId);
        }
    });
    return ttlRemainingFuture;
}

<T> RFuture<T> tryLockInnerAsync(long waitTime, long leaseTime, TimeUnit unit, long threadId, RedisStrictCommand<T> command) {
        // Convert the lease time to milliseconds
        internalLockLeaseTime = unit.toMillis(leaseTime);
        return evalWriteAsync(getName(), LongCodec.INSTANCE, command,
                "if (redis.call('exists', KEYS[1]) == 0) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('hincrby', KEYS[1], ARGV[2], 1); " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return nil; " +
                        "end; " +
                        "return redis.call('pttl', KEYS[1]);",
                Collections.singletonList(getName()), internalLockLeaseTime, getLockName(threadId));
}

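How locking works when lock() is called without a lease time:

After lock() is called, it gets the current thread id, then calls tryAcquire, which calls tryAcquireAsync. Because no lease time was given, leaseTime defaults to -1.
It therefore calls tryLockInnerAsync(waitTime, internalLockLeaseTime, TimeUnit.MILLISECONDS, threadId, RedisCommands.EVAL_LONG), passing internalLockLeaseTime, which the constructor set to 30 s. Inside tryLockInnerAsync, leaseTime is therefore 30 s.
The Lua script (the core):

- The script receives Collections.singletonList(getName()) as KEYS[1], internalLockLeaseTime as ARGV[1], and getLockName(threadId) as ARGV[2], a unique value made of the client's UUID plus the thread id. It first checks whether the lock key exists (the key is the name passed to getLock).
- If the key does not exist, the first then-branch runs: it creates the hash at KEYS[1] with field ARGV[2] set to 1, sets the expiry of KEYS[1] to ARGV[1] (30 s), and returns nil, which means the lock was acquired.
- If the key exists, the script moves on to the next if branch and checks whether the hash at KEYS[1] contains field ARGV[2]. If it does, the script increments that field by 1 (the field acts as a counter, which is what makes the lock reentrant), resets the expiry of KEYS[1] to ARGV[1] (30 s), and returns nil. If it does not, the script returns the remaining time to live of KEYS[1], obtained with pttl.

Note: the lock state is stored in a Redis hash, where the field identifies the holder and the value is its reentrancy count.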
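The three branches of the script can be mirrored in plain Java to make the return values concrete. This is a hypothetical in-memory model, not Redisson code: `LockScriptSketch`, `tryLock`, and `holdCount` are names invented here, `synchronized` stands in for the atomicity of a Lua script, and the current time is passed in as `now`.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the three branches of the lock script; not Redisson code. */
final class LockScriptSketch {

    // key -> (lock name -> reentrancy count), like the Redis hash at KEYS[1]
    private final Map<String, Map<String, Integer>> hashes = new HashMap<>();
    // key -> absolute expiry time in milliseconds
    private final Map<String, Long> expiresAt = new HashMap<>();

    /** Returns null when the lock is acquired or re-entered, otherwise the remaining TTL in ms. */
    synchronized Long tryLock(String key, String lockName, long leaseMillis, long now) {
        expireIfNeeded(key, now);
        Map<String, Integer> hash = hashes.get(key);
        if (hash == null) {                          // exists == 0
            hash = new HashMap<>();
            hash.put(lockName, 1);                   // hincrby creates the field with value 1
            hashes.put(key, hash);
            expiresAt.put(key, now + leaseMillis);   // pexpire
            return null;
        }
        if (hash.containsKey(lockName)) {            // hexists == 1: same holder, re-entering
            hash.merge(lockName, 1, Integer::sum);   // hincrby
            expiresAt.put(key, now + leaseMillis);   // pexpire
            return null;
        }
        return expiresAt.get(key) - now;             // pttl: held by someone else
    }

    /** Current reentrancy count of lockName on key, or 0 if it does not hold the lock. */
    synchronized int holdCount(String key, String lockName) {
        Map<String, Integer> hash = hashes.get(key);
        return hash == null ? 0 : hash.getOrDefault(lockName, 0);
    }

    // Drops the key once its expiry has passed, as Redis would.
    private void expireIfNeeded(String key, long now) {
        Long exp = expiresAt.get(key);
        if (exp != null && exp <= now) {
            hashes.remove(key);
            expiresAt.remove(key);
        }
    }
}
```

A caller that gets `null` holds the lock; a caller that gets a number knows roughly how long to wait before trying again, which is how the retry loop in `lock()` uses the value.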

The Watchdog Mechanism

Once tryLockInnerAsync completes, execution moves on to the callback:

    // Asynchronous callback: runs once ttlRemainingFuture completes.
    // ttlRemainingFuture holds the result of the lock script: null if the lock was acquired, otherwise the key's remaining TTL.
    ttlRemainingFuture.onComplete((ttlRemaining, e) -> {
        if (e != null) {
            return;
        }

        // lock acquired
        if (ttlRemaining == null) {
            scheduleExpirationRenewal(threadId);
        }
    });

private void scheduleExpirationRenewal(long threadId) {
        ExpirationEntry entry = new ExpirationEntry();
        // putIfAbsent adds the entry only if the key is absent and returns the previous value (null if there was none).
        ExpirationEntry oldEntry = EXPIRATION_RENEWAL_MAP.putIfAbsent(getEntryName(), entry);
        // Check whether this client instance already has a renewal entry for the lock.
        if (oldEntry != null) {
            // An entry already exists in EXPIRATION_RENEWAL_MAP, so renewal is already scheduled:
            // just record this thread id on the existing entry.
            oldEntry.addThreadId(threadId);
        } else {
            entry.addThreadId(threadId);
            // Start renewing the expiration
            renewExpiration();
        }
}

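How the watchdog works:

Once the lock has been acquired, scheduleExpirationRenewal(threadId) is called.
scheduleExpirationRenewal checks whether EXPIRATION_RENEWAL_MAP already contains the entry name.
If it does: the thread id is added to the existing entry, oldEntry.
If it does not: the thread id is added to the new entry and renewal starts (the first acquisition always takes this path).
In that second case renewExpiration() is called; it is the core of the watchdog, shown below.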

private void renewExpiration() {
        // Fetch the entry inserted earlier
        ExpirationEntry ee = EXPIRATION_RENEWAL_MAP.get(getEntryName());
        if (ee == null) {
            return;
        }
        
        Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
            @Override
            public void run(Timeout timeout) throws Exception {
                // Fetch the entry previously put into EXPIRATION_RENEWAL_MAP
                ExpirationEntry ent = EXPIRATION_RENEWAL_MAP.get(getEntryName());
                if (ent == null) {
                    return;
                }
                // Get the id of a thread holding the lock
                Long threadId = ent.getFirstThreadId();
                if (threadId == null) {
                    return;
                }
                // Renew the lock's expiration asynchronously
                RFuture<Boolean> future = renewExpirationAsync(threadId);
                future.onComplete((res, e) -> {
                    if (e != null) {
                        log.error("Can't update lock " + getName() + " expiration", e);
                        EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                        return;
                    }
                    
                    if (res) {
                        // reschedule itself
                        renewExpiration();
                    }
                });
            }
        }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);
        // Attach the timer task to ee
        ee.setTimeout(task);
}

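Because putIfAbsent inserted the entry when it was absent (and returned the old value, null), the ee fetched here is not null.
task is a timer task that runs after a delay of internalLockLeaseTime / 3 (10 s with the default 30 s lease); it is then stored on ee.
Inside the task, the asynchronous call renewExpirationAsync(threadId) produces future.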

protected RFuture<Boolean> renewExpirationAsync(long threadId) {
        return evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                        "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                        "return 1; " +
                        "end; " +
                        "return 0;",
                Collections.singletonList(getName()),
                internalLockLeaseTime, getLockName(threadId));
}

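The script checks whether the lock hash still contains the field for this thread (ARGV[2]):
If it does: the expiry is reset to 30 s (the renewal) and the script returns 1.
If it does not: the script returns 0.
When future completes, its callback runs: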

                future.onComplete((res, e) -> {
                    if (e != null) {
                        log.error("Can't update lock " + getName() + " expiration", e);
                        EXPIRATION_RENEWAL_MAP.remove(getEntryName());
                        return;
                    }
                    
                    if (res) {
                        // reschedule itself
                        renewExpiration();
                    }
                });

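The callback first checks whether the exception argument is null.
Not null: it logs the renewal failure, removes the ExpirationEntry for this lock from EXPIRATION_RENEWAL_MAP, and returns.
Null: it checks res, the result of the asynchronous renewal.
- res is false (the script returned 0): the lock has been released, so there is no further renewal.
- res is true (the script returned 1): the lock is still held, so renewExpiration() is called again, which schedules a new timer task.

That is the watchdog mechanism.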
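The self-rescheduling pattern can be sketched with the JDK's `ScheduledExecutorService`. This is an illustration under stated assumptions and not how Redisson is built internally: `WatchdogSketch`, `renewOnce`, `scheduleRenewal`, and `release` are hypothetical names, and the `held` flag stands in for the Redis-side check done by the renewal script.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Self-rescheduling renewal task in the spirit of the watchdog; not Redisson code. */
final class WatchdogSketch {

    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long leaseMillis;
    // Stand-in for "the lock hash still contains this holder's field".
    private final AtomicBoolean held = new AtomicBoolean(true);
    // Counts successful renewals so the behaviour can be observed.
    final AtomicInteger renewals = new AtomicInteger();

    WatchdogSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** The renewal runs after one third of the lease, e.g. 10 s for a 30 s lease. */
    static long renewalDelay(long leaseMillis) {
        return leaseMillis / 3;
    }

    /** Stand-in for the renewal script: true if the lock is still held and was extended. */
    boolean renewOnce() {
        if (!held.get()) {
            return false;
        }
        renewals.incrementAndGet();
        return true;
    }

    /** Schedules one renewal; on success the task schedules the next one itself. */
    void scheduleRenewal() {
        try {
            timer.schedule(() -> {
                if (renewOnce()) {
                    scheduleRenewal(); // reschedule itself
                }
            }, renewalDelay(leaseMillis), TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException ignored) {
            // The timer was shut down by release(); there is nothing left to renew.
        }
    }

    /** Marks the lock as released and stops the timer. */
    void release() {
        held.set(false);
        timer.shutdownNow();
    }
}
```

Renewing at one third of the lease leaves room for a late or failed renewal attempt before the key actually expires.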

Releasing the Lock

protected RFuture<Boolean> unlockInnerAsync(long threadId) {
    return this.commandExecutor.evalWriteAsync(this.getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
            "if (redis.call('exists', KEYS[1]) == 0) then " +
                    "redis.call('publish', KEYS[2], ARGV[1]); " +
                    "return 1; " +
                    "end; " +
                    "if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then " +
                    "return nil; " +
                    "end; " +
                    "local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1); " +
                    "if (counter > 0) then " +
                    "redis.call('pexpire', KEYS[1], ARGV[2]); " +
                    "return 0; " +
                    "else " +
                    "redis.call('del', KEYS[1]); " +
                    "redis.call('publish', KEYS[2], ARGV[1]); " +
                    "return 1; " +
                    "end; " +
                    "return nil;",
            Arrays.asList(this.getName(), this.getChannelName()),
            new Object[]{LockPubSub.unlockMessage, this.internalLockLeaseTime, this.getLockName(threadId)});
}

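The Lua script:

this.getName() is the lock name; this.getChannelName() is the name of the pub/sub channel used to publish messages; LockPubSub.unlockMessage is the lock-released message; this.internalLockLeaseTime is the expiry time; this.getLockName(threadId) is the unique value made of the client's UUID and the current thread id.
1. Check whether the lock key exists.
Does not exist: publish the unlock message on the channel and return 1.
Exists: go on to the next if branch.
2. Check whether the lock hash contains the field ARGV[3].
Does not exist: return nil (the caller does not hold the lock).
Exists: decrement the value of field ARGV[3] in the hash at KEYS[1] by 1 and assign the result to counter.
3. Check whether counter is greater than 0.
Greater than 0: the lock was acquired reentrantly and is still held, so reset the expiry to internalLockLeaseTime (30 s by default) and return 0.
Not greater than 0: delete the lock key, publish the unlock message on the lock's channel, and return 1.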
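The unlock branches can be modelled in memory as well. The class below is a hypothetical sketch, not Redisson code: `UnlockScriptSketch`, `lockHash`, and `published` are invented names, an empty `lockHash` stands for "the lock key does not exist", and the expiry refresh of the reentrant branch is left out to keep the model short. The three return values follow the script: `null` for nil, `FALSE` for 0, `TRUE` for 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the unlock script's branches; not Redisson code. */
final class UnlockScriptSketch {

    // The hash stored at the lock key: lock name -> reentrancy count. Empty means the key does not exist.
    final Map<String, Integer> lockHash = new HashMap<>();
    // Messages "published" on the lock's channel.
    final List<String> published = new ArrayList<>();

    /**
     * Returns null if lockName does not hold the lock, FALSE if the lock is still held
     * after decrementing (reentrant), TRUE if the lock is now free.
     */
    synchronized Boolean unlock(String lockName) {
        if (lockHash.isEmpty()) {                    // exists == 0
            published.add("UNLOCK");                 // publish
            return Boolean.TRUE;
        }
        if (!lockHash.containsKey(lockName)) {       // hexists == 0
            return null;
        }
        int counter = lockHash.merge(lockName, -1, Integer::sum); // hincrby ... -1
        if (counter > 0) {
            return Boolean.FALSE;                    // the real script also refreshes the expiry here
        }
        lockHash.clear();                            // del
        published.add("UNLOCK");                     // publish
        return Boolean.TRUE;
    }
}
```

A holder that acquired the lock twice has to call `unlock` twice before the key is deleted and the waiting clients are notified.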

Subscribing to Unlock Messages

private void lock(long leaseTime, TimeUnit unit, boolean interruptibly) throws InterruptedException {
    long threadId = Thread.currentThread().getId();
    Long ttl = tryAcquire(-1, leaseTime, unit, threadId);
    // lock acquired
    if (ttl == null) {
        return;
    }

    RFuture<RedissonLockEntry> future = subscribe(threadId);
    if (interruptibly) {
        commandExecutor.syncSubscriptionInterrupted(future);
    } else {
        commandExecutor.syncSubscription(future);
    }

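A few words on this part:

If the current thread acquires the lock, lock() returns immediately. Any other thread that calls lock() while it is held gets a non-null ttl and moves on to the next step.
It calls subscribe to subscribe to the lock's channel, so that it is notified when the lock is released.
What happens next depends on interruptibly (false by default):
true: the wait for the subscription is interruptible; if the thread is interrupted while waiting, an InterruptedException is thrown.
false: the wait for the subscription is not interruptible; an interrupt during the wait is ignored.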
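The waiting side can be pictured as a semaphore that the pub/sub listener releases when an unlock message arrives, which matches the getLatch().tryAcquire(ttl, ...) calls in lock(). The class below is a simplified, hypothetical sketch (`UnlockLatchSketch` and its method names are invented here), not Redisson's RedissonLockEntry.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Semaphore-based model of "wait until an unlock message arrives"; not Redisson code. */
final class UnlockLatchSketch {

    // Starts with zero permits: a waiter blocks until a message adds one.
    private final Semaphore latch = new Semaphore(0);

    /** Called by the (hypothetical) pub/sub listener when an unlock message is received. */
    void onUnlockMessage() {
        latch.release();
    }

    /** Interruptible wait: blocks for at most ttlMillis, returns true if a message arrived. */
    boolean awaitUnlock(long ttlMillis) throws InterruptedException {
        return latch.tryAcquire(ttlMillis, TimeUnit.MILLISECONDS);
    }

    /** Non-blocking check: consumes a pending unlock message if there is one. */
    boolean pollUnlock() {
        return latch.tryAcquire();
    }
}
```

Waiting on the latch for at most the remaining TTL means a waiter retries either when the holder releases the lock or when the lock would have expired anyway, instead of polling Redis in a tight loop.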