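Introduction
💡 Pain points: How do you keep data consistent across microservices? How do you implement a Saga? How do you write compensating transactions? Is a local message table reliable? How do you choose between TCC and Saga?
🎯 Solution: This article covers the core distributed transaction patterns:
- implementing a Saga orchestrator
- designing compensating transactions and recovery strategies
- reliable delivery with a local message table (transactional outbox)
- TCC (Try-Confirm-Cancel)
- Seata in practice
- a decision tree for eventual vs. strong consistency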
[Figure: pattern overview.
- Saga orchestration: Saga Orchestrator → Step 1 Create order → Step 2 Reserve stock → Step 3 Charge payment → Step 4 Send notification.
- Compensation mechanism: a failed step leads to compensable actions, recovery/retry, and backward compensation.
- Message-driven: local message table → reliable message queue (RocketMQ/Kafka), with scheduled polling and dead-letter retry.]
1. The Nature of the Distributed Transaction Problem
1.1 Why CAP and BASE drive the choice
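```go
// ======== Strong consistency vs. eventual consistency ========
/*
The CAP theorem for distributed systems:
- Consistency: every node sees the same data at the same time
- Availability: every request receives a (non-error) response
- Partition tolerance: the system keeps operating despite network partitions

Network partitions cannot be ruled out in practice. The real choice is
therefore what to give up while a partition lasts: C or A.
- CP systems (strong consistency): ZooKeeper, etcd, HBase
- AP systems (eventual consistency): Cassandra, DynamoDB

BASE (the basis of eventual consistency):
- Basically Available: partial degradation is allowed during failures
- Soft state: data may be temporarily inconsistent between nodes
- Eventually consistent: replicas converge after a period without new updates
*/

// ======== Strong consistency: two-phase commit (2PC) ========
/*
Problems with 2PC (rarely used across microservices in production):
1. Single-point coordinator: if the coordinator fails after prepare,
   the participants stay blocked while holding their locks
2. Synchronous blocking: every participant locks resources from prepare
   until commit/abort
3. Inconsistency: if the commit message reaches only some participants,
   the data stays inconsistent until recovery completes
Conclusion: 2PC (e.g. XA) fits only a narrow set of cases, such as a few
databases or shards inside one trusted, low-latency environment.
*/

// ======== Eventual consistency: Saga / local message table / TCC ========
/*
Saga: long-running transactions (a business flow spanning multiple services)
TCC: short transactions that need business-level isolation (resources reserved in Try)
Local message table: asynchronous decoupling (reliable delivery comes first)
*/
```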
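The following minimal in-memory sketch of a 2PC coordinator makes the blocking problem concrete. `Participant`, `TwoPhaseCommit` and `memParticipant` are hypothetical names used only for illustration. A real coordinator would also write its commit/abort decision to a durable log before phase 2, so it can finish the protocol after a crash.

```go
package main

// Vote is a participant's answer in the prepare phase.
type Vote bool

// Participant is a hypothetical 2PC participant interface for this sketch.
type Participant interface {
	Prepare(txID string) Vote // lock resources and vote yes/no
	Commit(txID string)       // make the prepared changes durable
	Abort(txID string)        // release locks and discard prepared changes
}

// TwoPhaseCommit commits only if every participant votes yes.
// It returns true when the transaction committed.
func TwoPhaseCommit(txID string, ps []Participant) bool {
	prepared := make([]Participant, 0, len(ps))
	for _, p := range ps {
		if !p.Prepare(txID) {
			// A single "no" vote aborts everyone that already prepared.
			for _, q := range prepared {
				q.Abort(txID)
			}
			return false
		}
		prepared = append(prepared, p)
	}
	// Phase 2: everyone voted yes. If the coordinator crashed right here,
	// every participant would stay blocked holding its locks.
	for _, p := range ps {
		p.Commit(txID)
	}
	return true
}

// memParticipant is an in-memory participant that records its final state.
type memParticipant struct {
	canPrepare bool
	state      string
}

func (m *memParticipant) Prepare(string) Vote {
	if m.canPrepare {
		m.state = "prepared"
	}
	return Vote(m.canPrepare)
}

func (m *memParticipant) Commit(string) { m.state = "committed" }
func (m *memParticipant) Abort(string)  { m.state = "aborted" }
```

Between `Prepare` and `Commit`, each participant holds its locks and cannot decide on its own. This is the window in which a coordinator failure blocks everyone.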
1.2 Decision tree by business scenario
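```go
// ======== Which pattern when? ========
/*
Decision tree:
1. Can the business accept eventual consistency?
   ├─ Yes → continue
   └─ No  → strong consistency → 2PC (not recommended) or keep the data in one database
2. How long does the transaction run?
   ├─ < 1 s (short transaction) → TCC
   └─ > 1 s (long flow)         → Saga or local message table
3. Idempotency does not decide anything: every option requires idempotent
   operations (Saga steps and compensations, TCC Try/Confirm/Cancel, outbox consumers)
4. How hard is compensation?
   ├─ Simple (operations are reversible)     → Saga (backward compensation)
   └─ Complex (undo needs business judgment) → local message table
      (design steps that only need retrying until they succeed: forward recovery)
Typical scenarios:
- E-commerce checkout (order → stock → payment → shipping): Saga
- Money transfer (debit → credit): TCC or Saga
- Asynchronous messaging (order → notification → logging): local message table
- Flash-sale stock deduction: TCC (strong isolation) or Redis + MQ
*/
```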
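The decision tree can be condensed into a small function. This is a sketch only: `Scenario` and `ChoosePattern` are hypothetical names, and the 1-second threshold is the rule of thumb from the tree. Question 3 is omitted because it does not change the outcome.

```go
package main

// Scenario captures the questions from the decision tree.
// The threshold below is a rule of thumb, not a hard limit.
type Scenario struct {
	AcceptsEventual      bool    // can the business tolerate eventual consistency?
	DurationSeconds      float64 // expected end-to-end duration of the transaction
	CompensationIsSimple bool    // can every step be undone mechanically?
}

// ChoosePattern walks the decision tree and returns a suggested pattern.
func ChoosePattern(s Scenario) string {
	if !s.AcceptsEventual {
		return "single database or 2PC"
	}
	if s.DurationSeconds < 1 {
		return "TCC"
	}
	if s.CompensationIsSimple {
		return "Saga"
	}
	return "local message table"
}
```

In practice the answers are rarely this clean; treat the result as a starting point for the scenario analysis above.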
2. The Saga Pattern
2.1 Saga orchestrator implementation
```go
// ======== Saga Orchestrator (orchestration-based Saga) ========
package saga

import (
	"context"
	"fmt"
	"log"
	"math"
	"time"

	"github.com/google/uuid"
)

// SagaStep defines one step of the Saga.
type SagaStep struct {
	Name        string
	Execute     func(ctx context.Context, payload []byte) (result []byte, err error)
	Compensate  func(ctx context.Context, payload []byte) error // compensation; must be idempotent
	RetryPolicy RetryPolicy
}

// SagaResult is the outcome of one Saga run.
type SagaResult struct {
	SagaID         string
	Completed      bool
	CompletedSteps []string
	FailedStep     string
	Error          error
}

// SagaOrchestrator runs the steps in order and compensates in reverse on failure.
// State lives in memory only; a production orchestrator must persist its
// progress so it can resume compensation after a crash.
type SagaOrchestrator struct {
	sagaID  string
	steps   []SagaStep
	results map[int][]byte // output of each completed step (input to its compensation)
}

func NewSaga(steps []SagaStep) *SagaOrchestrator {
	return &SagaOrchestrator{
		sagaID:  uuid.New().String(),
		steps:   steps,
		results: make(map[int][]byte),
	}
}

// Execute runs the Saga. If a step fails, the completed steps are compensated
// and the failure is reported in SagaResult (the returned error stays nil).
func (s *SagaOrchestrator) Execute(ctx context.Context, initialPayload []byte) (*SagaResult, error) {
	result := &SagaResult{SagaID: s.sagaID}
	var executed []int // indices of the steps that completed
	payload := initialPayload

	for i, step := range s.steps {
		log.Printf("[Saga %s] Executing step %d: %s", s.sagaID, i, step.Name)

		// Execute the step with retries. The input payload is kept intact, so a
		// retry never receives the nil output of a failed attempt.
		var out []byte
		var err error
		for attempt := 0; ; attempt++ {
			out, err = step.Execute(ctx, payload)
			if err == nil || attempt >= step.RetryPolicy.MaxRetries {
				break
			}
			wait := step.RetryPolicy.Backoff.Duration(attempt)
			log.Printf("[Saga %s] Step %s failed (attempt %d), retrying in %v: %v",
				s.sagaID, step.Name, attempt+1, wait, err)
			if sleepErr := sleepCtx(ctx, wait); sleepErr != nil {
				err = sleepErr
				break
			}
		}

		if err != nil {
			result.FailedStep = step.Name
			result.Error = err
			result.CompletedSteps = s.stepNames(executed)
			log.Printf("[Saga %s] Step %s failed, starting compensation", s.sagaID, step.Name)
			// Compensate the completed steps in reverse order. context.WithoutCancel
			// (Go 1.21+) lets compensation run even if the caller's ctx was cancelled.
			s.compensate(context.WithoutCancel(ctx), executed)
			return result, nil
		}

		s.results[i] = out
		executed = append(executed, i)
		payload = out // step i's output is step i+1's input
		log.Printf("[Saga %s] Step %s completed successfully", s.sagaID, step.Name)
	}

	result.Completed = true
	result.CompletedSteps = s.stepNames(executed)
	return result, nil
}

// compensate runs the compensations of the completed steps in reverse order.
func (s *SagaOrchestrator) compensate(ctx context.Context, executed []int) {
	for i := len(executed) - 1; i >= 0; i-- {
		idx := executed[i]
		step := s.steps[idx]
		log.Printf("[Saga %s] Compensating step %d: %s", s.sagaID, idx, step.Name)
		if err := step.Compensate(ctx, s.results[idx]); err != nil {
			// Compensation failed: log it and escalate for manual intervention.
			log.Printf("[Saga %s] CRITICAL: Compensation failed for %s: %v",
				s.sagaID, step.Name, err)
			s.sendCompensationAlert(step.Name, err)
			continue
		}
		log.Printf("[Saga %s] Compensation completed for %s", s.sagaID, step.Name)
	}
}

func (s *SagaOrchestrator) stepNames(indices []int) []string {
	names := make([]string, 0, len(indices))
	for _, i := range indices {
		names = append(names, s.steps[i].Name)
	}
	return names
}

func (s *SagaOrchestrator) sendCompensationAlert(stepName string, err error) {
	// Placeholder: forward to the alerting system to trigger manual handling.
	fmt.Printf("ALERT: Saga %s compensation failed at step %s: %v\n", s.sagaID, stepName, err)
}

// sleepCtx waits for d or until ctx is cancelled.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// RetryPolicy is a step's retry strategy. Backoff must be set when MaxRetries > 0.
type RetryPolicy struct {
	MaxRetries int
	Backoff    BackoffStrategy
}

type BackoffStrategy interface {
	Duration(attempt int) time.Duration
}

// ExponentialBackoff waits Initial * Factor^attempt, capped at Max (uncapped if Max <= 0).
type ExponentialBackoff struct {
	Initial time.Duration
	Max     time.Duration
	Factor  float64
}

func (b ExponentialBackoff) Duration(attempt int) time.Duration {
	d := time.Duration(float64(b.Initial) * math.Pow(b.Factor, float64(attempt)))
	if b.Max > 0 && d > b.Max {
		return b.Max
	}
	return d
}
```
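The orchestrator's core behavior can be checked in isolation: run forward, then compensate the completed steps in reverse order. The sketch below is a stripped-down, hypothetical `RunSaga` with no payloads, retries or logging. It is not the `SagaOrchestrator` above; it only reproduces the same control flow.

```go
package main

import "errors"

// Step is a hypothetical, minimal version of SagaStep: just the forward
// action and its undo.
type Step struct {
	Name       string
	Execute    func() error
	Compensate func() error
}

// ErrPaymentDeclined is a sample failure used to trigger compensation.
var ErrPaymentDeclined = errors.New("payment declined")

// RunSaga executes steps in order. On the first failure it compensates the
// completed steps in reverse order and returns the failing step's error,
// plus the names of compensations that failed (candidates for manual handling).
func RunSaga(steps []Step) ([]string, error) {
	var done []Step
	for _, st := range steps {
		if err := st.Execute(); err != nil {
			var failed []string
			for i := len(done) - 1; i >= 0; i-- {
				if cerr := done[i].Compensate(); cerr != nil {
					failed = append(failed, done[i].Name)
				}
			}
			return failed, err
		}
		done = append(done, st)
	}
	return nil, nil
}
```

Suppose `chargePayment` fails in the checkout flow. The trace is then `do:createOrder, do:reserveStock, undo:reserveStock, undo:createOrder`. In this control flow, the failing step and the steps after it are never compensated.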
2.2 Complete e-commerce checkout Saga example
```go
// ======== Complete e-commerce checkout Saga ========
// Assumes NewSaga, SagaOrchestrator, SagaStep, RetryPolicy and ExponentialBackoff
// from 2.1 are in this package; if they stay in package saga, qualify them
// (saga.SagaStep, ...).
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/google/uuid"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/redis/go-redis/v9"
)

// OrderService is the order service.
type OrderService struct {
	db    *pgxpool.Pool
	redis *redis.Client
}

// OrderPayload is passed from step to step.
type OrderPayload struct {
	OrderID    string  `json:"order_id"`
	PaymentID  string  `json:"payment_id,omitempty"`
	UserID     string  `json:"user_id"`
	ProductID  string  `json:"product_id"`
	Quantity   int     `json:"quantity"`
	TotalPrice float64 `json:"total_price"` // prefer integer cents or a decimal type for money
}

// decodePayload unmarshals a step payload and reports malformed input.
func decodePayload(payload []byte) (OrderPayload, error) {
	var p OrderPayload
	if err := json.Unmarshal(payload, &p); err != nil {
		return p, fmt.Errorf("invalid saga payload: %w", err)
	}
	return p, nil
}

// Step 1: create the order
func (s *OrderService) CreateOrder(ctx context.Context, payload []byte) ([]byte, error) {
	p, err := decodePayload(payload)
	if err != nil {
		return nil, err
	}
	// Prefer passing order_id in the initial payload so retries reuse it;
	// ON CONFLICT (assumes order_id is the primary key) makes a repeated insert a no-op.
	if p.OrderID == "" {
		p.OrderID = uuid.New().String()
	}
	const query = `
		INSERT INTO orders (order_id, user_id, product_id, quantity, total_price, status, created_at)
		VALUES ($1, $2, $3, $4, $5, 'pending', NOW())
		ON CONFLICT (order_id) DO NOTHING`
	if _, err := s.db.Exec(ctx, query,
		p.OrderID, p.UserID, p.ProductID, p.Quantity, p.TotalPrice); err != nil {
		return nil, fmt.Errorf("failed to create order: %w", err)
	}
	return json.Marshal(p)
}

// Step 1 compensation: cancel the order (repeating it has no further effect)
func (s *OrderService) CompensateCreateOrder(ctx context.Context, payload []byte) error {
	p, err := decodePayload(payload)
	if err != nil {
		return err
	}
	_, err = s.db.Exec(ctx, `UPDATE orders SET status = 'cancelled' WHERE order_id = $1`, p.OrderID)
	return err
}

// Step 2: reserve stock
func (s *OrderService) ReserveStock(ctx context.Context, payload []byte) ([]byte, error) {
	p, err := decodePayload(payload)
	if err != nil {
		return nil, err
	}
	// Simple Redis lock around check-and-decrement. The unconditional Del can
	// release a lock that expired and was taken by another request; an atomic
	// check-and-DECRBY in a Lua script avoids the lock altogether.
	lockKey := fmt.Sprintf("stock:lock:%s", p.ProductID)
	locked, err := s.redis.SetNX(ctx, lockKey, "1", 10*time.Second).Result()
	if err != nil {
		return nil, fmt.Errorf("failed to acquire stock lock: %w", err)
	}
	if !locked {
		return nil, errors.New("stock lock is held by another request")
	}
	defer s.redis.Del(ctx, lockKey)

	// Check and decrement stock
	stockKey := fmt.Sprintf("product:stock:%s", p.ProductID)
	stock, err := s.redis.Get(ctx, stockKey).Int()
	if err != nil {
		return nil, fmt.Errorf("failed to read stock: %w", err)
	}
	if stock < p.Quantity {
		return nil, fmt.Errorf("insufficient stock: available=%d, requested=%d", stock, p.Quantity)
	}
	if err := s.redis.DecrBy(ctx, stockKey, int64(p.Quantity)).Err(); err != nil {
		return nil, fmt.Errorf("failed to reserve stock: %w", err)
	}
	// The output keeps ProductID and Quantity, which the compensation needs.
	return json.Marshal(p)
}

// Step 2 compensation: restore stock.
// IncrBy is not idempotent: a repeated compensation would add the stock twice,
// so production code should record that this order's stock was already restored.
func (s *OrderService) CompensateReserveStock(ctx context.Context, payload []byte) error {
	p, err := decodePayload(payload)
	if err != nil {
		return err
	}
	stockKey := fmt.Sprintf("product:stock:%s", p.ProductID)
	return s.redis.IncrBy(ctx, stockKey, int64(p.Quantity)).Err()
}

// Step 3: charge payment (simulated)
func (s *OrderService) ChargePayment(ctx context.Context, payload []byte) ([]byte, error) {
	p, err := decodePayload(payload)
	if err != nil {
		return nil, err
	}
	// Simulated payment gateway call. A real gateway request should carry an
	// idempotency key (e.g. the order ID) so retries cannot charge twice.
	p.PaymentID = fmt.Sprintf("PAY_%s", uuid.New().String()[:8])
	const query = `
		INSERT INTO payments (payment_id, order_id, user_id, amount, status, created_at)
		VALUES ($1, $2, $3, $4, 'completed', NOW())`
	if _, err := s.db.Exec(ctx, query, p.PaymentID, p.OrderID, p.UserID, p.TotalPrice); err != nil {
		return nil, fmt.Errorf("payment failed: %w", err)
	}
	// Marshal the full payload so later steps still see every field.
	return json.Marshal(p)
}

// Step 3 compensation: refund (simulated); only a completed payment is refunded
func (s *OrderService) CompensateChargePayment(ctx context.Context, payload []byte) error {
	p, err := decodePayload(payload)
	if err != nil {
		return err
	}
	_, err = s.db.Exec(ctx,
		`UPDATE payments SET status = 'refunded' WHERE payment_id = $1 AND status = 'completed'`,
		p.PaymentID)
	return err
}

// Step 4: send notification
func (s *OrderService) SendNotification(ctx context.Context, payload []byte) ([]byte, error) {
	p, err := decodePayload(payload)
	if err != nil {
		return nil, err
	}
	log.Printf("[Notification] Order %s confirmed for user %s, amount: %.2f",
		p.OrderID, p.UserID, p.TotalPrice)
	// Send email/SMS/push here. A failed notification should not roll back
	// the order, so delivery errors are only logged.
	return payload, nil
}

// Step 4 compensation: send a cancellation/refund notice.
// As the last step, it is never compensated by the orchestrator in 2.1;
// it is kept for completeness.
func (s *OrderService) CompensateSendNotification(ctx context.Context, payload []byte) error {
	p, err := decodePayload(payload)
	if err != nil {
		return err
	}
	log.Printf("[Notification] Order %s cancelled, refund initiated for user %s",
		p.OrderID, p.UserID)
	return nil
}

// NewOrderSaga wires the checkout steps into a Saga.
func NewOrderSaga(svc *OrderService) *SagaOrchestrator {
	steps := []SagaStep{
		{
			Name:       "createOrder",
			Execute:    svc.CreateOrder,
			Compensate: svc.CompensateCreateOrder,
			RetryPolicy: RetryPolicy{
				MaxRetries: 2,
				Backoff:    ExponentialBackoff{Initial: 100 * time.Millisecond, Max: 2 * time.Second, Factor: 2},
			},
		},
		{
			Name:       "reserveStock",
			Execute:    svc.ReserveStock,
			Compensate: svc.CompensateReserveStock,
			RetryPolicy: RetryPolicy{
				MaxRetries: 1,
				Backoff:    ExponentialBackoff{Initial: 200 * time.Millisecond, Max: 1 * time.Second, Factor: 2},
			},
		},
		{
			Name:       "chargePayment",
			Execute:    svc.ChargePayment,
			Compensate: svc.CompensateChargePayment,
			RetryPolicy: RetryPolicy{
				MaxRetries: 3,
				Backoff:    ExponentialBackoff{Initial: 500 * time.Millisecond, Max: 5 * time.Second, Factor: 2},
			},
		},
		{
			Name:        "sendNotification",
			Execute:     svc.SendNotification,
			Compensate:  svc.CompensateSendNotification,
			RetryPolicy: RetryPolicy{MaxRetries: 0}, // notifications are not retried
		},
	}
	return NewSaga(steps)
}
```
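`CompensateReserveStock` uses `IncrBy`, which adds stock again on every run, so a retried or replayed compensation would inflate stock. One common fix is to record a per-order reservation state. Reserve and Release then become idempotent, and a Release that arrives before its Reserve blocks the late reservation. The sketch below shows this with a hypothetical in-memory `StockStore` standing in for Redis. In Redis the same check-and-update would have to run atomically, for example inside a Lua script.

```go
package main

import "sync"

// StockStore is a hypothetical in-memory stand-in for the Redis stock keys.
type StockStore struct {
	mu    sync.Mutex
	stock map[string]int    // productID -> available units
	state map[string]string // orderID -> "reserved" or "released"
}

func NewStockStore() *StockStore {
	return &StockStore{stock: map[string]int{}, state: map[string]string{}}
}

// Reserve decrements stock at most once per order.
func (s *StockStore) Reserve(orderID, productID string, qty int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch s.state[orderID] {
	case "reserved":
		return true // retried call: already reserved
	case "released":
		return false // compensation already ran: refuse a late reservation
	}
	if s.stock[productID] < qty {
		return false
	}
	s.stock[productID] -= qty
	s.state[orderID] = "reserved"
	return true
}

// Release restores stock only if this order still holds a reservation, so a
// duplicated compensation (retry, redelivery, manual replay) adds nothing.
func (s *StockStore) Release(orderID, productID string, qty int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.state[orderID] == "reserved" {
		s.stock[productID] += qty
	}
	s.state[orderID] = "released" // also blocks a Reserve that arrives late
}
```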
3. Local Message Table (Transactional Outbox)
3.1 Core implementation of the local message table
```sql
-- ======== Local message table (outbox) schema ========
CREATE TABLE outbox (
    outbox_id      BIGSERIAL PRIMARY KEY,
    aggregate_type VARCHAR(100) NOT NULL,  -- 'order', 'payment', etc.
    aggregate_id   VARCHAR(100) NOT NULL,  -- business ID
    event_type     VARCHAR(100) NOT NULL,  -- 'order.created', 'payment.completed'
    payload        JSONB        NOT NULL,  -- message body
    status         VARCHAR(20)  NOT NULL DEFAULT 'pending',
    -- pending: waiting to be sent, sent: accepted by the broker, failed: retries exhausted
    retry_count    INTEGER      NOT NULL DEFAULT 0,
    max_retries    INTEGER      NOT NULL DEFAULT 3,
    created_at     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    updated_at     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    processed_at   TIMESTAMPTZ
);

-- Indexes
CREATE INDEX idx_outbox_status    ON outbox (status, created_at);
CREATE INDEX idx_outbox_aggregate ON outbox (aggregate_type, aggregate_id);
CREATE INDEX idx_outbox_created   ON outbox (created_at);

-- Processed-event log on the consumer side (idempotency).
-- UNIQUE(event_type, aggregate_id) assumes at most one event of each type per
-- aggregate; if an aggregate can emit the same event type repeatedly,
-- deduplicate on a message ID (e.g. the outbox_id) instead.
CREATE TABLE event_log (
    event_id     BIGSERIAL PRIMARY KEY,
    event_type   VARCHAR(100) NOT NULL,
    aggregate_id VARCHAR(100) NOT NULL,
    event_data   JSONB        NOT NULL,
    occurred_at  TIMESTAMPTZ  NOT NULL,
    processed_at TIMESTAMPTZ,
    UNIQUE (event_type, aggregate_id)
);
```
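The `status`, `retry_count` and `max_retries` columns form a small state machine. The sketch below mirrors the `MarkAsSent`/`MarkAsFailed` updates and the relay's polling condition in Go, so the transitions can be tested without a database. `OutboxRow`, `OnPublishResult` and `Eligible` are hypothetical names.

```go
package main

// OutboxStatus mirrors the status column of the outbox table.
type OutboxStatus string

const (
	StatusPending OutboxStatus = "pending" // waiting for the relay
	StatusSent    OutboxStatus = "sent"    // accepted by the broker
	StatusFailed  OutboxStatus = "failed"  // retries exhausted; needs manual handling
)

// OutboxRow holds the columns that drive the state machine.
type OutboxRow struct {
	Status     OutboxStatus
	RetryCount int
	MaxRetries int
}

// OnPublishResult applies one relay attempt to a row. Success moves it to
// "sent". Failure increments retry_count and moves it to "failed" once
// retry_count reaches max_retries; otherwise the row stays "pending".
func OnPublishResult(r OutboxRow, ok bool) OutboxRow {
	if r.Status != StatusPending {
		return r // terminal states are not retried automatically
	}
	if ok {
		r.Status = StatusSent
		return r
	}
	r.RetryCount++
	if r.RetryCount >= r.MaxRetries {
		r.Status = StatusFailed
	}
	return r
}

// Eligible reports whether the polling query would pick the row up
// (status = 'pending' AND retry_count < max_retries).
func Eligible(r OutboxRow) bool {
	return r.Status == StatusPending && r.RetryCount < r.MaxRetries
}
```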
```go
// ======== Outbox repository and relay ========
package outbox

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// OutboxMessage is one row to be relayed.
type OutboxMessage struct {
	OutboxID      int64
	AggregateType string
	AggregateID   string
	EventType     string
	Payload       []byte
}

// OutboxRepository wraps the outbox table.
type OutboxRepository struct {
	db *pgxpool.Pool
}

func NewOutboxRepository(db *pgxpool.Pool) *OutboxRepository {
	return &OutboxRepository{db: db}
}

// Publish writes a message inside the caller's transaction, i.e. in the same
// transaction as the business data, so both commit or neither does.
func (r *OutboxRepository) Publish(
	ctx context.Context,
	tx pgx.Tx,
	aggregateType, aggregateID, eventType string,
	payload any,
) error {
	payloadJSON, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("failed to marshal payload: %w", err)
	}
	const query = `
		INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload, status)
		VALUES ($1, $2, $3, $4, 'pending')`
	_, err = tx.Exec(ctx, query, aggregateType, aggregateID, eventType, payloadJSON)
	return err
}

// GetPending locks up to limit pending messages inside tx. FOR UPDATE SKIP LOCKED
// keeps concurrent relays apart only while the transaction is open, so the
// caller must publish and mark the rows before committing tx.
func (r *OutboxRepository) GetPending(ctx context.Context, tx pgx.Tx, limit int) ([]OutboxMessage, error) {
	const query = `
		SELECT outbox_id, aggregate_type, aggregate_id, event_type, payload
		FROM outbox
		WHERE status = 'pending'
		  AND retry_count < max_retries
		ORDER BY created_at ASC
		LIMIT $1
		FOR UPDATE SKIP LOCKED`
	rows, err := tx.Query(ctx, query, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var messages []OutboxMessage
	for rows.Next() {
		var msg OutboxMessage
		if err := rows.Scan(&msg.OutboxID, &msg.AggregateType, &msg.AggregateID, &msg.EventType, &msg.Payload); err != nil {
			return nil, err
		}
		messages = append(messages, msg)
	}
	return messages, rows.Err()
}

// MarkAsSent marks a message as delivered.
func (r *OutboxRepository) MarkAsSent(ctx context.Context, tx pgx.Tx, outboxID int64) error {
	_, err := tx.Exec(ctx, `
		UPDATE outbox
		SET status = 'sent', processed_at = NOW(), updated_at = NOW()
		WHERE outbox_id = $1`, outboxID)
	return err
}

// MarkAsFailed increments retry_count. The row becomes 'failed' (and gets a
// processed_at, so Cleanup can remove it later) once retries are exhausted.
// Every right-hand side sees the old retry_count.
func (r *OutboxRepository) MarkAsFailed(ctx context.Context, tx pgx.Tx, outboxID int64) error {
	_, err := tx.Exec(ctx, `
		UPDATE outbox
		SET status = CASE WHEN retry_count + 1 >= max_retries THEN 'failed' ELSE 'pending' END,
		    processed_at = CASE WHEN retry_count + 1 >= max_retries THEN NOW() ELSE processed_at END,
		    retry_count = retry_count + 1,
		    updated_at = NOW()
		WHERE outbox_id = $1`, outboxID)
	return err
}

// Cleanup deletes finished messages older than 7 days.
func (r *OutboxRepository) Cleanup(ctx context.Context) (int64, error) {
	tag, err := r.db.Exec(ctx, `
		DELETE FROM outbox
		WHERE status IN ('sent', 'failed')
		  AND processed_at < NOW() - INTERVAL '7 days'`)
	if err != nil {
		return 0, err
	}
	return tag.RowsAffected(), nil
}

// ======== Outbox Relay (polling worker) ========
type OutboxRelay struct {
	repo         *OutboxRepository
	publisher    MessagePublisher
	pollInterval time.Duration
	batchSize    int // keep small: row locks are held while the batch is published
}

// MessagePublisher sends to the broker. Topic = aggregate type and key =
// aggregate ID; the consumer also needs the event type, so carry it in the
// payload or a message header.
type MessagePublisher interface {
	Publish(ctx context.Context, topic, key string, payload []byte) error
}

// Start polls the outbox until ctx is cancelled.
func (r *OutboxRelay) Start(ctx context.Context) {
	ticker := time.NewTicker(r.pollInterval)
	defer ticker.Stop()
	log.Printf("[OutboxRelay] Started, polling every %v", r.pollInterval)
	for {
		select {
		case <-ctx.Done():
			log.Println("[OutboxRelay] Stopping...")
			return
		case <-ticker.C:
			r.processBatch(ctx)
		}
	}
}

// processBatch claims a batch, publishes it and records the outcome in one
// transaction. Delivery is at-least-once: if the commit fails after a
// successful Publish, the message is sent again later.
func (r *OutboxRelay) processBatch(ctx context.Context) {
	tx, err := r.repo.db.Begin(ctx)
	if err != nil {
		log.Printf("[OutboxRelay] Failed to begin transaction: %v", err)
		return
	}
	defer tx.Rollback(ctx) // no-op after a successful Commit

	messages, err := r.repo.GetPending(ctx, tx, r.batchSize)
	if err != nil {
		log.Printf("[OutboxRelay] Failed to get pending messages: %v", err)
		return
	}
	if len(messages) == 0 {
		return
	}
	log.Printf("[OutboxRelay] Processing %d messages", len(messages))

	for _, msg := range messages {
		if err := r.publisher.Publish(ctx, msg.AggregateType, msg.AggregateID, msg.Payload); err != nil {
			log.Printf("[OutboxRelay] Failed to publish outbox_id=%d: %v", msg.OutboxID, err)
			if err := r.repo.MarkAsFailed(ctx, tx, msg.OutboxID); err != nil {
				log.Printf("[OutboxRelay] Failed to mark as failed: %v", err)
			}
			continue
		}
		if err := r.repo.MarkAsSent(ctx, tx, msg.OutboxID); err != nil {
			log.Printf("[OutboxRelay] Failed to mark as sent: %v", err)
		}
	}
	if err := tx.Commit(ctx); err != nil {
		log.Printf("[OutboxRelay] Failed to commit batch: %v", err)
	}
}
```
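The relay provides at-least-once delivery. If the process crashes between publishing a message and marking its row as sent, the next poll publishes the same message again. The in-memory sketch below reproduces that window; `Message`, `Broker` and `RelayOnce` are hypothetical names. This duplicate delivery is why consumers must be idempotent.

```go
package main

import "errors"

// Message is a hypothetical outbox row reduced to what the relay needs.
type Message struct {
	ID   int64
	Sent bool
}

// Broker records every delivery it receives, duplicates included.
type Broker struct{ Delivered []int64 }

func (b *Broker) Publish(m Message) error {
	b.Delivered = append(b.Delivered, m.ID)
	return nil
}

var ErrCrash = errors.New("relay crashed before marking the row as sent")

// RelayOnce publishes every unsent message and then marks it as sent.
// crashID simulates a crash between those two steps for one message.
func RelayOnce(outbox []Message, b *Broker, crashID int64) error {
	for i := range outbox {
		if outbox[i].Sent {
			continue
		}
		if err := b.Publish(outbox[i]); err != nil {
			continue // stays unsent; retried on the next poll
		}
		if outbox[i].ID == crashID {
			return ErrCrash // broker has the message, table still says unsent
		}
		outbox[i].Sent = true
	}
	return nil
}
```

After a crash on message 2 and one more poll, the broker has received `1, 2, 2`. The outbox never loses a message, but it can duplicate one.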
3.2 Idempotent consumer
```go
// ======== Event consumer (idempotent processing) ========
package consumer

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

type EventConsumer struct {
	db *pgxpool.Pool
}

// Process handles one delivery. The dedup record is inserted first, in the
// same transaction as the business change. If another delivery already
// committed it, the insert affects 0 rows and the event is skipped. A separate
// "SELECT, then INSERT" check would let two concurrent deliveries both pass.
func (c *EventConsumer) Process(ctx context.Context, eventType, aggregateID string, payload []byte) error {
	tx, err := c.db.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op after a successful Commit

	first, err := c.recordEvent(ctx, tx, eventType, aggregateID, payload)
	if err != nil {
		return fmt.Errorf("failed to record event: %w", err)
	}
	if !first {
		log.Printf("[Consumer] Event %s:%s already processed, skipping", eventType, aggregateID)
		return nil
	}

	// Business processing
	switch eventType {
	case "order.created":
		err = c.handleOrderCreated(ctx, tx, payload)
	case "payment.completed":
		err = c.handlePaymentCompleted(ctx, tx, payload)
	default:
		log.Printf("[Consumer] Unknown event type: %s", eventType)
		return nil // rolled back, so the unknown event is not marked as processed
	}
	if err != nil {
		return err // rolled back together with the dedup record; redelivery retries it
	}
	return tx.Commit(ctx)
}

// recordEvent inserts the dedup record and reports whether this is the first delivery.
func (c *EventConsumer) recordEvent(ctx context.Context, tx pgx.Tx, eventType, aggregateID string, payload []byte) (bool, error) {
	tag, err := tx.Exec(ctx, `
		INSERT INTO event_log (event_type, aggregate_id, event_data, occurred_at, processed_at)
		VALUES ($1, $2, $3, NOW(), NOW())
		ON CONFLICT (event_type, aggregate_id) DO NOTHING`,
		eventType, aggregateID, payload)
	if err != nil {
		return false, err
	}
	return tag.RowsAffected() == 1, nil
}

func (c *EventConsumer) handleOrderCreated(ctx context.Context, tx pgx.Tx, payload []byte) error {
	var data struct {
		OrderID    string  `json:"order_id"`
		UserID     string  `json:"user_id"`
		TotalPrice float64 `json:"total_price"`
	}
	if err := json.Unmarshal(payload, &data); err != nil {
		// A malformed payload will never succeed; route it to a dead-letter queue.
		return fmt.Errorf("invalid order.created payload: %w", err)
	}
	log.Printf("[Consumer] Processing order.created: %s", data.OrderID)
	// Business logic: welcome email, reporting updates, etc.
	return nil
}

func (c *EventConsumer) handlePaymentCompleted(ctx context.Context, tx pgx.Tx, payload []byte) error {
	var data struct {
		OrderID   string  `json:"order_id"`
		PaymentID string  `json:"payment_id"`
		Amount    float64 `json:"amount"`
	}
	if err := json.Unmarshal(payload, &data); err != nil {
		return fmt.Errorf("invalid payment.completed payload: %w", err)
	}
	log.Printf("[Consumer] Processing payment.completed: %s", data.PaymentID)
	// Update the order status
	_, err := tx.Exec(ctx, `UPDATE orders SET status = 'paid' WHERE order_id = $1`, data.OrderID)
	return err
}
```
4. The TCC Pattern
4.1 TCC implementation
```go
// ======== TCC (Try-Confirm-Cancel) ========
// Assumes RetryPolicy and ExponentialBackoff from 2.1 are available in this package.
package tcc

import (
	"context"
	"errors"
	"fmt"
	"log"
	"sync"
	"time"

	"github.com/google/uuid"
	"github.com/jackc/pgx/v5/pgxpool"
)

// Try/Confirm/Cancel function types
type TryFunc func(ctx context.Context) (any, error)             // reserve resources
type ConfirmFunc func(ctx context.Context, tryResult any) error // use the reservation
type CancelFunc func(ctx context.Context, tryResult any) error  // release the reservation

type TCCOption struct {
	TryTimeout     time.Duration
	ConfirmTimeout time.Duration
	RetryPolicy    RetryPolicy
}

type TxnContext struct {
	TxnID     string
	TryResult any
	Status    string
	CreatedAt time.Time
}

// TCCService keeps transaction state in memory only. A real TCC coordinator
// persists it so that pending Confirm/Cancel calls survive a crash.
type TCCService struct {
	store map[string]*TxnContext
	mu    sync.RWMutex
}

func NewTCCService() *TCCService {
	return &TCCService{store: make(map[string]*TxnContext)}
}

// Execute runs one TCC transaction.
func (s *TCCService) Execute(
	ctx context.Context,
	opts TCCOption,
	try TryFunc,
	confirm ConfirmFunc,
	cancel CancelFunc,
) error {
	txnID := uuid.New().String()

	// ======== 1. Try: reserve resources ========
	// The context cancel functions get their own names so they do not shadow
	// the TCC cancel parameter.
	tryCtx, stopTry := context.WithTimeout(ctx, opts.TryTimeout)
	defer stopTry()
	log.Printf("[TCC %s] Try phase starting", txnID)
	tryResult, err := try(tryCtx)
	if err != nil {
		log.Printf("[TCC %s] Try failed: %v, executing cancel", txnID, err)
		// Try failed: run Cancel with a nil result ("empty rollback"). If Try timed
		// out after reserving, a nil result cannot release it; passing the txnID to
		// Try and Cancel would let Cancel find the reservation.
		cancelCtx, stopCancel := context.WithTimeout(ctx, 5*time.Second)
		defer stopCancel()
		if cancelErr := cancel(cancelCtx, nil); cancelErr != nil {
			log.Printf("[TCC %s] Cancel failed: %v", txnID, cancelErr)
		}
		return fmt.Errorf("try failed: %w", err)
	}

	// Save the Try result
	s.mu.Lock()
	s.store[txnID] = &TxnContext{
		TxnID:     txnID,
		TryResult: tryResult,
		Status:    "try_completed",
		CreatedAt: time.Now(),
	}
	s.mu.Unlock()
	log.Printf("[TCC %s] Try completed, executing confirm", txnID)

	// ======== 2. Confirm: use the reserved resources ========
	confirmCtx, stopConfirm := context.WithTimeout(ctx, opts.ConfirmTimeout)
	defer stopConfirm()
	for attempt := 0; ; attempt++ {
		err = confirm(confirmCtx, tryResult)
		if err == nil || attempt >= opts.RetryPolicy.MaxRetries {
			break
		}
		wait := opts.RetryPolicy.Backoff.Duration(attempt)
		log.Printf("[TCC %s] Confirm failed (attempt %d), retrying in %v: %v",
			txnID, attempt+1, wait, err)
		time.Sleep(wait)
	}
	if err != nil {
		// Classic TCC keeps retrying Confirm until it succeeds. This sketch gives
		// up and cancels instead, which is only acceptable when a failed Confirm
		// is known to have left no changes behind.
		log.Printf("[TCC %s] Confirm failed: %v, executing cancel", txnID, err)
		cancelCtx, stopCancel := context.WithTimeout(ctx, 5*time.Second)
		defer stopCancel()
		if cancelErr := cancel(cancelCtx, tryResult); cancelErr != nil {
			log.Printf("[TCC %s] Cancel also failed: %v", txnID, cancelErr)
		}
		return fmt.Errorf("confirm failed: %w", err)
	}
	log.Printf("[TCC %s] Confirm completed successfully", txnID)

	// Clean up
	s.mu.Lock()
	delete(s.store, txnID)
	s.mu.Unlock()
	return nil
}

// ======== TCC transfer example ========
// Assumed schema: accounts(user_id, balance, frozen); available = balance - frozen.
// Confirm and Cancel here are not idempotent on their own; production code
// records per-transaction state (e.g. a TCC log keyed by txnID).
type AccountService struct {
	db *pgxpool.Pool
}

// transferReservation is the Try result passed to Confirm/Cancel.
type transferReservation struct {
	From, To string
	Amount   float64
}

func (a *AccountService) TransferTCC(ctx context.Context, from, to string, amount float64) error {
	svc := NewTCCService()
	return svc.Execute(ctx, TCCOption{
		TryTimeout:     10 * time.Second,
		ConfirmTimeout: 5 * time.Second,
		RetryPolicy: RetryPolicy{
			MaxRetries: 2,
			Backoff:    ExponentialBackoff{Initial: 100 * time.Millisecond, Max: time.Second, Factor: 2},
		},
	}, func(ctx context.Context) (any, error) {
		// ======== Try: freeze the amount ========
		// Check and freeze in one atomic statement; a SELECT ... FOR UPDATE outside
		// a transaction would release its lock immediately.
		tag, err := a.db.Exec(ctx,
			`UPDATE accounts SET frozen = frozen + $1
			 WHERE user_id = $2 AND balance - frozen >= $1`,
			amount, from)
		if err != nil {
			return nil, fmt.Errorf("failed to freeze amount: %w", err)
		}
		if tag.RowsAffected() == 0 {
			return nil, errors.New("insufficient available balance or unknown account")
		}
		return transferReservation{From: from, To: to, Amount: amount}, nil
	}, func(ctx context.Context, tryResult any) error {
		// ======== Confirm: complete the transfer ========
		r := tryResult.(transferReservation)
		tx, err := a.db.Begin(ctx)
		if err != nil {
			return err
		}
		defer tx.Rollback(ctx)
		// Debit the source: consume the frozen amount and reduce the balance
		if _, err := tx.Exec(ctx,
			`UPDATE accounts SET frozen = frozen - $1, balance = balance - $1 WHERE user_id = $2`,
			r.Amount, r.From); err != nil {
			return err
		}
		// Credit the target account
		if _, err := tx.Exec(ctx,
			`UPDATE accounts SET balance = balance + $1 WHERE user_id = $2`,
			r.Amount, r.To); err != nil {
			return err
		}
		return tx.Commit(ctx)
	}, func(ctx context.Context, tryResult any) error {
		// ======== Cancel: unfreeze the amount ========
		if tryResult == nil {
			return nil // empty rollback: Try reserved nothing
		}
		r := tryResult.(transferReservation)
		_, err := a.db.Exec(ctx,
			`UPDATE accounts SET frozen = frozen - $1 WHERE user_id = $2`,
			r.Amount, r.From)
		return err
	})
}
```
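TCC implementations typically keep a per-transaction phase record. It serves three purposes:
- Try, Confirm and Cancel become idempotent.
- A Cancel without a prior Try (empty rollback) succeeds.
- A Try that arrives after its Cancel (suspension) is rejected.

The sketch below models this with a hypothetical in-memory `TCCLedger`. Amounts are int64 cents rather than `float64` to avoid rounding, a deliberate change from the example above.

```go
package main

import "errors"

// Account is a hypothetical in-memory account: available = Balance - Frozen.
type Account struct{ Balance, Frozen int64 } // amounts in cents

// TCCLedger records the phase of each transaction ID.
type TCCLedger struct {
	Accounts map[string]*Account
	phase    map[string]string // txnID -> "tried", "confirmed" or "cancelled"
}

var (
	ErrInsufficient = errors.New("insufficient available balance")
	ErrRejected     = errors.New("transaction already cancelled")
)

func NewLedger() *TCCLedger {
	return &TCCLedger{Accounts: map[string]*Account{}, phase: map[string]string{}}
}

// Try freezes amount on the source account.
func (l *TCCLedger) Try(txn, from string, amount int64) error {
	switch l.phase[txn] {
	case "tried", "confirmed":
		return nil // repeated Try is a no-op
	case "cancelled":
		return ErrRejected // Cancel already ran: refuse to freeze (anti-suspension)
	}
	a := l.Accounts[from]
	if a.Balance-a.Frozen < amount {
		return ErrInsufficient
	}
	a.Frozen += amount
	l.phase[txn] = "tried"
	return nil
}

// Confirm moves the frozen amount from `from` to `to`, at most once.
func (l *TCCLedger) Confirm(txn, from, to string, amount int64) {
	if l.phase[txn] != "tried" {
		return // already confirmed, or nothing was reserved
	}
	l.Accounts[from].Frozen -= amount
	l.Accounts[from].Balance -= amount
	l.Accounts[to].Balance += amount
	l.phase[txn] = "confirmed"
}

// Cancel unfreezes the amount if Try froze it. Without a prior Try it only
// records "cancelled" (empty rollback), which also blocks a late Try.
func (l *TCCLedger) Cancel(txn, from string, amount int64) {
	switch l.phase[txn] {
	case "tried":
		l.Accounts[from].Frozen -= amount
	case "confirmed", "cancelled":
		return
	}
	l.phase[txn] = "cancelled"
}
```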
5. The Seata Framework
5.1 AT mode (automatic compensation)
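```yaml
# ======== Seata Server deployment (TC, the transaction coordinator) ========
# docker-compose.yml
# Not shown here: registry.conf points at Nacos (nacos:8848), so a Nacos server
# must be reachable on this network, and the seata database must already
# contain Seata's server tables before the first start.
version: '3.8'
services:
  seata-server:
    image: seataio/seata-server:1.7.0
    container_name: seata-server
    ports:
      - "8091:8091"   # TC service port (clients connect here)
      - "7091:7091"   # console/HTTP port in Seata 1.5+
    environment:
      - STORE_MODE=db
      # SEATA_CONFIG_NAME + registry.conf is the pre-1.5 configuration style;
      # 1.5+ images are configured through application.yml. Verify this and the
      # datasource variables below against the Seata docs for your version.
      - SEATA_CONFIG_NAME=file:/root/seata-config/registry
      - SPRING_DATASOURCE_DRIVER-CLASS-NAME=com.mysql.cj.jdbc.Driver
      - SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/seata?useUnicode=true&rewriteBatchedStatements=true
      - SPRING_DATASOURCE_USER=seata
      - SPRING_DATASOURCE_PASSWORD=seata_password
    volumes:
      # SEATA_CONFIG_NAME omits the extension; the file itself is registry.conf
      - ./seata-config/registry.conf:/root/seata-config/registry.conf:ro
    depends_on:
      - mysql   # start order only; it does not wait for MySQL to be ready
    networks:
      - seata-network

  mysql:
    image: mysql:8.0
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: seata
      # Create the account Seata uses (matches the credentials above)
      MYSQL_USER: seata
      MYSQL_PASSWORD: seata_password
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - seata-network

networks:
  seata-network:
    driver: bridge

volumes:
  mysql-data:
```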
```hocon
# ======== Seata registry configuration ========
# seata-config/registry.conf (HOCON syntax, not YAML)
registry {
  type = "nacos"
  nacos {
    application = "seata-server"
    serverAddr = "nacos:8848"
    namespace = ""
    group = "SEATA_GROUP"
    cluster = "default"
  }
}

config {
  type = "nacos"
  nacos {
    serverAddr = "nacos:8848"
    namespace = ""
    group = "SEATA_GROUP"
    dataId = "seataConfig"
  }
}
```

```yaml
# ======== Application (Seata client) configuration ========
# application.yml
seata:
  enabled: true
  application-id: ${spring.application.name}
  tx-service-group: my-tx-group
  enable-auto-data-source-proxy: true
  config:
    type: nacos
    nacos:
      server-addr: ${NACOS_HOST:localhost}:${NACOS_PORT:8848}
      group: SEATA_GROUP
  registry:
    type: nacos
    nacos:
      server-addr: ${NACOS_HOST:localhost}:${NACOS_PORT:8848}
      group: SEATA_GROUP
  service:
    vgroup-mapping:
      my-tx-group: default   # transaction group -> TC cluster name
    enable-degrade: false
    disable-global-transaction: false
```
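```java
// ======== Using Seata AT mode in a Java application ========
// Spring Boot application
// 1. Dependencies
// implementation 'io.seata:seata-spring-boot-starter:1.7.0'
// implementation 'io.seata:seata-dubbo-alibaba:1.7.0' // only if you use Dubbo
// Every participating service also needs Seata's undo_log table in its own
// database, and the global transaction ID (XID) must be propagated on remote
// calls (via Seata's Spring Cloud or Dubbo integration).
import io.seata.spring.annotation.GlobalTransactional;
import org.springframework.stereotype.Service;
import java.math.BigDecimal;

// 2. The global transaction annotation
@Service
public class OrderService {
    // Collaborators used below; their types are application-specific
    private final OrderMapper orderMapper;
    private final InventoryClient inventoryClient;
    private final AccountClient accountClient;
    private final MessageClient messageClient;

    public OrderService(OrderMapper orderMapper, InventoryClient inventoryClient,
                        AccountClient accountClient, MessageClient messageClient) {
        this.orderMapper = orderMapper;
        this.inventoryClient = inventoryClient;
        this.accountClient = accountClient;
        this.messageClient = messageClient;
    }

    @GlobalTransactional(name = "create-order", timeoutMills = 30000, rollbackFor = Exception.class)
    public Order createOrder(OrderDTO orderDTO) {
        // AT mode rolls back the database writes made through Seata's data-source
        // proxy, both here and in remote services that join the same XID.
        // 1. Create the order
        Order order = orderMapper.create(orderDTO);
        // 2. Deduct stock (remote call)
        inventoryClient.deductStock(orderDTO.getProductId(), orderDTO.getQuantity());
        // 3. Deduct balance (remote call)
        accountClient.deductBalance(orderDTO.getUserId(), orderDTO.getTotalPrice());
        // 4. Send a message. A message that was already sent is NOT undone by a
        //    rollback; send it last or use transactional messages / an outbox.
        messageClient.sendOrderCreated(order);
        return order;
    }

    // 3. Global rollback
    @GlobalTransactional(name = "transfer", rollbackFor = Exception.class)
    public void transfer(String fromAccount, String toAccount, BigDecimal amount) {
        accountClient.debit(fromAccount, amount);
        // Simulated failure: Seata rolls back the debit on fromAccount
        throw new RuntimeException("Transfer failed");
    }
}
```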
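For each local update made through the data-source proxy, AT mode records the row's before-image and after-image in `undo_log`. On global rollback, Seata checks that the current row still matches the after-image before restoring the before-image; a mismatch means someone else changed the row. The sketch below models only that check. `UndoLog` and `Rollback` are hypothetical simplifications, not Seata's API. Real rows, global locks and SQL generation are left out.

```go
package main

import "errors"

// UndoLog is a hypothetical, simplified record of one row change.
type UndoLog struct {
	Table, PK   string
	BeforeImage map[string]int64
	AfterImage  map[string]int64
}

// ErrDirtyWrite means the row changed after our update, so restoring the
// before-image would overwrite someone else's write.
var ErrDirtyWrite = errors.New("row modified outside the global transaction; manual handling required")

func sameRow(a, b map[string]int64) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// Rollback returns the restored row if the current row still equals the after-image.
func Rollback(current map[string]int64, u UndoLog) (map[string]int64, error) {
	if !sameRow(current, u.AfterImage) {
		return current, ErrDirtyWrite
	}
	restored := make(map[string]int64, len(u.BeforeImage))
	for k, v := range u.BeforeImage {
		restored[k] = v
	}
	return restored, nil
}
```

This dirty-write check is why writes that bypass Seata can make an AT rollback fail and require manual intervention.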
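6. Checklist Summary
□ Choosing a distributed transaction pattern
  □ Business analysis: strong vs. eventual consistency
  □ Transaction duration: short transaction vs. long flow
  □ Assess how complex compensation will be
  □ Decide: Saga / TCC / local message table
□ Saga pattern
  □ Saga orchestrator implementation
  □ Forward execution functions
  □ Compensation functions (idempotent by design)
  □ Retry policy (exponential backoff)
  □ Alerting on compensation failure
  □ Manual intervention process
  □ End-to-end order → stock → payment → notification flow
□ Local message table
  □ Outbox table design (status / retry_count / processed_at)
  □ Business data and outbox row written in the same transaction
  □ Outbox relay polling job (FOR UPDATE SKIP LOCKED inside a transaction)
  □ Retry on send failure
  □ Idempotent consumers (event_log UNIQUE constraint)
  □ Cleanup of old messages
□ TCC pattern
  □ Try: reserve resources / freeze funds
  □ Confirm: carry out the business operation
  □ Cancel: release the frozen resources
  □ Idempotent Try/Confirm/Cancel
  □ Global transaction timeout control
□ Seata framework
  □ Seata Server (TC) deployment
  □ Registry/Config (Nacos/etcd)
  □ @GlobalTransactional annotation
  □ AT vs. MT (manual) mode
  □ Transaction groups and high availability
□ Production reliability
  □ Idempotency (every operation must be idempotent)
  □ Message reliability (at-least-once delivery + idempotent consumption)
  □ Timeout handling for compensation chains
  □ Dead-letter queue handling
  □ Monitoring and alerting (Saga failure rate / TCC frozen funds)
  □ Regular review of failed compensations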
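Summary
In one sentence: there is no silver bullet for distributed transactions. Saga suits long flows with compensation, TCC suits resource reservation, and the local message table suits reliable messaging. Combine them according to the scenario.
Comparison of distributed transaction patterns:
| Dimension | Saga | TCC | Local message table | 2PC |
|---|---|---|---|---|
| Consistency | Eventual | Eventual | Eventual | Strong |
| Typical use | Long flows | Short transactions | Asynchronous decoupling | Not recommended |
| Resource locking | None | Reserved in Try (business level) | None | Held for the whole transaction |
| Compensation effort | High (compensation logic per step) | Medium (Cancel logic) | Low (retries suffice) | None |
| Throughput | High | Medium | High | Low |
| Implementation complexity | High | Medium | Low | High |
| Failure recovery | Compensation chain | Automatic Cancel | Message retry | Manual handling |
Next steps:
- Deep integration of Seata with Spring Cloud / Dubbo (AT/TCC/Saga modes)
- RocketMQ transactional messages in practice (half messages + transaction status check-back)
- Visual monitoring of distributed transactions (SkyWalking / Pinpoint integration)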