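Enterprise Message Center, Storage and Reliability: Data Storage Design and High-Availability Safeguards
Part 3 of the series: a close look at the message center's data storage architecture and high-availability design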
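📖 Series Guide
This series covers the design and implementation of an enterprise message center in five parts:
- Architecture Design: design philosophy, architecture evolution, technology selection
- Core Implementation: overall architecture, core feature implementation
- Storage and Reliability (this part): data storage design, high-availability safeguards
- Operations and Extensibility: monitoring and operations, extensibility design
- Lessons from Practice: business value, lessons learned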
🗄️ Data Storage Architecture Design
Storage Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                      Application layer                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                         Cache layer                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │  Caffeine   │  │    Redis    │  │ Local cache │          │
│  │ (L1 cache)  │  │ (L2 cache)  │  │ (hot data)  │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                        Storage layer                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │    MySQL    │  │   MongoDB   │  │Elasticsearch│          │
│  │ (real-time) │  │  (history)  │  │  (search)   │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
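Tiered Data Storage Strategy
1. Design Rationale for Hot/Cold Data Separation
Why separate hot and cold data?
Problem analysis:
1. Data access patterns:
   - Hot data: the most recent 7 days; accessed frequently, needs fast responses
   - Warm data: 7 to 30 days old; accessed occasionally, moderate latency is acceptable
   - Cold data: older than 30 days; rarely accessed, kept mainly for auditing
2. Storage cost:
   - MySQL: high performance, high cost; suited to hot data
   - MongoDB: moderate performance, moderate cost; suited to warm data
   - Object storage: low cost; suited to archiving cold data
3. Query patterns:
   - Real-time queries: mostly hit recent data
   - Statistical analysis: needs aggregation over historical data
   - Audit tracing: occasional lookups of historical records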
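The 7-day and 30-day boundaries above can be sketched as a small classifier. This is only an illustration: the names `StorageTierDemo`, `StorageTier` and `tierFor` are hypothetical, and treating a record that is exactly 7 (or 30) days old as already belonging to the next tier is an assumption, not something the article specifies.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class StorageTierDemo {

    // Hypothetical tier names for this sketch
    enum StorageTier { HOT, WARM, COLD }

    static final Duration HOT_WINDOW = Duration.ofDays(7);
    static final Duration WARM_WINDOW = Duration.ofDays(30);

    // Classify a record by its age relative to "now"
    static StorageTier tierFor(LocalDateTime createTime, LocalDateTime now) {
        Duration age = Duration.between(createTime, now);
        if (age.compareTo(HOT_WINDOW) < 0) {
            return StorageTier.HOT;   // younger than 7 days: stays in MySQL
        }
        if (age.compareTo(WARM_WINDOW) < 0) {
            return StorageTier.WARM;  // 7 to 30 days: MongoDB
        }
        return StorageTier.COLD;      // 30 days or older: object storage
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2024, 1, 31, 2, 0);
        for (int days : new int[] {1, 7, 29, 30, 90}) {
            System.out.println(days + " days old -> " + tierFor(now.minusDays(days), now));
        }
    }
}
```

Passing `now` in explicitly keeps the classification deterministic and easy to test; a scheduled job would pass `LocalDateTime.now()`.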
Implementation of the hot/cold separation strategy:
java
@Service
public class DataTieringService {
@Autowired
private MessageTaskMapper messageTaskMapper;
@Autowired
private MessageHistoryService historyService;
@Autowired
private ArchiveService archiveService;
/**
* 数据分层策略
*/
@Scheduled(cron = "0 0 2 * * ?") // 每天凌晨2点执行
public void executeDataTiering() {
log.info("开始执行数据分层任务");
try {
// 1. 热数据 -> 温数据(MySQL -> MongoDB)
migrateHotToWarm();
// 2. 温数据 -> 冷数据(MongoDB -> 对象存储)
migrateWarmToCold();
// 3. 清理过期数据
cleanupExpiredData();
log.info("数据分层任务执行完成");
} catch (Exception e) {
log.error("数据分层任务执行失败", e);
}
}
/**
* 热数据迁移到温数据
*/
private void migrateHotToWarm() {
        LocalDateTime cutoffTime = LocalDateTime.now().minusDays(7);
        // Query the rows to migrate in batches
        int pageSize = 1000;
        while (true) {
            // Always read from offset 0: each batch is deleted from MySQL after it has
            // been copied, so the next batch moves up to the front. Advancing the offset
            // here would skip pageSize rows on every iteration.
            List<MessageTask> tasks = messageTaskMapper.selectForMigration(
                    cutoffTime, 0, pageSize);
            if (tasks.isEmpty()) {
                break;
            }
            // Bulk insert into MongoDB
            List<MessageHistory> histories = tasks.stream()
                    .map(this::convertToHistory)
                    .collect(Collectors.toList());
            historyService.batchInsert(histories);
            // Delete from MySQL only after the MongoDB insert has succeeded
            List<Long> taskIds = tasks.stream()
                    .map(MessageTask::getId)
                    .collect(Collectors.toList());
            messageTaskMapper.batchDelete(taskIds);
            log.info("Migrated hot data to warm storage: count={}", tasks.size());
        }
}
/**
* 温数据迁移到冷数据
*/
private void migrateWarmToCold() {
LocalDateTime cutoffTime = LocalDateTime.now().minusDays(30);
// 查询需要归档的数据
List<MessageHistory> histories = historyService.findForArchive(cutoffTime);
if (!histories.isEmpty()) {
// 归档到对象存储
String archiveFile = archiveService.archive(histories);
// 删除MongoDB中的数据
List<String> historyIds = histories.stream()
.map(MessageHistory::getId)
.collect(Collectors.toList());
historyService.batchDelete(historyIds);
log.info("归档温数据到冷存储: count={}, file={}", histories.size(), archiveFile);
}
}
/**
* 转换为历史记录
*/
private MessageHistory convertToHistory(MessageTask task) {
return MessageHistory.builder()
.taskId(task.getTaskId())
.messageType(task.getMessageType())
.recipient(task.getRecipient())
.content(task.getContent())
.channels(task.getChannels())
.status(task.getStatus())
.createTime(task.getCreateTime())
.updateTime(task.getUpdateTime())
.build();
}
}
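2. Database and Table Sharding Strategy
Design principles for sharding:
Sharding considerations:
1. Shard key selection:
   - Message task table: hash-sharded by task_id
   - User config table: hash-sharded by user_id
   - Send record table: time-sharded by create_time
2. Shard count:
   - Databases: 4 (leaving room for growth)
   - Tables: 16 per database (64 tables in total)
3. Routing strategy:
   - Hash routing: spreads data evenly
   - Range routing: supports range queries
   - Consistent hashing: supports adding shards dynamically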
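To make the 4 x 16 layout concrete, here is a minimal hash-routing sketch. The class and method names (`ShardRouterDemo`, `Shard`, `route`) are hypothetical. It uses plain hash-modulo routing, not consistent hashing, and it masks the sign bit rather than calling `Math.abs`, because `Math.abs(Integer.MIN_VALUE)` is still negative and would produce a negative index.

```java
import java.util.HashSet;
import java.util.Set;

public class ShardRouterDemo {

    static final int DB_COUNT = 4;
    static final int TABLE_COUNT = 16;

    // Hypothetical value type holding the routing result
    static final class Shard {
        final int dbIndex;
        final int tableIndex;

        Shard(int dbIndex, int tableIndex) {
            this.dbIndex = dbIndex;
            this.tableIndex = tableIndex;
        }

        @Override
        public String toString() {
            return "message_db_" + dbIndex + ".message_task_" + tableIndex;
        }
    }

    static Shard route(String key) {
        // Clear the sign bit so the hash is always non-negative
        int hash = key.hashCode() & Integer.MAX_VALUE;
        int dbIndex = hash % DB_COUNT;
        // Divide by DB_COUNT first so the table index does not simply repeat the db index
        int tableIndex = (hash / DB_COUNT) % TABLE_COUNT;
        return new Shard(dbIndex, tableIndex);
    }

    // Count how many distinct (db, table) pairs a range of synthetic keys lands on
    static int distinctShards(int keyCount) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            seen.add(route("task-" + i).toString());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(route("task-10001"));
        System.out.println("distinct shards hit: " + distinctShards(10_000));
    }
}
```

With modulo routing, changing `DB_COUNT` or `TABLE_COUNT` remaps most keys, which is why the list above names consistent hashing separately as the option for dynamic expansion.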
Sharding implementation:
java
@Component
public class ShardingStrategy {
// 数据库数量
private static final int DB_COUNT = 4;
// 每个数据库的表数量
private static final int TABLE_COUNT = 16;
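    /**
     * Compute the shard for a task ID
     */
    public ShardingInfo calculateSharding(String taskId) {
        // Plain hash-modulo routing (not consistent hashing: changing DB_COUNT or
        // TABLE_COUNT remaps most keys). Mask the sign bit instead of calling Math.abs,
        // because Math.abs(Integer.MIN_VALUE) is still negative.
        int hash = taskId.hashCode() & Integer.MAX_VALUE;
        // Database index
        int dbIndex = hash % DB_COUNT;
        // Table index
        int tableIndex = (hash / DB_COUNT) % TABLE_COUNT;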
return ShardingInfo.builder()
.dbIndex(dbIndex)
.tableIndex(tableIndex)
.dbName("message_db_" + dbIndex)
.tableName("message_task_" + tableIndex)
.build();
}
    /**
     * Compute the shard for a user ID
     */
    public ShardingInfo calculateUserSharding(String userId) {
        // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is still negative
        int hash = userId.hashCode() & Integer.MAX_VALUE;
        int dbIndex = hash % DB_COUNT;
        int tableIndex = (hash / DB_COUNT) % TABLE_COUNT;
return ShardingInfo.builder()
.dbIndex(dbIndex)
.tableIndex(tableIndex)
.dbName("message_db_" + dbIndex)
.tableName("user_config_" + tableIndex)
.build();
}
/**
* 根据时间计算分片(按月分表)
*/
public ShardingInfo calculateTimeSharding(LocalDateTime createTime) {
// 按年月计算表后缀
String suffix = createTime.format(DateTimeFormatter.ofPattern("yyyyMM"));
// 简单轮询分库
int dbIndex = createTime.getMonthValue() % DB_COUNT;
return ShardingInfo.builder()
.dbIndex(dbIndex)
.tableIndex(0) // 时间分片不需要表索引
.dbName("message_db_" + dbIndex)
.tableName("send_record_" + suffix)
.build();
}
}
/**
* 分片数据访问层
*/
@Repository
public class ShardingMessageTaskMapper {
@Autowired
private ShardingStrategy shardingStrategy;
@Autowired
private Map<String, SqlSessionFactory> sqlSessionFactories;
/**
* 根据任务ID查询
*/
public MessageTask selectByTaskId(String taskId) {
ShardingInfo sharding = shardingStrategy.calculateSharding(taskId);
try (SqlSession session = getSqlSession(sharding.getDbName())) {
MessageTaskMapper mapper = session.getMapper(MessageTaskMapper.class);
return mapper.selectByTaskId(taskId, sharding.getTableName());
}
}
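    /**
     * Insert a message task
     */
    public int insert(MessageTask task) {
        ShardingInfo sharding = shardingStrategy.calculateSharding(task.getTaskId());
        try (SqlSession session = getSqlSession(sharding.getDbName())) {
            MessageTaskMapper mapper = session.getMapper(MessageTaskMapper.class);
            int rows = mapper.insert(task, sharding.getTableName());
            // openSession() returns a session with auto-commit off, so commit explicitly;
            // otherwise the insert is rolled back when the session closes.
            session.commit();
            return rows;
        }
    }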
/**
* 批量查询(跨分片)
*/
public List<MessageTask> selectByTaskIds(List<String> taskIds) {
// 按分片分组
Map<ShardingInfo, List<String>> shardingGroups = taskIds.stream()
.collect(Collectors.groupingBy(shardingStrategy::calculateSharding));
// 并行查询各分片
return shardingGroups.entrySet().parallelStream()
.flatMap(entry -> {
ShardingInfo sharding = entry.getKey();
List<String> ids = entry.getValue();
try (SqlSession session = getSqlSession(sharding.getDbName())) {
MessageTaskMapper mapper = session.getMapper(MessageTaskMapper.class);
return mapper.selectByTaskIds(ids, sharding.getTableName()).stream();
}
})
.collect(Collectors.toList());
}
/**
* 获取SQL会话
*/
private SqlSession getSqlSession(String dbName) {
SqlSessionFactory factory = sqlSessionFactories.get(dbName);
if (factory == null) {
throw new IllegalArgumentException("未找到数据库配置: " + dbName);
}
return factory.openSession();
}
}
3. Read/Write Splitting Design
Read/write splitting implementation:
java
@Configuration
public class DataSourceConfig {
    /**
     * Master data source (writes)
     */
    @Bean
    public DataSource masterDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://master-db:3306/message_center");
        // Placeholder credentials: load real ones from external configuration
        // and avoid connecting as root
        config.setUsername("root");
        config.setPassword("password");
        config.setMaximumPoolSize(20);
        config.setMinimumIdle(5);
        return new HikariDataSource(config);
    }
    /**
     * Replica data source (reads)
     */
    @Bean
    public DataSource slaveDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://slave-db:3306/message_center");
        config.setUsername("readonly");
        config.setPassword("password");
        config.setMaximumPoolSize(30);
        config.setMinimumIdle(10);
        return new HikariDataSource(config);
    }
    /**
     * Routing data source. It must be the primary DataSource bean: if the master
     * were marked @Primary instead, MyBatis and JdbcTemplate would bind to the master
     * directly and the read/write routing would never take effect.
     */
    @Bean
    @Primary
    public DataSource dynamicDataSource() {
        DynamicDataSource dataSource = new DynamicDataSource();
        Map<Object, Object> targetDataSources = new HashMap<>();
        targetDataSources.put("master", masterDataSource());
        targetDataSources.put("slave", slaveDataSource());
        dataSource.setTargetDataSources(targetDataSources);
        // Fall back to the master when no routing key is set
        dataSource.setDefaultTargetDataSource(masterDataSource());
        return dataSource;
    }
}
/**
* 动态数据源
*/
public class DynamicDataSource extends AbstractRoutingDataSource {
@Override
protected Object determineCurrentLookupKey() {
return DataSourceContextHolder.getDataSourceType();
}
}
/**
* 数据源上下文
*/
public class DataSourceContextHolder {
private static final ThreadLocal<String> CONTEXT_HOLDER = new ThreadLocal<>();
public static void setDataSourceType(String dataSourceType) {
CONTEXT_HOLDER.set(dataSourceType);
}
public static String getDataSourceType() {
return CONTEXT_HOLDER.get();
}
public static void clearDataSourceType() {
CONTEXT_HOLDER.remove();
}
}
/**
* 读写分离注解
*/
@Target({ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface DataSource {
String value() default "master";
}
/**
* 读写分离切面
*/
@Aspect
@Component
public class DataSourceAspect {
@Around("@annotation(dataSource)")
public Object around(ProceedingJoinPoint point, DataSource dataSource) throws Throwable {
try {
DataSourceContextHolder.setDataSourceType(dataSource.value());
return point.proceed();
} finally {
DataSourceContextHolder.clearDataSourceType();
}
}
}
/**
* 使用示例
*/
@Service
public class MessageTaskService {
@Autowired
private MessageTaskMapper messageTaskMapper;
/**
* 写操作使用主库
*/
@DataSource("master")
public void saveMessageTask(MessageTask task) {
messageTaskMapper.insert(task);
}
/**
* 读操作使用从库
*/
@DataSource("slave")
public MessageTask getMessageTask(String taskId) {
return messageTaskMapper.selectByTaskId(taskId);
}
/**
* 统计查询使用从库
*/
@DataSource("slave")
public List<MessageTask> getTasksByDateRange(LocalDateTime start, LocalDateTime end) {
return messageTaskMapper.selectByDateRange(start, end);
}
}
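🛡️ High-Availability Design Philosophy
Core Ideas of High-Availability Design
1. What High Availability Really Means
The essence of high availability:
the ability of a system to keep serving requests in the face of failures
Key metrics:
- Availability: 99.9% (less than 8.76 hours of downtime per year)
- Reliability: zero message loss
- Recoverability: fast recovery after a failure
- Scalability: keeps up with business growth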
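The 8.76-hour figure follows directly from the availability target: a 365-day year has 8,760 hours, and 0.1% of that is 8.76 hours. A small sketch of the arithmetic (the class and method names are hypothetical) makes it easy to check other targets as well:

```java
import java.time.Duration;

public class DowntimeBudget {

    private static final long SECONDS_PER_YEAR = 365L * 24 * 3600;

    // Maximum yearly downtime allowed by an availability target given in percent
    static Duration yearlyDowntime(double availabilityPercent) {
        double downFraction = 1.0 - availabilityPercent / 100.0;
        return Duration.ofSeconds(Math.round(SECONDS_PER_YEAR * downFraction));
    }

    public static void main(String[] args) {
        for (double target : new double[] {99.0, 99.9, 99.99}) {
            Duration budget = yearlyDowntime(target);
            System.out.printf("%.2f%% -> %.2f hours per year%n",
                    target, budget.getSeconds() / 3600.0);
        }
    }
}
```

Each additional "nine" cuts the budget by a factor of ten, so 99.99% leaves under an hour per year.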
2. High-Availability Design Principles
Fault isolation:
java
/**
* 熔断器实现
*/
@Component
public class CircuitBreakerManager {
private final Map<String, CircuitBreaker> circuitBreakers = new ConcurrentHashMap<>();
/**
* 获取熔断器
*/
public CircuitBreaker getCircuitBreaker(String serviceName) {
return circuitBreakers.computeIfAbsent(serviceName, name -> {
return CircuitBreaker.ofDefaults(name);
});
}
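    /**
     * Run an operation under circuit-breaker protection
     */
    public <T> T executeWithCircuitBreaker(String serviceName, Supplier<T> operation, Supplier<T> fallback) {
        CircuitBreaker circuitBreaker = getCircuitBreaker(serviceName);
        try {
            // Resilience4j's executeSupplier returns the result directly. It rethrows the
            // operation's exception on failure and throws CallNotPermittedException while
            // the breaker is open, so the fallback is wired in with try/catch.
            return circuitBreaker.executeSupplier(operation);
        } catch (Exception e) {
            log.warn("Service call failed, falling back: service={}, error={}", serviceName, e.getMessage());
            return fallback.get();
        }
    }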
}
/**
* 短信服务熔断示例
*/
@Service
public class SmsService {
@Autowired
private SmsClient smsClient;
@Autowired
private CircuitBreakerManager circuitBreakerManager;
/**
* 发送短信(带熔断保护)
*/
public SendResult sendSms(String phone, String content) {
return circuitBreakerManager.executeWithCircuitBreaker(
"sms-service",
() -> smsClient.sendSms(phone, content),
() -> {
// 降级策略:记录到数据库,稍后重试
saveSmsForRetry(phone, content);
return SendResult.success("已加入重试队列");
}
);
}
private void saveSmsForRetry(String phone, String content) {
// 保存到重试表
SmsRetryRecord record = SmsRetryRecord.builder()
.phone(phone)
.content(content)
.retryCount(0)
.nextRetryTime(LocalDateTime.now().plusMinutes(5))
.build();
smsRetryMapper.insert(record);
}
}
Fast recovery:
java
/**
* 健康检查服务
*/
@Service
public class HealthCheckService {
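    // Spring injects every HealthIndicator bean keyed by its bean name;
    // HealthIndicator itself has no getName() method. The bean names are assumed
    // to match the service names handled by AutoRecoveryService.
    @Autowired
    private Map<String, HealthIndicator> healthIndicators;
    /**
     * Run the health checks
     */
    @Scheduled(fixedDelay = 30000) // every 30 seconds
    public void performHealthCheck() {
        for (Map.Entry<String, HealthIndicator> entry : healthIndicators.entrySet()) {
            String name = entry.getKey();
            try {
                Health health = entry.getValue().health();
                if (Status.DOWN.equals(health.getStatus())) {
                    // The service is unhealthy: start the recovery flow
                    triggerRecovery(name, health);
                }
            } catch (Exception e) {
                log.error("Health check failed: indicator={}", name, e);
            }
        }
    }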
/**
* 触发恢复流程
*/
private void triggerRecovery(String serviceName, Health health) {
log.warn("检测到服务异常,开始恢复流程: service={}, details={}", serviceName, health.getDetails());
// 1. 发送告警
alertService.sendAlert("服务异常", serviceName + "服务检查失败");
// 2. 尝试自动恢复
autoRecoveryService.attemptRecovery(serviceName);
// 3. 更新服务状态
serviceStatusService.updateStatus(serviceName, ServiceStatus.RECOVERING);
}
}
/**
* 自动恢复服务
*/
@Service
public class AutoRecoveryService {
/**
* 尝试自动恢复
*/
public void attemptRecovery(String serviceName) {
switch (serviceName) {
case "database":
recoverDatabase();
break;
case "redis":
recoverRedis();
break;
case "mq":
recoverMessageQueue();
break;
default:
log.warn("未知服务类型,无法自动恢复: {}", serviceName);
}
}
/**
* 数据库恢复
*/
private void recoverDatabase() {
try {
// 1. 重新初始化连接池
dataSourceManager.reinitializeConnectionPool();
// 2. 执行简单查询测试
jdbcTemplate.queryForObject("SELECT 1", Integer.class);
log.info("数据库恢复成功");
} catch (Exception e) {
log.error("数据库恢复失败", e);
}
}
    /**
     * Redis recovery
     */
    private void recoverRedis() {
        try {
            // Verify connectivity; execute() borrows a connection and returns it afterwards
            redisTemplate.execute((RedisCallback<String>) connection -> connection.ping());
            // Do not call flushDb() here: it would wipe the deduplication markers and
            // distributed locks kept in Redis. Evict only keys known to be stale.
            log.info("Redis recovered");
        } catch (Exception e) {
            log.error("Redis recovery failed", e);
        }
    }
}
Graceful degradation:
java
/**
* 降级策略管理器
*/
@Component
public class DegradationManager {
@Autowired
private DegradationConfigService configService;
/**
* 检查是否需要降级
*/
public boolean shouldDegrade(String serviceName) {
DegradationConfig config = configService.getConfig(serviceName);
if (config == null || !config.isEnabled()) {
return false;
}
// 检查系统负载
SystemMetrics metrics = systemMetricsService.getCurrentMetrics();
return metrics.getCpuUsage() > config.getCpuThreshold() ||
metrics.getMemoryUsage() > config.getMemoryThreshold() ||
metrics.getQueueSize() > config.getQueueThreshold();
}
/**
* 执行降级策略
*/
public <T> T executeWithDegradation(String serviceName, Supplier<T> normalOperation, Supplier<T> degradedOperation) {
if (shouldDegrade(serviceName)) {
log.info("执行降级策略: service={}", serviceName);
return degradedOperation.get();
} else {
return normalOperation.get();
}
}
}
/**
* 消息发送降级示例
*/
@Service
public class MessageSendService {
@Autowired
private DegradationManager degradationManager;
@Autowired
private MessageChannelFactory channelFactory;
/**
* 发送消息(支持降级)
*/
public SendResult sendMessage(MessageTask task) {
return degradationManager.executeWithDegradation(
"message-send",
// 正常策略:使用所有配置的通道
() -> sendWithAllChannels(task),
// 降级策略:只使用主要通道
() -> sendWithPrimaryChannel(task)
);
}
/**
* 使用所有通道发送
*/
private SendResult sendWithAllChannels(MessageTask task) {
List<SendResult> results = new ArrayList<>();
for (String channelType : task.getChannels()) {
MessageChannel channel = channelFactory.getChannel(channelType);
SendResult result = channel.sendMessage(task);
results.add(result);
}
return results.stream().anyMatch(SendResult::isSuccess)
? SendResult.success("至少一个通道发送成功")
: SendResult.failure("所有通道发送失败");
}
/**
* 只使用主要通道发送
*/
private SendResult sendWithPrimaryChannel(MessageTask task) {
String primaryChannel = task.getChannels().get(0); // 取第一个作为主要通道
MessageChannel channel = channelFactory.getChannel(primaryChannel);
return channel.sendMessage(task);
}
}
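Message Reliability Guarantees
1. Message Deduplication
Design considerations for deduplication:
Where duplicates come from:
1. Network retransmission: duplicate requests caused by network errors
2. Business retries: retry logic in upstream business systems
3. System failures: messages resent after a restart
Deduplication strategies:
1. Business-level: keyed on a business-unique identifier
2. Message-level: keyed on the message ID
3. Storage-level: a unique constraint in the database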
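Whichever layer performs deduplication, the check and the mark need to happen as one atomic step; otherwise two concurrent copies of the same message can both pass the check. The sketch below illustrates the idea with an in-memory `ConcurrentHashMap` standing in for the shared store (with Redis, the analogous primitive would be a set-if-absent with expiry). `DedupDemo`, `firstTime` and `countWinners` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class DedupDemo {

    private final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

    // Atomic check-and-mark: true only for the first caller with this ID
    boolean firstTime(String messageId) {
        return seen.putIfAbsent(messageId, Boolean.TRUE) == null;
    }

    // Submit the same message ID from many threads and count how many get through
    static int countWinners(int threads) {
        DedupDemo dedup = new DedupDemo();
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    if (dedup.firstTime("msg-1")) {
                        winners.incrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every submission to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("accepted copies: " + countWinners(8));
    }
}
```

A database unique constraint gives the same guarantee at the storage layer: the second insert fails instead of silently succeeding.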
Deduplication implementation:
java
/**
* 消息去重服务
*/
@Service
public class MessageDeduplicationService {
@Autowired
private RedisTemplate<String, String> redisTemplate;
@Autowired
private MessageTaskMapper messageTaskMapper;
// 去重缓存过期时间(24小时)
private static final long DEDUP_EXPIRE_SECONDS = 24 * 60 * 60;
/**
* 检查消息是否重复
*/
public boolean isDuplicate(String messageId) {
String key = "msg:dedup:" + messageId;
// 1. 先检查Redis缓存
Boolean exists = redisTemplate.hasKey(key);
if (Boolean.TRUE.equals(exists)) {
return true;
}
// 2. 检查数据库
MessageTask existingTask = messageTaskMapper.selectByTaskId(messageId);
if (existingTask != null) {
// 回写到Redis缓存
redisTemplate.opsForValue().set(key, "1", DEDUP_EXPIRE_SECONDS, TimeUnit.SECONDS);
return true;
}
return false;
}
/**
* 标记消息已处理
*/
public void markAsProcessed(String messageId) {
String key = "msg:dedup:" + messageId;
redisTemplate.opsForValue().set(key, "1", DEDUP_EXPIRE_SECONDS, TimeUnit.SECONDS);
}
    /**
     * Deduplicate on a business-unique identifier
     */
    public boolean isDuplicateByBusiness(String businessType, String businessId, String recipient) {
        // Build the business deduplication key
        String businessKey = String.format("%s:%s:%s", businessType, businessId, recipient);
        String key = "msg:biz:dedup:" + DigestUtils.md5Hex(businessKey);
        // Check and mark in one atomic SET NX EX call. A separate hasKey followed by
        // set would let two concurrent requests both pass the check.
        // The marker uses a shorter expiry (1 hour).
        Boolean firstSeen = redisTemplate.opsForValue()
                .setIfAbsent(key, businessKey, 3600, TimeUnit.SECONDS);
        // FALSE means the key already existed, i.e. this is a duplicate
        return Boolean.FALSE.equals(firstSeen);
    }
}
/**
* 去重拦截器
*/
@Component
public class DeduplicationInterceptor {
@Autowired
private MessageDeduplicationService deduplicationService;
/**
* 消息发送前的去重检查
*/
public boolean preHandle(MessageRequest request) {
// 1. 基于消息ID去重
if (StringUtils.isNotBlank(request.getMessageId())) {
if (deduplicationService.isDuplicate(request.getMessageId())) {
log.warn("检测到重复消息: messageId={}", request.getMessageId());
return false;
}
}
// 2. 基于业务标识去重
if (StringUtils.isNotBlank(request.getBusinessType()) &&
StringUtils.isNotBlank(request.getBusinessId())) {
if (deduplicationService.isDuplicateByBusiness(
request.getBusinessType(),
request.getBusinessId(),
request.getRecipient())) {
log.warn("检测到重复业务消息: businessType={}, businessId={}, recipient={}",
request.getBusinessType(), request.getBusinessId(), request.getRecipient());
return false;
}
}
return true;
}
}
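2. Distributed Lock Implementation
Where distributed locks are used:
Use cases:
1. Preventing duplicate processing: the same message picked up by several consumers
2. Resource contention: multiple instances accessing a shared resource at once
3. Scheduled jobs: preventing the same job from running on several instances
4. Rate limiting: a global rate-limit counter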
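One detail matters in all of these scenarios: a lock with an expiry can time out while its holder is still working, after which another instance may acquire it. The release therefore has to check ownership, or the first holder will delete the second holder's lock. The in-memory sketch below models that behavior with an explicit clock so it stays deterministic; `TokenLockDemo` and its methods are hypothetical and only mimic the semantics of a set-if-absent-with-expiry lock.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenLockDemo {

    // One lock entry: who owns it and when it expires
    private static final class Entry {
        final String token;
        final long expiresAt;

        Entry(String token, long expiresAt) {
            this.token = token;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Acquire only if the key is free or the previous holder's lease has expired
    synchronized boolean tryLock(String key, String token, long ttlMillis, long now) {
        Entry current = store.get(key);
        if (current != null && current.expiresAt > now) {
            return false;
        }
        store.put(key, new Entry(token, now + ttlMillis));
        return true;
    }

    // Release only if the caller still owns the lock (token matches, lease not expired)
    synchronized boolean release(String key, String token, long now) {
        Entry current = store.get(key);
        if (current == null || current.expiresAt <= now || !current.token.equals(token)) {
            return false;
        }
        store.remove(key);
        return true;
    }

    public static void main(String[] args) {
        TokenLockDemo lock = new TokenLockDemo();
        System.out.println("A acquires:         " + lock.tryLock("job", "A", 30_000, 0));
        System.out.println("B while A holds:    " + lock.tryLock("job", "B", 30_000, 1_000));
        System.out.println("B after A expired:  " + lock.tryLock("job", "B", 30_000, 31_000));
        System.out.println("A releases late:    " + lock.release("job", "A", 32_000));
        System.out.println("B releases its own: " + lock.release("job", "B", 32_000));
    }
}
```

The late release by A is rejected because the stored token belongs to B. The same comparison is what a compare-and-delete script on the lock store has to perform atomically.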
Distributed lock implementation:
java
/**
* Redis分布式锁
*/
@Component
public class RedisDistributedLock {
@Autowired
private RedisTemplate<String, String> redisTemplate;
// 锁的默认过期时间(30秒)
private static final long DEFAULT_EXPIRE_TIME = 30;
// 获取锁的超时时间(3秒)
private static final long ACQUIRE_TIMEOUT = 3000;
/**
* 尝试获取锁
*/
public boolean tryLock(String lockKey, String lockValue, long expireTime) {
try {
// 使用SET命令的NX和EX参数实现原子操作
Boolean result = redisTemplate.opsForValue().setIfAbsent(
lockKey, lockValue, expireTime, TimeUnit.SECONDS);
return Boolean.TRUE.equals(result);
} catch (Exception e) {
log.error("获取分布式锁失败: lockKey={}", lockKey, e);
return false;
}
}
/**
* 释放锁
*/
public boolean releaseLock(String lockKey, String lockValue) {
try {
// 使用Lua脚本保证原子性
String luaScript =
"if redis.call('get', KEYS[1]) == ARGV[1] then " +
" return redis.call('del', KEYS[1]) " +
"else " +
" return 0 " +
"end";
Long result = redisTemplate.execute(
new DefaultRedisScript<>(luaScript, Long.class),
Collections.singletonList(lockKey),
lockValue
);
return result != null && result == 1;
} catch (Exception e) {
log.error("释放分布式锁失败: lockKey={}", lockKey, e);
return false;
}
}
/**
* 带超时的锁获取
*/
public boolean lockWithTimeout(String lockKey, String lockValue, long expireTime, long timeout) {
long startTime = System.currentTimeMillis();
while (System.currentTimeMillis() - startTime < timeout) {
if (tryLock(lockKey, lockValue, expireTime)) {
return true;
}
try {
Thread.sleep(50); // 短暂等待后重试
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return false;
}
}
return false;
}
}
/**
* 分布式锁注解
*/
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface DistributedLock {
/**
* 锁的key,支持SpEL表达式
*/
String key();
/**
* 锁的过期时间(秒)
*/
long expireTime() default 30;
/**
* 获取锁的超时时间(毫秒)
*/
long timeout() default 3000;
/**
* 获取锁失败时的处理策略
*/
LockFailStrategy failStrategy() default LockFailStrategy.EXCEPTION;
}
/**
* 分布式锁切面
*/
@Aspect
@Component
public class DistributedLockAspect {
@Autowired
private RedisDistributedLock distributedLock;
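    // SpelExpressionParser is thread-safe and is not registered as a Spring bean by
    // default, so create it directly instead of autowiring it
    private final SpelExpressionParser spelParser = new SpelExpressionParser();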
@Around("@annotation(distributedLockAnnotation)")
public Object around(ProceedingJoinPoint point, DistributedLock distributedLockAnnotation) throws Throwable {
// 解析锁的key
String lockKey = parseLockKey(distributedLockAnnotation.key(), point);
String lockValue = UUID.randomUUID().toString();
// 尝试获取锁
boolean acquired = distributedLock.lockWithTimeout(
lockKey,
lockValue,
distributedLockAnnotation.expireTime(),
distributedLockAnnotation.timeout()
);
if (!acquired) {
// 根据策略处理获取锁失败
return handleLockFailure(distributedLockAnnotation.failStrategy(), point);
}
try {
// 执行业务方法
return point.proceed();
} finally {
// 释放锁
distributedLock.releaseLock(lockKey, lockValue);
}
}
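    /**
     * Resolve the lock key
     */
    private String parseLockKey(String keyExpression, ProceedingJoinPoint point) {
        // Only keys with no quotes and no variable references are plain literals.
        // Checking startsWith("#") alone would treat "'msg:process:' + #taskId" as a
        // literal, so every task would share a single lock.
        if (!keyExpression.contains("#") && !keyExpression.contains("'")) {
            return keyExpression;
        }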
// 构建SpEL上下文
StandardEvaluationContext context = new StandardEvaluationContext();
// 添加方法参数
MethodSignature signature = (MethodSignature) point.getSignature();
String[] paramNames = signature.getParameterNames();
Object[] args = point.getArgs();
for (int i = 0; i < paramNames.length; i++) {
context.setVariable(paramNames[i], args[i]);
}
// 解析表达式
Expression expression = spelParser.parseExpression(keyExpression);
return expression.getValue(context, String.class);
}
}
/**
* 使用示例
*/
@Service
public class MessageProcessService {
/**
* 处理消息(防止重复处理)
*/
@DistributedLock(key = "'msg:process:' + #taskId", expireTime = 60)
public void processMessage(String taskId) {
// 业务逻辑
log.info("开始处理消息: taskId={}", taskId);
// 模拟处理时间
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
log.info("消息处理完成: taskId={}", taskId);
}
/**
* 定时任务(防止重复执行)
*/
@Scheduled(fixedDelay = 60000)
@DistributedLock(key = "'task:cleanup'", expireTime = 300)
public void cleanupExpiredMessages() {
log.info("开始清理过期消息");
// 清理逻辑
messageTaskMapper.deleteExpiredTasks(LocalDateTime.now().minusDays(30));
log.info("过期消息清理完成");
}
}
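3. Failure Retry Mechanism
Design considerations for retries:
Retry strategy design:
1. Retry conditions: distinguish retryable errors from non-retryable ones
2. Retry count: cap the number of retries to avoid retrying forever
3. Retry interval: exponential backoff, to avoid a retry storm
4. Retry queue: schedule retries with delayed messages
5. Dead-letter handling: what to do with messages that ultimately fail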
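To show how exponential backoff with a cap plays out, the sketch below computes the delay before each retry. It uses the same formula as the article's `calculateDelay` (initial delay times multiplier to the power of the attempt index, capped at a maximum); the class and method names here are hypothetical, and the parameter values in `main` are the default policy values quoted in this article.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

    // Delay before retry number `attempt` (0-based), capped at maxDelay
    static long delayMillis(int attempt, long initialDelay, double multiplier, long maxDelay) {
        double raw = initialDelay * Math.pow(multiplier, attempt);
        return (long) Math.min(raw, (double) maxDelay);
    }

    // Full list of delays for a policy with the given number of retries
    static List<Long> schedule(int maxRetries, long initialDelay, double multiplier, long maxDelay) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(delayMillis(attempt, initialDelay, multiplier, maxDelay));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Default policy from the article: 3 retries, 1 s initial delay, x2, 60 s cap
        System.out.println(schedule(3, 1000, 2.0, 60000));
        // A longer schedule shows where the cap takes over
        System.out.println(schedule(8, 1000, 2.0, 60000));
    }
}
```

With the default policy the three retries wait 1 s, 2 s and 4 s. In the longer schedule the seventh delay would be 64 s uncapped, so it and every later delay are held at 60 s. Production systems often add random jitter on top so that many clients do not retry in lockstep; that is left out here to keep the numbers deterministic.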
Full implementation of the retry mechanism:
java
/**
* 重试策略配置
*/
@Data
@Builder
public class RetryPolicy {
// 最大重试次数
private int maxRetries;
// 初始延时(毫秒)
private long initialDelay;
// 最大延时(毫秒)
private long maxDelay;
// 退避倍数
private double backoffMultiplier;
// 可重试的异常类型
private Set<Class<? extends Exception>> retryableExceptions;
// 不可重试的异常类型
private Set<Class<? extends Exception>> nonRetryableExceptions;
/**
* 默认重试策略
*/
public static RetryPolicy defaultPolicy() {
return RetryPolicy.builder()
.maxRetries(3)
.initialDelay(1000)
.maxDelay(60000)
.backoffMultiplier(2.0)
.retryableExceptions(Set.of(
ConnectException.class,
SocketTimeoutException.class,
ServiceUnavailableException.class
))
.nonRetryableExceptions(Set.of(
IllegalArgumentException.class,
AuthenticationException.class,
ValidationException.class
))
.build();
}
}
/**
* 重试执行器
*/
@Component
public class RetryExecutor {
@Autowired
private MessageProducer messageProducer;
/**
* 执行带重试的操作
*/
public <T> T executeWithRetry(String operationName, Supplier<T> operation, RetryPolicy policy) {
Exception lastException = null;
for (int attempt = 0; attempt <= policy.getMaxRetries(); attempt++) {
try {
return operation.get();
} catch (Exception e) {
lastException = e;
// 检查是否可重试
if (!isRetryable(e, policy)) {
log.warn("异常不可重试,直接失败: operation={}, error={}", operationName, e.getMessage());
throw new RuntimeException("操作失败: " + e.getMessage(), e);
}
// 检查是否还有重试机会
if (attempt >= policy.getMaxRetries()) {
log.error("重试次数耗尽,操作最终失败: operation={}, attempts={}", operationName, attempt + 1);
break;
}
// 计算延时时间
long delay = calculateDelay(attempt, policy);
log.warn("操作失败,准备重试: operation={}, attempt={}, delay={}ms, error={}",
operationName, attempt + 1, delay, e.getMessage());
// 等待后重试
try {
Thread.sleep(delay);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new RuntimeException("重试被中断", ie);
}
}
}
throw new RuntimeException("操作最终失败: " + lastException.getMessage(), lastException);
}
/**
* 异步重试(使用延时消息)
*/
public void scheduleAsyncRetry(String taskId, RetryContext context) {
if (context.getAttempt() >= context.getPolicy().getMaxRetries()) {
log.error("异步重试次数耗尽: taskId={}, attempts={}", taskId, context.getAttempt());
handleFinalFailure(taskId, context);
return;
}
// 计算延时
long delay = calculateDelay(context.getAttempt(), context.getPolicy());
// 更新重试上下文
context.setAttempt(context.getAttempt() + 1);
context.setNextRetryTime(LocalDateTime.now().plusSeconds(delay / 1000));
// 发送延时消息
RetryMessage retryMessage = RetryMessage.builder()
.taskId(taskId)
.context(context)
.build();
messageProducer.sendDelayMessage("retry-topic", retryMessage, delay);
log.info("安排异步重试: taskId={}, attempt={}, delay={}ms",
taskId, context.getAttempt(), delay);
}
/**
* 判断异常是否可重试
*/
private boolean isRetryable(Exception e, RetryPolicy policy) {
// 检查不可重试异常
for (Class<? extends Exception> nonRetryable : policy.getNonRetryableExceptions()) {
if (nonRetryable.isInstance(e)) {
return false;
}
}
// 检查可重试异常
for (Class<? extends Exception> retryable : policy.getRetryableExceptions()) {
if (retryable.isInstance(e)) {
return true;
}
}
// 默认可重试
return true;
}
/**
* 计算延时时间(指数退避)
*/
private long calculateDelay(int attempt, RetryPolicy policy) {
long delay = (long) (policy.getInitialDelay() * Math.pow(policy.getBackoffMultiplier(), attempt));
return Math.min(delay, policy.getMaxDelay());
}
/**
* 处理最终失败
*/
private void handleFinalFailure(String taskId, RetryContext context) {
// 1. 更新任务状态
messageTaskService.updateStatus(taskId, MessageStatus.FAILED);
// 2. 发送到死信队列
deadLetterService.sendToDeadLetter(taskId, context);
// 3. 发送告警
alertService.sendAlert("消息处理最终失败", "任务ID: " + taskId);
}
}
/**
* 重试消息消费者
*/
@Component
@RocketMQMessageListener(
topic = "retry-topic",
consumerGroup = "retry-consumer-group"
)
public class RetryMessageConsumer implements RocketMQListener<RetryMessage> {
@Autowired
private MessageProcessService messageProcessService;
@Autowired
private RetryExecutor retryExecutor;
@Override
public void onMessage(RetryMessage retryMessage) {
String taskId = retryMessage.getTaskId();
RetryContext context = retryMessage.getContext();
try {
// 重新处理消息
messageProcessService.processMessage(taskId);
log.info("重试处理成功: taskId={}, attempt={}", taskId, context.getAttempt());
} catch (Exception e) {
log.error("重试处理失败: taskId={}, attempt={}", taskId, context.getAttempt(), e);
// 安排下次重试
retryExecutor.scheduleAsyncRetry(taskId, context);
}
}
}
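📊 Data Backup and Recovery
Backup Strategy Design
Factors behind the backup strategy:
Backup strategy design:
1. Backup frequency:
   - Full backup: weekly
   - Incremental backup: daily
   - Real-time backup: critical data is synchronized in real time
2. Backup scope:
   - Business data: message tasks, user configs, send records
   - Configuration data: system settings, templates, routing rules
   - Log data: operation logs, error logs, audit logs
3. Backup storage:
   - Local: fast recovery
   - Remote: disaster recovery
   - Cloud: long-term retention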
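A backup is only useful if the restore path reads it back exactly as it was written. Since backup files are typically compressed before they leave the machine, the restore side has to decompress and decode with the same charset. The following sketch round-trips a JSON payload through gzip in memory; `BackupGzipDemo` and its helper names are hypothetical, and the sample payload is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BackupGzipDemo {

    // Compress raw bytes into gzip format
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // complete only after the gzip stream is closed
    }

    // Decompress gzip bytes back into the original bytes
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = gz.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Write side and read side agree on UTF-8
    static String roundTrip(String json) {
        byte[] restored = gunzip(gzip(json.getBytes(StandardCharsets.UTF_8)));
        return new String(restored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "[{\"taskId\":\"t-1\",\"content\":\"验证码 1234\"}]";
        System.out.println("round trip intact: " + json.equals(roundTrip(json)));
    }
}
```

Running a round trip like this as part of backup verification catches charset and compression mismatches before a real recovery depends on them.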
Backup service implementation:
java
/**
* 数据备份服务
*/
@Service
public class DataBackupService {
@Autowired
private MessageTaskMapper messageTaskMapper;
@Autowired
private MessageHistoryService historyService;
@Autowired
private BackupStorageService storageService;
/**
* 执行全量备份
*/
@Scheduled(cron = "0 0 2 ? * SUN") // 每周日凌晨2点
public void performFullBackup() {
log.info("开始执行全量备份");
try {
String backupId = generateBackupId("FULL");
// 1. 备份消息任务数据
backupMessageTasks(backupId);
// 2. 备份历史数据
backupHistoryData(backupId);
// 3. 备份配置数据
backupConfigData(backupId);
// 4. 生成备份清单
generateBackupManifest(backupId);
log.info("全量备份完成: backupId={}", backupId);
} catch (Exception e) {
log.error("全量备份失败", e);
alertService.sendAlert("备份失败", "全量备份执行失败: " + e.getMessage());
}
}
/**
* 执行增量备份
*/
@Scheduled(cron = "0 0 3 * * ?") // 每天凌晨3点
public void performIncrementalBackup() {
log.info("开始执行增量备份");
try {
String backupId = generateBackupId("INCR");
LocalDateTime lastBackupTime = getLastBackupTime();
// 1. 备份增量消息任务
backupIncrementalMessageTasks(backupId, lastBackupTime);
// 2. 备份增量历史数据
backupIncrementalHistoryData(backupId, lastBackupTime);
// 3. 更新备份时间戳
updateLastBackupTime();
log.info("增量备份完成: backupId={}", backupId);
} catch (Exception e) {
log.error("增量备份失败", e);
alertService.sendAlert("备份失败", "增量备份执行失败: " + e.getMessage());
}
}
    /**
     * Back up message task data
     */
    private void backupMessageTasks(String backupId) throws IOException {
        int pageSize = 10000;
        int offset = 0;
        int totalCount = 0;
        String fileName = String.format("message_tasks_%s.json", backupId);
        // Write UTF-8 explicitly; FileWriter would fall back to the platform default charset
        try (Writer writer = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8)) {
            writer.write("[\n");
            boolean first = true;
            while (true) {
                List<MessageTask> tasks = messageTaskMapper.selectForBackup(offset, pageSize);
                if (tasks.isEmpty()) {
                    break;
                }
                for (MessageTask task : tasks) {
                    if (!first) {
                        writer.write(",\n");
                    }
                    writer.write(JsonUtil.toJson(task));
                    first = false;
                    totalCount++;
                }
                offset += pageSize;
            }
            writer.write("\n]");
        }
        // Upload to backup storage
        storageService.uploadBackupFile(backupId, fileName);
        log.info("Message task backup finished: count={}, file={}", totalCount, fileName);
    }
    /**
     * Back up incremental data
     */
    private void backupIncrementalMessageTasks(String backupId, LocalDateTime lastBackupTime) throws IOException {
        List<MessageTask> tasks = messageTaskMapper.selectByUpdateTimeAfter(lastBackupTime);
        if (!tasks.isEmpty()) {
            String fileName = String.format("message_tasks_incr_%s.json", backupId);
            // Write UTF-8 explicitly; FileWriter would fall back to the platform default charset
            try (Writer writer = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8)) {
                writer.write(JsonUtil.toJson(tasks));
            }
            storageService.uploadBackupFile(backupId, fileName);
            log.info("Incremental message task backup finished: count={}, file={}", tasks.size(), fileName);
        }
    }
/**
* 生成备份ID
*/
private String generateBackupId(String type) {
return String.format("%s_%s_%s",
type,
LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")),
UUID.randomUUID().toString().substring(0, 8)
);
}
}
/**
* 备份存储服务
*/
@Service
public class BackupStorageService {
@Value("${backup.local.path:/data/backup}")
private String localBackupPath;
@Value("${backup.remote.enabled:true}")
private boolean remoteBackupEnabled;
@Autowired
private OssClient ossClient;
/**
* 上传备份文件
*/
public void uploadBackupFile(String backupId, String fileName) {
File localFile = new File(fileName);
try {
// 1. 移动到本地备份目录
File backupDir = new File(localBackupPath, backupId);
backupDir.mkdirs();
File targetFile = new File(backupDir, fileName);
Files.move(localFile.toPath(), targetFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
// 2. 压缩文件
File compressedFile = compressFile(targetFile);
// 3. 上传到远程存储
if (remoteBackupEnabled) {
uploadToRemoteStorage(backupId, compressedFile);
}
log.info("备份文件上传完成: backupId={}, file={}", backupId, fileName);
} catch (Exception e) {
log.error("备份文件上传失败: backupId={}, file={}", backupId, fileName, e);
throw new RuntimeException("备份上传失败", e);
}
}
/**
* 压缩文件
*/
private File compressFile(File sourceFile) throws IOException {
File compressedFile = new File(sourceFile.getParent(), sourceFile.getName() + ".gz");
try (FileInputStream fis = new FileInputStream(sourceFile);
FileOutputStream fos = new FileOutputStream(compressedFile);
GZIPOutputStream gzos = new GZIPOutputStream(fos)) {
byte[] buffer = new byte[8192];
int len;
while ((len = fis.read(buffer)) != -1) {
gzos.write(buffer, 0, len);
}
}
// 删除原文件
sourceFile.delete();
return compressedFile;
}
/**
* 上传到远程存储
*/
private void uploadToRemoteStorage(String backupId, File file) {
try {
String objectKey = String.format("backup/%s/%s", backupId, file.getName());
ossClient.putObject("message-center-backup", objectKey, file);
log.info("文件上传到远程存储成功: objectKey={}", objectKey);
} catch (Exception e) {
log.error("文件上传到远程存储失败: file={}", file.getName(), e);
throw new RuntimeException("远程存储上传失败", e);
}
}
}
/**
* 数据恢复服务
*/
@Service
public class DataRecoveryService {
@Autowired
private BackupStorageService storageService;
@Autowired
private MessageTaskMapper messageTaskMapper;
/**
* 恢复数据
*/
public void recoverData(String backupId, RecoveryOptions options) {
log.info("开始数据恢复: backupId={}", backupId);
try {
// 1. 下载备份文件
List<File> backupFiles = downloadBackupFiles(backupId);
// 2. 验证备份完整性
validateBackupIntegrity(backupFiles);
// 3. 恢复数据
if (options.isRecoverMessageTasks()) {
recoverMessageTasks(backupFiles);
}
if (options.isRecoverHistoryData()) {
recoverHistoryData(backupFiles);
}
if (options.isRecoverConfigData()) {
recoverConfigData(backupFiles);
}
log.info("数据恢复完成: backupId={}", backupId);
} catch (Exception e) {
log.error("数据恢复失败: backupId={}", backupId, e);
throw new RuntimeException("数据恢复失败", e);
}
}
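    /**
     * Restore message task data
     */
    private void recoverMessageTasks(List<File> backupFiles) {
        File taskFile = findBackupFile(backupFiles, "message_tasks_");
        if (taskFile != null) {
            // Backup files are gzip-compressed before upload (see compressFile). Assuming
            // downloadBackupFiles returns them as stored, decompress while reading and
            // decode as UTF-8.
            try (Reader reader = new InputStreamReader(
                    new GZIPInputStream(new FileInputStream(taskFile)), StandardCharsets.UTF_8)) {
                List<MessageTask> tasks = JsonUtil.fromJson(reader,
                        new TypeReference<List<MessageTask>>() {});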
// 批量插入
int batchSize = 1000;
for (int i = 0; i < tasks.size(); i += batchSize) {
int end = Math.min(i + batchSize, tasks.size());
List<MessageTask> batch = tasks.subList(i, end);
messageTaskMapper.batchInsert(batch);
}
log.info("消息任务数据恢复完成: count={}", tasks.size());
} catch (Exception e) {
log.error("消息任务数据恢复失败", e);
throw new RuntimeException("消息任务恢复失败", e);
}
}
}
/**
* 查找备份文件
*/
private File findBackupFile(List<File> files, String prefix) {
return files.stream()
.filter(file -> file.getName().startsWith(prefix))
.findFirst()
.orElse(null);
}
}
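🔄 Data Consistency Guarantees
Distributed Transaction Handling
Challenges of distributed transactions:
Consistency challenges:
1. Cross-database transactions: MySQL + MongoDB + Redis
2. Cross-service transactions: message center + business systems
3. Asynchronous processing: eventual consistency through message queues
4. Network partitions: the trade-offs of the CAP theorem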
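When no single transaction can span all of these stores, a common way to reach eventual consistency is to run the steps one by one and, if one fails, undo the completed steps in reverse order. The sketch below shows that compensation pattern in isolation. `SagaDemo`, `Step` and the step names are hypothetical, and the "stores" are just journal entries, so this illustrates the control flow rather than any real integration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class SagaDemo {

    // A unit of work together with the action that undoes it
    interface Step {
        void execute();
        void compensate();
    }

    // Run steps in order; on failure, compensate completed steps newest first
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }

    // Build a step that records what happens to it and optionally fails
    static Step step(String name, boolean fail, List<String> journal) {
        return new Step() {
            @Override
            public void execute() {
                journal.add("execute " + name);
                if (fail) {
                    throw new IllegalStateException(name + " failed");
                }
            }

            @Override
            public void compensate() {
                journal.add("compensate " + name);
            }
        };
    }

    // Two successful steps followed by a failing one
    static List<String> demo() {
        List<String> journal = new ArrayList<>();
        run(Arrays.asList(
                step("mysql", false, journal),
                step("mongo", false, journal),
                step("redis", true, journal)));
        return journal;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The failing step is not compensated because it never completed; only the two earlier steps are undone, newest first. Compensation actions should be idempotent, since a crash during rollback means they may be run again.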
Implementing eventual consistency:
java
/**
* 分布式事务管理器
*/
@Service
public class DistributedTransactionManager {
@Autowired
private TransactionLogService transactionLogService;
@Autowired
private CompensationService compensationService;
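    /**
     * Run a distributed transaction.
     * Deliberately not annotated with @Transactional: a local transaction cannot span
     * MySQL, MongoDB and Redis, and rolling it back on failure would also discard the
     * transaction log records written below. Consistency comes from compensation instead.
     */
    public void executeDistributedTransaction(DistributedTransactionContext context) {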
String transactionId = UUID.randomUUID().toString();
try {
// 1. 记录事务开始
transactionLogService.logTransactionStart(transactionId, context);
// 2. 执行各个步骤
for (TransactionStep step : context.getSteps()) {
executeStep(transactionId, step);
}
// 3. 记录事务成功
transactionLogService.logTransactionSuccess(transactionId);
} catch (Exception e) {
log.error("分布式事务执行失败: transactionId={}", transactionId, e);
// 4. 执行补偿操作
compensationService.compensate(transactionId, context);
// 5. 记录事务失败
transactionLogService.logTransactionFailure(transactionId, e.getMessage());
throw new RuntimeException("分布式事务失败", e);
}
}
/**
* 执行事务步骤
*/
private void executeStep(String transactionId, TransactionStep step) {
try {
// 记录步骤开始
transactionLogService.logStepStart(transactionId, step.getStepId());
// 执行步骤
step.execute();
// 记录步骤成功
transactionLogService.logStepSuccess(transactionId, step.getStepId());
} catch (Exception e) {
// 记录步骤失败
transactionLogService.logStepFailure(transactionId, step.getStepId(), e.getMessage());
throw e;
}
}
}
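📈 Performance Monitoring and Optimization
Storage Performance Monitoring
Key performance indicators:
Metrics to monitor:
1. Database performance:
   - QPS/TPS
   - Response time
   - Connection pool utilization
   - Slow query statistics
2. Cache performance:
   - Hit ratio
   - Memory usage
   - Network latency
   - Hot key distribution
3. Storage capacity:
   - Disk usage
   - Data growth trend
   - Shard balance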
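Two of these indicators are simple ratios that are easy to get subtly wrong (division by zero on a cold start, integer division). A minimal sketch, with hypothetical class and method names:

```java
public class StorageMetrics {

    // Cache hit ratio in [0, 1]; defined as 0 when there has been no traffic yet
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Fraction of the connection pool currently in use
    static double poolUsage(int activeConnections, int maxPoolSize) {
        if (maxPoolSize <= 0) {
            throw new IllegalArgumentException("maxPoolSize must be positive");
        }
        return (double) activeConnections / maxPoolSize;
    }

    public static void main(String[] args) {
        System.out.printf("cache hit ratio: %.1f%%%n", hitRatio(90, 10) * 100);
        System.out.printf("pool usage: %.1f%%%n", poolUsage(15, 20) * 100);
    }
}
```

In a running system the inputs would come from the cache's and the pool's own statistics rather than being computed by hand; the point here is only the definition of each ratio.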
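📝 Summary
In this part, Storage and Reliability, we looked closely at the message center's data storage architecture and its high-availability safeguards:
🎯 Key Takeaways
- Data storage architecture:
  - Hot/cold data separation to balance storage cost and query performance
  - Database and table sharding for horizontal scaling of large data volumes
  - Read/write splitting to raise concurrent throughput
- High-availability design philosophy:
  - The design principles of fault isolation, fast recovery, and graceful degradation
  - Circuit breakers, health checks, and automatic recovery as the implementing mechanisms
  - Layered protection to keep the system running stably
- Message reliability guarantees:
  - Message deduplication to prevent duplicate processing
  - Distributed locks to guarantee mutually exclusive execution
  - Failure retries to raise the processing success rate
- Data backup and recovery:
  - Full and incremental backup strategies
  - Multi-tier storage: local, remote, and cloud
  - A complete recovery flow with integrity verification
🚀 Coming Next
The next part, Operations and Extensibility, will cover:
- Building the monitoring and alerting system
- Practical experience with performance tuning
- Design considerations for system extensibility
- Approaches to operations automation
With this series, you should be able to build a high-performance, highly available, and easily extensible enterprise message center.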