A Detailed Introduction to Spring Batch
Spring Batch is a lightweight, comprehensive batch processing framework designed for building enterprise-grade batch applications. Built on the Spring Framework, it provides reusable functionality for processing large data sets — transaction management, job statistics, restart, skip, and resource management — and is the de facto standard for batch workloads in the Java ecosystem.
I. Core Positioning and Value Proposition
1. Pain Points Addressed
- Batch-processing complexity: no need to build job scheduling, transactions, or restart mechanisms from scratch
- Large data volumes: efficient processing of datasets from millions to billions of records
- Reliability requirements: complete fault-tolerance, retry, and skip strategies
- Monitoring needs: built-in job execution tracking and statistics
- Resource management: avoids out-of-memory failures via paging, cursors, and similar mechanisms
2. Suitable Scenarios
✅ ETL tasks: data extraction, transformation, and loading
✅ Reconciliation: bank end-of-day reconciliation, bill generation
✅ Report generation: monthly/quarterly batch report calculation
✅ Data synchronization: cross-system data sync
✅ Index rebuilding: batch updates of full-text search indexes
✅ Archiving and cleanup: historical data archiving, log cleanup
Unsuitable scenarios:
❌ Real-time/near-real-time processing → use a message queue plus stream processing
❌ Interactive tasks → synchronous, user-triggered operations
❌ Microservice APIs → use Spring MVC/WebFlux
II. Core Concepts and Architecture
1. Domain Model
┌────────────────────────────────────────────────────────┐
│ Job                                                    │
│ - Top level of a batch process, composed of Steps      │
│ - Has a unique name and instance identity              │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ Step                                                   │
│ - Basic unit of execution: read, process, write        │
│ - Two kinds: TaskletStep and chunk-oriented step       │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ Chunk                                                  │
│ - Batch unit of processing; size is set per step       │
│   (commit interval), not a framework default           │
│ - Transaction boundary: one transaction per chunk      │
└────────────────────────────────────────────────────────┘
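The two step types above differ in shape: a chunk-oriented step loops read-process-write, while a TaskletStep runs a single callback once per transaction. A minimal TaskletStep sketch in the same builder style used throughout this article (the `staging_orders` table and 30-day retention rule are made-up examples):

```java
@Bean
public Step cleanupStep(StepBuilderFactory stepBuilderFactory, JdbcTemplate jdbcTemplate) {
    // A TaskletStep performs one unit of work inside a single transaction,
    // instead of the read-process-write chunk loop.
    return stepBuilderFactory.get("cleanupStep")
            .tasklet((contribution, chunkContext) -> {
                int deleted = jdbcTemplate.update(
                        "DELETE FROM staging_orders WHERE created_at < NOW() - INTERVAL 30 DAY");
                contribution.incrementWriteCount(deleted); // reflect work in step statistics
                return RepeatStatus.FINISHED;              // run once and stop
            })
            .build();
}
```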
2. Core Component Hierarchy
```
Job
 ↓
JobInstance (job instance: unique business identity = job name + identifying parameters)
 ↓
JobExecution (job execution: one run attempt, with status and parameters)
 ↓
StepExecution (step execution: runtime state of one Step)
 ↓
ExecutionContext (execution context: data persisted across steps and restarts)
```
III. Core Components in Detail
1. ItemReader (Reading Data)
Reads data item by item from various data sources.
Common implementations:
```java
// JDBC cursor reader (streams rows over one connection; good for large datasets)
@Bean
public JdbcCursorItemReader<User> userReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<User>()
            .name("userReader")
            .dataSource(dataSource)
            .sql("SELECT id, name, email FROM users WHERE status = 'PENDING'")
            .rowMapper((rs, rowNum) -> User.builder()
                    .id(rs.getLong("id"))
                    .name(rs.getString("name"))
                    .email(rs.getString("email"))
                    .build())
            .build();
}

// Paging reader (each page is an independent query; keeps memory use flat)
@Bean
public JdbcPagingItemReader<Product> productReader(DataSource dataSource) {
    return new JdbcPagingItemReaderBuilder<Product>()
            .name("productReader")
            .dataSource(dataSource)
            .selectClause("SELECT id, name, price")
            .fromClause("FROM products")
            .whereClause("WHERE stock < 100")
            .sortKeys(Map.of("id", org.springframework.batch.item.database.Order.ASCENDING))
            .pageSize(1000)
            .rowMapper(new ProductRowMapper())
            .build();
}

// Flat-file reader
@Bean
public FlatFileItemReader<Order> orderReader() {
    return new FlatFileItemReaderBuilder<Order>()
            .name("orderReader")
            .resource(new ClassPathResource("orders.csv"))
            .linesToSkip(1) // skip the header row
            .delimited()
            .names("orderId", "customerId", "amount")
            .fieldSetMapper(fieldSet -> Order.builder()
                    .orderId(fieldSet.readLong("orderId"))
                    .customerId(fieldSet.readString("customerId"))
                    .amount(fieldSet.readBigDecimal("amount"))
                    .build())
            .build();
}
```
2. ItemProcessor (Processing Data)
Transforms, validates, and filters the items that were read.
```java
@Slf4j // Lombok logger (assumed, since the example logs via `log`)
@Component
public class UserProcessor implements ItemProcessor<User, EnrichedUser> {

    @Autowired
    private ExternalApiClient apiClient;

    @Override
    public EnrichedUser process(User user) throws Exception {
        // 1. Enrichment: call an external API for extra information
        UserProfile profile = apiClient.getProfile(user.getId());

        // 2. Validation: returning null filters the item out
        if (!profile.isActive()) {
            log.warn("User {} is inactive, skipping", user.getId());
            return null;
        }

        // 3. Transformation: POJO → DTO
        return EnrichedUser.builder()
                .id(user.getId())
                .name(user.getName())
                .email(user.getEmail())
                .phone(profile.getPhone())
                .loyaltyLevel(profile.getLoyaltyLevel())
                .build();
    }
}
```
3. ItemWriter (Writing Data)
Writes processed items in batches.
```java
// JDBC batch writer
@Bean
public JdbcBatchItemWriter<EnrichedUser> userWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<EnrichedUser>()
            .itemSqlParameterSourceProvider(user -> new MapSqlParameterSource()
                    .addValue("id", user.getId())
                    .addValue("name", user.getName())
                    .addValue("email", user.getEmail())
                    .addValue("phone", user.getPhone())
                    .addValue("loyaltyLevel", user.getLoyaltyLevel().name()))
            .sql("INSERT INTO enriched_users (id, name, email, phone, loyalty_level) " +
                 "VALUES (:id, :name, :email, :phone, :loyaltyLevel)")
            .dataSource(dataSource)
            .build();
}

// Composite writer (fan out to multiple destinations)
@Bean
public CompositeItemWriter<Report> compositeWriter(
        JdbcBatchItemWriter<Report> dbWriter,
        FlatFileItemWriter<Report> fileWriter
) {
    List<ItemWriter<? super Report>> writers = new ArrayList<>();
    writers.add(dbWriter);
    writers.add(fileWriter);
    CompositeItemWriter<Report> compositeWriter = new CompositeItemWriter<>();
    compositeWriter.setDelegates(writers);
    return compositeWriter;
}
```
IV. Three Ways to Configure Jobs
1. Java Config (modern, recommended)
Note: the `JobBuilderFactory`/`StepBuilderFactory` shown here are Spring Batch 4 style; Spring Batch 5 deprecates them in favor of `JobBuilder`/`StepBuilder` constructed with a `JobRepository`.
```java
@Configuration
@EnableBatchProcessing // enable batch infrastructure
public class BatchConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job importUserJob(
            JobCompletionNotificationListener listener,
            Step importUserStep
    ) {
        return jobBuilderFactory.get("importUserJob")
                .incrementer(new RunIdIncrementer()) // increments run.id so each launch gets a new JobInstance
                .listener(listener) // job lifecycle listener
                .start(importUserStep)
                .build();
    }

    @Bean
    public Step importUserStep(
            ItemReader<User> reader,
            ItemProcessor<User, EnrichedUser> processor,
            ItemWriter<EnrichedUser> writer,
            PlatformTransactionManager transactionManager
    ) {
        return stepBuilderFactory.get("importUserStep")
                .<User, EnrichedUser>chunk(100) // chunk size
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .transactionManager(transactionManager)
                .faultTolerant() // enable fault tolerance
                .skip(FlatFileParseException.class) // skip parse failures
                .skipLimit(10) // skip at most 10 items
                .retry(DeadlockLoserDataAccessException.class) // retry on deadlock
                .retryLimit(3)
                .taskExecutor(taskExecutor()) // multi-threaded execution
                .throttleLimit(5) // max concurrent threads
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        return executor;
    }
}
```
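The JobCompletionNotificationListener wired into importUserJob above is never defined in this article. A plausible minimal version (the logging choices are assumptions) implements JobExecutionListener:

```java
@Component
public class JobCompletionNotificationListener implements JobExecutionListener {

    private static final Logger log =
            LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    @Override
    public void beforeJob(JobExecution jobExecution) {
        log.info("Job {} starting", jobExecution.getJobInstance().getJobName());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("Job {} completed, exit status: {}",
                    jobExecution.getJobInstance().getJobName(),
                    jobExecution.getExitStatus().getExitCode());
        } else if (jobExecution.getStatus() == BatchStatus.FAILED) {
            log.error("Job failed: {}", jobExecution.getAllFailureExceptions());
        }
    }
}
```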
2. Annotation Style (simplified)
```java
@SpringBootApplication
@EnableBatchProcessing
public class AnnotationBatchApp {

    @Autowired
    private JobRepository jobRepository;
    @Autowired
    private PlatformTransactionManager transactionManager;

    public static void main(String[] args) {
        SpringApplication.run(AnnotationBatchApp.class, args);
    }

    @Bean
    public Job myJob(JobBuilderFactory jobs, Step myStep) {
        return jobs.get("myJob")
                .start(myStep)
                .build();
    }

    @Bean
    public Step myStep(StepBuilderFactory steps,
                       DataSource dataSource) {
        return steps.get("myStep")
                .<User, EnrichedUser>chunk(50)
                .reader(reader(dataSource))   // reader()/processor()/writer() are
                .processor(processor())       // @Bean methods omitted here for brevity
                .writer(writer(dataSource))
                .transactionManager(transactionManager)
                .build();
    }
}
```
3. XML Style (legacy)
```xml
<job id="importUserJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="importUserStep">
        <tasklet>
            <chunk reader="userReader"
                   processor="userProcessor"
                   writer="userWriter"
                   commit-interval="100"/>
        </tasklet>
    </step>
</job>
```
V. Launching and Scheduling Jobs
1. Manual Trigger
```java
@RestController
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job importUserJob;

    @PostMapping("/jobs/import-users")
    public ResponseEntity<String> startJob(@RequestBody JobParams params) {
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("timestamp", System.currentTimeMillis()) // makes the JobInstance unique
                .addString("input.file", params.getFilePath())
                .addLong("chunk.size", params.getChunkSize())
                .toJobParameters();
        try {
            JobExecution execution = jobLauncher.run(importUserJob, jobParameters);
            return ResponseEntity.ok("Job started, execution ID: " + execution.getId());
        } catch (JobExecutionAlreadyRunningException e) {
            return ResponseEntity.status(HttpStatus.CONFLICT).body("Job is already running");
        } catch (JobExecutionException e) { // restart / invalid-parameter / completed-instance errors
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage());
        }
    }
}
```
2. Scheduled Execution (Spring Scheduler)
```java
@Component // requires @EnableScheduling on a configuration class
public class ScheduledJobLauncher {

    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job reportJob;

    // Runs every day at 02:00
    @Scheduled(cron = "0 0 2 * * ?")
    public void runDailyReport() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("date", LocalDate.now().toString())
                .toJobParameters();
        jobLauncher.run(reportJob, params);
    }
}
```
3. Quartz Integration (enterprise scheduling)
```java
@Configuration
public class QuartzConfig {

    @Bean
    public JobDetail batchJobDetail() {
        return JobBuilder.newJob(SpringBatchJob.class)
                .withIdentity("batchJob")
                .storeDurably()
                .build();
    }

    @Bean
    public Trigger batchJobTrigger() {
        return TriggerBuilder.newTrigger()
                .forJob(batchJobDetail())
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/30 * * * ?")) // every 30 minutes
                .build();
    }
}
```
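The SpringBatchJob class referenced by the JobDetail above is not shown. A common pattern (sketched here; the `reportJob` bean name is an assumption) is a QuartzJobBean that delegates to the JobLauncher:

```java
public class SpringBatchJob extends QuartzJobBean {

    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job reportJob; // wire whichever batch Job this trigger should run

    @Override
    protected void executeInternal(JobExecutionContext context) throws JobExecutionException {
        try {
            // A fresh timestamp parameter creates a new JobInstance on every firing
            JobParameters params = new JobParametersBuilder()
                    .addLong("timestamp", System.currentTimeMillis())
                    .toJobParameters();
            jobLauncher.run(reportJob, params);
        } catch (Exception e) {
            throw new JobExecutionException(e);
        }
    }
}
```

Note that injecting Spring beans into a Quartz job requires a Spring-aware JobFactory; Spring Boot's Quartz starter configures this automatically.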
VI. Advanced Features
1. Parallel Processing (Multi-threaded Step)
```java
@Bean
public Step parallelStep() {
    return stepBuilderFactory.get("parallelStep")
            .<User, EnrichedUser>chunk(100)
            .reader(reader()) // note: the reader must be thread-safe in a multi-threaded step
            .processor(processor())
            .writer(writer())
            .taskExecutor(new SimpleAsyncTaskExecutor()) // simple async execution
            .throttleLimit(10) // max concurrency
            .build();
}
```
2. Parallel Steps (Split Flow)
```java
@Bean
public Job parallelJob() {
    return jobBuilderFactory.get("parallelJob")
            .start(step1())
            .split(new SimpleAsyncTaskExecutor()) // run the following flows concurrently
            .add(
                    new FlowBuilder<SimpleFlow>("flow1").start(step2()).build(),
                    new FlowBuilder<SimpleFlow>("flow2").start(step3()).build()
            )
            .next(step4()) // runs after all parallel flows complete
            .build()
            .build();
}
```
3. Partitioning (Local and Remote)
```java
// Manager (master) configuration
@Bean
public Step masterStep(Step workerStep, Partitioner partitioner) {
    return stepBuilderFactory.get("masterStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)   // local partitioning; true remote partitioning replaces this
                                // with a PartitionHandler over messaging (Spring Batch Integration)
            .gridSize(5)        // number of partitions
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}

// Worker configuration (in remote partitioning this lives in a separate application)
@Bean
public Step workerStep() {
    return stepBuilderFactory.get("workerStep")
            .<User, EnrichedUser>chunk(100)
            .reader(itemReader()) // typically @StepScope, reading its partition's key range
            .writer(itemWriter())
            .build();
}
```
4. Skip and Retry Strategies
```java
@Bean
public Step faultTolerantStep() {
    return stepBuilderFactory.get("faultTolerantStep")
            .<User, EnrichedUser>chunk(100)
            .reader(reader())
            .writer(writer())
            .faultTolerant()
            // retry policy
            .retry(DeadlockLoserDataAccessException.class)
            .retryLimit(3)
            .backOffPolicy(new ExponentialBackOffPolicy())
            // skip policy (SkippableException/FatalException are application-defined)
            .skip(SkippableException.class)
            .skipLimit(10)
            .noSkip(FatalException.class) // never skip fatal errors
            // listener
            .listener(new CustomSkipListener())
            .build();
}

@Slf4j // Lombok logger (assumed)
@Component
public class CustomSkipListener implements SkipListener<User, EnrichedUser> {
    @Override
    public void onSkipInRead(Throwable t) {
        log.error("Skipped during read: {}", t.getMessage());
    }
    @Override
    public void onSkipInProcess(User item, Throwable t) {
        log.error("Skipped {} during processing: {}", item.getId(), t.getMessage());
    }
    @Override
    public void onSkipInWrite(EnrichedUser item, Throwable t) {
        log.error("Skipped {} during write: {}", item.getId(), t.getMessage());
    }
}
```
5. Job Restart and State Management
```java
// Restart behavior
@Bean
public Job restartableJob() {
    return jobBuilderFactory.get("restartableJob")
            .preventRestart() // forbid restart; jobs are restartable by default
            .start(step1())
            .build();
}

// To resume a failed job from its last checkpoint, launch it again with the
// same identifying parameters. With Spring Boot:
//   java -jar batch-app.jar --spring.batch.job.name=importUserJob
// (Boot 2.x uses the property spring.batch.job.names)
```
6. ExecutionContext
```java
@Component
@StepScope
public class StatefulProcessor implements ItemProcessor<User, EnrichedUser> {

    @Value("#{stepExecution.stepName}")
    private String stepName;

    @Value("#{stepExecution.jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public EnrichedUser process(User user) throws Exception {
        // Read state from the context (survives restarts once persisted)
        int processedCount = executionContext.getInt("processedCount", 0);

        EnrichedUser enrichedUser = enrich(user); // enrich(...) stands in for the mapping logic

        // Update the context (shared across steps via the JobExecution)
        executionContext.putInt("processedCount", processedCount + 1);
        return enrichedUser;
    }
}
```
VII. Monitoring and Management
1. JobExplorer (querying execution history)
```java
@Service
public class JobMonitoringService {

    @Autowired
    private JobExplorer jobExplorer;

    public List<String> getFailedJobs() {
        return jobExplorer.getJobNames().stream()
                .flatMap(name -> jobExplorer.getJobInstances(name, 0, 100).stream())
                .map(jobExplorer::getJobExecutions)
                .flatMap(List::stream)
                .filter(exec -> exec.getStatus() == BatchStatus.FAILED)
                .map(exec -> String.format("%s: %s",
                        exec.getJobInstance().getJobName(), exec.getExitStatus()))
                .collect(Collectors.toList());
    }
}
```
2. JobOperator (managing jobs)
```java
@RestController
public class JobManagementController {

    @Autowired
    private JobOperator jobOperator;

    // Stop a running execution
    @PostMapping("/jobs/{executionId}/stop")
    public String stopJob(@PathVariable Long executionId) throws Exception {
        jobOperator.stop(executionId);
        return "Stop signal sent";
    }

    // Restart a failed execution
    @PostMapping("/jobs/{executionId}/restart")
    public String restartJob(@PathVariable Long executionId) throws Exception {
        Long newExecutionId = jobOperator.restart(executionId);
        return "Restarted, new execution ID: " + newExecutionId;
    }

    // Execution summary
    @GetMapping("/jobs/{executionId}/summary")
    public String getSummary(@PathVariable Long executionId) throws Exception {
        return jobOperator.getSummary(executionId);
    }
}
```
3. Spring Batch Admin (archived)
Modern alternatives: Spring Cloud Data Flow or a custom monitoring UI.
VIII. Deep Integration with Spring Boot
1. Auto-configuration
```java
@SpringBootApplication
// With Spring Boot 2.x, @EnableBatchProcessing provides JobBuilderFactory and
// StepBuilderFactory. With Spring Boot 3.x it is usually omitted: adding it
// switches off Boot's batch auto-configuration.
@EnableBatchProcessing
public class BatchApplication {
    public static void main(String[] args) {
        SpringApplication.run(BatchApplication.class, args);
    }
}
```
2. application.yml Configuration
```yaml
spring:
  batch:
    job:
      enabled: false # do not run jobs automatically at startup
    jdbc: # Boot 2.5+; earlier versions use spring.batch.* directly
      initialize-schema: always # auto-create the BATCH_* metadata tables
      schema: classpath:org/springframework/batch/core/schema-mysql.sql
      isolation-level-for-create: SERIALIZABLE
  datasource:
    url: jdbc:mysql://localhost:3306/batch_db
    username: batch_user
    password: batch_pass
```
3. Production Configuration
```yaml
# Production tuning
spring:
  batch:
    jdbc:
      initialize-schema: never # create the metadata tables manually
    job:
      enabled: false
  # Dedicated metadata datasource, isolated from business transactions
  datasource:
    url: jdbc:mysql://localhost:3306/batch_metadata
    hikari:
      maximum-pool-size: 10

# Business datasource (bound via a custom @ConfigurationProperties bean)
app:
  datasource:
    url: jdbc:mysql://localhost:3306/business_db
```
IX. Performance Optimization Best Practices
1. Tuning Chunk Size
```java
// Match the chunk size to per-item cost: fast items tolerate larger
// chunks (fewer commits); slow or memory-heavy items need smaller ones.
.chunk(itemIsFast ? 1000 : 100) // itemIsFast: processing time well under ~10 ms per item
```
2. Prefer Paging to Cursors for Very Large Datasets
A cursor reader streams rows over one long-lived connection and suits datasets up to roughly a million rows. A paging reader issues an independent query per page, so memory use stays flat even for very large tables, and it is also the safe choice for multi-threaded steps (cursor readers are not thread-safe).
3. Avoid JPA Cache Pollution
```java
// Disable the second-level cache for batch processing
@Bean
public LocalContainerEntityManagerFactoryBean batchEntityManagerFactory(
        DataSource dataSource
) {
    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setDataSource(dataSource);
    em.setPackagesToScan("com.example.domain");
    em.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    Properties props = new Properties();
    props.put("hibernate.cache.use_second_level_cache", "false");
    props.put("hibernate.jdbc.batch_size", "50"); // batch JDBC statements
    em.setJpaProperties(props);
    return em;
}
```
4. Multi-threaded Partitioning
```java
@Bean
public Step partitionedStep(
        Partitioner partitioner,
        Step workerStep
) {
    return stepBuilderFactory.get("partitionedStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)
            .gridSize(Runtime.getRuntime().availableProcessors() * 2) // partitions per CPU
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}
```
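Both masterStep and partitionedStep above reference a Partitioner that is never shown. Its core job is to split the key range into gridSize sub-ranges, which each worker then reads independently. The splitting logic itself is plain Java and can be sketched (independent of Spring types) as:

```java
public class RangePartitionPlanner {

    /**
     * Split the inclusive id range [min, max] into gridSize contiguous
     * sub-ranges. A Partitioner implementation would store each pair in an
     * ExecutionContext under keys such as "minId"/"maxId" for its worker.
     */
    public static long[][] splitRange(long min, long max, int gridSize) {
        long total = max - min + 1;
        long base = total / gridSize;      // minimum size of each partition
        long remainder = total % gridSize; // first `remainder` partitions get one extra row
        long[][] partitions = new long[gridSize][2];
        long start = min;
        for (int i = 0; i < gridSize; i++) {
            long length = base + (i < remainder ? 1 : 0);
            partitions[i][0] = start;
            partitions[i][1] = start + length - 1;
            start += length;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Splitting ids 1..100 into 4 partitions yields 1-25, 26-50, 51-75, 76-100
        for (long[] p : splitRange(1, 100, 4)) {
            System.out.println(p[0] + " - " + p[1]);
        }
    }
}
```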
5. Memory Management
```java
// Use a cursor reader with an explicit fetch size to avoid loading everything at once
jdbcCursorItemReader.setFetchSize(1000);

// Remove large entries from the execution context once they are no longer needed
stepExecution.getExecutionContext().remove("largeDataList");
```
X. A Typical Application Example
Scenario: daily order reconciliation
```java
@Configuration
public class ReconciliationBatchConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job dailyReconciliationJob(
            Step extractOrdersStep,
            Step reconcileStep,
            Step generateReportStep
    ) {
        return jobBuilderFactory.get("dailyReconciliationJob")
                .incrementer(new RunIdIncrementer())
                .listener(new JobResultListener())
                .flow(extractOrdersStep)
                .next(reconcileStep)
                .next(generateReportStep)
                .end()
                .build();
    }

    @Bean
    public Step extractOrdersStep(
            JdbcCursorItemReader<Order> orderReader,
            JdbcBatchItemWriter<Order> orderWriter
    ) {
        return stepBuilderFactory.get("extractOrdersStep")
                .<Order, Order>chunk(500)
                .reader(orderReader)
                .writer(orderWriter)
                .listener(new OrderReadListener())
                .build();
    }

    @Bean
    public Step reconcileStep(
            JdbcCursorItemReader<Transaction> transactionReader,
            ItemProcessor<Transaction, Reconciliation> reconciler,
            JdbcBatchItemWriter<Reconciliation> writer
    ) {
        return stepBuilderFactory.get("reconcileStep")
                .<Transaction, Reconciliation>chunk(200)
                .reader(transactionReader)
                .processor(reconciler)
                .writer(writer)
                .faultTolerant()
                .skip(ReconciliationException.class)
                .skipLimit(50)
                .build();
    }

    @Bean
    public Step generateReportStep(
            FlatFileItemWriter<Reconciliation> reportWriter
    ) {
        return stepBuilderFactory.get("generateReportStep")
                .tasklet((contribution, chunkContext) -> {
                    // Write the reconciliation report file. fetchReconciliations(...)
                    // is a placeholder for a query over reconcileStep's output; a
                    // FlatFileItemWriter used inside a tasklet must be opened and
                    // closed manually.
                    reportWriter.open(new ExecutionContext());
                    try {
                        reportWriter.write(fetchReconciliations(chunkContext.getStepContext()
                                .getStepExecution().getJobExecutionId()));
                    } finally {
                        reportWriter.close();
                    }
                    return RepeatStatus.FINISHED;
                })
                .build();
    }
}
```
XI. Summary
Spring Batch vs. Alternatives
| Aspect | Spring Batch | Quartz | Spring Cloud Task | Flink/Spark |
|---|---|---|---|---|
| Positioning | large-volume batch processing | job scheduling | short-lived task orchestration | stream/batch compute frameworks |
| Transaction management | comprehensive | none | basic | complex |
| Restart/recovery | built-in | none | simple | supported |
| Learning curve | moderate | easy | easy | steep |
| Data scale | millions to billions of records | n/a | small tasks | TB/PB scale |
| Ecosystem | seamless Spring integration | standalone | Spring Cloud | standalone |
Design Philosophy
- Declarative configuration: job flows defined via Java Config
- Chunk-oriented processing: small commits, memory-friendly
- Persistent state: metadata-driven, supports resuming from the point of failure
- Transparent monitoring: execution history and statistics recorded automatically
- Extensible architecture: readers, processors, and writers compose freely
Modern Practice
✅ Spring Boot 3.x + Java 17: GraalVM native compilation for faster startup
✅ Spring Cloud Data Flow: visual orchestration of batch workflows
✅ K8s Job/CronJob: container-orchestrated batch scheduling
✅ Istio integration: sidecar-provided observability and secure communication
With its robustness and enterprise-grade feature set, Spring Batch remains a cornerstone of batch processing in the Java world and the framework of choice for core business batch workloads.