1. Problem Description
When using Spring Batch partitioning (partition) with a skip policy configured, we found that if the transaction throws an exception while writing data, the framework automatically opens a new transaction and retries once.
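For context, here is a minimal sketch of the kind of worker-step configuration that exhibits this behavior, using Spring Batch's Java builder API. MyItem, the bean names, and the numbers are placeholders, not taken from the original project (the XML equivalent is skip-policy / skippable-exception-classes on the chunk element):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Minimal sketch; MyItem and all names/numbers are placeholders.
@Configuration
public class WorkerStepConfig {

    @Bean
    public Step workerStep(StepBuilderFactory steps,
                           ItemReader<MyItem> reader,
                           ItemWriter<MyItem> writer) {
        return steps.get("workerStep")
                .<MyItem, MyItem>chunk(100)      // chunkSize: records per transaction
                .reader(reader)
                .writer(writer)
                .faultTolerant()                 // step now uses FaultTolerantChunkProcessor
                .skip(RuntimeException.class)    // the skippable exception class
                .skipLimit(10)                   // the skip limit
                .build();
    }
}
```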
2. Root Cause
1. When the tasklet executes, the chunkProcessor used when a skip policy is configured and the one used when it is not are not the same instance; the choice is wired up once, when the service starts (see the schematic sketch below).
2. A chunkProcessor with a skip policy configured will always retry at least once, no matter which exceptions are declared skippable or how large the skip limit is, because that is simply how the code is implemented; a source-code walkthrough follows.
With a skip policy configured: (screenshot)
Without a skip policy: (screenshot)
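The divergence happens at step-building time, not at run time. A schematic sketch of that decision (a paraphrase of what the step builders do, not the literal Spring Batch source; skipPolicyConfigured is a stand-in for the builder's internal state):

```java
// Schematic paraphrase: a plain chunk step wires a SimpleChunkProcessor, while a
// fault-tolerant step wires a FaultTolerantChunkProcessor. Both are created once,
// at startup, when the step bean is built.
ChunkProcessor<I> chunkProcessor;
if (skipPolicyConfigured) {                        // e.g. .faultTolerant().skip(...)
    chunkProcessor = new FaultTolerantChunkProcessor<>(
            getProcessor(), getWriter(), batchRetryTemplate);  // retry/skip aware
}
else {
    chunkProcessor = new SimpleChunkProcessor<>(
            getProcessor(), getWriter());          // exceptions propagate immediately
}
```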
3. Source-Code Analysis of the chunkProcessor with a Skip Policy
3.1 A Brief Introduction to partition Usage
1. partition is essentially a data splitter layered on top of the reader and writer. You implement org.springframework.batch.core.partition.support.Partitioner and override its partition method; gridSize bounds how many groups the data is split into. The Partitioner can be custom-written or the default one can be used (a sketch follows this list).
2. In addition, the chunk-completion-policy on the chunk configuration controls how many records one transaction processes, i.e. the chunkSize.
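A minimal custom Partitioner sketch (the class name and record count are assumptions for illustration): it slices an ID range into gridSize execution contexts, each of which drives one worker step execution.

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Minimal sketch: split the ID range [0, total) into gridSize slices; each slice
// becomes one worker step execution with its own ExecutionContext.
public class RangePartitioner implements Partitioner {

    private final long total = 1_000_000L;   // assumed total record count

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long sliceSize = (total + gridSize - 1) / gridSize;   // ceiling division
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("minId", i * sliceSize);
            ctx.putLong("maxId", Math.min((i + 1) * sliceSize, total));
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}
```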
3.2 Source-Code Analysis of FaultTolerantChunkProcessor
Once the data has been partitioned, execution reaches the RetryTemplate.doExecute method, shown below:
```java
protected <T, E extends Throwable> T doExecute(RetryCallback<T, E> retryCallback,
        RecoveryCallback<T> recoveryCallback, RetryState state)
        throws E, ExhaustedRetryException {

    RetryPolicy retryPolicy = this.retryPolicy;
    BackOffPolicy backOffPolicy = this.backOffPolicy;

    // Allow the retry policy to initialise itself...
    RetryContext context = open(retryPolicy, state);
    if (this.logger.isTraceEnabled()) {
        this.logger.trace("RetryContext retrieved: " + context);
    }

    // Make sure the context is available globally for clients who need
    // it...
    RetrySynchronizationManager.register(context);

    Throwable lastException = null;
    boolean exhausted = false;
    try {
        // Give clients a chance to enhance the context...
        boolean running = doOpenInterceptors(retryCallback, context);
        if (!running) {
            throw new TerminatedRetryException(
                    "Retry terminated abnormally by interceptor before first attempt");
        }

        // Get or Start the backoff context...
        BackOffContext backOffContext = null;
        Object resource = context.getAttribute("backOffContext");
        if (resource instanceof BackOffContext) {
            backOffContext = (BackOffContext) resource;
        }
        if (backOffContext == null) {
            backOffContext = backOffPolicy.start(context);
            if (backOffContext != null) {
                context.setAttribute("backOffContext", backOffContext);
            }
        }

        /*
         * We allow the whole loop to be skipped if the policy or context already
         * forbid the first try. This is used in the case of external retry to allow a
         * recovery in handleRetryExhausted without the callback processing (which
         * would throw an exception).
         */
        while (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
            try {
                if (this.logger.isDebugEnabled()) {
                    this.logger.debug("Retry: count=" + context.getRetryCount());
                }
                // Reset the last exception, so if we are successful
                // the close interceptors will not think we failed...
                lastException = null;
                return retryCallback.doWithRetry(context);
            }
            catch (Throwable e) {
                lastException = e;
                try {
                    registerThrowable(retryPolicy, state, context, e);
                }
                catch (Exception ex) {
                    throw new TerminatedRetryException("Could not register throwable",
                            ex);
                }
                finally {
                    doOnErrorInterceptors(retryCallback, context, e);
                }
                if (canRetry(retryPolicy, context) && !context.isExhaustedOnly()) {
                    try {
                        backOffPolicy.backOff(backOffContext);
                    }
                    catch (BackOffInterruptedException ex) {
                        lastException = e;
                        // back off was prevented by another thread - fail the retry
                        if (this.logger.isDebugEnabled()) {
                            this.logger
                                    .debug("Abort retry because interrupted: count="
                                            + context.getRetryCount());
                        }
                        throw ex;
                    }
                }
                if (this.logger.isDebugEnabled()) {
                    this.logger.debug(
                            "Checking for rethrow: count=" + context.getRetryCount());
                }
                if (shouldRethrow(retryPolicy, context, state)) {
                    if (this.logger.isDebugEnabled()) {
                        this.logger.debug("Rethrow in retry for policy: count="
                                + context.getRetryCount());
                    }
                    throw RetryTemplate.<E>wrapIfNecessary(e);
                }
            }

            /*
             * A stateful attempt that can retry may rethrow the exception before now,
             * but if we get this far in a stateful retry there's a reason for it,
             * like a circuit breaker or a rollback classifier.
             */
            if (state != null && context.hasAttribute(GLOBAL_STATE)) {
                break;
            }
        }

        if (state == null && this.logger.isDebugEnabled()) {
            this.logger.debug(
                    "Retry failed last attempt: count=" + context.getRetryCount());
        }
        exhausted = true;
        return handleRetryExhausted(recoveryCallback, context, state);
    }
    catch (Throwable e) {
        throw RetryTemplate.<E>wrapIfNecessary(e);
    }
    finally {
        close(retryPolicy, context, state, lastException == null || exhausted);
        doCloseInterceptors(retryCallback, context, lastException);
        RetrySynchronizationManager.clear();
    }
}
```
On the first pass over the data, execution takes the line **return retryCallback.doWithRetry(context);**. This invokes the doWithRetry method of the retryCallback defined in FaultTolerantChunkProcessor's write method; the doWrite call inside it is what ultimately invokes our consumer's consumeData method.
That first call throws an exception, but the exception is caught, so RetryTemplate.doExecute keeps going and reaches **return handleRetryExhausted(recoveryCallback, context, state);**, which invokes the recover method of the recoveryCallback defined in the same write method.
Inside recover, the scan method contains the second invocation of the writer.
When the second call also fails, the exception is caught again, but the loop then makes one more pass with a freshly built chunk whose inputs and outputs are both null; this pass is effectively a no-op and returns at a later check.
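This first-attempt-then-recover flow can be reproduced in isolation with a plain RetryTemplate (a self-contained sketch, not Spring Batch code): with a policy allowing a single attempt, the callback runs once, and on failure doExecute falls through to handleRetryExhausted, which invokes the recovery callback, mirroring write's retryCallback and recoveryCallback pair.

```java
import org.springframework.retry.RecoveryCallback;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryFlowDemo {

    public static void main(String[] args) {
        RetryTemplate template = new RetryTemplate();
        // maxAttempts = 1: the callback gets exactly one try, like the writer path
        template.setRetryPolicy(new SimpleRetryPolicy(1));

        RetryCallback<String, IllegalStateException> work = ctx -> {
            System.out.println("first attempt (retryCallback), count=" + ctx.getRetryCount());
            throw new IllegalStateException("writer failed");   // simulated write failure
        };
        RecoveryCallback<String> recover = ctx -> {
            // reached via handleRetryExhausted -- the analogue of the scan/recover path
            System.out.println("recover (recoveryCallback), count=" + ctx.getRetryCount());
            return "recovered";
        };

        System.out.println(template.execute(work, recover));   // prints "recovered"
    }
}
```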
Several flags are consulted along the way, for example:
1. The scanning attribute of the UserData inner class in FaultTolerantChunkProcessor records whether the chunk's user data has already been scanned: true means it has been scanned, false means it has not yet been scanned.
2. In org.springframework.batch.core.step.item.Chunk, the busy attribute is a boolean indicating whether the chunk is currently being built: true means it is still being built, false means building has completed.
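Chunk and its busy flag are public API, so the second flag can be observed directly (a toy illustration, unrelated to any real job):

```java
import org.springframework.batch.core.step.item.Chunk;

// Toy illustration of the Chunk busy flag: true while the chunk is being
// assembled, false once it is complete and ready to be written.
public class ChunkBusyDemo {

    public static void main(String[] args) {
        Chunk<String> chunk = new Chunk<>();
        chunk.setBusy(true);            // building started
        chunk.add("record-1");
        chunk.add("record-2");
        chunk.setBusy(false);           // building finished
        System.out.println("busy=" + chunk.isBusy() + ", items=" + chunk.getItems());
    }
}
```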
4. Conclusion
When using partition, as soon as a skip policy is configured, any exception thrown on the first write attempt causes the chunk to be reprocessed in a new transaction. To avoid this, remove the skip policy; a failure during data processing then propagates immediately and the batch job stops.
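Concretely, under the same assumed names as the sketch in section 1, dropping the fault-tolerant settings restores fail-fast behavior:

```java
// Same worker step as the earlier sketch, minus the skip policy: a writer
// exception now propagates and fails the step on the first attempt.
return steps.get("workerStep")
        .<MyItem, MyItem>chunk(100)
        .reader(reader)
        .writer(writer)
        .build();               // no faultTolerant()/skip(...): no second transaction
```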
Breakpoint screenshots: (not reproduced here)