Problem description
An index-out-of-bounds error suddenly appeared in production. The log is shown below.
java
java.lang.IndexOutOfBoundsException: Index -3 out of bounds for length 5
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.get(ArrayList.java:459)
at org.springframework.cloud.loadbalancer.core.RoundRobinLoadBalancer.getInstanceResponse(RoundRobinLoadBalancer.java:104)
at org.springframework.cloud.loadbalancer.core.RoundRobinLoadBalancer.processInstanceResponse(RoundRobinLoadBalancer.java:87)
at org.springframework.cloud.loadbalancer.core.RoundRobinLoadBalancer.lambda$choose$0(RoundRobinLoadBalancer.java:82)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:98)
at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:44)
at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drainAsync(FluxFlattenIterable.java:421)
at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drain(FluxFlattenIterable.java:686)
at reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.onNext(FluxFlattenIterable.java:250)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:128)
at reactor.core.publisher.DrainUtils.postCompleteDrain(DrainUtils.java:132)
at reactor.core.publisher.DrainUtils.postComplete(DrainUtils.java:187)
at reactor.core.publisher.FluxMaterialize$MaterializeSubscriber.onComplete(FluxMaterialize.java:141)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)
at reactor.core.publisher.FluxMaterialize$MaterializeSubscriber.request(FluxMaterialize.java:148)
at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onSubscribe(MonoCollectList.java:79)
at reactor.core.publisher.FluxMaterialize$MaterializeSubscriber.onSubscribe(FluxMaterialize.java:103)
at reactor.core.publisher.FluxJust.subscribe(FluxJust.java:68)
at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:54)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.Mono.block(Mono.java:1706)
at org.springframework.cloud.loadbalancer.blocking.client.BlockingLoadBalancerClient.choose(BlockingLoadBalancerClient.java:155)
at org.springframework.cloud.openfeign.loadbalancer.FeignBlockingLoadBalancerClient.execute(FeignBlockingLoadBalancerClient.java:97)
at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:119)
at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:89)
at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:100)
at org.springframework.cloud.openfeign.FeignCachingInvocationHandlerFactory$1.proceed(FeignCachingInvocationHandlerFactory.java:66)
at org.springframework.cache.interceptor.CacheInterceptor.lambda$invoke$0(CacheInterceptor.java:54)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:351)
at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:64)
at org.springframework.cloud.openfeign.FeignCachingInvocationHandlerFactory.lambda$create$1(FeignCachingInvocationHandlerFactory.java:53)
Problem analysis
The relevant source code is shown below.
java
public class RoundRobinLoadBalancer implements ReactorServiceInstanceLoadBalancer {
    final AtomicInteger position;
    public RoundRobinLoadBalancer(ObjectProvider<ServiceInstanceListSupplier> serviceInstanceListSupplierProvider,
            String serviceId) {
        this(serviceInstanceListSupplierProvider, serviceId, new Random().nextInt(1000));
    }
    ...
    private Response<ServiceInstance> getInstanceResponse(List<ServiceInstance> instances) {
        if (instances.isEmpty()) {
            if (log.isWarnEnabled()) {
                log.warn("No servers available for service: " + serviceId);
            }
            return new EmptyResponse();
        }
        // TODO: enforce order?
        int pos = Math.abs(this.position.incrementAndGet());
        // Line 104 from the stack trace: this is where the exception is thrown
        ServiceInstance instance = instances.get(pos % instances.size());
        return new DefaultResponse(instance);
    }
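The immediate cause is that pos % instances.size() evaluated to a negative number: -3.
position is an AtomicInteger that is initialized by default to a random number below 1000. An int that overflows wraps around to a negative value, but the code takes the absolute value first, so how can the result still be negative?
The only remaining suspect is Math.abs. Its source is shown below. The code is very simple: for a negative argument it returns -a, with no guard against overflow.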
java
public static int abs(int a) {
    return (a < 0) ? -a : a;
}
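The edge case can be checked directly. The sketch below is only an illustration (the class name AbsEdgeCase is made up for this example): it copies the expression from the source above and shows that negating Integer.MIN_VALUE wraps back to the same value, so the "absolute value" stays negative for exactly this one input.

```java
final class AbsEdgeCase {
    // Same expression as the Math.abs(int) source quoted above.
    static int abs(int a) {
        return (a < 0) ? -a : a;
    }

    public static void main(String[] args) {
        int min = Integer.MIN_VALUE; // -2147483648
        // +2147483648 does not fit in an int, so the negation wraps around.
        System.out.println(-min == min);          // true
        System.out.println(Math.abs(min));        // -2147483648
        // Every other negative int has a representable absolute value.
        System.out.println(abs(min + 1));         // 2147483647
        // Widening to long before taking the absolute value avoids the overflow.
        System.out.println(Math.abs((long) min)); // 2147483648
    }
}
```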
Reproducing the problem
java
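import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

public class AbsOverflowRepro {
    public static void main(String[] args) {
        // Same starting point as RoundRobinLoadBalancer: a random seed below 1000.
        AtomicInteger num = new AtomicInteger(new Random().nextInt(1000));
        int numStart = num.get();
        for (long i = 0; i < Long.MAX_VALUE; i++) {
            int pos = Math.abs(num.incrementAndGet());
            // Stop at the first increment for which the "absolute value" is negative.
            if (pos < 0) {
                System.out.printf("numStart = %d, i = %d pos = %d pos %% 5 = %d%n", numStart, i, pos, pos % 5);
                break;
            }
        }
    }
}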
Output
java
numStart = 185, i = 2147483462 pos = -2147483648 pos % 5 = -3
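Running that loop to completion takes more than two billion iterations. A quicker check is sketched below, assuming it is acceptable to seed the counter just below Integer.MAX_VALUE instead of below 1000. The class name FastOverflowRepro and the helper indexes are hypothetical; the list size 5 matches the production case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class FastOverflowRepro {
    // Returns the list indexes that the pre-fix expression computes
    // for `steps` consecutive calls, starting from counter value `start`.
    static List<Integer> indexes(int start, int size, int steps) {
        AtomicInteger position = new AtomicInteger(start);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < steps; i++) {
            int pos = Math.abs(position.incrementAndGet());
            result.add(pos % size);
        }
        return result;
    }

    public static void main(String[] args) {
        // Start two increments before the overflow and take four steps.
        // The third step hits Integer.MIN_VALUE and yields the index -3.
        System.out.println(indexes(Integer.MAX_VALUE - 2, 5, 4)); // [1, 2, -3, 2]
    }
}
```

Only the third index is invalid; the call after it is back in range, which matches the single failure seen in production.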
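Root cause
With the reproduction in hand, the cause is clear. The failure occurs at loop index i = 2147483462, when num holds 185 + 2147483462 = 2147483647 (Integer.MAX_VALUE) before the increment. Incrementing it wraps around to -2147483648 (Integer.MIN_VALUE). The mathematical absolute value of -2147483648 is 2147483648, but the range of int is -2147483648 to 2147483647.
In other words, 2147483648 cannot be represented as an int, so the negation inside Math.abs overflows and the result is again the negative number -2147483648. The Javadoc for Math.abs documents this case, so it is defined behavior rather than a JDK bug.
Taking -2147483648 modulo 5 (the size of the route URL list for the external request in production) yields -3, because in Java the remainder takes the sign of the dividend, and instances.get(-3) throws the IndexOutOfBoundsException.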
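The last step of the arithmetic can be verified in isolation. This is a small sketch (the class name RemainderSign is illustrative): Java's % operator returns a result with the sign of the dividend, so a negative pos produces a negative index. Math.floorMod is shown only for contrast; it is not what the library fix uses.

```java
final class RemainderSign {
    // Remainder as computed by the pre-fix code: sign follows the dividend.
    static int remainder(int value, int size) {
        return value % size;
    }

    // Floor modulus: the result takes the sign of the divisor, so it is
    // never negative when size is positive.
    static int floorMod(int value, int size) {
        return Math.floorMod(value, size);
    }

    public static void main(String[] args) {
        System.out.println(remainder(Integer.MIN_VALUE, 5)); // -3
        System.out.println(floorMod(Integer.MIN_VALUE, 5));  // 2
    }
}
```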
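Solution
The counter has just wrapped around, so the next occurrence is some way off: in the running process, position has to go through another 2^32 increments before it reaches Integer.MIN_VALUE again. That leaves time to schedule an upgrade of spring-cloud-loadbalancer at a convenient moment, since newer versions have fixed the problem. See the official issue and PR: https://github.com/spring-cloud/spring-cloud-commons/pull/1077
Official source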
java
// Ignore the sign bit, this allows pos to loop sequentially from 0 to
// Integer.MAX_VALUE
int pos = this.position.incrementAndGet() & Integer.MAX_VALUE;
ServiceInstance instance = instances.get(pos % instances.size());
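To see the masked counter in action outside Spring, here is a minimal standalone sketch. RoundRobinPicker is a hypothetical class written for this article, not part of spring-cloud-loadbalancer; it only borrows the masking expression from the snippet above and applies it to a plain list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinPicker<T> {
    private final AtomicInteger position;

    RoundRobinPicker(int seed) {
        this.position = new AtomicInteger(seed);
    }

    T next(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("no items to choose from");
        }
        // Clearing the sign bit keeps pos in [0, Integer.MAX_VALUE],
        // even after the counter wraps around to Integer.MIN_VALUE.
        int pos = position.incrementAndGet() & Integer.MAX_VALUE;
        return items.get(pos % items.size());
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        // Seed one increment before Integer.MAX_VALUE to cross the overflow quickly.
        RoundRobinPicker<String> picker = new RoundRobinPicker<>(Integer.MAX_VALUE - 1);
        System.out.println(picker.next(items)); // c  (pos = 2147483647)
        System.out.println(picker.next(items)); // a  (pos = 0 after the wrap)
        System.out.println(picker.next(items)); // b  (pos = 1)
    }
}
```

At the wrap the sequence jumps from index 2 straight to index 0, a one-time unevenness in the rotation, but every index stays within bounds.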
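Why apply a bitwise AND with Integer.MAX_VALUE first?
- A bitwise AND can stand in for a remainder only when the divisor is a power of two: for non-negative X, X % 2^n = X & (2^n - 1)
- Integer.MAX_VALUE = 2^31 - 1 = 2147483647, so the mask clears the sign bit and keeps the low 31 bits, which always gives a value between 0 and Integer.MAX_VALUE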
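The identity in the list above can be spot-checked with a short sketch (the class name MaskIdentity and the sample values are illustrative). Since 2^31 does not fit in an int, the modulus side is computed in long arithmetic, and Math.floorMod is used so that the comparison also holds for negative inputs, where % would return a negative remainder.

```java
final class MaskIdentity {
    // The expression used by the fix: clear the sign bit.
    static int mask(int x) {
        return x & Integer.MAX_VALUE;
    }

    // x modulo 2^31, computed in long arithmetic with a non-negative result.
    static long floorMod31(int x) {
        return Math.floorMod((long) x, 1L << 31);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 1000, Integer.MAX_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE + 1, -1};
        for (int x : samples) {
            // Both columns print the same value for every sample.
            System.out.println(x + " -> " + mask(x) + " / " + floorMod31(x));
        }
    }
}
```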