一、问题起源
在处理一次生产环境cpu拉满问题时,把日志拉下来看发现很多http请求调用出错,项目使用的是okhttp 3.8.1版本。
二、问题描述
问题出在okhttp3.Dispatcher.finished(Dispatcher.java:201)
代码如下:
java
void finished(AsyncCall call) {
finished(runningAsyncCalls, call, true);
}
void finished(RealCall call) {
finished(runningSyncCalls, call, false);
}
private <T> void finished(Deque<T> calls, T call, boolean promoteCalls) {
int runningCallsCount;
Runnable idleCallback;
synchronized (this) { //201行
if (!calls.remove(call)) throw new AssertionError("Call wasn't in-flight!");
if (promoteCalls) promoteCalls();
runningCallsCount = runningCallsCount();
idleCallback = this.idleCallback;
}
if (runningCallsCount == 0 && idleCallback != null) {
idleCallback.run();
}
}
private void promoteCalls() {
if (runningAsyncCalls.size() >= maxRequests) return; // Already running max capacity.
if (readyAsyncCalls.isEmpty()) return; // No ready calls to promote.
for (Iterator<AsyncCall> i = readyAsyncCalls.iterator(); i.hasNext(); ) {
AsyncCall call = i.next();
if (runningCallsForHost(call) < maxRequestsPerHost) {
i.remove();
runningAsyncCalls.add(call);
executorService().execute(call);
}
if (runningAsyncCalls.size() >= maxRequests) return; // Reached max capacity.
}
}
三、分析代码
在OkHttpClient中final Dispatcher dispatcher;
作为成员对象,而我们代码中OkHttpClient作为连接池是单例的,这里是对dispatcher做synchronized。
追踪代码发现,在finished的调用方法中,我们方法中使用的是异步AsyncCall
,而这里synchronized
方法中的promoteCalls
被置为true。所以会调用promoteCalls()
方法, 而promoteCalls()
方法中会继续调用executorService().execute(call);
,就是这里,问题大了,synchronized
中执行http请求,那上面代码中的超时不就长时间占用锁了?怪不得进程blocked了。
关于线程的BLOCKED,需要知道:
- java.lang.Thread.State: BLOCKED:等待监视器锁而被阻塞的线程的线程状态,当进入 synchronized 块/方法或者在调用 wait()被唤醒/超时之后重新进入 synchronized 块/方法, 但是锁被其它线程占有,这个时候被操作系统挂起,状态为阻塞状态。若是有线程长时间处于 BLOCKED 状态,要考虑是否发生了死锁(deadlock)的情况。
- blocked的线程不会消耗cpu,但频繁的频繁切换线程上下文会导致cpu过高。线程被频繁唤醒,而又由于抢占锁失败频繁地被挂起. 因此也会带来大量的上下文切换, 消耗系统的cpu资源。
四、解决方案
okttp关于这个问题已经有过解答:
解决方案就简单多了:升级okhttp到3.14.9,虽然目前最新稳定版本为4.9.3,但是OkHttp 4发布,从Java切换到Kotlin。谨慎一点,还是小版本升级吧。
在3.14.9中,这部分代码被优化为:
java
private boolean promoteAndExecute() {
assert (!Thread.holdsLock(this));
List<AsyncCall> executableCalls = new ArrayList<>();
boolean isRunning;
synchronized (this) {
for (Iterator<AsyncCall> i = readyAsyncCalls.iterator(); i.hasNext(); ) {
AsyncCall asyncCall = i.next();
if (runningAsyncCalls.size() >= maxRequests) break; // Max capacity.
if (asyncCall.callsPerHost().get() >= maxRequestsPerHost) continue; // Host max capacity.
i.remove();
asyncCall.callsPerHost().incrementAndGet();
executableCalls.add(asyncCall);
runningAsyncCalls.add(asyncCall);
}
isRunning = runningCallsCount() > 0;
}
for (int i = 0, size = executableCalls.size(); i < size; i++) {
AsyncCall asyncCall = executableCalls.get(i);
asyncCall.executeOn(executorService());
}
return isRunning;
}
执行HTTP请求被移出了synchronized方法了。