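Background
While working on a thread-stack-size reduction task, I noticed an OOM in our crash list that was raised from pthread_create. A closer look at the crash showed that the process had reached 1,400 threads when it ran out of memory, so I started looking for what was creating them.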
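Analyzing the problem
Most of the threads were named firebase-iid-ex (the name is cut off, so the full thread name was not available), which suggests an internal Firebase thread. Searching the SDK led me to the SyncTask class, and from there to the full thread name: firebase-iid-executor.
Digging further, a large share of the crashes happened about 10 seconds after launch. Combined with a read through the Firebase Cloud Messaging SDK internals, that pointed to the CloudMessagingReceiver class.
CloudMessagingReceiver is a broadcast receiver that has to be registered in the manifest XML. With this mechanism, a new CloudMessagingReceiver instance is created every time a push (broadcast) arrives, so its constructor runs once per broadcast and the thread pool configured inside it is created again each time. That defeats the purpose of a thread pool: each pool has corePoolSize == maximumPoolSize == 1, but because every receiver instance owns its own pool, a burst of notifications within a very short window creates a large number of threads.
As an aside, when corePoolSize == maximumPoolSize == 1, the configured keep-alive time has no effect unless allowCoreThreadTimeOut(true) is set.
I filed an issue with Firebase for this (Too many named firebase-iid-executor threads are created). We were on Firebase 32.3.1; newer versions already contain a fix.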
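The failure mode described above can be reproduced outside Android. The following is a minimal sketch, not the SDK's code: PerInstancePoolDemo, newReceiverPool, and simulateBurst are hypothetical names, and each "receiver" is reduced to the pool it would build in its constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a receiver that builds its own pool per instance.
class PerInstancePoolDemo {
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    // Same shape as the pool in the SDK constructor: core == max == 1, 60s keep-alive.
    static ExecutorService newReceiverPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(),
                r -> {
                    THREADS_CREATED.incrementAndGet(); // count every thread the factory creates
                    Thread t = new Thread(r, "demo-iid-executor");
                    t.setDaemon(true);
                    return t;
                });
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Simulates a burst of broadcasts, each handled by a freshly constructed "receiver".
    static int simulateBurst(int broadcasts) {
        THREADS_CREATED.set(0);
        CountDownLatch done = new CountDownLatch(broadcasts);
        List<ExecutorService> pools = new ArrayList<>();
        for (int i = 0; i < broadcasts; i++) {
            ExecutorService pool = newReceiverPool(); // one pool per receiver instance
            pools.add(pool);
            pool.execute(done::countDown);            // one task per broadcast
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pools.forEach(ExecutorService::shutdown);
        return THREADS_CREATED.get();
    }
}
```

In this sketch, simulateBurst(50) starts 50 threads for 50 tasks, one per pool. Until the 60-second keep-alive expires, each of those threads stays alive, which is how a short burst of pushes can pile up threads.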
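Fix
Since an official fix exists, simply bumping the version would normally be enough. But this post is about an ASM-based fix, and if things had gone that smoothly there would be nothing to write. We tried upgrading to the latest version, 32.7.4, and ran into many dependency conflicts that would have forced us to adjust a lot of other dependencies at the same time. The version jump was also large enough to require a full test cycle, and resources were tight, so we needed a fix that was simple and low-risk.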
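Fix options
Reflection
Using reflection first requires knowing when to trigger it. In this scenario receiver objects are created frequently, and each one would need its thread pool creation handled via reflection. With concerns about both the performance cost and the timing of the reflection call, this option was dropped.
Subclassing + re-registering in the manifest
This option means rewriting CloudMessagingReceiver ourselves: copying all of the existing subclass logic into a custom class of our own, registering that class in the manifest, and marking the original registration with android:enabled=false.
It requires keeping a copy of SDK internals, which is risky, and it leaves no way to fall back, so it was dropped as well.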
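ASM instrumentation
In the end we went with ASM bytecode instrumentation. It also supports falling back at runtime, which makes it the safest choice. Our project is on a fairly old AGP version where the Transform API can still be used, so for now the implementation uses Transform as a stopgap.
Analyzing the code to instrument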
java
public abstract class CloudMessagingReceiver extends BroadcastReceiver {
private final ExecutorService zza;
@WorkerThread
protected abstract int onMessageReceive(@NonNull Context var1, @NonNull CloudMessage var2);
@NonNull
protected Executor getBroadcastExecutor() {
return this.zza;
}
public CloudMessagingReceiver() {
zze.zza();
NamedThreadFactory var1 = new NamedThreadFactory("firebase-iid-executor");
TimeUnit var3 = TimeUnit.SECONDS;
LinkedBlockingQueue var4 = new LinkedBlockingQueue();
ThreadPoolExecutor var2 = new ThreadPoolExecutor(1, 1, 60L, var3, var4, var1);
var2.allowCoreThreadTimeOut(true);
// 1
this.zza = Executors.unconfigurableExecutorService(var2);
}
public final void onReceive(@NonNull Context context, @NonNull Intent intent) {
if (intent != null) {
boolean var3 = this.isOrderedBroadcast();
BroadcastReceiver.PendingResult var4 = this.goAsync();
// 2
Executor var5 = this.getBroadcastExecutor();
com.google.android.gms.cloudmessaging.zze var6 = new com.google.android.gms.cloudmessaging.zze(this, intent, context, var3, var4);
var5.execute(var6);
}
}
// ignore....
}
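At marker 1 the thread pool is assigned to the zza field, and at marker 2 it is used. So we inject bytecode directly into the constructor, which gives us the following (pseudocode):
java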
public CloudMessagingReceiver() {
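// Injected: ask our helper for the shared pool (null when the remote switch is off).
final ExecutorService executor = FixedFirebaseHelper.getExecutorService();
if (executor != null) {
zze.zza();
this.zza = executor;
return;
}
// Original constructor body, unchanged.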
zze.zza();
NamedThreadFactory var1 = new NamedThreadFactory("firebase-iid-executor");
TimeUnit var3 = TimeUnit.SECONDS;
LinkedBlockingQueue var4 = new LinkedBlockingQueue();
ThreadPoolExecutor var2 = new ThreadPoolExecutor(1, 1, 60L, var3, var4, var1);
var2.allowCoreThreadTimeOut(true);
this.zza = Executors.unconfigurableExecutorService(var2);
}
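If FixedFirebaseHelper.getExecutorService() returns null, the remote switch is off and the fix is not applied. If the switch is on, it returns our own thread pool, which is held by a singleton. With that settled, we can start building.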
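To make the fallback behaviour concrete, here is a small plain-Java sketch of the same pattern. SharedExecutorHolder and PatchedReceiverDemo are hypothetical names used only for illustration; the real helper is the Kotlin object shown later, and the real receiver is patched at the bytecode level rather than written like this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical Java equivalent of the helper: one shared pool behind a switch.
final class SharedExecutorHolder {
    // Assumption: in production this flag would be driven by a remote config value.
    static volatile boolean enabled = false;

    private static final ExecutorService SHARED = new ThreadPoolExecutor(
            1, 1, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(),
            r -> {
                Thread t = new Thread(r, "shared-push-thread");
                t.setDaemon(true);
                return t;
            });

    // Returns the shared pool when the switch is on, otherwise null.
    static ExecutorService getExecutorService() {
        return enabled ? SHARED : null;
    }
}

// Hypothetical receiver whose constructor follows the same decision as the patched one.
class PatchedReceiverDemo {
    final ExecutorService executor;

    PatchedReceiverDemo() {
        ExecutorService shared = SharedExecutorHolder.getExecutorService();
        if (shared != null) {
            executor = shared; // switch on: reuse the process-wide pool
        } else {
            // switch off: fall back to the original per-instance pool
            executor = new ThreadPoolExecutor(
                    1, 1, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        }
    }
}
```

With the switch on, every receiver instance ends up holding the same executor, so a burst of broadcasts queues work on one thread. With the switch off, behaviour is identical to the unpatched SDK, which is what makes the fix reversible without a new release.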
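Instrumentation caveats
Stop instrumenting when an SDK upgrade changes the code
If a new SDK version changes the CloudMessagingReceiver code, we need to find out right away. So the first time the plugin is applied, it copies the current class file to a location on the CI build machine. Later builds compare against that copy, and if the files are not the same, CI raises an alert.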
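Check that the injected class exists
The instrumentation inserts a call to the FixedFirebaseHelper class. If the call is injected successfully but the class itself is missing from the project, the app will crash in production, because the push path is not necessarily exercised in every test run.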
Selected instrumentation code
FixedFirebaseHelper
kotlin
object FixedFirebaseHelper {
private const val ALIVE_DURATION = 60L
private const val THREAD_NAME = "FirebasePushThread"
private val firebaseExecutor: ExecutorService = ThreadPoolExecutor(
1,
1,
ALIVE_DURATION,
TimeUnit.SECONDS,
LinkedBlockingQueue()
) { r ->
val thread = Thread(r, THREAD_NAME)
thread.isDaemon = true
thread
}
@JvmField
var enableHookFirebase: Boolean = false
@HookPoint(description = "Reset firebase push thread executor by asm.")
@JvmStatic
fun getExecutorService(): ExecutorService? {
return if (enableHookFirebase) firebaseExecutor else null
}
}
@Target(
AnnotationTarget.FUNCTION,
AnnotationTarget.PROPERTY_GETTER,
AnnotationTarget.PROPERTY_SETTER
)
@Retention(AnnotationRetention.RUNTIME)
@Inherited
@MustBeDocumented
annotation class HookPoint(
val description: String = ""
)
fun main() {
FixedFirebaseHelper.enableHookFirebase = true
FixedFirebaseHelper.getExecutorService()
}
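The above is a custom class that provides a dedicated thread pool. Note that getExecutorService() must be annotated with @JvmStatic. Without it the app crashes: the injected bytecode calls the method with INVOKESTATIC, while a plain method on a Kotlin object compiles to an instance method. This is the usual Kotlin static-call pitfall.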
typescript
// Crash thread
java.lang.IncompatibleClassChangeError: The method 'java.util.concurrent.ExecutorService com.deliverysdk.asm_firebase_lib.FixedFirebaseHelper.getExecutorService()' was expected to be of type static but instead was found to be of type virtual (declaration of 'com.google.android.gms.cloudmessaging.CloudMessagingReceiver' appears in /data/app/~~0yGz42Ml04FLtJbVmLOGjw==/com.xxx.sea-md5RQIRBsQfs4LyCKYxPDw==/base.apk!classes15.dex)
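The crash above comes down to whether the target method is static at the JVM level. The sketch below imitates, in plain Java, roughly what a Kotlin object looks like with and without @JvmStatic. The class names are hypothetical and the shapes are simplified; the point is only the static versus instance distinction that INVOKESTATIC depends on.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Roughly the JVM shape of a Kotlin `object` method WITHOUT @JvmStatic:
// a singleton INSTANCE field plus an instance (virtual) method.
final class HelperWithoutJvmStatic {
    public static final HelperWithoutJvmStatic INSTANCE = new HelperWithoutJvmStatic();
    private HelperWithoutJvmStatic() {}
    public Object getExecutorService() { return null; }
}

// Roughly the shape WITH @JvmStatic: the method itself is static,
// so an injected INVOKESTATIC can resolve it.
final class HelperWithJvmStatic {
    public static final HelperWithJvmStatic INSTANCE = new HelperWithJvmStatic();
    private HelperWithJvmStatic() {}
    public static Object getExecutorService() { return null; }
}

class StaticShapeCheck {
    // Reports whether the named no-arg method is declared static on the owner class.
    static boolean isStatic(Class<?> owner, String methodName) {
        try {
            Method m = owner.getDeclaredMethod(methodName);
            return Modifier.isStatic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A check like StaticShapeCheck.isStatic could also run in a unit test against the real helper class, so that removing @JvmStatic by accident fails the build instead of crashing on a push.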
HookedFirebaseTransform
java
@Override
public void transform(TransformInvocation transformInvocation) throws TransformException, InterruptedException, IOException {
super.transform(transformInvocation);
isFindInjectLibClass = false;
TransformOutputProvider outputProvider = transformInvocation.getOutputProvider();
transformInvocation.getInputs().forEach(transformInput -> {
transformInput.getJarInputs().forEach(jarInput ->
checkInjectClassIsExisted(jarInput.getFile())
);
if (isFindInjectLibClass) {
System.out.println(TAG + "------------ find inject class and start trigger asm transform! ");
transformInput.getJarInputs().forEach(jarInput -> transformJarInput(jarInput, outputProvider));
transformInput.getDirectoryInputs().forEach(directoryInput -> transformDirectoryInput(directoryInput, outputProvider));
}
});
}
Nothing unusual here: iterate over all the jar files, then unpack them to find the target class.
java
private void transformJarInput(JarInput jarInput, TransformOutputProvider outputProvider) {
File dest = outputProvider.getContentLocation(jarInput.getName(), jarInput.getContentTypes(), jarInput.getScopes(), Format.JAR);
try {
File inputJarFile = jarInput.getFile();
boolean isExist = checkExistTargetClass(inputJarFile);
if (isExist){
// Unzip jar file and asm transform
System.out.println(TAG + "------------ start unzipJarClassesAndAsm ");
unzipJarClassesAndAsm(inputJarFile);
// Unzip jar file and asm transform
System.out.println(TAG + "------------ end unzipJarClassesAndAsm ");
}
FileUtils.copyFile(inputJarFile, dest);
} catch (IOException e) {
System.out.println(TAG + "------------ transformJarInput error: " + e.getMessage());
}
}
private void checkInjectClassIsExisted(File inputJar) {
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(inputJar))) {
ZipEntry zipEntry = zis.getNextEntry();
while (zipEntry != null) {
String fileName = zipEntry.getName();
if (fileName.equals(INJECT_CLASS_NAME)) {
isFindInjectLibClass = true;
System.out.println(TAG + "------------ find inject class: " + fileName);
}
zipEntry = zis.getNextEntry();
}
} catch (IOException e) {
System.out.println(TAG + "------------ Unzip and check inject class is existed failed :" + e.getMessage());
}
}
checkInjectClassIsExisted is called first to confirm that the FixedFirebaseHelper class is present, and only then does the rest of the jar traversal run.
java
private boolean checkExistTargetClass(File inputJarFile) {
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(inputJarFile))) {
ZipEntry zipEntry = zis.getNextEntry();
while (zipEntry != null) {
final String fileName = zipEntry.getName();
if (fileName.equals(HOOK_POINT_CLASS)) {
return true;
}
zipEntry = zis.getNextEntry();
}
} catch (IOException e) {
throw new RuntimeException(e);
}
return false;
}
public void unzipJarClassesAndAsm(File inputJar) {
File tempJar = null;
try {
File tempDirectory = new File(projectRootDir + TARGET_COMPARE_CLASS_PATH + "temp/");
if (!tempDirectory.exists()) {
tempDirectory.mkdirs();
}
tempJar = File.createTempFile("tempJar", ".jar", tempDirectory);
final JarOutputStream jos = new JarOutputStream(new FileOutputStream(tempJar));
final ZipInputStream zis = new ZipInputStream(new FileInputStream(inputJar));
ZipEntry zipEntry = zis.getNextEntry();
while (zipEntry != null) {
String fileName = zipEntry.getName();
if (!fileName.equals(HOOK_POINT_CLASS)) {
jos.putNextEntry(new JarEntry(zipEntry.getName()));
byte[] bytes = zis.readAllBytes();
jos.write(bytes);
jos.closeEntry();
} else {
final byte[] currentClassBytes = checkClassWhetherChanged(zis);
// Read and modify the target class
assert currentClassBytes != null;
ClassReader cr = new ClassReader(currentClassBytes);
ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_MAXS);
HookedClassVisitor cv = new HookedClassVisitor(Opcodes.ASM9, cw);
cr.accept(cv, 0);
// Write the modified class back into the new jar
JarEntry modifiedClassEntry = new JarEntry(fileName);
jos.putNextEntry(modifiedClassEntry);
jos.write(cw.toByteArray());
jos.closeEntry();
System.out.println(TAG + "------------ Finish asm transform for " + HOOK_POINT_CLASS);
}
zipEntry = zis.getNextEntry();
}
zis.close();
jos.close();
System.out.println(TAG + "------------ inputJar path :" + inputJar.getAbsolutePath() + "------" + inputJar.exists());
if (inputJar.delete()) {
System.out.println(TAG + "------------ temp jar path :" + tempJar.getAbsolutePath() + "------" + tempJar.exists());
moveFileUsingShell(tempJar, inputJar);
} else {
System.out.println(TAG + "------------ Failed to delete original jar file.");
}
} catch (IOException e) {
System.out.println(TAG + "------------ Error processing jar file: " + e.getMessage());
} finally {
if (tempJar != null && tempJar.exists()) {
tempJar.deleteOnExit();
System.out.println(TAG + "------------ Delete temp jar file finally.");
}
}
}
/**
* File.renameTo() did not work on our CI machines, so the file is moved with mv instead.
*/
public void moveFileUsingShell(File source, File destination) {
ProcessBuilder processBuilder = new ProcessBuilder();
try {
// Pass the paths as separate arguments so that spaces in a path cannot break the command.
processBuilder.command("mv", source.getAbsolutePath(), destination.getAbsolutePath());
Process process = processBuilder.start();
int exitVal = process.waitFor();
if (exitVal == 0) {
System.out.println("Success: Moved file from " + source.getAbsolutePath() + " to " + destination.getAbsolutePath());
} else {
System.out.println("Error: Failed to move file");
}
} catch (IOException | InterruptedException e) {
System.out.println("Error: Failed to move file : " + e.getMessage());
}
}
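The moveFileUsingShell helper above exists because File.renameTo() failed on CI. One alternative that may be worth trying is java.nio.file.Files.move, which reports the reason for a failure through an exception and can fall back to copy-and-delete for a regular file when source and target are on different file stores. This is only a sketch: JarMover and moveReplacing are hypothetical names, and it has not been verified on the CI environment described in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class JarMover {
    // Moves `source` over `destination`, replacing it if it exists.
    // Returns true on success and logs the cause on failure.
    static boolean moveReplacing(Path source, Path destination) {
        try {
            Files.move(source, destination, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            System.out.println("Failed to move " + source + " to " + destination + ": " + e);
            return false;
        }
    }
}
```

Because REPLACE_EXISTING is passed, the separate inputJar.delete() step would not be needed with this approach, and no external process is started.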
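This step first scans the jar files to see which one contains the target class, here com/google/android/gms/cloudmessaging/CloudMessagingReceiver.class. Once it is found, the jar is read with ZipInputStream and checkClassWhetherChanged is called to compare the class against the CloudMessagingReceiver version the fix was written for. The intent is to instrument only when they match; in the code shown here a mismatch is only logged for now (see the TODO in the method).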
java
@Nullable
private byte[] checkClassWhetherChanged(ZipInputStream zis) throws IOException {
// After obtaining HOOK_POINT_CLASS, first copy this class bytecode file separately into the project root directory.
final File comparedExistClass = new File(projectRootDir + TARGET_COMPARE_CLASS_PATH, HOOK_POINT_CLASS.replace('/', File.separatorChar));
if (comparedExistClass.exists()) {
System.out.println(TAG + "------------ The class file already exists, compare the constructors of the two class files.");
try {
// If it already exists, directly extract the existing class and compare it with the current HOOK_POINT_CLASS. If the constructors' internal logic in the two bytecode files are inconsistent, then throw an exception to terminate packaging.
byte[] existingClassBytes = Files.readAllBytes(comparedExistClass.toPath());
// Read the current class file
byte[] currentClassBytes = zis.readAllBytes();
// Compare the constructors of the two class files
boolean constructorsAreEqual = areClassFilesIdentical(existingClassBytes, currentClassBytes);
if (!constructorsAreEqual){
System.out.println("existingClassBytes : " + existingClassBytes.length);
System.out.println("TODO CI can not worked to check...");
System.out.println("The constructors of the two class files are inconsistent, please check the constructors of the two class files.");
}
return currentClassBytes;
} catch (Exception e){
System.out.println(TAG + "------------ Error comparing class file: " + e.getMessage());
}
} else {
comparedExistClass.getParentFile().mkdirs();
try (FileOutputStream fos = new FileOutputStream(comparedExistClass)) {
byte[] currentClassBytes = zis.readAllBytes();
fos.write(currentClassBytes);
return currentClassBytes;
} catch (IOException e) {
System.out.println(TAG + "------------ Error saving class file: " + e.getMessage());
}
}
return null;
}
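The areClassFilesIdentical helper called above is not shown in the post. A minimal sketch, assuming a whole-file byte comparison is acceptable (the original may well compare only the constructors), could look like this. ClassFileComparer and sha256Hex are hypothetical additions; the fingerprint is only there to make a CI alert easier to read.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ClassFileComparer {
    // Assumption: the two class files count as "identical" only if every byte matches.
    static boolean areClassFilesIdentical(byte[] baseline, byte[] current) {
        return Arrays.equals(baseline, current);
    }

    // Hex-encoded SHA-256 fingerprint of a class file, for printing in a CI log.
    static String sha256Hex(byte[] bytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

A byte-for-byte check is stricter than comparing constructors: any change to the class triggers the alert, even one that leaves the constructor untouched. That trades some false alarms for a much simpler implementation.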
With that, we arrive at ASM itself.
HookedClassVisitor
java
class HookedClassVisitor extends ClassVisitor {
public HookedClassVisitor(int api, ClassVisitor classVisitor) {
super(api, classVisitor);
}
@Override
public MethodVisitor visitMethod(int access, String name, String descriptor, String signature, String[] exceptions) {
MethodVisitor mv = super.visitMethod(access, name, descriptor, signature, exceptions);
if ("<init>".equals(name) && "()V".equals(descriptor)) {
return new HookedMethodVisitor(Opcodes.ASM9, mv, access, name, descriptor);
}
return mv;
}
}
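HookedClassVisitor extends ASM's ClassVisitor, which is used to visit the structure of a Java class, including the methods it declares. The important part is the visitMethod override, which lets us intercept each method as it is visited.
public HookedClassVisitor(int api, ClassVisitor classVisitor): the constructor takes the ASM API version and a ClassVisitor, and passes both to the parent class via super(api, classVisitor).
visitMethod: called for every method in the class. Here we check whether the method is the no-argument constructor, that is, whether its name is "<init>" and its descriptor is "()V". If so, we create and return a HookedMethodVisitor; otherwise we return the original MethodVisitor. This part is fairly easy to follow.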
HookedMethodVisitor
java
class HookedMethodVisitor extends AdviceAdapter {
public HookedMethodVisitor(int api, MethodVisitor mv, int access, String name, String descriptor) {
super(api, mv, access, name, descriptor);
}
@Override
protected void onMethodEnter() {
mv.visitMethodInsn(INVOKESTATIC, INJECT_CLASS, "getExecutorService", "()Ljava/util/concurrent/ExecutorService;", false);
mv.visitVarInsn(ASTORE, 1);
mv.visitVarInsn(ALOAD, 1);
Label l1 = new Label();
mv.visitJumpInsn(IFNULL, l1);
mv.visitMethodInsn(INVOKESTATIC, "com/google/android/gms/internal/cloudmessaging/zze", "zza", "()Lcom/google/android/gms/internal/cloudmessaging/zzb;", false);
mv.visitInsn(POP);
mv.visitVarInsn(ALOAD, 0);
mv.visitVarInsn(ALOAD, 1);
mv.visitFieldInsn(PUTFIELD, Constant.HOOK_POINT_CLASS_NO_SUFFIX, "zza", "Ljava/util/concurrent/ExecutorService;");
mv.visitInsn(RETURN);
mv.visitLabel(l1);
}
}
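In HookedMethodVisitor we override onMethodEnter, the callback invoked on method entry (for a constructor, AdviceAdapter calls it right after the super constructor call). The first instruction invokes the static method to obtain the executor, and ASTORE then stores that reference from the operand stack into the local variable table at index 1. Why 1? Because for non-static methods index 0 is already taken by this.
The zze.zza() call that follows needs attention because of its return type:
java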
mv.visitMethodInsn(INVOKESTATIC, "com/google/android/gms/internal/cloudmessaging/zze", "zza", "()Lcom/google/android/gms/internal/cloudmessaging/zzb;", false);
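I was overconfident and forgot to check the return type at first: zze.zza() does return a value, so the descriptor has to include that return type. Otherwise the class file still looks as if it calls zze.zza(), but at the bytecode level no matching zza() method can be found, because the method signature does not match.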
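One way to avoid hand-typing a descriptor with the wrong return type is to derive it from real types. The JDK's java.lang.invoke.MethodType can print the JVM descriptor string for a given signature. DescriptorDemo and noArgDescriptor below are hypothetical names for this sketch; for an obfuscated SDK type such as zzb, the class would have to be on the build classpath, otherwise reading the descriptor from javap output is the fallback.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Returns the JVM method descriptor for a no-argument method with the given return type.
    static String noArgDescriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }
}
```

For example, noArgDescriptor(void.class) yields "()V", while noArgDescriptor(java.util.concurrent.ExecutorService.class) yields "()Ljava/util/concurrent/ExecutorService;". The two differ only in the return type, which is exactly the part that was wrong in the first attempt.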
With HookedMethodVisitor in place, the constructor ends up as the following bytecode (shown as smali):
smali
.method public constructor <init>()V
.registers 10
.line 1
.end local p0 # "this":Lcom/google/android/gms/cloudmessaging/CloudMessagingReceiver;
invoke-direct {p0}, Landroid/content/BroadcastReceiver;-><init>()V
invoke-static {}, Lcom/xxx/asm_firebase_lib/FixedFirebaseHelper;->getExecutorService()Ljava/util/concurrent/ExecutorService;
move-result-object v0
if-eqz v0, :cond_f
invoke-static {}, Lcom/google/android/gms/internal/cloudmessaging/zze;->zza()Lcom/google/android/gms/internal/cloudmessaging/zzb;
iput-object v0, p0, Lcom/google/android/gms/cloudmessaging/CloudMessagingReceiver;->zza:Ljava/util/concurrent/ExecutorService;
return-void
.line 2
:cond_f
invoke-static {}, Lcom/google/android/gms/internal/cloudmessaging/zze;->zza()Lcom/google/android/gms/internal/cloudmessaging/zzb;
new-instance v8, Lcom/google/android/gms/common/util/concurrent/NamedThreadFactory;
const-string v0, "firebase-iid-executor"
invoke-direct {v8, v0}, Lcom/google/android/gms/common/util/concurrent/NamedThreadFactory;-><init>(Ljava/lang/String;)V
new-instance v0, Ljava/util/concurrent/ThreadPoolExecutor;
sget-object v6, Ljava/util/concurrent/TimeUnit;->SECONDS:Ljava/util/concurrent/TimeUnit;
new-instance v7, Ljava/util/concurrent/LinkedBlockingQueue;
.line 3
invoke-direct {v7}, Ljava/util/concurrent/LinkedBlockingQueue;-><init>()V
const/4 v2, 0x1
const/4 v3, 0x1
const-wide/16 v4, 0x3c
move-object v1, v0
invoke-direct/range {v1 .. v8}, Ljava/util/concurrent/ThreadPoolExecutor;-><init>(IIJLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/ThreadFactory;)V
const/4 v1, 0x1
.line 4
invoke-virtual {v0, v1}, Ljava/util/concurrent/ThreadPoolExecutor;->allowCoreThreadTimeOut(Z)V
.line 5
invoke-static {v0}, Ljava/util/concurrent/Executors;->unconfigurableExecutorService(Ljava/util/concurrent/ExecutorService;)Ljava/util/concurrent/ExecutorService;
move-result-object v0
iput-object v0, p0, Lcom/google/android/gms/cloudmessaging/CloudMessagingReceiver;->zza:Ljava/util/concurrent/ExecutorService;
return-void
.end method
Build-time overhead
On an M2 Pro, the extra build time is about 2s, which is negligible.
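Summary
ASM is a handy way to patch temporary problems like this one, and a good illustration of the AOP idea. Slow-method detection, non-intrusive event tracking, and similar tasks can all be handled with ASM; I may cover those in a later post. All in all, the Google Firebase team fixes issues pretty quickly >_< !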