LeakCanary Source Code Reading Notes (4)

This is the fourth article in my series on reading the LeakCanary source code. If you haven't read the first three, I recommend starting with them:

LeakCanary Source Code Reading Notes (1)
LeakCanary Source Code Reading Notes (2)
LeakCanary Source Code Reading Notes (3)

This article covers how LeakCanary parses the HPROF file. If you're not familiar with the HPROF file format, I strongly recommend first reading my earlier article on it: Android HPROF Memory Snapshot File Explained.

Before Parsing the HPROF File

Picking up from the previous article: once the HPROF file has been dumped successfully, a HeapDump event is sent to InternalLeakCanary. Let's look at its sendEvent() method:

Kotlin:
fun sendEvent(event: Event) {
  for(listener in LeakCanary.config.eventListeners) {
    listener.onEvent(event)
  }
}

As you can see, it simply invokes every listener in LeakCanary.config.eventListeners in turn.

Kotlin:
val eventListeners: List<EventListener> = listOf(
  LogcatEventListener,
  ToastEventListener,
  LazyForwardingEventListener {
    if (InternalLeakCanary.formFactor == TV) TvEventListener else NotificationEventListener
  },
  when {
      RemoteWorkManagerHeapAnalyzer.remoteLeakCanaryServiceInClasspath ->
        RemoteWorkManagerHeapAnalyzer
      WorkManagerHeapAnalyzer.validWorkManagerInClasspath -> WorkManagerHeapAnalyzer
      else -> BackgroundThreadHeapAnalyzer
  }
)
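
Since eventListeners lives in the public LeakCanary.config, you can also append your own listener. A minimal sketch, assuming the LeakCanary 2.x Config API (MyAnalysisLogger is a made-up name; Event refers to EventListener.Event, as in the snippet below):

Kotlin:
// Hypothetical listener that just logs every event it receives.
object MyAnalysisLogger : EventListener {
  override fun onEvent(event: Event) {
    println("LeakCanary event: $event")
  }
}

// During app init, append it to the default listeners:
LeakCanary.config = LeakCanary.config.copy(
  eventListeners = LeakCanary.config.eventListeners + MyAnalysisLogger
)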

That's quite a few listeners. The default one that analyzes the HPROF file is BackgroundThreadHeapAnalyzer:

Kotlin:
object BackgroundThreadHeapAnalyzer : EventListener {

  internal val heapAnalyzerThreadHandler by lazy {
    val handlerThread = HandlerThread("HeapAnalyzer")
    handlerThread.start()
    Handler(handlerThread.looper)
  }

  override fun onEvent(event: Event) {
    if (event is HeapDump) {
      heapAnalyzerThreadHandler.post {
        val doneEvent = AndroidDebugHeapAnalyzer.runAnalysisBlocking(event) { event ->
          InternalLeakCanary.sendEvent(event)
        }
        InternalLeakCanary.sendEvent(doneEvent)
      }
    }
  }
}

Plain, unassuming code: it simply calls AndroidDebugHeapAnalyzer#runAnalysisBlocking() on the HeapAnalyzer thread to analyze the HPROF file:

Kotlin:
fun runAnalysisBlocking(
  heapDumped: HeapDump,
  isCanceled: () -> Boolean = { false },
  progressEventListener: (HeapAnalysisProgress) -> Unit
): HeapAnalysisDone<*> {
  // Progress listener
  val progressListener = OnAnalysisProgressListener { step ->
    val percent = (step.ordinal * 1.0) / OnAnalysisProgressListener.Step.values().size
    progressEventListener(HeapAnalysisProgress(heapDumped.uniqueId, step, percent))
  }

  val heapDumpFile = heapDumped.file
  val heapDumpDurationMillis = heapDumped.durationMillis
  val heapDumpReason = heapDumped.reason

  // Check that the heap dump file exists
  val heapAnalysis = if (heapDumpFile.exists()) {
    // Run the analysis
    analyzeHeap(heapDumpFile, progressListener, isCanceled)
  } else {
    missingFileFailure(heapDumpFile)
  }
  
  // Post-process the analysis result
  val fullHeapAnalysis = when (heapAnalysis) {
    is HeapAnalysisSuccess -> heapAnalysis.copy(
      dumpDurationMillis = heapDumpDurationMillis,
      metadata = heapAnalysis.metadata + ("Heap dump reason" to heapDumpReason)
    )
    is HeapAnalysisFailure -> {
      val failureCause = heapAnalysis.exception.cause!!
      if (failureCause is OutOfMemoryError) {
        heapAnalysis.copy(
          dumpDurationMillis = heapDumpDurationMillis,
          exception = HeapAnalysisException(
            RuntimeException(
              """
            Not enough memory to analyze heap. You can:
            - Kill the app then restart the analysis from the LeakCanary activity.
            - Increase the memory available to your debug app with largeHeap=true: https://developer.android.com/guide/topics/manifest/application-element#largeHeap
            - Set up LeakCanary to run in a separate process: https://square.github.io/leakcanary/recipes/#running-the-leakcanary-analysis-in-a-separate-process
            - Download the heap dump from the LeakCanary activity then run the analysis from your computer with shark-cli: https://square.github.io/leakcanary/shark/#shark-cli
          """.trimIndent(), failureCause
            )
          )
        )
      } else {
        heapAnalysis.copy(dumpDurationMillis = heapDumpDurationMillis)
      }
    }
  }
  progressListener.onAnalysisProgress(REPORTING_HEAP_ANALYSIS)

  val analysisDoneEvent = ScopedLeaksDb.writableDatabase(application) { db ->
    val id = HeapAnalysisTable.insert(db, heapAnalysis)
    when (fullHeapAnalysis) {
      is HeapAnalysisSuccess -> {
        val showIntent = LeakActivity.createSuccessIntent(application, id)
        val leakSignatures = fullHeapAnalysis.allLeaks.map { it.signature }.toSet()
        val leakSignatureStatuses = LeakTable.retrieveLeakReadStatuses(db, leakSignatures)
        val unreadLeakSignatures = leakSignatureStatuses.filter { (_, read) ->
          !read
        }.keys
          // keys returns LinkedHashMap$LinkedKeySet which isn't Serializable
          .toSet()
        HeapAnalysisSucceeded(
          heapDumped.uniqueId,
          fullHeapAnalysis,
          unreadLeakSignatures,
          showIntent
        )
      }
      is HeapAnalysisFailure -> {
        val showIntent = LeakActivity.createFailureIntent(application, id)
        HeapAnalysisFailed(heapDumped.uniqueId, fullHeapAnalysis, showIntent)
      }
    }
  }
  return analysisDoneEvent
}

The code above looks like a lot, but most of it is just noise you can safely skim past. Let's go straight to how analyzeHeap() processes the HPROF file.

Kotlin:
private fun analyzeHeap(
  heapDumpFile: File,
  progressListener: OnAnalysisProgressListener,
  isCanceled: () -> Boolean
): HeapAnalysis {
  val config = LeakCanary.config
  // Analyzes the heap
  val heapAnalyzer = HeapAnalyzer(progressListener)
  val proguardMappingReader = try {
    // Reader for the ProGuard mapping file
    ProguardMappingReader(application.assets.open(PROGUARD_MAPPING_FILE_NAME))
  } catch (e: IOException) {
    null
  }

  progressListener.onAnalysisProgress(PARSING_HEAP_DUMP)

  // Provider of streams over the HPROF file
  val sourceProvider =
    ConstantMemoryMetricsDualSourceProvider(ThrowingCancelableFileSourceProvider(heapDumpFile) {
      if (isCanceled()) {
        throw RuntimeException("Analysis canceled")
      }
    })

  val closeableGraph = try {
    // Parse the HPROF file
    sourceProvider.openHeapGraph(proguardMapping = proguardMappingReader?.readProguardMapping())
  } catch (throwable: Throwable) {
    return HeapAnalysisFailure(
      heapDumpFile = heapDumpFile,
      createdAtTimeMillis = System.currentTimeMillis(),
      analysisDurationMillis = 0,
      exception = HeapAnalysisException(throwable)
    )
  }
  return closeableGraph
    .use { graph ->
      // The parsed HPROF data lives in graph; HeapAnalyzer#analyze() then searches it for leaks.
      val result = heapAnalyzer.analyze(
        heapDumpFile = heapDumpFile,
        graph = graph,
        leakingObjectFinder = config.leakingObjectFinder,
        referenceMatchers = config.referenceMatchers,
        computeRetainedHeapSize = config.computeRetainedHeapSize,
        objectInspectors = config.objectInspectors,
        metadataExtractor = config.metadataExtractor
      )
      if (result is HeapAnalysisSuccess) {
        val lruCacheStats = (graph as HprofHeapGraph).lruCacheStats()
        val randomAccessStats =
          "RandomAccess[" +
            "bytes=${sourceProvider.randomAccessByteReads}," +
            "reads=${sourceProvider.randomAccessReadCount}," +
            "travel=${sourceProvider.randomAccessByteTravel}," +
            "range=${sourceProvider.byteTravelRange}," +
            "size=${heapDumpFile.length()}" +
            "]"
        val stats = "$lruCacheStats $randomAccessStats"
        result.copy(metadata = result.metadata + ("Stats" to stats))
      } else result
    }
}

From the code above we can see that LeakCanary also supports deobfuscation. The logic splits into two main parts: first openHeapGraph() parses the contents of the HPROF file into the graph variable, then HeapAnalyzer#analyze() finds the reference chains to the leaking objects.
All HPROF handling is done in the shark module, which is why a shark appears in the LeakCanary logo in places; the shark library can also be used on its own.
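
Since shark is standalone, you can drive these two steps yourself. A hedged sketch, assuming the shark / shark-android APIs from LeakCanary 2.x (FileSourceProvider, FilteringLeakingObjectFinder, AndroidObjectInspectors, AndroidReferenceMatchers) and a hypothetical dump file:

Kotlin:
import java.io.File
import shark.AndroidObjectInspectors
import shark.AndroidReferenceMatchers
import shark.FileSourceProvider
import shark.FilteringLeakingObjectFinder
import shark.HeapAnalyzer
import shark.HprofHeapGraph.Companion.openHeapGraph
import shark.OnAnalysisProgressListener

fun analyzeByHand(heapDumpFile: File) {
  val heapAnalyzer = HeapAnalyzer(OnAnalysisProgressListener { step -> println(step) })
  // Step 1: parse the HPROF file into a HeapGraph.
  FileSourceProvider(heapDumpFile).openHeapGraph().use { graph ->
    // Step 2: search the graph for leaking objects and their reference chains.
    val analysis = heapAnalyzer.analyze(
      heapDumpFile = heapDumpFile,
      graph = graph,
      leakingObjectFinder = FilteringLeakingObjectFinder(
        AndroidObjectInspectors.appLeakingObjectFilters
      ),
      referenceMatchers = AndroidReferenceMatchers.appDefaults,
      computeRetainedHeapSize = true,
      objectInspectors = AndroidObjectInspectors.appDefaults
    )
    println(analysis)
  }
}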

Parsing the HPROF File

Let's look directly at the openHeapGraph() method:

Kotlin:
fun DualSourceProvider.openHeapGraph(
  proguardMapping: ProguardMapping? = null,
  indexedGcRootTypes: Set<HprofRecordTag> = HprofIndex.defaultIndexedGcRootTags()
): CloseableHeapGraph {
  // TODO We can probably remove the concept of DualSourceProvider. Opening a heap graph requires
  //  a random access reader which is built from a random access source + headers.
  //  Also require headers, and the index.
  //  So really we're:
  //  1) Reader the headers from an okio source
  //  2) Reading the whole source streaming to create the index. Wondering if we really need to parse
  //  the headers, close the file then parse / skip the header part. Can't the parsing + indexing give
  //  us headers + index?
  //  3) Using the index + headers + a random access source on the content to create a closeable
  //  abstraction.
  //  Note: should see if Okio has a better abstraction for random access now.
  //  Also Use FileSystem + Path instead of File as the core way to open a file based heap dump.
  // Parse the header
  val header = openStreamingSource().use { HprofHeader.parseHeaderOf(it) }
  // Parse the records
  val index = HprofIndex.indexRecordsOf(this, header, proguardMapping, indexedGcRootTypes)
  // Wrap the parse results in a HprofHeapGraph
  return index.openHeapGraph()
}

The header is parsed by HprofHeader#parseHeaderOf(), and the records are parsed by HprofIndex#indexRecordsOf().

First, the header parsing:

Kotlin:
fun parseHeaderOf(source: BufferedSource): HprofHeader {
  require(!source.exhausted()) {
    throw IllegalArgumentException("Source has no available bytes")
  }
  // Position where the version string ends
  val endOfVersionString = source.indexOf(0)
  // Read the version string
  val versionName = source.readUtf8(endOfVersionString)

  // Check whether this HPROF version is supported; if not, fail immediately
  val version = supportedVersions[versionName]
  checkNotNull(version) {
    "Unsupported Hprof version [$versionName] not in supported list ${supportedVersions.keys}"
  }
  // Skip the 0 at the end of the version string.
  source.skip(1)
  // Read the byte size of an ID or reference
  val identifierByteSize = source.readInt()
  // Read the heap dump timestamp
  val heapDumpTimestamp = source.readLong()
  return HprofHeader(heapDumpTimestamp, version, identifierByteSize)
}

The code above is straightforward. It mainly reads the following data (a standalone sketch of these reads follows the list):

  1. The version string. The supported versions:
Kotlin:
enum class HprofVersion(val versionString: String) {
  JDK1_2_BETA3("JAVA PROFILE 1.0"),
  JDK1_2_BETA4("JAVA PROFILE 1.0.1"),
  JDK_6("JAVA PROFILE 1.0.2"),
  ANDROID("JAVA PROFILE 1.0.3")
}

On Android this is always JAVA PROFILE 1.0.3.

  2. The byte size of an ID or reference
  3. The heap dump timestamp
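
To make the header layout concrete, here is a minimal standalone re-implementation of the same reads, assuming only an okio dependency and a hypothetical file path:

Kotlin:
import java.io.File
import okio.buffer
import okio.source

// Reads the three header fields by hand, mirroring parseHeaderOf() above.
fun readHprofHeader(heapDumpFile: File) {
  heapDumpFile.source().buffer().use { source ->
    val versionName = source.readUtf8(source.indexOf(0)) // e.g. "JAVA PROFILE 1.0.3"
    source.skip(1)                                       // trailing 0 of the version string
    val identifierByteSize = source.readInt()            // byte size of IDs/references
    val heapDumpTimestamp = source.readLong()            // millis since epoch
    println("version=$versionName idSize=$identifierByteSize dumpedAt=$heapDumpTimestamp")
  }
}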

Next, the HprofIndex.indexRecordsOf() method:

Kotlin:
fun indexRecordsOf(
  hprofSourceProvider: DualSourceProvider,
  hprofHeader: HprofHeader,
  proguardMapping: ProguardMapping? = null,
  indexedGcRootTags: Set<HprofRecordTag> = defaultIndexedGcRootTags()
): HprofIndex {
  val reader = StreamingHprofReader.readerFor(hprofSourceProvider, hprofHeader)
  val index = HprofInMemoryIndex.indexHprof(
    reader = reader,
    hprofHeader = hprofHeader,
    proguardMapping = proguardMapping,
    indexedGcRootTags = indexedGcRootTags
  )
  return HprofIndex(hprofSourceProvider, hprofHeader, index)
}

StreamingHprofReader is the core reading class. HprofInMemoryIndex.indexHprof() builds on its output, keeping the data it needs in memory, and it calls StreamingHprofReader#readRecords() twice to read the records. Let's look at readRecords() first:

Kotlin:
@Suppress("ComplexMethod", "NestedBlockDepth")
fun readRecords(
  // The record tags to handle
  recordTags: Set<HprofRecordTag>,
  // Records with those tags are forwarded to this listener
  listener: OnHprofRecordTagListener
): Long {
  return sourceProvider.openStreamingSource().use { source ->
    val reader = HprofRecordReader(header, source)
    // Skip the header
    reader.skip(header.recordsPosition)

    // Local ref optimizations
    val intByteSize = INT.byteSize
    val identifierByteSize = reader.sizeOf(REFERENCE_HPROF_TYPE)

    // Read records in a loop
    while (!source.exhausted()) {
      // type of the record
      // Read the record tag
      val tag = reader.readUnsignedByte()

      // number of microseconds since the time stamp in the header
      // Skip the timestamp
      reader.skip(intByteSize)

      // number of bytes that follow and belong to this record
      // Byte length of the record body
      val length = reader.readUnsignedInt()

      // Dispatch on the record type
      when (tag) {
        STRING_IN_UTF8.tag -> {
          if (STRING_IN_UTF8 in recordTags) {
            listener.onHprofRecord(STRING_IN_UTF8, length, reader)
          } else {
            reader.skip(length)
          }
        }
        UNLOAD_CLASS.tag -> {
          if (UNLOAD_CLASS in recordTags) {
            listener.onHprofRecord(UNLOAD_CLASS, length, reader)
          } else {
            reader.skip(length)
          }
        }
        LOAD_CLASS.tag -> {
          if (LOAD_CLASS in recordTags) {
            listener.onHprofRecord(LOAD_CLASS, length, reader)
          } else {
            reader.skip(length)
          }
        }
        STACK_FRAME.tag -> {
          if (STACK_FRAME in recordTags) {
            listener.onHprofRecord(STACK_FRAME, length, reader)
          } else {
            reader.skip(length)
          }
        }
        STACK_TRACE.tag -> {
          if (STACK_TRACE in recordTags) {
            listener.onHprofRecord(STACK_TRACE, length, reader)
          } else {
            reader.skip(length)
          }
        }
        HEAP_DUMP.tag, HEAP_DUMP_SEGMENT.tag -> {
          // Read the sub-records
          val heapDumpStart = reader.bytesRead
          var previousTag = 0
          var previousTagPosition = 0L
          // Read sub-records in a loop
          while (reader.bytesRead - heapDumpStart < length) {
            val heapDumpTagPosition = reader.bytesRead
            val heapDumpTag = reader.readUnsignedByte()
            when (heapDumpTag) {
              ROOT_UNKNOWN.tag -> {
                if (ROOT_UNKNOWN in recordTags) {
                  listener.onHprofRecord(ROOT_UNKNOWN, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }
              ROOT_JNI_GLOBAL.tag -> {
                if (ROOT_JNI_GLOBAL in recordTags) {
                  listener.onHprofRecord(ROOT_JNI_GLOBAL, -1, reader)
                } else {
                  reader.skip(identifierByteSize + identifierByteSize)
                }
              }
              ROOT_JNI_LOCAL.tag -> {
                if (ROOT_JNI_LOCAL in recordTags) {
                  listener.onHprofRecord(ROOT_JNI_LOCAL, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize + intByteSize)
                }
              }

              ROOT_JAVA_FRAME.tag -> {
                if (ROOT_JAVA_FRAME in recordTags) {
                  listener.onHprofRecord(ROOT_JAVA_FRAME, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize + intByteSize)
                }
              }

              ROOT_NATIVE_STACK.tag -> {
                if (ROOT_NATIVE_STACK in recordTags) {
                  listener.onHprofRecord(ROOT_NATIVE_STACK, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize)
                }
              }

              ROOT_STICKY_CLASS.tag -> {
                if (ROOT_STICKY_CLASS in recordTags) {
                  listener.onHprofRecord(ROOT_STICKY_CLASS, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }
              ROOT_THREAD_BLOCK.tag -> {
                if (ROOT_THREAD_BLOCK in recordTags) {
                  listener.onHprofRecord(ROOT_THREAD_BLOCK, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize)
                }
              }

              ROOT_MONITOR_USED.tag -> {
                if (ROOT_MONITOR_USED in recordTags) {
                  listener.onHprofRecord(ROOT_MONITOR_USED, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_THREAD_OBJECT.tag -> {
                if (ROOT_THREAD_OBJECT in recordTags) {
                  listener.onHprofRecord(ROOT_THREAD_OBJECT, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize + intByteSize)
                }
              }

              ROOT_INTERNED_STRING.tag -> {
                if (ROOT_INTERNED_STRING in recordTags) {
                  listener.onHprofRecord(ROOT_INTERNED_STRING, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_FINALIZING.tag -> {
                if (ROOT_FINALIZING in recordTags) {
                  listener.onHprofRecord(ROOT_FINALIZING, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_DEBUGGER.tag -> {
                if (ROOT_DEBUGGER in recordTags) {
                  listener.onHprofRecord(ROOT_DEBUGGER, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_REFERENCE_CLEANUP.tag -> {
                if (ROOT_REFERENCE_CLEANUP in recordTags) {
                  listener.onHprofRecord(ROOT_REFERENCE_CLEANUP, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_VM_INTERNAL.tag -> {
                if (ROOT_VM_INTERNAL in recordTags) {
                  listener.onHprofRecord(ROOT_VM_INTERNAL, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }

              ROOT_JNI_MONITOR.tag -> {
                if (ROOT_JNI_MONITOR in recordTags) {
                  listener.onHprofRecord(ROOT_JNI_MONITOR, -1, reader)
                } else {
                  reader.skip(identifierByteSize + intByteSize + intByteSize)
                }
              }

              ROOT_UNREACHABLE.tag -> {
                if (ROOT_UNREACHABLE in recordTags) {
                  listener.onHprofRecord(ROOT_UNREACHABLE, -1, reader)
                } else {
                  reader.skip(identifierByteSize)
                }
              }
              CLASS_DUMP.tag -> {
                if (CLASS_DUMP in recordTags) {
                  listener.onHprofRecord(CLASS_DUMP, -1, reader)
                } else {
                  reader.skipClassDumpRecord()
                }
              }
              INSTANCE_DUMP.tag -> {
                if (INSTANCE_DUMP in recordTags) {
                  listener.onHprofRecord(INSTANCE_DUMP, -1, reader)
                } else {
                  reader.skipInstanceDumpRecord()
                }
              }

              OBJECT_ARRAY_DUMP.tag -> {
                if (OBJECT_ARRAY_DUMP in recordTags) {
                  listener.onHprofRecord(OBJECT_ARRAY_DUMP, -1, reader)
                } else {
                  reader.skipObjectArrayDumpRecord()
                }
              }

              PRIMITIVE_ARRAY_DUMP.tag -> {
                if (PRIMITIVE_ARRAY_DUMP in recordTags) {
                  listener.onHprofRecord(PRIMITIVE_ARRAY_DUMP, -1, reader)
                } else {
                  reader.skipPrimitiveArrayDumpRecord()
                }
              }

              PRIMITIVE_ARRAY_NODATA.tag -> {
                throw UnsupportedOperationException("$PRIMITIVE_ARRAY_NODATA cannot be parsed")
              }

              HEAP_DUMP_INFO.tag -> {
                if (HEAP_DUMP_INFO in recordTags) {
                  listener.onHprofRecord(HEAP_DUMP_INFO, -1, reader)
                } else {
                  reader.skipHeapDumpInfoRecord()
                }
              }
              else -> throw IllegalStateException(
                "Unknown tag ${
                  "0x%02x".format(
                    heapDumpTag
                  )
                } at $heapDumpTagPosition after ${
                  "0x%02x".format(
                    previousTag
                  )
                } at $previousTagPosition"
              )
            }
            previousTag = heapDumpTag
            previousTagPosition = heapDumpTagPosition
          }
        }
        HEAP_DUMP_END.tag -> {
          if (HEAP_DUMP_END in recordTags) {
            listener.onHprofRecord(HEAP_DUMP_END, length, reader)
          }
        }
        else -> {
          reader.skip(length)
        }
      }
    }
    reader.bytesRead
  }
}

The code above reads every kind of record, but only processes the tags requested via recordTags, forwarding each matching record to the provided listener and skipping everything else. For the details of how each record type is read, see my previous article.
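
To make the listener contract concrete, here is a hedged sketch built only from the APIs shown above: it counts INSTANCE_DUMP records in a dump. Note that once a tag is subscribed, the listener itself must consume the record body, which is what skipInstanceDumpRecord() does here:

Kotlin:
import java.io.File
import java.util.EnumSet
import shark.FileSourceProvider
import shark.HprofHeader
import shark.HprofRecordTag
import shark.StreamingHprofReader

fun countInstances(heapDumpFile: File): Int {
  val provider = FileSourceProvider(heapDumpFile)
  val header = provider.openStreamingSource().use { HprofHeader.parseHeaderOf(it) }
  val reader = StreamingHprofReader.readerFor(provider, header)
  var instances = 0
  reader.readRecords(EnumSet.of(HprofRecordTag.INSTANCE_DUMP)) { _, _, recordReader ->
    instances++
    // We subscribed to INSTANCE_DUMP, so we must consume the record ourselves.
    recordReader.skipInstanceDumpRecord()
  }
  return instances
}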

Now let's look at the implementation of HprofInMemoryIndex.indexHprof():

Kotlin:
  fun indexHprof(
    reader: StreamingHprofReader,
    hprofHeader: HprofHeader,
    proguardMapping: ProguardMapping?,
    indexedGcRootTags: Set<HprofRecordTag>
  ): HprofInMemoryIndex {

    // First pass to count and correctly size arrays once and for all.
    var maxClassSize = 0L
    var maxInstanceSize = 0L
    var maxObjectArraySize = 0L
    var maxPrimitiveArraySize = 0L
    var classCount = 0
    var instanceCount = 0
    var objectArrayCount = 0
    var primitiveArrayCount = 0
    var classFieldsTotalBytes = 0
    val stickyClassGcRootIds = LongScatterSet()
    
    // First pass over the records: only dump records and ROOT_STICKY_CLASS
    val bytesRead = reader.readRecords(
      EnumSet.of(
        CLASS_DUMP,
        INSTANCE_DUMP,
        OBJECT_ARRAY_DUMP,
        PRIMITIVE_ARRAY_DUMP,
        ROOT_STICKY_CLASS
      )
    ) { tag, _, reader ->
      val bytesReadStart = reader.bytesRead
      when (tag) {
        CLASS_DUMP -> {
          // Count classes
          classCount++
          reader.skipClassDumpHeader()
          val bytesReadStaticFieldStart = reader.bytesRead
          reader.skipClassDumpStaticFields()
          reader.skipClassDumpFields()
          // Track the largest single class record size
          maxClassSize = max(maxClassSize, reader.bytesRead - bytesReadStart)
          // Track the total bytes taken by all class fields
          classFieldsTotalBytes += (reader.bytesRead - bytesReadStaticFieldStart).toInt()
        }
        INSTANCE_DUMP -> {
          // Count plain instances
          instanceCount++
          reader.skipInstanceDumpRecord()
          // Track the largest single instance record size
          maxInstanceSize = max(maxInstanceSize, reader.bytesRead - bytesReadStart)
        }
        OBJECT_ARRAY_DUMP -> {
          // Count object arrays
          objectArrayCount++
          reader.skipObjectArrayDumpRecord()
          // Track the largest single object array record size
          maxObjectArraySize = max(maxObjectArraySize, reader.bytesRead - bytesReadStart)
        }
        PRIMITIVE_ARRAY_DUMP -> {
          // Count primitive arrays
          primitiveArrayCount++
          reader.skipPrimitiveArrayDumpRecord()
          // Track the largest single primitive array record size
          maxPrimitiveArraySize = max(maxPrimitiveArraySize, reader.bytesRead - bytesReadStart)
        }
        ROOT_STICKY_CLASS -> {
          // StickyClass has only 1 field: id. Our API 23 emulators in CI are creating heap
          // dumps with duplicated sticky class roots, up to 30K times for some objects.
          // There's no point in keeping all these in our list of roots, 1 per each is enough
          // so we deduplicate with stickyClassGcRootIds.
          val id = reader.readStickyClassGcRootRecord().id
          if (id != ValueHolder.NULL_REFERENCE) {
            // Record the ROOT_STICKY_CLASS id: these are the GC roots for classes' static fields.
            stickyClassGcRootIds += id
          }
        }
        else -> {
          // Not interesting.
        }
      }
    }
    
    // Use the various maximums from the first pass to compute how many bytes are needed to store each value during the second pass.
    val bytesForClassSize = byteSizeForUnsigned(maxClassSize)
    val bytesForInstanceSize = byteSizeForUnsigned(maxInstanceSize)
    val bytesForObjectArraySize = byteSizeForUnsigned(maxObjectArraySize)
    val bytesForPrimitiveArraySize = byteSizeForUnsigned(maxPrimitiveArraySize)

    val indexBuilderListener = Builder(
      longIdentifiers = hprofHeader.identifierByteSize == 8,
      maxPosition = bytesRead,
      classCount = classCount,
      instanceCount = instanceCount,
      objectArrayCount = objectArrayCount,
      primitiveArrayCount = primitiveArrayCount,
      bytesForClassSize = bytesForClassSize,
      bytesForInstanceSize = bytesForInstanceSize,
      bytesForObjectArraySize = bytesForObjectArraySize,
      bytesForPrimitiveArraySize = bytesForPrimitiveArraySize,
      classFieldsTotalBytes = classFieldsTotalBytes,
      stickyClassGcRootIds
    )

    val recordTypes = EnumSet.of(
      STRING_IN_UTF8,
      LOAD_CLASS,
      CLASS_DUMP,
      INSTANCE_DUMP,
      OBJECT_ARRAY_DUMP,
      PRIMITIVE_ARRAY_DUMP
    ) + HprofRecordTag.rootTags.intersect(indexedGcRootTags)
    
    // Second pass: strings, LOAD_CLASS, all dump records and all the GC roots.
    reader.readRecords(recordTypes, indexBuilderListener)
    return indexBuilderListener.buildIndex(proguardMapping, hprofHeader)
  }
}

The first pass over the records only handles the dump records and ROOT_STICKY_CLASS (that is, the GC roots for classes' static fields). Along the way it counts each kind of object and tracks various maximums: the number of classes, the largest class record, the total bytes taken by all class fields, the number of plain instances, the largest instance record, the number of object arrays and the largest object array record, and the number of primitive arrays and the largest primitive array record. The second pass uses these maximums to work out how many bytes are needed to store each value; the method that does the computation is byteSizeForUnsigned().
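
As a hedged sketch (not necessarily shark's exact code), byteSizeForUnsigned() boils down to counting how many bytes are needed to represent a maximum value as an unsigned integer:

Kotlin:
// Number of bytes needed to store maxValue as an unsigned integer.
fun byteSizeForUnsigned(maxValue: Long): Int {
  var value = maxValue
  var byteCount = 0
  while (value != 0L) {
    value = value shr 8
    byteCount++
  }
  return byteCount
}

// byteSizeForUnsigned(255) == 1, byteSizeForUnsigned(256) == 2,
// so e.g. positions in a 300 MB heap dump fit in 4 bytes rather than 8.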

The second pass handles strings, LOAD_CLASS records, every kind of dump record, and all the GC roots. The listener doing the work is Builder; let's see what it does.

Kotlin:
override fun onHprofRecord(
  tag: HprofRecordTag,
  length: Long,
  reader: HprofRecordReader
) {
  when (tag) {
    STRING_IN_UTF8 -> {
      // Cache the string in hprofStringCache
      hprofStringCache[reader.readId()] = reader.readUtf8(length - identifierSize)
    }
    LOAD_CLASS -> {
      // classSerialNumber
      reader.skip(INT.byteSize)
      val id = reader.readId()
      // stackTraceSerialNumber
      reader.skip(INT.byteSize)
      val classNameStringId = reader.readId()
      // Map the class id to the class name string id
      classNames[id] = classNameStringId
    }
    ROOT_UNKNOWN -> {
      reader.readUnknownGcRootRecord().apply {
        // If it's not null, store the GC root in gcRoots
        if (id != ValueHolder.NULL_REFERENCE) {
          gcRoots += this
        }
      }
    }
    // ... the handling of many other GC root tags is omitted; they all work just like ROOT_UNKNOWN.
    
    
    CLASS_DUMP -> {
      val bytesReadStart = reader.bytesRead
      val id = reader.readId()
      // stack trace serial number
      reader.skip(INT.byteSize)
      val superclassId = reader.readId()
      reader.skip(5 * identifierSize)

      // instance size (in bytes)
      // Useful to compute retained size
      val instanceSize = reader.readInt()

      reader.skipClassDumpConstantPool()

      val startPosition = classFieldsIndex

      val bytesReadFieldStart = reader.bytesRead
      
      // Copy all static fields into classFieldBytes
      reader.copyToClassFields(2)
      val staticFieldCount = lastClassFieldsShort().toInt() and 0xFFFF
      for (i in 0 until staticFieldCount) {
        reader.copyToClassFields(identifierSize)
        reader.copyToClassFields(1)
        val type = classFieldBytes[classFieldsIndex - 1].toInt() and 0xff
        if (type == PrimitiveType.REFERENCE_HPROF_TYPE) {
          reader.copyToClassFields(identifierSize)
        } else {
          reader.copyToClassFields(PrimitiveType.byteSizeByHprofType.getValue(type))
        }
      }

      // Copy all member fields into classFieldBytes
      reader.copyToClassFields(2)
      val fieldCount = lastClassFieldsShort().toInt() and 0xFFFF
      for (i in 0 until fieldCount) {
        reader.copyToClassFields(identifierSize)
        reader.copyToClassFields(1)
      }
      
      // Size taken by the static and member fields
      val fieldsSize = (reader.bytesRead - bytesReadFieldStart).toInt()
      // Size of the whole record
      val recordSize = reader.bytesRead - bytesReadStart
      
      // Write the class's basic info into classIndex, keyed by id
      classIndex.append(id)
        .apply {
          // Position where the record starts (offset in the HPROF file)
          writeTruncatedLong(bytesReadStart, positionSize)
          // Superclass id
          writeId(superclassId)
          // Instance size of the class
          writeInt(instanceSize)
          // Record size
          writeTruncatedLong(recordSize, bytesForClassSize)
          // Position where the fields start (offset into the classFieldBytes array)
          writeTruncatedLong(startPosition.toLong(), classFieldsIndexSize)
        }
      require(startPosition + fieldsSize == classFieldsIndex) {
        "Expected $classFieldsIndex to have moved by $fieldsSize and be equal to ${startPosition + fieldsSize}"
      }
    }
    INSTANCE_DUMP -> {
      val bytesReadStart = reader.bytesRead
      val id = reader.readId()
      reader.skip(INT.byteSize)
      val classId = reader.readId()
      val remainingBytesInInstance = reader.readInt()
      // Skip the instance data (the field values)
      reader.skip(remainingBytesInInstance)
      val recordSize = reader.bytesRead - bytesReadStart
      
      // Write the plain instance's data into instanceIndex.
      instanceIndex.append(id)
        .apply {
          // Position where the record starts (offset in the HPROF file)
          writeTruncatedLong(bytesReadStart, positionSize)
          // ClassId
          writeId(classId)
          writeTruncatedLong(recordSize, bytesForInstanceSize)
        }
    }
    OBJECT_ARRAY_DUMP -> {
      val bytesReadStart = reader.bytesRead
      val id = reader.readId()
      // stack trace serial number
      reader.skip(INT.byteSize)
      val size = reader.readInt()
      val arrayClassId = reader.readId()
      // Skip the array contents
      reader.skip(identifierSize * size)
      // record size - (ID+INT + INT + ID)
      val recordSize = reader.bytesRead - bytesReadStart
      // Write the object array into objectArrayIndex
      objectArrayIndex.append(id)
        .apply {
          // Position where the record starts (offset in the HPROF file)
          writeTruncatedLong(bytesReadStart, positionSize)
          writeId(arrayClassId)
          writeTruncatedLong(recordSize, bytesForObjectArraySize)
        }
    }
    PRIMITIVE_ARRAY_DUMP -> {
      val bytesReadStart = reader.bytesRead
      val id = reader.readId()
      reader.skip(INT.byteSize)
      val size = reader.readInt()
      val type = PrimitiveType.primitiveTypeByHprofType.getValue(reader.readUnsignedByte())
      // Skip the array contents
      reader.skip(size * type.byteSize)
      val recordSize = reader.bytesRead - bytesReadStart
      // Write the primitive array into primitiveArrayIndex
      primitiveArrayIndex.append(id)
        .apply {
          // Position where the record starts (offset in the HPROF file)
          writeTruncatedLong(bytesReadStart, positionSize)
          // Primitive type
          writeByte(type.ordinal.toByte())
          writeTruncatedLong(recordSize, bytesForPrimitiveArraySize)
        }
    }
    else -> {
      // Not interesting.
    }
  }
}

The second pass is where the important data in the HPROF file actually gets recorded, all of it on the Builder object. The string id to String mapping goes into hprofStringCache; the class id to string id mapping goes into classNames; all the GC roots go into gcRoots; each class's basic info goes into classIndex; each class's static and member fields go into classFieldBytes, a byte array whose per-class start offset is recorded in classIndex; plain instances go into instanceIndex; object arrays go into objectArrayIndex; and primitive arrays go into primitiveArrayIndex.
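
Note the writeTruncatedLong() calls above: positions and sizes are stored in exactly as many bytes as the first pass proved necessary. As a hedged sketch (not shark's exact code), a truncated big-endian write looks like this:

Kotlin:
import okio.Buffer

// Writes only the lowest byteCount bytes of value, big-endian. Safe because
// the first pass guaranteed no stored value needs more than byteCount bytes.
fun Buffer.writeTruncatedLong(value: Long, byteCount: Int) {
  var shift = (byteCount - 1) * 8
  while (shift >= 0) {
    writeByte(((value shr shift) and 0xff).toInt())
    shift -= 8
  }
}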

Since all of this data is held by the Builder, let's look at its buildIndex() method:

Kotlin:
  fun buildIndex(
    proguardMapping: ProguardMapping?,
    hprofHeader: HprofHeader
  ): HprofInMemoryIndex {
    require(classFieldsIndex == classFieldBytes.size) {
      "Read $classFieldsIndex into fields bytes instead of expected ${classFieldBytes.size}"
    }

    val sortedInstanceIndex = instanceIndex.moveToSortedMap()
    val sortedObjectArrayIndex = objectArrayIndex.moveToSortedMap()
    val sortedPrimitiveArrayIndex = primitiveArrayIndex.moveToSortedMap()
    val sortedClassIndex = classIndex.moveToSortedMap()
    // Passing references to avoid copying the underlying data structures.
    return HprofInMemoryIndex(
      positionSize = positionSize,
      hprofStringCache = hprofStringCache,
      classNames = classNames,
      classIndex = sortedClassIndex,
      instanceIndex = sortedInstanceIndex,
      objectArrayIndex = sortedObjectArrayIndex,
      primitiveArrayIndex = sortedPrimitiveArrayIndex,
      gcRoots = gcRoots,
      proguardMapping = proguardMapping,
      bytesForClassSize = bytesForClassSize,
      bytesForInstanceSize = bytesForInstanceSize,
      bytesForObjectArraySize = bytesForObjectArraySize,
      bytesForPrimitiveArraySize = bytesForPrimitiveArraySize,
      useForwardSlashClassPackageSeparator = hprofHeader.version != ANDROID,
      classFieldsReader = ClassFieldsReader(identifierSize, classFieldBytes),
      classFieldsIndexSize = classFieldsIndexSize,
      stickyClassGcRootIds = stickyClassGcRootIds,
    )
  }
}

The scanned data is all wrapped up in a HprofInMemoryIndex which, as we saw in indexRecordsOf() above, is in turn wrapped in a HprofIndex.

Afterwards, HprofIndex#openHeapGraph() is called:

Kotlin:
fun openHeapGraph(): CloseableHeapGraph {
  val reader = RandomAccessHprofReader.openReaderFor(sourceProvider, header)
  return HprofHeapGraph(header, reader, index)
}

Finally, all of the parsed data ends up wrapped in a HprofHeapGraph. When we later check for leaking instances, most of the operations go through this HprofHeapGraph.
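
HeapGraph then exposes query APIs on top of these indexes. A small sketch of the kinds of lookups the leak search performs, assuming shark's HeapGraph API and a hypothetical dump file:

Kotlin:
import java.io.File
import shark.FileSourceProvider
import shark.HprofHeapGraph.Companion.openHeapGraph

fun inspectGraph(heapDumpFile: File) {
  FileSourceProvider(heapDumpFile).openHeapGraph().use { graph ->
    // Resolved through classNames + hprofStringCache + classIndex.
    val activityClass = graph.findClassByName("android.app.Activity")
    // Backed by instanceIndex.
    val instanceCount = graph.instances.count()
    println("gcRoots=${graph.gcRoots.size} instances=$instanceCount activityClass=$activityClass")
  }
}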

Wrapping Up

This article covered how the HPROF file is parsed. I had planned to also cover how the leaking objects are found, but this article is already long enough, so that will be the topic of the next one.
