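Background

While analyzing the object model of the ART virtual machine, I noticed that pointers from a mirror Object class to other objects are always wrapped in HeapReference. For example, the pointers that mirror::Throwable holds to several other objects are all of type HeapReference.

The HeapReference class is just a wrapper around a uint32_t value. It is declared in: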

[/art/runtime/mirror/object_reference.h](https://link.juejin.cn?target=http%3A%2F%2Faospxref.com%2Fandroid-13.0.0_r3%2Fxref%2Fart%2Fruntime%2Fmirror%2Fobject_reference.h%23167 "http://aospxref.com/android-13.0.0_r3/xref/art/runtime/mirror/object_reference.h#167")

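So a mirror object pointer is stored in the reference_ member variable, which is a 32-bit unsigned integer.

This 32-bit unsigned integer can be cast back to a Java object pointer by the Decompress() function.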
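As a minimal sketch of the idea (not ART's actual code; the names Object and CompressedRef are hypothetical), a 32-bit wrapper can be converted to and from a native pointer as long as the object lives below the 4 GB boundary:

```cpp
#include <cstdint>

// Hypothetical stand-in for a managed object; this is not ART's mirror::Object.
struct Object {
  int value;
};

// A 32-bit representation of an object pointer, in the spirit of HeapReference.
// It is only lossless when the object's address fits in 32 bits.
class CompressedRef {
 public:
  // Returns true if the upper 32 bits of the address are all zero.
  static bool Fits(const Object* ptr) {
    return (static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr)) >> 32) == 0;
  }

  // Keeps only the low 32 bits of the address.
  static uint32_t Compress(const Object* ptr) {
    return static_cast<uint32_t>(reinterpret_cast<uintptr_t>(ptr));
  }

  // Zero-extends the stored value back to a native pointer.
  static Object* Decompress(uint32_t ref) {
    return reinterpret_cast<Object*>(static_cast<uintptr_t>(ref));
  }
};
```

The round trip is only safe when Fits() holds, which is why the allocator has to guarantee low addresses in the first place.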

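In other words, a Java object pointer in ART is a 32-bit address, even though virtual addresses on a 64-bit system are normally 64 bits wide.

Storing object pointers as 32-bit addresses on a 64-bit system is a form of pointer compression.

The rest of this article looks at pointer compression in detail.

Pointer Compression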

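Pointer compression reduces the width of a pointer from 64 bits to 32 bits. On a conventional 64-bit system each pointer takes 8 bytes (64 bits); with pointer compression each pointer needs only 4 bytes (32 bits). This significantly reduces the memory taken by pointers and so improves memory utilization.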
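To make the saving concrete, here is a small sketch comparing two layouts of the same record. The struct names are hypothetical and the fields are plain integers standing in for references:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tree node that stores three references as full 64-bit values.
struct WideNode {
  uint64_t left;
  uint64_t right;
  uint64_t parent;
};

// The same node with each reference stored as a 32-bit compressed value.
struct CompressedNode {
  uint32_t left;
  uint32_t right;
  uint32_t parent;
};

// Bytes saved across `count` nodes when the references are compressed.
constexpr std::size_t BytesSaved(std::size_t count) {
  return count * (sizeof(WideNode) - sizeof(CompressedNode));
}
```

Each node shrinks from 24 bytes to 12, so the reference fields of a million such nodes take about 12 MB less.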

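It turns out that other modern virtual machines, such as Java's HotSpot VM, JavaScript's V8 engine, and the Dart VM, use pointer compression as well.

Dart VM:

New features in the Dart 2.15 release · GitBook

Java pointer compression:

Java pointer compression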

Compressed Ordinary Object Pointers

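Ordinary Object Pointers (oops) are plain object pointers. With the -XX:+UseCompressedOops option enabled, the following pointers are compressed:

Field pointers of each Class (static member variables)

Field pointers of each object

Each element pointer of an ordinary object array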

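With compression enabled the JVM stores 32-bit pointers, but a 64-bit machine still needs a 64-bit address to access data, so the JVM has to encode and decode pointer values. It does this by emitting compress and decompress instructions into the machine code, as follows:

First, the size of every object is a multiple of 8 bytes, because the JVM pads the end of each object for alignment.

Suppose object x holds three references: a at address 0, b at address 8, and c at address 16. When recording these references in x, the JVM does not need to store 0, 8, 16, ...; it can store 0, 1, 2, ... instead (the address shifted right by 3 bits, that is, divided by 8). This step is called encode. When x.c is accessed, the stored value is 2; a decode (shift left by 3 bits, that is, multiply by 8) yields address 16, and c can then be reached.

So although pointers are stored in 32 bits, the addressable space grows by a factor of 8. Compressed pointers can therefore reach 4 GB * 8 = 32 GB of memory.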
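The encode and decode steps above can be sketched as two shifts. This is a simplified model that assumes a heap starting at address 0; the function and constant names are made up for illustration, and a real JVM may also add a heap base:

```cpp
#include <cstdint>

// Objects are 8-byte aligned, so the low 3 bits of every address are zero.
constexpr unsigned kAlignmentShift = 3;

// Encode: drop the three always-zero low bits (divide by 8).
constexpr uint32_t Encode(uint64_t address) {
  return static_cast<uint32_t>(address >> kAlignmentShift);
}

// Decode: restore the three low bits (multiply by 8).
constexpr uint64_t Decode(uint32_t narrow) {
  return static_cast<uint64_t>(narrow) << kAlignmentShift;
}

// 2^32 distinct encoded values, each 8 bytes apart: 32 GB in total.
constexpr uint64_t kMaxAddressableBytes = (uint64_t{1} << 32) << kAlignmentShift;
```

With these definitions Encode(16) is 2 and Decode(2) is 16, matching the x.c example.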

V8 pointer compression:

Pointer Compression in V8

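There is a constant battle between memory and performance. As users, we would like things to be fast and to consume as little memory as possible. Unfortunately, improving performance usually comes at the cost of memory consumption (and vice versa).

Back in 2014, Chrome switched from being a 32-bit process to a 64-bit process. This gave Chrome better security, stability and performance, but it came at a memory cost, since each pointer now occupies 8 bytes instead of 4. We took on the challenge of reducing this overhead in V8 to try to get back as many of the wasted 4 bytes as possible.

Before diving into the implementation, we need to know where we stand in order to assess the situation correctly. To measure memory and performance we use a set of web pages that reflect popular real-world websites. The data showed that V8 contributes up to 60% of the memory consumption of Chrome's renderer process on desktop, with an average of 40%.

Pointer Compression is one of several efforts in V8 to reduce memory consumption. The idea is simple: instead of storing 64-bit pointers, we can store 32-bit offsets from some "base" address. With such a simple idea, how much can we gain from this compression in V8?

The V8 heap contains a large number of items, such as floating point values, string characters, interpreter bytecode, and tagged values (see the next section for details). Upon inspecting the heap, we discovered that on real-world websites these tagged values occupy around 70% of the V8 heap!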
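The base-plus-offset idea in the excerpt above can be sketched as follows. This is only an illustration of the arithmetic, not V8's implementation; the Cage type and both function names are hypothetical, and the sketch assumes every object lies within 4 GB of the base:

```cpp
#include <cstdint>

// Hypothetical 4 GB region that all compressed pointers refer into.
struct Cage {
  uint64_t base;  // Start address of the region.
};

// Stores an address as its 32-bit offset from the cage base.
inline uint32_t CompressToOffset(const Cage& cage, uint64_t address) {
  return static_cast<uint32_t>(address - cage.base);
}

// Rebuilds the full 64-bit address by adding the offset back to the base.
inline uint64_t DecompressFromOffset(const Cage& cage, uint32_t offset) {
  return cage.base + offset;
}
```

Unlike the ART scheme discussed in this article, this variant does not need the heap to sit in the low 4 GB, at the price of an addition on every decompression.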

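ART Pointer Compression Analysis

On Linux, memory is usually allocated with malloc or mmap, but neither function offers a general way to request memory in the low 4 GB of the address space (32-bit addresses). So how does ART manage to place object memory in the low 4 GB?

Let's look at the allocation rules of the simplest heap allocator in the VM, the large object space, which is dedicated to large Java objects.

There are two implementations of large object allocation: LargeObjectMapSpace, used on 32-bit, and FreeListSpace, used on 64-bit.

Here we walk through how memory for a large Java object is allocated.

The allocation code for both is in: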

[/art/runtime/gc/space/large_object_space.cc](https://link.juejin.cn?target=http%3A%2F%2Faospxref.com%2Fandroid-13.0.0_r3%2Fxref%2Fart%2Fruntime%2Fgc%2Fspace%2Flarge_object_space.cc%23136 "http://aospxref.com/android-13.0.0_r3/xref/art/runtime/gc/space/large_object_space.cc#136")

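Both call MemMap::MapAnonymous to allocate memory. Note that the fourth argument, low_4gb, is true in both cases; as the parameter name suggests, it requests that the memory be placed in the low 4 GB of the address space.

MapAnonymous eventually calls MapInternal. Because the USE_ART_LOW_4G_ALLOCATOR macro is 1 on arm64 devices, MapInternal in turn calls MapInternalArtLow4GBAllocator. The code is as follows: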

[/art/libartbase/base/mem_map.cc](https://link.juejin.cn?target=http%3A%2F%2Faospxref.com%2Fandroid-13.0.0_r3%2Fxref%2Fart%2Flibartbase%2Fbase%2Fmem_map.cc%231131 "http://aospxref.com/android-13.0.0_r3/xref/art/libartbase/base/mem_map.cc#1131")

void* MemMap::MapInternal(void* addr,
                          size_t length,
                          int prot,
                          int flags,
                          int fd,
                          off_t offset,
                          bool low_4gb) {
#ifdef __LP64__
  // When requesting low_4g memory and having an expectation, the requested range should fit into
  // 4GB.
  if (low_4gb && (
      // Start out of bounds.
      (reinterpret_cast<uintptr_t>(addr) >> 32) != 0 ||
      // End out of bounds. For simplicity, this will fail for the last page of memory.
      ((reinterpret_cast<uintptr_t>(addr) + length) >> 32) != 0)) {
    LOG(ERROR) << "The requested address space (" << addr << ", "
               << reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(addr) + length)
               << ") cannot fit in low_4gb";
    return MAP_FAILED;
  }
#else
  UNUSED(low_4gb);
#endif
  DCHECK_ALIGNED(length, kPageSize);
  // TODO:
  // A page allocator would be a useful abstraction here, as
  // 1) It is doubtful that MAP_32BIT on x86_64 is doing the right job for us
  void* actual = MAP_FAILED;
#if USE_ART_LOW_4G_ALLOCATOR  // This value is 1 on arm64 devices.
  // MAP_32BIT only available on x86_64.
  if (low_4gb && addr == nullptr) {
    // The linear-scan allocator has an issue when executable pages are denied (e.g., by selinux
    // policies in sensitive processes). In that case, the error code will still be ENOMEM. So
    // the allocator will scan all low 4GB twice, and still fail. This is *very* slow.
    //
    // To avoid the issue, always map non-executable first, and mprotect if necessary.
    const int orig_prot = prot;
    const int prot_non_exec = prot & ~PROT_EXEC;
    // User-space allocator that places the mapping at a 32-bit address.
    actual = MapInternalArtLow4GBAllocator(length, prot_non_exec, flags, fd, offset);

    if (actual == MAP_FAILED) {
      return MAP_FAILED;
    }

    // See if we need to remap with the executable bit now.
    if (orig_prot != prot_non_exec) {
      if (mprotect(actual, length, orig_prot) != 0) {
        PLOG(ERROR) << "Could not protect to requested prot: " << orig_prot;
        TargetMUnmap(actual, length);
        errno = ENOMEM;
        return MAP_FAILED;
      }
    }
    return actual;
  }

  actual = TargetMMap(addr, length, prot, flags, fd, offset);
#else
#if defined(__LP64__)
  if (low_4gb && addr == nullptr) {
    flags |= MAP_32BIT;   // The MAP_32BIT flag only takes effect on x86-64.
  }
#endif
  actual = TargetMMap(addr, length, prot, flags, fd, offset);
#endif
  return actual;
}


#if defined(__LP64__) && !defined(__Fuchsia__) && (defined(__aarch64__) || defined(__APPLE__))
#define USE_ART_LOW_4G_ALLOCATOR 1
#else
#if defined(__LP64__) && !defined(__Fuchsia__) && !defined(__x86_64__)
#error "Unrecognized 64-bit architecture."
#endif
#define USE_ART_LOW_4G_ALLOCATOR 0
#endif

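When USE_ART_LOW_4G_ALLOCATOR is 0 on a 64-bit system, the MAP_32BIT flag is passed to mmap instead.

The Linux man page documents the MAP_32BIT flag of mmap:

[Linux mmap doc](https://link.juejin.cn?target=https%3A%2F%2Fman7.org%2Flinux%2Fman-pages%2Fman2%2Fmmap.2.html "https://man7.org/linux/man-pages/man2/mmap.2.html")

According to the documentation, MAP_32BIT is supported only on x86-64. It places the mapping in the first 2 GB of the process address space, and it was added so that thread stacks could be allocated in the first 2 GB of memory, which improves context-switch performance on some early 64-bit processors.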
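For reference, a minimal sketch of requesting a low mapping with this flag might look like the following. The helper name MapWith32BitFlag is hypothetical, and the preprocessor guard reflects the fact that the flag is not defined on every platform:

```cpp
#include <sys/mman.h>

#include <cstddef>

// Tries to map `length` bytes of anonymous memory using MAP_32BIT.
// Returns nullptr when the flag is unavailable or the mapping fails.
void* MapWith32BitFlag(std::size_t length) {
#if defined(MAP_32BIT)
  void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
#else
  (void)length;  // MAP_32BIT is not defined on this platform (for example arm64).
  return nullptr;
#endif
}
```

On platforms without the flag the helper simply reports failure, which is the situation ART's own low-4 GB allocator exists to handle.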

So for arm64 devices, MapInternalArtLow4GBAllocator implements, in user space, an algorithm that places mappings in the first 4 GB of the address space.
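The core trick, which the TryMemMapLow4GB helper listed later in this article relies on, is to pass mmap a requested address without MAP_FIXED and then verify where the kernel actually placed the mapping. A simplified sketch, with the hypothetical name TryMapBelow4GB and anonymous memory only:

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>

constexpr uint64_t kFourGB = uint64_t{4} << 30;

// Asks the kernel for an anonymous mapping near `hint` (no MAP_FIXED, so the kernel
// may choose another address) and keeps it only if it ends below 4 GB.
// Returns nullptr if the mapping failed or landed too high.
void* TryMapBelow4GB(uintptr_t hint, std::size_t length) {
  void* p = mmap(reinterpret_cast<void*>(hint), length, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return nullptr;
  }
  if (static_cast<uint64_t>(reinterpret_cast<uintptr_t>(p)) + length >= kFourGB) {
    munmap(p, length);  // Placed outside the low 4 GB: undo and report failure.
    return nullptr;
  }
  return p;
}
```

Because the requested address is only a hint, the quality of the result depends on picking an address that is really free, which is what the rest of the algorithm is about.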

Algorithm Implementation

The algorithm in MapInternalArtLow4GBAllocator is as follows:

[/art/libartbase/base/mem_map.cc](https://link.juejin.cn?target=http%3A%2F%2Faospxref.com%2Fandroid-13.0.0_r3%2Fxref%2Fart%2Flibartbase%2Fbase%2Fmem_map.cc%23MapInternalArtLow4GBAllocator "http://aospxref.com/android-13.0.0_r3/xref/art/libartbase/base/mem_map.cc#MapInternalArtLow4GBAllocator")

void* MemMap::MapInternalArtLow4GBAllocator(size_t length,
                                            int prot,
                                            int flags,
                                            int fd,
                                            off_t offset) {
#if USE_ART_LOW_4G_ALLOCATOR
  void* actual = MAP_FAILED;

  bool first_run = true;

  std::lock_guard<std::mutex> mu(*mem_maps_lock_);
  for (uintptr_t ptr = next_mem_pos_; ptr < 4 * GB; ptr += kPageSize) {
    // Use gMaps as an optimization to skip over large maps.
    // Find the first map which is address > ptr.
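    // gMaps is a multimap: each key is the start address of a region allocated via MemMap,
    // and each value is a pointer to the corresponding MemMap object.
    // Find the first allocated region whose start is greater than ptr, then step back to the
    // previous region; ptr must not be below that region's end, or the ranges would overlap.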
    auto it = gMaps->upper_bound(reinterpret_cast<void*>(ptr));
    if (it != gMaps->begin()) {
      auto before_it = it;
      --before_it;
      // Start at the end of the map before the upper bound.
      ptr = std::max(ptr, reinterpret_cast<uintptr_t>(before_it->second->BaseEnd()));
      CHECK_ALIGNED(ptr, kPageSize);
    }
    // Walk the remaining regions in gMaps until the gap between ptr and the start of the
    // next region is at least length bytes.
    while (it != gMaps->end()) {
      // How much space do we have until the next map?
      size_t delta = reinterpret_cast<uintptr_t>(it->first) - ptr;
      // If the space may be sufficient, break out of the loop.
      if (delta >= length) {  // Enough space found; stop scanning.
        break;
      }
      // Otherwise, skip to the end of the map.
      ptr = reinterpret_cast<uintptr_t>(it->second->BaseEnd());
      CHECK_ALIGNED(ptr, kPageSize);
      ++it;
    }

    // Try to see if we get lucky with this address since none of the ART maps overlap.
    actual = TryMemMapLow4GB(reinterpret_cast<void*>(ptr), length, prot, flags, fd, offset);
    if (actual != MAP_FAILED) {
      next_mem_pos_ = reinterpret_cast<uintptr_t>(actual) + length;
      return actual;
    }

    if (4U * GB - ptr < length) {
      // Not enough memory until 4GB.
      if (first_run) {
        // Try another time from the bottom;
        ptr = LOW_MEM_START - kPageSize;
        first_run = false;
        continue;
      } else {
        // Second try failed.
        break;
      }
    }

    uintptr_t tail_ptr;

    // Check pages are free.
    bool safe = true;
    for (tail_ptr = ptr; tail_ptr < ptr + length; tail_ptr += kPageSize) {
      if (msync(reinterpret_cast<void*>(tail_ptr), kPageSize, 0) == 0) {
        safe = false;
        break;
      } else {
        DCHECK_EQ(errno, ENOMEM);
      }
    }

    next_mem_pos_ = tail_ptr;  // update early, as we break out when we found and mapped a region

    if (safe == true) {
      actual = TryMemMapLow4GB(reinterpret_cast<void*>(ptr), length, prot, flags, fd, offset);
      if (actual != MAP_FAILED) {
        return actual;
      }
    } else {
      // Skip over last page.
      ptr = tail_ptr;
    }
  }

  if (actual == MAP_FAILED) {
    LOG(ERROR) << "Could not find contiguous low-memory space.";
    errno = ENOMEM;
  }
  return actual;
#else
  UNUSED(length, prot, flags, fd, offset);
  LOG(FATAL) << "Unreachable";
  UNREACHABLE();
#endif
}


#if USE_ART_LOW_4G_ALLOCATOR
void* MemMap::TryMemMapLow4GB(void* ptr,
                                    size_t page_aligned_byte_count,
                                    int prot,
                                    int flags,
                                    int fd,
                                    off_t offset) {
  // Call mmap with the target start address as the requested address.
  void* actual = TargetMMap(ptr, page_aligned_byte_count, prot, flags, fd, offset);
  if (actual != MAP_FAILED) {
    // Since we didn't use MAP_FIXED the kernel may have mapped it somewhere not in the low
    // 4GB. If this is the case, unmap and retry.
    if (reinterpret_cast<uintptr_t>(actual) + page_aligned_byte_count >= 4 * GB) {
      TargetMUnmap(actual, page_aligned_byte_count);
      actual = MAP_FAILED;
    }
  }
  return actual;
}
#endif

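The algorithm works as follows:

On the first call, the initial value of next_mem_pos_, the position where the search for a free region starts, is produced by GenerateNextMemPos();
gMaps is a static global variable of type std::multimap. Each key is the start address of a region allocated through the MemMap class and each value is a pointer to the corresponding MemMap; entries are ordered by key, ascending;
Find the first allocated region whose start address is greater than next_mem_pos_, then compare the start position with the end address of the region just before it, moving the start position past that end if necessary so that addresses do not overlap;
Starting from that allocated region, walk gMaps in order until a free gap of at least length bytes is found, then call mmap to complete the allocation.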
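The gap search in the last two steps can be sketched on its own, without any mmap calls. The version below is a simplified model rather than ART's code: it uses a std::map from start address to end address in place of gMaps, assumes the recorded regions do not overlap, and the names RegionMap and FindFreeGap are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>

constexpr uint64_t kLimit = uint64_t{4} << 30;  // 4 GB

// Maps each region's start address to its end address (exclusive), ordered by start.
using RegionMap = std::map<uint64_t, uint64_t>;

// Returns the start of the first gap of at least `length` bytes at or after `start`
// that ends within the low 4 GB, or 0 if there is none.
uint64_t FindFreeGap(const RegionMap& maps, uint64_t start, uint64_t length) {
  uint64_t ptr = start;
  auto it = maps.upper_bound(ptr);  // First region starting after ptr.
  if (it != maps.begin()) {
    // If the previous region covers ptr, move ptr to that region's end.
    ptr = std::max(ptr, std::prev(it)->second);
  }
  while (it != maps.end()) {
    if (it->first >= ptr && it->first - ptr >= length) {
      break;  // The gap before this region is large enough.
    }
    ptr = std::max(ptr, it->second);  // Otherwise skip past this region.
    ++it;
  }
  if (ptr >= kLimit || kLimit - ptr < length) {
    return 0;  // Not enough room left below 4 GB.
  }
  return ptr;
}
```

For example, with regions [0x10000, 0x20000) and [0x21000, 0x30000), a request for 0x1000 bytes starting at 0x10000 yields 0x20000, while a request for 0x2000 bytes yields 0x30000.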

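The process is illustrated below:

gMaps is a static global variable. The MemMap constructor stores the start address returned by mmap, together with a pointer to the current MemMap, in gMaps, which keeps its entries ordered by ascending address. The code is as follows: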

MemMap::MemMap(const std::string& name, uint8_t* begin, size_t size, void* base_begin,
               size_t base_size, int prot, bool reuse, size_t redzone_size)
    : name_(name), begin_(begin), size_(size), base_begin_(base_begin), base_size_(base_size),
      prot_(prot), reuse_(reuse), already_unmapped_(false), redzone_size_(redzone_size) {
  if (size_ == 0) {
    CHECK(begin_ == nullptr);
    CHECK(base_begin_ == nullptr);
    CHECK_EQ(base_size_, 0U);
  } else {
    CHECK(begin_ != nullptr);
    CHECK(base_begin_ != nullptr);
    CHECK_NE(base_size_, 0U);

    // Add it to gMaps.
    std::lock_guard<std::mutex> mu(*mem_maps_lock_);
    DCHECK(gMaps != nullptr);
    // Store the region's start address and the MemMap pointer as a key/value pair in gMaps.
    gMaps->insert(std::make_pair(base_begin_, this));
  }
}

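Recap:

gMaps records the regions already allocated in the 0 to 4 GB range. The allocator walks these records in order, finds the start address of an unallocated region, and passes it to mmap as the requested address. Because TryMemMapLow4GB unmaps any result that does not end below 4 GB, the memory finally returned always lies within 0 to 4 GB.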
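Since gMaps only knows about regions created through MemMap, the listing above also probes each candidate page with msync before mapping: the call fails with ENOMEM when a page is not mapped at all. A minimal sketch of that probe follows. The name RangeLooksFree is hypothetical, and the sketch passes MS_ASYNC where the ART code passes 0:

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <cstdint>

// Returns true if every page in [addr, addr + length) appears to be unmapped.
// `addr` must be page aligned.
bool RangeLooksFree(uintptr_t addr, std::size_t length) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  for (uintptr_t p = addr; p < addr + length; p += page) {
    if (msync(reinterpret_cast<void*>(p), page, MS_ASYNC) == 0) {
      return false;  // msync succeeded, so something is already mapped here.
    }
    if (errno != ENOMEM) {
      return false;  // Unexpected error: treat the range as unavailable.
    }
  }
  return true;
}
```

The answer is only a snapshot: another thread could map the range between the probe and the later mmap call, which is one reason the result of mmap is still checked afterwards.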

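Summary

The heap allocates object memory at addresses in the low 4 GB of virtual memory, that is, addresses that fit in 32 bits. Since the kernel provides no system call on arm64 for allocating memory in this range, the VM uses its records of already allocated regions to find a free low-address region that is large enough and then calls mmap there, implementing low-4 GB allocation in user space.

This pointer compression significantly reduces the memory taken by pointers and so improves memory utilization.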
