
Reference-typed data on the call stack is an important part of the GC's root set; finding the references on the stack is an indispensable step of the GC's root enumeration.



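If the JVM chooses not to record any data of this kind, it cannot tell whether the value at a given memory location should be interpreted as a reference, an integer, or something else. A GC built under this constraint is a "[b]conservative GC[/b]". During a collection, the JVM scans memory starting from some known locations (for example the JVM stacks), and for every number it sees it checks whether the value "looks like a pointer into the GC heap". This involves a bounds check (the lower and upper bounds of the GC heap are known), an alignment check (allocation usually has an alignment requirement; with 4-byte alignment, say, a number not divisible by 4 cannot be a pointer), and similar tests. The scan then continues recursively from every value that passes.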
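
The two checks described above can be sketched as follows. This is a minimal illustration, not code from any real collector: the `HeapBounds` type, the function name and the default 4-byte alignment are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical description of a GC heap: a half-open address range [low, high).
struct HeapBounds {
    std::uintptr_t low;
    std::uintptr_t high;
};

// Returns true if `word` could be a pointer into the GC heap.
// A "true" result is only a guess: an integer may pass both checks by accident.
inline bool looks_like_heap_pointer(std::uintptr_t word, const HeapBounds& heap,
                                    std::uintptr_t alignment = 4) {
    if (word < heap.low || word >= heap.high) return false;  // bounds check
    if (word % alignment != 0) return false;                 // alignment check
    return true;
}
```

A conservative scanner would call such a predicate on every word of the stack and treat each hit as a root, which is why an unlucky integer can keep a dead object alive.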

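The advantage of a conservative GC is that it is relatively simple to implement, and it can easily provide automatic memory management in programming languages that have no special support for GC. The [url=http://www.hpl.hp.com/personal/Hans_Boehm/gc/]Boehm-Demers-Weiser GC[/url] is the typical representative of conservative GCs and can be embedded in programs written in C, C++ and similar languages.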




Source: [url=channel9.msdn.com/Shows/Behind+The+Code/Patrick-Dussud-Managing-Garbage-Collection]Patrick Dussud's interview on Channel 9, at around the 23-minute mark[/url]



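2. Because the GC does not know whether a suspected pointer really is a pointer, it cannot rewrite any of these values; moving an object, however, requires fixing up the pointers to it. In other words, objects become immovable. One way to support moving objects while still using a conservative GC is to add a level of indirection: instead of implementing references as direct pointers, put a layer of "handles" in between, so that every reference first points into a handle table and the actual object is found through the table. To move an object, only the handle table entry needs to be updated. The cost is slower access through references. Sun JDK's Classic VM used this all-handle design, but the results were not good.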
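
The handle indirection can be sketched roughly like this. The `HandleTable` class and its method names are hypothetical and only illustrate the idea; they are not how the Classic VM was actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handle table: references hold an index, not a raw address.
class HandleTable {
public:
    using Handle = std::size_t;

    Handle create(void* object) {
        slots_.push_back(object);
        return slots_.size() - 1;
    }

    // Every access pays one extra load to go through the table.
    void* resolve(Handle h) const {
        assert(h < slots_.size());
        return slots_[h];
    }

    // Called by the collector after it has copied the object elsewhere.
    void relocate(Handle h, void* new_address) {
        assert(h < slots_.size());
        slots_[h] = new_address;
    }

private:
    std::vector<void*> slots_;
};
```

Because suspected pointers on the stack only ever hold handles, the collector never has to rewrite them; it updates one table entry per moved object, and the extra `resolve` step is the access cost mentioned above.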


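The JVM can choose not to record type information on the stack but to record it on objects. Scanning the stack then still works as described above, but once the scan reaches an object in the GC heap, the object carries enough type information for the JVM to determine which positions inside it hold references. This is a "[b]semi-conservative GC[/b]", also called "conservative with respect to the roots".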
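
As a rough sketch of the "exact in the heap" half, assume every object starts with a pointer to per-class metadata that lists the byte offsets of its reference fields. `ClassInfo`, `ObjectHeader` and `references_in` are hypothetical names for this illustration, not the layout of any particular VM.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-class metadata: byte offsets of the reference fields.
struct ClassInfo {
    std::vector<std::size_t> ref_offsets;
};

// Hypothetical object header: every heap object starts with a class pointer.
struct ObjectHeader {
    const ClassInfo* klass;
};

// Collects the non-null reference fields of one object by consulting its
// class metadata, so no guessing is needed inside the heap.
inline std::vector<void*> references_in(const ObjectHeader* obj) {
    std::vector<void*> refs;
    const char* base = reinterpret_cast<const char*>(obj);
    for (std::size_t off : obj->klass->ref_offsets) {
        void* field;
        std::memcpy(&field, base + off, sizeof field);
        if (field != nullptr) refs.push_back(field);
    }
    return refs;
}
```

An integer field that happens to look like an address is never visited, because its offset is not listed in `ref_offsets`.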


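The Boehm GC mentioned earlier in fact supports not only the fully conservative mode but also a semi-conservative one. [url=gcc.gnu.org/java/]GCJ[/url] and [url=http://www.mono-project.com/]Mono[/url] are both examples that use the Boehm GC in semi-conservative mode.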

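Early versions of the Dalvik VM in Google Android are another example of a semi-conservative GC. By mid-2009, however, internal builds of the Dalvik VM had already begun to support exact GC, [url=http://osdir.com/ml/android-platform/2009-06/msg00024.html]at the cost of optimized DEX files growing by about 9%[/url].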



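A fully conservative GC usually uses an algorithm that does not move objects, such as mark-sweep. A semi-conservative GC can use mark-sweep, or an algorithm that moves some of the objects, such as [url=http://www.cs.cornell.edu/home/fms/mcc/bartlett.html]Bartlett-style mostly-copying GC[/url].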


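The opposite of a conservative GC is an "[b]exact GC[/b]"; in the literature it appears as precise GC, exact GC, accurate GC or type accurate GC. Even in English, no single word for it has been settled on.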




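1. Have the data itself carry a tag. This approach is uncommon in JVMs but appears in some other language implementations, so it is not covered in detail here. Tagging is actually more common in semi-conservative GCs; CRuby, for example, uses a semi-conservative GC with tagging. CLDC-HI is an interesting case: on the stack it pairs every slot with a word-sized tag describing its type, which reduces the overhead of stack maps. Similar implementations are rarely seen elsewhere; most designers do not make this trade-off.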


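3. Record the type information externally, in a mapping table. The three mainstream high-performance JVM implementations today, HotSpot, JRockit and J9, all do this. HotSpot calls this data structure an OopMap, JRockit calls it a livemap, and J9 calls it a GC map. Apache Harmony's DRLVM also calls it a [url=http://svn.apache.org/viewvc/harmony/enhanced/java/trunk/drlvm/vm/jitrino/src/codegenerator/ia32/Ia32GCMap.cpp?view=markup]GCMap[/url].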
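
The external-map option can be pictured with a small sketch. Everything here (`MethodStackMap`, `enumerate_roots`, keying by code offset, slot indices into a frame) is an assumption chosen for illustration; real OopMaps, livemaps and GC maps are compressed and also describe registers.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stack map for one compiled method: for each safepoint,
// identified by its code offset from method entry, the frame slots that
// hold references at that point.
struct MethodStackMap {
    std::map<std::uint32_t, std::vector<int>> ref_slots_at;  // pc offset -> slots

    // Returns the reference slots for a safepoint, or nullptr if `pc_offset`
    // is not a safepoint (no map was recorded for it).
    const std::vector<int>* find(std::uint32_t pc_offset) const {
        auto it = ref_slots_at.find(pc_offset);
        return it == ref_slots_at.end() ? nullptr : &it->second;
    }
};

// Reads the references out of a frame exactly, using only the map.
inline std::vector<void*> enumerate_roots(void* const* frame,
                                          const std::vector<int>& ref_slots) {
    std::vector<void*> roots;
    for (int slot : ref_slots) roots.push_back(frame[slot]);
    return roots;
}
```

Nothing in the frame itself is tagged or inspected heuristically; whether a slot is a reference is decided entirely by the table built at compile time.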








2. Just before a method returns / after the call instruction that invokes a method







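HotSpot's solution is that every reference crossing the JNI call boundary (arguments passed into a JNI method, values returned from a JNI method) must be wrapped in a "handle". When JNI code needs to call a Java API it must likewise wrap pointers in handles itself. In this implementation, a "jobject" in a JNI method is not a direct pointer to an object; it points to a handle, and the object is reached indirectly through the handle. As a result, when the scan reaches a JNI method it does not need to scan that method's stack frame: scanning the handle table is enough to find every object in the GC heap reachable from the JNI method.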
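
A very simplified picture of this arrangement is sketched below. The names `LocalHandleBlock`, `jobject_like` and `visit_roots` are hypothetical and do not reproduce HotSpot's actual JNI handle classes; the fixed capacity is an assumption that keeps handle addresses stable in this toy version.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Object = void;            // stand-in for a heap object
using jobject_like = Object**;  // hypothetical: a "jobject" points at a handle slot

class LocalHandleBlock {
public:
    explicit LocalHandleBlock(std::size_t capacity) { slots_.reserve(capacity); }

    // Wraps a raw object pointer before it crosses into native code.
    jobject_like make_handle(Object* obj) {
        assert(slots_.size() < slots_.capacity());  // no reallocation: handles stay valid
        slots_.push_back(obj);
        return &slots_.back();
    }

    // The collector visits (and may rewrite) every slot;
    // native stack frames are never scanned.
    void visit_roots(const std::function<void(Object*&)>& visitor) {
        for (Object*& slot : slots_) visitor(slot);
    }

private:
    std::vector<Object*> slots_;
};
```

Native code only ever holds `jobject_like` values, so when the collector moves an object and rewrites the slot, the native code sees the new address the next time it dereferences its handle.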



[size=medium][b]Implementation example: Oracle/Sun HotSpot VM[/b][/size]

Sun HotSpot VM has used exact GC [url=http://java.sun.com/developer/technicalArticles/Networking/HotSpot/]from its initial design[/url].

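Before the HotSpot VM, the JVM that Sun shipped in 1.0.x through 1.2.x (later called the Classic VM) used a semi-conservative design. The [url=http://labs.oracle.com/techrep/1998/abstract-67.html]EVM[/url] (also called ExactVM), shipped in JDK 1.2.1 and 1.2.2 for SPARC, implemented exact GC.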


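The short answer already mentioned how HotSpot implements this. Below, several papers are quoted to show how HotSpot's client compiler (JDK 6) and server compiler support OopMaps and safepoints.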

[url=http://www.ssw.uni-linz.ac.at/Research/Papers/Ko08/Ko08.pdf]Design of the Java HotSpot Client Compiler for Java 6[/url]

[quote]The Java HotSpot™ VM also provides various other garbage collectors [Sun Microsystems, Inc. 2006c]. Parallel garbage collectors for server machines with large physical memories and multiple CPUs distribute the work among multiple threads, thus decreasing the garbage collection overhead and increasing the application throughput. A concurrent mark-and-sweep algorithm [Boehm et al. 1991; Printezis and Detlefs 2000] allows the user program to continue its execution while dead objects are reclaimed.

Exact garbage collection requires information about pointers to heap objects. For machine code, this information is contained in object maps (also called oop maps) created by the JIT compiler. Besides, the compiler creates debugging information that maps the state of a compiled method back to the state of the interpreter. This enables aggressive compiler optimizations, because the VM can deoptimize [Hölzle et al. 1992] back to a safe state when the assumptions under which an optimization was performed are invalidated (see Section 2.6). The machine code, the object maps, and the debugging information are stored together in a so-called native method object. Garbage collection and deoptimization are allowed to occur only at some discrete points in the program, called [i][b]safepoints[/b][/i], such as backward branches, method calls, return instructions, and operations that may throw an exception.


After register allocation, machine code can be generated in a rather simple and straightforward way. The compiler traverses the LIR, operation by operation, and emits appropriate machine instructions into a code buffer. This process also yields object maps and debugging information.[/quote]

A Compiler for the Java HotSpot Virtual Machine (describes the HotSpot Client Compiler in JDK 1.3.0, from 2000)

[quote]4.5 Debug Information

The back end generates [i]debug information[/i] as a side effect of code generation. Debug information includes a mapping of program counter offsets to bytecode indices, so-called pc-maps. It also includes oop maps (i.e., pointer maps), specifying the exact stack location of all oops for all pc's where GC can occur. Finally, it includes safepoint information, which is used for GC as well.

Pc-maps are required for exception handling: if a runtime exception occurs, the pc where the exception occurred is mapped to the corresponding bytecode index (bci). This bci is used to look up the exception handler.

Oop maps are bit maps specifying which stack and register locations hold oops for a given pc. During GC, these locations need to be visited since they represent roots for GC. Also, if GC moves objects, pointers must be updated.

GC cannot happen at arbitrary places, but only at safepoints. Safepoints are designated places in the code, usually function calls, backward branches, and return instructions. All threads running code (interpreted or compiled) must be suspended at a safepoint before GC is executed. At a safepoint, the location of all oops on the stack is known. The safepoint suspension mechanism is independent of the compiler. Its discussion is beyond the scope of this paper.

Debug information is stored in a compressed form together with the generated code in [i]native method objects[/i]. They reside in the [i]code cache[/i] managed by the VM.[/quote]

[url=http://www.usenix.org/events/jvm01/full_papers/paleczny/paleczny.pdf]The Java HotSpot Server Compiler[/url]



Then we gather and output the information necessary for garbage collection. This includes the location of all object pointers which are alive across a safepoint, as well as the location of all values which are callee saves in the method, and all necessary computation information for derived pointers which have an object pointer as their base.


17.Code Generation

In addition to executable machine code, the code generator also provides oopmaps, debug info, exception tables, relocation information, and an implicit-null check table for use by the runtime system. All of this information is associated with one or more native-code offsets from method entry. Oopmaps and debug info are associated with the offset to their safepoint. Oopmaps are generated during register allocation and the code generator simply packages this information for the runtime. Safepoints at which a deoptimization may occur also record debug info describing either the constant value or native storage location for monitors, locals, and expression stack entries. The storage location may be a register or a stack frame offset.





[Verified Entry Point]

0x00007f3749ea3400: mov %eax,-0x6000(%rsp)


0x00007f3749ea3453: callq 0x00007f3749e7c820 ; OopMap{[0]=Oop off=88}

;*invokevirtual intern

; - TestC2OopMapGeneration::doTest@19 (line 4)

; {optimized virtual_call}

0x00007f3749ea3458: inc %ebp ;*iinc


OopMap{[0]=Oop off=88}


An OopMap is associated with the instruction immediately after this callq instruction (inc %ebp);




OopMap{zero or more "data location=content type" records off=position of the instruction this OopMap is associated with}


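[0] means the stack pointer plus offset 0, here [rsp + 0], that is, the top of the stack; the "=Oop" on the right says that this location holds an ordinary object pointer (HotSpot calls a pointer to the start of an object in the GC heap an Oop).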



enum oop_types {              // must fit in type_bits
  unused_value = 0,           // powers of 2, for masking OopMapStream
  oop_value = 1,
  value_value = 2,
  narrowoop_value = 4,
  callee_saved_value = 8,
  derived_oop_value = 16
};



static void print_register_type(OopMapValue::oop_types x, VMReg optional, outputStream* st) {
  switch( x ) {
  case OopMapValue::oop_value:
    st->print("Oop");
    break;
  case OopMapValue::value_value:
    st->print("Value" );
    break;
  case OopMapValue::narrowoop_value:
    tty->print("NarrowOop" );
    break;
  case OopMapValue::callee_saved_value:
    st->print("Callers_" );
    optional->print_on(st);
    break;
  case OopMapValue::derived_oop_value:
    st->print("Derived_oop_" );
    optional->print_on(st);
    break;
  default:
    ShouldNotReachHere();
  }
}



0x00007f3749ea3400 + 0x58 = 0x00007f3749ea3458, which is exactly the location of the inc %ebp that follows the callq instruction in the example.
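
The same arithmetic written out as code, as a small check rather than anything taken from HotSpot. `oopmap_pc` is a hypothetical helper; it assumes, as in the computation above, that the decimal `off` value is a byte offset from the address used as the base (here the address shown at the verified entry point).

```cpp
#include <cstdint>

// Maps an OopMap's `off` field (printed in decimal) to the absolute address
// of the instruction it is associated with, given the base address.
constexpr std::uint64_t oopmap_pc(std::uint64_t base, std::uint32_t off) {
    return base + off;
}

// off=88 is 0x58, so the OopMap lands on the instruction after the callq.
static_assert(oopmap_pc(0x00007f3749ea3400ULL, 88) == 0x00007f3749ea3458ULL,
              "off=88 should resolve to the inc %ebp instruction");
```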





OopMap{rbp=Oop off=144}


OopMap{rbp=NarrowOop off=248}


OopMap{[296]=Callers_eax [292]=Callers_ecx [288]=Callers_edx [284]=Callers_ebx [272]=Callers_esi [268]=Callers_edi [28]=Callers_xmm0 [32]=Callers_xmm0 [36]=Callers_xmm1 [40]=Callers_xmm1 [44]=Callers_xmm2 [48]=Callers_xmm2 [52]=Callers_xmm3 [56]=Callers_xmm3 [60]=Callers_xmm4 [64]=Callers_xmm4 [68]=Callers_xmm5 [72]=Callers_xmm5 [76]=Callers_xmm6 [80]=Callers_xmm6 [84]=Callers_xmm7 [88]=Callers_xmm7 off=674}


OopMap{[24]=Oop [28]=Derived_oop_[24] [32]=Derived_oop_[24] off=192}

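Here we can see data of the derived oop type: the value at [28] on the stack is a derived reference into the object whose base reference is at [24] on the stack.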
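
The reason a derived oop must name its base is that it can only be updated relative to that base when the object moves. The function below is a minimal sketch of that fix-up under the assumption that the derived pointer keeps a constant byte offset from the base; `relocate_derived` is a hypothetical name, not a HotSpot function.

```cpp
#include <cstdint>

// Recomputes a derived (interior) pointer after its base object has moved:
// the byte offset from the base is preserved.
inline std::uintptr_t relocate_derived(std::uintptr_t old_base,
                                       std::uintptr_t new_base,
                                       std::uintptr_t old_derived) {
    const std::uintptr_t offset = old_derived - old_base;
    return new_base + offset;
}
```

For instance, if the object at `0x1000` moves to `0x8000`, a derived pointer `0x1010` becomes `0x8010`.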


[size=medium][b]Implementation example: IBM Sovereign JVM[/b][/size]

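Before JDK 5, the main JVM IBM provided was the Sovereign VM, whose GC is semi-conservative (conservative on the stack, exact in the heap); starting with JDK 5, IBM mainly provides the J9 VM, whose GC is exact.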

For the semi-conservative GC design used by the Sovereign JVM in IBM DK for Java 1.4.1, see the following text:

[url=http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/GCandMemory-042005.pdf]IBM JVM Garbage Collection and Storage Allocation techniques[/url]

[quote]4.1 Mark Phase

In this phase, all the live objects are marked. Because unreachable objects cannot be identified singly, all the reachable objects must be identified. Therefore, everything else must be garbage. The process of marking all reachable objects is also known as tracing.

The active state of the JVM is made up of the saved registers for each thread, the set of stacks that represent the threads, the static's that are in Java classes, and the set of local and global JNI references. All functions that are invoked in the JVM itself cause a frame on the C stack. This frame might contain instances of objects as a result of either an assignment to a local variable, or a parameter that is sent from the caller. All these references are treated equally by the tracing routines. The Garbage Collector views the stack of a thread as a set of 4-byte fields (8 bytes in 64-bit architecture) and scans them from the top to the bottom of each of the stacks. The Garbage Collector assumes that the stacks are 4-byte aligned (8-byte aligned in 64-bit architecture). Each slot is examined to see whether it points at an object that is in the heap. Note that this does not make it necessarily a pointer to an object, because it might be only an accidental combination of bits in a float or integer. So, when the Garbage Collector performs the scan of a thread stack, it handles conservatively anything that it finds. Anything that points at an object is assumed to be an object, but the object in question must not be moved during garbage collection. A slot is thought to be a pointer to an object if it meets these three requirements:

  1. It is grained (aligned) on an 8-byte boundary.
  2. It is inside the bounds of the heap.
  3. The allocbit is on.

Objects that are referenced in this way are known as roots, and have their dosed bit set on to indicate that they cannot be moved. The setting of dosed bits is done only if the Garbage Collector is to perform a compaction. Tracing can now proceed accurately. That is, the Garbage Collector can find references in the roots to other objects and, because it knows that they are real references, it can move them during compaction because it can change the reference. The tracing process uses a stack that can hold 4 KB entries. All references that are pushed to the stack are marked at the same time by setting the relevant markbit to on. The roots are marked and pushed to the stack and then the Garbage Collector starts to pop entries off the stack and trace them. Normal objects (not arrays) are traced by using the mtpr to access the classblock, which tells where references to other objects are to be found in this object. As each reference is found, if it is not already marked, it is marked and pushed.

Array objects are traced by looking at each array entry and, if it is not already marked, it is marked and pushed. Some additional code traces a small portion of the array at a time, to try to avoid mark stack overflow.

The above process continues repeatedly until the mark stack eventually becomes empty.[/quote]
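
The three requirements listed in the quote can be sketched as a single predicate. This is an illustration only: `GrainedHeap` is a hypothetical model with one allocbit per 8-byte grain and a heap base assumed to be 8-byte aligned, not the Sovereign JVM's real data structures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical heap model: one allocbit per 8-byte grain,
// set where an allocated object starts.
struct GrainedHeap {
    std::uintptr_t base;          // assumed to be 8-byte aligned
    std::vector<bool> allocbits;  // allocbits[i] covers address base + 8*i

    std::uintptr_t limit() const { return base + 8 * allocbits.size(); }

    bool is_probable_object(std::uintptr_t slot) const {
        if (slot % 8 != 0) return false;                   // 1. grained on 8 bytes
        if (slot < base || slot >= limit()) return false;  // 2. inside the heap
        return allocbits[(slot - base) / 8];               // 3. allocbit is on
    }
};
```

The allocbit test makes the filter stricter than a bounds and alignment check alone, but a stack slot that passes is still only presumed to be a reference, which is why such roots are pinned rather than moved.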




Early versions of Mono used the Boehm GC in semi-conservative mode. See the description in the following document:

[url=http://www.mono-project.com/Mono:Runtime#Mono.27s_use_of_Boehm_GC]Mono's use of Boehm GC[/url]

[quote][size=small]Mono's use of Boehm GC[/size]

We are using the Boehm conservative GC in precise mode.

There are a few areas that the GC scans for pointers to managed objects:

  1. The heap (where other managed objects are allocated)
  2. thread stacks and registers
  3. static data area
  4. data structures allocated by the runtime

(1) is currently handled in mostly precise mode: almost always the GC will only consider memory words that contain only references to the heap, so there is very little chance of pointer misidentification and hence memory retention as a result. The new GC requires a fully precise mode here, so it will improve things marginally. The details about mostly precise have to do with large objects with sparse bitmaps of references and the handling of multiple appdomains safely.

(2) is always scanned conservatively. This will be true for the new GC, too, at least for the first versions, where I'll have my own share of fun at tracking the bugs that a moving generational GC will expose. Later we'll conservatively scan only the unmanaged part of the stacks.

(3) We already optimized this both with Boehm and the current GC to work in precise mode.

(4) I already optimized this to work in mostly precise mode (ie some data structures are dealt with precisely, others not yet). I'll need to do more work in this area, especially for the new GC, where having pinned objects can be a significant source of pain.[/quote]

Starting with Mono 2.8, a new GC implementation named [url=http://www.mono-project.com/Compacting_GC]SGen[/url] is included in the release. This version of SGen, however, is still a semi-conservative GC.

In the future Mono will gradually move to a fully exact GC. This document describes the design: [url=http://www.mono-project.com/Compacting_GC#Precise_Stack_Marking]Precise Stack Marking[/url]



[url=http://citeseer.ist.psu.edu/viewdoc/summary?doi=]Finding References in Java Stacks[/url]

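A 1997 paper from Sun Labs. It explains the characteristics, problems and solutions of finding references in Java stacks, in particular the problems caused by jsr. The first three chapters outline the concepts of conservative and exact GC, the metadata that exact GC involves (stack maps), and ways of generating stack maps.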

[url=http://labs.oracle.com/techrep/1998/abstract-70.html]GC Points in a Threaded Environment[/url]

A 1998 paper from Sun Labs. It explains where GC is allowed to happen in a multithreaded setting, the so-called "GC point", also known as a "safepoint".

[url=http://book.douban.com/subject/4873919/]Oracle JRockit: The Definitive Guide[/url], Chapter 3, pages 84-87, Livemaps


[url=http://www.usenix.org/events/jvm01/full_papers/barabash/barabash.pdf]Mostly Accurate Stack Scanning[/url]


[url=http://www.cs.tufts.edu/~nr/cs257/archive/james-stichnoth/p118-stichnoth.pdf]Support for Garbage Collection at Every Instruction in a Java Compiler[/url]

A 1999 paper from Intel. Its thesis is that, with compression, the space cost is acceptable even if a GC map is generated for every instruction.

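Even so, none of today's mainstream VMs generates a GC map for every instruction; on the contrary, most not only generate GC maps at safepoints alone but also keep them compressed in memory.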

[url=http://d3s.mff.cuni.cz/publications/cpe0X.pdf]Accurate Garbage Collection in Uncooperative Environments Revisited[/url]


[url=http://citeseer.ist.psu.edu/viewdoc/summary?doi=]Runtime Tags Aren't Necessary[/url]

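A 1988 paper by Andrew W. Appel (author of the Tiger Book). It shows how, with the support of the static polymorphic type system of [url=http://en.wikipedia.org/wiki/ML_(programming_language)]the ML language[/url], a GC can distinguish references from non-references without using tags.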

[url=http://llvm.org/docs/GarbageCollection.html]Accurate Garbage Collection with LLVM[/url]

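The official documentation of the LLVM compiler framework, describing how to combine LLVM with an exact GC. The following slides describe how VMKit integrates MMTk with LLVM to implement exact GC: [url=http://llvm.org/devmtg/2009-10/Geoffray_GarbageCollectionVMKit.pdf]Precise and Efficient Garbage Collection in VMKit with MMTk[/url].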

Ruby Hacking Guide, Chapter 5, Garbage Collection, is_pointer_to_heap()






[url=http://book.douban.com/subject/1484763/]Shared Source CLI Essentials[/url], pages 249-250, Scheduling Collection

[url=http://callvirt.net/blog/files/Shared Source CLI 2.0 Internals.pdf]Shared Source CLI 2.0 Internals[/url], pages 253-254, Scheduling Collection


This book was introduced in [url=http://rednaxelafx.iteye.com/blog/631981]an earlier post[/url]; it is strongly recommended to anyone interested in VM implementation, provided you can accept the [url=http://www.microsoft.com/resources/msdn/en-us/MSDN-FILES/027/002/097/ShSourceCLILicense.htm]SSCLI License[/url].

The chapters mentioned above describe SSCLI's safepoints and GC maps. For the related code, see:




[url=http://msdn.microsoft.com/en-us/library/bb190764.aspx]MSDN: SOS.dll (SOS Debugging Extension)[/url]


