diff options
author | Dan Gohman <djg@cray.com> | 2007-07-18 16:29:46 +0000 |
---|---|---|
committer | Dan Gohman <djg@cray.com> | 2007-07-18 16:29:46 +0000 |
commit | f17a25c88b892d30c2b41ba7ecdfbdfb2b4be9cc (patch) | |
tree | ebb79ea1ee5e3bc1fdf38541a811a8b804f0679a /docs/GarbageCollection.html | |
download | external_llvm-f17a25c88b892d30c2b41ba7ecdfbdfb2b4be9cc.zip external_llvm-f17a25c88b892d30c2b41ba7ecdfbdfb2b4be9cc.tar.gz external_llvm-f17a25c88b892d30c2b41ba7ecdfbdfb2b4be9cc.tar.bz2 |
It's not necessary to do rounding for alloca operations when the requested
alignment is equal to the stack alignment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40004 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/GarbageCollection.html')
-rw-r--r-- | docs/GarbageCollection.html | 534 |
1 files changed, 534 insertions, 0 deletions
diff --git a/docs/GarbageCollection.html b/docs/GarbageCollection.html new file mode 100644 index 0000000..0accd0c --- /dev/null +++ b/docs/GarbageCollection.html @@ -0,0 +1,534 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" + "http://www.w3.org/TR/html4/strict.dtd"> +<html> +<head> + <title>Accurate Garbage Collection with LLVM</title> + <link rel="stylesheet" href="llvm.css" type="text/css"> +</head> +<body> + +<div class="doc_title"> + Accurate Garbage Collection with LLVM +</div> + +<ol> + <li><a href="#introduction">Introduction</a> + <ul> + <li><a href="#feature">GC features provided and algorithms supported</a></li> + </ul> + </li> + + <li><a href="#interfaces">Interfaces for user programs</a> + <ul> + <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li> + <li><a href="#allocate">Allocating memory from the GC</a></li> + <li><a href="#barriers">Reading and writing references to the heap</a></li> + <li><a href="#explicit">Explicit invocation of the garbage collector</a></li> + </ul> + </li> + + <li><a href="#gcimpl">Implementing a garbage collector</a> + <ul> + <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li> + <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li> + </ul> + </li> + <li><a href="#gcimpls">GC implementations available</a> + <ul> + <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li> + </ul> + </li> + +<!-- + <li><a href="#codegen">Implementing GC support in a code generator</a></li> +--> +</ol> + +<div class="doc_author"> + <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> +</div> + +<!-- *********************************************************************** --> +<div class="doc_section"> + <a name="introduction">Introduction</a> +</div> +<!-- *********************************************************************** --> + +<div class="doc_text"> + +<p>Garbage collection is a widely used technique that frees the programmer from +having to know the life-times of heap objects, making software easier to produce +and maintain. Many programming languages rely on garbage collection for +automatic memory management. There are two primary forms of garbage collection: +conservative and accurate.</p> + +<p>Conservative garbage collection often does not require any special support +from either the language or the compiler: it can handle non-type-safe +programming languages (such as C/C++) and does not require any special +information from the compiler. The +<a href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/">Boehm collector</a> is +an example of a state-of-the-art conservative collector.</p> + +<p>Accurate garbage collection requires the ability to identify all pointers in +the program at run-time (which requires that the source-language be type-safe in +most cases). Identifying pointers at run-time requires compiler support to +locate all places that hold live pointer variables at run-time, including the +<a href="#roots">processor stack and registers</a>.</p> + +<p> +Conservative garbage collection is attractive because it does not require any +special compiler support, but it does have problems. In particular, because the +conservative garbage collector cannot <i>know</i> that a particular word in the +machine is a pointer, it cannot move live objects in the heap (preventing the +use of compacting and generational GC algorithms) and it can occasionally suffer +from memory leaks due to integer values that happen to point to objects in the +program. In addition, some aggressive compiler transformations can break +conservative garbage collectors (though these seem rare in practice). +</p> + +<p> +Accurate garbage collectors do not suffer from any of these problems, but they +can suffer from degraded scalar optimization of the program. In particular, +because the runtime must be able to identify and update all pointers active in +the program, some optimizations are less effective. In practice, however, the +locality and performance benefits of using aggressive garbage allocation +techniques dominates any low-level losses. +</p> + +<p> +This document describes the mechanisms and interfaces provided by LLVM to +support accurate garbage collection. +</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="feature">GC features provided and algorithms supported</a> +</div> + +<div class="doc_text"> + +<p> +LLVM provides support for a broad class of garbage collection algorithms, +including compacting semi-space collectors, mark-sweep collectors, generational +collectors, and even reference counting implementations. It includes support +for <a href="#barriers">read and write barriers</a>, and associating <a +href="#roots">meta-data with stack objects</a> (used for tagless garbage +collection). All LLVM code generators support garbage collection, including the +C backend. +</p> + +<p> +We hope that the primitive support built into LLVM is sufficient to support a +broad class of garbage collected languages, including Scheme, ML, scripting +languages, Java, C#, etc. That said, the implemented garbage collectors may +need to be extended to support language-specific features such as finalization, +weak references, or other features. As these needs are identified and +implemented, they should be added to this specification. +</p> + +<p> +LLVM does not currently support garbage collection of multi-threaded programs or +GC-safe points other than function calls, but these will be added in the future +as there is interest. +</p> + +</div> + +<!-- *********************************************************************** --> +<div class="doc_section"> + <a name="interfaces">Interfaces for user programs</a> +</div> +<!-- *********************************************************************** --> + +<div class="doc_text"> + +<p>This section describes the interfaces provided by LLVM and by the garbage +collector run-time that should be used by user programs. As such, this is the +interface that front-end authors should generate code for. +</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a> +</div> + +<div class="doc_text"> + +<div class="doc_code"><tt> + void %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata) +</tt></div> + +<p> +The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable +on the stack. The first argument contains the address of the variable on the +stack, and the second contains a pointer to metadata that should be associated +with the pointer (which <b>must</b> be a constant or global value address). At +runtime, the <tt>llvm.gcroot</tt> intrinsic stores a null pointer into the +specified location to initialize the pointer.</p> + +<p> +Consider the following fragment of Java code: +</p> + +<pre> + { + Object X; // A null-initialized reference to an object + ... + } +</pre> + +<p> +This block (which may be located in the middle of a function or in a loop nest), +could be compiled to this LLVM code: +</p> + +<pre> +Entry: + ;; In the entry block for the function, allocate the + ;; stack space for X, which is an LLVM pointer. + %X = alloca %Object* + ... + + ;; "CodeBlock" is the block corresponding to the start + ;; of the scope above. +CodeBlock: + ;; Initialize the object, telling LLVM that it is now live. + ;; Java has type-tags on objects, so it doesn't need any + ;; metadata. + call void %llvm.gcroot(%Object** %X, sbyte* null) + ... + + ;; As the pointer goes out of scope, store a null value into + ;; it, to indicate that the value is no longer live. + store %Object* null, %Object** %X + ... +</pre> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="allocate">Allocating memory from the GC</a> +</div> + +<div class="doc_text"> + +<div class="doc_code"><tt> + sbyte *%llvm_gc_allocate(unsigned %Size) +</tt></div> + +<p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the +garbage collector implementation to allocate memory. It returns a +zeroed-out block of memory of the appropriate size.</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="barriers">Reading and writing references to the heap</a> +</div> + +<div class="doc_text"> + +<div class="doc_code"><tt> + sbyte *%llvm.gcread(sbyte *, sbyte **)<br> + void %llvm.gcwrite(sbyte*, sbyte*, sbyte**) +</tt></div> + +<p>Several of the more interesting garbage collectors (e.g., generational +collectors) need to be informed when the mutator (the program that needs garbage +collection) reads or writes object references into the heap. In the case of a +generational collector, it needs to keep track of which "old" generation objects +have references stored into them. The amount of code that typically needs to be +executed is usually quite small (and not on the critical path of any +computation), so the overall performance impact of the inserted code is +tolerable.</p> + +<p>To support garbage collectors that use read or write barriers, LLVM provides +the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics. The first +intrinsic has exactly the same semantics as a non-volatile LLVM load and the +second has the same semantics as a non-volatile LLVM store, with the +additions that they also take a pointer to the start of the memory +object as an argument. At code generation +time, these intrinsics are replaced with calls into the garbage collector +(<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a +href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then +inlined into the code. +</p> + +<p> +If you are writing a front-end for a garbage collected language, every load or +store of a reference from or to the heap should use these intrinsics instead of +normal LLVM loads/stores.</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="initialize">Garbage collector startup and initialization</a> +</div> + +<div class="doc_text"> + +<div class="doc_code"><tt> + void %llvm_gc_initialize(unsigned %InitialHeapSize) +</tt></div> + +<p> +The <tt>llvm_gc_initialize</tt> function should be called once before any other +garbage collection functions are called. This gives the garbage collector the +chance to initialize itself and allocate the heap spaces. The initial heap size +to allocate should be specified as an argument. +</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="explicit">Explicit invocation of the garbage collector</a> +</div> + +<div class="doc_text"> + +<div class="doc_code"><tt> + void %llvm_gc_collect() +</tt></div> + +<p> +The <tt>llvm_gc_collect</tt> function is exported by the garbage collector +implementations to provide a full collection, even when the heap is not +exhausted. This can be used by end-user code as a hint, and may be ignored by +the garbage collector. +</p> + +</div> + + +<!-- *********************************************************************** --> +<div class="doc_section"> + <a name="gcimpl">Implementing a garbage collector</a> +</div> +<!-- *********************************************************************** --> + +<div class="doc_text"> + +<p> +Implementing a garbage collector for LLVM is fairly straight-forward. The LLVM +garbage collectors are provided in a form that makes them easy to link into the +language-specific runtime that a language front-end would use. They require +functionality from the language-specific runtime to get information about <a +href="#gcdescriptors">where pointers are located in heap objects</a>. +</p> + +<p>The +implementation must include the <a +href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a +href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement +the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well. To +do this, it will probably have to <a href="#traceroots">trace through the roots +from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors +for heap objects</a>. Luckily, there are some <a href="#gcimpls">example +implementations</a> available. +</p> +</div> + + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a> +</div> + +<div class="doc_text"> + <div class="doc_code"><tt> + void *llvm_gc_read(void*, void **)<br> + void llvm_gc_write(void*, void *, void**) + </tt></div> + +<p> +These functions <i>must</i> be implemented in every garbage collector, even if +they do not need read/write barriers. In this case, just load or store the +pointer, then return. +</p> + +<p> +If an actual read or write barrier is needed, it should be straight-forward to +implement it. +</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="callbacks">Callback functions used to implement the garbage collector</a> +</div> + +<div class="doc_text"> +<p> +Garbage collector implementations make use of call-back functions that are +implemented by other parts of the LLVM system. +</p> +</div> + +<!--_________________________________________________________________________--> +<div class="doc_subsubsection"> + <a name="traceroots">Tracing GC pointers from the program stack</a> +</div> + +<div class="doc_text"> + <div class="doc_code"><tt> + void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta)); + </tt></div> + +<p> +The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code +generator that iterates through all of the GC roots on the stack, calling the +specified function pointer with each record. For each GC root, the address of +the pointer and the meta-data (from the <a +href="#roots"><tt>llvm.gcroot</tt></a> intrinsic) are provided. +</p> +</div> + +<!--_________________________________________________________________________--> +<div class="doc_subsubsection"> + <a name="staticroots">Tracing GC pointers from static roots</a> +</div> + +<div class="doc_text"> +TODO +</div> + + +<!--_________________________________________________________________________--> +<div class="doc_subsubsection"> + <a name="gcdescriptors">Tracing GC pointers from heap objects</a> +</div> + +<div class="doc_text"> +<p> +The three most common ways to keep track of where pointers live in heap objects +are (listed in order of space overhead required):</p> + +<ol> +<li>In languages with polymorphic objects, pointers from an object header are +usually used to identify the GC pointers in the heap object. This is common for +object-oriented languages like Self, Smalltalk, Java, or C#.</li> + +<li>If heap objects are not polymorphic, often the "shape" of the heap can be +determined from the roots of the heap or from some other meta-data [<a +href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a +href="#tolmach94">Tolmach94</a>]. In this case, the garbage collector can +propagate the information around from meta data stored with the roots. This +often eliminates the need to have a header on objects in the heap. This is +common in the ML family.</li> + +<li>If all heap objects have pointers in the same locations, or pointers can be +distinguished just by looking at them (e.g., the low order bit is clear), no +book-keeping is needed at all. This is common for Lisp-like languages.</li> +</ol> + +<p>The LLVM garbage collectors are capable of supporting all of these styles of +language, including ones that mix various implementations. To do this, it +allows the source-language to associate meta-data with the <a +href="#roots">stack roots</a>, and the heap tracing routines can propagate the +information. In addition, LLVM allows the front-end to extract GC information +from in any form from a specific object pointer (this supports situations #1 and +#3). +</p> + +<p><b>Making this efficient</b></p> + + + +</div> + + + +<!-- *********************************************************************** --> +<div class="doc_section"> + <a name="gcimpls">GC implementations available</a> +</div> +<!-- *********************************************************************** --> + +<div class="doc_text"> + +<p> +To make this more concrete, the currently implemented LLVM garbage collectors +all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base. +If you are interested in implementing an algorithm, there are many interesting +possibilities (mark/sweep, a generational collector, a reference counting +collector, etc), or you could choose to improve one of the existing algorithms. +</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"> + <a name="semispace">SemiSpace - A simple copying garbage collector</a> +</div> + +<div class="doc_text"> +<p> +SemiSpace is a very simple copying collector. When it starts up, it allocates +two blocks of memory for the heap. It uses a simple bump-pointer allocator to +allocate memory from the first block until it runs out of space. When it runs +out of space, it traces through all of the roots of the program, copying blocks +to the other half of the memory space. +</p> + +</div> + +<!--_________________________________________________________________________--> +<div class="doc_subsubsection"> + Possible Improvements +</div> + +<div class="doc_text"> + +<p> +If a collection cycle happens and the heap is not compacted very much (say less +than 25% of the allocated memory was freed), the memory regions should be +doubled in size.</p> + +</div> + +<!-- *********************************************************************** --> +<div class="doc_section"> + <a name="references">References</a> +</div> +<!-- *********************************************************************** --> + +<div class="doc_text"> + +<p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew +W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p> + +<p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for +strongly typed programming languages. Benjamin Goldberg. ACM SIGPLAN +PLDI'91.</p> + +<p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using +explicit type parameters. Andrew Tolmach. Proceedings of the 1994 ACM +conference on LISP and functional programming.</p> + +</div> + +<!-- *********************************************************************** --> + +<hr> +<address> + <a href="http://jigsaw.w3.org/css-validator/check/referer"><img + src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> + <a href="http://validator.w3.org/check/referer"><img + src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> + + <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> + <a href="http://llvm.org">LLVM Compiler Infrastructure</a><br> + Last modified: $Date$ +</address> + +</body> +</html> |