aboutsummaryrefslogtreecommitdiffstats
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/BitCodeFormat.rst12
-rw-r--r--docs/BitSets.rst70
-rw-r--r--docs/CMake.rst46
-rw-r--r--docs/CMakeLists.txt43
-rw-r--r--docs/CodeGenerator.rst12
-rw-r--r--docs/CodingStandards.rst19
-rw-r--r--docs/CommandGuide/lit.rst2
-rw-r--r--docs/CompilerWriterInfo.rst2
-rw-r--r--docs/ExceptionHandling.rst176
-rw-r--r--docs/ExtendingLLVM.rst8
-rw-r--r--docs/GarbageCollection.rst607
-rw-r--r--docs/GettingStarted.rst4
-rw-r--r--docs/GettingStartedVS.rst4
-rw-r--r--docs/HowToSetUpLLVMStyleRTTI.rst26
-rw-r--r--docs/LangRef.rst589
-rw-r--r--docs/LinkTimeOptimization.rst2
-rw-r--r--docs/MergeFunctions.rst802
-rw-r--r--docs/Passes.rst32
-rw-r--r--docs/Phabricator.rst3
-rw-r--r--docs/ProgrammersManual.rst75
-rw-r--r--docs/ReleaseNotes.rst28
-rw-r--r--docs/SourceLevelDebugging.rst1
-rw-r--r--docs/StackMaps.rst7
-rw-r--r--docs/Statepoints.rst580
-rw-r--r--docs/TableGen/index.rst1
-rw-r--r--docs/WritingAnLLVMPass.rst4
-rw-r--r--docs/conf.py4
-rw-r--r--docs/index.rst15
-rw-r--r--docs/tutorial/LangImpl1.rst11
-rw-r--r--docs/tutorial/LangImpl4.rst2
-rw-r--r--docs/tutorial/LangImpl5.rst2
-rw-r--r--docs/tutorial/LangImpl6.rst2
-rw-r--r--docs/tutorial/LangImpl7.rst4
-rw-r--r--docs/tutorial/LangImpl8.rst716
-rw-r--r--docs/tutorial/LangImpl9.rst262
35 files changed, 3418 insertions, 755 deletions
diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst
index 34485b5..4b398a4 100644
--- a/docs/BitCodeFormat.rst
+++ b/docs/BitCodeFormat.rst
@@ -672,7 +672,7 @@ for each library name referenced.
MODULE_CODE_GLOBALVAR Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, dllstorageclass]``
+``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]``
The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a
global variable. The operand fields are:
@@ -741,7 +741,7 @@ global variable. The operand fields are:
MODULE_CODE_FUNCTION Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prefix, dllstorageclass]``
+``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata]``
The ``FUNCTION`` record (code 8) marks the declaration or definition of a
function. The operand fields are:
@@ -784,12 +784,18 @@ function. The operand fields are:
* *unnamed_addr*: If present and non-zero, indicates that the function has
``unnamed_addr``
-* *prefix*: If non-zero, the value index of the prefix data for this function,
+* *prologuedata*: If non-zero, the value index of the prologue data for this function,
plus 1.
* *dllstorageclass*: An encoding of the
:ref:`dllstorageclass<bcdllstorageclass>` of this function
+* *comdat*: An encoding of the COMDAT of this function
+
+* *prefixdata*: If non-zero, the value index of the prefix data for this function,
+ plus 1.
+
+
MODULE_CODE_ALIAS Record
^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/BitSets.rst b/docs/BitSets.rst
new file mode 100644
index 0000000..c6ffdbd
--- /dev/null
+++ b/docs/BitSets.rst
@@ -0,0 +1,70 @@
+=======
+Bitsets
+=======
+
+This is a mechanism that allows IR modules to co-operatively build pointer
+sets corresponding to addresses within a given set of globals. One example
+of a use case for this is to allow a C++ program to efficiently verify (at
+each call site) that a vtable pointer is in the set of valid vtable pointers
+for the type of the class or its derived classes.
+
+To use the mechanism, a client creates a global metadata node named
+``llvm.bitsets``. Each element is a metadata node with three elements:
+the first is a metadata string containing an identifier for the bitset,
+the second is a global variable and the third is a byte offset into the
+global variable.
+
+This will cause a link-time optimization pass to generate bitsets from the
+memory addresses referenced from the elements of the bitset metadata. The pass
+will lay out the referenced globals consecutively, so their definitions must
+be available at LTO time. The `GlobalLayoutBuilder`_ class is responsible for
+laying out the globals efficiently to minimize the sizes of the underlying
+bitsets. An intrinsic, :ref:`llvm.bitset.test <bitset.test>`, generates code
+to test whether a given pointer is a member of a bitset.
+
+:Example:
+
+::
+
+ target datalayout = "e-p:32:32"
+
+ @a = internal global i32 0
+ @b = internal global i32 0
+ @c = internal global i32 0
+ @d = internal global [2 x i32] [i32 0, i32 0]
+
+ !llvm.bitsets = !{!0, !1, !2, !3, !4}
+
+ !0 = !{!"bitset1", i32* @a, i32 0}
+ !1 = !{!"bitset1", i32* @b, i32 0}
+ !2 = !{!"bitset2", i32* @b, i32 0}
+ !3 = !{!"bitset2", i32* @c, i32 0}
+ !4 = !{!"bitset2", i32* @d, i32 4}
+
+ declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone
+
+ define i1 @foo(i32* %p) {
+ %pi8 = bitcast i32* %p to i8*
+ %x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset1")
+ ret i1 %x
+ }
+
+ define i1 @bar(i32* %p) {
+ %pi8 = bitcast i32* %p to i8*
+ %x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset2")
+ ret i1 %x
+ }
+
+ define void @main() {
+ %a1 = call i1 @foo(i32* @a) ; returns 1
+ %b1 = call i1 @foo(i32* @b) ; returns 1
+ %c1 = call i1 @foo(i32* @c) ; returns 0
+ %a2 = call i1 @bar(i32* @a) ; returns 0
+ %b2 = call i1 @bar(i32* @b) ; returns 1
+ %c2 = call i1 @bar(i32* @c) ; returns 1
+ %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
+ %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
+ ret void
+ }
+
+.. _GlobalLayoutBuilder: http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerBitSets.h
diff --git a/docs/CMake.rst b/docs/CMake.rst
index 653fa16..4d96466 100644
--- a/docs/CMake.rst
+++ b/docs/CMake.rst
@@ -26,7 +26,7 @@ Quick start
We use here the command-line, non-interactive CMake interface.
#. `Download <http://www.cmake.org/cmake/resources/software.html>`_ and install
- CMake. Version 2.8 is the minimum required.
+ CMake. Version 2.8.8 is the minimum required.
#. Open a shell. Your development tools must be reachable from this shell
through the PATH environment variable.
@@ -59,6 +59,36 @@ We use here the command-line, non-interactive CMake interface.
environment variable, for instance. You can force CMake to use a given build
tool, see the `Usage`_ section.
+#. After CMake has finished running, proceed to use IDE project files or start
+ the build from the build directory:
+
+ .. code-block:: console
+
+ $ cmake --build .
+
+ The ``--build`` option tells ``cmake`` to invoke the underlying build
+ tool (``make``, ``ninja``, ``xcodebuild``, ``msbuild``, etc).
+
+ The underlying build tool can be invoked directly either of course, but
+ the ``--build`` option is portable.
+
+#. After LLVM has finished building, install it from the build directory:
+
+ .. code-block:: console
+
+ $ cmake --build . --target install
+
+ The ``--target`` option with ``install`` parameter in addition to
+ the ``--build`` option tells ``cmake`` to build the ``install`` target.
+
+ It is possible to set a different install prefix at installation time
+ by invoking the ``cmake_install.cmake`` script generated in the
+ build directory:
+
+ .. code-block:: console
+
+ $ cmake -DCMAKE_INSTALL_PREFIX=/tmp/llvm -P cmake_install.cmake
+
.. _Basic CMake usage:
.. _Usage:
@@ -215,8 +245,8 @@ LLVM-specific variables
Build in C++1y mode, if available. Defaults to OFF.
**LLVM_ENABLE_ASSERTIONS**:BOOL
- Enables code assertions. Defaults to OFF if and only if ``CMAKE_BUILD_TYPE``
- is *Release*.
+ Enables code assertions. Defaults to ON if and only if ``CMAKE_BUILD_TYPE``
+ is *Debug*.
**LLVM_ENABLE_EH**:BOOL
Build LLVM with exception handling support. This is necessary if you wish to
@@ -234,7 +264,7 @@ LLVM-specific variables
Enable all compiler warnings. Defaults to ON.
**LLVM_ENABLE_PEDANTIC**:BOOL
- Enable pedantic mode. This disable compiler specific extensions, is
+ Enable pedantic mode. This disables compiler specific extensions, if
possible. Defaults to ON.
**LLVM_ENABLE_WERROR**:BOOL
@@ -316,8 +346,8 @@ LLVM-specific variables
otherwise this has no effect.
**LLVM_DOXYGEN_QCH_FILENAME**:STRING
- The filename of the Qt Compressed Help file that will be genrated when
- ``-DLLVM_ENABLE_DOXYGEN=ON`` and
+ The filename of the Qt Compressed Help file that will be generated when
+ ``-DLLVM_ENABLE_DOXYGEN=ON`` and
``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON`` are given. Defaults to
``org.llvm.qch``.
This option is only useful in combination with
@@ -330,7 +360,7 @@ LLVM-specific variables
for more information. Defaults to "org.llvm". This option is only useful in
combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise
this has no effect.
-
+
**LLVM_DOXYGEN_QHP_CUST_FILTER_NAME**:STRING
See `Qt Help Project`_ for
more information. Defaults to the CMake variable ``${PACKAGE_STRING}`` which
@@ -429,7 +459,7 @@ and uses them to build a simple application ``simple-tool``.
add_definitions(${LLVM_DEFINITIONS})
# Now build our tools
- add_excutable(simple-tool tool.cpp)
+ add_executable(simple-tool tool.cpp)
# Find the libraries that correspond to the LLVM components
# that we wish to use
diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt
index d310a0a..da27627 100644
--- a/docs/CMakeLists.txt
+++ b/docs/CMakeLists.txt
@@ -104,3 +104,46 @@ if (LLVM_ENABLE_SPHINX)
endif()
endif()
+
+list(FIND LLVM_BINDINGS_LIST ocaml uses_ocaml)
+if( NOT uses_ocaml LESS 0 )
+ set(doc_targets
+ ocaml_llvm
+ ocaml_llvm_all_backends
+ ocaml_llvm_analysis
+ ocaml_llvm_bitreader
+ ocaml_llvm_bitwriter
+ ocaml_llvm_executionengine
+ ocaml_llvm_irreader
+ ocaml_llvm_linker
+ ocaml_llvm_target
+ ocaml_llvm_ipo
+ ocaml_llvm_passmgr_builder
+ ocaml_llvm_scalar_opts
+ ocaml_llvm_transform_utils
+ ocaml_llvm_vectorize
+ )
+
+ foreach(llvm_target ${LLVM_TARGETS_TO_BUILD})
+ list(APPEND doc_targets ocaml_llvm_${llvm_target})
+ endforeach()
+
+ set(odoc_files)
+ foreach( doc_target ${doc_targets} )
+ get_target_property(odoc_file ${doc_target} OCAML_ODOC)
+ list(APPEND odoc_files -load ${odoc_file})
+ endforeach()
+
+ add_custom_target(ocaml_doc
+ COMMAND ${CMAKE_COMMAND} -E remove_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
+ COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
+ COMMAND ${OCAMLFIND} ocamldoc -d ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
+ -sort -colorize-code -html ${odoc_files})
+
+ add_dependencies(ocaml_doc ${doc_targets})
+
+ if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
+ install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ocamldoc/html
+ DESTINATION docs/ocaml/html)
+ endif()
+endif()
diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst
index b0a1059..75d40db 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -464,7 +464,7 @@ code:
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
- ret
+ ret
This approach is extremely general (if it can handle the X86 architecture, it
can handle anything!) and allows all of the target specific knowledge about the
@@ -848,6 +848,10 @@ is based on the final SelectionDAG, with nodes that must be scheduled together
bundled into a single scheduling-unit node, and with immediate operands and
other nodes that aren't relevant for scheduling omitted.
+The option ``-filter-view-dags`` allows to select the name of the basic block
+that you are interested to visualize and filters all the previous
+``view-*-dags`` options.
+
.. _Build initial DAG:
Initial SelectionDAG Construction
@@ -1336,7 +1340,7 @@ found before being stored or after being reloaded.
If the indirect strategy is used, after all the virtual registers have been
mapped to physical registers or stack slots, it is necessary to use a spiller
object to place load and store instructions in the code. Every virtual that has
-been mapped to a stack slot will be stored to memory after been defined and will
+been mapped to a stack slot will be stored to memory after being defined and will
be loaded before being used. The implementation of the spiller tries to recycle
load/store instructions, avoiding unnecessary instructions. For an example of
how to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in
@@ -1349,7 +1353,7 @@ With very rare exceptions (e.g., function calls), the LLVM machine code
instructions are three address instructions. That is, each instruction is
expected to define at most one register, and to use at most two registers.
However, some architectures use two address instructions. In this case, the
-defined register is also one of the used register. For instance, an instruction
+defined register is also one of the used registers. For instance, an instruction
such as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX +
%EBX``.
@@ -1574,7 +1578,7 @@ three important things that you have to implement for your target:
correspond to. The MCInsts that are generated by this are fed into the
instruction printer or the encoder.
-Finally, at your choosing, you can also implement an subclass of MCCodeEmitter
+Finally, at your choosing, you can also implement a subclass of MCCodeEmitter
which lowers MCInst's into machine code bytes and relocations. This is
important if you want to support direct .o file emission, or would like to
implement an assembler for your target.
diff --git a/docs/CodingStandards.rst b/docs/CodingStandards.rst
index 0552c71..221c431 100644
--- a/docs/CodingStandards.rst
+++ b/docs/CodingStandards.rst
@@ -83,7 +83,7 @@ Supported C++11 Language and Library Features
While LLVM, Clang, and LLD use C++11, not all features are available in all of
the toolchains which we support. The set of features supported for use in LLVM
-is the intersection of those supported in MSVC 2012, GCC 4.7, and Clang 3.1.
+is the intersection of those supported in MSVC 2013, GCC 4.7, and Clang 3.1.
The ultimate definition of this set is what build bots with those respective
toolchains accept. Don't argue with the build bots. However, we have some
guidance below to help you know what to expect.
@@ -123,6 +123,12 @@ unlikely to be supported by our host compilers.
* ``override`` and ``final``: N2928_, N3206_, N3272_
* Atomic operations and the C++11 memory model: N2429_
+* Variadic templates: N2242_
+* Explicit conversion operators: N2437_
+* Defaulted and deleted functions: N2346_
+
+ * But not defaulted move constructors or move assignment operators, MSVC 2013
+ cannot synthesize them.
.. _N2118: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2118.html
.. _N2439: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2439.htm
@@ -143,6 +149,9 @@ unlikely to be supported by our host compilers.
.. _N3206: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3206.htm
.. _N3272: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3272.htm
.. _N2429: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2429.htm
+.. _N2242: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2242.pdf
+.. _N2437: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2437.pdf
+.. _N2346: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2346.htm
.. _MSVC-compatible RTTI: http://llvm.org/PR18951
The supported features in the C++11 standard libraries are less well tracked,
@@ -251,7 +260,8 @@ The next section in the file is a concise note that defines the license that the
file is released under. This makes it perfectly clear what terms the source
code can be distributed under and should not be modified in any way.
-The main body is a ``doxygen`` comment describing the purpose of the file. It
+The main body is a ``doxygen`` comment (identified by the ``///`` comment
+marker instead of the usual ``//``) describing the purpose of the file. It
should have a ``\brief`` command that describes the file in one or two
sentences. Any additional information should be separated by a blank line. If
an algorithm is being implemented or something tricky is going on, a reference
@@ -281,7 +291,8 @@ happens: does the method return null? Abort? Format your hard disk?
Comment Formatting
^^^^^^^^^^^^^^^^^^
-In general, prefer C++ style (``//``) comments. They take less space, require
+In general, prefer C++ style comments (``//`` for normal comments, ``///`` for
+``doxygen`` documentation comments). They take less space, require
less typing, don't have nesting problems, etc. There are a few cases when it is
useful to use C style (``/* */``) comments however:
@@ -710,7 +721,7 @@ the symbol (e.g., MSVC). This can lead to problems at link time.
// Bar isn't POD, but it does look like a struct.
struct Bar {
int Data;
- Foo() : Data(0) { }
+ Bar() : Data(0) { }
};
Do not use Braced Initializer Lists to Call a Constructor
diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst
index 2708e9d..9c63848 100644
--- a/docs/CommandGuide/lit.rst
+++ b/docs/CommandGuide/lit.rst
@@ -341,7 +341,7 @@ LOCAL CONFIGURATION FILES
~~~~~~~~~~~~~~~~~~~~~~~~~
When :program:`lit` loads a subdirectory in a test suite, it instantiates a
-local test configuration by cloning the configuration for the parent direction
+local test configuration by cloning the configuration for the parent directory
--- the root of this configuration chain will always be a test suite. Once the
test configuration is cloned :program:`lit` checks for a *lit.local.cfg* file
in the subdirectory. If present, this file will be loaded and can be used to
diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst
index a012c32..2dfdc9b 100644
--- a/docs/CompilerWriterInfo.rst
+++ b/docs/CompilerWriterInfo.rst
@@ -41,6 +41,8 @@ MIPS
* `MIPS Processor Architecture <http://imgtec.com/mips/mips-architectures.asp>`_
+* `MIPS 64-bit ELF Object File Specification <http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.pdf>`_
+
PowerPC
-------
diff --git a/docs/ExceptionHandling.rst b/docs/ExceptionHandling.rst
index 64edca7..696b50f 100644
--- a/docs/ExceptionHandling.rst
+++ b/docs/ExceptionHandling.rst
@@ -64,6 +64,21 @@ handling at the expense of slower execution when no exceptions are thrown. As
exceptions are, by their nature, intended for uncommon code paths, DWARF
exception handling is generally preferred to SJLJ.
+Windows Runtime Exception Handling
+-----------------------------------
+
+Windows runtime based exception handling uses the same basic IR structure as
+Itanium ABI based exception handling, but it relies on the personality
+functions provided by the native Windows runtime library, ``__CxxFrameHandler3``
+for C++ exceptions: ``__C_specific_handler`` for 64-bit SEH or
+``_frame_handler3/4`` for 32-bit SEH. This results in a very different
+execution model and requires some minor modifications to the initial IR
+representation and a significant restructuring just before code generation.
+
+General information about the Windows x64 exception handling mechanism can be
+found at `MSDN Exception Handling (x64)
+<https://msdn.microsoft.com/en-us/library/1eyas8tf(v=vs.80).aspx>`_.
+
Overview
--------
@@ -263,9 +278,9 @@ there are no catches or filters that require it to.
exceptions and throws a third.
When all cleanups are finished, if the exception is not handled by the current
-function, resume unwinding by calling the `resume
-instruction <LangRef.html#i_resume>`_, passing in the result of the
-``landingpad`` instruction for the original landing pad.
+function, resume unwinding by calling the :ref:`resume instruction <i_resume>`,
+passing in the result of the ``landingpad`` instruction for the original
+landing pad.
Throw Filters
-------------
@@ -306,6 +321,97 @@ the selector results they understand and then resume exception propagation with
the `resume instruction <LangRef.html#i_resume>`_ if none of the conditions
match.
+C++ Exception Handling using the Windows Runtime
+=================================================
+
+(Note: Windows C++ exception handling support is a work in progress and is
+ not yet fully implemented. The text below describes how it will work
+ when completed.)
+
+The Windows runtime function for C++ exception handling uses a multi-phase
+approach. When an exception occurs it searches the current callstack for a
+frame that has a handler for the exception. If a handler is found, it then
+calls the cleanup handler for each frame above the handler which has a
+cleanup handler before calling the catch handler. These calls are all made
+from a stack context different from the original frame in which the handler
+is defined. Therefore, it is necessary to outline these handlers from their
+original context before code generation.
+
+Catch handlers are called with a pointer to the handler itself as the first
+argument and a pointer to the parent function's stack frame as the second
+argument. The catch handler uses the `llvm.recoverframe
+<LangRef.html#llvm-frameallocate-and-llvm-framerecover-intrinsics>`_ to get a
+pointer to a frame allocation block that is created in the parent frame using
+the `llvm.allocateframe
+<LangRef.html#llvm-frameallocate-and-llvm-framerecover-intrinsics>`_ intrinsic.
+The ``WinEHPrepare`` pass will have created a structure definition for the
+contents of this block. The first two members of the structure will always be
+(1) a 32-bit integer that the runtime uses to track the exception state of the
+parent frame for the purposes of handling chained exceptions and (2) a pointer
+to the object associated with the exception (roughly, the parameter of the
+catch clause). These two members will be followed by any frame variables from
+the parent function which must be accessed in any of the functions unwind or
+catch handlers. The catch handler returns the address at which execution
+should continue.
+
+Cleanup handlers perform any cleanup necessary as the frame goes out of scope,
+such as calling object destructors. The runtime handles the actual unwinding
+of the stack. If an exception occurs in a cleanup handler the runtime manages
+termination of the process. Cleanup handlers are called with the same arguments
+as catch handlers (a pointer to the handler and a pointer to the parent stack
+frame) and use the same mechanism described above to access frame variables
+in the parent function. Cleanup handlers do not return a value.
+
+The IR generated for Windows runtime based C++ exception handling is initially
+very similar to the ``landingpad`` mechanism described above. Calls to
+libc++abi functions (such as ``__cxa_begin_catch``/``__cxa_end_catch`` and
+``__cxa_throw_exception`` are replaced with calls to intrinsics or Windows
+runtime functions (such as ``llvm.eh.begincatch``/``llvm.eh.endcatch`` and
+``__CxxThrowException``).
+
+During the WinEHPrepare pass, the handler functions are outlined into handler
+functions and the original landing pad code is replaced with a call to the
+``llvm.eh.actions`` intrinsic that describes the order in which handlers will
+be processed from the logical location of the landing pad and an indirect
+branch to the return value of the ``llvm.eh.actions`` intrinsic. The
+``llvm.eh.actions`` intrinsic is defined as returning the address at which
+execution will continue. This is a temporary construct which will be removed
+before code generation, but it allows for the accurate tracking of control
+flow until then.
+
+A typical landing pad will look like this after outlining:
+
+.. code-block:: llvm
+
+ lpad:
+ %vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
+ cleanup
+ catch i8* bitcast (i8** @_ZTIi to i8*)
+ catch i8* bitcast (i8** @_ZTIf to i8*)
+ %recover = call i8* (...)* @llvm.eh.actions(
+ i32 3, i8* bitcast (i8** @_ZTIi to i8*), i8* (i8*, i8*)* @_Z4testb.catch.1)
+ i32 2, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.1)
+ i32 1, i8* bitcast (i8** @_ZTIf to i8*), i8* (i8*, i8*)* @_Z4testb.catch.0)
+ i32 0, i8* null, void (i8*, i8*)* @_Z4testb.cleanup.0)
+ indirectbr i8* %recover, [label %try.cont1, label %try.cont2]
+
+In this example, the landing pad represents an exception handling context with
+two catch handlers and a cleanup handler that have been outlined. If an
+exception is thrown with a type that matches ``_ZTIi``, the ``_Z4testb.catch.1``
+handler will be called an no clean-up is needed. If an exception is thrown
+with a type that matches ``_ZTIf``, first the ``_Z4testb.cleanup.1`` handler
+will be called to perform unwind-related cleanup, then the ``_Z4testb.catch.1``
+handler will be called. If an exception is throw which does not match either
+of these types and the exception is handled by another frame further up the
+call stack, first the ``_Z4testb.cleanup.1`` handler will be called, then the
+``_Z4testb.cleanup.0`` handler (which corresponds to a different scope) will be
+called, and exception handling will continue at the next frame in the call
+stack will be called. One of the catch handlers will return the address of
+``%try.cont1`` in the parent function and the other will return the address of
+``%try.cont2``, meaning that execution continues at one of those blocks after
+an exception is caught.
+
+
Exception Handling Intrinsics
=============================
@@ -329,6 +435,70 @@ function. This value can be used to compare against the result of
Uses of this intrinsic are generated by the C++ front-end.
+.. _llvm.eh.begincatch:
+
+``llvm.eh.begincatch``
+----------------------
+
+.. code-block:: llvm
+
+ i8* @llvm.eh.begincatch(i8* %exn)
+
+
+This intrinsic marks the beginning of catch handling code within the blocks
+following a ``landingpad`` instruction. The exact behavior of this function
+depends on the compilation target and the personality function associated
+with the ``landingpad`` instruction.
+
+The argument to this intrinsic is a pointer that was previously extracted from
+the aggregate return value of the ``landingpad`` instruction. The return
+value of the intrinsic is a pointer to the exception object to be used by the
+catch code. This pointer is returned as an ``i8*`` value, but the actual type
+of the object will depend on the exception that was thrown.
+
+Uses of this intrinsic are generated by the C++ front-end. Many targets will
+use implementation-specific functions (such as ``__cxa_begin_catch``) instead
+of this intrinsic. The intrinsic is provided for targets that require a more
+abstract interface.
+
+When used in the native Windows C++ exception handling implementation, this
+intrinsic serves as a placeholder to delimit code before a catch handler is
+outlined. When the handler is is outlined, this intrinsic will be replaced
+by instructions that retrieve the exception object pointer from the frame
+allocation block.
+
+
+.. _llvm.eh.endcatch:
+
+``llvm.eh.endcatch``
+----------------------
+
+.. code-block:: llvm
+
+ void @llvm.eh.endcatch()
+
+
+This intrinsic marks the end of catch handling code within the current block,
+which will be a successor of a block which called ``llvm.eh.begincatch''.
+The exact behavior of this function depends on the compilation target and the
+personality function associated with the corresponding ``landingpad``
+instruction.
+
+There may be more than one call to ``llvm.eh.endcatch`` for any given call to
+``llvm.eh.begincatch`` with each ``llvm.eh.endcatch`` call corresponding to the
+end of a different control path. All control paths following a call to
+``llvm.eh.begincatch`` must reach a call to ``llvm.eh.endcatch``.
+
+Uses of this intrinsic are generated by the C++ front-end. Many targets will
+use implementation-specific functions (such as ``__cxa_begin_catch``) instead
+of this intrinsic. The intrinsic is provided for targets that require a more
+abstract interface.
+
+When used in the native Windows C++ exception handling implementation, this
+intrinsic serves as a placeholder to delimit code before a catch handler is
+outlined. After the handler is outlined, this intrinsic is simply removed.
+
+
SJLJ Intrinsics
---------------
diff --git a/docs/ExtendingLLVM.rst b/docs/ExtendingLLVM.rst
index 60cbf01..2552c07 100644
--- a/docs/ExtendingLLVM.rst
+++ b/docs/ExtendingLLVM.rst
@@ -58,7 +58,7 @@ function and then be turned into an instruction if warranted.
If it is possible to constant fold your intrinsic, add support to it in the
``canConstantFoldCallTo`` and ``ConstantFoldCall`` functions.
-#. ``llvm/test/Regression/*``:
+#. ``llvm/test/*``:
Add test cases for your test cases to the test suite
@@ -164,10 +164,10 @@ complicated behavior in a single node (rotate).
#. TODO: document complex patterns.
-#. ``llvm/test/Regression/CodeGen/*``:
+#. ``llvm/test/CodeGen/*``:
Add test cases for your new node to the test suite.
- ``llvm/test/Regression/CodeGen/X86/bswap.ll`` is a good example.
+ ``llvm/test/CodeGen/X86/bswap.ll`` is a good example.
Adding a new instruction
========================
@@ -217,7 +217,7 @@ Adding a new instruction
add support for your instruction to code generators, or add a lowering pass.
-#. ``llvm/test/Regression/*``:
+#. ``llvm/test/*``:
add your test cases to the test suite.
diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst
index 49d3496..a1557fc 100644
--- a/docs/GarbageCollection.rst
+++ b/docs/GarbageCollection.rst
@@ -1,13 +1,82 @@
=====================================
-Accurate Garbage Collection with LLVM
+Garbage Collection with LLVM
=====================================
.. contents::
:local:
+Abstract
+========
+
+This document covers how to integrate LLVM into a compiler for a language which
+supports garbage collection. **Note that LLVM itself does not provide a
+garbage collector.** You must provide your own.
+
+Quick Start
+============
+
+First, you should pick a collector strategy. LLVM includes a number of built
+in ones, but you can also implement a loadable plugin with a custom definition.
+Note that the collector strategy is a description of how LLVM should generate
+code such that it interacts with your collector and runtime, not a description
+of the collector itself.
+
+Next, mark your generated functions as using your chosen collector strategy.
+From c++, you can call:
+
+.. code-block:: c++
+
+ F.setGC(<collector description name>);
+
+
+This will produce IR like the following fragment:
+
+.. code-block:: llvm
+
+ define void @foo() gc "<collector description name>" { ... }
+
+
+When generating LLVM IR for your functions, you will need to:
+
+* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and
+ store instructions. These intrinsics are used to represent load and store
+ barriers. If you collector does not require such barriers, you can skip
+ this step.
+
+* Use the memory allocation routines provided by your garbage collector's
+ runtime library.
+
+* If your collector requires them, generate type maps according to your
+ runtime's binary interface. LLVM is not involved in the process. In
+ particular, the LLVM type system is not suitable for conveying such
+ information though the compiler.
+
+* Insert any coordination code required for interacting with your collector.
+ Many collectors require running application code to periodically check a
+ flag and conditionally call a runtime function. This is often referred to
+ as a safepoint poll.
+
+You will need to identify roots (i.e. references to heap objects your collector
+needs to know about) in your generated IR, so that LLVM can encode them into
+your final stack maps. Depending on the collector strategy chosen, this is
+accomplished by using either the ``@llvm.gcroot`` intrinsics or an
+``gc.statepoint`` relocation sequence.
+
+Don't forget to create a root for each intermediate value that is generated when
+evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could
+easily be collected if evaluating ``g()`` triggers a collection.
+
+Finally, you need to link your runtime library with the generated program
+executable (for a static compiler) or ensure the appropriate symbols are
+available for the runtime linker (for a JIT compiler).
+
+
Introduction
============
+What is Garbage Collection?
+---------------------------
+
Garbage collection is a widely used technique that frees the programmer from
having to know the lifetimes of heap objects, making software easier to produce
and maintain. Many programming languages rely on garbage collection for
@@ -59,31 +128,34 @@ instance, the intrinsics permit:
* generational collectors
-* reference counting
-
* incremental collectors
* concurrent collectors
* cooperative collectors
-We hope that the primitive support built into the LLVM IR is sufficient to
-support a broad class of garbage collected languages including Scheme, ML, Java,
-C#, Perl, Python, Lua, Ruby, other scripting languages, and more.
+* reference counting
+
+We hope that the support built into the LLVM IR is sufficient to support a
+broad class of garbage collected languages including Scheme, ML, Java, C#,
+Perl, Python, Lua, Ruby, other scripting languages, and more.
-However, LLVM does not itself provide a garbage collector --- this should be
-part of your language's runtime library. LLVM provides a framework for compile
-time :ref:`code generation plugins <plugin>`. The role of these plugins is to
+Note that LLVM **does not itself provide a garbage collector** --- this should
+be part of your language's runtime library. LLVM provides a framework for
+describing the garbage collectors requirements to the compiler. In particular,
+LLVM provides support for generating stack maps at call sites, polling for a
+safepoint, and emitting load and store barriers. You can also extend LLVM -
+possibly through a loadable :ref:`code generation plugins <plugin>` - to
generate code and data structures which conforms to the *binary interface*
specified by the *runtime library*. This is similar to the relationship between
LLVM and DWARF debugging info, for example. The difference primarily lies in
the lack of an established standard in the domain of garbage collection --- thus
-the plugins.
+the need for a flexible extension mechanism.
The aspects of the binary interface with which LLVM's GC support is
concerned are:
-* Creation of GC-safe points within code where collection is allowed to execute
+* Creation of GC safepoints within code where collection is allowed to execute
safely.
* Computation of the stack map. For each safe point in the code, object
@@ -111,205 +183,63 @@ There are additional areas that LLVM does not directly address:
In general, LLVM's support for GC does not include features which can be
adequately addressed with other features of the IR and does not specify a
particular binary interface. On the plus side, this means that you should be
-able to integrate LLVM with an existing runtime. On the other hand, it leaves a
-lot of work for the developer of a novel language. However, it's easy to get
-started quickly and scale up to a more sophisticated implementation as your
-compiler matures.
-
-Getting started
-===============
-
-Using a GC with LLVM implies many things, for example:
-
-* Write a runtime library or find an existing one which implements a GC heap.
-
- #. Implement a memory allocator.
-
- #. Design a binary interface for the stack map, used to identify references
- within a stack frame on the machine stack.\*
-
- #. Implement a stack crawler to discover functions on the call stack.\*
-
- #. Implement a registry for global roots.
-
- #. Design a binary interface for type maps, used to identify references
- within heap objects.
-
- #. Implement a collection routine bringing together all of the above.
-
-* Emit compatible code from your compiler.
-
- * Initialization in the main function.
-
- * Use the ``gc "..."`` attribute to enable GC code generation (or
- ``F.setGC("...")``).
-
- * Use ``@llvm.gcroot`` to mark stack roots.
-
- * Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` to manipulate GC references,
- if necessary.
-
- * Allocate memory using the GC allocation routine provided by the runtime
- library.
-
- * Generate type maps according to your runtime's binary interface.
-
-* Write a compiler plugin to interface LLVM with the runtime library.\*
-
- * Lower ``@llvm.gcread`` and ``@llvm.gcwrite`` to appropriate code
- sequences.\*
-
- * Compile LLVM's stack map to the binary form expected by the runtime.
-
-* Load the plugin into the compiler. Use ``llc -load`` or link the plugin
- statically with your language's compiler.\*
-
-* Link program executables with the runtime.
-
-To help with several of these tasks (those indicated with a \*), LLVM includes a
-highly portable, built-in ShadowStack code generator. It is compiled into
-``llc`` and works even with the interpreter and C backends.
-
-In your compiler
-----------------
-
-To turn the shadow stack on for your functions, first call:
-
-.. code-block:: c++
-
- F.setGC("shadow-stack");
-
-for each function your compiler emits. Since the shadow stack is built into
-LLVM, you do not need to load a plugin.
-
-Your compiler must also use ``@llvm.gcroot`` as documented. Don't forget to
-create a root for each intermediate value that is generated when evaluating an
-expression. In ``h(f(), g())``, the result of ``f()`` could easily be collected
-if evaluating ``g()`` triggers a collection.
-
-There's no need to use ``@llvm.gcread`` and ``@llvm.gcwrite`` over plain
-``load`` and ``store`` for now. You will need them when switching to a more
-advanced GC.
-
-In your runtime
----------------
-
-The shadow stack doesn't imply a memory allocation algorithm. A semispace
-collector or building atop ``malloc`` are great places to start, and can be
-implemented with very little code.
-
-When it comes time to collect, however, your runtime needs to traverse the stack
-roots, and for this it needs to integrate with the shadow stack. Luckily, doing
-so is very simple. (This code is heavily commented to help you understand the
-data structure, but there are only 20 lines of meaningful code.)
-
-.. code-block:: c++
-
- /// @brief The map for a single function's stack frame. One of these is
- /// compiled as constant data into the executable for each function.
- ///
- /// Storage of metadata values is elided if the %metadata parameter to
- /// @llvm.gcroot is null.
- struct FrameMap {
- int32_t NumRoots; //< Number of roots in stack frame.
- int32_t NumMeta; //< Number of metadata entries. May be < NumRoots.
- const void *Meta[0]; //< Metadata for each root.
- };
-
- /// @brief A link in the dynamic shadow stack. One of these is embedded in
- /// the stack frame of each function on the call stack.
- struct StackEntry {
- StackEntry *Next; //< Link to next stack entry (the caller's).
- const FrameMap *Map; //< Pointer to constant FrameMap.
- void *Roots[0]; //< Stack roots (in-place array).
- };
-
- /// @brief The head of the singly-linked list of StackEntries. Functions push
- /// and pop onto this in their prologue and epilogue.
- ///
- /// Since there is only a global list, this technique is not threadsafe.
- StackEntry *llvm_gc_root_chain;
-
- /// @brief Calls Visitor(root, meta) for each GC root on the stack.
- /// root and meta are exactly the values passed to
- /// @llvm.gcroot.
- ///
- /// Visitor could be a function to recursively mark live objects. Or it
- /// might copy them to another heap or generation.
- ///
- /// @param Visitor A function to invoke for every GC root on the stack.
- void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) {
- for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) {
- unsigned i = 0;
-
- // For roots [0, NumMeta), the metadata pointer is in the FrameMap.
- for (unsigned e = R->Map->NumMeta; i != e; ++i)
- Visitor(&R->Roots[i], R->Map->Meta[i]);
-
- // For roots [NumMeta, NumRoots), the metadata pointer is null.
- for (unsigned e = R->Map->NumRoots; i != e; ++i)
- Visitor(&R->Roots[i], NULL);
- }
- }
-
-About the shadow stack
-----------------------
-
-Unlike many GC algorithms which rely on a cooperative code generator to compile
-stack maps, this algorithm carefully maintains a linked list of stack roots
-[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the
-machine stack. Maintaining this data structure is slower than using a stack map
-compiled into the executable as constant data, but has a significant portability
-advantage because it requires no special support from the target code generator,
-and does not require tricky platform-specific code to crawl the machine stack.
-
-The tradeoff for this simplicity and portability is:
-
-* High overhead per function call.
-
-* Not thread-safe.
-
-Still, it's an easy way to get started. After your compiler and runtime are up
-and running, writing a :ref:`plugin <plugin>` will allow you to take advantage
-of :ref:`more advanced GC features <collector-algos>` of LLVM in order to
-improve performance.
+able to integrate LLVM with an existing runtime. On the other hand, it can
+have the effect of leaving a lot of work for the developer of a novel
+language. We try to mitigate this by providing built in collector strategy
+descriptions that can work with many common collector designs and easy
+extension points. If you don't already have a specific binary interface
+you need to support, we recommend trying to use one of these built in collector
+strategies.
.. _gc_intrinsics:
-IR features
-===========
+LLVM IR Features
+================
This section describes the garbage collection facilities provided by the
:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these
-IR features is specified by the binary interface implemented by a :ref:`code
-generation plugin <plugin>`, not by this document.
-
-These facilities are limited to those strictly necessary; they are not intended
-to be a complete interface to any garbage collector. A program will need to
-interface with the GC library using the facilities provided by that program.
+IR features is specified by the selected :ref:`GC strategy description
+<plugin>`.
Specifying GC code generation: ``gc "..."``
-------------------------------------------
.. code-block:: llvm
- define ty @name(...) gc "name" { ...
+ define <returntype> @name(...) gc "name" { ... }
-The ``gc`` function attribute is used to specify the desired GC style to the
+The ``gc`` function attribute is used to specify the desired GC strategy to the
compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``.
-Setting ``gc "name"`` on a function triggers a search for a matching code
-generation plugin "*name*"; it is that plugin which defines the exact nature of
-the code generated to support GC. If none is found, the compiler will raise an
-error.
+Setting ``gc "name"`` on a function triggers a search for a matching subclass
+of GCStrategy. Some collector strategies are built in. You can add others
+using either the loadable plugin mechanism, or by patching your copy of LLVM.
+It is the selected GC strategy which defines the exact nature of the code
+generated to support GC. If none is found, the compiler will raise an error.
Specifying the GC style on a per-function basis allows LLVM to link together
programs that use different garbage collection algorithms (or none at all).
.. _gcroot:
-Identifying GC roots on the stack: ``llvm.gcroot``
---------------------------------------------------
+Identifying GC roots on the stack
+----------------------------------
+
+LLVM currently supports two different mechanisms for describing references in
+compiled code at safepoints. ``llvm.gcroot`` is the older mechanism;
+``gc.statepoint`` has been added more recently. At the moment, you can choose
+either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer
+term, we will probably either migrate away from ``llvm.gcroot`` entirely, or
+substantially merge their implementations. Note that most new development
+work is focused on ``gc.statepoint``.
+
+Using ``gc.statepoint``
+^^^^^^^^^^^^^^^^^^^^^^^^
+:doc:`This page <Statepoints>` contains detailed documentation for
+``gc.statepoint``.
+
+Using ``llvm.gcwrite``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: llvm
@@ -317,24 +247,27 @@ Identifying GC roots on the stack: ``llvm.gcroot``
The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable
references an object on the heap and is to be tracked for garbage collection.
-The exact impact on generated code is specified by a :ref:`compiler plugin
-<plugin>`. All calls to ``llvm.gcroot`` **must** reside inside the first basic
-block.
+The exact impact on generated code is specified by the Function's selected
+:ref:`GC strategy <plugin>`. All calls to ``llvm.gcroot`` **must** reside
+inside the first basic block.
-A compiler which uses mem2reg to raise imperative code using ``alloca`` into SSA
-form need only add a call to ``@llvm.gcroot`` for those variables which a
-pointers into the GC heap.
+The first argument **must** be a value referring to an alloca instruction or a
+bitcast of an alloca. The second contains a pointer to metadata that should be
+associated with the pointer, and **must** be a constant or global value
+address. If your target collector uses tags, use a null pointer for metadata.
+
+A compiler which performs manual SSA construction **must** ensure that SSA
+values representing GC references are stored in to the alloca passed to the
+respective ``gcroot`` before every call site and reloaded after every call.
+A compiler which uses mem2reg to raise imperative code using ``alloca`` into
+SSA form need only add a call to ``@llvm.gcroot`` for those variables which
+are pointers into the GC heap.
It is also important to mark intermediate values with ``llvm.gcroot``. For
example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the
case that ``g()`` triggers a collection. Note, that stack variables must be
initialized and marked with ``llvm.gcroot`` in function's prologue.
-The first argument **must** be a value referring to an alloca instruction or a
-bitcast of an alloca. The second contains a pointer to metadata that should be
-associated with the pointer, and **must** be a constant or global value
-address. If your target collector uses tags, use a null pointer for metadata.
-
The ``%metadata`` argument can be used to avoid requiring heap objects to have
'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified,
its value will be tracked along with the location of the pointer in the stack
@@ -407,12 +340,18 @@ pointer:
%derived = getelementptr %object, i32 0, i32 2, i32 %n
LLVM does not enforce this relationship between the object and derived pointer
-(although a :ref:`plugin <plugin>` might). However, it would be an unusual
-collector that violated it.
+(although a particular :ref:`collector strategy <plugin>` might). However, it
+would be an unusual collector that violated it.
+
+The use of these intrinsics is naturally optional if the target GC does not
+require the corresponding barrier. The GC strategy used with such a collector
+should replace the intrinsic calls with the corresponding ``load`` or
+``store`` instruction if they are used.
-The use of these intrinsics is naturally optional if the target GC does require
-the corresponding barrier. Such a GC plugin will replace the intrinsic calls
-with the corresponding ``load`` or ``store`` instruction if they are used.
+One known deficiency with the current design is that the barrier intrinsics do
+not include the size or alignment of the underlying operation performed. It is
+currently assumed that the operation is of pointer size and the alignment is
+assumed to be the target machine's default alignment.
Write barrier: ``llvm.gcwrite``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -423,8 +362,8 @@ Write barrier: ``llvm.gcwrite``
For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It
has exactly the same semantics as a non-volatile ``store`` to the derived
-pointer (the third argument). The exact code generated is specified by a
-compiler :ref:`plugin <plugin>`.
+pointer (the third argument). The exact code generated is specified by the
+Function's selected :ref:`GC strategy <plugin>`.
Many important algorithms require write barriers, including generational and
concurrent collectors. Additionally, write barriers could be used to implement
@@ -439,16 +378,189 @@ Read barrier: ``llvm.gcread``
For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has
exactly the same semantics as a non-volatile ``load`` from the derived pointer
-(the second argument). The exact code generated is specified by a
-:ref:`compiler plugin <plugin>`.
+(the second argument). The exact code generated is specified by the Function's
+selected :ref:`GC strategy <plugin>`.
Read barriers are needed by fewer algorithms than write barriers, and may have a
greater performance impact since pointer reads are more frequent than writes.
.. _plugin:
+.. _builtin-gc-strategies:
+
+Built In GC Strategies
+======================
+
+LLVM includes built in support for several varieties of garbage collectors.
+
+The Shadow Stack GC
+----------------------
+
+To use this collector strategy, mark your functions with:
+
+.. code-block:: c++
+
+ F.setGC("shadow-stack");
+
+Unlike many GC algorithms which rely on a cooperative code generator to compile
+stack maps, this algorithm carefully maintains a linked list of stack roots
+[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the
+machine stack. Maintaining this data structure is slower than using a stack map
+compiled into the executable as constant data, but has a significant portability
+advantage because it requires no special support from the target code generator,
+and does not require tricky platform-specific code to crawl the machine stack.
+
+The tradeoff for this simplicity and portability is:
+
+* High overhead per function call.
+
+* Not thread-safe.
+
+Still, it's an easy way to get started. After your compiler and runtime are up
+and running, writing a :ref:`plugin <plugin>` will allow you to take advantage
+of :ref:`more advanced GC features <collector-algos>` of LLVM in order to
+improve performance.
+
+
+The shadow stack doesn't imply a memory allocation algorithm. A semispace
+collector or building atop ``malloc`` are great places to start, and can be
+implemented with very little code.
+
+When it comes time to collect, however, your runtime needs to traverse the stack
+roots, and for this it needs to integrate with the shadow stack. Luckily, doing
+so is very simple. (This code is heavily commented to help you understand the
+data structure, but there are only 20 lines of meaningful code.)
+
+.. code-block:: c++
+
+ /// @brief The map for a single function's stack frame. One of these is
+ /// compiled as constant data into the executable for each function.
+ ///
+ /// Storage of metadata values is elided if the %metadata parameter to
+ /// @llvm.gcroot is null.
+ struct FrameMap {
+ int32_t NumRoots; //< Number of roots in stack frame.
+ int32_t NumMeta; //< Number of metadata entries. May be < NumRoots.
+ const void *Meta[0]; //< Metadata for each root.
+ };
+
+ /// @brief A link in the dynamic shadow stack. One of these is embedded in
+ /// the stack frame of each function on the call stack.
+ struct StackEntry {
+ StackEntry *Next; //< Link to next stack entry (the caller's).
+ const FrameMap *Map; //< Pointer to constant FrameMap.
+ void *Roots[0]; //< Stack roots (in-place array).
+ };
+
+ /// @brief The head of the singly-linked list of StackEntries. Functions push
+ /// and pop onto this in their prologue and epilogue.
+ ///
+ /// Since there is only a global list, this technique is not threadsafe.
+ StackEntry *llvm_gc_root_chain;
+
+ /// @brief Calls Visitor(root, meta) for each GC root on the stack.
+ /// root and meta are exactly the values passed to
+ /// @llvm.gcroot.
+ ///
+ /// Visitor could be a function to recursively mark live objects. Or it
+ /// might copy them to another heap or generation.
+ ///
+ /// @param Visitor A function to invoke for every GC root on the stack.
+ void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) {
+ for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) {
+ unsigned i = 0;
+
+ // For roots [0, NumMeta), the metadata pointer is in the FrameMap.
+ for (unsigned e = R->Map->NumMeta; i != e; ++i)
+ Visitor(&R->Roots[i], R->Map->Meta[i]);
+
+ // For roots [NumMeta, NumRoots), the metadata pointer is null.
+ for (unsigned e = R->Map->NumRoots; i != e; ++i)
+ Visitor(&R->Roots[i], NULL);
+ }
+ }
+
+
+The 'Erlang' and 'Ocaml' GCs
+-----------------------------
+
+LLVM ships with two example collectors which leverage the ``gcroot``
+mechanisms. To our knowledge, these are not actually used by any language
+runtime, but they do provide a reasonable starting point for someone interested
+in writing an ``gcroot`` compatible GC plugin. In particular, these are the
+only in tree examples of how to produce a custom binary stack map format using
+a ``gcroot`` strategy.
+
+As there names imply, the binary format produced is intended to model that
+used by the Erlang and OCaml compilers respectively.
+
+
+The Statepoint Example GC
+-------------------------
+
+.. code-block:: c++
+
+ F.setGC("statepoint-example");
+
+This GC provides an example of how one might use the infrastructure provided
+by ``gc.statepoint``. This example GC is compatible with the
+:ref:`PlaceSafepoints` and :ref:`RewriteStatepointsForGC` utility passes
+which simplify ``gc.statepoint`` sequence insertion. If you need to build a
+custom GC strategy around the ``gc.statepoints`` mechanisms, it is recommended
+that you use this one as a starting point.
+
+This GC strategy does not support read or write barriers. As a result, these
+intrinsics are lowered to normal loads and stores.
+
+The stack map format generated by this GC strategy can be found in the
+:ref:`stackmap-section` using a format documented :ref:`here
+<statepoint-stackmap-format>`. This format is intended to be the standard
+format supported by LLVM going forward.
+
+
+Custom GC Strategies
+====================
+
+If none of the built in GC strategy descriptions met your needs above, you will
+need to define a custom GCStrategy and possibly, a custom LLVM pass to perform
+lowering. Your best example of where to start defining a custom GCStrategy
+would be to look at one of the built in strategies.
+
+You may be able to structure this additional code as a loadable plugin library.
+Loadable plugins are sufficient if all you need is to enable a different
+combination of built in functionality, but if you need to provide a custom
+lowering pass, you will need to build a patched version of LLVM. If you think
+you need a patched build, please ask for advice on llvm-dev. There may be an
+easy way we can extend the support to make it work for your use case without
+requiring a custom build.
+
+Collector Requirements
+----------------------
+
+You should be able to leverage any existing collector library that includes the following elements:
+
+#. A memory allocator which exposes an allocation function your compiled
+ code can call.
+
+#. A binary format for the stack map. A stack map describes the location
+ of references at a safepoint and is used by precise collectors to identify
+ references within a stack frame on the machine stack. Note that collectors
+ which conservatively scan the stack don't require such a structure.
+
+#. A stack crawler to discover functions on the call stack, and enumerate the
+ references listed in the stack map for each call site.
+
+#. A mechanism for identifying references in global locations (e.g. global
+ variables).
+
+#. If you collector requires them, an LLVM IR implementation of your collectors
+ load and store barriers. Note that since many collectors don't require
+ barriers at all, LLVM defaults to lowering such barriers to normal loads
+ and stores unless you arrange otherwise.
+
+
Implementing a collector plugin
-===============================
+-------------------------------
User code specifies which GC code generation to use with the ``gc`` function
attribute or, equivalently, with the ``setGC`` method of ``Function``.
@@ -721,8 +833,9 @@ this feature should be used by all GC plugins. It is enabled by default.
Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers``
---------------------------------------------------------------------------------------------------
-For GCs which use barriers or unusual treatment of stack roots, these flags
-allow the collector to perform arbitrary transformations of the LLVM IR:
+For GCs which use barriers or unusual treatment of stack roots, these
+flags allow the collector to perform arbitrary transformations of the
+LLVM IR:
.. code-block:: c++
@@ -733,70 +846,18 @@ allow the collector to perform arbitrary transformations of the LLVM IR:
CustomReadBarriers = true;
CustomWriteBarriers = true;
}
-
- virtual bool initializeCustomLowering(Module &M);
- virtual bool performCustomLowering(Function &F);
};
-If any of these flags are set, then LLVM suppresses its default lowering for the
-corresponding intrinsics and instead calls ``performCustomLowering``.
-
-LLVM's default action for each intrinsic is as follows:
-
-* ``llvm.gcroot``: Leave it alone. The code generator must see it or the stack
- map will not be computed.
-
-* ``llvm.gcread``: Substitute a ``load`` instruction.
-
-* ``llvm.gcwrite``: Substitute a ``store`` instruction.
-
-If ``CustomReadBarriers`` or ``CustomWriteBarriers`` are specified, then
-``performCustomLowering`` **must** eliminate the corresponding barriers.
+If any of these flags are set, LLVM suppresses its default lowering for
+the corresponding intrinsics. Instead, you must provide a custom Pass
+which lowers the intrinsics as desired. If you have opted in to custom
+lowering of a particular intrinsic your pass **must** eliminate all
+instances of the corresponding intrinsic in functions which opt in to
+your GC. The best example of such a pass is the ShadowStackGC and it's
+ShadowStackGCLowering pass.
-``performCustomLowering`` must comply with the same restrictions as
-:ref:`FunctionPass::runOnFunction <writing-an-llvm-pass-runOnFunction>`
-Likewise, ``initializeCustomLowering`` has the same semantics as
-:ref:`Pass::doInitialization(Module&)
-<writing-an-llvm-pass-doInitialization-mod>`
-
-The following can be used as a template:
-
-.. code-block:: c++
-
- #include "llvm/IR/Module.h"
- #include "llvm/IR/IntrinsicInst.h"
-
- bool MyGC::initializeCustomLowering(Module &M) {
- return false;
- }
-
- bool MyGC::performCustomLowering(Function &F) {
- bool MadeChange = false;
-
- for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB)
- for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; )
- if (IntrinsicInst *CI = dyn_cast<IntrinsicInst>(II++))
- if (Function *F = CI->getCalledFunction())
- switch (F->getIntrinsicID()) {
- case Intrinsic::gcwrite:
- // Handle llvm.gcwrite.
- CI->eraseFromParent();
- MadeChange = true;
- break;
- case Intrinsic::gcread:
- // Handle llvm.gcread.
- CI->eraseFromParent();
- MadeChange = true;
- break;
- case Intrinsic::gcroot:
- // Handle llvm.gcroot.
- CI->eraseFromParent();
- MadeChange = true;
- break;
- }
-
- return MadeChange;
- }
+There is currently no way to register such a custom lowering pass
+without building a custom copy of LLVM.
.. _safe-points:
diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst
index 316f1f7..fa55ece 100644
--- a/docs/GettingStarted.rst
+++ b/docs/GettingStarted.rst
@@ -230,7 +230,7 @@ our build systems:
* Clang 3.1
* GCC 4.7
-* Visual Studio 2012
+* Visual Studio 2013
Anything older than these toolchains *may* work, but will require forcing the
build system with a special option and is not really a supported host platform.
@@ -280,7 +280,7 @@ Getting a Modern Host C++ Toolchain
This section mostly applies to Linux and older BSDs. On Mac OS X, you should
have a sufficiently modern Xcode, or you will likely need to upgrade until you
-do. On Windows, just use Visual Studio 2012 as the host compiler, it is
+do. On Windows, just use Visual Studio 2013 as the host compiler, it is
explicitly supported and widely available. FreeBSD 10.0 and newer have a modern
Clang as the system compiler.
diff --git a/docs/GettingStartedVS.rst b/docs/GettingStartedVS.rst
index fa20912..63e81f5 100644
--- a/docs/GettingStartedVS.rst
+++ b/docs/GettingStartedVS.rst
@@ -45,13 +45,13 @@ and software you will need.
Hardware
--------
-Any system that can adequately run Visual Studio 2012 is fine. The LLVM
+Any system that can adequately run Visual Studio 2013 is fine. The LLVM
source tree and object files, libraries and executables will consume
approximately 3GB.
Software
--------
-You will need Visual Studio 2012 or higher.
+You will need Visual Studio 2013 or higher.
You will also need the `CMake <http://www.cmake.org/>`_ build system since it
generates the project files you will use to build with.
diff --git a/docs/HowToSetUpLLVMStyleRTTI.rst b/docs/HowToSetUpLLVMStyleRTTI.rst
index 96275e7..3892994 100644
--- a/docs/HowToSetUpLLVMStyleRTTI.rst
+++ b/docs/HowToSetUpLLVMStyleRTTI.rst
@@ -40,14 +40,14 @@ RTTI for this class hierarchy:
double SideLength;
public:
Square(double S) : SideLength(S) {}
- double computeArea() /* override */;
+ double computeArea() override;
};
class Circle : public Shape {
double Radius;
public:
Circle(double R) : Radius(R) {}
- double computeArea() /* override */;
+ double computeArea() override;
};
The most basic working setup for LLVM-style RTTI requires the following
@@ -135,7 +135,7 @@ steps:
public:
- Square(double S) : SideLength(S) {}
+ Square(double S) : Shape(SK_Square), SideLength(S) {}
- double computeArea() /* override */;
+ double computeArea() override;
};
class Circle : public Shape {
@@ -143,7 +143,7 @@ steps:
public:
- Circle(double R) : Radius(R) {}
+ Circle(double R) : Shape(SK_Circle), Radius(R) {}
- double computeArea() /* override */;
+ double computeArea() override;
};
#. Finally, you need to inform LLVM's RTTI templates how to dynamically
@@ -175,7 +175,7 @@ steps:
double SideLength;
public:
Square(double S) : Shape(SK_Square), SideLength(S) {}
- double computeArea() /* override */;
+ double computeArea() override;
+
+ static bool classof(const Shape *S) {
+ return S->getKind() == SK_Square;
@@ -186,7 +186,7 @@ steps:
double Radius;
public:
Circle(double R) : Shape(SK_Circle), Radius(R) {}
- double computeArea() /* override */;
+ double computeArea() override;
+
+ static bool classof(const Shape *S) {
+ return S->getKind() == SK_Circle;
@@ -377,6 +377,20 @@ contract for ``classof`` is "return ``true`` if the dynamic type of the
argument is-a ``C``". As long as your implementation fulfills this
contract, you can tweak and optimize it as much as you want.
+For example, LLVM-style RTTI can work fine in the presence of
+multiple-inheritance by defining an appropriate ``classof``.
+An example of this in practice is
+`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ vs.
+`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
+inside Clang.
+The ``Decl`` hierarchy is done very similarly to the example setup
+demonstrated in this tutorial.
+The key part is how to then incorporate ``DeclContext``: all that is needed
+is in ``bool DeclContext::classof(const Decl *)``, which asks the question
+"Given a ``Decl``, how can I determine if it is-a ``DeclContext``?".
+It answers this with a simple switch over the set of ``Decl`` "kinds", and
+returning true for ones that are known to be ``DeclContext``'s.
+
.. TODO::
Touch on some of the more advanced features, like ``isa_impl`` and
diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 3b7d80b..a0e9b18 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -75,7 +75,7 @@ identifiers, for different purposes:
#. Named values are represented as a string of characters with their
prefix. For example, ``%foo``, ``@DivisionByZero``,
``%a.really.long.identifier``. The actual regular expression used is
- '``[%@][a-zA-Z$._][a-zA-Z$._0-9]*``'. Identifiers that require other
+ '``[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*``'. Identifiers that require other
characters in their names can be surrounded with quotes. Special
characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII
code for the character in hexadecimal. In this way, any character can
@@ -170,7 +170,7 @@ symbol table entries. Here is an example of the "hello world" module:
}
; Named metadata
- !0 = metadata !{i32 42, null, metadata !"string"}
+ !0 = !{i32 42, null, !"string"}
!foo = !{!0}
This example is made up of a :ref:`global variable <globalvars>` named
@@ -368,7 +368,7 @@ added in the future:
The idea behind this convention is to support calls to runtime functions
that have a hot path and a cold path. The hot path is usually a small piece
- of code that doesn't many registers. The cold path might need to call out to
+ of code that doesn't use many registers. The cold path might need to call out to
another function and therefore only needs to preserve the caller-saved
registers, which haven't already been saved by the caller. The
`PreserveMost` calling convention is very similar to the `cold` calling
@@ -521,7 +521,7 @@ Global Variables
Global variables define regions of memory allocated at compilation time
instead of run-time.
-Global variables definitions must be initialized.
+Global variable definitions must be initialized.
Global variables in other translation units can also be declared, in which
case they don't have an initializer.
@@ -588,7 +588,7 @@ iteration. The maximum alignment is ``1 << 29``.
Globals can also have a :ref:`DLL storage class <dllstorageclass>`.
-Variables and aliasaes can have a
+Variables and aliases can have a
:ref:`Thread Local Storage Model <tls_model>`.
Syntax::
@@ -596,7 +596,8 @@ Syntax::
[@<GlobalVarName> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal]
[unnamed_addr] [AddrSpace] [ExternallyInitialized]
<global | constant> <Type> [<InitializerConstant>]
- [, section "name"] [, align <Alignment>]
+ [, section "name"] [, comdat [($name)]]
+ [, align <Alignment>]
For example, the following defines a global in a numbered address space
with an initializer, section, and alignment:
@@ -633,7 +634,8 @@ name, a (possibly empty) argument list (each with optional :ref:`parameter
attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
an optional section, an optional alignment,
an optional :ref:`comdat <langref_comdats>`,
-an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
+an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`,
+an optional :ref:`prologue <prologuedata>`, an opening
curly brace, a list of basic blocks, and a closing curly brace.
LLVM function declarations consist of the "``declare``" keyword, an
@@ -643,7 +645,8 @@ an optional :ref:`calling convention <callingconv>`,
an optional ``unnamed_addr`` attribute, a return type, an optional
:ref:`parameter attribute <paramattrs>` for the return type, a function
name, a possibly empty list of arguments, an optional alignment, an optional
-:ref:`garbage collector name <gc>` and an optional :ref:`prefix <prefixdata>`.
+:ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`,
+and an optional :ref:`prologue <prologuedata>`.
A function definition contains a list of basic blocks, forming the CFG (Control
Flow Graph) for the function. Each basic block may optionally start with a label
@@ -663,7 +666,7 @@ predecessors, it also cannot have any :ref:`PHI nodes <i_phi>`.
LLVM allows an explicit section to be specified for functions. If the
target supports it, it will emit functions to the section specified.
-Additionally, the function can placed in a COMDAT.
+Additionally, the function can be placed in a COMDAT.
An explicit alignment may be specified for a function. If not present,
or if the alignment is set to zero, the alignment of the function is set
@@ -671,7 +674,7 @@ by the target to whatever it feels convenient. If an explicit alignment
is specified, the function is forced to have at least that much
alignment. All alignments must be a power of 2.
-If the ``unnamed_addr`` attribute is given, the address is know to not
+If the ``unnamed_addr`` attribute is given, the address is known to not
be significant and two identical functions can be merged.
Syntax::
@@ -679,8 +682,8 @@ Syntax::
define [linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
- [unnamed_addr] [fn Attrs] [section "name"] [comdat $<ComdatName>]
- [align N] [gc] [prefix Constant] { ... }
+ [unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]]
+ [align N] [gc] [prefix Constant] [prologue Constant] { ... }
The argument list is a comma seperated sequence of arguments where each
argument is of the following form
@@ -713,7 +716,7 @@ The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``,
``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers
might not correctly handle dropping a weak symbol that is aliased.
-Alias that are not ``unnamed_addr`` are guaranteed to have the same address as
+Aliases that are not ``unnamed_addr`` are guaranteed to have the same address as
the aliasee expression. ``unnamed_addr`` ones are only guaranteed to point
to the same content.
@@ -773,12 +776,21 @@ the COMDAT key's section is the largest:
.. code-block:: llvm
$foo = comdat largest
- @foo = global i32 2, comdat $foo
+ @foo = global i32 2, comdat($foo)
- define void @bar() comdat $foo {
+ define void @bar() comdat($foo) {
ret void
}
+As a syntactic sugar the ``$name`` can be omitted if the name is the same as
+the global name:
+
+.. code-block:: llvm
+
+ $foo = comdat any
+ @foo = global i32 2, comdat
+
+
In a COFF object file, this will create a COMDAT section with selection kind
``IMAGE_COMDAT_SELECT_LARGEST`` containing the contents of the ``@foo`` symbol
and another COMDAT section with selection kind
@@ -801,8 +813,8 @@ For example:
$foo = comdat any
$bar = comdat any
- @g1 = global i32 42, section "sec", comdat $foo
- @g2 = global i32 42, section "sec", comdat $bar
+ @g1 = global i32 42, section "sec", comdat($foo)
+ @g2 = global i32 42, section "sec", comdat($bar)
From the object file perspective, this requires the creation of two sections
with the same name. This is necessary because both globals belong to different
@@ -825,9 +837,9 @@ operands for a named metadata.
Syntax::
; Some unnamed metadata nodes, which are referenced by the named metadata.
- !0 = metadata !{metadata !"zero"}
- !1 = metadata !{metadata !"one"}
- !2 = metadata !{metadata !"two"}
+ !0 = !{!"zero"}
+ !1 = !{!"one"}
+ !2 = !{!"two"}
; A named metadata.
!name = !{!0, !1, !2}
@@ -941,23 +953,26 @@ Currently, only the following parameter attributes are defined:
.. _noalias:
``noalias``
- This indicates that pointer values :ref:`based <pointeraliasing>` on
- the argument or return value do not alias pointer values that are
- not *based* on it, ignoring certain "irrelevant" dependencies. For a
- call to the parent function, dependencies between memory references
- from before or after the call and from those during the call are
- "irrelevant" to the ``noalias`` keyword for the arguments and return
- value used in that call. The caller shares the responsibility with
- the callee for ensuring that these requirements are met. For further
- details, please see the discussion of the NoAlias response in :ref:`alias
- analysis <Must, May, or No>`.
+ This indicates that objects accessed via pointer values
+ :ref:`based <pointeraliasing>` on the argument or return value are not also
+ accessed, during the execution of the function, via pointer values not
+ *based* on the argument or return value. The attribute on a return value
+ also has additional semantics described below. The caller shares the
+ responsibility with the callee for ensuring that these requirements are met.
+ For further details, please see the discussion of the NoAlias response in
+ :ref:`alias analysis <Must, May, or No>`.
Note that this definition of ``noalias`` is intentionally similar
- to the definition of ``restrict`` in C99 for function arguments,
- though it is slightly weaker.
+ to the definition of ``restrict`` in C99 for function arguments.
For function return values, C99's ``restrict`` is not meaningful,
- while LLVM's ``noalias`` is.
+ while LLVM's ``noalias`` is. Furthermore, the semantics of the ``noalias``
+ attribute on return values are stronger than the semantics of the attribute
+ when used on function arguments. On function return values, the ``noalias``
+ attribute indicates that the function acts like a system memory allocation
+ function, returning a pointer to allocated storage disjoint from the
+ storage for any other object accessible to the caller.
+
``nocapture``
This indicates that the callee does not make any copies of the
pointer that outlive the callee itself. This is not a valid
@@ -999,66 +1014,101 @@ Currently, only the following parameter attributes are defined:
.. _gc:
-Garbage Collector Names
------------------------
+Garbage Collector Strategy Names
+--------------------------------
-Each function may specify a garbage collector name, which is simply a
+Each function may specify a garbage collector strategy name, which is simply a
string:
.. code-block:: llvm
define void @f() gc "name" { ... }
-The compiler declares the supported values of *name*. Specifying a
-collector will cause the compiler to alter its output in order to
-support the named garbage collection algorithm.
+The supported values of *name* includes those :ref:`built in to LLVM
+<builtin-gc-strategies>` and any provided by loaded plugins. Specifying a GC
+strategy will cause the compiler to alter its output in order to support the
+named garbage collection algorithm. Note that LLVM itself does not contain a
+garbage collector, this functionality is restricted to generating machine code
+which can interoperate with a collector provided externally.
.. _prefixdata:
Prefix Data
-----------
-Prefix data is data associated with a function which the code generator
-will emit immediately before the function body. The purpose of this feature
-is to allow frontends to associate language-specific runtime metadata with
-specific functions and make it available through the function pointer while
-still allowing the function pointer to be called. To access the data for a
-given function, a program may bitcast the function pointer to a pointer to
-the constant's type. This implies that the IR symbol points to the start
-of the prefix data.
+Prefix data is data associated with a function which the code
+generator will emit immediately before the function's entrypoint.
+The purpose of this feature is to allow frontends to associate
+language-specific runtime metadata with specific functions and make it
+available through the function pointer while still allowing the
+function pointer to be called.
+
+To access the data for a given function, a program may bitcast the
+function pointer to a pointer to the constant's type and dereference
+index -1. This implies that the IR symbol points just past the end of
+the prefix data. For instance, take the example of a function annotated
+with a single ``i32``,
+
+.. code-block:: llvm
+
+ define void @f() prefix i32 123 { ... }
+
+The prefix data can be referenced as,
+
+.. code-block:: llvm
+
+ %0 = bitcast *void () @f to *i32
+ %a = getelementptr inbounds *i32 %0, i32 -1
+ %b = load i32* %a
+
+Prefix data is laid out as if it were an initializer for a global variable
+of the prefix data's type. The function will be placed such that the
+beginning of the prefix data is aligned. This means that if the size
+of the prefix data is not a multiple of the alignment size, the
+function's entrypoint will not be aligned. If alignment of the
+function's entrypoint is desired, padding must be added to the prefix
+data.
+
+A function may have prefix data but no body. This has similar semantics
+to the ``available_externally`` linkage in that the data may be used by the
+optimizers but will not be emitted in the object file.
+
+.. _prologuedata:
-To maintain the semantics of ordinary function calls, the prefix data must
+Prologue Data
+-------------
+
+The ``prologue`` attribute allows arbitrary code (encoded as bytes) to
+be inserted prior to the function body. This can be used for enabling
+function hot-patching and instrumentation.
+
+To maintain the semantics of ordinary function calls, the prologue data must
have a particular format. Specifically, it must begin with a sequence of
bytes which decode to a sequence of machine instructions, valid for the
module's target, which transfer control to the point immediately succeeding
-the prefix data, without performing any other visible action. This allows
+the prologue data, without performing any other visible action. This allows
the inliner and other passes to reason about the semantics of the function
-definition without needing to reason about the prefix data. Obviously this
-makes the format of the prefix data highly target dependent.
+definition without needing to reason about the prologue data. Obviously this
+makes the format of the prologue data highly target dependent.
-Prefix data is laid out as if it were an initializer for a global variable
-of the prefix data's type. No padding is automatically placed between the
-prefix data and the function body. If padding is required, it must be part
-of the prefix data.
-
-A trivial example of valid prefix data for the x86 architecture is ``i8 144``,
+A trivial example of valid prologue data for the x86 architecture is ``i8 144``,
which encodes the ``nop`` instruction:
.. code-block:: llvm
- define void @f() prefix i8 144 { ... }
+ define void @f() prologue i8 144 { ... }
-Generally prefix data can be formed by encoding a relative branch instruction
-which skips the metadata, as in this example of valid prefix data for the
+Generally prologue data can be formed by encoding a relative branch instruction
+which skips the metadata, as in this example of valid prologue data for the
x86_64 architecture, where the first two bytes encode ``jmp .+10``:
.. code-block:: llvm
%0 = type <{ i8, i8, i8* }>
- define void @f() prefix %0 <{ i8 235, i8 8, i8* @md}> { ... }
+ define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... }
-A function may have prefix data but no body. This has similar semantics
+A function may have prologue data but no body. This has similar semantics
to the ``available_externally`` linkage in that the data may be used by the
optimizers but will not be emitted in the object file.
@@ -1189,9 +1239,12 @@ example:
normally. This produces undefined behavior at runtime if the
function ever does dynamically return.
``nounwind``
- This function attribute indicates that the function never returns
- with an unwind or exceptional control flow. If the function does
- unwind, its runtime behavior is undefined.
+ This function attribute indicates that the function never raises an
+ exception. If the function does raise an exception, its runtime
+ behavior is undefined. However, functions marked nounwind may still
+ trap or generate asynchronous exceptions. Exception handling schemes
+ that are recognized by LLVM to handle asynchronous exceptions, such
+ as SEH, will still provide their implementation defined semantics.
``optnone``
This function attribute indicates that the function is not optimized
by any optimization or code generator passes with the
@@ -1732,7 +1785,7 @@ Fast-Math Flags
LLVM IR floating-point binary ops (:ref:`fadd <i_fadd>`,
:ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`,
-:ref:`frem <i_frem>`) have the following flags that can set to enable
+:ref:`frem <i_frem>`) have the following flags that can be set to enable
otherwise unsafe floating point operations
``nnan``
@@ -2293,11 +2346,11 @@ constants and smaller complex constants.
having to print large zero initializers (e.g. for large arrays) and
is always exactly equivalent to using explicit zero initializers.
**Metadata node**
- A metadata node is a structure-like constant with :ref:`metadata
- type <t_metadata>`. For example:
- "``metadata !{ i32 0, metadata !"test" }``". Unlike other
- constants that are meant to be interpreted as part of the
- instruction stream, metadata is a place to attach additional
+ A metadata node is a constant tuple without types. For example:
+ "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values,
+ for example: "``!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}``".
+ Unlike other typed constants that are meant to be interpreted as part of
+ the instruction stream, metadata is a place to attach additional
information such as debug info.
Global Variable and Function Addresses
@@ -2771,15 +2824,21 @@ occurs on.
.. _metadata:
-Metadata Nodes and Metadata Strings
------------------------------------
+Metadata
+========
LLVM IR allows metadata to be attached to instructions in the program
that can convey extra information about the code to the optimizers and
code generator. One example application of metadata is source-level
debug information. There are two metadata primitives: strings and nodes.
-All metadata has the ``metadata`` type and is identified in syntax by a
-preceding exclamation point ('``!``').
+
+Metadata does not have a type, and is not a value. If referenced from a
+``call`` instruction, it uses the ``metadata`` type.
+
+All metadata are identified in syntax by a exclamation point ('``!``').
+
+Metadata Nodes and Metadata Strings
+-----------------------------------
A metadata string is a string surrounded by double quotes. It can
contain any character by escaping non-printable characters with
@@ -2793,7 +2852,17 @@ their operand. For example:
.. code-block:: llvm
- !{ metadata !"test\00", i32 10}
+ !{ !"test\00", i32 10}
+
+Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example:
+
+.. code-block:: llvm
+
+ !0 = distinct !{!"test\00", i32 10}
+
+``distinct`` nodes are useful when nodes shouldn't be merged based on their
+content. They can also occur when transformations cause uniquing collisions
+when metadata operands change.
A :ref:`named metadata <namedmetadatastructure>` is a collection of
metadata nodes, which can be looked up in the module symbol table. For
@@ -2801,7 +2870,7 @@ example:
.. code-block:: llvm
- !foo = metadata !{!4, !3}
+ !foo = !{!4, !3}
Metadata can be used as function arguments. Here ``llvm.dbg.value``
function is using two metadata arguments:
@@ -2820,6 +2889,23 @@ attached to the ``add`` instruction using the ``!dbg`` identifier:
More information about specific metadata nodes recognized by the
optimizers and code generator is found below.
+Specialized Metadata Nodes
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Specialized metadata nodes are custom data structures in metadata (as opposed
+to generic tuples). Their fields are labelled, and can be specified in any
+order.
+
+MDLocation
+""""""""""
+
+``MDLocation`` nodes represent source debug locations. The ``scope:`` field is
+mandatory.
+
+.. code-block:: llvm
+
+ !0 = !MDLocation(line: 2900, column: 42, scope: !1, inlinedAt: !2)
+
'``tbaa``' Metadata
^^^^^^^^^^^^^^^^^^^
@@ -2834,10 +2920,10 @@ to three fields, e.g.:
.. code-block:: llvm
- !0 = metadata !{ metadata !"an example type tree" }
- !1 = metadata !{ metadata !"int", metadata !0 }
- !2 = metadata !{ metadata !"float", metadata !0 }
- !3 = metadata !{ metadata !"const float", metadata !2, i64 1 }
+ !0 = !{ !"an example type tree" }
+ !1 = !{ !"int", !0 }
+ !2 = !{ !"float", !0 }
+ !3 = !{ !"const float", !2, i64 1 }
The first field is an identity field. It can be any value, usually a
metadata string, which uniquely identifies the type. The most important
@@ -2877,7 +2963,7 @@ its tbaa tag. e.g.:
.. code-block:: llvm
- !4 = metadata !{ i64 0, i64 4, metadata !1, i64 8, i64 4, metadata !2 }
+ !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }
This describes a struct with two fields. The first is at offset 0 bytes
with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes
@@ -2898,7 +2984,7 @@ collection of memory access instructions that carry ``alias.scope`` metadata.
Each type of metadata specifies a list of scopes where each scope has an id and
a domain. When evaluating an aliasing query, if for some some domain, the set
of scopes with that domain in one instruction's ``alias.scope`` list is a
-subset of (or qual to) the set of scopes for that domain in another
+subset of (or equal to) the set of scopes for that domain in another
instruction's ``noalias`` list, then the two memory accesses are assumed not to
alias.
@@ -2920,18 +3006,18 @@ For example,
.. code-block:: llvm
; Two scope domains:
- !0 = metadata !{metadata !0}
- !1 = metadata !{metadata !1}
+ !0 = !{!0}
+ !1 = !{!1}
; Some scopes in these domains:
- !2 = metadata !{metadata !2, metadata !0}
- !3 = metadata !{metadata !3, metadata !0}
- !4 = metadata !{metadata !4, metadata !1}
+ !2 = !{!2, !0}
+ !3 = !{!3, !0}
+ !4 = !{!4, !1}
; Some scope lists:
- !5 = metadata !{metadata !4} ; A list containing only scope !4
- !6 = metadata !{metadata !4, metadata !3, metadata !2}
- !7 = metadata !{metadata !3}
+ !5 = !{!4} ; A list containing only scope !4
+ !6 = !{!4, !3, !2}
+ !7 = !{!3}
; These two instructions don't alias:
%0 = load float* %c, align 4, !alias.scope !5
@@ -2968,7 +3054,7 @@ number representing the maximum relative error, for example:
.. code-block:: llvm
- !0 = metadata !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
+ !0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
'``range``' Metadata
^^^^^^^^^^^^^^^^^^^^
@@ -3000,10 +3086,10 @@ Examples:
%d = invoke i8 @bar() to label %cont
unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
...
- !0 = metadata !{ i8 0, i8 2 }
- !1 = metadata !{ i8 255, i8 2 }
- !2 = metadata !{ i8 0, i8 2, i8 3, i8 6 }
- !3 = metadata !{ i8 -2, i8 0, i8 3, i8 6 }
+ !0 = !{ i8 0, i8 2 }
+ !1 = !{ i8 255, i8 2 }
+ !2 = !{ i8 0, i8 2, i8 3, i8 6 }
+ !3 = !{ i8 -2, i8 0, i8 3, i8 6 }
'``llvm.loop``'
^^^^^^^^^^^^^^^
@@ -3023,8 +3109,8 @@ constructs:
.. code-block:: llvm
- !0 = metadata !{ metadata !0 }
- !1 = metadata !{ metadata !1 }
+ !0 = !{!0}
+ !1 = !{!1}
The loop identifier metadata can be used to specify additional
per-loop metadata. Any operands after the first operand can be treated
@@ -3035,8 +3121,8 @@ suggests an unroll factor to the loop unroller:
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...
- !0 = metadata !{ metadata !0, metadata !1 }
- !1 = metadata !{ metadata !"llvm.loop.unroll.count", i32 4 }
+ !0 = !{!0, !1}
+ !1 = !{!"llvm.loop.unroll.count", i32 4}
'``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -3061,7 +3147,7 @@ example:
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.interleave.count", i32 4 }
+ !0 = !{!"llvm.loop.interleave.count", i32 4}
Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
@@ -3077,8 +3163,8 @@ is a bit. If the bit operand value is 1 vectorization is enabled. A value of
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 0 }
- !1 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 1 }
+ !0 = !{!"llvm.loop.vectorize.enable", i1 0}
+ !1 = !{!"llvm.loop.vectorize.enable", i1 1}
'``llvm.loop.vectorize.width``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -3089,7 +3175,7 @@ operand is an integer specifying the width. For example:
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.vectorize.width", i32 4 }
+ !0 = !{!"llvm.loop.vectorize.width", i32 4}
Note that setting ``llvm.loop.vectorize.width`` to 1 disables
vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
@@ -3116,7 +3202,7 @@ example:
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.unroll.count", i32 4 }
+ !0 = !{!"llvm.loop.unroll.count", i32 4}
If the trip count of the loop is less than the unroll count the loop
will be partially unrolled.
@@ -3129,7 +3215,7 @@ which is the string ``llvm.loop.unroll.disable``. For example:
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.unroll.disable" }
+ !0 = !{!"llvm.loop.unroll.disable"}
'``llvm.loop.unroll.full``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -3140,7 +3226,7 @@ For example:
.. code-block:: llvm
- !0 = metadata !{ metadata !"llvm.loop.unroll.full" }
+ !0 = !{!"llvm.loop.unroll.full"}
'``llvm.mem``'
^^^^^^^^^^^^^^^
@@ -3191,7 +3277,7 @@ metadata types that refer to the same loop identifier metadata.
for.end:
...
- !0 = metadata !{ metadata !0 }
+ !0 = !{!0}
It is also possible to have nested parallel loops. In that case the
memory accesses refer to a list of loop identifier metadata nodes instead of
@@ -3221,9 +3307,15 @@ the loop identifier metadata node directly:
outer.for.end: ; preds = %for.body
...
- !0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers
- !1 = metadata !{ metadata !1 } ; an identifier for the inner loop
- !2 = metadata !{ metadata !2 } ; an identifier for the outer loop
+ !0 = !{!1, !2} ; a list of loop identifiers
+ !1 = !{!1} ; an identifier for the inner loop
+ !2 = !{!2} ; an identifier for the outer loop
+
+'``llvm.bitsets``'
+^^^^^^^^^^^^^^^^^^
+
+The ``llvm.bitsets`` global metadata is used to implement
+:doc:`bitsets <BitSets>`.
Module Flags Metadata
=====================
@@ -3307,12 +3399,12 @@ An example of module flags:
.. code-block:: llvm
- !0 = metadata !{ i32 1, metadata !"foo", i32 1 }
- !1 = metadata !{ i32 4, metadata !"bar", i32 37 }
- !2 = metadata !{ i32 2, metadata !"qux", i32 42 }
- !3 = metadata !{ i32 3, metadata !"qux",
- metadata !{
- metadata !"foo", i32 1
+ !0 = !{ i32 1, !"foo", i32 1 }
+ !1 = !{ i32 4, !"bar", i32 37 }
+ !2 = !{ i32 2, !"qux", i32 42 }
+ !3 = !{ i32 3, !"qux",
+ !{
+ !"foo", i32 1
}
}
!llvm.module.flags = !{ !0, !1, !2, !3 }
@@ -3333,7 +3425,7 @@ An example of module flags:
::
- metadata !{ metadata !"foo", i32 1 }
+ !{ !"foo", i32 1 }
The behavior is to emit an error if the ``llvm.module.flags`` does not
contain a flag with the ID ``!"foo"`` that has the value '1' after linking is
@@ -3409,10 +3501,10 @@ For example, the following metadata section specifies two separate sets of
linker options, presumably to link against ``libz`` and the ``Cocoa``
framework::
- !0 = metadata !{ i32 6, metadata !"Linker Options",
- metadata !{
- metadata !{ metadata !"-lz" },
- metadata !{ metadata !"-framework", metadata !"Cocoa" } } }
+ !0 = !{ i32 6, !"Linker Options",
+ !{
+ !{ !"-lz" },
+ !{ !"-framework", !"Cocoa" } } }
!llvm.module.flags = !{ !0 }
The metadata encoding as lists of lists of options, as opposed to a collapsed
@@ -3458,8 +3550,8 @@ compiled with a ``wchar_t`` width of 4 bytes, and the underlying type of an
enum is the smallest type which can represent all of its values::
!llvm.module.flags = !{!0, !1}
- !0 = metadata !{i32 1, metadata !"short_wchar", i32 1}
- !1 = metadata !{i32 1, metadata !"short_enum", i32 0}
+ !0 = !{i32 1, !"short_wchar", i32 1}
+ !1 = !{i32 1, !"short_enum", i32 0}
.. _intrinsicglobalvariables:
@@ -5208,10 +5300,11 @@ as the ``MOVNT`` instruction on x86.
The optional ``!invariant.load`` metadata must reference a single
metadata name ``<index>`` corresponding to a metadata node with no
entries. The existence of the ``!invariant.load`` metadata on the
-instruction tells the optimizer and code generator that this load
-address points to memory which does not change value during program
-execution. The optimizer may then move this load around, for example, by
-hoisting it out of loops using loop invariant code motion.
+instruction tells the optimizer and code generator that the address
+operand to this load points to memory which can be assumed unchanged.
+Being invariant does not imply that a location is dereferenceable,
+but it does imply that once the location is known dereferenceable
+its value is henceforth unchanging.
The optional ``!nonnull`` metadata must reference a single
metadata name ``<index>`` corresponding to a metadata node with no
@@ -7016,18 +7109,28 @@ arbitrarily complex and require, for example, memory allocation.
Accurate Garbage Collection Intrinsics
--------------------------------------
-LLVM support for `Accurate Garbage Collection <GarbageCollection.html>`_
-(GC) requires the implementation and generation of these intrinsics.
+LLVM's support for `Accurate Garbage Collection <GarbageCollection.html>`_
+(GC) requires the frontend to generate code containing appropriate intrinsic
+calls and select an appropriate GC strategy which knows how to lower these
+intrinsics in a manner which is appropriate for the target collector.
+
These intrinsics allow identification of :ref:`GC roots on the
stack <int_gcroot>`, as well as garbage collector implementations that
require :ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers.
-Front-ends for type-safe garbage collected languages should generate
+Frontends for type-safe garbage collected languages should generate
these intrinsics to make use of the LLVM garbage collectors. For more
-details, see `Accurate Garbage Collection with
-LLVM <GarbageCollection.html>`_.
+details, see `Garbage Collection with LLVM <GarbageCollection.html>`_.
-The garbage collection intrinsics only operate on objects in the generic
-address space (address space zero).
+Experimental Statepoint Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+LLVM provides an second experimental set of intrinsics for describing garbage
+collection safepoints in compiled code. These intrinsics are an alternative
+to the ``llvm.gcroot`` intrinsics, but are compatible with the ones for
+:ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers. The
+differences in approach are covered in the `Garbage Collection with LLVM
+<GarbageCollection.html>`_ documentation. The intrinsics themselves are
+described in :doc:`Statepoints`.
.. _int_gcroot:
@@ -7217,6 +7320,56 @@ Note that calling this intrinsic does not prevent function inlining or
other aggressive transformations, so the value returned may not be that
of the obvious source-language caller.
+'``llvm.frameallocate``' and '``llvm.framerecover``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare i8* @llvm.frameallocate(i32 %size)
+ declare i8* @llvm.framerecover(i8* %func, i8* %fp)
+
+Overview:
+"""""""""
+
+The '``llvm.frameallocate``' intrinsic allocates stack memory at some fixed
+offset from the frame pointer, and the '``llvm.framerecover``'
+intrinsic applies that offset to a live frame pointer to recover the address of
+the allocation. The offset is computed during frame layout of the caller of
+``llvm.frameallocate``.
+
+Arguments:
+""""""""""
+
+The ``size`` argument to '``llvm.frameallocate``' must be a constant integer
+indicating the amount of stack memory to allocate. As with allocas, allocating
+zero bytes is legal, but the result is undefined.
+
+The ``func`` argument to '``llvm.framerecover``' must be a constant
+bitcasted pointer to a function defined in the current module. The code
+generator cannot determine the frame allocation offset of functions defined in
+other modules.
+
+The ``fp`` argument to '``llvm.framerecover``' must be a frame
+pointer of a call frame that is currently live. The return value of
+'``llvm.frameaddress``' is one way to produce such a value, but most platforms
+also expose the frame pointer through stack unwinding mechanisms.
+
+Semantics:
+""""""""""
+
+These intrinsics allow a group of functions to access one stack memory
+allocation in an ancestor stack frame. The memory returned from
+'``llvm.frameallocate``' may be allocated prior to stack realignment, so the
+memory is only aligned to the ABI-required stack alignment. Each function may
+only call '``llvm.frameallocate``' one or zero times from the function entry
+block. The frame allocation intrinsic inhibits inlining, as any frame
+allocations in the inlined function frame are likely to be at a different
+offset from the one used by '``llvm.framerecover``' called with the
+uninlined function.
+
.. _int_read_register:
.. _int_write_register:
@@ -7232,7 +7385,7 @@ Syntax:
declare i64 @llvm.read_register.i64(metadata)
declare void @llvm.write_register.i32(metadata, i32 @value)
declare void @llvm.write_register.i64(metadata, i64 @value)
- !0 = metadata !{metadata !"sp\00"}
+ !0 = !{!"sp\00"}
Overview:
"""""""""
@@ -7454,6 +7607,50 @@ time library.
This instrinsic does *not* empty the instruction pipeline. Modifications
of the current function are outside the scope of the intrinsic.
+'``llvm.instrprof_increment``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare void @llvm.instrprof_increment(i8* <name>, i64 <hash>,
+ i32 <num-counters>, i32 <index>)
+
+Overview:
+"""""""""
+
+The '``llvm.instrprof_increment``' intrinsic can be emitted by a
+frontend for use with instrumentation based profiling. These will be
+lowered by the ``-instrprof`` pass to generate execution counts of a
+program at runtime.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to a global variable containing the
+name of the entity being instrumented. This should generally be the
+(mangled) function name for a set of counters.
+
+The second argument is a hash value that can be used by the consumer
+of the profile data to detect changes to the instrumented source, and
+the third is the number of counters associated with ``name``. It is an
+error if ``hash`` or ``num-counters`` differ between two instances of
+``instrprof_increment`` that refer to the same name.
+
+The last argument refers to which of the counters for ``name`` should
+be incremented. It should be a value between 0 and ``num-counters``.
+
+Semantics:
+""""""""""
+
+This intrinsic represents an increment of a profiling counter. It will
+cause the ``-instrprof`` pass to generate the appropriate data
+structures and the code to increment the appropriate value, in a
+format that can be written out by a compiler runtime and consumed via
+the ``llvm-profdata`` tool.
+
Standard C Library Intrinsics
-----------------------------
@@ -8499,7 +8696,7 @@ Arguments:
""""""""""
The first argument is the value to be counted. This argument may be of
-any integer type, or a vectory with integer element type. The return
+any integer type, or a vector with integer element type. The return
type must match the first argument type.
The second argument must be a constant and is a flag to indicate whether
@@ -8546,7 +8743,7 @@ Arguments:
""""""""""
The first argument is the value to be counted. This argument may be of
-any integer type, or a vectory with integer element type. The return
+any integer type, or a vector with integer element type. The return
type must match the first argument type.
The second argument must be a constant and is a flag to indicate whether
@@ -9142,6 +9339,93 @@ intrinsic returns the executable address corresponding to ``tramp``
after performing the required machine specific adjustments. The pointer
returned can then be :ref:`bitcast and executed <int_trampoline>`.
+Masked Vector Load and Store Intrinsics
+---------------------------------------
+
+LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.
+
+.. _int_mload:
+
+'``llvm.masked.load.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. The loaded data is a vector of any integer or floating point data type.
+
+::
+
+ declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
+ declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
+
+Overview:
+"""""""""
+
+Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes in the passthru operand.
+
+
+Arguments:
+""""""""""
+
+The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a constant integer value. The third operand, mask, is a vector of boolean 'i1' values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of passthru operand are the same vector types.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.load``' intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.
+The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes.
+
+
+::
+
+ %res = call <16 x float> @llvm.masked.load.v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru)
+
+ ;; The result of the two following instructions is identical aside from potential memory access exception
+ %loadlal = load <16 x float>* %ptr, align 4
+ %res = select <16 x i1> %mask, <16 x float> %loadlal, <16 x float> %passthru
+
+.. _int_mstore:
+
+'``llvm.masked.store.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type.
+
+::
+
+ declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32> * <ptr>, i32 <alignment>, <8 x i1> <mask>)
+ declare void @llvm.masked.store.v16f32(<16 x i32> <value>, <16 x i32>* <ptr>, i32 <alignment>, <16 x i1> <mask>)
+
+Overview:
+"""""""""
+
+Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
+
+Arguments:
+""""""""""
+
+The first operand is the vector value to be written to memory. The second operand is the base pointer for the store, it has the same underlying type as the value operand. The third operand is the alignment of the destination location. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.store``' intrinsics is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked store and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
+The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes.
+
+::
+
+ call void @llvm.masked.store.v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask)
+
+ ;; The result of the following instructions is identical aside from potential data races and memory access exceptions
+ %oldval = load <16 x float>* %ptr, align 4
+ %res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval
+ store <16 x float> %res, <16 x float>* %ptr, align 4
+
+
Memory Use Markers
------------------
@@ -9617,15 +9901,40 @@ generated for this intrinsic, and instructions that contribute only to the
provided condition are not used for code generation. If the condition is
violated during execution, the behavior is undefined.
-Please note that optimizer might limit the transformations performed on values
+Note that the optimizer might limit the transformations performed on values
used by the ``llvm.assume`` intrinsic in order to preserve the instructions
only used to form the intrinsic's input argument. This might prove undesirable
-if the extra information provided by the ``llvm.assume`` intrinsic does cause
+if the extra information provided by the ``llvm.assume`` intrinsic does not cause
sufficient overall improvement in code quality. For this reason,
``llvm.assume`` should not be used to document basic mathematical invariants
that the optimizer can otherwise deduce or facts that are of little use to the
optimizer.
+.. _bitset.test:
+
+'``llvm.bitset.test``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone
+
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to be tested. The second argument is a
+metadata string containing the name of a :doc:`bitset <BitSets>`.
+
+Overview:
+"""""""""
+
+The ``llvm.bitset.test`` intrinsic tests whether the given pointer is a
+member of the given bitset.
+
'``llvm.donothing``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/LinkTimeOptimization.rst b/docs/LinkTimeOptimization.rst
index c15abd3..55a7486 100644
--- a/docs/LinkTimeOptimization.rst
+++ b/docs/LinkTimeOptimization.rst
@@ -134,7 +134,7 @@ Alternative Approaches
Multi-phase communication between ``libLTO`` and linker
=======================================================
-The linker collects information about symbol defininitions and uses in various
+The linker collects information about symbol definitions and uses in various
link objects which is more accurate than any information collected by other
tools during typical build cycles. The linker collects this information by
looking at the definitions and uses of symbols in native .o files and using
diff --git a/docs/MergeFunctions.rst b/docs/MergeFunctions.rst
new file mode 100644
index 0000000..6b8012e
--- /dev/null
+++ b/docs/MergeFunctions.rst
@@ -0,0 +1,802 @@
+=================================
+MergeFunctions pass, how it works
+=================================
+
+.. contents::
+ :local:
+
+Introduction
+============
+Sometimes code contains equal functions, or functions that does exactly the same
+thing even though they are non-equal on the IR level (e.g.: multiplication on 2
+and 'shl 1'). It could happen due to several reasons: mainly, the usage of
+templates and automatic code generators. Though, sometimes user itself could
+write the same thing twice :-)
+
+The main purpose of this pass is to recognize such functions and merge them.
+
+Why would I want to read this document?
+---------------------------------------
+Document is the extension to pass comments and describes the pass logic. It
+describes algorithm that is used in order to compare functions, it also
+explains how we could combine equal functions correctly, keeping module valid.
+
+Material is brought in top-down form, so reader could start learn pass from
+ideas and end up with low-level algorithm details, thus preparing him for
+reading the sources.
+
+So main goal is do describe algorithm and logic here; the concept. This document
+is good for you, if you *don't want* to read the source code, but want to
+understand pass algorithms. Author tried not to repeat the source-code and
+cover only common cases, and thus avoid cases when after minor code changes we
+need to update this document.
+
+
+What should I know to be able to follow along with this document?
+-----------------------------------------------------------------
+
+Reader should be familiar with common compile-engineering principles and LLVM
+code fundamentals. In this article we suppose reader is familiar with
+`Single Static Assingment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+concepts. Understanding of
+`IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_ is
+also important.
+
+We will use such terms as
+"`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_",
+"`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
+"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
+"`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
+"`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
+"`instruction <http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
+
+As a good start point, Kaleidoscope tutorial could be used:
+
+:doc:`tutorial/index`
+
+Especially it's important to understand chapter 3 of tutorial:
+
+:doc:`tutorial/LangImpl3`
+
+Reader also should know how passes work in LLVM, he could use next article as a
+reference and start point here:
+
+:doc:`WritingAnLLVMPass`
+
+What else? Well perhaps reader also should have some experience in LLVM pass
+debugging and bug-fixing.
+
+What I gain by reading this document?
+-------------------------------------
+Main purpose is to provide reader with comfortable form of algorithms
+description, namely the human reading text. Since it could be hard to
+understand algorithm straight from the source code: pass uses some principles
+that have to be explained first.
+
+Author wishes to everybody to avoid case, when you read code from top to bottom
+again and again, and yet you don't understand why we implemented it that way.
+
+We hope that after this article reader could easily debug and improve
+MergeFunctions pass and thus help LLVM project.
+
+Narrative structure
+-------------------
+Article consists of three parts. First part explains pass functionality on the
+top-level. Second part describes the comparison procedure itself. The third
+part describes the merging process.
+
+In every part author also tried to put the contents into the top-down form.
+First, the top-level methods will be described, while the terminal ones will be
+at the end, in the tail of each part. If reader will see the reference to the
+method that wasn't described yet, he will find its description a bit below.
+
+Basics
+======
+
+How to do it?
+-------------
+Do we need to merge functions? Obvious thing is: yes that's a quite possible
+case, since usually we *do* have duplicates. And it would be good to get rid of
+them. But how to detect such a duplicates? The idea is next: we split functions
+onto small bricks (parts), then we compare "bricks" amount, and if it equal,
+compare "bricks" themselves, and then do our conclusions about functions
+themselves.
+
+What the difference it could be? For example, on machine with 64-bit pointers
+(let's assume we have only one address space), one function stores 64-bit
+integer, while another one stores a pointer. So if the target is a machine
+mentioned above, and if functions are identical, except the parameter type (we
+could consider it as a part of function type), then we can treat ``uint64_t``
+and``void*`` as equal.
+
+It was just an example; possible details are described a bit below.
+
+As another example reader may imagine two more functions. First function
+performs multiplication on 2, while the second one performs arithmetic right
+shift on 1.
+
+Possible solutions
+^^^^^^^^^^^^^^^^^^
+Let's briefly consider possible options about how and what we have to implement
+in order to create full-featured functions merging, and also what it would
+meant for us.
+
+Equal functions detection, obviously supposes "detector" method to be
+implemented, latter should answer the question "whether functions are equal".
+This "detector" method consists of tiny "sub-detectors", each of them answers
+exactly the same question, but for function parts.
+
+As the second step, we should merge equal functions. So it should be a "merger"
+method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2*
+function, the result of merging.
+
+Having such a routines in our hands, we can process whole module, and merge all
+equal functions.
+
+In this case, we have to compare every function with every another function. As
+reader could notice, this way seems to be quite expensive. Of course we could
+introduce hashing and other helpers, but it is still just an optimization, and
+thus the level of O(N*N) complexity.
+
+Can we reach another level? Could we introduce logarithmical search, or random
+access lookup? The answer is: "yes".
+
+Random-access
+"""""""""""""
+How it could be done? Just convert each function to number, and gather all of
+them in special hash-table. Functions with equal hash are equal. Good hashing
+means, that every function part must be taken into account. That means we have
+to convert every function part into some number, and then add it into hash.
+Lookup-up time would be small, but such approach adds some delay due to hashing
+routine.
+
+Logarithmical search
+""""""""""""""""""""
+We could introduce total ordering among the functions set, once we had it we
+could then implement a logarithmical search. Lookup time still depends on N,
+but adds a little of delay (*log(N)*).
+
+Present state
+"""""""""""""
+Both of approaches (random-access and logarithmical) has been implemented and
+tested. And both of them gave a very good improvement. And what was most
+surprising, logarithmical search was faster; sometimes up to 15%. Hashing needs
+some extra CPU time, and it is the main reason why it works slower; in most of
+cases total "hashing" time was greater than total "logarithmical-search" time.
+
+So, preference has been granted to the "logarithmical search".
+
+Though in the case of need, *logarithmical-search* (read "total-ordering") could
+be used as a milestone on our way to the *random-access* implementation.
+
+Every comparison is based either on the numbers or on flags comparison. In
+*random-access* approach we could use the same comparison algorithm. During
+comparison we exit once we find the difference, but here we might have to scan
+whole function body every time (note, it could be slower). Like in
+"total-ordering", we will track every numbers and flags, but instead of
+comparison, we should get numbers sequence and then create the hash number. So,
+once again, *total-ordering* could be considered as a milestone for even faster
+(in theory) random-access approach.
+
+MergeFunctions, main fields and runOnModule
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+There are two most important fields in class:
+
+``FnTree`` – the set of all unique functions. It keeps items that couldn't be
+merged with each other. It is defined as:
+
+``std::set<FunctionNode> FnTree;``
+
+Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
+implemented “<” operator among the functions set (below we explain how it works
+exactly; this is a key point in fast functions comparison).
+
+``Deferred`` – merging process can affect bodies of functions that are in
+``FnTree`` already. Obviously such functions should be rechecked again. In this
+case we remove them from ``FnTree``, and mark them as to be rescanned, namely
+put them into ``Deferred`` list.
+
+runOnModule
+"""""""""""
+The algorithm is pretty simple:
+
+1. Put all module's functions into the *worklist*.
+
+2. Scan *worklist*'s functions twice: first enumerate only strong functions and
+then only weak ones:
+
+ 2.1. Loop body: take function from *worklist* (call it *FCur*) and try to
+ insert it into *FnTree*: check whether *FCur* is equal to one of functions
+ in *FnTree*. If there *is* equal function in *FnTree* (call it *FExists*):
+ merge function *FCur* with *FExists*. Otherwise add function from *worklist*
+ to *FnTree*.
+
+3. Once *worklist* scanning and merging operations is complete, check *Deferred*
+list. If it is not empty: refill *worklist* contents with *Deferred* list and
+do step 2 again, if *Deferred* is empty, then exit from method.
+
+Comparison and logarithmical search
+"""""""""""""""""""""""""""""""""""
+Let's recall our task: for every function *F* from module *M*, we have to find
+equal functions *F`* in shortest time, and merge them into the single function.
+
+Defining total ordering among the functions set allows to organize functions
+into the binary tree. The lookup procedure complexity would be estimated as
+O(log(N)) in this case. But how to define *total-ordering*?
+
+We have to introduce a single rule applicable to every pair of functions, and
+following this rule then evaluate which of them is greater. What kind of rule
+it could be? Let's declare it as "compare" method, that returns one of 3
+possible values:
+
+-1, left is *less* than right,
+
+0, left and right are *equal*,
+
+1, left is *greater* than right.
+
+Of course it means, that we have to maintain
+*strict and non-strict order relation properties*:
+
+* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
+* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
+* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
+* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
+
+As it was mentioned before, comparison routine consists of
+"sub-comparison-routines", each of them also consists
+"sub-comparison-routines", and so on, finally it ends up with a primitives
+comparison.
+
+Below, we will use the next operations:
+
+#. ``cmpNumbers(number1, number2)`` is method that returns -1 if left is less
+ than right; 0, if left and right are equal; and 1 otherwise.
+
+#. ``cmpFlags(flag1, flag2)`` is hypothetical method that compares two flags.
+ The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
+ ``false`` is 0.
+
+The rest of article is based on *MergeFunctions.cpp* source code
+(*<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like to ask
+reader to keep this file open nearby, so we could use it as a reference for
+further explanations.
+
+Now we're ready to proceed to the next chapter and see how it works.
+
+Functions comparison
+====================
+At first, let's define how exactly we compare complex objects.
+
+Complex objects comparison (function, basic-block, etc) is mostly based on its
+sub-objects comparison results. So it is similar to the next "tree" objects
+comparison:
+
+#. For two trees *T1* and *T2* we perform *depth-first-traversal* and have
+ two sequences as a product: "*T1Items*" and "*T2Items*".
+
+#. Then compare chains "*T1Items*" and "*T2Items*" in
+ most-significant-item-first order. Result of items comparison would be the
+ result of *T1* and *T2* comparison itself.
+
+FunctionComparator::compare(void)
+---------------------------------
+Brief look at the source code tells us, that comparison starts in
+“``int FunctionComparator::compare(void)``” method.
+
+1. First parts to be compared are function's attributes and some properties that
+outsides “attributes” term, but still could make function different without
+changing its body. This part of comparison is usually done within simple
+*cmpNumbers* or *cmpFlags* operations (e.g.
+``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is full list of function's
+properties to be compared on this stage:
+
+ * *Attributes* (those are returned by ``Function::getAttributes()``
+ method).
+
+ * *GC*, for equivalence, *RHS* and *LHS* should be both either without
+ *GC* or with the same one.
+
+ * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the
+ same section.
+
+ * *Variable arguments*. *LHS* and *RHS* should be both either with or
+ without *var-args*.
+
+ * *Calling convention* should be the same.
+
+2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)``
+method. It checks return type and parameters type; the method itself will be
+described later.
+
+3. Associate function formal parameters with each other. Then comparing function
+bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then,
+we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s
+body, otherwise functions are different. On this stage we grant the preference
+to those we met later in function body (value we met first would be *less*).
+This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``”
+method (will be described a bit later).
+
+4. Function body comparison. As it written in method comments:
+
+“We do a CFG-ordered walk since the actual ordering of the blocks in the linked
+list is immaterial. Our walk starts at the entry block for both functions, then
+takes each block from each terminator in order. As an artifact, this also means
+that unreachable blocks are ignored.”
+
+So, using this walk we get BBs from *left* and *right* in the same order, and
+compare them by “``FunctionComparator::compare(const BasicBlock*, const
+BasicBlock*)``” method.
+
+We also associate BBs with each other, like we did it with function formal
+arguments (see ``cmpValues`` method below).
+
+FunctionComparator::cmpType
+---------------------------
+Consider how types comparison works.
+
+1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
+integer type. It could be done if its address space is 0, or if address spaces
+are ignored at all. Do the same thing for the right type.
+
+2. If left and right types are equal, return 0. Otherwise we need to give
+preference to one of them. So proceed to the next step.
+
+3. If types are of different kind (different type IDs). Return result of type
+IDs comparison, treating them as a numbers (use ``cmpNumbers`` operation).
+
+4. If types are vectors or integers, return result of their pointers comparison,
+comparing them as numbers.
+
+5. Check whether type ID belongs to the next group (call it equivalent-group):
+
+ * Void
+
+ * Float
+
+ * Double
+
+ * X86_FP80
+
+ * FP128
+
+ * PPC_FP128
+
+ * Label
+
+ * Metadata.
+
+ If ID belongs to group above, return 0. Since it's enough to see that
+ types has the same ``TypeID``. No additional information is required.
+
+6. Left and right are pointers. Return result of address space comparison
+(numbers comparison).
+
+7. Complex types (structures, arrays, etc.). Follow complex objects comparison
+technique (see the very first paragraph of this chapter). Both *left* and
+*right* are to be expanded and their element types will be checked the same
+way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
+
+8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
+get any conclusions, then invoke ``llvm_unreachable``, since it's quite
+unexpectable case.
+
+cmpValues(const Value*, const Value*)
+-------------------------------------
+Method that compares local values.
+
+This method gives us an answer on a very curious quesion: whether we could treat
+local values as equal, and which value is greater otherwise. It's better to
+start from example:
+
+Consider situation when we're looking at the same place in left function "*FL*"
+and in right function "*FR*". And every part of *left* place is equal to the
+corresponding part of *right* place, and (!) both parts use *Value* instances,
+for example:
+
+.. code-block:: llvm
+
+ instr0 i32 %LV ; left side, function FL
+ instr0 i32 %RV ; right side, function FR
+
+So, now our conclusion depends on *Value* instances comparison.
+
+Main purpose of this method is to determine relation between such values.
+
+What we expect from equal functions? At the same place, in functions "*FL*" and
+"*FR*" we expect to see *equal* values, or values *defined* at the same place
+in "*FL*" and "*FR*".
+
+Consider small example here:
+
+.. code-block:: llvm
+
+ define void %f(i32 %pf0, i32 %pf1) {
+ instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
+ }
+
+.. code-block:: llvm
+
+ define void %g(i32 %pg0, i32 %pg1) {
+ instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
+ }
+
+In this example, *pf0* is associated with *pg0*, *pf1* is associated with *pg1*,
+and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*.
+
+Instructions with opcode "*instr0*" would be *equal*, since their types and
+opcodes are equal, and values are *associated*.
+
+Instruction with opcode "*instr1*" from *f* is *greater* than instruction with
+opcode "*instr1*" from *g*; here we have equal types and opcodes, but "*pf1* is
+greater than "*pg0*".
+
+And instructions with opcode "*instr2*" are equal, because their opcodes and
+types are equal, and the same constant is used as a value.
+
+What we assiciate in cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Function arguments. *i*-th argument from left function associated with
+ *i*-th argument from right function.
+* BasicBlock instances. In basic-block enumeration loop we associate *i*-th
+ BasicBlock from the left function with *i*-th BasicBlock from the right
+ function.
+* Instructions.
+* Instruction operands. Note, we can meet *Value* here we have never seen
+ before. In this case it is not a function argument, nor *BasicBlock*, nor
+ *Instruction*. It is global value. It is constant, since its the only
+ supposed global here. Method also compares:
+* Constants that are of the same type.
+* If right constant could be losslessly bit-casted to the left one, then we
+ also compare them.
+
+How to implement cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+*Association* is a case of equality for us. We just treat such values as equal.
+But, in general, we need to implement antisymmetric relation. As it was
+mentioned above, to understand what is *less*, we can use order in which we
+meet values. If both of values has the same order in function (met at the same
+time), then treat values as *associated*. Otherwise – it depends on who was
+first.
+
+Every time we run top-level compare method, we initialize two identical maps
+(one for the left side, another one for the right side):
+
+``map<Value, int> sn_mapL, sn_mapR;``
+
+The key of the map is the *Value* itself, the *value* – is its order (call it
+*serial number*).
+
+To add value *V* we need to perform the next procedure:
+
+``sn_map.insert(std::make_pair(V, sn_map.size()));``
+
+For the first *Value*, map will return *0*, for second *Value* map will return
+*1*, and so on.
+
+Then we can check whether left and right values met at the same time with simple
+comparison:
+
+``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
+
+Of course, we can combine insertion and comparison:
+
+.. code-block:: c++
+
+ std::pair<iterator, bool>
+ LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
+ = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+ return cmpNumbers(LeftRes.first->second, RightRes.first->second);
+
+Let's look, how whole method could be implemented.
+
+1. we have to start from the bad news. Consider function self and
+cross-referencing cases:
+
+.. code-block:: c++
+
+ // self-reference unsigned fact0(unsigned n) { return n > 1 ? n
+ * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
+ fact1(n-1) : 1; }
+
+ // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
+ } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
+
+..
+
+ This comparison has been implemented in initial *MergeFunctions* pass
+ version. But, unfortunately, it is not transitive. And this is the only case
+ we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
+ functions of 10000 (checked on test-suite), and, we hope, reader would
+ forgive us for such a sacrifice in order to get the O(log(N)) pass time.
+
+2. If left/right *Value* is a constant, we have to compare them. Return 0 if it
+is the same constant, or use ``cmpConstants`` method otherwise.
+
+3. If left/right is *InlineAsm* instance. Return result of *Value* pointers
+comparison.
+
+4. Explicit association of *L* (left value) and *R* (right value). We need to
+find out whether values met at the same time, and thus are *associated*. Or we
+need to put the rule: when we treat *L* < *R*. Now it is easy: just return
+result of numbers comparison:
+
+.. code-block:: c++
+
+ std::pair<iterator, bool>
+ LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
+ RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+ if (LeftRes.first->second == RightRes.first->second) return 0;
+ if (LeftRes.first->second < RightRes.first->second) return -1;
+ return 1;
+
+Now when *cmpValues* returns 0, we can proceed comparison procedure. Otherwise,
+if we get (-1 or 1), we need to pass this result to the top level, and finish
+comparison procedure.
+
+cmpConstants
+------------
+Performs constants comparison as follows:
+
+1. Compare constant types using ``cmpType`` method. If result is -1 or 1, goto
+step 2, otherwise proceed to step 3.
+
+2. If types are different, we still can check whether constants could be
+losslessly bitcasted to each other. The further explanation is modification of
+``canLosslesslyBitCastTo`` method.
+
+ 2.1 Check whether constants are of the first class types
+ (``isFirstClassType`` check):
+
+ 2.1.1. If both constants are *not* of the first class type: return result
+ of ``cmpType``.
+
+ 2.1.2. Otherwise, if left type is not of the first class, return -1. If
+ right type is not of the first class, return 1.
+
+ 2.1.3. If both types are of the first class type, proceed to the next step
+ (2.1.3.1).
+
+ 2.1.3.1. If types are vectors, compare their bitwidth using the
+ *cmpNumbers*. If result is not 0, return it.
+
+ 2.1.3.2. Different types, but not a vectors:
+
+ * if both of them are pointers, good for us, we can proceed to step 3.
+ * if one of types is pointer, return result of *isPointer* flags
+ comparison (*cmpFlags* operation).
+ * otherwise we have no methods to prove bitcastability, and thus return
+ result of types comparison (-1 or 1).
+
+Steps below are for the case when types are equal, or case when constants are
+bitcastable:
+
+3. One of constants is a "*null*" value. Return the result of
+``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
+
+4. Compare value IDs, and return result if it is not 0:
+
+.. code-block:: c++
+
+ if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
+ return Res;
+
+5. Compare the contents of constants. The comparison depends on kind of
+constants, but on this stage it is just a lexicographical comparison. Just see
+how it was described in the beginning of "*Functions comparison*" paragraph.
+Mathematically it is equal to the next case: we encode left constant and right
+constant (with similar way *bitcode-writer* does). Then compare left code
+sequence and right code sequence.
+
+compare(const BasicBlock*, const BasicBlock*)
+---------------------------------------------
+Compares two *BasicBlock* instances.
+
+It enumerates instructions from left *BB* and right *BB*.
+
+1. It assigns serial numbers to the left and right instructions, using
+``cmpValues`` method.
+
+2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as
+greater than other instructions, if both instructions are *GEPs* use ``cmpGEP``
+method for comparison. If result is -1 or 1, pass it to the top-level
+comparison (return it).
+
+ 3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
+ 1, return it.
+
+ 3.2. Compare number of operands, if result is -1 or 1, return it.
+
+ 3.3. Compare operands themselves, use ``cmpValues`` method. Return result
+ if it is -1 or 1.
+
+ 3.4. Compare type of operands, using ``cmpType`` method. Return result if
+ it is -1 or 1.
+
+ 3.5. Proceed to the next instruction.
+
+4. We can finish instruction enumeration in 3 cases:
+
+ 4.1. We reached the end of both left and right basic-blocks. We didn't
+ exit on steps 1-3, so contents is equal, return 0.
+
+ 4.2. We have reached the end of the left basic-block. Return -1.
+
+ 4.3. Return 1 (the end of the right basic block).
+
+cmpGEP
+------
+Compares two GEPs (``getelementptr`` instructions).
+
+It differs from regular operations comparison with the only thing: possibility
+to use ``accumulateConstantOffset`` method.
+
+So, if we get constant offset for both left and right *GEPs*, then compare it as
+numbers, and return comparison result.
+
+Otherwise treat it like a regular operation (see previous paragraph).
+
+cmpOperation
+------------
+Compares instruction opcodes and some important operation properties.
+
+1. Compare opcodes, if it differs return the result.
+
+2. Compare number of operands. If it differs – return the result.
+
+3. Compare operation types, use *cmpType*. All the same – if types are
+different, return result.
+
+4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData``
+method, and compare it like a numbers.
+
+5. Compare operand types.
+
+6. For some particular instructions check equivalence (relation in our case) of
+some significant attributes. For example we have to compare alignment for
+``load`` instructions.
+
+O(log(N))
+---------
+Methods described above implement order relationship. And latter, could be used
+for nodes comparison in a binary tree. So we can organize functions set into
+the binary tree and reduce the cost of lookup procedure from
+O(N*N) to O(log(N)).
+
+Merging process, mergeTwoFunctions
+==================================
+Once *MergeFunctions* detected that current function (*G*) is equal to one that
+were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
+Function*)``.
+
+Operation affects ``FnTree`` contents with next way: *F* will stay in
+``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of
+*G* would be replaced with something else. It changes bodies of callers. So,
+functions that calls *G* would be put into ``Deferred`` set and removed from
+``FnTree``, and analyzed again.
+
+The approach is next:
+
+1. Most wished case: when we can use alias and both of *F* and *G* are weak. We
+make both of them with aliases to the third strong function *H*. Actually *H*
+is *F*. See below how it's made (but it's better to look straight into the
+source code). Well, this is a case when we can just replace *G* with *F*
+everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*).
+
+2. *F* could not be overridden, while *G* could. It would be good to do the
+next: after merging the places where overridable function were used, still use
+overridable stub. So try to make *G* alias to *F*, or create overridable tail
+call wrapper around *F* and replace *G* with that call.
+
+3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just
+change the callers: call *F* instead of *G*. That's what
+``replaceDirectCallers`` does.
+
+Below is detailed body description.
+
+If “F” may be overridden
+------------------------
+As follows from ``mayBeOverridden`` comments: “whether the definition of this
+global may be replaced by something non-equivalent at link time”. If so, thats
+ok: we can use alias to *F* instead of *G* or change call instructions itself.
+
+HasGlobalAliases, removeUsers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+First consider the case when we have global aliases of one function name to
+another. Our purpose is make both of them with aliases to the third strong
+function. Though if we keep *F* alive and without major changes we can leave it
+in ``FnTree``. Try to combine these two goals.
+
+Do stub replacement of *F* itself with an alias to *F*.
+
+1. Create stub function *H*, with the same name and attributes like function
+*F*. It takes maximum alignment of *F* and *G*.
+
+2. Replace all uses of function *F* with uses of function *H*. It is the two
+steps procedure instead. First of all, we must take into account, all functions
+from whom *F* is called would be changed: since we change the call argument
+(from *F* to *H*). If so we must to review these caller functions again after
+this procedure. We remove callers from ``FnTree``, method with name
+``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
+
+ 2.1. ``Inside removeUsers(Value*
+ V)`` we go through the all values that use value *V* (or *F* in our context).
+ If value is instruction, we go to function that holds this instruction and
+ mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
+ caller from ``FnTree``.
+
+ 2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
+
+3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*.
+Do the same with *G*: replace it with alias to *F*. So finally everywhere *F*
+was used, we use *H* and it is alias to *F*, and everywhere *G* was used we
+also have alias to *F*.
+
+4. Set *F* linkage to private. Make it strong :-)
+
+No global aliases, replaceDirectCallers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If global aliases are not supported. We call ``replaceDirectCallers`` then. Just
+go through all calls of *G* and replace it with calls of *F*. If you look into
+method you will see that it scans all uses of *G* too, and if use is callee (if
+user is call instruction and *G* is used as what to be called), we replace it
+with use of *F*.
+
+If “F” could not be overridden, fix it!
+"""""""""""""""""""""""""""""""""""""""
+
+We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace
+*G* with alias to *F* first. Next conditions are essential:
+
+* target should support global aliases,
+* the address itself of *G* should be not significant, not named and not
+ referenced anywhere,
+* function should come with external, local or weak linkage.
+
+Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*,
+so *G* could be replaced with this wrapper.
+
+*writeAlias*
+
+As follows from *llvm* reference:
+
+“Aliases act as *second name* for the aliasee value”. So we just want to create
+second name for *F* and use it instead of *G*:
+
+1. create global alias itself (*GA*),
+
+2. adjust alignment of *F* so it must be maximum of current and *G's* alignment;
+
+3. replace uses of *G*:
+
+ 3.1. first mark all callers of *G* as to-be-analyzed-again, using
+ ``removeUsers`` method (see chapter above),
+
+ 3.2. call ``G->replaceAllUsesWith(GA)``.
+
+4. Get rid of *G*.
+
+*writeThunk*
+
+As it written in method comments:
+
+“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G
+with bitcast(F). Deletes G.”
+
+In general it does the same as usual when we want to replace callee, except the
+first point:
+
+1. We generate tail call wrapper around *F*, but with interface that allows use
+it instead of *G*.
+
+2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then.
+
+3. Get rid of *G*.
+
+That's it.
+==========
+We have described how to detect equal functions, and how to merge them, and in
+first chapter we have described how it works all-together. Author hopes, reader
+have some picture from now, and it helps him improve and debug ­this pass.
+
+Reader is welcomed to send us any questions and proposals ;-)
diff --git a/docs/Passes.rst b/docs/Passes.rst
index 9f40092..cc0a853 100644
--- a/docs/Passes.rst
+++ b/docs/Passes.rst
@@ -891,17 +891,24 @@ calls, or transforming sets of stores into ``memset``\ s.
This pass looks for equivalent functions that are mergable and folds them.
-A hash is computed from the function, based on its type and number of basic
-blocks.
+Total-ordering is introduced among the functions set: we define comparison
+that answers for every two functions which of them is greater. It allows to
+arrange functions into the binary tree.
-Once all hashes are computed, we perform an expensive equality comparison on
-each function pair. This takes n^2/2 comparisons per bucket, so it's important
-that the hash function be high quality. The equality comparison iterates
-through each instruction in each basic block.
+For every new function we check for equivalent in tree.
-When a match is found the functions are folded. If both functions are
-overridable, we move the functionality into a new internal function and leave
-two overridable thunks to it.
+If equivalent exists we fold such functions. If both functions are overridable,
+we move the functionality into a new internal function and leave two
+overridable thunks to it.
+
+If there is no equivalent, then we add this function to tree.
+
+Lookup routine has O(log(n)) complexity, while whole merging process has
+complexity of O(n*log(n)).
+
+Read
+:doc:`this <MergeFunctions>`
+article for more details.
``-mergereturn``: Unify function exit nodes
-------------------------------------------
@@ -1112,13 +1119,6 @@ useful when diffing the effect of an optimization because deleting an unnamed
instruction can change all other instruction numbering, making the diff very
noisy.
-``-preverify``: Preliminary module verification
------------------------------------------------
-
-Ensures that the module is in the form required by the :ref:`Module Verifier
-<passes-verify>` pass. Running the verifier runs this pass automatically, so
-there should be no need to use it directly.
-
.. _passes-verify:
``-verify``: Module Verifier
diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst
index 8ac9afe..3f4f72a 100644
--- a/docs/Phabricator.rst
+++ b/docs/Phabricator.rst
@@ -66,7 +66,8 @@ To upload a new patch:
* Leave the drop down on *Create a new Revision...* and click *Continue*.
* Enter a descriptive title and summary; add reviewers and mailing
lists that you want to be included in the review. If your patch is
- for LLVM, cc llvm-commits; if your patch is for Clang, cc cfe-commits.
+ for LLVM, add llvm-commits as a subscriber; if your patch is for Clang,
+ add cfe-commits.
* Click *Save*.
To submit an updated patch:
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst
index 85a4ad8..753e658 100644
--- a/docs/ProgrammersManual.rst
+++ b/docs/ProgrammersManual.rst
@@ -488,6 +488,9 @@ gathered, use the '``-stats``' option:
$ opt -stats -mypassname < program.bc > /dev/null
... statistics output ...
+Note that in order to use the '``-stats``' option, LLVM must be
+compiled with assertions enabled.
+
When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
report that looks like this:
@@ -1408,7 +1411,7 @@ llvm/ADT/IntervalMap.h
IntervalMap is a compact map for small keys and values. It maps key intervals
instead of single keys, and it will automatically coalesce adjacent intervals.
-When then map only contains a few intervals, they are stored in the map object
+When the map only contains a few intervals, they are stored in the map object
itself to avoid allocations.
The IntervalMap iterators are quite big, so they should not be passed around as
@@ -2480,6 +2483,76 @@ ensures that the first bytes of ``User`` (if interpreted as a pointer) never has
the LSBit set. (Portability is relying on the fact that all known compilers
place the ``vptr`` in the first word of the instances.)
+.. _polymorphism:
+
+Designing Type Hiercharies and Polymorphic Interfaces
+-----------------------------------------------------
+
+There are two different design patterns that tend to result in the use of
+virtual dispatch for methods in a type hierarchy in C++ programs. The first is
+a genuine type hierarchy where different types in the hierarchy model
+a specific subset of the functionality and semantics, and these types nest
+strictly within each other. Good examples of this can be seen in the ``Value``
+or ``Type`` type hierarchies.
+
+A second is the desire to dispatch dynamically across a collection of
+polymorphic interface implementations. This latter use case can be modeled with
+virtual dispatch and inheritance by defining an abstract interface base class
+which all implementations derive from and override. However, this
+implementation strategy forces an **"is-a"** relationship to exist that is not
+actually meaningful. There is often not some nested hierarchy of useful
+generalizations which code might interact with and move up and down. Instead,
+there is a singular interface which is dispatched across a range of
+implementations.
+
+The preferred implementation strategy for the second use case is that of
+generic programming (sometimes called "compile-time duck typing" or "static
+polymorphism"). For example, a template over some type parameter ``T`` can be
+instantiated across any particular implementation that conforms to the
+interface or *concept*. A good example here is the highly generic properties of
+any type which models a node in a directed graph. LLVM models these primarily
+through templates and generic programming. Such templates include the
+``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
+truly needs **dynamic** dispatch you can generalize it using a technique
+called *concept-based polymorphism*. This pattern emulates the interfaces and
+behaviors of templates using a very limited form of virtual dispatch for type
+erasure inside its implementation. You can find examples of this technique in
+the ``PassManager.h`` system, and there is a more detailed introduction to it
+by Sean Parent in several of his talks and papers:
+
+#. `Inheritance Is The Base Class of Evil
+ <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
+ - The GoingNative 2013 talk describing this technique, and probably the best
+ place to start.
+#. `Value Semantics and Concepts-based Polymorphism
+ <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
+ describing this technique in more detail.
+#. `Sean Parent's Papers and Presentations
+ <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
+ - A Github project full of links to slides, video, and sometimes code.
+
+When deciding between creating a type hierarchy (with either tagged or virtual
+dispatch) and using templates or concepts-based polymorphism, consider whether
+there is some refinement of an abstract base class which is a semantically
+meaningful type on an interface boundary. If anything more refined than the
+root abstract interface is meaningless to talk about as a partial extension of
+the semantic model, then your use case likely fits better with polymorphism and
+you should avoid using virtual dispatch. However, there may be some exigent
+circumstances that require one technique or the other to be used.
+
+If you do need to introduce a type hierarchy, we prefer to use explicitly
+closed type hierarchies with manual tagged dispatch and/or RTTI rather than the
+open inheritance model and virtual dispatch that is more common in C++ code.
+This is because LLVM rarely encourages library consumers to extend its core
+types, and leverages the closed and tag-dispatched nature of its hierarchies to
+generate significantly more efficient code. We have also found that a large
+amount of our usage of type hierarchies fits better with tag-based pattern
+matching rather than dynamic dispatch across a common interface. Within LLVM we
+have built custom helpers to facilitate this design. See this document's
+section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
+<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
+pattern for use with the LLVM helpers.
+
.. _coreclasses:
The Core LLVM Class Hierarchy Reference
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index be2954c..6a38363 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -1,12 +1,12 @@
======================
-LLVM 3.5 Release Notes
+LLVM 3.7 Release Notes
======================
.. contents::
:local:
.. warning::
- These are in-progress notes for the upcoming LLVM 3.6 release. You may
+ These are in-progress notes for the upcoming LLVM 3.7 release. You may
prefer the `LLVM 3.5 Release Notes <http://llvm.org/releases/3.5.0/docs
/ReleaseNotes.html>`_.
@@ -15,7 +15,7 @@ Introduction
============
This document contains the release notes for the LLVM Compiler Infrastructure,
-release 3.6. Here we describe the status of LLVM, including major improvements
+release 3.7. Here we describe the status of LLVM, including major improvements
from the previous release, improvements in various subprojects of LLVM, and
some of the current users of the code. All LLVM releases may be downloaded
from the `LLVM releases web site <http://llvm.org/releases/>`_.
@@ -41,10 +41,8 @@ Non-comprehensive list of changes in this release
functionality, or simply have a lot to talk about), see the `NOTE` below
for adding a new subsection.
-* Support for AuroraUX has been removed.
-
-* Added support for a `native object file-based bitcode wrapper format
- <BitCodeFormat.html#native-object-file>`_.
+* The minimum required Visual Studio version for building LLVM is now 2013
+ Update 4.
* ... next change ...
@@ -67,19 +65,27 @@ Changes to the ARM Backend
Changes to the MIPS Target
--------------------------
-During this release ...
+ During this release ...
+
Changes to the PowerPC Target
-----------------------------
-During this release ...
+ During this release ...
+
+
+Changes to the OCaml bindings
+-----------------------------
+
+ During this release ...
+
-External Open Source Projects Using LLVM 3.6
+External Open Source Projects Using LLVM 3.7
============================================
An exciting aspect of LLVM is that it is used as an enabling technology for
a lot of other language and tools projects. This section lists some of the
-projects that have already been updated to work with LLVM 3.6.
+projects that have already been updated to work with LLVM 3.7.
* A project
diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst
index 3a5fa6e..350604c 100644
--- a/docs/SourceLevelDebugging.rst
+++ b/docs/SourceLevelDebugging.rst
@@ -1807,7 +1807,6 @@ tag is one of:
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
-* DW_TAG_constant
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
diff --git a/docs/StackMaps.rst b/docs/StackMaps.rst
index bd0fb94..43c60c9 100644
--- a/docs/StackMaps.rst
+++ b/docs/StackMaps.rst
@@ -221,6 +221,13 @@ lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.
+On PowerPC, note that ``<target>`` must be the actual intended target of
+the indirect call. Specifically, even when compiling for the ELF V1 ABI,
+``<target>`` is not the function-descriptor address normally used as the C/C++
+function-pointer representation. As a result, the call target must be local
+because no adjustment or restoration of the TOC pointer (in register r2) will
+be performed.
+
Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap.*``. The difference is that space will
diff --git a/docs/Statepoints.rst b/docs/Statepoints.rst
new file mode 100644
index 0000000..9741c93
--- /dev/null
+++ b/docs/Statepoints.rst
@@ -0,0 +1,580 @@
+=====================================
+Garbage Collection Safepoints in LLVM
+=====================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Status
+=======
+
+This document describes a set of experimental extensions to LLVM. Use
+with caution. Because the intrinsics have experimental status,
+compatibility across LLVM releases is not guaranteed.
+
+LLVM currently supports an alternate mechanism for conservative
+garbage collection support using the ``gcroot`` intrinsic. The mechanism
+described here shares little in common with the alternate ``gcroot``
+implementation and it is hoped that this mechanism will eventually
+replace the gc_root mechanism.
+
+Overview
+========
+
+To collect dead objects, garbage collectors must be able to identify
+any references to objects contained within executing code, and,
+depending on the collector, potentially update them. The collector
+does not need this information at all points in code - that would make
+the problem much harder - but only at well-defined points in the
+execution known as 'safepoints' For most collectors, it is sufficient
+to track at least one copy of each unique pointer value. However, for
+a collector which wishes to relocate objects directly reachable from
+running code, a higher standard is required.
+
+One additional challenge is that the compiler may compute intermediate
+results ("derived pointers") which point outside of the allocation or
+even into the middle of another allocation. The eventual use of this
+intermediate value must yield an address within the bounds of the
+allocation, but such "exterior derived pointers" may be visible to the
+collector. Given this, a garbage collector can not safely rely on the
+runtime value of an address to indicate the object it is associated
+with. If the garbage collector wishes to move any object, the
+compiler must provide a mapping, for each pointer, to an indication of
+its allocation.
+
+To simplify the interaction between a collector and the compiled code,
+most garbage collectors are organized in terms of three abstractions:
+load barriers, store barriers, and safepoints.
+
+#. A load barrier is a bit of code executed immediately after the
+ machine load instruction, but before any use of the value loaded.
+ Depending on the collector, such a barrier may be needed for all
+ loads, merely loads of a particular type (in the original source
+ language), or none at all.
+
+#. Analogously, a store barrier is a code fragement that runs
+ immediately before the machine store instruction, but after the
+ computation of the value stored. The most common use of a store
+ barrier is to update a 'card table' in a generational garbage
+ collector.
+
+#. A safepoint is a location at which pointers visible to the compiled
+ code (i.e. currently in registers or on the stack) are allowed to
+ change. After the safepoint completes, the actual pointer value
+ may differ, but the 'object' (as seen by the source language)
+ pointed to will not.
+
+ Note that the term 'safepoint' is somewhat overloaded. It refers to
+ both the location at which the machine state is parsable and the
+ coordination protocol involved in bring application threads to a
+ point at which the collector can safely use that information. The
+ term "statepoint" as used in this document refers exclusively to the
+ former.
+
+This document focuses on the last item - compiler support for
+safepoints in generated code. We will assume that an outside
+mechanism has decided where to place safepoints. From our
+perspective, all safepoints will be function calls. To support
+relocation of objects directly reachable from values in compiled code,
+the collector must be able to:
+
+#. identify every copy of a pointer (including copies introduced by
+ the compiler itself) at the safepoint,
+#. identify which object each pointer relates to, and
+#. potentially update each of those copies.
+
+This document describes the mechanism by which an LLVM based compiler
+can provide this information to a language runtime/collector, and
+ensure that all pointers can be read and updated if desired. The
+heart of the approach is to construct (or rewrite) the IR in a manner
+where the possible updates performed by the garbage collector are
+explicitly visible in the IR. Doing so requires that we:
+
+#. create a new SSA value for each potentially relocated pointer, and
+ ensure that no uses of the original (non relocated) value is
+ reachable after the safepoint,
+#. specify the relocation in a way which is opaque to the compiler to
+ ensure that the optimizer can not introduce new uses of an
+ unrelocated value after a statepoint. This prevents the optimizer
+ from performing unsound optimizations.
+#. recording a mapping of live pointers (and the allocation they're
+ associated with) for each statepoint.
+
+At the most abstract level, inserting a safepoint can be thought of as
+replacing a call instruction with a call to a multiple return value
+function which both calls the original target of the call, returns
+it's result, and returns updated values for any live pointers to
+garbage collected objects.
+
+ Note that the task of identifying all live pointers to garbage
+ collected values, transforming the IR to expose a pointer giving the
+ base object for every such live pointer, and inserting all the
+ intrinsics correctly is explicitly out of scope for this document.
+ The recommended approach is to use the :ref:`utility passes
+ <statepoint-utilities>` described below.
+
+This abstract function call is concretely represented by a sequence of
+intrinsic calls known collectively as a "statepoint relocation sequence".
+
+Let's consider a simple call in LLVM IR:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ call void ()* @foo()
+ ret i8 addrspace(1)* %obj
+ }
+
+Depending on our language we may need to allow a safepoint during the execution
+of ``foo``. If so, we need to let the collector update local values in the
+current frame. If we don't, we'll be accessing a potential invalid reference
+once we eventually return from the call.
+
+In this example, we need to relocate the SSA value ``%obj``. Since we can't
+actually change the value in the SSA value ``%obj``, we need to introduce a new
+SSA value ``%obj.relocated`` which represents the potentially changed value of
+``%obj`` after the safepoint and update any following uses appropriately. The
+resulting relocation sequence is:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ %0 = call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @foo, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
+ %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 9, i32 9)
+ ret i8 addrspace(1)* %obj.relocated
+ }
+
+Ideally, this sequence would have been represented as a M argument, N
+return value function (where M is the number of values being
+relocated + the original call arguments and N is the original return
+value + each relocated value), but LLVM does not easily support such a
+representation.
+
+Instead, the statepoint intrinsic marks the actual site of the
+safepoint or statepoint. The statepoint returns a token value (which
+exists only at compile time). To get back the original return value
+of the call, we use the ``gc.result`` intrinsic. To get the relocation
+of each pointer in turn, we use the ``gc.relocate`` intrinsic with the
+appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are
+tied to the statepoint. The combination forms a "statepoint relocation
+sequence" and represents the entitety of a parseable call or 'statepoint'.
+
+When lowered, this example would generate the following x86 assembly:
+
+.. code-block:: gas
+
+ .globl test1
+ .align 16, 0x90
+ pushq %rax
+ callq foo
+ .Ltmp1:
+ movq (%rsp), %rax # This load is redundant (oops!)
+ popq %rdx
+ retq
+
+Each of the potentially relocated values has been spilled to the
+stack, and a record of that location has been recorded to the
+:ref:`Stack Map section <stackmap-section>`. If the garbage collector
+needs to update any of these pointers during the call, it knows
+exactly what to change.
+
+The relevant parts of the StackMap section for our example are:
+
+.. code-block:: gas
+
+ # This describes the call site
+ # Stack Maps: callsite 2882400000
+ .quad 2882400000
+ .long .Ltmp1-test1
+ .short 0
+ # .. 8 entries skipped ..
+ # This entry describes the spill slot which is directly addressable
+ # off RSP with offset 0. Given the value was spilled with a pushq,
+ # that makes sense.
+ # Stack Maps: Loc 8: Direct RSP [encoding: .byte 2, .byte 8, .short 7, .int 0]
+ .byte 2
+ .byte 8
+ .short 7
+ .long 0
+
+This example was taken from the tests for the :ref:`RewriteStatepointsForGC` utility pass. As such, it's full StackMap can be easily examined with the following command.
+
+.. code-block:: bash
+
+ opt -rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps
+
+
+
+
+
+Intrinsics
+===========
+
+'llvm.experimental.gc.statepoint' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare i32
+ @llvm.experimental.gc.statepoint(func_type <target>,
+ i64 <#call args>. i64 <unused>,
+ ... (call parameters),
+ i64 <# deopt args>, ... (deopt parameters),
+ ... (gc parameters))
+
+Overview:
+"""""""""
+
+The statepoint intrinsic represents a call which is parse-able by the
+runtime.
+
+Operands:
+"""""""""
+
+The 'target' operand is the function actually being called. The
+target can be specified as either a symbolic LLVM function, or as an
+arbitrary Value of appropriate function type. Note that the function
+type must match the signature of the callee and the types of the 'call
+parameters' arguments.
+
+The '#call args' operand is the number of arguments to the actual
+call. It must exactly match the number of arguments passed in the
+'call parameters' variable length section.
+
+The 'unused' operand is unused and likely to be removed. Please do
+not use.
+
+The 'call parameters' arguments are simply the arguments which need to
+be passed to the call target. They will be lowered according to the
+specified calling convention and otherwise handled like a normal call
+instruction. The number of arguments must exactly match what is
+specified in '# call args'. The types must match the signature of
+'target'.
+
+The 'deopt parameters' arguments contain an arbitrary list of Values
+which is meaningful to the runtime. The runtime may read any of these
+values, but is assumed not to modify them. If the garbage collector
+might need to modify one of these values, it must also be listed in
+the 'gc pointer' argument list. The '# deopt args' field indicates
+how many operands are to be interpreted as 'deopt parameters'.
+
+The 'gc parameters' arguments contain every pointer to a garbage
+collector object which potentially needs to be updated by the garbage
+collector. Note that the argument list must explicitly contain a base
+pointer for every derived pointer listed. The order of arguments is
+unimportant. Unlike the other variable length parameter sets, this
+list is not length prefixed.
+
+Semantics:
+""""""""""
+
+A statepoint is assumed to read and write all memory. As a result,
+memory operations can not be reordered past a statepoint. It is
+illegal to mark a statepoint as being either 'readonly' or 'readnone'.
+
+Note that legal IR can not perform any memory operation on a 'gc
+pointer' argument of the statepoint in a location statically reachable
+from the statepoint. Instead, the explicitly relocated value (from a
+``gc.relocate``) must be used.
+
+'llvm.experimental.gc.result' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare type*
+ @llvm.experimental.gc.result(i32 %statepoint_token)
+
+Overview:
+"""""""""
+
+``gc.result`` extracts the result of the original call instruction
+which was replaced by the ``gc.statepoint``. The ``gc.result``
+intrinsic is actually a family of three intrinsics due to an
+implementation limitation. Other than the type of the return value,
+the semantics are the same.
+
+Operands:
+"""""""""
+
+The first and only argument is the ``gc.statepoint`` which starts
+the safepoint sequence of which this ``gc.result`` is a part.
+Despite the typing of this as a generic i32, *only* the value defined
+by a ``gc.statepoint`` is legal here.
+
+Semantics:
+""""""""""
+
+The ``gc.result`` represents the return value of the call target of
+the ``statepoint``. The type of the ``gc.result`` must exactly match
+the type of the target. If the call target returns void, there will
+be no ``gc.result``.
+
+A ``gc.result`` is modeled as a 'readnone' pure function. It has no
+side effects since it is just a projection of the return value of the
+previous call represented by the ``gc.statepoint``.
+
+'llvm.experimental.gc.relocate' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare <pointer type>
+ @llvm.experimental.gc.relocate(i32 %statepoint_token,
+ i32 %base_offset,
+ i32 %pointer_offset)
+
+Overview:
+"""""""""
+
+A ``gc.relocate`` returns the potentially relocated value of a pointer
+at the safepoint.
+
+Operands:
+"""""""""
+
+The first argument is the ``gc.statepoint`` which starts the
+safepoint sequence of which this ``gc.relocation`` is a part.
+Despite the typing of this as a generic i32, *only* the value defined
+by a ``gc.statepoint`` is legal here.
+
+The second argument is an index into the statepoints list of arguments
+which specifies the base pointer for the pointer being relocated.
+This index must land within the 'gc parameter' section of the
+statepoint's argument list.
+
+The third argument is an index into the statepoint's list of arguments
+which specify the (potentially) derived pointer being relocated. It
+is legal for this index to be the same as the second argument
+if-and-only-if a base pointer is being relocated. This index must land
+within the 'gc parameter' section of the statepoint's argument list.
+
+Semantics:
+""""""""""
+
+The return value of ``gc.relocate`` is the potentially relocated value
+of the pointer specified by it's arguments. It is unspecified how the
+value of the returned pointer relates to the argument to the
+``gc.statepoint`` other than that a) it points to the same source
+language object with the same offset, and b) the 'based-on'
+relationship of the newly relocated pointers is a projection of the
+unrelocated pointers. In particular, the integer value of the pointer
+returned is unspecified.
+
+A ``gc.relocate`` is modeled as a ``readnone`` pure function. It has no
+side effects since it is just a way to extract information about work
+done during the actual call modeled by the ``gc.statepoint``.
+
+.. _statepoint-stackmap-format:
+
+Stack Map Format
+================
+
+Locations for each pointer value which may need read and/or updated by
+the runtime or collector are provided via the :ref:`Stack Map format
+<stackmap-format>` specified in the PatchPoint documentation.
+
+Each statepoint generates the following Locations:
+
+* Constant which describes number of following deopt *Locations* (not
+ operands)
+* Variable number of Locations, one for each deopt parameter listed in
+ the IR statepoint (same number as described by previous Constant)
+* Variable number of Locations pairs, one pair for each unique pointer
+ which needs relocated. The first Location in each pair describes
+ the base pointer for the object. The second is the derived pointer
+ actually being relocated. It is guaranteed that the base pointer
+ must also appear explicitly as a relocation pair if used after the
+ statepoint. There may be fewer pairs then gc parameters in the IR
+ statepoint. Each *unique* pair will occur at least once; duplicates
+ are possible.
+
+Note that the Locations used in each section may describe the same
+physical location. e.g. A stack slot may appear as a deopt location,
+a gc base pointer, and a gc derived pointer.
+
+The ID field of the 'StkMapRecord' for a statepoint is meaningless and
+it's value is explicitly unspecified.
+
+The LiveOut section of the StkMapRecord will be empty for a statepoint
+record.
+
+Safepoint Semantics & Verification
+==================================
+
+The fundamental correctness property for the compiled code's
+correctness w.r.t. the garbage collector is a dynamic one. It must be
+the case that there is no dynamic trace such that a operation
+involving a potentially relocated pointer is observably-after a
+safepoint which could relocate it. 'observably-after' is this usage
+means that an outside observer could observe this sequence of events
+in a way which precludes the operation being performed before the
+safepoint.
+
+To understand why this 'observable-after' property is required,
+consider a null comparison performed on the original copy of a
+relocated pointer. Assuming that control flow follows the safepoint,
+there is no way to observe externally whether the null comparison is
+performed before or after the safepoint. (Remember, the original
+Value is unmodified by the safepoint.) The compiler is free to make
+either scheduling choice.
+
+The actual correctness property implemented is slightly stronger than
+this. We require that there be no *static path* on which a
+potentially relocated pointer is 'observably-after' it may have been
+relocated. This is slightly stronger than is strictly necessary (and
+thus may disallow some otherwise valid programs), but greatly
+simplifies reasoning about correctness of the compiled code.
+
+By construction, this property will be upheld by the optimizer if
+correctly established in the source IR. This is a key invariant of
+the design.
+
+The existing IR Verifier pass has been extended to check most of the
+local restrictions on the intrinsics mentioned in their respective
+documentation. The current implementation in LLVM does not check the
+key relocation invariant, but this is ongoing work on developing such
+a verifier. Please ask on llvmdev if you're interested in
+experimenting with the current version.
+
+.. _statepoint-utilities:
+
+Utility Passes for Safepoint Insertion
+======================================
+
+.. _RewriteStatepointsForGC:
+
+RewriteStatepointsForGC
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The pass RewriteStatepointsForGC transforms a functions IR by replacing a
+``gc.statepoint`` (with an optional ``gc.result``) with a full relocation
+sequence, including all required ``gc.relocates``. To function, the pass
+requires that the GC strategy specified for the function be able to reliably
+distinguish between GC references and non-GC references in IR it is given.
+
+As an example, given this code:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @foo, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0)
+ ret i8 addrspace(1)* %obj
+ }
+
+The pass would produce this IR:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ %0 = call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @foo, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
+ %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32 %0, i32 9, i32 9)
+ ret i8 addrspace(1)* %obj.relocated
+ }
+
+In the above examples, the addrspace(1) marker on the pointers is the mechanism
+that the ``statepoint-example`` GC strategy uses to distinguish references from
+non references. Address space 1 is not globally reserved for this purpose.
+
+This pass can be used an utility function by a language frontend that doesn't
+want to manually reason about liveness, base pointers, or relocation when
+constructing IR. As currently implemented, RewriteStatepointsForGC must be
+run after SSA construction (i.e. mem2ref).
+
+
+In practice, RewriteStatepointsForGC can be run much later in the pass
+pipeline, after most optimization is already done. This helps to improve
+the quality of the generated code when compiled with garbage collection support.
+In the long run, this is the intended usage model. At this time, a few details
+have yet to be worked out about the semantic model required to guarantee this
+is always correct. As such, please use with caution and report bugs.
+
+.. _PlaceSafepoints:
+
+PlaceSafepoints
+^^^^^^^^^^^^^^^^
+
+The pass PlaceSafepoints transforms a function's IR by replacing any call or
+invoke instructions with appropriate ``gc.statepoint`` and ``gc.result`` pairs,
+and inserting safepoint polls sufficient to ensure running code checks for a
+safepoint request on a timely manner. This pass is expected to be run before
+RewriteStatepointsForGC and thus does not produce full relocation sequences.
+
+As an example, given input IR of the following:
+
+.. code-block:: llvm
+
+ define void @test() gc "statepoint-example" {
+ call void @foo()
+ ret void
+ }
+
+ declare void @do_safepoint()
+ define void @gc.safepoint_poll() {
+ call void @do_safepoint()
+ ret void
+ }
+
+
+This pass would produce the following IR:
+
+.. code-block:: llvm
+
+ define void @test() gc "statepoint-example" {
+ %safepoint_token = call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @do_safepoint, i32 0, i32 0, i32 0)
+ %safepoint_token1 = call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @foo, i32 0, i32 0, i32 0)
+ ret void
+ }
+
+In this case, we've added an (unconditional) entry safepoint poll and converted the call into a ``gc.statepoint``. Note that despite appearances, the entry poll is not necessarily redundant. We'd have to know that ``foo`` and ``test`` were not mutually recursive for the poll to be redundant. In practice, you'd probably want to your poll definition to contain a conditional branch of some form.
+
+
+At the moment, PlaceSafepoints can insert safepoint polls at method entry and
+loop backedges locations. Extending this to work with return polls would be
+straight forward if desired.
+
+PlaceSafepoints includes a number of optimizations to avoid placing safepoint
+polls at particular sites unless needed to ensure timely execution of a poll
+under normal conditions. PlaceSafepoints does not attempt to ensure timely
+execution of a poll under worst case conditions such as heavy system paging.
+
+The implementation of a safepoint poll action is specified by looking up a
+function of the name ``gc.safepoint_poll`` in the containing Module. The body
+of this function is inserted at each poll site desired. While calls or invokes
+inside this method are transformed to a ``gc.statepoints``, recursive poll
+insertion is not performed.
+
+If you are scheduling the RewriteStatepointsForGC pass late in the pass order,
+you should probably schedule this pass immediately before it. The exception
+would be if you need to preserve abstract frame information (e.g. for
+deoptimization or introspection) at safepoints. In that case, ask on the
+llvmdev mailing list for suggestions.
+
+
+Bugs and Enhancements
+=====================
+
+Currently known bugs and enhancements under consideration can be
+tracked by performing a `bugzilla search
+<http://llvm.org/bugs/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
+for [Statepoint] in the summary field. When filing new bugs, please
+use this tag so that interested parties see the newly filed bug. As
+with most LLVM features, design discussions take place on `llvmdev
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_, and patches
+should be sent to `llvm-commits
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_ for review.
+
diff --git a/docs/TableGen/index.rst b/docs/TableGen/index.rst
index cda41b5..9526240 100644
--- a/docs/TableGen/index.rst
+++ b/docs/TableGen/index.rst
@@ -123,7 +123,6 @@ this (at the time of this writing):
bit hasCtrlDep = 0;
bit isNotDuplicable = 0;
bit hasSideEffects = 0;
- bit neverHasSideEffects = 0;
InstrItinClass Itinerary = NoItinerary;
string Constraints = "";
string DisableEncoding = "";
diff --git a/docs/WritingAnLLVMPass.rst b/docs/WritingAnLLVMPass.rst
index ef2b953..1d5a52f 100644
--- a/docs/WritingAnLLVMPass.rst
+++ b/docs/WritingAnLLVMPass.rst
@@ -853,7 +853,7 @@ Example implementations of ``getAnalysisUsage``
// This example modifies the program, but does not modify the CFG
void LICM::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();
- AU.addRequired<LoopInfo>();
+ AU.addRequired<LoopInfoWrapperPass>();
}
.. _writing-an-llvm-pass-getAnalysis:
@@ -870,7 +870,7 @@ you want, and returns a reference to that pass. For example:
.. code-block:: c++
bool LICM::runOnFunction(Function &F) {
- LoopInfo &LI = getAnalysis<LoopInfo>();
+ LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
//...
}
diff --git a/docs/conf.py b/docs/conf.py
index 659c3e0..1897282 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -47,9 +47,9 @@ copyright = u'2003-2014, LLVM Project'
# built documents.
#
# The short X.Y version.
-version = '3.6'
+version = '3.7'
# The full version, including alpha/beta/rc tags.
-release = '3.6'
+release = '3.7'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
diff --git a/docs/index.rst b/docs/index.rst
index 5ac5443..56567db 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -199,6 +199,8 @@ For developers of applications which use LLVM as a library.
(`classes <http://llvm.org/doxygen/inherits.html>`_)
(`tarball <http://llvm.org/doxygen/doxygen.tar.gz>`_)
+`Documentation for Go bindings <http://godoc.org/llvm.org/llvm/bindings/go/llvm>`_
+
`ViewVC Repository Browser <http://llvm.org/viewvc/>`_
..
@@ -240,6 +242,9 @@ For API clients and LLVM developers.
InAlloca
BigEndianNEON
CoverageMappingFormat
+ Statepoints
+ MergeFunctions
+ BitSets
:doc:`WritingAnLLVMPass`
Information on how to write LLVM transformations and analyses.
@@ -332,6 +337,16 @@ For API clients and LLVM developers.
:doc:`CoverageMappingFormat`
This describes the format and encoding used for LLVM’s code coverage mapping.
+:doc:`Statepoints`
+ This describes a set of experimental extensions for garbage
+ collection support.
+
+:doc:`MergeFunctions`
+ Describes functions merging optimization.
+
+:doc:`InAlloca`
+ Description of the ``inalloca`` argument attribute.
+
Development Process Documentation
=================================
diff --git a/docs/tutorial/LangImpl1.rst b/docs/tutorial/LangImpl1.rst
index a2c5eee..f4b0191 100644
--- a/docs/tutorial/LangImpl1.rst
+++ b/docs/tutorial/LangImpl1.rst
@@ -73,14 +73,21 @@ in the various pieces. The structure of the tutorial is:
about this is how easy and trivial it is to construct SSA form in
LLVM: no, LLVM does *not* require your front-end to construct SSA
form!
-- `Chapter #8 <LangImpl8.html>`_: Conclusion and other useful LLVM
+- `Chapter #8 <LangImpl8.html>`_: Extending the Language: Debug
+ Information - Having built a decent little programming language with
+ control flow, functions and mutable variables, we consider what it
+ takes to add debug information to standalone executables. This debug
+ information will allow you to set breakpoints in Kaleidoscope
+ functions, print out argument variables, and call functions - all
+ from within the debugger!
+- `Chapter #9 <LangImpl8.html>`_: Conclusion and other useful LLVM
tidbits - This chapter wraps up the series by talking about
potential ways to extend the language, but also includes a bunch of
pointers to info about "special topics" like adding garbage
collection support, exceptions, debugging, support for "spaghetti
stacks", and a bunch of other tips and tricks.
-By the end of the tutorial, we'll have written a bit less than 700 lines
+By the end of the tutorial, we'll have written a bit less than 1000 lines
of non-comment, non-blank, lines of code. With this small amount of
code, we'll have built up a very reasonable compiler for a non-trivial
language including a hand-written lexer, parser, AST, as well as code
diff --git a/docs/tutorial/LangImpl4.rst b/docs/tutorial/LangImpl4.rst
index aa469ca..cdaac63 100644
--- a/docs/tutorial/LangImpl4.rst
+++ b/docs/tutorial/LangImpl4.rst
@@ -428,7 +428,7 @@ the LLVM JIT and optimizer. To build this example, use:
.. code-block:: bash
# Compile
- clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
diff --git a/docs/tutorial/LangImpl5.rst b/docs/tutorial/LangImpl5.rst
index 2a3a4ce..72e34b1 100644
--- a/docs/tutorial/LangImpl5.rst
+++ b/docs/tutorial/LangImpl5.rst
@@ -736,7 +736,7 @@ the if/then/else and for expressions.. To build this example, use:
.. code-block:: bash
# Compile
- clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
diff --git a/docs/tutorial/LangImpl6.rst b/docs/tutorial/LangImpl6.rst
index cdceb03..bf78bde 100644
--- a/docs/tutorial/LangImpl6.rst
+++ b/docs/tutorial/LangImpl6.rst
@@ -729,7 +729,7 @@ the if/then/else and for expressions.. To build this example, use:
.. code-block:: bash
# Compile
- clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
diff --git a/docs/tutorial/LangImpl7.rst b/docs/tutorial/LangImpl7.rst
index c4c7233..c445908 100644
--- a/docs/tutorial/LangImpl7.rst
+++ b/docs/tutorial/LangImpl7.rst
@@ -847,7 +847,7 @@ mutable variables and var/in support. To build this example, use:
.. code-block:: bash
# Compile
- clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core jit native` -O3 -o toy
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
# Run
./toy
@@ -856,5 +856,5 @@ Here is the code:
.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp
:language: c++
-`Next: Conclusion and other useful LLVM tidbits <LangImpl8.html>`_
+`Next: Adding Debug Information <LangImpl8.html>`_
diff --git a/docs/tutorial/LangImpl8.rst b/docs/tutorial/LangImpl8.rst
index 6f69493..4473035 100644
--- a/docs/tutorial/LangImpl8.rst
+++ b/docs/tutorial/LangImpl8.rst
@@ -1,267 +1,459 @@
-======================================================
-Kaleidoscope: Conclusion and other useful LLVM tidbits
-======================================================
+======================================
+Kaleidoscope: Adding Debug Information
+======================================
.. contents::
:local:
-Tutorial Conclusion
-===================
-
-Welcome to the final chapter of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
-grown our little Kaleidoscope language from being a useless toy, to
-being a semi-interesting (but probably still useless) toy. :)
-
-It is interesting to see how far we've come, and how little code it has
-taken. We built the entire lexer, parser, AST, code generator, and an
-interactive run-loop (with a JIT!) by-hand in under 700 lines of
-(non-comment/non-blank) code.
-
-Our little language supports a couple of interesting features: it
-supports user defined binary and unary operators, it uses JIT
-compilation for immediate evaluation, and it supports a few control flow
-constructs with SSA construction.
-
-Part of the idea of this tutorial was to show you how easy and fun it
-can be to define, build, and play with languages. Building a compiler
-need not be a scary or mystical process! Now that you've seen some of
-the basics, I strongly encourage you to take the code and hack on it.
-For example, try adding:
-
-- **global variables** - While global variables have questional value
- in modern software engineering, they are often useful when putting
- together quick little hacks like the Kaleidoscope compiler itself.
- Fortunately, our current setup makes it very easy to add global
- variables: just have value lookup check to see if an unresolved
- variable is in the global variable symbol table before rejecting it.
- To create a new global variable, make an instance of the LLVM
- ``GlobalVariable`` class.
-- **typed variables** - Kaleidoscope currently only supports variables
- of type double. This gives the language a very nice elegance, because
- only supporting one type means that you never have to specify types.
- Different languages have different ways of handling this. The easiest
- way is to require the user to specify types for every variable
- definition, and record the type of the variable in the symbol table
- along with its Value\*.
-- **arrays, structs, vectors, etc** - Once you add types, you can start
- extending the type system in all sorts of interesting ways. Simple
- arrays are very easy and are quite useful for many different
- applications. Adding them is mostly an exercise in learning how the
- LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
- works: it is so nifty/unconventional, it `has its own
- FAQ <../GetElementPtr.html>`_! If you add support for recursive types
- (e.g. linked lists), make sure to read the `section in the LLVM
- Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
- describes how to construct them.
-- **standard runtime** - Our current language allows the user to access
- arbitrary external functions, and we use it for things like "printd"
- and "putchard". As you extend the language to add higher-level
- constructs, often these constructs make the most sense if they are
- lowered to calls into a language-supplied runtime. For example, if
- you add hash tables to the language, it would probably make sense to
- add the routines to a runtime, instead of inlining them all the way.
-- **memory management** - Currently we can only access the stack in
- Kaleidoscope. It would also be useful to be able to allocate heap
- memory, either with calls to the standard libc malloc/free interface
- or with a garbage collector. If you would like to use garbage
- collection, note that LLVM fully supports `Accurate Garbage
- Collection <../GarbageCollection.html>`_ including algorithms that
- move objects and need to scan/update the stack.
-- **debugger support** - LLVM supports generation of `DWARF Debug
- info <../SourceLevelDebugging.html>`_ which is understood by common
- debuggers like GDB. Adding support for debug info is fairly
- straightforward. The best way to understand it is to compile some
- C/C++ code with "``clang -g -O0``" and taking a look at what it
- produces.
-- **exception handling support** - LLVM supports generation of `zero
- cost exceptions <../ExceptionHandling.html>`_ which interoperate with
- code compiled in other languages. You could also generate code by
- implicitly making every function return an error value and checking
- it. You could also make explicit use of setjmp/longjmp. There are
- many different ways to go here.
-- **object orientation, generics, database access, complex numbers,
- geometric programming, ...** - Really, there is no end of crazy
- features that you can add to the language.
-- **unusual domains** - We've been talking about applying LLVM to a
- domain that many people are interested in: building a compiler for a
- specific language. However, there are many other domains that can use
- compiler technology that are not typically considered. For example,
- LLVM has been used to implement OpenGL graphics acceleration,
- translate C++ code to ActionScript, and many other cute and clever
- things. Maybe you will be the first to JIT compile a regular
- expression interpreter into native code with LLVM?
-
-Have fun - try doing something crazy and unusual. Building a language
-like everyone else always has, is much less fun than trying something a
-little crazy or off the wall and seeing how it turns out. If you get
-stuck or want to talk about it, feel free to email the `llvmdev mailing
-list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots
-of people who are interested in languages and are often willing to help
-out.
-
-Before we end this tutorial, I want to talk about some "tips and tricks"
-for generating LLVM IR. These are some of the more subtle things that
-may not be obvious, but are very useful if you want to take advantage of
-LLVM's capabilities.
-
-Properties of the LLVM IR
-=========================
-
-We have a couple common questions about code in the LLVM IR form - lets
-just get these out of the way right now, shall we?
-
-Target Independence
--------------------
-
-Kaleidoscope is an example of a "portable language": any program written
-in Kaleidoscope will work the same way on any target that it runs on.
-Many other languages have this property, e.g. lisp, java, haskell,
-javascript, python, etc (note that while these languages are portable,
-not all their libraries are).
-
-One nice aspect of LLVM is that it is often capable of preserving target
-independence in the IR: you can take the LLVM IR for a
-Kaleidoscope-compiled program and run it on any target that LLVM
-supports, even emitting C code and compiling that on targets that LLVM
-doesn't support natively. You can trivially tell that the Kaleidoscope
-compiler generates target-independent code because it never queries for
-any target-specific information when generating code.
-
-The fact that LLVM provides a compact, target-independent,
-representation for code gets a lot of people excited. Unfortunately,
-these people are usually thinking about C or a language from the C
-family when they are asking questions about language portability. I say
-"unfortunately", because there is really no way to make (fully general)
-C code portable, other than shipping the source code around (and of
-course, C source code is not actually portable in general either - ever
-port a really old application from 32- to 64-bits?).
-
-The problem with C (again, in its full generality) is that it is heavily
-laden with target specific assumptions. As one simple example, the
-preprocessor often destructively removes target-independence from the
-code when it processes the input text:
-
-.. code-block:: c
-
- #ifdef __i386__
- int X = 1;
- #else
- int X = 42;
- #endif
-
-While it is possible to engineer more and more complex solutions to
-problems like this, it cannot be solved in full generality in a way that
-is better than shipping the actual source code.
-
-That said, there are interesting subsets of C that can be made portable.
-If you are willing to fix primitive types to a fixed size (say int =
-32-bits, and long = 64-bits), don't care about ABI compatibility with
-existing binaries, and are willing to give up some other minor features,
-you can have portable code. This can make sense for specialized domains
-such as an in-kernel language.
-
-Safety Guarantees
------------------
-
-Many of the languages above are also "safe" languages: it is impossible
-for a program written in Java to corrupt its address space and crash the
-process (assuming the JVM has no bugs). Safety is an interesting
-property that requires a combination of language design, runtime
-support, and often operating system support.
-
-It is certainly possible to implement a safe language in LLVM, but LLVM
-IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
-casts, use after free bugs, buffer over-runs, and a variety of other
-problems. Safety needs to be implemented as a layer on top of LLVM and,
-conveniently, several groups have investigated this. Ask on the `llvmdev
-mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if
-you are interested in more details.
-
-Language-Specific Optimizations
--------------------------------
-
-One thing about LLVM that turns off many people is that it does not
-solve all the world's problems in one system (sorry 'world hunger',
-someone else will have to solve you some other day). One specific
-complaint is that people perceive LLVM as being incapable of performing
-high-level language-specific optimization: LLVM "loses too much
-information".
-
-Unfortunately, this is really not the place to give you a full and
-unified version of "Chris Lattner's theory of compiler design". Instead,
-I'll make a few observations:
-
-First, you're right that LLVM does lose information. For example, as of
-this writing, there is no way to distinguish in the LLVM IR whether an
-SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
-than debug info). Both get compiled down to an 'i32' value and the
-information about what it came from is lost. The more general issue
-here, is that the LLVM type system uses "structural equivalence" instead
-of "name equivalence". Another place this surprises people is if you
-have two types in a high-level language that have the same structure
-(e.g. two different structs that have a single int field): these types
-will compile down into a single LLVM type and it will be impossible to
-tell what it came from.
-
-Second, while LLVM does lose information, LLVM is not a fixed target: we
-continue to enhance and improve it in many different ways. In addition
-to adding new features (LLVM did not always support exceptions or debug
-info), we also extend the IR to capture important information for
-optimization (e.g. whether an argument is sign or zero extended,
-information about pointers aliasing, etc). Many of the enhancements are
-user-driven: people want LLVM to include some specific feature, so they
-go ahead and extend it.
-
-Third, it is *possible and easy* to add language-specific optimizations,
-and you have a number of choices in how to do it. As one trivial
-example, it is easy to add language-specific optimization passes that
-"know" things about code compiled for a language. In the case of the C
-family, there is an optimization pass that "knows" about the standard C
-library functions. If you call "exit(0)" in main(), it knows that it is
-safe to optimize that into "return 0;" because C specifies what the
-'exit' function does.
-
-In addition to simple library knowledge, it is possible to embed a
-variety of other language-specific information into the LLVM IR. If you
-have a specific need and run into a wall, please bring the topic up on
-the llvmdev list. At the very worst, you can always treat LLVM as if it
-were a "dumb code generator" and implement the high-level optimizations
-you desire in your front-end, on the language-specific AST.
-
-Tips and Tricks
-===============
-
-There is a variety of useful tips and tricks that you come to know after
-working on/with LLVM that aren't obvious at first glance. Instead of
-letting everyone rediscover them, this section talks about some of these
-issues.
-
-Implementing portable offsetof/sizeof
--------------------------------------
-
-One interesting thing that comes up, if you are trying to keep the code
-generated by your compiler "target independent", is that you often need
-to know the size of some LLVM type or the offset of some field in an
-llvm structure. For example, you might need to pass the size of a type
-into a function that allocates memory.
-
-Unfortunately, this can vary widely across targets: for example the
-width of a pointer is trivially target-specific. However, there is a
-`clever way to use the getelementptr
-instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
-that allows you to compute this in a portable way.
-
-Garbage Collected Stack Frames
-------------------------------
-
-Some languages want to explicitly manage their stack frames, often so
-that they are garbage collected or to allow easy implementation of
-closures. There are often better ways to implement these features than
-explicit stack frames, but `LLVM does support
-them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
-if you want. It requires your front-end to convert the code into
-`Continuation Passing
-Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
-the use of tail calls (which LLVM also supports).
+Chapter 8 Introduction
+======================
+
+Welcome to Chapter 8 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a
+decent little programming language with functions and variables.
+What happens if something goes wrong though, how do you debug your
+program?
+
+Source level debugging uses formatted data that helps a debugger
+translate from binary and the state of the machine back to the
+source that the programmer wrote. In LLVM we generally use a format
+called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
+that represents types, source locations, and variable locations.
+
+The short summary of this chapter is that we'll go through the
+various things you have to add to a programming language to
+support debug info, and how you translate that into DWARF.
+
+Caveat: For now we can't debug via the JIT, so we'll need to compile
+our program down to something small and standalone. As part of this
+we'll make a few modifications to the running of the language and
+how programs are compiled. This means that we'll have a source file
+with a simple program written in Kaleidoscope rather than the
+interactive JIT. It does involve a limitation that we can only
+have one "top level" command at a time to reduce the number of
+changes necessary.
+
+Here's the sample program we'll be compiling:
+
+.. code-block:: python
+
+ def fib(x)
+ if x < 3 then
+ 1
+ else
+ fib(x-1)+fib(x-2);
+
+ fib(10)
+
+
+Why is this a hard problem?
+===========================
+
+Debug information is a hard problem for a few different reasons - mostly
+centered around optimized code. First, optimization makes keeping source
+locations more difficult. In LLVM IR we keep the original source location
+for each IR level instruction on the instruction. Optimization passes
+should keep the source locations for newly created instructions, but merged
+instructions only get to keep a single location - this can cause jumping
+around when stepping through optimized programs. Secondly, optimization
+can move variables in ways that are either optimized out, shared in memory
+with other variables, or difficult to track. For the purposes of this
+tutorial we're going to avoid optimization (as you'll see with one of the
+next sets of patches).
+
+Ahead-of-Time Compilation Mode
+==============================
+
+To highlight only the aspects of adding debug information to a source
+language without needing to worry about the complexities of JIT debugging
+we're going to make a few changes to Kaleidoscope to support compiling
+the IR emitted by the front end into a simple standalone program that
+you can execute, debug, and see results.
+
+First we make our anonymous function that contains our top level
+statement be our "main":
+
+.. code-block:: udiff
+
+ - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
+ + PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>());
+
+just with the simple change of giving it a name.
+
+Then we're going to remove the command line code wherever it exists:
+
+.. code-block:: udiff
+
+ @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
+ /// top ::= definition | external | expression | ';'
+ static void MainLoop() {
+ while (1) {
+ - fprintf(stderr, "ready> ");
+ switch (CurTok) {
+ case tok_eof:
+ return;
+ @@ -1184,7 +1183,6 @@ int main() {
+ BinopPrecedence['*'] = 40; // highest.
+
+ // Prime the first token.
+ - fprintf(stderr, "ready> ");
+ getNextToken();
+
+Lastly we're going to disable all of the optimization passes and the JIT so
+that the only thing that happens after we're done parsing and generating
+code is that the llvm IR goes to standard error:
+
+.. code-block:: udiff
+
+ @@ -1108,17 +1108,8 @@ static void HandleExtern() {
+ static void HandleTopLevelExpression() {
+ // Evaluate a top-level expression into an anonymous function.
+ if (FunctionAST *F = ParseTopLevelExpr()) {
+ - if (Function *LF = F->Codegen()) {
+ - // We're just doing this to make sure it executes.
+ - TheExecutionEngine->finalizeObject();
+ - // JIT the function, returning a function pointer.
+ - void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
+ -
+ - // Cast it to the right type (takes no arguments, returns a double) so we
+ - // can call it as a native function.
+ - double (*FP)() = (double (*)())(intptr_t)FPtr;
+ - // Ignore the return value for this.
+ - (void)FP;
+ + if (!F->Codegen()) {
+ + fprintf(stderr, "Error generating code for top level expr");
+ }
+ } else {
+ // Skip token for error recovery.
+ @@ -1439,11 +1459,11 @@ int main() {
+ // target lays out data structures.
+ TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
+ OurFPM.add(new DataLayoutPass());
+ +#if 0
+ OurFPM.add(createBasicAliasAnalysisPass());
+ // Promote allocas to registers.
+ OurFPM.add(createPromoteMemoryToRegisterPass());
+ @@ -1218,7 +1210,7 @@ int main() {
+ OurFPM.add(createGVNPass());
+ // Simplify the control flow graph (deleting unreachable blocks, etc).
+ OurFPM.add(createCFGSimplificationPass());
+ -
+ + #endif
+ OurFPM.doInitialization();
+
+ // Set the global so the code gen can use this.
+
+This relatively small set of changes get us to the point that we can compile
+our piece of Kaleidoscope language down to an executable program via this
+command line:
+
+.. code-block:: bash
+
+ Kaleidoscope-Ch8 < fib.ks | & clang -x ir -
+
+which gives an a.out/a.exe in the current working directory.
+
+Compile Unit
+============
+
+The top level container for a section of code in DWARF is a compile unit.
+This contains the type and function data for an individual translation unit
+(read: one file of source code). So the first thing we need to do is
+construct one for our fib.ks file.
+
+DWARF Emission Setup
+====================
+
+Similar to the ``IRBuilder`` class we have a
+```DIBuilder`` <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
+that helps in constructing debug metadata for an llvm IR file. It
+corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names.
+Using it does require that you be more familiar with DWARF terminology than
+you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
+read through the general documentation on the
+```Metadata Format`` <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
+should be a little more clear. We'll be using this class to construct all
+of our IR level descriptions. Construction for it takes a module so we
+need to construct it shortly after we construct our module. We've left it
+as a global static variable to make it a bit easier to use.
+
+Next we're going to create a small container to cache some of our frequent
+data. The first will be our compile unit, but we'll also write a bit of
+code for our one type since we won't have to worry about multiple typed
+expressions:
+
+.. code-block:: c++
+
+ static DIBuilder *DBuilder;
+
+ struct DebugInfo {
+ DICompileUnit TheCU;
+ DIType DblTy;
+
+ DIType getDoubleTy();
+ } KSDbgInfo;
+
+ DIType DebugInfo::getDoubleTy() {
+ if (DblTy.isValid())
+ return DblTy;
+
+ DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
+ return DblTy;
+ }
+
+And then later on in ``main`` when we're constructing our module:
+
+.. code-block:: c++
+
+ DBuilder = new DIBuilder(*TheModule);
+
+ KSDbgInfo.TheCU = DBuilder->createCompileUnit(
+ dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
+
+There are a couple of things to note here. First, while we're producing a
+compile unit for a language called Kaleidoscope we used the language
+constant for C. This is because a debugger wouldn't necessarily understand
+the calling conventions or default ABI for a language it doesn't recognize
+and we follow the C ABI in our llvm code generation so it's the closest
+thing to accurate. This ensures we can actually call functions from the
+debugger and have them execute. Secondly, you'll see the "fib.ks" in the
+call to ``createCompileUnit``. This is a default hard coded value since
+we're using shell redirection to put our source into the Kaleidoscope
+compiler. In a usual front end you'd have an input file name and it would
+go there.
+
+One last thing as part of emitting debug information via DIBuilder is that
+we need to "finalize" the debug information. The reasons are part of the
+underlying API for DIBuilder, but make sure you do this near the end of
+main:
+
+.. code-block:: c++
+
+ DBuilder->finalize();
+
+before you dump out the module.
+
+Functions
+=========
+
+Now that we have our ``Compile Unit`` and our source locations, we can add
+function definitions to the debug info. So in ``PrototypeAST::Codegen`` we
+add a few lines of code to describe a context for our subprogram, in this
+case the "File", and the actual definition of the function itself.
+
+So the context:
+
+.. code-block:: c++
+
+ DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
+ KSDbgInfo.TheCU.getDirectory());
+
+giving us a DIFile and asking the ``Compile Unit`` we created above for the
+directory and filename where we are currently. Then, for now, we use some
+source locations of 0 (since our AST doesn't currently have source location
+information) and construct our function definition:
+
+.. code-block:: c++
+
+ DIDescriptor FContext(Unit);
+ unsigned LineNo = 0;
+ unsigned ScopeLine = 0;
+ DISubprogram SP = DBuilder->createFunction(
+ FContext, Name, StringRef(), Unit, LineNo,
+ CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
+ true /* definition */, ScopeLine, DIDescriptor::FlagPrototyped, false, F);
+
+and we now have a DISubprogram that contains a reference to all of our metadata
+for the function.
+
+Source Locations
+================
+
+The most important thing for debug information is accurate source location -
+this makes it possible to map your source code back. We have a problem though,
+Kaleidoscope really doesn't have any source location information in the lexer
+or parser so we'll need to add it.
+
+.. code-block:: c++
+
+ struct SourceLocation {
+ int Line;
+ int Col;
+ };
+ static SourceLocation CurLoc;
+ static SourceLocation LexLoc = {1, 0};
+
+ static int advance() {
+ int LastChar = getchar();
+
+ if (LastChar == '\n' || LastChar == '\r') {
+ LexLoc.Line++;
+ LexLoc.Col = 0;
+ } else
+ LexLoc.Col++;
+ return LastChar;
+ }
+
+In this set of code we've added some functionality on how to keep track of the
+line and column of the "source file". As we lex every token we set our current
+current "lexical location" to the assorted line and column for the beginning
+of the token. We do this by overriding all of the previous calls to
+``getchar()`` with our new ``advance()`` that keeps track of the information
+and then we have added to all of our AST classes a source location:
+
+.. code-block:: c++
+
+ class ExprAST {
+ SourceLocation Loc;
+
+ public:
+ int getLine() const { return Loc.Line; }
+ int getCol() const { return Loc.Col; }
+ ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
+ virtual std::ostream &dump(std::ostream &out, int ind) {
+ return out << ':' << getLine() << ':' << getCol() << '\n';
+ }
+
+that we pass down through when we create a new expression:
+
+.. code-block:: c++
+
+ LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);
+
+giving us locations for each of our expressions and variables.
+
+From this we can make sure to tell ``DIBuilder`` when we're at a new source
+location so it can use that when we generate the rest of our code and make
+sure that each instruction has source location information. We do this
+by constructing another small function:
+
+.. code-block:: c++
+
+ void DebugInfo::emitLocation(ExprAST *AST) {
+ DIScope *Scope;
+ if (LexicalBlocks.empty())
+ Scope = &TheCU;
+ else
+ Scope = LexicalBlocks.back();
+ Builder.SetCurrentDebugLocation(
+ DebugLoc::get(AST->getLine(), AST->getCol(), DIScope(*Scope)));
+ }
+
+that both tells the main ``IRBuilder`` where we are, but also what scope
+we're in. Since we've just created a function above we can either be in
+the main file scope (like when we created our function), or now we can be
+in the function scope we just created. To represent this we create a stack
+of scopes:
+
+.. code-block:: c++
+
+ std::vector<DIScope *> LexicalBlocks;
+ std::map<const PrototypeAST *, DIScope> FnScopeMap;
+
+and keep a map of each function to the scope that it represents (a DISubprogram
+is also a DIScope).
+
+Then we make sure to:
+
+.. code-block:: c++
+
+ KSDbgInfo.emitLocation(this);
+
+emit the location every time we start to generate code for a new AST, and
+also:
+
+.. code-block:: c++
+
+ KSDbgInfo.FnScopeMap[this] = SP;
+
+store the scope (function) when we create it and use it:
+
+ KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]);
+
+when we start generating the code for each function.
+
+also, don't forget to pop the scope back off of your scope stack at the
+end of the code generation for the function:
+
+.. code-block:: c++
+
+ // Pop off the lexical block for the function since we added it
+ // unconditionally.
+ KSDbgInfo.LexicalBlocks.pop_back();
+
+Variables
+=========
+
+Now that we have functions, we need to be able to print out the variables
+we have in scope. Let's get our function arguments set up so we can get
+decent backtraces and see how our functions are being called. It isn't
+a lot of code, and we generally handle it when we're creating the
+argument allocas in ``PrototypeAST::CreateArgumentAllocas``.
+
+.. code-block:: c++
+
+ DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
+ DIFile Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
+ KSDbgInfo.TheCU.getDirectory());
+ DIVariable D = DBuilder->createLocalVariable(dwarf::DW_TAG_arg_variable,
+ *Scope, Args[Idx], Unit, Line,
+ KSDbgInfo.getDoubleTy(), Idx);
+
+ Instruction *Call = DBuilder->insertDeclare(
+ Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock());
+ Call->setDebugLoc(DebugLoc::get(Line, 0, *Scope));
+
+Here we're doing a few things. First, we're grabbing our current scope
+for the variable so we can say what range of code our variable is valid
+through. Second, we're creating the variable, giving it the scope,
+the name, source location, type, and since it's an argument, the argument
+index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR
+level that we've got a variable in an alloca (and it gives a starting
+location for the variable). Lastly, we set a source location for the
+beginning of the scope on the declare.
+
+One interesting thing to note at this point is that various debuggers have
+assumptions based on how code and debug information was generated for them
+in the past. In this case we need to do a little bit of a hack to avoid
+generating line information for the function prologue so that the debugger
+knows to skip over those instructions when setting a breakpoint. So in
+``FunctionAST::CodeGen`` we add a couple of lines:
+
+.. code-block:: c++
+
+ // Unset the location for the prologue emission (leading instructions with no
+ // location in a function are considered part of the prologue and the debugger
+ // will run past them when breaking on a function)
+ KSDbgInfo.emitLocation(nullptr);
+
+and then emit a new location when we actually start generating code for the
+body of the function:
+
+.. code-block:: c++
+
+ KSDbgInfo.emitLocation(Body);
+
+With this we have enough debug information to set breakpoints in functions,
+print out argument variables, and call functions. Not too bad for just a
+few simple lines of code!
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+debug information. To build this example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+ # Run
+ ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
+ :language: c++
+
+`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_
diff --git a/docs/tutorial/LangImpl9.rst b/docs/tutorial/LangImpl9.rst
new file mode 100644
index 0000000..3398768
--- /dev/null
+++ b/docs/tutorial/LangImpl9.rst
@@ -0,0 +1,262 @@
+======================================================
+Kaleidoscope: Conclusion and other useful LLVM tidbits
+======================================================
+
+.. contents::
+ :local:
+
+Tutorial Conclusion
+===================
+
+Welcome to the final chapter of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
+grown our little Kaleidoscope language from being a useless toy, to
+being a semi-interesting (but probably still useless) toy. :)
+
+It is interesting to see how far we've come, and how little code it has
+taken. We built the entire lexer, parser, AST, code generator, an
+interactive run-loop (with a JIT!), and emitted debug information in
+standalone executables - all in under 1000 lines of (non-comment/non-blank)
+code.
+
+Our little language supports a couple of interesting features: it
+supports user defined binary and unary operators, it uses JIT
+compilation for immediate evaluation, and it supports a few control flow
+constructs with SSA construction.
+
+Part of the idea of this tutorial was to show you how easy and fun it
+can be to define, build, and play with languages. Building a compiler
+need not be a scary or mystical process! Now that you've seen some of
+the basics, I strongly encourage you to take the code and hack on it.
+For example, try adding:
+
+- **global variables** - While global variables have questional value
+ in modern software engineering, they are often useful when putting
+ together quick little hacks like the Kaleidoscope compiler itself.
+ Fortunately, our current setup makes it very easy to add global
+ variables: just have value lookup check to see if an unresolved
+ variable is in the global variable symbol table before rejecting it.
+ To create a new global variable, make an instance of the LLVM
+ ``GlobalVariable`` class.
+- **typed variables** - Kaleidoscope currently only supports variables
+ of type double. This gives the language a very nice elegance, because
+ only supporting one type means that you never have to specify types.
+ Different languages have different ways of handling this. The easiest
+ way is to require the user to specify types for every variable
+ definition, and record the type of the variable in the symbol table
+ along with its Value\*.
+- **arrays, structs, vectors, etc** - Once you add types, you can start
+ extending the type system in all sorts of interesting ways. Simple
+ arrays are very easy and are quite useful for many different
+ applications. Adding them is mostly an exercise in learning how the
+ LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
+ works: it is so nifty/unconventional, it `has its own
+ FAQ <../GetElementPtr.html>`_! If you add support for recursive types
+ (e.g. linked lists), make sure to read the `section in the LLVM
+ Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
+ describes how to construct them.
+- **standard runtime** - Our current language allows the user to access
+ arbitrary external functions, and we use it for things like "printd"
+ and "putchard". As you extend the language to add higher-level
+ constructs, often these constructs make the most sense if they are
+ lowered to calls into a language-supplied runtime. For example, if
+ you add hash tables to the language, it would probably make sense to
+ add the routines to a runtime, instead of inlining them all the way.
+- **memory management** - Currently we can only access the stack in
+ Kaleidoscope. It would also be useful to be able to allocate heap
+ memory, either with calls to the standard libc malloc/free interface
+ or with a garbage collector. If you would like to use garbage
+ collection, note that LLVM fully supports `Accurate Garbage
+ Collection <../GarbageCollection.html>`_ including algorithms that
+ move objects and need to scan/update the stack.
+- **exception handling support** - LLVM supports generation of `zero
+ cost exceptions <../ExceptionHandling.html>`_ which interoperate with
+ code compiled in other languages. You could also generate code by
+ implicitly making every function return an error value and checking
+ it. You could also make explicit use of setjmp/longjmp. There are
+ many different ways to go here.
+- **object orientation, generics, database access, complex numbers,
+ geometric programming, ...** - Really, there is no end of crazy
+ features that you can add to the language.
+- **unusual domains** - We've been talking about applying LLVM to a
+ domain that many people are interested in: building a compiler for a
+ specific language. However, there are many other domains that can use
+ compiler technology that are not typically considered. For example,
+ LLVM has been used to implement OpenGL graphics acceleration,
+ translate C++ code to ActionScript, and many other cute and clever
+ things. Maybe you will be the first to JIT compile a regular
+ expression interpreter into native code with LLVM?
+
+Have fun - try doing something crazy and unusual. Building a language
+like everyone else always has, is much less fun than trying something a
+little crazy or off the wall and seeing how it turns out. If you get
+stuck or want to talk about it, feel free to email the `llvmdev mailing
+list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots
+of people who are interested in languages and are often willing to help
+out.
+
+Before we end this tutorial, I want to talk about some "tips and tricks"
+for generating LLVM IR. These are some of the more subtle things that
+may not be obvious, but are very useful if you want to take advantage of
+LLVM's capabilities.
+
+Properties of the LLVM IR
+=========================
+
+We have a couple common questions about code in the LLVM IR form - lets
+just get these out of the way right now, shall we?
+
+Target Independence
+-------------------
+
+Kaleidoscope is an example of a "portable language": any program written
+in Kaleidoscope will work the same way on any target that it runs on.
+Many other languages have this property, e.g. lisp, java, haskell,
+javascript, python, etc (note that while these languages are portable,
+not all their libraries are).
+
+One nice aspect of LLVM is that it is often capable of preserving target
+independence in the IR: you can take the LLVM IR for a
+Kaleidoscope-compiled program and run it on any target that LLVM
+supports, even emitting C code and compiling that on targets that LLVM
+doesn't support natively. You can trivially tell that the Kaleidoscope
+compiler generates target-independent code because it never queries for
+any target-specific information when generating code.
+
+The fact that LLVM provides a compact, target-independent,
+representation for code gets a lot of people excited. Unfortunately,
+these people are usually thinking about C or a language from the C
+family when they are asking questions about language portability. I say
+"unfortunately", because there is really no way to make (fully general)
+C code portable, other than shipping the source code around (and of
+course, C source code is not actually portable in general either - ever
+port a really old application from 32- to 64-bits?).
+
+The problem with C (again, in its full generality) is that it is heavily
+laden with target specific assumptions. As one simple example, the
+preprocessor often destructively removes target-independence from the
+code when it processes the input text:
+
+.. code-block:: c
+
+ #ifdef __i386__
+ int X = 1;
+ #else
+ int X = 42;
+ #endif
+
+While it is possible to engineer more and more complex solutions to
+problems like this, it cannot be solved in full generality in a way that
+is better than shipping the actual source code.
+
+That said, there are interesting subsets of C that can be made portable.
+If you are willing to fix primitive types to a fixed size (say int =
+32-bits, and long = 64-bits), don't care about ABI compatibility with
+existing binaries, and are willing to give up some other minor features,
+you can have portable code. This can make sense for specialized domains
+such as an in-kernel language.
+
+Safety Guarantees
+-----------------
+
+Many of the languages above are also "safe" languages: it is impossible
+for a program written in Java to corrupt its address space and crash the
+process (assuming the JVM has no bugs). Safety is an interesting
+property that requires a combination of language design, runtime
+support, and often operating system support.
+
+It is certainly possible to implement a safe language in LLVM, but LLVM
+IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
+casts, use after free bugs, buffer over-runs, and a variety of other
+problems. Safety needs to be implemented as a layer on top of LLVM and,
+conveniently, several groups have investigated this. Ask on the `llvmdev
+mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if
+you are interested in more details.
+
+Language-Specific Optimizations
+-------------------------------
+
+One thing about LLVM that turns off many people is that it does not
+solve all the world's problems in one system (sorry 'world hunger',
+someone else will have to solve you some other day). One specific
+complaint is that people perceive LLVM as being incapable of performing
+high-level language-specific optimization: LLVM "loses too much
+information".
+
+Unfortunately, this is really not the place to give you a full and
+unified version of "Chris Lattner's theory of compiler design". Instead,
+I'll make a few observations:
+
+First, you're right that LLVM does lose information. For example, as of
+this writing, there is no way to distinguish in the LLVM IR whether an
+SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
+than debug info). Both get compiled down to an 'i32' value and the
+information about what it came from is lost. The more general issue
+here, is that the LLVM type system uses "structural equivalence" instead
+of "name equivalence". Another place this surprises people is if you
+have two types in a high-level language that have the same structure
+(e.g. two different structs that have a single int field): these types
+will compile down into a single LLVM type and it will be impossible to
+tell what it came from.
+
+Second, while LLVM does lose information, LLVM is not a fixed target: we
+continue to enhance and improve it in many different ways. In addition
+to adding new features (LLVM did not always support exceptions or debug
+info), we also extend the IR to capture important information for
+optimization (e.g. whether an argument is sign or zero extended,
+information about pointers aliasing, etc). Many of the enhancements are
+user-driven: people want LLVM to include some specific feature, so they
+go ahead and extend it.
+
+Third, it is *possible and easy* to add language-specific optimizations,
+and you have a number of choices in how to do it. As one trivial
+example, it is easy to add language-specific optimization passes that
+"know" things about code compiled for a language. In the case of the C
+family, there is an optimization pass that "knows" about the standard C
+library functions. If you call "exit(0)" in main(), it knows that it is
+safe to optimize that into "return 0;" because C specifies what the
+'exit' function does.
+
+In addition to simple library knowledge, it is possible to embed a
+variety of other language-specific information into the LLVM IR. If you
+have a specific need and run into a wall, please bring the topic up on
+the llvmdev list. At the very worst, you can always treat LLVM as if it
+were a "dumb code generator" and implement the high-level optimizations
+you desire in your front-end, on the language-specific AST.
+
+Tips and Tricks
+===============
+
+There is a variety of useful tips and tricks that you come to know after
+working on/with LLVM that aren't obvious at first glance. Instead of
+letting everyone rediscover them, this section talks about some of these
+issues.
+
+Implementing portable offsetof/sizeof
+-------------------------------------
+
+One interesting thing that comes up, if you are trying to keep the code
+generated by your compiler "target independent", is that you often need
+to know the size of some LLVM type or the offset of some field in an
+llvm structure. For example, you might need to pass the size of a type
+into a function that allocates memory.
+
+Unfortunately, this can vary widely across targets: for example the
+width of a pointer is trivially target-specific. However, there is a
+`clever way to use the getelementptr
+instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
+that allows you to compute this in a portable way.
+
+Garbage Collected Stack Frames
+------------------------------
+
+Some languages want to explicitly manage their stack frames, often so
+that they are garbage collected or to allow easy implementation of
+closures. There are often better ways to implement these features than
+explicit stack frames, but `LLVM does support
+them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
+if you want. It requires your front-end to convert the code into
+`Continuation Passing
+Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
+the use of tail calls (which LLVM also supports).
+