Diffstat (limited to 'docs')
-rw-r--r--  docs/BitCodeFormat.rst                             |    5
-rw-r--r--  docs/CMakeLists.txt                                |   51
-rw-r--r--  docs/CodeGenerator.rst                             |   14
-rw-r--r--  docs/CommandGuide/FileCheck.rst                    |   58
-rw-r--r--  docs/CommandGuide/lit.rst                          |    7
-rw-r--r--  docs/CommandGuide/llvm-extract.rst                 |   65
-rw-r--r--  docs/CommandGuide/llvm-nm.rst                      |   61
-rw-r--r--  docs/CompilerWriterInfo.rst                        |    3
-rw-r--r--  docs/DeveloperPolicy.rst                           |   26
-rw-r--r--  docs/Extensions.rst                                |   52
-rw-r--r--  docs/GarbageCollection.rst                         |  118
-rw-r--r--  docs/GettingStarted.rst                            |   21
-rw-r--r--  docs/GettingStartedVS.rst                          |    2
-rw-r--r--  docs/GoldPlugin.rst                                |   36
-rw-r--r--  docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt   |    2
-rw-r--r--  docs/HowToBuildOnARM.rst                           |   38
-rw-r--r--  docs/HowToCrossCompileLLVM.rst                     |  175
-rw-r--r--  docs/LangRef.rst                                   |  366
-rw-r--r--  docs/MCJIT-creation.png                            |  bin 0 -> 26456 bytes
-rw-r--r--  docs/MCJIT-dyld-load.png                           |  bin 0 -> 38960 bytes
-rw-r--r--  docs/MCJIT-engine-builder.png                      |  bin 0 -> 18731 bytes
-rw-r--r--  docs/MCJIT-load-object.png                         |  bin 0 -> 76467 bytes
-rw-r--r--  docs/MCJIT-load.png                                |  bin 0 -> 27365 bytes
-rw-r--r--  docs/MCJIT-resolve-relocations.png                 |  bin 0 -> 57621 bytes
-rw-r--r--  docs/MCJITDesignAndImplementation.rst              |  180
-rw-r--r--  docs/Makefile                                      |   15
-rw-r--r--  docs/MakefileGuide.rst                             |   10
-rw-r--r--  docs/NVPTXUsage.rst                                |  704
-rw-r--r--  docs/Passes.rst                                    |    2
-rw-r--r--  docs/Phabricator.rst                               |    2
-rw-r--r--  docs/ProgrammersManual.rst                         |    4
-rw-r--r--  docs/ReleaseNotes.rst                              |  197
-rw-r--r--  docs/ReleaseProcess.rst                            |   12
-rw-r--r--  docs/SourceLevelDebugging.rst                      |  142
-rw-r--r--  docs/TestingGuide.rst                              |   14
-rw-r--r--  docs/WritingAnLLVMBackend.rst                      |   56
-rw-r--r--  docs/WritingAnLLVMPass.rst                         |    2
-rw-r--r--  docs/YamlIO.rst                                    |   22
-rw-r--r--  docs/doxygen.cfg.in                                |   13
-rw-r--r--  docs/index.rst                                     |    9
-rw-r--r--  docs/tutorial/OCamlLangImpl2.rst                   |    3
-rw-r--r--  docs/yaml2obj.rst                                  |    2
42 files changed, 2074 insertions, 415 deletions
diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst
index c83b6c1..d9d1df0 100644
--- a/docs/BitCodeFormat.rst
+++ b/docs/BitCodeFormat.rst
@@ -718,7 +718,7 @@ global variable. The operand fields are:
MODULE_CODE_FUNCTION Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]``
+``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prefix]``
The ``FUNCTION`` record (code 8) marks the declaration or definition of a
function. The operand fields are:
@@ -757,6 +757,9 @@ function. The operand fields are:
* *unnamed_addr*: If present and non-zero, indicates that the function has
``unnamed_addr``
+* *prefix*: If non-zero, the value index of the prefix data for this function,
+ plus 1.
+
MODULE_CODE_ALIAS Record
^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt
new file mode 100644
index 0000000..8c49aa5
--- /dev/null
+++ b/docs/CMakeLists.txt
@@ -0,0 +1,51 @@
+
+if (DOXYGEN_FOUND)
+if (LLVM_ENABLE_DOXYGEN)
+ set(abs_top_srcdir ${LLVM_MAIN_SRC_DIR})
+ set(abs_top_builddir ${LLVM_BINARY_DIR})
+
+ if (HAVE_DOT)
+ set(DOT ${LLVM_PATH_DOT})
+ endif()
+
+ if (LLVM_DOXYGEN_EXTERNAL_SEARCH)
+ set(enable_searchengine "YES")
+ set(searchengine_url "${LLVM_DOXYGEN_SEARCHENGINE_URL}")
+ set(enable_server_based_search "YES")
+ set(enable_external_search "YES")
+ set(extra_search_mappings "${LLVM_DOXYGEN_SEARCH_MAPPINGS}")
+ else()
+ set(enable_searchengine "NO")
+ set(searchengine_url "")
+ set(enable_server_based_search "NO")
+ set(enable_external_search "NO")
+ set(extra_search_mappings "")
+ endif()
+
+ configure_file(${CMAKE_CURRENT_SOURCE_DIR}/doxygen.cfg.in
+ ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg @ONLY)
+
+ set(abs_top_srcdir)
+ set(abs_top_builddir)
+ set(DOT)
+ set(enable_searchengine)
+ set(searchengine_url)
+ set(enable_server_based_search)
+ set(enable_external_search)
+ set(extra_search_mappings)
+
+ add_custom_target(doxygen-llvm
+ COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/doxygen.cfg
+ WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+ COMMENT "Generating llvm doxygen documentation." VERBATIM)
+
+ if (LLVM_BUILD_DOCS)
+ add_dependencies(doxygen doxygen-llvm)
+ endif()
+
+ if (NOT LLVM_INSTALL_TOOLCHAIN_ONLY)
+ install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/doxygen/html
+ DESTINATION docs/html)
+ endif()
+endif()
+endif()
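
A minimal sketch of how the new target might be exercised from a build
directory (the generator choice, source path, and the presence of doxygen on
the host are assumptions, not requirements of the file above):

.. code-block:: bash

   # Configure with doxygen support enabled; LLVM_ENABLE_DOXYGEN and the
   # doxygen-llvm target come from the CMakeLists.txt above.
   cmake -G Ninja ../llvm -DLLVM_ENABLE_DOXYGEN=ON
   ninja doxygen-llvm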
diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst
index 1f2dc6c..c87a628 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -636,6 +636,18 @@ file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation
that prints out a directive for each method (e.g. ``EmitValue -> .byte``), but
MCObjectStreamer implements a full assembler.
+For target-specific directives, the MCStreamer has an MCTargetStreamer instance.
+Each target that needs them defines a class that inherits from MCTargetStreamer
+and is a lot like MCStreamer itself: it has one method per directive and two
+classes that inherit from it, a target object streamer and a target asm streamer.
+The target asm streamer just prints the directive (``emitFnStart -> .fnstart``),
+and the object streamer implements the assembler logic for it.
+
+To make LLVM use these classes, the target initialization must call
+TargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer
+passing callbacks that allocate the corresponding target streamer and pass it
+to createAsmStreamer or to the appropriate object streamer constructor.
+
The ``MCContext`` class
-----------------------
@@ -1614,7 +1626,7 @@ Implementing a Native Assembler
===============================
Though you're probably reading this because you want to write or maintain a
-compiler backend, LLVM also fully supports building a native assemblers too.
+compiler backend, LLVM also fully supports building a native assembler.
We've tried hard to automate the generation of the assembler from the .td files
(in particular the instruction syntax and encodings), which means that a large
part of the manual and repetitive data entry can be factored and shared with the
diff --git a/docs/CommandGuide/FileCheck.rst b/docs/CommandGuide/FileCheck.rst
index 6be5fc3..5a60d60 100644
--- a/docs/CommandGuide/FileCheck.rst
+++ b/docs/CommandGuide/FileCheck.rst
@@ -30,11 +30,13 @@ OPTIONS
.. option:: --check-prefix prefix
- FileCheck searches the contents of ``match-filename`` for patterns to match.
- By default, these patterns are prefixed with "``CHECK:``". If you'd like to
- use a different prefix (e.g. because the same input file is checking multiple
- different tool or options), the :option:`--check-prefix` argument allows you
- to specify a specific prefix to match.
+ FileCheck searches the contents of ``match-filename`` for patterns to
+ match. By default, these patterns are prefixed with "``CHECK:``".
+ If you'd like to use a different prefix (e.g. because the same input
+ file is checking multiple different tools or options), the
+ :option:`--check-prefix` argument allows you to specify one or more
+ prefixes to match. Multiple prefixes are useful for tests which might
+ change for different run options, but most lines remain the same.
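
For example, a test that shares most lines between two configurations might be
run roughly as follows (a sketch only; the test name, ``llc`` flags, and the
``COMMON``/``X86``/``ARM`` prefixes are illustrative):

.. code-block:: bash

   # Lines common to both configurations use "COMMON:",
   # configuration-specific lines use "X86:" or "ARM:".
   llc -march=x86 < test.ll | FileCheck test.ll --check-prefix=COMMON --check-prefix=X86
   llc -march=arm < test.ll | FileCheck test.ll --check-prefix=COMMON --check-prefix=ARM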
.. option:: --input-file filename
@@ -216,6 +218,19 @@ in the natural order:
Bar b;
// CHECK-DAG: @_ZTV3Bar =
+``CHECK-NOT:`` directives can be mixed with ``CHECK-DAG:`` directives to
+exclude strings between the surrounding ``CHECK-DAG:`` directives. As a result,
+the surrounding ``CHECK-DAG:`` directives cannot be reordered, i.e. all
+occurrences matching ``CHECK-DAG:`` before ``CHECK-NOT:`` must not fall behind
+occurrences matching ``CHECK-DAG:`` after ``CHECK-NOT:``. For example,
+
+.. code-block:: llvm
+
+ ; CHECK-DAG: BEFORE
+ ; CHECK-NOT: NOT
+ ; CHECK-DAG: AFTER
+
+This case will reject input strings where ``BEFORE`` occurs after ``AFTER``.
With captured variables, ``CHECK-DAG:`` is able to match valid topological
orderings of a DAG with edges from the definition of a variable to its use.
@@ -230,19 +245,34 @@ sequences from the instruction scheduler. For example,
In this case, any order of those two ``add`` instructions will be allowed.
-``CHECK-NOT:`` directives could be mixed with ``CHECK-DAG:`` directives to
-exclude strings between the surrounding ``CHECK-DAG:`` directives. As a result,
-the surrounding ``CHECK-DAG:`` directives cannot be reordered, i.e. all
-occurrences matching ``CHECK-DAG:`` before ``CHECK-NOT:`` must not fall behind
-occurrences matching ``CHECK-DAG:`` after ``CHECK-NOT:``. For example,
+If you are defining `and` using variables in the same ``CHECK-DAG:`` block,
+be aware that the definition rule can match `after` its use.
+
+So, for instance, the code below will pass:
.. code-block:: llvm
- ; CHECK-DAG: BEFORE
- ; CHECK-NOT: NOT
- ; CHECK-DAG: AFTER
+ ; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0]
+ ; CHECK-DAG: vmov.32 [[REG2]][1]
+ vmov.32 d0[1]
+ vmov.32 d0[0]
-This case will reject input strings where ``BEFORE`` occurs after ``AFTER``.
+While this other code will not:
+
+.. code-block:: llvm
+
+ ; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0]
+ ; CHECK-DAG: vmov.32 [[REG2]][1]
+ vmov.32 d1[1]
+ vmov.32 d0[0]
+
+While this can be very useful, it's also dangerous, because in the case of
+register sequences, you must have a strong order (read before write, copy
+before use, etc.). If the definition your test is looking for doesn't match
+(because of a bug in the compiler), it may match further away from the use,
+and mask real bugs.
+
+In those cases, to enforce the order, use a non-DAG directive between DAG-blocks.
The "CHECK-LABEL:" directive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst
index a4681fb..4d84be6 100644
--- a/docs/CommandGuide/lit.rst
+++ b/docs/CommandGuide/lit.rst
@@ -149,12 +149,11 @@ ADDITIONAL OPTIONS
.. option:: --show-suites
- List the discovered test suites as part of the standard output.
+ List the discovered test suites and exit.
-.. option:: --repeat=N
+.. option:: --show-tests
- Run each test ``N`` times. Currently this is primarily useful for timing
- tests, other results are not collated in any reasonable fashion.
+ List all of the discovered tests and exit.
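
For example (a sketch; the test path is illustrative and assumes a directory
with a lit configuration):

.. code-block:: bash

   # Print what lit discovers under the given path, without running anything.
   lit --show-suites llvm/test
   lit --show-tests llvm/test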
EXIT STATUS
-----------
diff --git a/docs/CommandGuide/llvm-extract.rst b/docs/CommandGuide/llvm-extract.rst
index d569e35..d0e9c1c 100644
--- a/docs/CommandGuide/llvm-extract.rst
+++ b/docs/CommandGuide/llvm-extract.rst
@@ -1,104 +1,79 @@
llvm-extract - extract a function from an LLVM module
=====================================================
-
SYNOPSIS
--------
-
-**llvm-extract** [*options*] **--func** *function-name* [*filename*]
-
+:program:`llvm-extract` [*options*] **--func** *function-name* [*filename*]
DESCRIPTION
-----------
-
-The **llvm-extract** command takes the name of a function and extracts it from
-the specified LLVM bitcode file. It is primarily used as a debugging tool to
-reduce test cases from larger programs that are triggering a bug.
+The :program:`llvm-extract` command takes the name of a function and extracts
+it from the specified LLVM bitcode file. It is primarily used as a debugging
+tool to reduce test cases from larger programs that are triggering a bug.
In addition to extracting the bitcode of the specified function,
-**llvm-extract** will also remove unreachable global variables, prototypes, and
-unused types.
-
-The **llvm-extract** command reads its input from standard input if filename is
-omitted or if filename is -. The output is always written to standard output,
-unless the **-o** option is specified (see below).
+:program:`llvm-extract` will also remove unreachable global variables,
+prototypes, and unused types.
+The :program:`llvm-extract` command reads its input from standard input if
+filename is omitted or if filename is ``-``. The output is always written to
+standard output, unless the **-o** option is specified (see below).
OPTIONS
-------
-
-
**-f**
- Enable binary output on terminals. Normally, **llvm-extract** will refuse to
- write raw bitcode output if the output stream is a terminal. With this option,
- **llvm-extract** will write raw bitcode regardless of the output device.
-
-
+ Enable binary output on terminals. Normally, :program:`llvm-extract` will
+ refuse to write raw bitcode output if the output stream is a terminal. With
+ this option, :program:`llvm-extract` will write raw bitcode regardless of the
+ output device.
**--func** *function-name*
- Extract the function named *function-name* from the LLVM bitcode. May be
+ Extract the function named *function-name* from the LLVM bitcode. May be
specified multiple times to extract multiple functions at once.
-
-
**--rfunc** *function-regular-expr*
Extract the function(s) matching *function-regular-expr* from the LLVM bitcode.
All functions matching the regular expression will be extracted. May be
specified multiple times.
-
-
**--glob** *global-name*
- Extract the global variable named *global-name* from the LLVM bitcode. May be
+ Extract the global variable named *global-name* from the LLVM bitcode. May be
specified multiple times to extract multiple global variables at once.
-
-
**--rglob** *glob-regular-expr*
Extract the global variable(s) matching *global-regular-expr* from the LLVM
- bitcode. All global variables matching the regular expression will be extracted.
- May be specified multiple times.
-
-
+ bitcode. All global variables matching the regular expression will be
+ extracted. May be specified multiple times.
**-help**
Print a summary of command line options.
-
-
**-o** *filename*
Specify the output filename. If filename is "-" (the default), then
- **llvm-extract** sends its output to standard output.
-
-
+ :program:`llvm-extract` sends its output to standard output.
**-S**
Write output in LLVM intermediate language (instead of bitcode).
-
-
-
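
Putting these options together, a typical invocation might look like this
(a sketch; the file and function names are illustrative):

.. code-block:: bash

   # Extract the function foo from prog.bc and write it as textual IR.
   llvm-extract --func foo -S prog.bc -o foo.ll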
EXIT STATUS
-----------
-
-If **llvm-extract** succeeds, it will exit with 0. Otherwise, if an error
+If :program:`llvm-extract` succeeds, it will exit with 0. Otherwise, if an error
occurs, it will exit with a non-zero value.
-
SEE ALSO
--------
+bugpoint
-bugpoint|bugpoint
diff --git a/docs/CommandGuide/llvm-nm.rst b/docs/CommandGuide/llvm-nm.rst
index cbc7af2..83d9fba 100644
--- a/docs/CommandGuide/llvm-nm.rst
+++ b/docs/CommandGuide/llvm-nm.rst
@@ -1,189 +1,146 @@
llvm-nm - list LLVM bitcode and object file's symbol table
==========================================================
-
SYNOPSIS
--------
-
:program:`llvm-nm` [*options*] [*filenames...*]
-
DESCRIPTION
-----------
-
The :program:`llvm-nm` utility lists the names of symbols from the LLVM bitcode
files, object files, or :program:`ar` archives containing them, named on the
-command line. Each symbol is listed along with some simple information about its
-provenance. If no file name is specified, or *-* is used as a file name,
+command line. Each symbol is listed along with some simple information about
+its provenance. If no file name is specified, or *-* is used as a file name,
:program:`llvm-nm` will process a file on its standard input stream.
:program:`llvm-nm`'s default output format is the traditional BSD :program:`nm`
-output format. Each such output record consists of an (optional) 8-digit
+output format. Each such output record consists of an (optional) 8-digit
hexadecimal address, followed by a type code character, followed by a name, for
-each symbol. One record is printed per line; fields are separated by spaces.
+each symbol. One record is printed per line; fields are separated by spaces.
When the address is omitted, it is replaced by 8 spaces.
Type code characters currently supported, and their meanings, are as follows:
-
U
Named object is referenced but undefined in this bitcode file
-
-
C
Common (multiple definitions link together into one def)
-
-
W
Weak reference (multiple definitions link together into zero or one definitions)
-
-
t
Local function (text) object
-
-
T
Global function (text) object
-
-
d
Local data object
-
-
D
Global data object
-
-
?
Something unrecognizable
-
-
Because LLVM bitcode files typically contain objects that are not considered to
have addresses until they are linked into an executable image or dynamically
compiled "just-in-time", :program:`llvm-nm` does not print an address for any
-symbol in a LLVM bitcode file, even symbols which are defined in the bitcode
+symbol in an LLVM bitcode file, even symbols which are defined in the bitcode
file.
-
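
For instance, typical invocations might look like this (a sketch; the object
file name is illustrative and the options are described below):

.. code-block:: bash

   # List only externally visible symbols that are defined in foo.o.
   llvm-nm --extern-only --defined-only foo.o

   # Sort symbols by size, using POSIX.2 output format.
   llvm-nm -P --size-sort foo.o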
OPTIONS
-------
-
.. program:: llvm-nm
-
.. option:: -B (default)
- Use BSD output format. Alias for :option:`--format=bsd`.
-
+ Use BSD output format. Alias for :option:`--format=bsd`.
.. option:: -P
- Use POSIX.2 output format. Alias for :option:`--format=posix`.
-
+ Use POSIX.2 output format. Alias for :option:`--format=posix`.
.. option:: --debug-syms, -a
Show all symbols, even debugger only.
-
.. option:: --defined-only
Print only symbols defined in this file (as opposed to
symbols which may be referenced by objects in this file, but not
defined in this file.)
-
.. option:: --dynamic, -D
Display dynamic symbols instead of normal symbols.
-
.. option:: --extern-only, -g
Print only symbols whose definitions are external; that is, accessible
from other files.
-
.. option:: --format=format, -f format
- Select an output format; *format* may be *sysv*, *posix*, or *bsd*. The default
+ Select an output format; *format* may be *sysv*, *posix*, or *bsd*. The default
is *bsd*.
-
.. option:: -help
Print a summary of command-line options and their meanings.
-
.. option:: --no-sort, -p
Shows symbols in order encountered.
-
.. option:: --numeric-sort, -n, -v
Sort symbols by address.
-
.. option:: --print-file-name, -A, -o
Precede each symbol with the file it came from.
-
.. option:: --print-size, -S
Show symbol size instead of address.
-
.. option:: --size-sort
Sort symbols by size.
-
.. option:: --undefined-only, -u
Print only symbols referenced but not defined in this file.
-
BUGS
----
-
* :program:`llvm-nm` cannot demangle C++ mangled names, like GNU :program:`nm`
can.
* :program:`llvm-nm` does not support the full set of arguments that GNU
:program:`nm` does.
-
EXIT STATUS
-----------
-
:program:`llvm-nm` exits with an exit code of zero.
-
SEE ALSO
--------
-
-llvm-dis|llvm-dis, ar(1), nm(1)
+llvm-dis, ar(1), nm(1)
diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst
index 6110d0b..7b02a78 100644
--- a/docs/CompilerWriterInfo.rst
+++ b/docs/CompilerWriterInfo.rst
@@ -39,7 +39,7 @@ Itanium (ia64)
MIPS
----
-* `MIPS Processor Architecture <http://mips.com/content/Documentation/MIPSDocumentation/ProcessorArchitecture/doclibrary>`_
+* `MIPS Processor Architecture <http://imgtec.com/mips/mips-architectures.asp>`_
PowerPC
-------
@@ -124,6 +124,7 @@ Linux
* `PowerPC 64-bit ELF ABI Supplement <http://www.linuxbase.org/spec/ELF/ppc64/>`_
* `Procedure Call Standard for the AArch64 Architecture <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055a/IHI0055A_aapcs64.pdf>`_
+* `ELF for the ARM Architecture <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf>`_
* `ELF for the ARM 64-bit Architecture (AArch64) <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056a/IHI0056A_aaelf64.pdf>`_
* `System z ELF ABI Supplement <http://legacy.redhat.com/pub/redhat/linux/7.1/es/os/s390x/doc/lzsabi0.pdf>`_
diff --git a/docs/DeveloperPolicy.rst b/docs/DeveloperPolicy.rst
index 0655559..ea5a7d1 100644
--- a/docs/DeveloperPolicy.rst
+++ b/docs/DeveloperPolicy.rst
@@ -68,6 +68,9 @@ of bugs and enhancements occurring in LLVM. We really appreciate people who are
proactive at catching incoming bugs in their components and dealing with them
promptly.
+Please be aware that all LLVM mailing lists are public and archived, and
+that notices of confidentiality or non-disclosure cannot be respected.
+
.. _patch:
.. _one-off patches:
@@ -107,6 +110,10 @@ rather than ``Content-Disposition: attachment``. Apple Mail gamely displays such
a file inline, making it difficult to work with for reviewers using that
program.
+When submitting patches, please do not add confidentiality or non-disclosure
+notices to the patches themselves. These notices conflict with the `LLVM
+License`_ and may result in your contribution being excluded.
+
.. _code review:
Code Reviews
@@ -128,7 +135,24 @@ software. We generally follow these policies:
all necessary review-related changes.
#. Code review can be an iterative process, which continues until the patch is
- ready to be committed.
+ ready to be committed. Specifically, once a patch is sent out for review, it
+ needs an explicit "looks good" before it is submitted. Do not assume silent
+ approval, or request active objections to the patch with a deadline.
+
+Sometimes code reviews will take longer than you would hope for, especially for
+larger features. Accepted ways to speed up review times for your patches are:
+
+* Review other people's patches. If you help out, everybody will be more
+ willing to do the same for you; goodwill is our currency.
+* Ping the patch. If it is urgent, provide reasons why it is important to you to
+ get this patch landed and ping it every couple of days. If it is
+ not urgent, the common courtesy ping rate is one week. Remember that you're
+ asking for valuable time from other professional developers.
+* Ask for help on IRC. Developers on IRC will be able to either help you
+ directly, or tell you who might be a good reviewer.
+* Split your patch into multiple smaller patches that build on each other. The
+ smaller your patch, the higher the probability that somebody will take a quick
+ look at it.
Developers should participate in code reviews as both reviewers and
reviewees. If someone is kind enough to review your code, you should return the
diff --git a/docs/Extensions.rst b/docs/Extensions.rst
index 78ff874..e308dbc 100644
--- a/docs/Extensions.rst
+++ b/docs/Extensions.rst
@@ -14,6 +14,20 @@ Introduction
This document describes extensions to tools and formats LLVM seeks compatibility
with.
+General Assembly Syntax
+===========================
+
+C99-style Hexadecimal Floating-point Constants
+----------------------------------------------
+
+LLVM's assemblers allow floating-point constants to be written in C99's
+hexadecimal format instead of decimal if desired.
+
+.. code-block:: gas
+
+ .section .data
+ .float 0x1c2.2ap3
+
Machine-specific Assembly Syntax
================================
@@ -91,3 +105,41 @@ Supported COMDAT types:
.section .xdata$foo
.linkonce associative .text$foo
...
+
+``.section`` Directive
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+MC supports passing the information in ``.linkonce`` at the end of
+``.section``. For example, the following two fragments are equivalent:
+
+.. code-block:: gas
+
+ .section secName, "dr", discard, "Symbol1"
+ .globl Symbol1
+ Symbol1:
+ .long 1
+
+.. code-block:: gas
+
+ .section secName, "dr"
+ .linkonce discard
+ .globl Symbol1
+ Symbol1:
+ .long 1
+
+Note that in the combined form the COMDAT symbol is explicit. This
+extension exists to support multiple sections with the same name in
+different COMDATs:
+
+
+.. code-block:: gas
+
+ .section secName, "dr", discard, "Symbol1"
+ .globl Symbol1
+ Symbol1:
+ .long 1
+
+ .section secName, "dr", discard, "Symbol2"
+ .globl Symbol2
+ Symbol2:
+ .long 1
diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst
index b722277..73bc5ee 100644
--- a/docs/GarbageCollection.rst
+++ b/docs/GarbageCollection.rst
@@ -523,7 +523,7 @@ extension):
$ cat sample.ll
define void @f() gc "mygc" {
entry:
- ret void
+ ret void
}
$ llvm-as < sample.ll | llc -load=MyGC.so
@@ -896,21 +896,19 @@ in the JIT, nor using the object writers.
namespace {
class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter {
public:
- virtual void beginAssembly(std::ostream &OS, AsmPrinter &AP,
- const TargetAsmInfo &TAI);
+ virtual void beginAssembly(AsmPrinter &AP);
- virtual void finishAssembly(std::ostream &OS, AsmPrinter &AP,
- const TargetAsmInfo &TAI);
+ virtual void finishAssembly(AsmPrinter &AP);
};
GCMetadataPrinterRegistry::Add<MyGCPrinter>
X("mygc", "My bespoke garbage collector.");
}
-The collector should use ``AsmPrinter`` and ``TargetAsmInfo`` to print portable
-assembly code to the ``std::ostream``. The collector itself contains the stack
-map for the entire module, and may access the ``GCFunctionInfo`` using its own
-``begin()`` and ``end()`` methods. Here's a realistic example:
+The collector should use ``AsmPrinter`` to print portable assembly code. The
+collector itself contains the stack map for the entire module, and may access
+the ``GCFunctionInfo`` using its own ``begin()`` and ``end()`` methods. Here's
+a realistic example:
.. code-block:: c++
@@ -920,85 +918,74 @@ map for the entire module, and may access the ``GCFunctionInfo`` using its own
#include "llvm/Target/TargetAsmInfo.h"
#include "llvm/Target/TargetMachine.h"
- void MyGCPrinter::beginAssembly(std::ostream &OS, AsmPrinter &AP,
- const TargetAsmInfo &TAI) {
+ void MyGCPrinter::beginAssembly(AsmPrinter &AP) {
// Nothing to do.
}
- void MyGCPrinter::finishAssembly(std::ostream &OS, AsmPrinter &AP,
- const TargetAsmInfo &TAI) {
- // Set up for emitting addresses.
- const char *AddressDirective;
- int AddressAlignLog;
- if (AP.TM.getDataLayout()->getPointerSize() == sizeof(int32_t)) {
- AddressDirective = TAI.getData32bitsDirective();
- AddressAlignLog = 2;
- } else {
- AddressDirective = TAI.getData64bitsDirective();
- AddressAlignLog = 3;
- }
+ void MyGCPrinter::finishAssembly(AsmPrinter &AP) {
+ MCStreamer &OS = AP.OutStreamer;
+ unsigned IntPtrSize = AP.TM.getDataLayout()->getPointerSize();
// Put this in the data section.
- AP.SwitchToDataSection(TAI.getDataSection());
+ OS.SwitchSection(AP.getObjFileLowering().getDataSection());
// For each function...
for (iterator FI = begin(), FE = end(); FI != FE; ++FI) {
GCFunctionInfo &MD = **FI;
- // Emit this data structure:
+ // A compact GC layout. Emit this data structure:
//
// struct {
// int32_t PointCount;
- // struct {
- // void *SafePointAddress;
- // int32_t LiveCount;
- // int32_t LiveOffsets[LiveCount];
- // } Points[PointCount];
+ // void *SafePointAddress[PointCount];
+ // int32_t StackFrameSize; // in words
+ // int32_t StackArity;
+ // int32_t LiveCount;
+ // int32_t LiveOffsets[LiveCount];
// } __gcmap_<FUNCTIONNAME>;
// Align to address width.
- AP.EmitAlignment(AddressAlignLog);
-
- // Emit the symbol by which the stack map entry can be found.
- std::string Symbol;
- Symbol += TAI.getGlobalPrefix();
- Symbol += "__gcmap_";
- Symbol += MD.getFunction().getName();
- if (const char *GlobalDirective = TAI.getGlobalDirective())
- OS << GlobalDirective << Symbol << "\n";
- OS << TAI.getGlobalPrefix() << Symbol << ":\n";
+ AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3);
// Emit PointCount.
+ OS.AddComment("safe point count");
AP.EmitInt32(MD.size());
- AP.EOL("safe point count");
// And each safe point...
for (GCFunctionInfo::iterator PI = MD.begin(),
- PE = MD.end(); PI != PE; ++PI) {
- // Align to address width.
- AP.EmitAlignment(AddressAlignLog);
-
+ PE = MD.end(); PI != PE; ++PI) {
// Emit the address of the safe point.
- OS << AddressDirective
- << TAI.getPrivateGlobalPrefix() << "label" << PI->Num;
- AP.EOL("safe point address");
-
- // Emit the stack frame size.
- AP.EmitInt32(MD.getFrameSize());
- AP.EOL("stack frame size");
-
- // Emit the number of live roots in the function.
- AP.EmitInt32(MD.live_size(PI));
- AP.EOL("live root count");
-
- // And for each live root...
- for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI),
- LE = MD.live_end(PI);
- LI != LE; ++LI) {
- // Print its offset within the stack frame.
- AP.EmitInt32(LI->StackOffset);
- AP.EOL("stack offset");
- }
+ OS.AddComment("safe point address");
+ MCSymbol *Label = PI->Label;
+ AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/);
+ }
+
+ // Stack information never changes in safe points! Only print info from the
+ // first call-site.
+ GCFunctionInfo::iterator PI = MD.begin();
+
+ // Emit the stack frame size.
+ OS.AddComment("stack frame size (in words)");
+ AP.EmitInt32(MD.getFrameSize() / IntPtrSize);
+
+ // Emit stack arity, i.e. the number of stacked arguments.
+ unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6;
+ unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ?
+ MD.getFunction().arg_size() - RegisteredArgs : 0;
+ OS.AddComment("stack arity");
+ AP.EmitInt32(StackArity);
+
+ // Emit the number of live roots in the function.
+ OS.AddComment("live root count");
+ AP.EmitInt32(MD.live_size(PI));
+
+ // And for each live root...
+ for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI),
+ LE = MD.live_end(PI);
+ LI != LE; ++LI) {
+ // Emit live root's offset within the stack frame.
+ OS.AddComment("stack index (offset / wordsize)");
+ AP.EmitInt32(LI->StackOffset);
}
}
}
@@ -1026,4 +1013,3 @@ programming.
[Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment
<http://citeseer.ist.psu.edu/henderson02accurate.html>`__
-
diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst
index 40dfc45..2a6b637 100644
--- a/docs/GettingStarted.rst
+++ b/docs/GettingStarted.rst
@@ -217,9 +217,7 @@ uses the package and provides other details.
+--------------------------------------------------------------+-----------------+---------------------------------------------+
| `SVN <http://subversion.tigris.org/project_packages.html>`_ | >=1.3 | Subversion access to LLVM\ :sup:`2` |
+--------------------------------------------------------------+-----------------+---------------------------------------------+
-| `python <http://www.python.org/>`_ | >=2.4 | Automated test suite\ :sup:`3` |
-+--------------------------------------------------------------+-----------------+---------------------------------------------+
-| `perl <http://www.perl.com/download.csp>`_ | >=5.6.0 | Utilities |
+| `python <http://www.python.org/>`_ | >=2.5 | Automated test suite\ :sup:`3` |
+--------------------------------------------------------------+-----------------+---------------------------------------------+
| `GNU M4 <http://savannah.gnu.org/projects/m4>`_ | 1.4 | Macro processor for configuration\ :sup:`4` |
+--------------------------------------------------------------+-----------------+---------------------------------------------+
@@ -459,15 +457,6 @@ The files are as follows, with *x.y* marking the version number:
Source release for the LLVM test-suite.
-``llvm-gcc-4.2-x.y.source.tar.gz``
-
- Source release of the llvm-gcc-4.2 front end. See README.LLVM in the root
- directory for build instructions.
-
-``llvm-gcc-4.2-x.y-platform.tar.gz``
-
- Binary release of the llvm-gcc-4.2 front end for a specific platform.
-
.. _checkout:
Checkout LLVM from Subversion
@@ -490,6 +479,8 @@ you can checkout it from the '``tags``' directory (instead of '``trunk``'). The
following releases are located in the following subdirectories of the '``tags``'
directory:
+* Release 3.3: **RELEASE_33/final**
+* Release 3.2: **RELEASE_32/final**
* Release 3.1: **RELEASE_31/final**
* Release 3.0: **RELEASE_30/final**
* Release 2.9: **RELEASE_29/final**
@@ -939,6 +930,10 @@ GCC compiler supports.
The result of such a build is executables that are not runnable on the build
host (--build option) but can be executed on the compile host (--host option).
+Check :doc:`HowToCrossCompileLLVM` and `Clang docs on how to cross-compile in general
+<http://clang.llvm.org/docs/CrossCompilation.html>`_ for more information
+about cross-compiling.
+
The Location of LLVM Object Files
---------------------------------
@@ -1312,7 +1307,7 @@ Example with clang
Clang works just like GCC by default. The standard -S and -c arguments
work as usual (producing a native .s or .o file, respectively).
-#. Next, compile the C file into a LLVM bitcode file:
+#. Next, compile the C file into an LLVM bitcode file:
.. code-block:: console
diff --git a/docs/GettingStartedVS.rst b/docs/GettingStartedVS.rst
index 9847c83..c46dc83 100644
--- a/docs/GettingStartedVS.rst
+++ b/docs/GettingStartedVS.rst
@@ -164,7 +164,7 @@ An Example Using the LLVM Tool Chain
return 0;
}
-2. Next, compile the C file into a LLVM bitcode file:
+2. Next, compile the C file into an LLVM bitcode file:
.. code-block:: bat
diff --git a/docs/GoldPlugin.rst b/docs/GoldPlugin.rst
index 17bbeb8..28b202a 100644
--- a/docs/GoldPlugin.rst
+++ b/docs/GoldPlugin.rst
@@ -30,29 +30,22 @@ by running ``/usr/bin/ld -plugin``. If it complains "missing argument" then
you have plugin support. If not, such as an "unknown option" error then you
will either need to build gold or install a version with plugin support.
-* To build gold with plugin support:
+* Download, configure and build gold with plugin support:
.. code-block:: bash
- $ mkdir binutils
- $ cd binutils
- $ cvs -z 9 -d :pserver:anoncvs@sourceware.org:/cvs/src login
- {enter "anoncvs" as the password}
- $ cvs -z 9 -d :pserver:anoncvs@sourceware.org:/cvs/src co binutils
+ $ git clone --depth 1 git://sourceware.org/git/binutils-gdb.git binutils
$ mkdir build
$ cd build
- $ ../src/configure --enable-gold --enable-plugins
+ $ ../binutils/configure --enable-gold --enable-plugins --disable-werror
$ make all-gold
- That should leave you with ``binutils/build/gold/ld-new`` which supports
- the ``-plugin`` option. It also built would have
- ``binutils/build/binutils/ar`` and ``nm-new`` which support plugins but
- don't have a visible -plugin option, instead relying on the gold plugin
- being present in ``../lib/bfd-plugins`` relative to where the binaries
- are placed.
+ That should leave you with ``build/gold/ld-new`` which supports
+ the ``-plugin`` option. Running ``make`` will additionally build
+ ``build/binutils/ar`` and ``nm-new`` binaries supporting plugins.
* Build the LLVMgold plugin: Configure LLVM with
- ``--with-binutils-include=/path/to/binutils/src/include`` and run
+ ``--with-binutils-include=/path/to/binutils/include`` and run
``make``.
Usage
@@ -66,17 +59,16 @@ look for the line where it runs ``collect2``. Replace that with
ready to switch to using gold, backup your existing ``/usr/bin/ld``
then replace it with ``ld-new``.
-You can produce bitcode files from ``clang`` using ``-emit-llvm`` or
-``-flto``, or the ``-O4`` flag which is synonymous with ``-O3 -flto``.
-
-Any of these flags will also cause ``clang`` to look for the gold plugin in
+You should produce bitcode files from ``clang`` with the option
+``-flto``. This flag will also cause ``clang`` to look for the gold plugin in
the ``lib`` directory under its prefix and pass the ``-plugin`` option to
``ld``. It will not look for an alternate linker, which is why you need
gold to be the installed system linker in your path.
-If you want ``ar`` and ``nm`` to work seamlessly as well, install
-``LLVMgold.so`` to ``/usr/lib/bfd-plugins``. If you built your own gold, be
-sure to install the ``ar`` and ``nm-new`` you built to ``/usr/bin``.
+``ar`` and ``nm`` also accept the ``-plugin`` option and it's possible to
+install ``LLVMgold.so`` to ``/usr/lib/bfd-plugins`` for a seamless setup.
+If you built your own gold, be sure to install the ``ar`` and ``nm-new`` you
+built to ``/usr/bin``.
Example of link time optimization
@@ -153,7 +145,6 @@ everything is in place for an easy to use LTO build of autotooled projects:
export AR="$PREFIX/bin/ar"
export NM="$PREFIX/bin/nm"
export RANLIB=/bin/true #ranlib is not needed, and doesn't support .bc files in .a
- export CFLAGS="-O4"
* Or you can just set your path:
@@ -163,7 +154,6 @@ everything is in place for an easy to use LTO build of autotooled projects:
export CC="clang -flto"
export CXX="clang++ -flto"
export RANLIB=/bin/true
- export CFLAGS="-O4"
* Configure and build the project as usual:
.. code-block:: bash
diff --git a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
index a745784..521526f 100644
--- a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
+++ b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
@@ -132,6 +132,6 @@ is supposed to be cache-line-aligned, but it is not page-aligned.
We generate instrumentation traces and optimized traces into separate
trace caches. We keep the instrumented code around because you don't
want to delete a trace when you still might have to return to it
-(i.e., return from a llvm_first_trigger() or countPath() call.)
+(i.e., return from an llvm_first_trigger() or countPath() call.)
diff --git a/docs/HowToBuildOnARM.rst b/docs/HowToBuildOnARM.rst
index 32ae39b..f2edaef 100644
--- a/docs/HowToBuildOnARM.rst
+++ b/docs/HowToBuildOnARM.rst
@@ -6,7 +6,11 @@ Introduction
============
This document contains information about building/testing LLVM and
-Clang on ARM.
+Clang on an ARM machine.
+
+This document is *NOT* tailored to help you cross-compile LLVM/Clang
+to ARM on another architecture, for example an x86_64 machine. To find
+out more about cross-compiling, please check :doc:`HowToCrossCompileLLVM`.
Notes On Building LLVM/Clang on ARM
=====================================
@@ -17,19 +21,19 @@ on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips.
#. If you are building LLVM/Clang on an ARM board with 1G of memory or less,
please use ``gold`` rather than GNU ``ld``.
Building LLVM/Clang with ``--enable-optimized``
- is prefered since it consumes less memory. Otherwise, the building
+ is preferred since it consumes less memory. Otherwise, the building
process will very likely fail due to insufficient memory. In any
case it is probably a good idea to set up a swap partition.
-#. If you want to run ``make
- check-all`` after building LLVM/Clang, to avoid false alarms (eg, ARCMT
- failure) please use at least the following configuration:
+#. If you want to run ``make check-all`` after building LLVM/Clang, to avoid
+ false alarms (e.g., ARCMT failure) please use at least the following
+ configuration:
.. code-block:: bash
$ ../$LLVM_SRC_DIR/configure --with-abi=aapcs-vfp
-#. The most popular linaro/ubuntu OS's for ARM boards, eg, the
+#. The most popular Linaro/Ubuntu OS's for ARM boards, e.g., the
Pandaboard, have become hard-float platforms. The following set
of configuration options appears to be a good choice for this
platform:
@@ -41,3 +45,25 @@ on the ARMv6 and ARMv7 architectures and may be inapplicable to older chips.
--target=armv7l-unknown-linux-gnueabihf --with-cpu=cortex-a9 \
--with-float=hard --with-abi=aapcs-vfp --with-fpu=neon \
--enable-targets=arm --enable-optimized --enable-assertions
+
+#. ARM development boards can be unstable and you may experience cores
+ disappearing, caches being flushed on every big.LITTLE switch, and
+ other similar issues. To help ease the effect of this, set the Linux
+ CPU frequency governor to "performance" on **all** cores using this script:
+
+ .. code-block:: bash
+
+ # The code below requires the package 'cpufrequtils' to be installed.
+ for ((cpu=0; cpu<`grep -c proc /proc/cpuinfo`; cpu++)); do
+ sudo cpufreq-set -c $cpu -g performance
+ done
+
+#. Running the build on SD cards is OK, but they are more prone to failures
+ than good-quality USB sticks, and those are more prone to failures than
+ external hard drives (which are also a lot faster). So, at the very least,
+ you should consider buying a fast USB stick. On systems with a fast eMMC,
+ that's a good option too.
+
+#. Make sure you have a decent power supply (dozens of dollars' worth) that can
+ provide *at least* 4 amperes; this is especially important if you use USB
+ devices with your board.
diff --git a/docs/HowToCrossCompileLLVM.rst b/docs/HowToCrossCompileLLVM.rst
new file mode 100644
index 0000000..1072517
--- /dev/null
+++ b/docs/HowToCrossCompileLLVM.rst
@@ -0,0 +1,175 @@
+===================================================================
+How To Cross-Compile Clang/LLVM using Clang/LLVM
+===================================================================
+
+Introduction
+============
+
+This document contains information about building LLVM and
+Clang on a host machine, targeting another platform.
+
+For more information on how to use Clang as a cross-compiler,
+please check http://clang.llvm.org/docs/CrossCompilation.html.
+
+TODO: Add MIPS and other platforms to this document.
+
+Cross-Compiling from x86_64 to ARM
+==================================
+
+In this use case, we'll be using CMake and Ninja, on a Debian-based Linux
+system, cross-compiling from an x86_64 host (most Intel and AMD chips
+nowadays) to a hard-float ARM target (most ARM targets nowadays).
+
+The packages you'll need are:
+
+ * ``cmake``
+ * ``ninja-build`` (from backports in Ubuntu)
+ * ``gcc-4.7-arm-linux-gnueabihf``
+ * ``gcc-4.7-multilib-arm-linux-gnueabihf``
+ * ``binutils-arm-linux-gnueabihf``
+ * ``libgcc1-armhf-cross``
+ * ``libsfgcc1-armhf-cross``
+ * ``libstdc++6-armhf-cross``
+ * ``libstdc++6-4.7-dev-armhf-cross``
+
+Configuring CMake
+-----------------
+
+For more information on how to configure CMake for LLVM/Clang,
+see :doc:`CMake`.
+
+The CMake options you need to add are:
+ * ``-DCMAKE_CROSSCOMPILING=True``
+ * ``-DCMAKE_INSTALL_PREFIX=<install-dir>``
+ * ``-DLLVM_TABLEGEN=<path-to-host-bin>/llvm-tblgen``
+ * ``-DCLANG_TABLEGEN=<path-to-host-bin>/clang-tblgen``
+ * ``-DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf``
+ * ``-DLLVM_TARGET_ARCH=ARM``
+ * ``-DLLVM_TARGETS_TO_BUILD=ARM``
+ * ``-DCMAKE_CXX_FLAGS='-target armv7a-linux-gnueabihf -mcpu=cortex-a9 -I/usr/arm-linux-gnueabihf/include/c++/4.7.2/arm-linux-gnueabihf/ -I/usr/arm-linux-gnueabihf/include/ -mfloat-abi=hard -ccc-gcc-name arm-linux-gnueabihf-gcc'``
+
+The TableGen options are required to compile it with the host compiler,
+so you'll need to compile LLVM (or at least ``llvm-tblgen``) to your host
+platform before you start. The CXX flags define the target, the CPU (which
+defaults to ``fpu=VFP3`` with NEON), and force the hard-float ABI. If you're
+using Clang as a cross-compiler, you will *also* have to set ``-ccc-gcc-name``,
+to make sure it picks the correct linker.
+
+Most of the time, what you want is a compiler native to the platform
+itself, but not targeting others. It might not even be feasible to
+produce x86 binaries from ARM targets, so there's no point in compiling
+all back-ends. For that reason, you should also set
+``LLVM_TARGETS_TO_BUILD`` to only build the ARM back-end.
+
+You must set the ``CMAKE_INSTALL_PREFIX``, otherwise a ``ninja install``
+will copy ARM binaries to your root filesystem, which is not what you
+want.
+
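Putting the options together, a configuration command could look roughly like
this (a sketch only; the source and install paths and the location of the host
``llvm-tblgen``/``clang-tblgen`` are assumptions, and the CXX flags are the
ones listed above):

 .. code-block:: bash

   $ cmake -G Ninja ../llvm \
       -DCMAKE_CROSSCOMPILING=True \
       -DCMAKE_INSTALL_PREFIX=../install-arm \
       -DLLVM_TABLEGEN=/path/to/host/bin/llvm-tblgen \
       -DCLANG_TABLEGEN=/path/to/host/bin/clang-tblgen \
       -DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf \
       -DLLVM_TARGET_ARCH=ARM \
       -DLLVM_TARGETS_TO_BUILD=ARM \
       -DCMAKE_CXX_FLAGS='<the flags listed above>'
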
+Hacks
+-----
+
+There are some bugs in current LLVM, which require some fiddling before
+running CMake:
+
+#. If you're using Clang as the cross-compiler, there is a problem in
+ the LLVM ARM back-end that is producing absolute relocations on
+ position-independent code (``R_ARM_THM_MOVW_ABS_NC``), so for now, you
+ should disable PIC:
+
+ .. code-block:: bash
+
+ -DLLVM_ENABLE_PIC=False
+
+ This is not a problem, since Clang/LLVM libraries are statically
+ linked anyway, so it shouldn't affect much.
+
+#. The ARM libraries won't be installed in your system, and possibly
+ not easily installable anyway, so you'll have to build/download
+ them separately. But the CMake prepare step, which checks for
+ dependencies, will check the *host* libraries, not the *target*
+ ones.
+
+ A quick way of getting the libraries is to download the missing
+ packages from a distribution repository, like Debian
+ (http://packages.debian.org/wheezy/). Note that the ``libXXX``
+ will have the shared objects (``.so``) and the ``libXXX-dev`` will
+ give you the headers and the static (``.a``) library. Just in
+ case, download both.
+
+ The ones you need for ARM are: ``libtinfo``, ``zlib1g``,
+ ``libxml2`` and ``liblzma``. In the Debian repository you'll
+ find downloads for all architectures.
+
+ After you download and unpack all ``.deb`` packages, copy all
+ ``.so`` and ``.a`` to a directory, make the appropriate
+ symbolic links (if necessary), and add the relevant ``-L``
+ and ``-I`` paths to ``-DCMAKE_CXX_FLAGS`` above.
+
+
+Running CMake and Building
+--------------------------
+
+Finally, if you're using your platform compiler, run:
+
+ .. code-block:: bash
+
+ $ cmake -G Ninja <source-dir> <options above>
+
+If you're using Clang as the cross-compiler, run:
+
+ .. code-block:: bash
+
+ $ CC='clang' CXX='clang++' cmake -G Ninja <source-dir> <options above>
+
+If you have ``clang``/``clang++`` on the path, it should just work, and special
+Ninja files will be created in the build directory. I strongly suggest
+running ``cmake`` in a separate build directory, *not* inside the
+source tree.
+
+To build, simply type:
+
+ .. code-block:: bash
+
+ $ ninja
+
+It should automatically find out how many cores you have and which
+rules need building, and will build the whole thing.
+
+You can't run ``ninja check-all`` on this tree because the created
+binaries are targeted to ARM, not x86_64.
+
+Installing and Using
+--------------------
+
+After LLVM/Clang has built successfully, you should install it
+via:
+
+ .. code-block:: bash
+
+ $ ninja install
+
+which will create a sysroot in the install directory. You can then tar
+that directory into a tarball named with the full triple (for easy
+identification), like:
+
+ .. code-block:: bash
+
+ $ ln -sf <install-dir> arm-linux-gnueabihf-clang
+ $ tar zchf arm-linux-gnueabihf-clang.tar.gz arm-linux-gnueabihf-clang
+
+If you copy that tarball to your target board, you'll be able to use
+it for running the test-suite, for example. Follow the guidelines at
+http://llvm.org/docs/lnt/quickstart.html, unpack the tarball in the
+test directory, and use options:
+
+ .. code-block:: bash
+
+ $ ./sandbox/bin/python sandbox/bin/lnt runtest nt \
+ --sandbox sandbox \
+ --test-suite `pwd`/test-suite \
+ --cc `pwd`/arm-linux-gnueabihf-clang/bin/clang \
+ --cxx `pwd`/arm-linux-gnueabihf-clang/bin/clang++
+
+Remember to pass the ``-jN`` option to ``lnt``, with N set to the number of
+CPUs on your board. Also, the path to your clang has to be absolute, so
+you'll need the `pwd` trick above.
diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index b69e2a3..810455c 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -128,7 +128,9 @@ lexical features of LLVM:
#. Unnamed temporaries are created when the result of a computation is
not assigned to a named value.
#. Unnamed temporaries are numbered sequentially (using a per-function
- incrementing counter, starting with 0).
+ incrementing counter, starting with 0). Note that basic blocks are
+ included in this numbering. For example, if the entry basic block is not
+ given a label name, then it will get number 0.
It also shows a convention that we follow in this document. When
demonstrating instructions, we will follow an instruction with a comment
@@ -267,13 +269,6 @@ linkage:
``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the
global will only be merged with equivalent globals. These linkage
types are otherwise the same as their non-``odr`` versions.
-``linkonce_odr_auto_hide``
- Similar to "``linkonce_odr``", but nothing in the translation unit
- takes the address of this definition. For instance, functions that
- had an inline definition, but the compiler decided not to inline it.
- ``linkonce_odr_auto_hide`` may have only ``default`` visibility. The
- symbols are removed by the linker from the final linked image
- (executable or dynamic library).
``external``
If none of the above identifiers are used, the global is externally
visible, meaning that it participates in linkage and can be used to
@@ -305,9 +300,6 @@ declarations), they are accessible outside of the current module.
It is illegal for a function *declaration* to have any linkage type
other than ``external``, ``dllimport`` or ``extern_weak``.
-Aliases can have only ``external``, ``internal``, ``weak`` or
-``weak_odr`` linkages.
-
.. _callingconv:
Calling Conventions
@@ -448,9 +440,13 @@ Global Variables
----------------
Global variables define regions of memory allocated at compilation time
-instead of run-time. Global variables may optionally be initialized, may
-have an explicit section to be placed in, and may have an optional
-explicit alignment specified.
+instead of run-time.
+
+Global variable definitions must be initialized, may have an explicit section
+to be placed in, and may have an optional explicit alignment specified.
+
+Global variables in other translation units can also be declared, in which
+case they don't have an initializer.
A variable may be defined as ``thread_local``, which means that it will
not be shared by threads (each thread will have a separated copy of the
@@ -532,6 +528,12 @@ with an initializer, section, and alignment:
@G = addrspace(5) constant float 1.0, section "foo", align 4
+The following example just declares a global variable:
+
+.. code-block:: llvm
+
+ @G = external global i32
+
The following example defines a thread-local global with the
``initialexec`` TLS model:
@@ -552,27 +554,26 @@ an optional ``unnamed_addr`` attribute, a return type, an optional
name, a (possibly empty) argument list (each with optional :ref:`parameter
attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
an optional section, an optional alignment, an optional :ref:`garbage
-collector name <gc>`, an opening curly brace, a list of basic blocks,
-and a closing curly brace.
+collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
+curly brace, a list of basic blocks, and a closing curly brace.
LLVM function declarations consist of the "``declare``" keyword, an
optional :ref:`linkage type <linkage>`, an optional :ref:`visibility
style <visibility>`, an optional :ref:`calling convention <callingconv>`,
an optional ``unnamed_addr`` attribute, a return type, an optional
:ref:`parameter attribute <paramattrs>` for the return type, a function
-name, a possibly empty list of arguments, an optional alignment, and an
-optional :ref:`garbage collector name <gc>`.
-
-A function definition contains a list of basic blocks, forming the CFG
-(Control Flow Graph) for the function. Each basic block may optionally
-start with a label (giving the basic block a symbol table entry),
-contains a list of instructions, and ends with a
-:ref:`terminator <terminators>` instruction (such as a branch or function
-return). If explicit label is not provided, a block is assigned an
-implicit numbered label, using a next value from the same counter as used
-for unnamed temporaries (:ref:`see above<identifiers>`). For example, if a
-function entry block does not have explicit label, it will be assigned
-label "%0", then first unnamed temporary in that block will be "%1", etc.
+name, a possibly empty list of arguments, an optional alignment, an optional
+:ref:`garbage collector name <gc>` and an optional :ref:`prefix <prefixdata>`.
+
+A function definition contains a list of basic blocks, forming the CFG (Control
+Flow Graph) for the function. Each basic block may optionally start with a label
+(giving the basic block a symbol table entry), contains a list of instructions,
+and ends with a :ref:`terminator <terminators>` instruction (such as a branch or
+function return). If an explicit label is not provided, a block is assigned an
+implicit numbered label, using the next value from the same counter as used for
+unnamed temporaries (:ref:`see above<identifiers>`). For example, if a function
+entry block does not have an explicit label, it will be assigned label "%0",
+then the first unnamed temporary in that block will be "%1", etc.
The first basic block in a function is special in two ways: it is
immediately executed on entrance to the function, and it is not allowed
@@ -598,7 +599,7 @@ Syntax::
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[fn Attrs] [section "name"] [align N]
- [gc] { ... }
+ [gc] [prefix Constant] { ... }
.. _langref_aliases:
@@ -614,6 +615,12 @@ Syntax::
@<Name> = alias [Linkage] [Visibility] <AliaseeTy> @<Aliasee>
+The linkage must be one of ``private``, ``linker_private``,
+``linker_private_weak``, ``internal``, ``linkonce``, ``weak``,
+``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers
+might not correctly handle dropping a weak symbol that is aliased by a non weak
+alias.
+
.. _namedmetadatastructure:
Named Metadata
@@ -757,6 +764,55 @@ The compiler declares the supported values of *name*. Specifying a
collector which will cause the compiler to alter its output in order to
support the named garbage collection algorithm.
+.. _prefixdata:
+
+Prefix Data
+-----------
+
+Prefix data is data associated with a function which the code generator
+will emit immediately before the function body. The purpose of this feature
+is to allow frontends to associate language-specific runtime metadata with
+specific functions and make it available through the function pointer while
+still allowing the function pointer to be called. To access the data for a
+given function, a program may bitcast the function pointer to a pointer to
+the constant's type. This implies that the IR symbol points to the start
+of the prefix data.
+
+To maintain the semantics of ordinary function calls, the prefix data must
+have a particular format. Specifically, it must begin with a sequence of
+bytes which decode to a sequence of machine instructions, valid for the
+module's target, which transfer control to the point immediately succeeding
+the prefix data, without performing any other visible action. This allows
+the inliner and other passes to reason about the semantics of the function
+definition without needing to reason about the prefix data. Obviously this
+makes the format of the prefix data highly target dependent.
+
+Prefix data is laid out as if it were an initializer for a global variable
+of the prefix data's type. No padding is automatically placed between the
+prefix data and the function body. If padding is required, it must be part
+of the prefix data.
+
+A trivial example of valid prefix data for the x86 architecture is ``i8 144``,
+which encodes the ``nop`` instruction:
+
+.. code-block:: llvm
+
+ define void @f() prefix i8 144 { ... }
+
+Generally prefix data can be formed by encoding a relative branch instruction
+which skips the metadata, as in this example of valid prefix data for the
+x86_64 architecture, where the first two bytes encode ``jmp .+10``:
+
+.. code-block:: llvm
+
+ %0 = type <{ i8, i8, i8* }>
+
+ define void @f() prefix %0 <{ i8 235, i8 8, i8* @md}> { ... }
+
+A function may have prefix data but no body. This has similar semantics
+to the ``available_externally`` linkage in that the data may be used by the
+optimizers but will not be emitted in the object file.
+
.. _attrgrp:
Attribute Groups
@@ -833,6 +889,11 @@ example:
inlining this function is desirable (such as the "inline" keyword in
C/C++). It is just a hint; it imposes no requirements on the
inliner.
+``minsize``
+ This attribute suggests that optimization passes and code generator
+ passes make choices that keep the code size of this function as small
+ as possible and perform optimizations that may sacrifice runtime
+ performance in order to minimize the size of the generated code.
``naked``
This attribute disables prologue / epilogue emission for the
function. This can have very system-specific consequences.
@@ -874,10 +935,23 @@ example:
This function attribute indicates that the function never returns
with an unwind or exceptional control flow. If the function does
unwind, its runtime behavior is undefined.
+``optnone``
+ This function attribute indicates that the function is not optimized
+ by any optimization or code generator passes with the
+ exception of interprocedural optimization passes.
+ This attribute cannot be used together with the ``alwaysinline``
+ attribute; this attribute is also incompatible
+ with the ``minsize`` attribute and the ``optsize`` attribute.
+
+ This attribute requires the ``noinline`` attribute to be specified on
+ the function as well, so the function is never inlined into any caller.
+ Only functions with the ``alwaysinline`` attribute are valid
+ candidates for inlining into the body of this function.
``optsize``
This attribute suggests that optimization passes and code generator
passes make choices that keep the code size of this function low,
- and otherwise do optimizations specifically to reduce code size.
+ and otherwise do optimizations specifically to reduce code size as
+ long as they do not significantly impact runtime performance.
``readnone``
On a function, this attribute indicates that the function computes its
result (or decides to unwind an exception) based strictly on its arguments,
@@ -887,7 +961,7 @@ example:
(including ``byval`` arguments) and never changes any state visible
to callers. This means that it cannot unwind exceptions by calling
the ``C++`` exception throwing methods.
-
+
On an argument, this attribute indicates that the function does not
dereference that pointer argument, even though it may read or write the
memory that the pointer points to if accessed through other pointers.
@@ -901,7 +975,7 @@ example:
called with the same set of arguments and global state. It cannot
unwind an exception by calling the ``C++`` exception throwing
methods.
-
+
On an argument, this attribute indicates that the function does not write
through this pointer argument, even though it may write to the memory that
the pointer points to.
@@ -1109,6 +1183,30 @@ don't have to specify the string. This will disable some optimizations
that require precise layout information, but this also prevents those
optimizations from introducing target specificity into the IR.
+.. _langref_triple:
+
+Target Triple
+-------------
+
+A module may specify a target triple string that describes the target
+host. The syntax for the target triple is simply:
+
+.. code-block:: llvm
+
+ target triple = "x86_64-apple-macosx10.7.0"
+
+The *target triple* string consists of a series of identifiers delimited
+by the minus sign character ('-'). The canonical forms are:
+
+::
+
+ ARCHITECTURE-VENDOR-OPERATING_SYSTEM
+ ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
+
+This information is passed along to the backend so that it generates
+code for the proper architecture. It's possible to override this on the
+command line with the ``-mtriple`` command line option.
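+
+For example, a triple that also carries the optional environment component
+might look like the following (a sketch; the exact values depend on the
+platform being targeted):
+
+.. code-block:: llvm
+
+  target triple = "x86_64-unknown-linux-gnu"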
+
.. _pointeraliasing:
Pointer Aliasing Rules
@@ -1659,9 +1757,10 @@ Function Type
Overview:
"""""""""
-The function type can be thought of as a function signature. It consists
-of a return type and a list of formal parameter types. The return type
-of a function type is a first class type or a void type.
+The function type can be thought of as a function signature. It consists of a
+return type and a list of formal parameter types. The return type of a function
+type is a void type or first class type --- except for :ref:`label <t_label>`
+and :ref:`metadata <t_metadata>` types.
Syntax:
"""""""
@@ -1671,11 +1770,11 @@ Syntax:
<returntype> (<parameter list>)
...where '``<parameter list>``' is a comma-separated list of type
-specifiers. Optionally, the parameter list may include a type ``...``,
-which indicates that the function takes a variable number of arguments.
-Variable argument functions can access their arguments with the
-:ref:`variable argument handling intrinsic <int_varargs>` functions.
-'``<returntype>``' is any type except :ref:`label <t_label>`.
+specifiers. Optionally, the parameter list may include a type ``...``, which
+indicates that the function takes a variable number of arguments. Variable
+argument functions can access their arguments with the :ref:`variable argument
+handling intrinsic <int_varargs>` functions. '``<returntype>``' is any type
+except :ref:`label <t_label>` and :ref:`metadata <t_metadata>`.
Examples:
"""""""""
@@ -2286,6 +2385,10 @@ The following is the syntax for constant expressions:
Convert a constant, CST, to another TYPE. The constraints of the
operands are the same as those for the :ref:`bitcast
instruction <i_bitcast>`.
+``addrspacecast (CST to TYPE)``
+  Convert a constant pointer or constant vector of pointers, CST, to another
+ TYPE in a different address space. The constraints of the operands are the
+ same as those for the :ref:`addrspacecast instruction <i_addrspacecast>`.
``getelementptr (CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (CSTPTR, IDX0, IDX1, ...)``
Perform the :ref:`getelementptr operation <i_getelementptr>` on
constants. As with the :ref:`getelementptr <i_getelementptr>`
@@ -5630,9 +5733,9 @@ is always a *no-op cast* because no bits change with this
conversion. The conversion is done as if the ``value`` had been stored
to memory and read back as type ``ty2``. Pointer (or vector of
pointers) types may only be converted to other pointer (or vector of
-pointers) types with this instruction if the pointer sizes are
-equal. To convert pointers to other types, use the :ref:`inttoptr
-<i_inttoptr>` or :ref:`ptrtoint <i_ptrtoint>` instructions first.
+pointers) types with the same address space through this instruction.
+To convert pointers to other types, use the :ref:`inttoptr <i_inttoptr>`
+or :ref:`ptrtoint <i_ptrtoint>` instructions first.
Example:
""""""""
@@ -5644,6 +5747,51 @@ Example:
%Z = bitcast <2 x int> %V to i64; ; yields i64: %V
%Z = bitcast <2 x i32*> %V to <2 x i64*> ; yields <2 x i64*>
+.. _i_addrspacecast:
+
+'``addrspacecast .. to``' Instruction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ <result> = addrspacecast <pty> <ptrval> to <pty2> ; yields pty2
+
+Overview:
+"""""""""
+
+The '``addrspacecast``' instruction converts ``ptrval`` from ``pty`` in
+address space ``n`` to type ``pty2`` in address space ``m``.
+
+Arguments:
+""""""""""
+
+The '``addrspacecast``' instruction takes a pointer or vector of pointers value
+to cast and a pointer type to cast it to, which must have a different
+address space.
+
+Semantics:
+""""""""""
+
+The '``addrspacecast``' instruction converts the pointer value
+``ptrval`` to type ``pty2``. It can be a *no-op cast* or a complex
+value modification, depending on the target and the address space
+pair. Pointer conversions within the same address space must be
+performed with the ``bitcast`` instruction. Note that if the address space
+conversion is legal then both result and operand refer to the same memory
+location.
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+ %X = addrspacecast i32* %x to i32 addrspace(1)* ; yields i32 addrspace(1)*:%x
+ %Y = addrspacecast i32 addrspace(1)* %y to i64 addrspace(2)* ; yields i64 addrspace(2)*:%y
+ %Z = addrspacecast <4 x i32*> %z to <4 x float addrspace(3)*> ; yields <4 x float addrspace(3)*>:%z
+
.. _otherops:
Other Operations
@@ -6293,7 +6441,7 @@ Syntax:
::
- declare void %llvm.va_start(i8* <arglist>)
+ declare void @llvm.va_start(i8* <arglist>)
Overview:
"""""""""
@@ -6805,7 +6953,7 @@ The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
source location to the destination location, which are not allowed to
overlap. It copies "len" bytes of memory over. If the argument is known
to be aligned to some boundary, this can be specified as the fourth
-argument, otherwise it should be set to 0 or 1.
+argument, otherwise it should be set to 0 or 1 (both meaning no alignment).
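+
+For illustration, a call of the ``i32``-length variant where both pointers are
+known to be 8-byte aligned might look like the following sketch (``%dst``,
+``%src`` and ``%len`` are assumed values):
+
+.. code-block:: llvm
+
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 %len,
+                                       i32 8, i1 false)
+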
'``llvm.memmove``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -6860,7 +7008,7 @@ The '``llvm.memmove.*``' intrinsics copy a block of memory from the
source location to the destination location, which may overlap. It
copies "len" bytes of memory over. If the argument is known to be
aligned to some boundary, this can be specified as the fourth argument,
-otherwise it should be set to 0 or 1.
+otherwise it should be set to 0 or 1 (both meaning no alignment).
'``llvm.memset.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -6911,7 +7059,7 @@ Semantics:
The '``llvm.memset.*``' intrinsics fill "len" bytes of memory starting
at the destination location. If the argument is known to be aligned to
some boundary, this can be specified as the fourth argument, otherwise
-it should be set to 0 or 1.
+it should be set to 0 or 1 (both meaning no alignment).
'``llvm.sqrt.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -7347,6 +7495,42 @@ Semantics:
This function returns the same values as the libm ``fabs`` functions
would, and handles error conditions in the same way.
+'``llvm.copysign.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.copysign`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+ declare float @llvm.copysign.f32(float %Mag, float %Sgn)
+ declare double @llvm.copysign.f64(double %Mag, double %Sgn)
+ declare x86_fp80 @llvm.copysign.f80(x86_fp80 %Mag, x86_fp80 %Sgn)
+ declare fp128 @llvm.copysign.f128(fp128 %Mag, fp128 %Sgn)
+ declare ppc_fp128 @llvm.copysign.ppcf128(ppc_fp128 %Mag, ppc_fp128 %Sgn)
+
+Overview:
+"""""""""
+
+The '``llvm.copysign.*``' intrinsics return a value with the magnitude of the
+first operand and the sign of the second operand.
+
+Arguments:
+""""""""""
+
+The arguments and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``copysign``
+functions would, and handles error conditions in the same way.
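+
+For example, a scalar call that takes the magnitude of ``%mag`` and the sign of
+``%sgn`` might be written as follows (a sketch; the value names are
+hypothetical):
+
+.. code-block:: llvm
+
+  %r = call float @llvm.copysign.f32(float %mag, float %sgn)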
+
'``llvm.floor.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -7526,6 +7710,42 @@ Semantics:
This function returns the same values as the libm ``nearbyint``
functions would, and handles error conditions in the same way.
+'``llvm.round.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.round`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+ declare float @llvm.round.f32(float %Val)
+ declare double @llvm.round.f64(double %Val)
+ declare x86_fp80 @llvm.round.f80(x86_fp80 %Val)
+ declare fp128 @llvm.round.f128(fp128 %Val)
+ declare ppc_fp128 @llvm.round.ppcf128(ppc_fp128 %Val)
+
+Overview:
+"""""""""
+
+The '``llvm.round.*``' intrinsics return the operand rounded to the
+nearest integer.
+
+Arguments:
+""""""""""
+
+The argument and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``round``
+functions would, and handles error conditions in the same way.
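+
+For example, rounding a ``double`` value ``%x`` (a hypothetical name) to the
+nearest integer might look like:
+
+.. code-block:: llvm
+
+  %r = call double @llvm.round.f64(double %x)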
+
Bit Manipulation Intrinsics
---------------------------
@@ -8599,14 +8819,52 @@ enough space to hold the value of the guard.
Semantics:
""""""""""
-This intrinsic causes the prologue/epilogue inserter to force the
-position of the ``AllocaInst`` stack slot to be before local variables
-on the stack. This is to ensure that if a local variable on the stack is
-overwritten, it will destroy the value of the guard. When the function
-exits, the guard on the stack is checked against the original guard. If
-they are different, then the program aborts by calling the
+This intrinsic causes the prologue/epilogue inserter to force the position of
+the ``AllocaInst`` stack slot to be before local variables on the stack. This is
+to ensure that if a local variable on the stack is overwritten, it will destroy
+the value of the guard. When the function exits, the guard on the stack is
+checked against the original guard by ``llvm.stackprotectorcheck``. If they are
+different, then ``llvm.stackprotectorcheck`` causes the program to abort by
+calling the ``__stack_chk_fail()`` function.
+
+'``llvm.stackprotectorcheck``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare void @llvm.stackprotectorcheck(i8** <guard>)
+
+Overview:
+"""""""""
+
+The ``llvm.stackprotectorcheck`` intrinsic compares ``guard`` against an
+already created stack protector and, if they are not equal, calls the
``__stack_chk_fail()`` function.
+Arguments:
+""""""""""
+
+The ``llvm.stackprotectorcheck`` intrinsic requires one pointer argument, the
+variable ``@__stack_chk_guard``.
+
+Semantics:
+""""""""""
+
+This intrinsic is provided to perform the stack protector check by comparing
+``guard`` with the stack slot created by ``llvm.stackprotector`` and, if the
+values do not match, calling the ``__stack_chk_fail()`` function.
+
+The reason to provide this as an IR level intrinsic instead of implementing it
+via other IR operations is that in order to perform this operation at the IR
+level without an intrinsic, one would need to create additional basic blocks to
+handle the success/failure cases. This makes it difficult to stop the stack
+protector check from disrupting sibling tail calls in Codegen. With this
+intrinsic, we are able to generate the stack protector basic blocks late in
+codegen after the tail call decision has occurred.
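+
+A call to this intrinsic, as it might appear in IR produced by the stack
+protector pass, is sketched below (assuming the usual ``i8*`` global
+``@__stack_chk_guard`` holds the guard value):
+
+.. code-block:: llvm
+
+  call void @llvm.stackprotectorcheck(i8** @__stack_chk_guard)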
+
'``llvm.objectsize``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/MCJIT-creation.png b/docs/MCJIT-creation.png
new file mode 100644
index 0000000..7abdb9d
--- /dev/null
+++ b/docs/MCJIT-creation.png
Binary files differ
diff --git a/docs/MCJIT-dyld-load.png b/docs/MCJIT-dyld-load.png
new file mode 100644
index 0000000..1534190
--- /dev/null
+++ b/docs/MCJIT-dyld-load.png
Binary files differ
diff --git a/docs/MCJIT-engine-builder.png b/docs/MCJIT-engine-builder.png
new file mode 100644
index 0000000..8fdd7a9
--- /dev/null
+++ b/docs/MCJIT-engine-builder.png
Binary files differ
diff --git a/docs/MCJIT-load-object.png b/docs/MCJIT-load-object.png
new file mode 100644
index 0000000..533a52e
--- /dev/null
+++ b/docs/MCJIT-load-object.png
Binary files differ
diff --git a/docs/MCJIT-load.png b/docs/MCJIT-load.png
new file mode 100644
index 0000000..9672a66
--- /dev/null
+++ b/docs/MCJIT-load.png
Binary files differ
diff --git a/docs/MCJIT-resolve-relocations.png b/docs/MCJIT-resolve-relocations.png
new file mode 100644
index 0000000..fedeacc
--- /dev/null
+++ b/docs/MCJIT-resolve-relocations.png
Binary files differ
diff --git a/docs/MCJITDesignAndImplementation.rst b/docs/MCJITDesignAndImplementation.rst
new file mode 100644
index 0000000..2cb6296
--- /dev/null
+++ b/docs/MCJITDesignAndImplementation.rst
@@ -0,0 +1,180 @@
+===============================
+MCJIT Design and Implementation
+===============================
+
+Introduction
+============
+
+This document describes the internal workings of the MCJIT execution
+engine and the RuntimeDyld component. It is intended as a high level
+overview of the implementation, showing the flow and interactions of
+objects throughout the code generation and dynamic loading process.
+
+Engine Creation
+===============
+
+In most cases, an EngineBuilder object is used to create an instance of
+the MCJIT execution engine. The EngineBuilder takes an llvm::Module
+object as an argument to its constructor. The client may then set various
+options that will later be passed along to the MCJIT engine, including the
+selection of MCJIT as the engine type to be created.
+Of particular interest is the EngineBuilder::setMCJITMemoryManager
+function. If the client does not explicitly create a memory manager at
+this time, a default memory manager (specifically SectionMemoryManager)
+will be created when the MCJIT engine is instantiated.
+
+Once the options have been set, a client calls EngineBuilder::create to
+create an instance of the MCJIT engine. If the client does not use the
+form of this function that takes a TargetMachine as a parameter, a new
+TargetMachine will be created based on the target triple associated with
+the Module that was used to create the EngineBuilder.
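+
+A minimal sketch of this flow, assuming an existing ``llvm::Module *M`` and
+using the 3.4-era C++ API, might look like:
+
+.. code-block:: c++
+
+  #include "llvm/ExecutionEngine/ExecutionEngine.h"
+  #include "llvm/ExecutionEngine/MCJIT.h"
+  #include "llvm/ExecutionEngine/SectionMemoryManager.h"
+  #include <string>
+
+  using namespace llvm;
+
+  // M is an llvm::Module* whose ownership passes to the engine on success.
+  std::string Error;
+  ExecutionEngine *EE = EngineBuilder(M)
+                            .setErrorStr(&Error)
+                            .setUseMCJIT(true)
+                            .setMCJITMemoryManager(new SectionMemoryManager())
+                            .create();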
+
+.. image:: MCJIT-engine-builder.png
+
+EngineBuilder::create will call the static MCJIT::createJIT function,
+passing in its pointers to the module, memory manager and target machine
+objects, all of which will subsequently be owned by the MCJIT object.
+
+The MCJIT class has a member variable, Dyld, which contains an instance of
+the RuntimeDyld wrapper class. This member will be used for
+communications between MCJIT and the actual RuntimeDyldImpl object that
+gets created when an object is loaded.
+
+.. image:: MCJIT-creation.png
+
+Upon creation, MCJIT holds a pointer to the Module object that it received
+from EngineBuilder but it does not immediately generate code for this
+module. Code generation is deferred until either the
+MCJIT::finalizeObject method is called explicitly or a function such as
+MCJIT::getPointerToFunction is called which requires the code to have been
+generated.
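+
+For example, a client that wants to run a function ``F`` from the module might
+do something like the following sketch (error handling omitted):
+
+.. code-block:: c++
+
+  // EE is the ExecutionEngine created above; F is an llvm::Function* in the
+  // module. finalizeObject() triggers (or completes) code generation.
+  EE->finalizeObject();
+  void *Addr = EE->getPointerToFunction(F);
+  int (*Entry)() = (int (*)())(intptr_t)Addr;
+  int Result = Entry();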
+
+Code Generation
+===============
+
+When code generation is triggered, as described above, MCJIT will first
+attempt to retrieve an object image from its ObjectCache member, if one
+has been set. If a cached object image cannot be retrieved, MCJIT will
+call its emitObject method. MCJIT::emitObject uses a local PassManager
+instance and creates a new ObjectBufferStream instance, both of which it
+passes to TargetMachine::addPassesToEmitMC before calling PassManager::run
+on the Module with which it was created.
+
+.. image:: MCJIT-load.png
+
+The PassManager::run call causes the MC code generation mechanisms to emit
+a complete relocatable binary object image (in either ELF or MachO
+format, depending on the target) into the ObjectBufferStream object, which
+is flushed to complete the process. If an ObjectCache is being used, the
+image will be passed to the ObjectCache here.
+
+At this point, the ObjectBufferStream contains the raw object image.
+Before the code can be executed, the code and data sections from this
+image must be loaded into suitable memory, relocations must be applied, and
+memory permissions must be set and the code cache invalidated (if required).
+
+Object Loading
+==============
+
+Once an object image has been obtained, either through code generation or
+having been retrieved from an ObjectCache, it is passed to RuntimeDyld to
+be loaded. The RuntimeDyld wrapper class examines the object to determine
+its file format and creates an instance of either RuntimeDyldELF or
+RuntimeDyldMachO (both of which derive from the RuntimeDyldImpl base
+class) and calls the RuntimeDyldImpl::loadObject method to perform the
+actual loading.
+
+.. image:: MCJIT-dyld-load.png
+
+RuntimeDyldImpl::loadObject begins by creating an ObjectImage instance
+from the ObjectBuffer it received. ObjectImage, which wraps the
+ObjectFile class, is a helper class which parses the binary object image
+and provides access to the information contained in the format-specific
+headers, including section, symbol and relocation information.
+
+RuntimeDyldImpl::loadObject then iterates through the symbols in the
+image. Information about common symbols is collected for later use. For
+each function or data symbol, the associated section is loaded into memory
+and the symbol is stored in a symbol table map data structure. When the
+iteration is complete, a section is emitted for the common symbols.
+
+Next, RuntimeDyldImpl::loadObject iterates through the sections in the
+object image and for each section iterates through the relocations for
+that section. For each relocation, it calls the format-specific
+processRelocationRef method, which will examine the relocation and store
+it in one of two data structures, a section-based relocation list map and
+an external symbol relocation map.
+
+.. image:: MCJIT-load-object.png
+
+When RuntimeDyldImpl::loadObject returns, all of the code and data
+sections for the object will have been loaded into memory allocated by the
+memory manager and relocation information will have been prepared, but the
+relocations have not yet been applied and the generated code is still not
+ready to be executed.
+
+[Currently (as of August 2013) the MCJIT engine will immediately apply
+relocations when loadObject completes. However, this shouldn't be
+happening. Because the code may have been generated for a remote target,
+the client should be given a chance to re-map the section addresses before
+relocations are applied. It is possible to apply relocations multiple
+times, but in the case where addresses are to be re-mapped, this first
+application is wasted effort.]
+
+Address Remapping
+=================
+
+At any time after initial code has been generated and before
+finalizeObject is called, the client can remap the address of sections in
+the object. Typically this is done because the code was generated for an
+external process and is being mapped into that process' address space.
+The client remaps the section address by calling MCJIT::mapSectionAddress.
+This should happen before the section memory is copied to its new
+location.
+
+When MCJIT::mapSectionAddress is called, MCJIT passes the call on to
+RuntimeDyldImpl (via its Dyld member). RuntimeDyldImpl stores the new
+address in an internal data structure but does not update the code at this
+time, since other sections are likely to change.
+
+When the client is finished remapping section addresses, it will call
+MCJIT::finalizeObject to complete the remapping process.
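+
+A sketch of a remote-target client remapping a code section and then
+finalizing (``LocalAddr`` and ``TargetAddr`` are assumed to come from the
+client's own loader logic):
+
+.. code-block:: c++
+
+  // LocalAddr is the host address of a loaded section; TargetAddr is where it
+  // will live in the target process.
+  EE->mapSectionAddress(LocalAddr, TargetAddr);
+  EE->finalizeObject();   // relocations are applied using the remapped address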
+
+Final Preparations
+==================
+
+When MCJIT::finalizeObject is called, MCJIT calls
+RuntimeDyld::resolveRelocations. This function will attempt to locate any
+external symbols and then apply all relocations for the object.
+
+External symbols are resolved by calling the memory manager's
+getPointerToNamedFunction method. The memory manager will return the
+address of the requested symbol in the target address space. (Note, this
+may not be a valid pointer in the host process.) RuntimeDyld will then
+iterate through the list of relocations it has stored which are associated
+with this symbol and invoke the resolveRelocation method which, through a
+format-specific implementation, will apply the relocation to the loaded
+section memory.
+
+Next, RuntimeDyld::resolveRelocations iterates through the list of
+sections, and for each section iterates through the list of relocations that
+have been saved which reference symbols in that section, calling
+resolveRelocation for each entry in this list. The relocation list here is a
+list of relocations for which the symbol associated with the relocation is
+located in the section associated with the list. Each of these relocations
+will have a target location, likely in a different section, at which the
+relocation will be applied.
+
+.. image:: MCJIT-resolve-relocations.png
+
+Once relocations have been applied as described above, MCJIT calls
+RuntimeDyld::getEHFrameSection, and if a non-zero result is returned
+passes the section data to the memory manager's registerEHFrames method.
+This allows the memory manager to call any desired target-specific
+functions, such as registering the EH frame information with a debugger.
+
+Finally, MCJIT calls the memory manager's finalizeMemory method. In this
+method, the memory manager will invalidate the target code cache, if
+necessary, and apply final permissions to the memory pages it has
+allocated for code and data memory.
+
diff --git a/docs/Makefile b/docs/Makefile
index 122c4b8..d973af5 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -19,7 +19,12 @@ $(PROJ_OBJ_DIR)/doxygen.cfg: doxygen.cfg.in
-e 's/@abs_top_srcdir@/../g' \
-e 's/@DOT@/dot/g' \
-e 's/@PACKAGE_VERSION@/mainline/' \
- -e 's/@abs_top_builddir@/../g' > $@
+ -e 's/@abs_top_builddir@/../g' \
+ -e 's/@enable_searchengine@/NO/g' \
+ -e 's/@searchengine_url@//g' \
+ -e 's/@enable_server_based_search@/NO/g' \
+ -e 's/@enable_external_search@/NO/g' \
+ -e 's/@extra_search_mappings@//g' > $@
endif
include $(LEVEL)/Makefile.common
@@ -77,9 +82,7 @@ doxygen: regendoc $(PROJ_OBJ_DIR)/doxygen.tar.gz
regendoc:
$(Echo) Building doxygen documentation
- $(Verb) if test -e $(PROJ_OBJ_DIR)/doxygen ; then \
- $(RM) -rf $(PROJ_OBJ_DIR)/doxygen ; \
- fi
+ $(Verb) $(RM) -rf $(PROJ_OBJ_DIR)/doxygen
$(Verb) $(DOXYGEN) $(PROJ_OBJ_DIR)/doxygen.cfg
$(PROJ_OBJ_DIR)/doxygen.tar.gz: $(DOXYFILES) $(PROJ_OBJ_DIR)/doxygen.cfg
@@ -113,9 +116,7 @@ ocamldoc: regen-ocamldoc
regen-ocamldoc:
$(Echo) Building ocamldoc documentation
- $(Verb) if test -e $(PROJ_OBJ_DIR)/ocamldoc ; then \
- $(RM) -rf $(PROJ_OBJ_DIR)/ocamldoc ; \
- fi
+ $(Verb) $(RM) -rf $(PROJ_OBJ_DIR)/ocamldoc
$(Verb) $(MAKE) -C $(LEVEL)/bindings/ocaml ocamldoc
$(Verb) $(MKDIR) $(PROJ_OBJ_DIR)/ocamldoc/html
$(Verb) \
diff --git a/docs/MakefileGuide.rst b/docs/MakefileGuide.rst
index 0bb4a3c..120c108 100644
--- a/docs/MakefileGuide.rst
+++ b/docs/MakefileGuide.rst
@@ -153,7 +153,7 @@ libraries are the default. For example:
LIBRARYNAME = mylib
SHARED_LIBRARY = 1
- ARCHIVE_LIBRARY = 1
+ BUILD_ARCHIVE = 1
says to build a library named ``mylib`` with both a shared library
(``mylib.so``) and an archive library (``mylib.a``) version. The contents of all
@@ -168,9 +168,9 @@ openable with the ``dlopen`` function and searchable with the ``dlsym`` function
(or your operating system's equivalents). While this isn't strictly necessary on
Linux and a few other platforms, it is required on systems like HP-UX and
Darwin. You should use ``LOADABLE_MODULE`` for any shared library that you
-intend to be loaded into an tool via the ``-load`` option. `Pass documentation
-<writing-an-llvm-pass-makefile>`_ has an example of why you might want to do
-this.
+intend to be loaded into a tool via the ``-load`` option. :ref:`Pass
+documentation <writing-an-llvm-pass-makefile>` has an example of why you might
+want to do this.
Loadable Modules
^^^^^^^^^^^^^^^^
@@ -236,7 +236,7 @@ the ``-l`` option). In this case, only the symbols that are unresolved *at
that point* will be resolved from the library, if they exist. Other
(unreferenced) symbols will not be included when the ``.a`` syntax is used. Note
that in order to use the ``.a`` suffix, the library in question must have been
-built with the ``ARCHIVE_LIBRARY`` option set.
+built with the ``BUILD_ARCHIVE`` option set.
JIT Tools
^^^^^^^^^
diff --git a/docs/NVPTXUsage.rst b/docs/NVPTXUsage.rst
index 5451619..a9065ce 100644
--- a/docs/NVPTXUsage.rst
+++ b/docs/NVPTXUsage.rst
@@ -66,6 +66,8 @@ function ``@my_kernel`` is callable from host code, but ``@my_fmad`` is not.
When compiled, the PTX kernel functions are callable by host-side code.
+.. _address_spaces:
+
Address Spaces
--------------
@@ -103,6 +105,25 @@ space in LLVM, so the ``addrspace(N)`` annotation is *required* for global
variables.
+Triples
+-------
+
+The NVPTX target uses the module triple to select between 32/64-bit code
+generation and the driver-compiler interface to use. The triple architecture
+can be one of ``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX). The
+operating system should be one of ``cuda`` or ``nvcl``, which determines the
+interface used by the generated code to communicate with the driver. Most
+users will want to use ``cuda`` as the operating system, which makes the
+generated PTX compatible with the CUDA Driver API.
+
+Example: 32-bit PTX for CUDA Driver API: ``nvptx-nvidia-cuda``
+
+Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``
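+
+In a module, the triple is specified with the usual ``target triple``
+declaration, for example:
+
+.. code-block:: llvm
+
+  target triple = "nvptx64-nvidia-cuda"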
+
+
+
+.. _nvptx_intrinsics:
+
NVPTX Intrinsics
================
@@ -238,6 +259,116 @@ For the full set of NVPTX intrinsics, please see the
``include/llvm/IR/IntrinsicsNVVM.td`` file in the LLVM source tree.
+.. _libdevice:
+
+Linking with Libdevice
+======================
+
+The CUDA Toolkit comes with an LLVM bitcode library called ``libdevice`` that
+implements many common mathematical functions. This library can be used as a
+high-performance math library for any compiler using the LLVM NVPTX target.
+The library can be found under ``nvvm/libdevice/`` in the CUDA Toolkit and
+there is a separate version for each compute architecture.
+
+For a list of all math functions implemented in libdevice, see
+`libdevice Users Guide <http://docs.nvidia.com/cuda/libdevice-users-guide/index.html>`_.
+
+To accommodate various math-related compiler flags that can affect code
+generation of libdevice code, the library code depends on a special LLVM IR
+pass (``NVVMReflect``) to handle conditional compilation within LLVM IR. This
+pass looks for calls to the ``@__nvvm_reflect`` function and replaces them
+with constants based on the defined reflection parameters. Such conditional
+code often follows a pattern:
+
+.. code-block:: c++
+
+ float my_function(float a) {
+ if (__nvvm_reflect("FASTMATH"))
+ return my_function_fast(a);
+ else
+ return my_function_precise(a);
+ }
+
+The default value for all unspecified reflection parameters is zero.
+
+The ``NVVMReflect`` pass should be executed early in the optimization
+pipeline, immediately after the link stage. The ``internalize`` pass is also
+recommended to remove unused math functions from the resulting PTX. For an
+input IR module ``module.bc``, the following compilation flow is recommended:
+
+1. Save list of external functions in ``module.bc``
+2. Link ``module.bc`` with ``libdevice.compute_XX.YY.bc``
+3. Internalize all functions not in list from (1)
+4. Eliminate all unused internal functions
+5. Run ``NVVMReflect`` pass
+6. Run standard optimization pipeline
+
+.. note::
+
+ ``linkonce`` and ``linkonce_odr`` linkage types are not suitable for the
+ libdevice functions. It is possible to link two IR modules that have been
+ linked against libdevice using different reflection variables.
+
+Since the ``NVVMReflect`` pass replaces conditionals with constants, it will
+often leave behind dead code of the form:
+
+.. code-block:: llvm
+
+ entry:
+ ..
+ br i1 true, label %foo, label %bar
+ foo:
+ ..
+ bar:
+ ; Dead code
+ ..
+
+Therefore, it is recommended that ``NVVMReflect`` is executed early in the
+optimization pipeline before dead-code elimination.
+
+
+Reflection Parameters
+---------------------
+
+The libdevice library currently uses the following reflection parameters to
+control code generation:
+
+==================== ======================================================
+Flag Description
+==================== ======================================================
+``__CUDA_FTZ=[0,1]`` Use optimized code paths that flush subnormals to zero
+==================== ======================================================
+
+
+Invoking NVVMReflect
+--------------------
+
+To ensure that all dead code caused by the reflection pass is eliminated, it
+is recommended that the reflection pass is executed early in the LLVM IR
+optimization pipeline. The pass takes an optional mapping of reflection
+parameter name to an integer value. This mapping can be specified as either a
+command-line option to ``opt`` or as an LLVM ``StringMap<int>`` object when
+programmatically creating a pass pipeline.
+
+With ``opt``:
+
+.. code-block:: text
+
+ # opt -nvvm-reflect -nvvm-reflect-list=<var>=<value>,<var>=<value> module.bc -o module.reflect.bc
+
+
+With programmatic pass pipeline:
+
+.. code-block:: c++
+
+ extern ModulePass *llvm::createNVVMReflectPass(const StringMap<int>& Mapping);
+
+ StringMap<int> ReflectParams;
+ ReflectParams["__CUDA_FTZ"] = 1;
+ Passes.add(createNVVMReflectPass(ReflectParams));
+
+
+
Executing PTX
=============
@@ -274,3 +405,576 @@ JIT compiling a PTX string to a device binary:
For full examples of executing PTX assembly, please see the `CUDA Samples
<https://developer.nvidia.com/cuda-downloads>`_ distribution.
+
+
+Common Issues
+=============
+
+ptxas complains of undefined function: __nvvm_reflect
+-----------------------------------------------------
+
+When linking with libdevice, the ``NVVMReflect`` pass must be used. See
+:ref:`libdevice` for more information.
+
+
+Tutorial: A Simple Compute Kernel
+=================================
+
+To start, let us take a look at a simple compute kernel written directly in
+LLVM IR. The kernel implements vector addition, where each thread computes one
+element of the output vector C from the input vectors A and B. To make this
+easier, we also assume that only a single CTA (thread block) will be launched,
+and that it will be one dimensional.
+
+
+The Kernel
+----------
+
+.. code-block:: llvm
+
+ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+ target triple = "nvptx64-nvidia-cuda"
+
+ ; Intrinsic to read X component of thread ID
+ declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
+
+ define void @kernel(float addrspace(1)* %A,
+ float addrspace(1)* %B,
+ float addrspace(1)* %C) {
+ entry:
+ ; What is my ID?
+ %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
+
+ ; Compute pointers into A, B, and C
+ %ptrA = getelementptr float addrspace(1)* %A, i32 %id
+ %ptrB = getelementptr float addrspace(1)* %B, i32 %id
+ %ptrC = getelementptr float addrspace(1)* %C, i32 %id
+
+ ; Read A, B
+ %valA = load float addrspace(1)* %ptrA, align 4
+ %valB = load float addrspace(1)* %ptrB, align 4
+
+ ; Compute C = A + B
+ %valC = fadd float %valA, %valB
+
+ ; Store back to C
+ store float %valC, float addrspace(1)* %ptrC, align 4
+
+ ret void
+ }
+
+ !nvvm.annotations = !{!0}
+ !0 = metadata !{void (float addrspace(1)*,
+ float addrspace(1)*,
+ float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
+
+
+We can use the LLVM ``llc`` tool to directly run the NVPTX code generator:
+
+.. code-block:: text
+
+ # llc -mcpu=sm_20 kernel.ll -o kernel.ptx
+
+
+.. note::
+
+ If you want to generate 32-bit code, change ``p:64:64:64`` to ``p:32:32:32``
+ in the module data layout string and use ``nvptx-nvidia-cuda`` as the
+ target triple.
+
+
+The output we get from ``llc`` (as of LLVM 3.4):
+
+.. code-block:: text
+
+ //
+ // Generated by LLVM NVPTX Back-End
+ //
+
+ .version 3.1
+ .target sm_20
+ .address_size 64
+
+ // .globl kernel
+ // @kernel
+ .visible .entry kernel(
+ .param .u64 kernel_param_0,
+ .param .u64 kernel_param_1,
+ .param .u64 kernel_param_2
+ )
+ {
+ .reg .f32 %f<4>;
+ .reg .s32 %r<2>;
+ .reg .s64 %rl<8>;
+
+ // BB#0: // %entry
+ ld.param.u64 %rl1, [kernel_param_0];
+ mov.u32 %r1, %tid.x;
+ mul.wide.s32 %rl2, %r1, 4;
+ add.s64 %rl3, %rl1, %rl2;
+ ld.param.u64 %rl4, [kernel_param_1];
+ add.s64 %rl5, %rl4, %rl2;
+ ld.param.u64 %rl6, [kernel_param_2];
+ add.s64 %rl7, %rl6, %rl2;
+ ld.global.f32 %f1, [%rl3];
+ ld.global.f32 %f2, [%rl5];
+ add.f32 %f3, %f1, %f2;
+ st.global.f32 [%rl7], %f3;
+ ret;
+ }
+
+
+Dissecting the Kernel
+---------------------
+
+Now let us dissect the LLVM IR that makes up this kernel.
+
+Data Layout
+^^^^^^^^^^^
+
+The data layout string determines the size in bits of common data types, their
+ABI alignment, and their storage size. For NVPTX, you should use one of the
+following:
+
+32-bit PTX:
+
+.. code-block:: llvm
+
+ target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+
+64-bit PTX:
+
+.. code-block:: llvm
+
+ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+
+
+Target Intrinsics
+^^^^^^^^^^^^^^^^^
+
+In this example, we use the ``@llvm.nvvm.read.ptx.sreg.tid.x`` intrinsic to
+read the X component of the current thread's ID, which corresponds to a read
+of register ``%tid.x`` in PTX. The NVPTX back-end supports a large set of
+intrinsics. A short list is shown below; please see
+``include/llvm/IR/IntrinsicsNVVM.td`` for the full list.
+
+
+================================================ ====================
+Intrinsic CUDA Equivalent
+================================================ ====================
+``i32 @llvm.nvvm.read.ptx.sreg.tid.{x,y,z}`` threadIdx.{x,y,z}
+``i32 @llvm.nvvm.read.ptx.sreg.ctaid.{x,y,z}`` blockIdx.{x,y,z}
+``i32 @llvm.nvvm.read.ptx.sreg.ntid.{x,y,z}`` blockDim.{x,y,z}
+``i32 @llvm.nvvm.read.ptx.sreg.nctaid.{x,y,z}`` gridDim.{x,y,z}
+``void @llvm.cuda.syncthreads()`` __syncthreads()
+================================================ ====================
+
+
+Address Spaces
+^^^^^^^^^^^^^^
+
+You may have noticed that all of the pointer types in the LLVM IR example had
+an explicit address space specifier. What is address space 1? NVIDIA GPU
+devices (generally) have four types of memory:
+
+- Global: Large, off-chip memory
+- Shared: Small, on-chip memory shared among all threads in a CTA
+- Local: Per-thread, private memory
+- Constant: Read-only memory shared across all threads
+
+These different types of memory are represented in LLVM IR as address spaces.
+There is also a fifth address space used by the NVPTX code generator that
+corresponds to the "generic" address space. This address space can represent
+addresses in any other address space (with a few exceptions). This allows
+users to write IR functions that can load/store memory using the same
+instructions. Intrinsics are provided to convert pointers between the generic
+and non-generic address spaces.
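+
+Besides the conversion intrinsics, such a conversion can also be expressed
+with the ``addrspacecast`` instruction introduced in LLVM 3.4; for instance,
+converting a global-space pointer to a generic pointer (``%gptr`` is a
+hypothetical value):
+
+.. code-block:: llvm
+
+  %generic = addrspacecast float addrspace(1)* %gptr to float*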
+
+See :ref:`address_spaces` and :ref:`nvptx_intrinsics` for more information.
+
+
+Kernel Metadata
+^^^^^^^^^^^^^^^
+
+In PTX, a function can be either a `kernel` function (callable from the host
+program), or a `device` function (callable only from GPU code). You can think
+of `kernel` functions as entry-points in the GPU program. To mark an LLVM IR
+function as a `kernel` function, we make use of special LLVM metadata. The
+NVPTX back-end will look for a named metadata node called
+``nvvm.annotations``. This named metadata must contain a list of metadata that
+describe the IR. For our purposes, we need to declare a metadata node that
+assigns the "kernel" attribute to the LLVM IR function that should be emitted
+as a PTX `kernel` function. These metadata nodes take the form:
+
+.. code-block:: text
+
+ metadata !{<function ref>, metadata !"kernel", i32 1}
+
+For the previous example, we have:
+
+.. code-block:: llvm
+
+ !nvvm.annotations = !{!0}
+ !0 = metadata !{void (float addrspace(1)*,
+ float addrspace(1)*,
+ float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
+
+Here, we have a single metadata declaration in ``nvvm.annotations``. This
+metadata annotates our ``@kernel`` function with the ``kernel`` attribute.
+
+
+Running the Kernel
+------------------
+
+Generating PTX from LLVM IR is all well and good, but how do we execute it on
+a real GPU device? The CUDA Driver API provides a convenient mechanism for
+loading and JIT compiling PTX to a native GPU device, and launching a kernel.
+The API is similar to OpenCL. A simple example showing how to load and
+execute our vector addition code is shown below. Note that for brevity this
+code does not perform much error checking!
+
+.. note::
+
+ You can also use the ``ptxas`` tool provided by the CUDA Toolkit to offline
+ compile PTX to machine code (SASS) for a specific GPU architecture. Such
+ binaries can be loaded by the CUDA Driver API in the same way as PTX. This
+ can be useful for reducing startup time by precompiling the PTX kernels.
+
+
+.. code-block:: c++
+
+ #include <iostream>
+ #include <fstream>
+ #include <cassert>
+ #include "cuda.h"
+
+
+ void checkCudaErrors(CUresult err) {
+ assert(err == CUDA_SUCCESS);
+ }
+
+ /// main - Program entry point
+ int main(int argc, char **argv) {
+ CUdevice device;
+ CUmodule cudaModule;
+ CUcontext context;
+ CUfunction function;
+ CUlinkState linker;
+ int devCount;
+
+ // CUDA initialization
+ checkCudaErrors(cuInit(0));
+ checkCudaErrors(cuDeviceGetCount(&devCount));
+ checkCudaErrors(cuDeviceGet(&device, 0));
+
+ char name[128];
+ checkCudaErrors(cuDeviceGetName(name, 128, device));
+ std::cout << "Using CUDA Device [0]: " << name << "\n";
+
+ int devMajor, devMinor;
+ checkCudaErrors(cuDeviceComputeCapability(&devMajor, &devMinor, device));
+ std::cout << "Device Compute Capability: "
+ << devMajor << "." << devMinor << "\n";
+ if (devMajor < 2) {
+ std::cerr << "ERROR: Device 0 is not SM 2.0 or greater\n";
+ return 1;
+ }
+
+ std::ifstream t("kernel.ptx");
+ if (!t.is_open()) {
+ std::cerr << "kernel.ptx not found\n";
+ return 1;
+ }
+ std::string str((std::istreambuf_iterator<char>(t)),
+ std::istreambuf_iterator<char>());
+
+ // Create driver context
+ checkCudaErrors(cuCtxCreate(&context, 0, device));
+
+ // Create module for object
+ checkCudaErrors(cuModuleLoadDataEx(&cudaModule, str.c_str(), 0, 0, 0));
+
+ // Get kernel function
+ checkCudaErrors(cuModuleGetFunction(&function, cudaModule, "kernel"));
+
+ // Device data
+ CUdeviceptr devBufferA;
+ CUdeviceptr devBufferB;
+ CUdeviceptr devBufferC;
+
+ checkCudaErrors(cuMemAlloc(&devBufferA, sizeof(float)*16));
+ checkCudaErrors(cuMemAlloc(&devBufferB, sizeof(float)*16));
+ checkCudaErrors(cuMemAlloc(&devBufferC, sizeof(float)*16));
+
+ float* hostA = new float[16];
+ float* hostB = new float[16];
+ float* hostC = new float[16];
+
+ // Populate input
+ for (unsigned i = 0; i != 16; ++i) {
+ hostA[i] = (float)i;
+ hostB[i] = (float)(2*i);
+ hostC[i] = 0.0f;
+ }
+
+ checkCudaErrors(cuMemcpyHtoD(devBufferA, &hostA[0], sizeof(float)*16));
+ checkCudaErrors(cuMemcpyHtoD(devBufferB, &hostB[0], sizeof(float)*16));
+
+
+ unsigned blockSizeX = 16;
+ unsigned blockSizeY = 1;
+ unsigned blockSizeZ = 1;
+ unsigned gridSizeX = 1;
+ unsigned gridSizeY = 1;
+ unsigned gridSizeZ = 1;
+
+ // Kernel parameters
+ void *KernelParams[] = { &devBufferA, &devBufferB, &devBufferC };
+
+ std::cout << "Launching kernel\n";
+
+ // Kernel launch
+ checkCudaErrors(cuLaunchKernel(function, gridSizeX, gridSizeY, gridSizeZ,
+ blockSizeX, blockSizeY, blockSizeZ,
+ 0, NULL, KernelParams, NULL));
+
+ // Retrieve device data
+ checkCudaErrors(cuMemcpyDtoH(&hostC[0], devBufferC, sizeof(float)*16));
+
+
+ std::cout << "Results:\n";
+ for (unsigned i = 0; i != 16; ++i) {
+ std::cout << hostA[i] << " + " << hostB[i] << " = " << hostC[i] << "\n";
+ }
+
+
+ // Clean up after ourselves
+ delete [] hostA;
+ delete [] hostB;
+ delete [] hostC;
+
+ // Clean-up
+ checkCudaErrors(cuMemFree(devBufferA));
+ checkCudaErrors(cuMemFree(devBufferB));
+ checkCudaErrors(cuMemFree(devBufferC));
+ checkCudaErrors(cuModuleUnload(cudaModule));
+ checkCudaErrors(cuCtxDestroy(context));
+
+ return 0;
+ }
+
+
+You will need to link with the CUDA driver and specify the path to cuda.h.
+
+.. code-block:: text
+
+ # clang++ sample.cpp -o sample -O2 -g -I/usr/local/cuda-5.5/include -lcuda
+
+We don't need to specify a path to ``libcuda.so`` since this is installed in a
+system location by the driver, not the CUDA toolkit.
+
+If everything goes as planned, you should see the following output when
+running the compiled program:
+
+.. code-block:: text
+
+ Using CUDA Device [0]: GeForce GTX 680
+ Device Compute Capability: 3.0
+ Launching kernel
+ Results:
+ 0 + 0 = 0
+ 1 + 2 = 3
+ 2 + 4 = 6
+ 3 + 6 = 9
+ 4 + 8 = 12
+ 5 + 10 = 15
+ 6 + 12 = 18
+ 7 + 14 = 21
+ 8 + 16 = 24
+ 9 + 18 = 27
+ 10 + 20 = 30
+ 11 + 22 = 33
+ 12 + 24 = 36
+ 13 + 26 = 39
+ 14 + 28 = 42
+ 15 + 30 = 45
+
+.. note::
+
+ You will likely see a different device identifier based on your hardware
+
+
+Tutorial: Linking with Libdevice
+================================
+
+In this tutorial, we show a simple example of linking LLVM IR with the
+libdevice library. We will use the same kernel as the previous tutorial,
+except that we will compute ``C = pow(A, B)`` instead of ``C = A + B``.
+Libdevice provides an ``__nv_powf`` function that we will use.
+
+.. code-block:: llvm
+
+ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+ target triple = "nvptx64-nvidia-cuda"
+
+ ; Intrinsic to read X component of thread ID
+ declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
+ ; libdevice function
+ declare float @__nv_powf(float, float)
+
+ define void @kernel(float addrspace(1)* %A,
+ float addrspace(1)* %B,
+ float addrspace(1)* %C) {
+ entry:
+ ; What is my ID?
+ %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
+
+ ; Compute pointers into A, B, and C
+ %ptrA = getelementptr float addrspace(1)* %A, i32 %id
+ %ptrB = getelementptr float addrspace(1)* %B, i32 %id
+ %ptrC = getelementptr float addrspace(1)* %C, i32 %id
+
+ ; Read A, B
+ %valA = load float addrspace(1)* %ptrA, align 4
+ %valB = load float addrspace(1)* %ptrB, align 4
+
+ ; Compute C = pow(A, B)
+  %valC = call float @__nv_powf(float %valA, float %valB)
+
+ ; Store back to C
+ store float %valC, float addrspace(1)* %ptrC, align 4
+
+ ret void
+ }
+
+ !nvvm.annotations = !{!0}
+ !0 = metadata !{void (float addrspace(1)*,
+ float addrspace(1)*,
+  float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
+
+
+To compile this kernel, we perform the following steps:
+
+1. Link with libdevice
+2. Internalize all but the public kernel function
+3. Run ``NVVMReflect`` and set ``__CUDA_FTZ`` to 0
+4. Optimize the linked module
+5. Codegen the module
+
+
+These steps can be performed by the LLVM ``llvm-link``, ``opt``, and ``llc``
+tools. In a complete compiler, these steps can also be performed entirely
+programmatically by setting up an appropriate pass configuration (see
+:ref:`libdevice`).
+
+.. code-block:: text
+
+ # llvm-link t2.bc libdevice.compute_20.10.bc -o t2.linked.bc
+ # opt -internalize -internalize-public-api-list=kernel -nvvm-reflect-list=__CUDA_FTZ=0 -nvvm-reflect -O3 t2.linked.bc -o t2.opt.bc
+ # llc -mcpu=sm_20 t2.opt.bc -o t2.ptx
+
+.. note::
+
+  The ``-nvvm-reflect-list=__CUDA_FTZ=0`` option is not strictly required, as any
+ undefined variables will default to zero. It is shown here for evaluation
+ purposes.
+
+
+This gives us the following PTX (excerpt):
+
+.. code-block:: text
+
+ //
+ // Generated by LLVM NVPTX Back-End
+ //
+
+ .version 3.1
+ .target sm_20
+ .address_size 64
+
+ // .globl kernel
+ // @kernel
+ .visible .entry kernel(
+ .param .u64 kernel_param_0,
+ .param .u64 kernel_param_1,
+ .param .u64 kernel_param_2
+ )
+ {
+ .reg .pred %p<30>;
+ .reg .f32 %f<111>;
+ .reg .s32 %r<21>;
+ .reg .s64 %rl<8>;
+
+ // BB#0: // %entry
+ ld.param.u64 %rl2, [kernel_param_0];
+ mov.u32 %r3, %tid.x;
+ ld.param.u64 %rl3, [kernel_param_1];
+ mul.wide.s32 %rl4, %r3, 4;
+ add.s64 %rl5, %rl2, %rl4;
+ ld.param.u64 %rl6, [kernel_param_2];
+ add.s64 %rl7, %rl3, %rl4;
+ add.s64 %rl1, %rl6, %rl4;
+ ld.global.f32 %f1, [%rl5];
+ ld.global.f32 %f2, [%rl7];
+ setp.eq.f32 %p1, %f1, 0f3F800000;
+ setp.eq.f32 %p2, %f2, 0f00000000;
+ or.pred %p3, %p1, %p2;
+ @%p3 bra BB0_1;
+ bra.uni BB0_2;
+ BB0_1:
+ mov.f32 %f110, 0f3F800000;
+ st.global.f32 [%rl1], %f110;
+ ret;
+ BB0_2: // %__nv_isnanf.exit.i
+ abs.f32 %f4, %f1;
+ setp.gtu.f32 %p4, %f4, 0f7F800000;
+ @%p4 bra BB0_4;
+ // BB#3: // %__nv_isnanf.exit5.i
+ abs.f32 %f5, %f2;
+ setp.le.f32 %p5, %f5, 0f7F800000;
+ @%p5 bra BB0_5;
+ BB0_4: // %.critedge1.i
+ add.f32 %f110, %f1, %f2;
+ st.global.f32 [%rl1], %f110;
+ ret;
+ BB0_5: // %__nv_isinff.exit.i
+
+ ...
+
+ BB0_26: // %__nv_truncf.exit.i.i.i.i.i
+ mul.f32 %f90, %f107, 0f3FB8AA3B;
+ cvt.rzi.f32.f32 %f91, %f90;
+ mov.f32 %f92, 0fBF317200;
+ fma.rn.f32 %f93, %f91, %f92, %f107;
+ mov.f32 %f94, 0fB5BFBE8E;
+ fma.rn.f32 %f95, %f91, %f94, %f93;
+ mul.f32 %f89, %f95, 0f3FB8AA3B;
+ // inline asm
+ ex2.approx.ftz.f32 %f88,%f89;
+ // inline asm
+ add.f32 %f96, %f91, 0f00000000;
+ ex2.approx.f32 %f97, %f96;
+ mul.f32 %f98, %f88, %f97;
+ setp.lt.f32 %p15, %f107, 0fC2D20000;
+ selp.f32 %f99, 0f00000000, %f98, %p15;
+ setp.gt.f32 %p16, %f107, 0f42D20000;
+ selp.f32 %f110, 0f7F800000, %f99, %p16;
+ setp.eq.f32 %p17, %f110, 0f7F800000;
+ @%p17 bra BB0_28;
+ // BB#27:
+ fma.rn.f32 %f110, %f110, %f108, %f110;
+ BB0_28: // %__internal_accurate_powf.exit.i
+ setp.lt.f32 %p18, %f1, 0f00000000;
+ setp.eq.f32 %p19, %f3, 0f3F800000;
+ and.pred %p20, %p18, %p19;
+ @!%p20 bra BB0_30;
+ bra.uni BB0_29;
+ BB0_29:
+ mov.b32 %r9, %f110;
+ xor.b32 %r10, %r9, -2147483648;
+ mov.b32 %f110, %r10;
+ BB0_30: // %__nv_powf.exit
+ st.global.f32 [%rl1], %f110;
+ ret;
+ }
+
diff --git a/docs/Passes.rst b/docs/Passes.rst
index d30c3ca..029e472 100644
--- a/docs/Passes.rst
+++ b/docs/Passes.rst
@@ -476,7 +476,7 @@ transformation obviously invalidates the CFG, but can update forward dominator
-------------------------------------------------
This pass munges the code in the input function to better prepare it for
-SelectionDAG-based code generation. This works around limitations in it's
+SelectionDAG-based code generation. This works around limitations in its
basic-block-at-a-time approach. It should eventually be removed.
``-constmerge``: Merge Duplicate Global Constants
diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst
index 0c6990e..6fdea1f 100644
--- a/docs/Phabricator.rst
+++ b/docs/Phabricator.rst
@@ -68,7 +68,7 @@ To upload a new patch:
To submit an updated patch:
* Click *Differential*.
-* Click *Create Revision*.
+* Click *Create Diff*.
* Paste the updated diff.
* Select the review you want to from the *Attach To* dropdown and click
*Continue*.
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst
index 2910a2a..99aa5c7 100644
--- a/docs/ProgrammersManual.rst
+++ b/docs/ProgrammersManual.rst
@@ -1172,7 +1172,7 @@ The drawback of SetVector is that it requires twice as much space as a normal
set and has the sum of constant factors from the set-like container and the
sequential container that it uses. Use it **only** if you need to iterate over
the elements in a deterministic order. SetVector is also expensive to delete
-elements out of (linear time), unless you use it's "pop_back" method, which is
+elements out of (linear time), unless you use its "pop_back" method, which is
faster.
``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
@@ -2080,7 +2080,7 @@ the ``llvm_stop_multithreaded()`` call. You can also use the
Note that both of these calls must be made *in isolation*. That is to say that
no other LLVM API calls may be executing at any time during the execution of
-``llvm_start_multithreaded()`` or ``llvm_stop_multithreaded``. It's is the
+``llvm_start_multithreaded()`` or ``llvm_stop_multithreaded``. It is the
client's responsibility to enforce this isolation.
The return value of ``llvm_start_multithreaded()`` indicates the success or
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index 7b143f6..94663c4 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -5,12 +5,6 @@ LLVM 3.4 Release Notes
.. contents::
:local:
-.. warning::
- These are in-progress notes for the upcoming LLVM 3.4 release. You may
- prefer the `LLVM 3.3 Release Notes <http://llvm.org/releases/3.3/docs
- /ReleaseNotes.html>`_.
-
-
Introduction
============
@@ -34,12 +28,15 @@ page <http://llvm.org/releases/>`_.
Non-comprehensive list of changes in this release
=================================================
-.. NOTE
- For small 1-3 sentence descriptions, just add an entry at the end of
- this list. If your description won't fit comfortably in one bullet
- point (e.g. maybe you would like to give an example of the
- functionality, or simply have a lot to talk about), see the `NOTE` below
- for adding a new subsection.
+* This is expected to be the last release of LLVM which compiles using a C++98
+ toolchain. We expect to start using some C++11 features in LLVM and other
+ sub-projects starting after this release. That said, we are committed to
+ supporting a reasonable set of modern C++ toolchains as the host compiler on
+ all of the platforms. This will at least include Visual Studio 2012 on
+ Windows, and Clang 3.1 or GCC 4.7.x on Mac and Linux. The final set of
+ compilers (and the C++11 features they support) is not set in stone, but we
+ wanted users of LLVM to have a heads up that the next release will involve
+ a substantial change in the host toolchain requirements.
* The regression tests now fail if any command in a pipe fails. To disable it in
a directory, just add ``config.pipefail = False`` to its ``lit.local.cfg``.
@@ -50,9 +47,9 @@ Non-comprehensive list of changes in this release
* The R600 backend is not marked experimental anymore and is built by default.
-* APFloat::isNormal() was renamed to APFloat::isFiniteNonZero() and
- APFloat::isIEEENormal() was renamed to APFloat::isNormal(). This ensures that
- APFloat::isNormal() conforms to IEEE-754R-2008.
+* ``APFloat::isNormal()`` was renamed to ``APFloat::isFiniteNonZero()`` and
+ ``APFloat::isIEEENormal()`` was renamed to ``APFloat::isNormal()``. This
+ ensures that ``APFloat::isNormal()`` conforms to IEEE-754R-2008.
* The library call simplification pass has been removed. Its functionality
has been integrated into the instruction combiner and function attribute
@@ -62,26 +59,74 @@ Non-comprehensive list of changes in this release
or later instead. For more information, see the `Getting Started using Visual
Studio <GettingStartedVS.html>`_ page.
-* The Loop Vectorizer that was previously enabled for -O3 is now enabled for
- -Os and -O2.
+* The Loop Vectorizer that was previously enabled for ``-O3`` is now enabled
+ for ``-Os`` and ``-O2``.
* The new SLP Vectorizer is now enabled by default.
-* llvm-ar now uses the new Object library and produces archives and
+* ``llvm-ar`` now uses the new Object library and produces archives and
symbol tables in the gnu format.
-* ... next change ...
+* FileCheck now allows specifying ``-check-prefix`` multiple times. This
+ helps reduce duplicate check lines when using multiple RUN lines.
+
+* The bitcast instruction no longer allows casting between pointers
+ with different address spaces. To achieve this, use the new addrspacecast
+ instruction.
+
+* Different sized pointers for different address spaces should now
+ generally work. This is primarily useful for GPU targets.
+
+* OCaml bindings have been significantly extended to cover almost all of the
+ LLVM libraries.
+
+Mips Target
+-----------
+
+Support for the MIPS SIMD Architecture (MSA) has been added. MSA is supported
+through inline assembly, intrinsics with the prefix '``__builtin_msa``', and
+normal code generation.
-.. NOTE
- If you would like to document a larger change, then you can add a
- subsection about it right here. You can copy the following boilerplate
- and un-indent it (the indentation causes it to be inside this comment).
+For more information on MSA (including documentation for the instruction set),
+see the `MIPS SIMD page at Imagination Technologies
+<http://imgtec.com/mips/mips-simd.asp>`_
- Special New Feature
- -------------------
+PowerPC Target
+--------------
- Makes programs 10x faster by doing Special New Thing.
+Changes in the PowerPC backend include:
+* fast-isel support (for faster ``-O0`` code generation)
+* many improvements to the builtin assembler
+* support for generating unaligned (Altivec) vector loads
+* support for generating the fcpsgn instruction
+* generate ``frin`` for ``round()`` (not ``nearbyint()`` and ``rint()``, which
+ had been done only in fast-math mode)
+* improved instruction scheduling for embedded cores (such as the A2)
+* improved prologue/epilogue generation (especially in 32-bit mode)
+* support for dynamic stack alignment (and dynamic stack allocations with large alignments)
+* improved generation of counter-register-based loops
+* bug fixes
+
+SPARC Target
+------------
+
+The SPARC backend received many improvements, namely:
+
+* experimental SPARC V9 backend
+* JIT support for SPARC
+* fp128 support
+* exception handling
+* TLS support
+* leaf functions optimization
+* bug fixes
+
+SystemZ/s390x Backend
+---------------------
+
+LLVM and clang can now optimize for zEnterprise z196 and zEnterprise EC12
+targets. In clang these targets are selected using ``-march=z196`` and
+``-march=zEC12`` respectively.
External Open Source Projects Using LLVM 3.4
============================================
@@ -90,6 +135,105 @@ An exciting aspect of LLVM is that it is used as an enabling technology for
a lot of other language and tools projects. This section lists some of the
projects that have already been updated to work with LLVM 3.4.
+DXR
+---
+
+`DXR <https://wiki.mozilla.org/DXR>`_ is Mozilla's code search and navigation
+tool, aimed at making sense of large projects like Firefox. It supports
+full-text and regex searches as well as structural queries like "Find all the
+callers of this function." Behind the scenes, it uses a custom trigram index,
+the re2 library, and structural data collected by a clang compiler plugin.
+
+LDC - the LLVM-based D compiler
+-------------------------------
+
+`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
+pragmatically combines efficiency, control, and modeling power, with safety and
+programmer productivity. D supports powerful concepts like Compile-Time Function
+Execution (CTFE) and Template Meta-Programming, provides an innovative approach
+to concurrency and offers many classical paradigms.
+
+`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
+combined with LLVM as backend to produce efficient native code. LDC targets
+x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux/PPC64.
+Ports to other architectures like ARM and AArch64 are underway.
+
+LibBeauty
+---------
+
+The `LibBeauty <http://www.libbeauty.com>`_ decompiler and reverse
+engineering tool currently utilises the LLVM disassembler and the LLVM IR
+Builder. The current aim of the project is to take an x86_64 binary ``.o``
+file as input, and produce an equivalent LLVM IR ``.bc`` or ``.ll`` file as
+output. Support for ARM binary ``.o`` files as input will be added later.
+
+Likely
+------
+
+`Likely <http://www.liblikely.org/>`_ is an open source domain specific
+language for image recognition. Algorithms are just-in-time compiled using
+LLVM's MCJIT infrastructure to execute on single or multi-threaded CPUs as well
+as OpenCL SPIR or CUDA enabled GPUs. Likely exploits the observation that while
+image processing and statistical learning kernels must be written generically
+to handle any matrix datatype, at runtime they tend to be executed repeatedly
+on the same type.
+
+Portable Computing Language (pocl)
+----------------------------------
+
+In addition to producing an easily portable open source OpenCL
+implementation, another major goal of `pocl <http://portablecl.org/>`_
+is improving performance portability of OpenCL programs with
+compiler optimizations, reducing the need for target-dependent manual
+optimizations. An important part of pocl is a set of LLVM passes used to
+statically parallelize multiple work-items with the kernel compiler, even in
+the presence of work-group barriers. This enables the fine-grained static
+concurrency in the work-groups to be exploited in multiple ways.
+
+Portable Native Client (PNaCl)
+------------------------------
+
+`Portable Native Client (PNaCl) <http://www.chromium.org/nativeclient/pnacl>`_
+is a Chrome initiative to bring the performance and low-level control of native
+code to modern web browsers, without sacrificing the security benefits and
+portability of web applications. PNaCl works by compiling native C and C++ code
+to an intermediate representation using the LLVM clang compiler. This
+intermediate representation is a subset of LLVM bitcode that is wrapped into a
+portable executable, which can be hosted on a web server like any other website
+asset. When the site is accessed, Chrome fetches and translates the portable
+executable into an architecture-specific machine code optimized directly for
+the underlying device. PNaCl lets developers compile their code once to run on
+any hardware platform and embed their PNaCl application in any website,
+enabling developers to directly leverage the power of the underlying CPU and
+GPU.
+
+TTA-based Co-design Environment (TCE)
+-------------------------------------
+
+`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing new
+exposed datapath processors based on the Transport triggered architecture (TTA).
+The toolset provides a complete co-design flow from C/C++
+programs down to synthesizable VHDL/Verilog and parallel program binaries.
+Processor customization points include the register files, function units,
+supported operations, and the interconnection network.
+
+TCE uses Clang and LLVM for C/C++/OpenCL C language support, target-independent
+optimizations and also for parts of code generation. It generates
+new LLVM-based code generators "on the fly" for the designed processors and
+loads them into the compiler backend as runtime libraries to avoid
+per-target recompilation of larger parts of the compiler chain.
+
+WebCL Validator
+---------------
+
+`WebCL Validator <https://github.com/KhronosGroup/webcl-validator>`_ implements
+validation for the WebCL C language, which is a subset of OpenCL ES 1.1. The
+validator checks the correctness of WebCL C and implements memory protection
+for it as a source-to-source transformation. The transformation converts WebCL
+to memory-protected OpenCL. The protected OpenCL cannot access any memory
+ranges which were not allocated for it, and its memory is always initialized to
+prevent information leakage from other programs.
+
Additional Information
======================
@@ -103,4 +247,3 @@ going into the ``llvm/docs/`` directory in the LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact
us via the `mailing lists <http://llvm.org/docs/#maillist>`_.
-
diff --git a/docs/ReleaseProcess.rst b/docs/ReleaseProcess.rst
index c4bbc91..0836b6e 100644
--- a/docs/ReleaseProcess.rst
+++ b/docs/ReleaseProcess.rst
@@ -52,16 +52,18 @@ The scripts are in the ``utils/release`` directory.
test-release.sh
---------------
-This script will check-out, configure and compile LLVM+Clang (+ most add-ons, like ``compiler-rt``,
-``libcxx`` and ``clang-extra-tools``) in three stages, and will test the final stage.
-It'll have installed the final binaries on the Phase3/Releasei(+Asserts) directory, and
-that's the one you should use for the test-suite and other external tests.
+This script will check out, configure and compile LLVM+Clang (+ most add-ons,
+like ``compiler-rt``, ``libcxx`` and ``clang-extra-tools``) in three stages, and
+will test the final stage. It will install the final binaries in the
+Phase3/Release(+Asserts) directory, and that is the one you should use for the
+test-suite and other external tests.
To run the script on a specific release candidate run::
./test-release.sh \
- -release 3.3 \
+ -release 3.4 \
-rc 1 \
+ -triple x86_64-apple-darwin \
-no-64bit \
-test-asserts \
-no-compare-files
diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst
index ad03871..a1d8110 100644
--- a/docs/SourceLevelDebugging.rst
+++ b/docs/SourceLevelDebugging.rst
@@ -295,7 +295,7 @@ Subprogram descriptors
i32, ;; Index into a virtual function
metadata, ;; indicates which base type contains the vtable pointer for the
;; derived class
- i32, ;; Flags - Artifical, Private, Protected, Explicit, Prototyped.
+ i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
i1, ;; isOptimized
Function * , ;; Pointer to LLVM function
metadata, ;; Lists function template parameters
@@ -346,7 +346,7 @@ Basic type descriptors
!4 = metadata !{
i32, ;; Tag = 36 (DW_TAG_base_type)
- metadata,;; Source directory (including trailing slash) & file pair (may be null)
+ metadata, ;; Source directory (including trailing slash) & file pair (may be null)
metadata, ;; Reference to context
metadata, ;; Name (may be "" for anonymous types)
i32, ;; Line number where defined (may be 0)
@@ -389,7 +389,7 @@ Derived type descriptors
!5 = metadata !{
i32, ;; Tag (see below)
- metadata,;; Source directory (including trailing slash) & file pair (may be null)
+ metadata, ;; Source directory (including trailing slash) & file pair (may be null)
metadata, ;; Reference to context
metadata, ;; Name (may be "" for anonymous types)
i32, ;; Line number where defined (may be 0)
@@ -452,7 +452,7 @@ Composite type descriptors
!6 = metadata !{
i32, ;; Tag (see below)
- metadata,;; Source directory (including trailing slash) & file pair (may be null)
+ metadata, ;; Source directory (including trailing slash) & file pair (may be null)
metadata, ;; Reference to context
metadata, ;; Name (may be "" for anonymous types)
i32, ;; Line number where defined (may be 0)
@@ -462,9 +462,10 @@ Composite type descriptors
i32, ;; Flags
metadata, ;; Reference to type derived from
metadata, ;; Reference to array of member descriptors
- i32 ;; Runtime languages
+ i32, ;; Runtime languages
metadata, ;; Base type containing the vtable pointer for this type
- metadata ;; Template parameters
+ metadata, ;; Template parameters
+ metadata ;; A unique identifier for type uniquing purpose (may be null)
}
These descriptors are used to define types that are composed of 0 or more
@@ -649,85 +650,86 @@ Compiled to LLVM, this function would be represented like this:
.. code-block:: llvm
- define void @_Z3foov() #0 {
+ define void @foo() #0 {
entry:
- %X = alloca i32, align 4 ; [#uses=3 type=i32*]
- %Y = alloca i32, align 4 ; [#uses=2 type=i32*]
- %Z = alloca i32, align 4 ; [#uses=2 type=i32*]
- call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !8), !dbg !10
+ %X = alloca i32, align 4
+ %Y = alloca i32, align 4
+ %Z = alloca i32, align 4
+ call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
; [debug line = 2:7] [debug variable = X]
- store i32 21, i32* %X, align 4, !dbg !11 ; [debug line = 2:13]
- call void @llvm.dbg.declare(metadata !{i32* %Y}, metadata !12), !dbg !13
+ store i32 21, i32* %X, align 4, !dbg !12
+ call void @llvm.dbg.declare(metadata !{i32* %Y}, metadata !13), !dbg !14
; [debug line = 3:7] [debug variable = Y]
- store i32 22, i32* %Y, align 4, !dbg !14 ; [debug line = 3:13]
+ store i32 22, i32* %Y, align 4, !dbg !14
call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
; [debug line = 5:9] [debug variable = Z]
- store i32 23, i32* %Z, align 4, !dbg !18 ; [debug line = 5:15]
- %0 = load i32* %X, align 4, !dbg !19 ; [#uses=1 type=i32] \
+ store i32 23, i32* %Z, align 4, !dbg !17
+ %0 = load i32* %X, align 4, !dbg !18
[debug line = 6:5]
- store i32 %0, i32* %Z, align 4, !dbg !19 ; [debug line = 6:5]
- %1 = load i32* %Y, align 4, !dbg !20 ; [#uses=1 type=i32] \
+ store i32 %0, i32* %Z, align 4, !dbg !18
+ %1 = load i32* %Y, align 4, !dbg !19
[debug line = 8:3]
- store i32 %1, i32* %X, align 4, !dbg !20 ; [debug line = 8:3]
- ret void, !dbg !21 ; [debug line = 9:1]
+ store i32 %1, i32* %X, align 4, !dbg !19
+ ret void, !dbg !20
}
- ; [#uses=3]
; Function Attrs: nounwind readnone
declare void @llvm.dbg.declare(metadata, metadata) #1
- attributes #0 = { optsize zeroext "less-precise-fpmad"="false"
- "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true"
- "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false"
+ attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false"
+ "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
+ "no-infs-fp-math"="false" "no-nans-fp-math"="false"
+ "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
"use-soft-float"="false" }
attributes #1 = { nounwind readnone }
!llvm.dbg.cu = !{!0}
-
+ !llvm.module.flags = !{!8}
+ !llvm.ident = !{!9}
+
!0 = metadata !{i32 786449, metadata !1, i32 12,
- metadata !"clang version 3.4 ", i1 false, metadata !"", i32 0,
- metadata !2, metadata !2, metadata !3, metadata !2,
- metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] \
+ metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)",
+ i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3,
+ metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] \
[/private/tmp/foo.c] \
- [DW_LANG_C]
- !1 = metadata !{metadata !"foo.c", metadata !"/private/tmp"}
+ [DW_LANG_C99]
+ !1 = metadata !{metadata !"t.c", metadata !"/private/tmp"}
!2 = metadata !{i32 0}
!3 = metadata !{metadata !4}
!4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
- metadata !"foo", metadata !"_Z3foov", i32 1, metadata !6,
- i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false,
- void ()* @_Z3foov, null, null, metadata !2, i32 1}
+ metadata !"foo", metadata !"", i32 1, metadata !6,
+ i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
+ void ()* @foo, null, null, metadata !2, i32 1}
; [ DW_TAG_subprogram ] [line 1] [def] [foo]
- !5 = metadata !{i32 786473, metadata !1} ; [ DW_TAG_file_type ] \
- [/private/tmp/foo.c]
- !6 = metadata !{i32 786453, i32 0, i32 0, metadata !"", i32 0, i64 0, i64 0,
- i64 0, i32 0, null, metadata !7, i32 0, i32 0}
+ !5 = metadata !{i32 786473, metadata !1} ; [ DW_TAG_file_type ] \
+ [/private/tmp/t.c]
+ !6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0,
+ i64 0, i32 0, null, metadata !7, i32 0, null, null, null}
; [ DW_TAG_subroutine_type ] \
[line 0, size 0, align 0, offset 0] [from ]
!7 = metadata !{null}
- !8 = metadata !{i32 786688, metadata !4, metadata !"X", metadata !5, i32 2, \
- metadata !9, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [X] \
- [line 2]
- !9 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32, \
- i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] \
- [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
- !10 = metadata !{i32 2, i32 7, metadata !4, null}
- !11 = metadata !{i32 2, i32 13, metadata !4, null}
- !12 = metadata !{i32 786688, metadata !4, metadata !"Y", metadata !5, i32 3, \
- metadata !9, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Y] \
+ !8 = metadata !{i32 2, metadata !"Dwarf Version", i32 2}
+ !9 = metadata !{metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)"}
+ !10 = metadata !{i32 786688, metadata !4, metadata !"X", metadata !5, i32 2,
+ metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [X] \
+ [line 2]
+ !11 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32,
+ i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] \
+ [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
+ !12 = metadata !{i32 2, i32 0, metadata !4, null}
+ !13 = metadata !{i32 786688, metadata !4, metadata !"Y", metadata !5, i32 3,
+ metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Y] \
[line 3]
- !13 = metadata !{i32 3, i32 7, metadata !4, null}
- !14 = metadata !{i32 3, i32 13, metadata !4, null}
- !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5, \
- metadata !9, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \
+ !14 = metadata !{i32 3, i32 0, metadata !4, null}
+ !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5,
+ metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \
[line 5]
- !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 3, i32 0}
- ; [ DW_TAG_lexical_block ] [/private/tmp/foo.c]
- !17 = metadata !{i32 5, i32 9, metadata !16, null}
- !18 = metadata !{i32 5, i32 15, metadata !16, null}
- !19 = metadata !{i32 6, i32 5, metadata !16, null}
- !20 = metadata !{i32 8, i32 3, metadata !4, null}
- !21 = metadata !{i32 9, i32 1, metadata !4, null}
+ !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0} \
+ ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
+ !17 = metadata !{i32 5, i32 0, metadata !16, null}
+ !18 = metadata !{i32 6, i32 0, metadata !16, null}
+ !19 = metadata !{i32 8, i32 0, metadata !4, null} ; [ DW_TAG_imported_declaration ]
+ !20 = metadata !{i32 9, i32 0, metadata !4, null}
This example illustrates a few important details about LLVM debugging
information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
@@ -737,23 +739,23 @@ variable definitions, and the code used to implement the function.
.. code-block:: llvm
- call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !8), !dbg !10
+ call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
; [debug line = 2:7] [debug variable = X]
The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the
-variable ``X``. The metadata ``!dbg !10`` attached to the intrinsic provides
+variable ``X``. The metadata ``!dbg !12`` attached to the intrinsic provides
scope information for the variable ``X``.
.. code-block:: llvm
- !10 = metadata !{i32 2, i32 7, metadata !4, null}
+ !12 = metadata !{i32 2, i32 0, metadata !4, null}
!4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
- metadata !"foo", metadata !"_Z3foov", i32 1, metadata !6,
- i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false,
- void ()* @_Z3foov, null, null, metadata !2, i32 1}
- ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
+ metadata !"foo", metadata !"", i32 1, metadata !6,
+ i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
+ void ()* @foo, null, null, metadata !2, i32 1}
+ ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
-Here ``!10`` is metadata providing location information. It has four fields:
+Here ``!12`` is metadata providing location information. It has four fields:
line number, column number, scope, and original scope. The original scope
represents inline location if this instruction is inlined inside a caller, and
is null otherwise. In this example, scope is encoded by ``!4``, a
@@ -774,12 +776,12 @@ scope information for the variable ``Z``.
.. code-block:: llvm
- !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 3, i32 0}
- ; [ DW_TAG_lexical_block ] [/private/tmp/foo.c]
- !17 = metadata !{i32 5, i32 9, metadata !16, null}
+ !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0}
+ ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
+ !17 = metadata !{i32 5, i32 0, metadata !16, null}
Here ``!15`` indicates that ``Z`` is declared at line number 5 and
-column number 9 inside of lexical scope ``!16``. The lexical scope itself
+column number 0 inside of lexical scope ``!16``. The lexical scope itself
resides inside of subprogram ``!4`` described above.
The scope information attached with each instruction provides a straightforward
diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst
index 3cfbb21..c9a35cd 100644
--- a/docs/TestingGuide.rst
+++ b/docs/TestingGuide.rst
@@ -21,9 +21,9 @@ tests.
Requirements
============
-In order to use the LLVM testing infrastructure, you will need all of
-the software required to build LLVM, as well as
-`Python <http://python.org>`_ 2.4 or later.
+In order to use the LLVM testing infrastructure, you will need all of the
+software required to build LLVM, as well as `Python <http://python.org>`_ 2.5 or
+later.
LLVM testing infrastructure organization
========================================
@@ -120,12 +120,14 @@ can run the LLVM and Clang tests simultaneously using:
% make check-all
-To run the tests with Valgrind (Memcheck by default), just append
-``VG=1`` to the commands above, e.g.:
+To run the tests with Valgrind (Memcheck by default), use the ``LIT_ARGS`` make
+variable to pass the required options to lit. For example, you can use:
.. code-block:: bash
- % make check VG=1
+ % make check LIT_ARGS="-v --vg --vg-leak"
+
+to run the tests under Valgrind with leak checking enabled.
To run individual tests or subsets of tests, you can use the ``llvm-lit``
script which is built as part of LLVM. For example, to run the
diff --git a/docs/WritingAnLLVMBackend.rst b/docs/WritingAnLLVMBackend.rst
index 73381b5..35a2d16 100644
--- a/docs/WritingAnLLVMBackend.rst
+++ b/docs/WritingAnLLVMBackend.rst
@@ -911,6 +911,9 @@ format instructions will bind the operands to the ``rd``, ``rs1``, and ``rs2``
fields. This results in the ``XNORrr`` instruction binding ``$dst``, ``$b``,
and ``$c`` operands to the ``rd``, ``rs1``, and ``rs2`` fields respectively.
+Instruction Operand Name Mapping
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
TableGen will also generate a function called getNamedOperandIdx() which
can be used to look up an operand's index in a MachineInstr based on its
TableGen name. Setting the UseNamedOperandTable bit in an instruction's
@@ -952,6 +955,59 @@ XXXInstrInfo.h:
int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIndex);
} // End namespace XXX
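+
+As a usage sketch (illustrative only: the placeholder target namespace XXX and
+the operand name "dst" are assumed here, matching the SPARC example above, and
+the helper function is hypothetical), the generated lookup might be used like
+this:
+
+.. code-block:: c++
+
+  #include "llvm/CodeGen/MachineInstr.h"
+
+  #define GET_INSTRINFO_OPERAND_ENUM // For the OpName enum and the declaration
+  #include "XXXGenInstrInfo.inc"
+
+  using namespace llvm;
+
+  // Return the operand named "dst" of MI, or null if this opcode has no such
+  // named operand (getNamedOperandIdx() returns -1 in that case).
+  static const MachineOperand *getDstOperand(const MachineInstr &MI) {
+    int Idx = XXX::getNamedOperandIdx(MI.getOpcode(), XXX::OpName::dst);
+    return Idx == -1 ? nullptr : &MI.getOperand(Idx);
+  }
+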
+Instruction Operand Types
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen will also generate an enumeration consisting of all named Operand
+types defined in the backend, in the llvm::XXX::OpTypes namespace.
+Some common immediate Operand types (for instance i8, i32, i64, f32, f64)
+are defined for all targets in ``include/llvm/Target/Target.td``, and are
+available in each Target's OpTypes enum. Also, only named Operand types appear
+in the enumeration: anonymous types are ignored.
+For example, the X86 backend defines ``brtarget`` and ``brtarget8``, both
+instances of the TableGen ``Operand`` class, which represent branch target
+operands:
+
+.. code-block:: llvm
+
+ def brtarget : Operand<OtherVT>;
+ def brtarget8 : Operand<OtherVT>;
+
+This results in:
+
+.. code-block:: c++
+
+ namespace X86 {
+ namespace OpTypes {
+ enum OperandType {
+ ...
+ brtarget,
+ brtarget8,
+ ...
+ i32imm,
+ i64imm,
+ ...
+ OPERAND_TYPE_LIST_END
+ } // End namespace OpTypes
+ } // End namespace X86
+
+In typical TableGen fashion, to use the enum, you will need to define a
+preprocessor macro:
+
+.. code-block:: c++
+
+ #define GET_INSTRINFO_OPERAND_TYPES_ENUM // For OpTypes enum
+ #include "XXXGenInstrInfo.inc"
+
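+For illustration only (the helper below is hypothetical and not part of the
+generated code), a value of this enumeration could be classified as follows:
+
+.. code-block:: c++
+
+  #define GET_INSTRINFO_OPERAND_TYPES_ENUM // For OpTypes enum
+  #include "X86GenInstrInfo.inc"
+
+  using namespace llvm;
+
+  // Hypothetical helper: map a value of the generated OperandType enumeration
+  // to a human-readable category.
+  static const char *describeOperandType(int Ty) {
+    switch (Ty) {
+    case X86::OpTypes::brtarget:
+    case X86::OpTypes::brtarget8:
+      return "branch target";
+    case X86::OpTypes::i32imm:
+    case X86::OpTypes::i64imm:
+      return "immediate";
+    default:
+      return "other";
+    }
+  }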
+
+Instruction Scheduling
+----------------------
+
+Instruction itineraries can be queried using MCInstrDesc::getSchedClass(). The
+value can be named by an enumeration in the llvm::XXX::Sched namespace
+generated by TableGen in XXXGenInstrInfo.inc. The names of the schedule classes
+are the same as provided in XXXSchedule.td, plus a default NoItinerary class.
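+
+A minimal sketch of how this might be used (the target namespace XXX is the
+usual placeholder, and the IIC_ALU itinerary class is invented for the example):
+
+.. code-block:: c++
+
+  #include "llvm/MC/MCInstrDesc.h"
+
+  #define GET_INSTRINFO_SCHED_ENUM // Assumed guard name for the Sched enum
+  #include "XXXGenInstrInfo.inc"
+
+  using namespace llvm;
+
+  // Hypothetical check: does this instruction use the IIC_ALU itinerary class?
+  static bool isALUClass(const MCInstrDesc &Desc) {
+    return Desc.getSchedClass() == XXX::Sched::IIC_ALU;
+  }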
+
Instruction Relation Mapping
----------------------------
diff --git a/docs/WritingAnLLVMPass.rst b/docs/WritingAnLLVMPass.rst
index 1114fa0..f9cb4fe 100644
--- a/docs/WritingAnLLVMPass.rst
+++ b/docs/WritingAnLLVMPass.rst
@@ -131,7 +131,7 @@ Next, we declare our pass itself:
struct Hello : public FunctionPass {
-This declares a "``Hello``" class that is a subclass of `FunctionPass
+This declares a "``Hello``" class that is a subclass of :ref:`FunctionPass
<writing-an-llvm-pass-FunctionPass>`. The different builtin pass subclasses
are described in detail :ref:`later <writing-an-llvm-pass-pass-classes>`, but
for now, know that ``FunctionPass`` operates on a function at a time.
diff --git a/docs/YamlIO.rst b/docs/YamlIO.rst
index a5cb637..3ecd03a 100644
--- a/docs/YamlIO.rst
+++ b/docs/YamlIO.rst
@@ -408,7 +408,7 @@ some time format (e.g. 4-May-2012 10:30pm). YAML I/O has a way to support
custom formatting and parsing of scalar types by specializing ScalarTraits<> on
your data type. When writing, YAML I/O will provide the native type and
your specialization must create a temporary llvm::StringRef. When reading,
-YAML I/O will provide a llvm::StringRef of scalar and your specialization
+YAML I/O will provide an llvm::StringRef of scalar and your specialization
must convert that to your native data type. An outline of a custom scalar type
looks like:
@@ -549,7 +549,7 @@ coordinates into polar when reading YAML.
};
When writing YAML, the local variable "keys" will be a stack allocated
-instance of NormalizedPolar, constructed from the suppled polar object which
+instance of NormalizedPolar, constructed from the supplied polar object which
initializes its x and y fields. The mapRequired() methods then write out the x
and y values as key/value pairs.
@@ -633,6 +633,20 @@ This works for both reading and writing. For example:
};
+Tags
+----
+
+The YAML syntax supports tags as a way to specify the type of a node before
+it is parsed. This allows nodes to have dynamic types. But the YAML I/O model
+uses static typing, so there are limits to how you can use tags with the YAML
+I/O model. Recently, we added support to YAML I/O for checking/setting the
+optional tag on a map. Using this functionality it is even possible to support
+different mappings, as long as they are convertible.
+
+To check a tag, inside your mapping() method you can use io.mapTag() to specify
+what the tag should be. This will also add that tag when writing YAML.
+
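+A minimal sketch (the Point3D type and the "!point3d" tag below are invented
+for illustration):
+
+.. code-block:: c++
+
+  #include "llvm/Support/YAMLTraits.h"
+
+  using llvm::yaml::IO;
+  using llvm::yaml::MappingTraits;
+
+  struct Point3D { float x, y, z; };
+
+  template <>
+  struct MappingTraits<Point3D> {
+    static void mapping(IO &io, Point3D &pt) {
+      // Specify the expected tag; it is also written out when emitting YAML.
+      io.mapTag("!point3d", true);
+      io.mapRequired("x", pt.x);
+      io.mapRequired("y", pt.y);
+      io.mapRequired("z", pt.z);
+    }
+  };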
+
Sequence
========
@@ -646,7 +660,7 @@ llvm::yaml::SequenceTraits on T and implement two methods:
template <>
struct SequenceTraits<MySeq> {
static size_t size(IO &io, MySeq &list) { ... }
- static MySeqEl element(IO &io, MySeq &list, size_t index) { ... }
+ static MySeqEl &element(IO &io, MySeq &list, size_t index) { ... }
};
The size() method returns how many elements are currently in your sequence.
@@ -669,7 +683,7 @@ add "static const bool flow = true;". For instance:
template <>
struct SequenceTraits<MyList> {
static size_t size(IO &io, MyList &list) { ... }
- static MyListEl element(IO &io, MyList &list, size_t index) { ... }
+ static MyListEl &element(IO &io, MyList &list, size_t index) { ... }
// The existence of this member causes YAML I/O to use a flow sequence
static const bool flow = true;
diff --git a/docs/doxygen.cfg.in b/docs/doxygen.cfg.in
index 20de077..0ed686b 100644
--- a/docs/doxygen.cfg.in
+++ b/docs/doxygen.cfg.in
@@ -1,3 +1,4 @@
+
# Doxyfile 1.7.1
# This file describes the settings to be used by the documentation system
@@ -1068,7 +1069,7 @@ FORMULA_TRANSPARENT = YES
# typically be disabled. For large projects the javascript based search engine
# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution.
-SEARCHENGINE = NO
+SEARCHENGINE = @enable_searchengine@
# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
# implemented using a PHP enabled web server instead of at the web client
@@ -1078,7 +1079,15 @@ SEARCHENGINE = NO
# full text search. The disadvances is that it is more difficult to setup
# and does not have live searching capabilities.
-SERVER_BASED_SEARCH = NO
+SERVER_BASED_SEARCH = @enable_server_based_search@
+
+SEARCHENGINE_URL = @searchengine_url@
+
+EXTERNAL_SEARCH = @enable_external_search@
+
+EXTERNAL_SEARCH_ID = llvm
+
+EXTRA_SEARCH_MAPPINGS = @extra_search_mappings@
#---------------------------------------------------------------------------
# configuration options related to the LaTeX output
diff --git a/docs/index.rst b/docs/index.rst
index be72195..62766f1 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -66,6 +66,7 @@ representation.
CMake
HowToBuildOnARM
+ HowToCrossCompileLLVM
CommandGuide/index
GettingStarted
GettingStartedVS
@@ -82,6 +83,7 @@ representation.
Passes
YamlIO
GetElementPtr
+ MCJITDesignAndImplementation
:doc:`GettingStarted`
Discusses how to get up and running quickly with the LLVM infrastructure.
@@ -95,6 +97,9 @@ representation.
:doc:`HowToBuildOnARM`
Notes on building and testing LLVM/Clang on ARM.
+:doc:`HowToCrossCompileLLVM`
+ Notes on cross-building and testing LLVM/Clang.
+
:doc:`GettingStartedVS`
An addendum to the main Getting Started guide for those using Visual Studio
on Windows.
@@ -285,6 +290,9 @@ For API clients and LLVM developers.
:doc:`DebuggingJITedCode`
How to debug JITed code with GDB.
+:doc:`MCJITDesignAndImplementation`
+ Describes the inner workings of MCJIT execution engine.
+
:doc:`BranchWeightMetadata`
Provides information about Branch Prediction Information.
@@ -315,6 +323,7 @@ Information about LLVM's development process.
LLVMBuild
HowToReleaseLLVM
Packaging
+ ReleaseProcess
:doc:`DeveloperPolicy`
The LLVM project's policy towards developers and their contributions.
diff --git a/docs/tutorial/OCamlLangImpl2.rst b/docs/tutorial/OCamlLangImpl2.rst
index 83a22ab..905b306 100644
--- a/docs/tutorial/OCamlLangImpl2.rst
+++ b/docs/tutorial/OCamlLangImpl2.rst
@@ -339,6 +339,9 @@ expression:
(* Eat the binop. *)
Stream.junk stream;
+ (* Parse the primary expression after the binary operator *)
+ let rhs = parse_primary stream in
+
(* Okay, we know this is a binop. *)
let rhs =
match Stream.peek stream with
diff --git a/docs/yaml2obj.rst b/docs/yaml2obj.rst
index b269806..2c55f02 100644
--- a/docs/yaml2obj.rst
+++ b/docs/yaml2obj.rst
@@ -38,7 +38,7 @@ Here's a sample COFF file.
ComplexType: IMAGE_SYM_DTYPE_NULL # (0)
StorageClass: IMAGE_SYM_CLASS_STATIC # (3)
NumberOfAuxSymbols: 1
- AuxillaryData:
+ AuxiliaryData:
"\x24\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00" # |$.................|
- Name: _main