author Bill Wendling <isanbard@gmail.com> 2012-06-28 08:43:12 +0000
committer Bill Wendling <isanbard@gmail.com> 2012-06-28 08:43:12 +0000
commit 0ca9927a71fed311eea7459b4c85c98cc7ed0352
tree 4e950553c2a69c6db06092a5dc69138c0e2e6378 /docs
parent 87dc7a4c8d21bd638465881f3b6d091f22d9767c
Sphinxify the bitcode format document.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159340 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- docs/BitCodeFormat.html 1490
-rw-r--r-- docs/BitCodeFormat.rst 1045
-rw-r--r-- docs/subsystems.rst 3
3 files changed, 1047 insertions, 1491 deletions
diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html
deleted file mode 100644
index 6a670f5..0000000
--- a/docs/BitCodeFormat.html
+++ /dev/null
@@ -1,1490 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>LLVM Bitcode File Format</title>
- <link rel="stylesheet" href="_static/llvm.css" type="text/css">
-</head>
-<body>
-<h1> LLVM Bitcode File Format</h1>
-<ol>
- <li><a href="#abstract">Abstract</a></li>
- <li><a href="#overview">Overview</a></li>
- <li><a href="#bitstream">Bitstream Format</a>
- <ol>
- <li><a href="#magic">Magic Numbers</a></li>
- <li><a href="#primitives">Primitives</a></li>
- <li><a href="#abbrevid">Abbreviation IDs</a></li>
- <li><a href="#blocks">Blocks</a></li>
- <li><a href="#datarecord">Data Records</a></li>
- <li><a href="#abbreviations">Abbreviations</a></li>
- <li><a href="#stdblocks">Standard Blocks</a></li>
- </ol>
- </li>
- <li><a href="#wrapper">Bitcode Wrapper Format</a>
- </li>
- <li><a href="#llvmir">LLVM IR Encoding</a>
- <ol>
- <li><a href="#basics">Basics</a></li>
- <li><a href="#MODULE_BLOCK">MODULE_BLOCK Contents</a></li>
- <li><a href="#PARAMATTR_BLOCK">PARAMATTR_BLOCK Contents</a></li>
- <li><a href="#TYPE_BLOCK">TYPE_BLOCK Contents</a></li>
- <li><a href="#CONSTANTS_BLOCK">CONSTANTS_BLOCK Contents</a></li>
- <li><a href="#FUNCTION_BLOCK">FUNCTION_BLOCK Contents</a></li>
- <li><a href="#TYPE_SYMTAB_BLOCK">TYPE_SYMTAB_BLOCK Contents</a></li>
- <li><a href="#VALUE_SYMTAB_BLOCK">VALUE_SYMTAB_BLOCK Contents</a></li>
- <li><a href="#METADATA_BLOCK">METADATA_BLOCK Contents</a></li>
- <li><a href="#METADATA_ATTACHMENT">METADATA_ATTACHMENT Contents</a></li>
- </ol>
- </li>
-</ol>
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
- <a href="http://www.reverberate.org">Joshua Haberman</a>,
- and <a href="mailto:housel@acm.org">Peter S. Housel</a>.
-</p>
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="abstract">Abstract</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>This document describes the LLVM bitstream file format and the encoding of
-the LLVM IR into it.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="overview">Overview</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-What is commonly known as the LLVM bitcode file format (also, sometimes
-anachronistically known as bytecode) is actually two things: a <a
-href="#bitstream">bitstream container format</a>
-and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>
-
-<p>
-The bitstream format is an abstract encoding of structured data, very
-similar to XML in some ways. Like XML, bitstream files contain tags, and nested
-structures, and you can parse the file without having to understand the tags.
-Unlike XML, the bitstream format is a binary encoding, and unlike XML it
-provides a mechanism for the file to self-describe "abbreviations", which are
-effectively size optimizations for the content.</p>
-
-<p>LLVM IR files may be optionally embedded into a <a
-href="#wrapper">wrapper</a> structure that makes it easy to embed extra data
-along with LLVM IR files.</p>
-
-<p>This document first describes the LLVM bitstream format, then the
-wrapper format, and finally the record structure used by LLVM IR files.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="bitstream">Bitstream Format</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-The bitstream format is literally a stream of bits, with a very simple
-structure. This structure consists of the following concepts:
-</p>
-
-<ul>
-<li>A "<a href="#magic">magic number</a>" that identifies the contents of
- the stream.</li>
-<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
- integers.</li>
-<li><a href="#blocks">Blocks</a>, which define nested content.</li>
-<li><a href="#datarecord">Data Records</a>, which describe entities within the
- file.</li>
-<li>Abbreviations, which specify compression optimizations for the file.</li>
-</ul>
-
-<p>Note that the <a
-href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
-used to dump and inspect arbitrary bitstreams, which is very useful for
-understanding the encoding.</p>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="magic">Magic Numbers</a>
-</h3>
-
-<div>
-
-<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43).
-The second two bytes are an application-specific magic number. Generic
-bitcode tools can look at only the first two bytes to verify the file is
-bitcode, while application-specific programs will want to look at all four.</p>
-
-</div>
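A minimal Python sketch of the two-level check described above. The function name `classify_magic` and its return strings are illustrative assumptions; the bytes 0xC0 0xDE are the application-specific value this document gives for LLVM IR files.

```python
def classify_magic(data: bytes) -> str:
    """Classify a buffer by its first four bytes (hypothetical helper)."""
    # Generic tools only need the first two bytes: 'B' (0x42), 'C' (0x43).
    if len(data) < 4 or data[:2] != b"BC":
        return "not bitcode"
    # Bytes 2 and 3 are application specific; 0xC0 0xDE marks LLVM IR.
    if data[2:4] == bytes([0xC0, 0xDE]):
        return "LLVM IR bitcode"
    return "bitcode (other application)"
```

A generic tool would stop after the first comparison; only an LLVM-specific reader needs the second one.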
-
-<!-- ======================================================================= -->
-<h3>
- <a name="primitives">Primitives</a>
-</h3>
-
-<div>
-
-<p>
-A bitstream literally consists of a stream of bits, which are read in order
-starting with the least significant bit of each byte. The stream is made up of a
-number of primitive values that encode a stream of unsigned integer values.
-These integers are encoded in two ways: either as <a href="#fixedwidth">Fixed
-Width Integers</a> or as <a href="#variablewidth">Variable Width
-Integers</a>.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4>
- <a name="fixedwidth">Fixed Width Integers</a>
-</h4>
-
-<div>
-
-<p>Fixed-width integer values have their low bits emitted directly to the file.
- For example, a 3-bit integer value encodes 1 as 001. Fixed width integers
- are used when there is a well-known number of options for a field. For
- example, boolean values are usually encoded with a 1-bit wide integer.
-</p>
-
-</div>
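The bit order can be sketched in Python as follows. `BitReader` and `read_fixed` are hypothetical names, not an LLVM API, and the sketch assumes that the first bit read from the stream becomes the least significant bit of the decoded value.

```python
class BitReader:
    """Reads fixed-width unsigned integers from a byte buffer, LSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in the stream, counted in bits

    def read_fixed(self, width: int) -> int:
        value = 0
        for i in range(width):
            byte = self.data[self.pos >> 3]      # byte holding the current bit
            bit = (byte >> (self.pos & 7)) & 1   # bits are taken low bit first
            value |= bit << i                    # assumed: first bit read is bit 0
            self.pos += 1
        return value
```

For instance, reading 16 bits from the two bytes `b"BC"` yields `0x4342`, because the first byte supplies the low eight bits of the value.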
-
-<!-- _______________________________________________________________________ -->
-<h4>
- <a name="variablewidth">Variable Width Integers</a>
-</h4>
-
-<div>
-
-<p>Variable-width integer (VBR) values encode values of arbitrary size,
-optimizing for the case where the values are small. Given a 4-bit VBR field,
-any 3-bit value (0 through 7) is encoded directly, with the high bit set to
-zero. In general, for an N-bit VBR field, values that need more than N-1 bits
-are emitted as a series of (N-1)-bit chunks, low-order chunk first, where all
-but the last chunk have the high bit set.</p>
-
-<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
-vbr4 value. The first set of four bits indicates the value 3 (011) with a
-continuation piece (indicated by a high bit of 1). The next four bits indicate a
-value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value
-27.
-</p>
-
-</div>
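The chunking rule can be sketched in Python as below. The helper names `vbr_encode` and `vbr_decode` are assumptions made for illustration; each list element stands for one N-bit field as it would appear in the stream, in emission order.

```python
def vbr_encode(value: int, width: int) -> list:
    """Split an unsigned value into VBR chunks of `width` bits each."""
    if value < 0:
        raise ValueError("VBR encodes unsigned values only")
    data_bits = width - 1              # payload bits per chunk
    mask = (1 << data_bits) - 1
    cont = 1 << data_bits              # high bit marks "more chunks follow"
    chunks = []
    while True:
        chunk = value & mask
        value >>= data_bits
        if value:
            chunks.append(chunk | cont)
        else:
            chunks.append(chunk)       # last chunk: high bit clear
            return chunks


def vbr_decode(chunks, width: int) -> int:
    """Reassemble a value from VBR chunks, low-order chunk first."""
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    value = 0
    shift = 0
    for chunk in chunks:
        value |= (chunk & mask) << shift
        shift += data_bits
        if not chunk & (1 << data_bits):
            break                      # no continuation bit: done
    return value
```

With these helpers, `vbr_encode(27, 4)` produces the two chunks `0b1011` and `0b0011` from the worked example.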
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="char6">6-bit characters</a></h4>
-
-<div>
-
-<p>6-bit characters encode common characters into a fixed 6-bit field. They
-represent the following characters with the following 6-bit values:</p>
-
-<div class="doc_code">
-<pre>
-'a' .. 'z' &mdash; 0 .. 25
-'A' .. 'Z' &mdash; 26 .. 51
-'0' .. '9' &mdash; 52 .. 61
- '.' &mdash; 62
- '_' &mdash; 63
-</pre>
-</div>
-
-<p>This encoding is only suitable for encoding characters and strings that
-consist only of the above characters. It is completely incapable of encoding
-characters not in the set.</p>
-
-</div>
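The table above maps directly onto a 64-character alphabet. A short Python sketch, with `CHAR6_ALPHABET`, `char6_encode`, `char6_decode` and `is_char6` as illustrative names:

```python
import string

# Index in this string is the 6-bit value: a-z, A-Z, 0-9, '.', '_'.
CHAR6_ALPHABET = (
    string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"
)


def is_char6(text: str) -> bool:
    """True if every character of `text` can be char6-encoded."""
    return all(ch in CHAR6_ALPHABET for ch in text)


def char6_encode(text: str) -> list:
    """Return the 6-bit values for `text`; raises ValueError otherwise."""
    return [CHAR6_ALPHABET.index(ch) for ch in text]


def char6_decode(values) -> str:
    """Inverse of char6_encode."""
    return "".join(CHAR6_ALPHABET[v] for v in values)
```

A writer would typically call something like `is_char6` first and fall back to a wider element encoding when it returns false, for example for a string containing `-`.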
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="wordalign">Word Alignment</a></h4>
-
-<div>
-
-<p>Occasionally, it is useful to emit zero bits until the bitstream is a
-multiple of 32 bits. This ensures that the bit position in the stream can be
-represented as a multiple of 32-bit words.</p>
-
-</div>
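The amount of padding is simple modular arithmetic. A one-function Python sketch, where `align32_padding` is a hypothetical helper name:

```python
def align32_padding(bit_pos: int) -> int:
    """Number of zero bits to emit so that bit_pos becomes a multiple of 32."""
    return (-bit_pos) % 32
```

After padding, `bit_pos // 32` is the position expressed in whole 32-bit words.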
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="abbrevid">Abbreviation IDs</a>
-</h3>
-
-<div>
-
-<p>
-A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
-<a href="#datarecord">Data Records</a>. Both of these start with an
-abbreviation ID encoded as a fixed-bitwidth field. The width is specified by
-the current block, as described below. The value of the abbreviation ID
-specifies either a builtin ID (each of which has a special meaning, defined below) or
-one of the abbreviation IDs defined for the current block by the stream itself.
-</p>
-
-<p>
-The set of builtin abbrev IDs is:
-</p>
-
-<ul>
-<li><tt>0 - <a href="#END_BLOCK">END_BLOCK</a></tt> &mdash; This abbrev ID marks
- the end of the current block.</li>
-<li><tt>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a></tt> &mdash; This
- abbrev ID marks the beginning of a new block.</li>
-<li><tt>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt> &mdash; This defines
- a new abbreviation.</li>
-<li><tt>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> &mdash; This ID
- specifies the definition of an unabbreviated record.</li>
-</ul>
-
-<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
-an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="blocks">Blocks</a>
-</h3>
-
-<div>
-
-<p>
-Blocks in a bitstream denote nested regions of the stream, and are identified by
-a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
-function bodies). Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a>
-whose meaning is defined by Bitcode; block IDs 8 and greater are
-application specific. Nested blocks capture the hierarchical structure of the data
-encoded in the stream, and various properties are associated with blocks as the file is
-parsed. Block definitions allow the reader to efficiently skip blocks
-in constant time if the reader wants a summary of blocks, or if it wants to
-efficiently skip data it does not understand. The LLVM IR reader uses this
-mechanism to skip function bodies, lazily reading them on demand.
-</p>
-
-<p>
-When reading and encoding the stream, several properties are maintained for the
-block. In particular, each block maintains:
-</p>
-
-<ol>
-<li>A current abbrev id width. This value starts at 2 at the beginning of
- the stream, and is set every time a
- block record is entered. The block entry specifies the abbrev id width for
- the body of the block.</li>
-
-<li>A set of abbreviations. Abbreviations may be defined within a block, in
- which case they are only defined in that block (neither subblocks nor
- enclosing blocks see the abbreviation). Abbreviations can also be defined
- inside a <tt><a href="#BLOCKINFO">BLOCKINFO</a></tt> block, in which case
- they are defined in all blocks that match the ID that the BLOCKINFO block is
- describing.
-</li>
-</ol>
-
-<p>
-As sub-blocks are entered, these properties are saved and the new sub-block has
-its own set of abbreviations, and its own abbrev id width. When a sub-block is
-popped, the saved values are restored.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK Encoding</a></h4>
-
-<div>
-
-<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
- &lt;align32bits&gt;, blocklen<sub>32</sub>]</tt></p>
-
-<p>
-The <tt>ENTER_SUBBLOCK</tt> abbreviation ID specifies the start of a new block
-record. The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and
-indicates the type of block being entered, which can be
-a <a href="#stdblocks">standard block</a> or an application-specific block.
-The <tt>newabbrevlen</tt> value is a 4-bit VBR, which specifies the abbrev id
-width for the sub-block. The <tt>blocklen</tt> value is a 32-bit aligned value
-that specifies the size of the subblock in 32-bit words. This value allows the
-reader to skip over the entire block in one jump.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="END_BLOCK">END_BLOCK Encoding</a></h4>
-
-<div>
-
-<p><tt>[END_BLOCK, &lt;align32bits&gt;]</tt></p>
-
-<p>
-The <tt>END_BLOCK</tt> abbreviation ID specifies the end of the current block
-record. Its end is aligned to 32 bits to ensure that the size of the block is
-a multiple of 32 bits.
-</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="datarecord">Data Records</a>
-</h3>
-
-<div>
-<p>
-Data records consist of a record code and a number of (up to) 64-bit
-integer values. The interpretation of the code and values is
-application specific and may vary between different block types.
-Records can be encoded either using an unabbrev record, or with an
-abbreviation. In the LLVM IR format, for example, there is a record
-which encodes the target triple of a module. The code is
-<tt>MODULE_CODE_TRIPLE</tt>, and the values of the record are the
-ASCII codes for the characters in the string.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="UNABBREV_RECORD">UNABBREV_RECORD Encoding</a></h4>
-
-<div>
-
-<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
- op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
-
-<p>
-An <tt>UNABBREV_RECORD</tt> provides a default fallback encoding, which is both
-completely general and extremely inefficient. It can describe an arbitrary
-record by emitting the code and operands as VBRs.
-</p>
-
-<p>
-For example, emitting an LLVM IR target triple as an unabbreviated record
-requires emitting the <tt>UNABBREV_RECORD</tt> abbrevid, a vbr6 for the
-<tt>MODULE_CODE_TRIPLE</tt> code, a vbr6 for the length of the string, which is
-equal to the number of operands, and a vbr6 for each character. Because there
-are no letters with values less than 32, each letter would need to be emitted as
-at least a two-part VBR, which means that each letter would require at least 12
-bits. This is not an efficient encoding, but it is fully general.
-</p>
-
-</div>
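The cost estimate above can be checked with a small Python sketch. `vbr_bits` and `unabbrev_record_bits` are hypothetical helpers, and the abbrev id width of 3 used in the usage note is an assumption chosen for illustration.

```python
def vbr_bits(value: int, width: int) -> int:
    """Bits needed to emit an unsigned value as a VBR of the given width."""
    data_bits = width - 1
    chunks = 1
    while value >> (data_bits * chunks):
        chunks += 1
    return chunks * width


def unabbrev_record_bits(code: int, operands, abbrev_width: int) -> int:
    """Size in bits of [UNABBREV_RECORD, code, numops, op0, op1, ...]."""
    total = abbrev_width                   # the UNABBREV_RECORD abbrev id
    total += vbr_bits(code, 6)             # record code as vbr6
    total += vbr_bits(len(operands), 6)    # operand count as vbr6
    total += sum(vbr_bits(op, 6) for op in operands)
    return total
```

For the four-character string `"abcd"` with record code 2 and an abbrev id width of 3, this gives 3 + 6 + 6 + 4 * 12 = 63 bits, since every ASCII letter needs two vbr6 chunks.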
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="abbrev_records">Abbreviated Record Encoding</a></h4>
-
-<div>
-
-<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>
-
-<p>
-An abbreviated record is an abbreviation id followed by a set of fields that are
-encoded according to the <a href="#abbreviations">abbreviation definition</a>.
-This allows records to be encoded significantly more densely than records
-encoded with the <tt><a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> type,
-and allows the abbreviation types to be specified in the stream itself, which
-allows the files to be completely self describing. The actual encoding of
-abbreviations is defined below.
-</p>
-
-<p>The record code, which is the first field of an abbreviated record,
-may be encoded in the abbreviation definition (as a literal
-operand) or supplied in the abbreviated record (as a Fixed or VBR
-operand value).</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="abbreviations">Abbreviations</a>
-</h3>
-
-<div>
-<p>
-Abbreviations are an important form of compression for bitstreams. The idea is
-to specify a dense encoding for a class of records once, then use that encoding
-to emit many records. It takes space to emit the encoding into the file, but
-the space is recouped (hopefully plus some) when the records that use it are
-emitted.
-</p>
-
-<p>
-Abbreviations can be determined dynamically per client, per file. Because the
-abbreviations are stored in the bitstream itself, different streams of the same
-format can contain different sets of abbreviations according to the needs
-of the specific stream.
-As a concrete example, LLVM IR files usually emit an abbreviation
-for binary operators. If a specific LLVM module contained no or few binary
-operators, the abbreviation does not need to be emitted.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="DEFINE_ABBREV">DEFINE_ABBREV Encoding</a></h4>
-
-<div>
-
-<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
- ...]</tt></p>
-
-<p>
-A <tt>DEFINE_ABBREV</tt> record adds an abbreviation to the list of currently
-defined abbreviations in the scope of this block. This definition only exists
-inside this immediate block &mdash; it is not visible in subblocks or enclosing
-blocks. Abbreviations are implicitly assigned IDs sequentially starting from 4
-(the first application-defined abbreviation ID). Any abbreviations defined in a
-<tt>BLOCKINFO</tt> record for the particular block type
-receive IDs first, in order, followed by any
-abbreviations defined within the block itself. Abbreviated data records
-reference this ID to indicate what abbreviation they are invoking.
-</p>
-
-<p>
-An abbreviation definition consists of the <tt>DEFINE_ABBREV</tt> abbrevid
-followed by a VBR that specifies the number of abbrev operands, then the abbrev
-operands themselves. Abbreviation operands come in three forms. They all start
-with a single bit that indicates whether the abbrev operand is a literal operand
-(when the bit is 1) or an encoding operand (when the bit is 0).
-</p>
-
-<ol>
-<li>Literal operands &mdash; <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt>
-&mdash; Literal operands specify that the value in the result is always a single
-specific value. This specific value is emitted as a vbr8 after the bit
-indicating that it is a literal operand.</li>
-<li>Encoding info without data &mdash; <tt>[0<sub>1</sub>,
- encoding<sub>3</sub>]</tt> &mdash; Operand encodings that do not have extra
- data are just emitted as their code.
-</li>
-<li>Encoding info with data &mdash; <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
-value<sub>vbr5</sub>]</tt> &mdash; Operand encodings that do have extra data are
-emitted as their code, followed by the extra data.
-</li>
-</ol>
-
-<p>The possible operand encodings are:</p>
-
-<ul>
-<li>Fixed (code 1): The field should be emitted as
- a <a href="#fixedwidth">fixed-width value</a>, whose width is specified by
- the operand's extra data.</li>
-<li>VBR (code 2): The field should be emitted as
- a <a href="#variablewidth">variable-width value</a>, whose width is
- specified by the operand's extra data.</li>
-<li>Array (code 3): This field is an array of values. The array operand
- has no extra data, but expects another operand to follow it, indicating
- the element type of the array. When reading an array in an abbreviated
- record, the first integer is a vbr6 that indicates the array length,
- followed by the encoded elements of the array. An array may only occur as
- the last operand of an abbreviation (except for the one final operand that
- gives the array's type).</li>
-<li>Char6 (code 4): This field should be emitted as
- a <a href="#char6">char6-encoded value</a>. This operand type takes no
- extra data. Char6 encoding is normally used as an array element type.
- </li>
-<li>Blob (code 5): This field is emitted as a vbr6, followed by padding to a
- 32-bit boundary (for alignment) and an array of 8-bit objects. The array of
- bytes is further followed by tail padding to ensure that its total length is
- a multiple of 4 bytes. This makes it very efficient for the reader to
- decode the data without having to make a copy of it: it can use a pointer to
- the data in the memory-mapped file and poke directly at it. A blob may only
- occur as the last operand of an abbreviation.</li>
-</ul>
-
-<p>
-For example, target triples in LLVM modules are encoded as a record of the
-form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>. Consider if the bitstream emitted
-the following abbrev entry:
-</p>
-
-<div class="doc_code">
-<pre>
-[0, Fixed, 4]
-[0, Array]
-[0, Char6]
-</pre>
-</div>
-
-<p>
-When a record is emitted with this abbreviation, the triple record above is emitted
-as:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>, 0<sub>6</sub>,
-1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt>
-</p>
-</div>
-
-<p>These values are:</p>
-
-<ol>
-<li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
-<li>The second value, 2, is the record code for <tt>TRIPLE</tt> records within LLVM IR file <tt>MODULE_BLOCK</tt> blocks.</li>
-<li>The third value, 4, is the length of the array.</li>
-<li>The rest of the values are the char6 encoded values
- for <tt>"abcd"</tt>.</li>
-</ol>
-
-<p>
-With this abbreviation, the triple is emitted with only 37 bits (3 + 4 + 6 + 4*6, assuming an
-abbrev id width of 3). Without the abbreviation, significantly more space would
-be required to emit the target triple. Also, because the <tt>TRIPLE</tt> value
-is not emitted as a literal in the abbreviation, the abbreviation can also be
-used for any other string value.
-</p>
-
-</div>
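The 37-bit figure can be reproduced with a short Python sketch of the field layout for the `[Fixed 4, Array, Char6]` abbreviation. The function name `abbreviated_string_record_bits` and the field labels are illustrative assumptions.

```python
def vbr_bits(value: int, width: int) -> int:
    """Bits needed to emit an unsigned value as a VBR of the given width."""
    data_bits = width - 1
    chunks = 1
    while value >> (data_bits * chunks):
        chunks += 1
    return chunks * width


def abbreviated_string_record_bits(text: str, abbrev_width: int):
    """Return (total_bits, fields) for a record using [Fixed 4, Array, Char6]."""
    fields = [
        ("abbrev id", abbrev_width),
        ("record code, Fixed(4)", 4),
        ("array length, vbr6", vbr_bits(len(text), 6)),
    ]
    # Each array element is one 6-bit char6 value.
    fields += [("char6 %r" % ch, 6) for ch in text]
    return sum(bits for _, bits in fields), fields
```

Calling it with `"abcd"` and an abbrev id width of 3 yields seven fields totalling 37 bits.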
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="stdblocks">Standard Blocks</a>
-</h3>
-
-<div>
-
-<p>
-In addition to the basic block structure and record encodings, the bitstream
-also defines specific built-in block types. These block types specify how the
-stream is to be decoded or other metadata. In the future, new standard blocks
-may be added. Block IDs 0-7 are reserved for standard blocks.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="BLOCKINFO">#0 - BLOCKINFO Block</a></h4>
-
-<div>
-
-<p>
-The <tt>BLOCKINFO</tt> block allows the description of metadata for other
-blocks. The currently specified records are:
-</p>
-
-<div class="doc_code">
-<pre>
-[SETBID (#1), blockid]
-[DEFINE_ABBREV, ...]
-[BLOCKNAME, ...name...]
-[SETRECORDNAME, RecordID, ...name...]
-</pre>
-</div>
-
-<p>
-The <tt>SETBID</tt> record (code 1) indicates which block ID is being
-described. <tt>SETBID</tt> records can occur multiple times throughout the
-block to change which block ID is being described. There must be
-a <tt>SETBID</tt> record prior to any other records.
-</p>
-
-<p>
-Standard <tt>DEFINE_ABBREV</tt> records can occur inside <tt>BLOCKINFO</tt>
-blocks, but unlike their occurrence in normal blocks, the abbreviation is
-defined for blocks matching the block ID we are describing, <i>not</i> the
-<tt>BLOCKINFO</tt> block itself. The abbreviations defined
-in <tt>BLOCKINFO</tt> blocks receive abbreviation IDs as described
-in <tt><a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt>.
-</p>
-
-<p>The <tt>BLOCKNAME</tt> record (code 2) can optionally occur in this block. The elements of
-the record are the bytes of the string name of the block. llvm-bcanalyzer can use
-this to dump out bitcode files symbolically.</p>
-
-<p>The <tt>SETRECORDNAME</tt> record (code 3) can also optionally occur in this block. The
-first operand value is a record ID number, and the rest of the elements of the record are
-the bytes for the string name of the record. llvm-bcanalyzer can use
-this to dump out bitcode files symbolically.</p>
-
-<p>
-Note that although the data in <tt>BLOCKINFO</tt> blocks is described as
-"metadata," the abbreviations they contain are essential for parsing records
-from the corresponding blocks. It is not safe to skip them.
-</p>
-
-</div>
-
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="wrapper">Bitcode Wrapper Format</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
-structure. This structure contains a simple header that indicates the offset
-and size of the embedded BC file. This allows additional information to be
-stored alongside the BC file. The structure of this file header is:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[Magic<sub>32</sub>, Version<sub>32</sub>, Offset<sub>32</sub>,
-Size<sub>32</sub>, CPUType<sub>32</sub>]</tt>
-</p>
-</div>
-
-<p>
-Each of the fields is a 32-bit field stored in little-endian form (as with
-the rest of the bitcode file fields). The Magic number is always
-<tt>0x0B17C0DE</tt> and the version is currently always <tt>0</tt>. The Offset
-field is the offset in bytes to the start of the bitcode stream in the file, and
-the Size field is the size in bytes of the stream. CPUType is a target-specific
-value that can be used to encode the CPU of the target.
-</p>
-
-</div>
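A hedged Python sketch of reading the wrapper header with the standard `struct` module. The names `parse_wrapper_header` and `extract_bitcode`, the dictionary keys, and the bounds check are assumptions made for illustration; only the field order, widths, byte order and magic value come from the text above.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE
HEADER_SIZE = 20  # five 32-bit fields


def parse_wrapper_header(data: bytes) -> dict:
    """Decode [Magic, Version, Offset, Size, CPUType] from the file start."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a wrapper header")
    # "<5I": five little-endian unsigned 32-bit integers.
    magic, version, offset, size, cputype = struct.unpack_from("<5I", data, 0)
    if magic != WRAPPER_MAGIC:
        raise ValueError("not a bitcode wrapper")
    return {"version": version, "offset": offset,
            "size": size, "cputype": cputype}


def extract_bitcode(data: bytes) -> bytes:
    """Return the embedded bitcode stream described by the header."""
    header = parse_wrapper_header(data)
    end = header["offset"] + header["size"]
    if end > len(data):
        raise ValueError("header describes a stream past the end of the file")
    return data[header["offset"]:end]
```

The bytes returned by `extract_bitcode` should themselves begin with the `'BC'` magic number.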
-
-<!-- *********************************************************************** -->
-<h2><a name="llvmir">LLVM IR Encoding</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-LLVM IR is encoded into a bitstream by defining blocks and records. It uses
-blocks for things like constant pools, functions, symbol tables, etc. It uses
-records for things like instructions, global variable descriptors, type
-descriptions, etc. This document does not describe the set of abbreviations
-that the writer uses, as these are fully self-described in the file, and the
-reader is not allowed to build in any knowledge of this.
-</p>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="basics">Basics</a>
-</h3>
-
-<div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_magic">LLVM IR Magic Number</a></h4>
-
-<div>
-
-<p>
-The magic number for LLVM IR files is:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt>
-</p>
-</div>
-
-<p>
-When combined with the bitcode magic number and viewed as bytes, this is
-<tt>"BC&nbsp;0xC0DE"</tt>.
-</p>
-
-</div>
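The step from four 4-bit values to the bytes 0xC0 0xDE follows from the least-significant-bit-first order of the bitstream. A small Python sketch, where `pack_nibbles_lsb_first` is a hypothetical helper:

```python
def pack_nibbles_lsb_first(nibbles) -> bytes:
    """Pack 4-bit values into bytes; the first nibble fills the low bits."""
    if len(nibbles) % 2:
        raise ValueError("need an even number of nibbles")
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        low, high = nibbles[i], nibbles[i + 1]
        out.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(out)
```

Packing `[0x0, 0xC, 0xE, 0xD]` this way gives the bytes 0xC0 and 0xDE, which follow `'BC'` in the file.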
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_signed_vbr">Signed VBRs</a></h4>
-
-<div>
-
-<p>
-<a href="#variablewidth">Variable Width Integer</a> encoding is an efficient way to
-encode arbitrarily sized unsigned values, but is extremely inefficient for
-encoding signed values, as negative values are otherwise treated as maximally large
-unsigned values.
-</p>
-
-<p>
-As such, signed VBR values of a specific width are emitted as follows:
-</p>
-
-<ul>
-<li>Positive values are emitted as VBRs of the specified width, but with their
- value shifted left by one.</li>
-<li>Negative values are emitted as VBRs of the specified width, but the negated
- value is shifted left by one, and the low bit is set.</li>
-</ul>
-
-<p>
-With this encoding, small positive and small negative values can both
-be emitted efficiently. Signed VBR encoding is used in
-<tt>CST_CODE_INTEGER</tt> and <tt>CST_CODE_WIDE_INTEGER</tt> records
-within <tt>CONSTANTS_BLOCK</tt> blocks.
-</p>
-
-</div>
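The two rules above amount to moving the sign into the low bit before the ordinary VBR encoding is applied. A minimal Python sketch, with `signed_to_vbr_value` and `vbr_value_to_signed` as assumed names, treating zero as a positive value:

```python
def signed_to_vbr_value(value: int) -> int:
    """Map a signed integer to the unsigned value that is VBR-encoded."""
    if value >= 0:
        return value << 1            # low bit 0: non-negative
    return ((-value) << 1) | 1       # low bit 1: negative, magnitude above it


def vbr_value_to_signed(encoded: int) -> int:
    """Inverse mapping, applied after the VBR has been decoded."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude
```

Small magnitudes of either sign therefore stay small: -1 maps to 3 and 5 maps to 10, both of which fit in a single vbr6 chunk.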
-
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_blocks">LLVM IR Blocks</a></h4>
-
-<div>
-
-<p>
-LLVM IR is defined with the following blocks:
-</p>
-
-<ul>
-<li>8 &mdash; <a href="#MODULE_BLOCK"><tt>MODULE_BLOCK</tt></a> &mdash; This is the top-level block that
- contains the entire module, and describes a variety of per-module
- information.</li>
-<li>9 &mdash; <a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a> &mdash; This enumerates the parameter
- attributes.</li>
-<li>10 &mdash; <a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a> &mdash; This describes all of the types in
- the module.</li>
-<li>11 &mdash; <a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a> &mdash; This describes constants for a
- module or function.</li>
-<li>12 &mdash; <a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a> &mdash; This describes a function
- body.</li>
-<li>13 &mdash; <a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a> &mdash; This describes the type symbol
- table.</li>
-<li>14 &mdash; <a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a> &mdash; This describes a value symbol
- table.</li>
-<li>15 &mdash; <a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a> &mdash; This describes metadata items.</li>
-<li>16 &mdash; <a href="#METADATA_ATTACHMENT"><tt>METADATA_ATTACHMENT</tt></a> &mdash; This contains records associating metadata with function instruction values.</li>
-</ul>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>MODULE_BLOCK</tt> block (id 8) is the top-level block for LLVM
-bitcode files, and each bitcode file must contain exactly one. In
-addition to records (described below) containing information
-about the module, a <tt>MODULE_BLOCK</tt> block may contain the
-following sub-blocks:
-</p>
-
-<ul>
-<li><a href="#BLOCKINFO"><tt>BLOCKINFO</tt></a></li>
-<li><a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a></li>
-<li><a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a></li>
-<li><a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a></li>
-<li><a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a></li>
-<li><a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a></li>
-<li><a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a></li>
-<li><a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a></li>
-</ul>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_VERSION">MODULE_CODE_VERSION Record</a></h4>
-
-<div>
-
-<p><tt>[VERSION, version#]</tt></p>
-
-<p>The <tt>VERSION</tt> record (code 1) contains a single value
-indicating the format version. Only version 0 is supported at this
-time.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_TRIPLE">MODULE_CODE_TRIPLE Record</a></h4>
-
-<div>
-<p><tt>[TRIPLE, ...string...]</tt></p>
-
-<p>The <tt>TRIPLE</tt> record (code 2) contains a variable number of
-values representing the bytes of the <tt>target triple</tt>
-specification string.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_DATALAYOUT">MODULE_CODE_DATALAYOUT Record</a></h4>
-
-<div>
-<p><tt>[DATALAYOUT, ...string...]</tt></p>
-
-<p>The <tt>DATALAYOUT</tt> record (code 3) contains a variable number of
-values representing the bytes of the <tt>target datalayout</tt>
-specification string.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_ASM">MODULE_CODE_ASM Record</a></h4>
-
-<div>
-<p><tt>[ASM, ...string...]</tt></p>
-
-<p>The <tt>ASM</tt> record (code 4) contains a variable number of
-values representing the bytes of <tt>module asm</tt> strings, with
-individual assembly blocks separated by newline (ASCII 10) characters.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME Record</a></h4>
-
-<div>
-<p><tt>[SECTIONNAME, ...string...]</tt></p>
-
-<p>The <tt>SECTIONNAME</tt> record (code 5) contains a variable number
-of values representing the bytes of a single section name
-string. There should be one <tt>SECTIONNAME</tt> record for each
-section name referenced (e.g., in global variable or function
-<tt>section</tt> attributes) within the module. These records can be
-referenced by the 1-based index in the <i>section</i> fields of
-<tt>GLOBALVAR</tt> or <tt>FUNCTION</tt> records.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_DEPLIB">MODULE_CODE_DEPLIB Record</a></h4>
-
-<div>
-<p><tt>[DEPLIB, ...string...]</tt></p>
-
-<p>The <tt>DEPLIB</tt> record (code 6) contains a variable number of
-values representing the bytes of a single dependent library name
-string, one of the libraries mentioned in a <tt>deplibs</tt>
-declaration. There should be one <tt>DEPLIB</tt> record for each
-library name referenced.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_GLOBALVAR">MODULE_CODE_GLOBALVAR Record</a></h4>
-
-<div>
-<p><tt>[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr]</tt></p>
-
-<p>The <tt>GLOBALVAR</tt> record (code 7) marks the declaration or
-definition of a global variable. The operand fields are:</p>
-
-<ul>
-<li><i>pointer type</i>: The type index of the pointer type used to point to
-this global variable</li>
-
-<li><i>isconst</i>: Non-zero if the variable is treated as constant within
-the module, or zero if it is not</li>
-
-<li><i>initid</i>: If non-zero, the value index of the initializer for this
-variable, plus 1.</li>
-
-<li><a name="linkage"><i>linkage</i></a>: An encoding of the linkage
-type for this variable:
- <ul>
- <li><tt>external</tt>: code 0</li>
- <li><tt>weak</tt>: code 1</li>
- <li><tt>appending</tt>: code 2</li>
- <li><tt>internal</tt>: code 3</li>
- <li><tt>linkonce</tt>: code 4</li>
- <li><tt>dllimport</tt>: code 5</li>
- <li><tt>dllexport</tt>: code 6</li>
- <li><tt>extern_weak</tt>: code 7</li>
- <li><tt>common</tt>: code 8</li>
- <li><tt>private</tt>: code 9</li>
- <li><tt>weak_odr</tt>: code 10</li>
- <li><tt>linkonce_odr</tt>: code 11</li>
- <li><tt>available_externally</tt>: code 12</li>
- <li><tt>linker_private</tt>: code 13</li>
- </ul>
-</li>
-
-<li><i>alignment</i>: The logarithm base 2 of the variable's requested
-alignment, plus 1</li>
-
-<li><i>section</i>: If non-zero, the 1-based section index in the
-table of <a href="#MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME</a>
-entries.</li>
-
-<li><a name="visibility"><i>visibility</i></a>: If present, an
-encoding of the visibility of this variable:
- <ul>
- <li><tt>default</tt>: code 0</li>
- <li><tt>hidden</tt>: code 1</li>
- <li><tt>protected</tt>: code 2</li>
- </ul>
-</li>
-
-<li><i>threadlocal</i>: If present, an encoding of the thread local storage
-mode of the variable:
- <ul>
- <li><tt>not thread local</tt>: code 0</li>
- <li><tt>thread local; default TLS model</tt>: code 1</li>
- <li><tt>localdynamic</tt>: code 2</li>
- <li><tt>initialexec</tt>: code 3</li>
- <li><tt>localexec</tt>: code 4</li>
- </ul>
-</li>
-
-<li><i>unnamed_addr</i>: If present and non-zero, indicates that the variable
-has <tt>unnamed_addr</tt></li>
-
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_FUNCTION">MODULE_CODE_FUNCTION Record</a></h4>
-
-<div>
-
-<p><tt>[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]</tt></p>
-
-<p>The <tt>FUNCTION</tt> record (code 8) marks the declaration or
-definition of a function. The operand fields are:</p>
-
-<ul>
-<li><i>type</i>: The type index of the function type describing this function</li>
-
-<li><i>callingconv</i>: The calling convention number:
- <ul>
- <li><tt>ccc</tt>: code 0</li>
- <li><tt>fastcc</tt>: code 8</li>
- <li><tt>coldcc</tt>: code 9</li>
- <li><tt>x86_stdcallcc</tt>: code 64</li>
- <li><tt>x86_fastcallcc</tt>: code 65</li>
- <li><tt>arm_apcscc</tt>: code 66</li>
- <li><tt>arm_aapcscc</tt>: code 67</li>
- <li><tt>arm_aapcs_vfpcc</tt>: code 68</li>
- </ul>
-</li>
-
-<li><i>isproto</i>: Non-zero if this entry represents a declaration
-rather than a definition</li>
-
-<li><i>linkage</i>: An encoding of the <a href="#linkage">linkage type</a>
-for this function</li>
-
-<li><i>paramattr</i>: If nonzero, the 1-based parameter attribute index
-into the table of <a href="#PARAMATTR_CODE_ENTRY">PARAMATTR_CODE_ENTRY</a>
-entries.</li>
-
-<li><i>alignment</i>: The logarithm base 2 of the function's requested
-alignment, plus 1</li>
-
-<li><i>section</i>: If non-zero, the 1-based section index in the
-table of <a href="#MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME</a>
-entries.</li>
-
-<li><i>visibility</i>: An encoding of the <a href="#visibility">visibility</a>
- of this function</li>
-
-<li><i>gc</i>: If present and nonzero, the 1-based garbage collector
-index in the table of
-<a href="#MODULE_CODE_GCNAME">MODULE_CODE_GCNAME</a> entries.</li>
-
-<li><i>unnamed_addr</i>: If present and non-zero, indicates that the function
-has <tt>unnamed_addr</tt></li>
-
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_ALIAS">MODULE_CODE_ALIAS Record</a></h4>
-
-<div>
-
-<p><tt>[ALIAS, alias type, aliasee val#, linkage, visibility]</tt></p>
-
-<p>The <tt>ALIAS</tt> record (code 9) marks the definition of an
-alias. The operand fields are</p>
-
-<ul>
-<li><i>alias type</i>: The type index of the alias</li>
-
-<li><i>aliasee val#</i>: The value index of the aliased value</li>
-
-<li><i>linkage</i>: An encoding of the <a href="#linkage">linkage type</a>
-for this alias</li>
-
-<li><i>visibility</i>: If present, an encoding of the
-<a href="#visibility">visibility</a> of the alias</li>
-
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_PURGEVALS">MODULE_CODE_PURGEVALS Record</a></h4>
-
-<div>
-<p><tt>[PURGEVALS, numvals]</tt></p>
-
-<p>The <tt>PURGEVALS</tt> record (code 10) resets the module-level
-value list to the size given by the single operand value. Module-level
-value list items are added by <tt>GLOBALVAR</tt>, <tt>FUNCTION</tt>,
-and <tt>ALIAS</tt> records. After a <tt>PURGEVALS</tt> record is seen,
-new value indices will start from the given <i>numvals</i> value.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_GCNAME">MODULE_CODE_GCNAME Record</a></h4>
-
-<div>
-<p><tt>[GCNAME, ...string...]</tt></p>
-
-<p>The <tt>GCNAME</tt> record (code 11) contains a variable number of
-values representing the bytes of a single garbage collector name
-string. There should be one <tt>GCNAME</tt> record for each garbage
-collector name referenced in function <tt>gc</tt> attributes within
-the module. These records can be referenced by 1-based index in the <i>gc</i>
-fields of <tt>FUNCTION</tt> records.</p>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="PARAMATTR_BLOCK">PARAMATTR_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>PARAMATTR_BLOCK</tt> block (id 9) contains a table of
-entries describing the attributes of function parameters. These
-entries are referenced by 1-based index in the <i>paramattr</i> field
-of module block <a href="#MODULE_CODE_FUNCTION"><tt>FUNCTION</tt></a>
-records, or within the <i>attr</i> field of function block <a
-href="#FUNC_CODE_INST_INVOKE"><tt>INST_INVOKE</tt></a> and <a
-href="#FUNC_CODE_INST_CALL"><tt>INST_CALL</tt></a> records.</p>
-
-<p>Entries within <tt>PARAMATTR_BLOCK</tt> are constructed to ensure
-that each is unique (i.e., no two indices represent equivalent
-attribute lists). </p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="PARAMATTR_CODE_ENTRY">PARAMATTR_CODE_ENTRY Record</a></h4>
-
-<div>
-
-<p><tt>[ENTRY, paramidx0, attr0, paramidx1, attr1...]</tt></p>
-
-<p>The <tt>ENTRY</tt> record (code 1) contains an even number of
-values describing a unique set of function parameter attributes. Each
-<i>paramidx</i> value indicates which set of attributes is
-represented, with 0 representing the return value attributes,
-0xFFFFFFFF representing function attributes, and other values
-representing 1-based function parameters. Each <i>attr</i> value is a
-bitmap with the following interpretation:
-</p>
-
-<ul>
-<li>bit 0: <tt>zeroext</tt></li>
-<li>bit 1: <tt>signext</tt></li>
-<li>bit 2: <tt>noreturn</tt></li>
-<li>bit 3: <tt>inreg</tt></li>
-<li>bit 4: <tt>sret</tt></li>
-<li>bit 5: <tt>nounwind</tt></li>
-<li>bit 6: <tt>noalias</tt></li>
-<li>bit 7: <tt>byval</tt></li>
-<li>bit 8: <tt>nest</tt></li>
-<li>bit 9: <tt>readnone</tt></li>
-<li>bit 10: <tt>readonly</tt></li>
-<li>bit 11: <tt>noinline</tt></li>
-<li>bit 12: <tt>alwaysinline</tt></li>
-<li>bit 13: <tt>optsize</tt></li>
-<li>bit 14: <tt>ssp</tt></li>
-<li>bit 15: <tt>sspreq</tt></li>
-<li>bits 16&ndash;31: <tt>align <var>n</var></tt></li>
-<li>bit 32: <tt>nocapture</tt></li>
-<li>bit 33: <tt>noredzone</tt></li>
-<li>bit 34: <tt>noimplicitfloat</tt></li>
-<li>bit 35: <tt>naked</tt></li>
-<li>bit 36: <tt>inlinehint</tt></li>
-<li>bits 37&ndash;39: <tt>alignstack <var>n</var></tt>, represented as
-the logarithm base 2 of the requested alignment, plus 1</li>
-</ul>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="TYPE_BLOCK">TYPE_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>TYPE_BLOCK</tt> block (id 10) contains records which
-constitute a table of type operator entries used to represent types
-referenced within an LLVM module. Each record (with the exception of
-<a href="#TYPE_CODE_NUMENTRY"><tt>NUMENTRY</tt></a>) generates a
-single type table entry, which may be referenced by 0-based index from
-instructions, constants, metadata, type symbol table entries, or other
-type operator records.
-</p>
-
-<p>Entries within <tt>TYPE_BLOCK</tt> are constructed to ensure that
-each entry is unique (i.e., no two indices represent structurally
-equivalent types). </p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_NUMENTRY">TYPE_CODE_NUMENTRY Record</a></h4>
-
-<div>
-
-<p><tt>[NUMENTRY, numentries]</tt></p>
-
-<p>The <tt>NUMENTRY</tt> record (code 1) contains a single value which
-indicates the total number of type code entries in the type table of
-the module. If present, <tt>NUMENTRY</tt> should be the first record
-in the block.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_VOID">TYPE_CODE_VOID Record</a></h4>
-
-<div>
-
-<p><tt>[VOID]</tt></p>
-
-<p>The <tt>VOID</tt> record (code 2) adds a <tt>void</tt> type to the
-type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_HALF">TYPE_CODE_HALF Record</a></h4>
-
-<div>
-
-<p><tt>[HALF]</tt></p>
-
-<p>The <tt>HALF</tt> record (code 10) adds a <tt>half</tt> (16-bit
-floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_FLOAT">TYPE_CODE_FLOAT Record</a></h4>
-
-<div>
-
-<p><tt>[FLOAT]</tt></p>
-
-<p>The <tt>FLOAT</tt> record (code 3) adds a <tt>float</tt> (32-bit
-floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_DOUBLE">TYPE_CODE_DOUBLE Record</a></h4>
-
-<div>
-
-<p><tt>[DOUBLE]</tt></p>
-
-<p>The <tt>DOUBLE</tt> record (code 4) adds a <tt>double</tt> (64-bit
-floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_LABEL">TYPE_CODE_LABEL Record</a></h4>
-
-<div>
-
-<p><tt>[LABEL]</tt></p>
-
-<p>The <tt>LABEL</tt> record (code 5) adds a <tt>label</tt> type to
-the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_OPAQUE">TYPE_CODE_OPAQUE Record</a></h4>
-
-<div>
-
-<p><tt>[OPAQUE]</tt></p>
-
-<p>The <tt>OPAQUE</tt> record (code 6) adds an <tt>opaque</tt> type to
-the type table. Note that distinct <tt>opaque</tt> types are not
-unified.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_INTEGER">TYPE_CODE_INTEGER Record</a></h4>
-
-<div>
-
-<p><tt>[INTEGER, width]</tt></p>
-
-<p>The <tt>INTEGER</tt> record (code 7) adds an integer type to the
-type table. The single <i>width</i> field indicates the width of the
-integer type.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_POINTER">TYPE_CODE_POINTER Record</a></h4>
-
-<div>
-
-<p><tt>[POINTER, pointee type, address space]</tt></p>
-
-<p>The <tt>POINTER</tt> record (code 8) adds a pointer type to the
-type table. The operand fields are</p>
-
-<ul>
-<li><i>pointee type</i>: The type index of the pointed-to type</li>
-
-<li><i>address space</i>: If supplied, the target-specific numbered
-address space where the pointed-to object resides. Otherwise, the
-default address space is zero.
-</li>
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_FUNCTION">TYPE_CODE_FUNCTION Record</a></h4>
-
-<div>
-
-<p><tt>[FUNCTION, vararg, ignored, retty, ...paramty... ]</tt></p>
-
-<p>The <tt>FUNCTION</tt> record (code 9) adds a function type to the
-type table. The operand fields are</p>
-
-<ul>
-<li><i>vararg</i>: Non-zero if the type represents a varargs function</li>
-
-<li><i>ignored</i>: This value field is present for backward
-compatibility only, and is ignored</li>
-
-<li><i>retty</i>: The type index of the function's return type</li>
-
-<li><i>paramty</i>: Zero or more type indices representing the
-parameter types of the function</li>
-</ul>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_STRUCT">TYPE_CODE_STRUCT Record</a></h4>
-
-<div>
-
-<p><tt>[STRUCT, ispacked, ...eltty...]</tt></p>
-
-<p>The <tt>STRUCT </tt> record (code 10) adds a struct type to the
-type table. The operand fields are</p>
-
-<ul>
-<li><i>ispacked</i>: Non-zero if the type represents a packed structure</li>
-
-<li><i>eltty</i>: Zero or more type indices representing the element
-types of the structure</li>
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_ARRAY">TYPE_CODE_ARRAY Record</a></h4>
-
-<div>
-
-<p><tt>[ARRAY, numelts, eltty]</tt></p>
-
-<p>The <tt>ARRAY</tt> record (code 11) adds an array type to the type
-table. The operand fields are</p>
-
-<ul>
-<li><i>numelts</i>: The number of elements in arrays of this type</li>
-
-<li><i>eltty</i>: The type index of the array element type</li>
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_VECTOR">TYPE_CODE_VECTOR Record</a></h4>
-
-<div>
-
-<p><tt>[VECTOR, numelts, eltty]</tt></p>
-
-<p>The <tt>VECTOR</tt> record (code 12) adds a vector type to the type
-table. The operand fields are</p>
-
-<ul>
-<li><i>numelts</i>: The number of elements in vectors of this type</li>
-
-<li><i>eltty</i>: The type index of the vector element type</li>
-</ul>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_X86_FP80">TYPE_CODE_X86_FP80 Record</a></h4>
-
-<div>
-
-<p><tt>[X86_FP80]</tt></p>
-
-<p>The <tt>X86_FP80</tt> record (code 13) adds an <tt>x86_fp80</tt> (80-bit
-floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_FP128">TYPE_CODE_FP128 Record</a></h4>
-
-<div>
-
-<p><tt>[FP128]</tt></p>
-
-<p>The <tt>FP128</tt> record (code 14) adds an <tt>fp128</tt> (128-bit
-floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_PPC_FP128">TYPE_CODE_PPC_FP128 Record</a></h4>
-
-<div>
-
-<p><tt>[PPC_FP128]</tt></p>
-
-<p>The <tt>PPC_FP128</tt> record (code 15) adds a <tt>ppc_fp128</tt>
-(128-bit floating point) type to the type table.
-</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TYPE_CODE_METADATA">TYPE_CODE_METADATA Record</a></h4>
-
-<div>
-
-<p><tt>[METADATA]</tt></p>
-
-<p>The <tt>METADATA</tt> record (code 16) adds a <tt>metadata</tt>
-type to the type table.
-</p>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="CONSTANTS_BLOCK">CONSTANTS_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>CONSTANTS_BLOCK</tt> block (id 11) ...
-</p>
-
-</div>
-
-
-<!-- ======================================================================= -->
-<h3>
- <a name="FUNCTION_BLOCK">FUNCTION_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>FUNCTION_BLOCK</tt> block (id 12) ...
-</p>
-
-<p>In addition to the record types described below, a
-<tt>FUNCTION_BLOCK</tt> block may contain the following sub-blocks:
-</p>
-
-<ul>
-<li><a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a></li>
-<li><a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a></li>
-<li><a href="#METADATA_ATTACHMENT"><tt>METADATA_ATTACHMENT</tt></a></li>
-</ul>
-
-</div>
-
-
-<!-- ======================================================================= -->
-<h3>
- <a name="TYPE_SYMTAB_BLOCK">TYPE_SYMTAB_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>TYPE_SYMTAB_BLOCK</tt> block (id 13) contains entries which
-map between module-level named types and their corresponding type
-indices.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="TST_CODE_ENTRY">TST_CODE_ENTRY Record</a></h4>
-
-<div>
-
-<p><tt>[ENTRY, typeid, ...string...]</tt></p>
-
-<p>The <tt>ENTRY</tt> record (code 1) contains a variable number of
-values, with the first giving the type index of the designated type,
-and the remaining values giving the character codes of the type
-name. Each entry corresponds to a single named type.
-</p>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
- <a name="VALUE_SYMTAB_BLOCK">VALUE_SYMTAB_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>VALUE_SYMTAB_BLOCK</tt> block (id 14) ...
-</p>
-
-</div>
-
-
-<!-- ======================================================================= -->
-<h3>
- <a name="METADATA_BLOCK">METADATA_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>METADATA_BLOCK</tt> block (id 15) ...
-</p>
-
-</div>
-
-
-<!-- ======================================================================= -->
-<h3>
- <a name="METADATA_ATTACHMENT">METADATA_ATTACHMENT Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>METADATA_ATTACHMENT</tt> block (id 16) ...
-</p>
-
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
-<a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
-<a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
-Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst
new file mode 100644
index 0000000..d3995e7
--- /dev/null
+++ b/docs/BitCodeFormat.rst
@@ -0,0 +1,1045 @@
+.. _bitcode_format:
+
+.. role:: raw-html(raw)
+ :format: html
+
+========================
+LLVM Bitcode File Format
+========================
+
+.. contents::
+ :local:
+
+Abstract
+========
+
+This document describes the LLVM bitstream file format and the encoding of the
+LLVM IR into it.
+
+Overview
+========
+
+What is commonly known as the LLVM bitcode file format (also, sometimes
+anachronistically known as bytecode) is actually two things: a `bitstream
+container format`_ and an `encoding of LLVM IR`_ into the container format.
+
+The bitstream format is an abstract encoding of structured data, very similar to
+XML in some ways. Like XML, bitstream files contain tags, and nested
+structures, and you can parse the file without having to understand the tags.
+Unlike XML, the bitstream format is a binary encoding, and unlike XML it
+provides a mechanism for the file to self-describe "abbreviations", which are
+effectively size optimizations for the content.
+
+LLVM IR files may be optionally embedded into a `wrapper`_ structure that makes
+it easy to embed extra data along with LLVM IR files.
+
+This document first describes the LLVM bitstream format, then the wrapper
+format, and finally the record structure used by LLVM IR files.
+
+.. _bitstream container format:
+
+Bitstream Format
+================
+
+The bitstream format is literally a stream of bits, with a very simple
+structure. This structure consists of the following concepts:
+
+* A "`magic number`_" that identifies the contents of the stream.
+
+* Encoding `primitives`_ like variable bit-rate integers.
+
+* `Blocks`_, which define nested content.
+
+* `Data Records`_, which describe entities within the file.
+
+* Abbreviations, which specify compression optimizations for the file.
+
+Note that the `llvm-bcanalyzer <CommandGuide/html/llvm-bcanalyzer.html>`_ tool
+can be used to dump and inspect arbitrary bitstreams, which is very useful for
+understanding the encoding.
+
+.. _magic number:
+
+Magic Numbers
+-------------
+
+The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``). The next
+two bytes are an application-specific magic number. Generic bitcode tools can
+look at only the first two bytes to verify the file is bitcode, while
+application-specific programs will want to look at all four.
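
As a rough illustration of the check described above, the following Python sketch separates the generic 'BC' test from the application-specific half of the header. The function name `split_magic` and the sample bytes in the usage note are illustrative assumptions, not part of any LLVM tool.

```python
def split_magic(data: bytes):
    """Return (is_bitcode, app_magic) for the first four bytes of a stream.

    is_bitcode is True when the stream starts with 'BC' (0x42, 0x43);
    app_magic is the following two bytes, whose meaning is application-specific.
    """
    if len(data) < 4:
        raise ValueError("need at least four bytes to inspect the magic number")
    is_bitcode = data[0:2] == b"BC"
    app_magic = data[2:4]
    return is_bitcode, app_magic
```

A generic tool would only look at the first element of the result; an application-specific reader would also compare the second element against its own expected value, for example `split_magic(b"BC\x12\x34...")` with arbitrary placeholder bytes.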
+
+.. _primitives:
+
+Primitives
+----------
+
+A bitstream literally consists of a stream of bits, which are read in order
+starting with the least significant bit of each byte. The stream is made up of
+a number of primitive values that encode a stream of unsigned integer values.
+These integers are encoded in two ways: either as `Fixed Width Integers`_ or as
+`Variable Width Integers`_.
+
+.. _Fixed Width Integers:
+.. _fixed-width value:
+
+Fixed Width Integers
+^^^^^^^^^^^^^^^^^^^^
+
+Fixed-width integer values have their low bits emitted directly to the file.
+For example, a 3-bit integer value encodes 1 as 001. Fixed-width integers are
+used when there is a well-known number of options for a field. For example,
+boolean values are usually encoded with a 1-bit wide integer.
+
+.. _Variable Width Integers:
+.. _Variable Width Integer:
+.. _variable-width value:
+
+Variable Width Integers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Variable-width integer (VBR) values encode values of arbitrary size, optimizing
+for the case where the values are small. In an N-bit VBR field, the low N-1
+bits carry data and the high bit marks a continuation. Given a 4-bit VBR field,
+any 3-bit value (0 through 7) is encoded directly, with the high bit set to
+zero. Values that need more than N-1 bits are emitted as a series of N-1 bit
+chunks, least significant chunk first, where all but the last set the high bit.
+
+For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4
+value. The first set of four bits indicates the value 3 (011) with a
+continuation piece (indicated by a high bit of 1). The next set of four bits
+indicates a value of 24 (011 << 3) with no continuation. The sum (3+24) yields
+the value
+27.
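
The chunking rule can be sketched in Python as follows. This is a minimal model that works on lists of chunk values rather than on a real bit stream; the names `encode_vbr` and `decode_vbr` are assumptions made for this example and do not correspond to LLVM's reader or writer API.

```python
def encode_vbr(value, width):
    """Split an unsigned value into width-bit VBR chunks, lowest chunk first."""
    if value < 0 or width < 2:
        raise ValueError("value must be unsigned and width at least 2")
    payload_bits = width - 1
    continue_bit = 1 << payload_bits      # high bit of each chunk
    mask = continue_bit - 1               # low width-1 data bits
    chunks = []
    while True:
        low = value & mask
        value >>= payload_bits
        if value:
            chunks.append(low | continue_bit)  # more chunks follow
        else:
            chunks.append(low)                 # last chunk, high bit clear
            return chunks


def decode_vbr(chunks, width):
    """Rebuild the value from width-bit VBR chunks produced by encode_vbr."""
    payload_bits = width - 1
    continue_bit = 1 << payload_bits
    mask = continue_bit - 1
    value = 0
    shift = 0
    for chunk in chunks:
        value |= (chunk & mask) << shift
        shift += payload_bits
        if not chunk & continue_bit:
            return value
    raise ValueError("VBR sequence ended with the continuation bit still set")
```

With these helpers, `encode_vbr(27, 4)` yields the two chunks `0b1011` and `0b0011` from the example above, and `decode_vbr` recovers 27 from them.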
+
+.. _char6-encoded value:
+
+6-bit characters
+^^^^^^^^^^^^^^^^
+
+6-bit characters encode common characters into a fixed 6-bit field. They
+represent the following characters with the following 6-bit values:
+
+::
+
+ 'a' .. 'z' --- 0 .. 25
+ 'A' .. 'Z' --- 26 .. 51
+ '0' .. '9' --- 52 .. 61
+ '.' --- 62
+ '_' --- 63
+
+This encoding is only suitable for encoding characters and strings that consist
+only of the above characters. It is completely incapable of encoding characters
+not in the set.
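
The table above maps directly onto a 64-character alphabet. The Python sketch below builds that alphabet and converts in both directions; the names `CHAR6_ALPHABET`, `encode_char6` and `decode_char6` are illustrative assumptions for this example only.

```python
import string

# 'a'..'z' -> 0..25, 'A'..'Z' -> 26..51, '0'..'9' -> 52..61, '.' -> 62, '_' -> 63
CHAR6_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"
_CHAR6_INDEX = {ch: i for i, ch in enumerate(CHAR6_ALPHABET)}


def encode_char6(text):
    """Return the 6-bit values for text, or raise if a character is not encodable."""
    try:
        return [_CHAR6_INDEX[ch] for ch in text]
    except KeyError as err:
        raise ValueError(f"character {err.args[0]!r} has no char6 encoding") from None


def decode_char6(values):
    """Return the string represented by a sequence of 6-bit values."""
    return "".join(CHAR6_ALPHABET[v] for v in values)
```

A string such as `"a-b"` is rejected because `-` is outside the set, which is why a writer has to fall back to a wider encoding for such names.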
+
+Word Alignment
+^^^^^^^^^^^^^^
+
+Occasionally, it is useful to emit zero bits until the bitstream is a multiple
+of 32 bits. This ensures that the bit position in the stream can be represented
+as a multiple of 32-bit words.
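
The amount of padding follows from the current bit position alone. A small Python sketch, with `padding_to_word` and `align32` as hypothetical helper names:

```python
def padding_to_word(bit_pos):
    """Number of zero bits needed to reach the next multiple of 32 bits."""
    return -bit_pos % 32


def align32(bit_pos):
    """Bit position after padding to a 32-bit boundary."""
    return bit_pos + padding_to_word(bit_pos)
```

A position that is already aligned needs no padding, so `align32(64)` stays at 64 while `align32(65)` moves to 96.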
+
+Abbreviation IDs
+----------------
+
+A bitstream is a sequential series of `Blocks`_ and `Data Records`_. Both of
+these start with an abbreviation ID encoded as a fixed-bitwidth field. The
+width is specified by the current block, as described below. The value of the
+abbreviation ID specifies either a builtin ID (which has a special meaning,
+defined below) or one of the abbreviation IDs defined for the current block by
+the stream itself.
+
+The set of builtin abbrev IDs is:
+
+* 0 - `END_BLOCK`_ --- This abbrev ID marks the end of the current block.
+
+* 1 - `ENTER_SUBBLOCK`_ --- This abbrev ID marks the beginning of a new
+ block.
+
+* 2 - `DEFINE_ABBREV`_ --- This defines a new abbreviation.
+
+* 3 - `UNABBREV_RECORD`_ --- This ID specifies the definition of an
+ unabbreviated record.
+
+Abbreviation IDs 4 and above are defined by the stream itself, and specify an
+`abbreviated record encoding`_.
+
+.. _Blocks:
+
+Blocks
+------
+
+Blocks in a bitstream denote nested regions of the stream, and are identified by
+a content-specific ID number (for example, LLVM IR uses an ID of 12 to represent
+function bodies). Block IDs 0-7 are reserved for `standard blocks`_ whose
+meaning is defined by Bitcode; block IDs 8 and greater are
+application-specific. Nested blocks capture the hierarchical structure of the
+data encoded in the stream, and various properties are associated with blocks as
+the file is parsed.
+Block definitions allow the reader to efficiently skip blocks in constant time
+if the reader wants a summary of blocks, or if it wants to efficiently skip data
+it does not understand. The LLVM IR reader uses this mechanism to skip function
+bodies, lazily reading them on demand.
+
+When reading and encoding the stream, several properties are maintained for the
+block. In particular, each block maintains:
+
+#. A current abbrev id width. This value starts at 2 at the beginning of the
+ stream, and is set every time a block record is entered. The block entry
+ specifies the abbrev id width for the body of the block.
+
+#. A set of abbreviations. Abbreviations may be defined within a block, in
+ which case they are only defined in that block (neither subblocks nor
+ enclosing blocks see the abbreviation). Abbreviations can also be defined
+ inside a `BLOCKINFO`_ block, in which case they are defined in all blocks
+ that match the ID that the ``BLOCKINFO`` block is describing.
+
+As sub-blocks are entered, these properties are saved and the new sub-block has
+its own set of abbreviations, and its own abbrev id width. When a sub-block is
+exited, the saved values are restored.
+
+.. _ENTER_SUBBLOCK:
+
+ENTER_SUBBLOCK Encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[ENTER_SUBBLOCK, blockid\ :sub:`vbr8`, newabbrevlen\ :sub:`vbr4`, <align32bits>, blocklen_32]
+:raw-html:`</tt>`
+
+The ``ENTER_SUBBLOCK`` abbreviation ID specifies the start of a new block
+record. The ``blockid`` value is encoded as an 8-bit VBR identifier, and
+indicates the type of block being entered, which can be a `standard block`_ or
+an application-specific block. The ``newabbrevlen`` value is a 4-bit VBR, which
+specifies the abbrev id width for the sub-block. The ``blocklen`` value is a
+32-bit aligned value that specifies the size of the subblock in 32-bit
+words. This value allows the reader to skip over the entire block in one jump.
+
+.. _END_BLOCK:
+
+END_BLOCK Encoding
+^^^^^^^^^^^^^^^^^^
+
+``[END_BLOCK, <align32bits>]``
+
+The ``END_BLOCK`` abbreviation ID specifies the end of the current block record.
+Its end is aligned to 32 bits to ensure that the size of the block is a whole
+multiple of 32 bits.
+
+.. _Data Records:
+
+Data Records
+------------
+
+Data records consist of a record code and a number of (up to) 64-bit integer
+values. The interpretation of the code and values is application specific and
+may vary between different block types. Records can be encoded either using an
+unabbrev record, or with an abbreviation. In the LLVM IR format, for example,
+there is a record which encodes the target triple of a module. The code is
+``MODULE_CODE_TRIPLE``, and the values of the record are the ASCII codes for the
+characters in the string.
+
+.. _UNABBREV_RECORD:
+
+UNABBREV_RECORD Encoding
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[UNABBREV_RECORD, code\ :sub:`vbr6`, numops\ :sub:`vbr6`, op0\ :sub:`vbr6`, op1\ :sub:`vbr6`, ...]
+:raw-html:`</tt>`
+
+An ``UNABBREV_RECORD`` provides a default fallback encoding, which is both
+completely general and extremely inefficient. It can describe an arbitrary
+record by emitting the code and operands as VBRs.
+
+For example, emitting an LLVM IR target triple as an unabbreviated record
+requires emitting the ``UNABBREV_RECORD`` abbrevid, a vbr6 for the
+``MODULE_CODE_TRIPLE`` code, a vbr6 for the length of the string, which is equal
+to the number of operands, and a vbr6 for each character. Because there are no
+letters with values less than 32, each letter would need to be emitted as at
+least a two-part VBR, which means that each letter would require at least 12
+bits. This is not an efficient encoding, but it is fully general.
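
The cost estimate above can be checked with a short Python sketch that counts bits rather than emitting them. The helper names `vbr_bits` and `unabbrev_record_bits` are assumptions for this example, and the abbreviation ID width of 3 used in the usage note is an arbitrary choice, since the real width is set by the enclosing block.

```python
def vbr_bits(value, width):
    """Number of bits a width-bit VBR encoding of an unsigned value occupies."""
    payload_bits = width - 1
    chunks = 1
    while value >> payload_bits:
        value >>= payload_bits
        chunks += 1
    return chunks * width


def unabbrev_record_bits(code, operands, abbrev_width):
    """Total size in bits of [UNABBREV_RECORD, code, numops, op0, op1, ...]."""
    bits = abbrev_width                      # the UNABBREV_RECORD abbrev ID
    bits += vbr_bits(code, 6)                # record code as vbr6
    bits += vbr_bits(len(operands), 6)       # operand count as vbr6
    bits += sum(vbr_bits(op, 6) for op in operands)
    return bits
```

For a three-character string such as `"abc"` with record code 2 and an assumed abbrev width of 3, `unabbrev_record_bits(2, list(b"abc"), 3)` comes to 51 bits, of which 36 are the three 12-bit characters.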
+
+.. _abbreviated record encoding:
+
+Abbreviated Record Encoding
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[<abbrevid>, fields...]``
+
+An abbreviated record is an abbreviation ID followed by a set of fields that are
+encoded according to the `abbreviation definition`_. This allows records to be
+encoded significantly more densely than records encoded with the
+`UNABBREV_RECORD`_ type, and allows the abbreviation types to be specified in
+the stream itself, which allows the files to be completely self-describing. The
+actual encoding of abbreviations is defined below.
+
+The record code, which is the first field of an abbreviated record, may be
+encoded in the abbreviation definition (as a literal operand) or supplied in the
+abbreviated record (as a Fixed or VBR operand value).
+
+.. _abbreviation definition:
+
+Abbreviations
+-------------
+
+Abbreviations are an important form of compression for bitstreams. The idea is
+to specify a dense encoding for a class of records once, then use that encoding
+to emit many records. It takes space to emit the encoding into the file, but
+the space is recouped (hopefully plus some) when the records that use it are
+emitted.
+
+Abbreviations can be determined dynamically per client, per file. Because the
+abbreviations are stored in the bitstream itself, different streams of the same
+format can contain different sets of abbreviations according to the needs of the
+specific stream. As a concrete example, LLVM IR files usually emit an
+abbreviation for binary operators. If a specific LLVM module contains few or
+no binary operators, the abbreviation does not need to be emitted.
+
+.. _DEFINE_ABBREV:
+
+DEFINE_ABBREV Encoding
+^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[DEFINE_ABBREV, numabbrevops\ :sub:`vbr5`, abbrevop0, abbrevop1, ...]
+:raw-html:`</tt>`
+
+A ``DEFINE_ABBREV`` record adds an abbreviation to the list of currently defined
+abbreviations in the scope of this block. This definition only exists inside
+this immediate block --- it is not visible in subblocks or enclosing blocks.
+Abbreviations are implicitly assigned IDs sequentially starting from 4 (the
+first application-defined abbreviation ID). Any abbreviations defined in a
+``BLOCKINFO`` record for the particular block type receive IDs first, in order,
+followed by any abbreviations defined within the block itself. Abbreviated data
+records reference this ID to indicate what abbreviation they are invoking.
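+
+As a rough illustration of the numbering rule above, the following Python
+sketch assigns IDs to the abbreviations of one block. The function name
+``assign_abbrev_ids`` and the use of plain strings to stand in for abbreviation
+definitions are assumptions made for this example; they are not part of LLVM.
+
```python
FIRST_APPLICATION_ABBREV_ID = 4  # first application-defined abbreviation ID


def assign_abbrev_ids(blockinfo_abbrevs, local_abbrevs):
    """Map abbreviation IDs to definitions for one block.

    blockinfo_abbrevs: definitions registered for this block type in BLOCKINFO.
    local_abbrevs: definitions from DEFINE_ABBREV records inside the block,
    in the order they appear.
    """
    # BLOCKINFO abbreviations receive IDs first, then the block's own.
    ordered = list(blockinfo_abbrevs) + list(local_abbrevs)
    return {FIRST_APPLICATION_ABBREV_ID + i: a for i, a in enumerate(ordered)}


ids = assign_abbrev_ids(["info_a", "info_b"], ["local_a"])
print(ids)
```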
+
+An abbreviation definition consists of the ``DEFINE_ABBREV`` abbrevid followed
+by a VBR that specifies the number of abbrev operands, then the abbrev operands
+themselves. Abbreviation operands come in three forms. They all start with a
+single bit that indicates whether the abbrev operand is a literal operand (when
+the bit is 1) or an encoding operand (when the bit is 0).
+
+#. Literal operands --- :raw-html:`<tt>` [1\ :sub:`1`, litvalue\
+ :sub:`vbr8`] :raw-html:`</tt>` --- Literal operands specify that the value in
+ the result is always a single specific value. This specific value is emitted
+ as a vbr8 after the bit indicating that it is a literal operand.
+
+#. Encoding info without data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\
+ :sub:`3`] :raw-html:`</tt>` --- Operand encodings that do not have extra data
+ are just emitted as their code.
+
+#. Encoding info with data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\
+ :sub:`3`, value\ :sub:`vbr5`] :raw-html:`</tt>` --- Operand encodings that do
+ have extra data are emitted as their code, followed by the extra data.
+
+The possible operand encodings are:
+
+* Fixed (code 1): The field should be emitted as a `fixed-width value`_, whose
+ width is specified by the operand's extra data.
+
+* VBR (code 2): The field should be emitted as a `variable-width value`_, whose
+ width is specified by the operand's extra data.
+
+* Array (code 3): This field is an array of values. The array operand has no
+ extra data, but expects another operand to follow it, indicating the element
+ type of the array. When reading an array in an abbreviated record, the first
+ integer is a vbr6 that indicates the array length, followed by the encoded
+ elements of the array. An array may only occur as the last operand of an
+ abbreviation (except for the one final operand that gives the array's
+ type).
+
+* Char6 (code 4): This field should be emitted as a `char6-encoded value`_.
+ This operand type takes no extra data. Char6 encoding is normally used as an
+ array element type.
+
+* Blob (code 5): This field is emitted as a vbr6, followed by padding to a
+ 32-bit boundary (for alignment) and an array of 8-bit objects. The array of
+ bytes is further followed by tail padding to ensure that its total length is a
+ multiple of 4 bytes. This makes it very efficient for the reader to decode
+ the data without having to make a copy of it: it can use a pointer to the data
+ in the memory-mapped file and poke directly at it. A blob may only occur as the
+ last operand of an abbreviation.
+
+For example, target triples in LLVM modules are encoded as a record of the form
+``[TRIPLE, 'a', 'b', 'c', 'd']``. Suppose the bitstream emitted the following
+abbrev entry:
+
+::
+
+ [0, Fixed, 4]
+ [0, Array]
+ [0, Char6]
+
+When emitting a record with this abbreviation, the ``TRIPLE`` record above
+would be emitted as:
+
+:raw-html:`<tt><blockquote>`
+[4\ :sub:`abbrevwidth`, 2\ :sub:`4`, 4\ :sub:`vbr6`, 0\ :sub:`6`, 1\ :sub:`6`, 2\ :sub:`6`, 3\ :sub:`6`]
+:raw-html:`</blockquote></tt>`
+
+These values are:
+
+#. The first value, 4, is the abbreviation ID for this abbreviation.
+
+#. The second value, 2, is the record code for ``TRIPLE`` records within LLVM IR
+ file ``MODULE_BLOCK`` blocks.
+
+#. The third value, 4, is the length of the array.
+
+#. The rest of the values are the char6 encoded values for ``"abcd"``.
+
+With this abbreviation, the triple is emitted with only 37 bits (assuming an
+abbrev ID width of 3). Without the abbreviation, significantly more space would
+be required to emit the target triple. Also, because the ``TRIPLE`` value is
+not emitted as a literal in the abbreviation, the abbreviation can also be used
+for any other string value.
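+
+The 37-bit figure can be checked with a small Python sketch that lists the
+fields of the abbreviated record and sums their widths. The helper names
+``vbr_chunks`` and ``encode_string_record`` are hypothetical, and the sketch
+assumes the char6 alphabet (``a-z``, ``A-Z``, ``0-9``, ``.``, ``_``) and the
+VBR chunking rule described earlier in this document.
+
```python
import string

# char6 alphabet: 'a'-'z' -> 0-25, 'A'-'Z' -> 26-51, '0'-'9' -> 52-61,
# '.' -> 62, '_' -> 63.
CHAR6 = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"


def vbr_chunks(value, width):
    """Split an unsigned value into VBR chunks of `width` bits each.

    Each chunk carries width-1 data bits; its high bit marks continuation.
    """
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    chunks = []
    while True:
        chunk = value & mask
        value >>= data_bits
        if value:
            chunks.append(chunk | (1 << data_bits))
        else:
            chunks.append(chunk)
            return chunks


def encode_string_record(abbrev_id, abbrev_width, code, text):
    """Return (value, bit_width) fields for the [Fixed(4), Array, Char6] abbrev."""
    fields = [(abbrev_id, abbrev_width), (code, 4)]
    fields += [(c, 6) for c in vbr_chunks(len(text), 6)]  # array length, vbr6
    fields += [(CHAR6.index(ch), 6) for ch in text]       # char6 elements
    return fields


fields = encode_string_record(abbrev_id=4, abbrev_width=3, code=2, text="abcd")
total_bits = sum(width for _, width in fields)
print(fields, total_bits)
```
+
+The widths add up as 3 + 4 + 6 + 4 * 6 = 37 bits.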
+
+.. _standard blocks:
+.. _standard block:
+
+Standard Blocks
+---------------
+
+In addition to the basic block structure and record encodings, the bitstream
+also defines specific built-in block types. These block types specify how the
+stream is to be decoded or other metadata. In the future, new standard blocks
+may be added. Block IDs 0-7 are reserved for standard blocks.
+
+.. _BLOCKINFO:
+
+#0 - BLOCKINFO Block
+^^^^^^^^^^^^^^^^^^^^
+
+The ``BLOCKINFO`` block allows the description of metadata for other blocks.
+The currently specified records are:
+
+::
+
+ [SETBID (#1), blockid]
+ [DEFINE_ABBREV, ...]
+ [BLOCKNAME, ...name...]
+ [SETRECORDNAME, RecordID, ...name...]
+
+The ``SETBID`` record (code 1) indicates which block ID is being described.
+``SETBID`` records can occur multiple times throughout the block to change which
+block ID is being described. There must be a ``SETBID`` record prior to any
+other records.
+
+Standard ``DEFINE_ABBREV`` records can occur inside ``BLOCKINFO`` blocks, but
+unlike their occurrence in normal blocks, the abbreviation is defined for blocks
+matching the block ID we are describing, *not* the ``BLOCKINFO`` block
+itself. The abbreviations defined in ``BLOCKINFO`` blocks receive abbreviation
+IDs as described in `DEFINE_ABBREV`_.
+
+The ``BLOCKNAME`` record (code 2) can optionally occur in this block. The
+elements of the record are the bytes of the string name of the block.
+llvm-bcanalyzer can use this to dump out bitcode files symbolically.
+
+The ``SETRECORDNAME`` record (code 3) can also optionally occur in this block.
+The first operand value is a record ID number, and the rest of the elements of
+the record are the bytes for the string name of the record. llvm-bcanalyzer can
+use this to dump out bitcode files symbolically.
+
+Note that although the data in ``BLOCKINFO`` blocks is described as "metadata,"
+the abbreviations they contain are essential for parsing records from the
+corresponding blocks. It is not safe to skip them.
+
+.. _wrapper:
+
+Bitcode Wrapper Format
+======================
+
+Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
+structure. This structure contains a simple header that indicates the offset
+and size of the embedded BC file. This allows additional information to be
+stored alongside the BC file. The structure of this file header is:
+
+:raw-html:`<tt><blockquote>`
+[Magic\ :sub:`32`, Version\ :sub:`32`, Offset\ :sub:`32`, Size\ :sub:`32`, CPUType\ :sub:`32`]
+:raw-html:`</blockquote></tt>`
+
+Each field is a 32-bit value stored in little-endian form (as with the
+rest of the bitcode file fields). The Magic number is always ``0x0B17C0DE`` and
+the version is currently always ``0``. The Offset field is the offset in bytes
+to the start of the bitcode stream in the file, and the Size field is the size
+in bytes of the stream. CPUType is a target-specific value that can be used to
+encode the CPU of the target.
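+
+A reader might strip the wrapper roughly as in the following Python sketch.
+The function name ``unwrap_bitcode`` and its policy of returning the input
+unchanged when no wrapper magic is found are assumptions of this example, not
+a description of LLVM's reader.
+
```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE
HEADER = struct.Struct("<5I")  # five little-endian 32-bit fields


def unwrap_bitcode(data):
    """Return the embedded bitstream bytes, or `data` itself if unwrapped."""
    if len(data) < HEADER.size:
        return data
    magic, version, offset, size, cpu_type = HEADER.unpack_from(data)
    if magic != WRAPPER_MAGIC:
        return data
    if offset + size > len(data):
        raise ValueError("wrapper header points past the end of the file")
    return data[offset:offset + size]


# Build a tiny wrapped file: header, then a 4-byte stand-in for the stream.
payload = b"BC\xc0\xde"
wrapped = HEADER.pack(WRAPPER_MAGIC, 0, HEADER.size, len(payload), 0) + payload
print(unwrap_bitcode(wrapped))
```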
+
+.. _encoding of LLVM IR:
+
+LLVM IR Encoding
+================
+
+LLVM IR is encoded into a bitstream by defining blocks and records. It uses
+blocks for things like constant pools, functions, symbol tables, etc. It uses
+records for things like instructions, global variable descriptors, type
+descriptions, etc. This document does not describe the set of abbreviations
+that the writer uses, as these are fully self-described in the file, and the
+reader is not allowed to build in any knowledge of this.
+
+Basics
+------
+
+LLVM IR Magic Number
+^^^^^^^^^^^^^^^^^^^^
+
+The magic number for LLVM IR files is:
+
+:raw-html:`<tt><blockquote>`
+[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
+:raw-html:`</blockquote></tt>`
+
+When combined with the bitcode magic number and viewed as bytes, this is
+``"BC 0xC0DE"``.
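+
+To see how the four 4-bit fields turn into the bytes ``0xC0 0xDE``, the
+following Python sketch packs fields into bytes with the first field in the
+lowest bits. The helper ``pack_fields`` is hypothetical, and the low-bits-first
+packing order is stated here as an assumption about the bitstream layout.
+
```python
def pack_fields(fields):
    """Pack (value, bit_width) fields into bytes, low bits first."""
    acc = 0
    nbits = 0
    for value, width in fields:
        acc |= (value & ((1 << width) - 1)) << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


bitcode_magic = [(ord("B"), 8), (ord("C"), 8)]
llvm_ir_magic = [(0x0, 4), (0xC, 4), (0xE, 4), (0xD, 4)]
magic = pack_fields(bitcode_magic + llvm_ir_magic)
print(magic.hex())
```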
+
+Signed VBRs
+^^^^^^^^^^^
+
+`Variable Width Integer`_ encoding is an efficient way to encode arbitrarily
+sized unsigned values, but is extremely inefficient for encoding signed values,
+as negative values are otherwise treated as maximally large unsigned values.
+
+As such, signed VBR values of a specific width are emitted as follows:
+
+* Positive values are emitted as VBRs of the specified width, but with their
+ value shifted left by one.
+
+* Negative values are emitted as VBRs of the specified width, but the negated
+ value is shifted left by one, and the low bit is set.
+
+With this encoding, small positive and small negative values can both be emitted
+efficiently. Signed VBR encoding is used in ``CST_CODE_INTEGER`` and
+``CST_CODE_WIDE_INTEGER`` records within ``CONSTANTS_BLOCK`` blocks.
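+
+The two rules above can be sketched in Python as follows. The function names
+are chosen for this example; the result of ``encode_signed`` is the unsigned
+value that would then be emitted as an ordinary VBR.
+
```python
def encode_signed(value):
    """Map a signed integer to the unsigned value that is emitted as a VBR."""
    if value >= 0:
        return value << 1            # positive: shift left, low bit clear
    return ((-value) << 1) | 1       # negative: shift magnitude, set low bit


def decode_signed(encoded):
    """Invert encode_signed."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


for v in (0, 3, -3, 100, -100):
    print(v, encode_signed(v), decode_signed(encode_signed(v)))
```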
+
+LLVM IR Blocks
+^^^^^^^^^^^^^^
+
+LLVM IR is defined with the following blocks:
+
+* 8 --- `MODULE_BLOCK`_ --- This is the top-level block that contains the entire
+ module, and describes a variety of per-module information.
+
+* 9 --- `PARAMATTR_BLOCK`_ --- This enumerates the parameter attributes.
+
+* 10 --- `TYPE_BLOCK`_ --- This describes all of the types in the module.
+
+* 11 --- `CONSTANTS_BLOCK`_ --- This describes constants for a module or
+ function.
+
+* 12 --- `FUNCTION_BLOCK`_ --- This describes a function body.
+
+* 13 --- `TYPE_SYMTAB_BLOCK`_ --- This describes the type symbol table.
+
+* 14 --- `VALUE_SYMTAB_BLOCK`_ --- This describes a value symbol table.
+
+* 15 --- `METADATA_BLOCK`_ --- This describes metadata items.
+
+* 16 --- `METADATA_ATTACHMENT`_ --- This contains records associating metadata
+ with function instruction values.
+
+.. _MODULE_BLOCK:
+
+MODULE_BLOCK Contents
+---------------------
+
+The ``MODULE_BLOCK`` block (id 8) is the top-level block for LLVM bitcode files,
+and each bitcode file must contain exactly one. In addition to records
+(described below) containing information about the module, a ``MODULE_BLOCK``
+block may contain the following sub-blocks:
+
+* `BLOCKINFO`_
+* `PARAMATTR_BLOCK`_
+* `TYPE_BLOCK`_
+* `TYPE_SYMTAB_BLOCK`_
+* `VALUE_SYMTAB_BLOCK`_
+* `CONSTANTS_BLOCK`_
+* `FUNCTION_BLOCK`_
+* `METADATA_BLOCK`_
+
+MODULE_CODE_VERSION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[VERSION, version#]``
+
+The ``VERSION`` record (code 1) contains a single value indicating the format
+version. Only version 0 is supported at this time.
+
+MODULE_CODE_TRIPLE Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[TRIPLE, ...string...]``
+
+The ``TRIPLE`` record (code 2) contains a variable number of values representing
+the bytes of the ``target triple`` specification string.
+
+MODULE_CODE_DATALAYOUT Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DATALAYOUT, ...string...]``
+
+The ``DATALAYOUT`` record (code 3) contains a variable number of values
+representing the bytes of the ``target datalayout`` specification string.
+
+MODULE_CODE_ASM Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[ASM, ...string...]``
+
+The ``ASM`` record (code 4) contains a variable number of values representing
+the bytes of ``module asm`` strings, with individual assembly blocks separated
+by newline (ASCII 10) characters.
+
+.. _MODULE_CODE_SECTIONNAME:
+
+MODULE_CODE_SECTIONNAME Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[SECTIONNAME, ...string...]``
+
+The ``SECTIONNAME`` record (code 5) contains a variable number of values
+representing the bytes of a single section name string. There should be one
+``SECTIONNAME`` record for each section name referenced (e.g., in global
+variable or function ``section`` attributes) within the module. These records
+can be referenced by the 1-based index in the *section* fields of ``GLOBALVAR``
+or ``FUNCTION`` records.
+
+MODULE_CODE_DEPLIB Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DEPLIB, ...string...]``
+
+The ``DEPLIB`` record (code 6) contains a variable number of values representing
+the bytes of a single dependent library name string, one of the libraries
+mentioned in a ``deplibs`` declaration. There should be one ``DEPLIB`` record
+for each library name referenced.
+
+MODULE_CODE_GLOBALVAR Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr]``
+
+The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a
+global variable. The operand fields are:
+
+* *pointer type*: The type index of the pointer type used to point to this
+ global variable
+
+* *isconst*: Non-zero if the variable is treated as constant within the module,
+ or zero if it is not
+
+* *initid*: If non-zero, the value index of the initializer for this variable,
+ plus 1.
+
+.. _linkage type:
+
+* *linkage*: An encoding of the linkage type for this variable:
+ * ``external``: code 0
+ * ``weak``: code 1
+ * ``appending``: code 2
+ * ``internal``: code 3
+ * ``linkonce``: code 4
+ * ``dllimport``: code 5
+ * ``dllexport``: code 6
+ * ``extern_weak``: code 7
+ * ``common``: code 8
+ * ``private``: code 9
+ * ``weak_odr``: code 10
+ * ``linkonce_odr``: code 11
+ * ``available_externally``: code 12
+ * ``linker_private``: code 13
+
+* *alignment*: The logarithm base 2 of the variable's requested alignment, plus 1
+
+* *section*: If non-zero, the 1-based section index in the table of
+ `MODULE_CODE_SECTIONNAME`_ entries.
+
+.. _visibility:
+
+* *visibility*: If present, an encoding of the visibility of this variable:
+ * ``default``: code 0
+ * ``hidden``: code 1
+ * ``protected``: code 2
+
+* *threadlocal*: If present, an encoding of the thread local storage mode of the
+ variable:
+ * ``not thread local``: code 0
+ * ``thread local; default TLS model``: code 1
+ * ``localdynamic``: code 2
+ * ``initialexec``: code 3
+ * ``localexec``: code 4
+
+* *unnamed_addr*: If present and non-zero, indicates that the variable has
+ ``unnamed_addr``
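+
+The operand list above could be interpreted along the lines of the following
+Python sketch. The function names, the dictionary layout, the defaults used
+when optional operands are absent, and the reading of an *alignment* field of
+0 as "no alignment requested" are all assumptions of this example.
+
```python
LINKAGE = [
    "external", "weak", "appending", "internal", "linkonce", "dllimport",
    "dllexport", "extern_weak", "common", "private", "weak_odr",
    "linkonce_odr", "available_externally", "linker_private",
]
VISIBILITY = ["default", "hidden", "protected"]
THREAD_LOCAL = [
    "not thread local", "thread local; default TLS model",
    "localdynamic", "initialexec", "localexec",
]


def decode_alignment(field):
    """Undo the "log2 plus 1" encoding; 0 is assumed to mean unspecified."""
    return 0 if field == 0 else 1 << (field - 1)


def decode_globalvar(ops):
    """Interpret the operands that follow the GLOBALVAR record code."""
    ptr_type, isconst, initid, linkage, alignment, section = ops[:6]
    return {
        "pointer_type": ptr_type,
        "isconst": isconst != 0,
        "initializer": initid - 1 if initid else None,
        "linkage": LINKAGE[linkage],
        "alignment": decode_alignment(alignment),
        "section": section - 1 if section else None,  # 0-based table index
        # The remaining operands are optional ("if present").
        "visibility": VISIBILITY[ops[6]] if len(ops) > 6 else "default",
        "threadlocal": THREAD_LOCAL[ops[7]] if len(ops) > 7 else THREAD_LOCAL[0],
        "unnamed_addr": len(ops) > 8 and ops[8] != 0,
    }


var = decode_globalvar([5, 1, 3, 3, 4, 0, 1])
print(var)
```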
+
+.. _FUNCTION:
+
+MODULE_CODE_FUNCTION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, unnamed_addr]``
+
+The ``FUNCTION`` record (code 8) marks the declaration or definition of a
+function. The operand fields are:
+
+* *type*: The type index of the function type describing this function
+
+* *callingconv*: The calling convention number:
+ * ``ccc``: code 0
+ * ``fastcc``: code 8
+ * ``coldcc``: code 9
+ * ``x86_stdcallcc``: code 64
+ * ``x86_fastcallcc``: code 65
+ * ``arm_apcscc``: code 66
+ * ``arm_aapcscc``: code 67
+ * ``arm_aapcs_vfpcc``: code 68
+
+* *isproto*: Non-zero if this entry represents a declaration rather than a
+ definition
+
+* *linkage*: An encoding of the `linkage type`_ for this function
+
+* *paramattr*: If nonzero, the 1-based parameter attribute index into the table
+ of `PARAMATTR_CODE_ENTRY`_ entries.
+
+* *alignment*: The logarithm base 2 of the function's requested alignment, plus
+ 1
+
+* *section*: If non-zero, the 1-based section index in the table of
+ `MODULE_CODE_SECTIONNAME`_ entries.
+
+* *visibility*: An encoding of the `visibility`_ of this function
+
+* *gc*: If present and nonzero, the 1-based garbage collector index in the table
+ of `MODULE_CODE_GCNAME`_ entries.
+
+* *unnamed_addr*: If present and non-zero, indicates that the function has
+ ``unnamed_addr``
+
+MODULE_CODE_ALIAS Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[ALIAS, alias type, aliasee val#, linkage, visibility]``
+
+The ``ALIAS`` record (code 9) marks the definition of an alias. The operand
+fields are:
+
+* *alias type*: The type index of the alias
+
+* *aliasee val#*: The value index of the aliased value
+
+* *linkage*: An encoding of the `linkage type`_ for this alias
+
+* *visibility*: If present, an encoding of the `visibility`_ of the alias
+
+MODULE_CODE_PURGEVALS Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[PURGEVALS, numvals]``
+
+The ``PURGEVALS`` record (code 10) resets the module-level value list to the
+size given by the single operand value. Module-level value list items are added
+by ``GLOBALVAR``, ``FUNCTION``, and ``ALIAS`` records. After a ``PURGEVALS``
+record is seen, new value indices will start from the given *numvals* value.
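+
+One way to picture this is the Python sketch below, in which the module-level
+value list is an ordinary list. The class ``ModuleValueList`` is hypothetical,
+and treating ``PURGEVALS`` as a truncation to the first *numvals* entries is
+this example's reading of "resets ... to the size given".
+
```python
class ModuleValueList:
    """Toy model of the module-level value list."""

    def __init__(self):
        self.values = []

    def add(self, record_kind):
        """Add a value for a GLOBALVAR, FUNCTION, or ALIAS; return its index."""
        self.values.append(record_kind)
        return len(self.values) - 1

    def purge(self, numvals):
        """Apply PURGEVALS: keep only the first `numvals` entries."""
        del self.values[numvals:]


vals = ModuleValueList()
vals.add("GLOBALVAR")
vals.add("FUNCTION")
vals.add("ALIAS")
vals.purge(1)
next_index = vals.add("FUNCTION")  # new indices start from numvals
print(next_index, vals.values)
```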
+
+.. _MODULE_CODE_GCNAME:
+
+MODULE_CODE_GCNAME Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[GCNAME, ...string...]``
+
+The ``GCNAME`` record (code 11) contains a variable number of values
+representing the bytes of a single garbage collector name string. There should
+be one ``GCNAME`` record for each garbage collector name referenced in function
+``gc`` attributes within the module. These records can be referenced by 1-based
+index in the *gc* fields of ``FUNCTION`` records.
+
+.. _PARAMATTR_BLOCK:
+
+PARAMATTR_BLOCK Contents
+------------------------
+
+The ``PARAMATTR_BLOCK`` block (id 9) contains a table of entries describing the
+attributes of function parameters. These entries are referenced by 1-based index
+in the *paramattr* field of module block `FUNCTION`_ records, or within the
+*attr* field of function block ``INST_INVOKE`` and ``INST_CALL`` records.
+
+Entries within ``PARAMATTR_BLOCK`` are constructed to ensure that each is unique
+(i.e., no two indices represent equivalent attribute lists).
+
+.. _PARAMATTR_CODE_ENTRY:
+
+PARAMATTR_CODE_ENTRY Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[ENTRY, paramidx0, attr0, paramidx1, attr1...]``
+
+The ``ENTRY`` record (code 1) contains an even number of values describing a
+unique set of function parameter attributes. Each *paramidx* value indicates
+which set of attributes is represented, with 0 representing the return value
+attributes, 0xFFFFFFFF representing function attributes, and other values
+representing 1-based function parameters. Each *attr* value is a bitmap with the
+following interpretation:
+
+* bit 0: ``zeroext``
+* bit 1: ``signext``
+* bit 2: ``noreturn``
+* bit 3: ``inreg``
+* bit 4: ``sret``
+* bit 5: ``nounwind``
+* bit 6: ``noalias``
+* bit 7: ``byval``
+* bit 8: ``nest``
+* bit 9: ``readnone``
+* bit 10: ``readonly``
+* bit 11: ``noinline``
+* bit 12: ``alwaysinline``
+* bit 13: ``optsize``
+* bit 14: ``ssp``
+* bit 15: ``sspreq``
+* bits 16-31: ``align n``
+* bit 32: ``nocapture``
+* bit 33: ``noredzone``
+* bit 34: ``noimplicitfloat``
+* bit 35: ``naked``
+* bit 36: ``inlinehint``
+* bits 37-39: ``alignstack n``, represented as the logarithm
+ base 2 of the requested alignment, plus 1
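+
+A decoder for this bitmap might look like the following Python sketch. The
+names ``decode_attr`` and ``decode_entry`` are hypothetical. Because the list
+above does not state how ``align n`` is stored in bits 16-31, the sketch
+returns that field undecoded.
+
```python
FLAG_BITS = {
    0: "zeroext", 1: "signext", 2: "noreturn", 3: "inreg", 4: "sret",
    5: "nounwind", 6: "noalias", 7: "byval", 8: "nest", 9: "readnone",
    10: "readonly", 11: "noinline", 12: "alwaysinline", 13: "optsize",
    14: "ssp", 15: "sspreq", 32: "nocapture", 33: "noredzone",
    34: "noimplicitfloat", 35: "naked", 36: "inlinehint",
}


def decode_attr(attr):
    """Return (flag names, raw align field, alignstack in bytes or 0)."""
    flags = [name for bit, name in sorted(FLAG_BITS.items())
             if (attr >> bit) & 1]
    align_field = (attr >> 16) & 0xFFFF      # bits 16-31, left undecoded
    alignstack_field = (attr >> 37) & 0x7    # bits 37-39: log2(alignment) + 1
    alignstack = 1 << (alignstack_field - 1) if alignstack_field else 0
    return flags, align_field, alignstack


def decode_entry(values):
    """Split [paramidx0, attr0, paramidx1, attr1, ...] into a dict."""
    if len(values) % 2:
        raise ValueError("ENTRY record must hold an even number of values")
    return {values[i]: decode_attr(values[i + 1])
            for i in range(0, len(values), 2)}


entry = decode_entry([
    0, 1 << 1,                       # return value: signext
    1, (1 << 6) | (1 << 32),         # parameter 1: noalias, nocapture
    0xFFFFFFFF, (1 << 5) | (3 << 37) # function: nounwind, alignstack field 3
])
print(entry)
```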
+
+.. _TYPE_BLOCK:
+
+TYPE_BLOCK Contents
+-------------------
+
+The ``TYPE_BLOCK`` block (id 10) contains records which constitute a table of
+type operator entries used to represent types referenced within an LLVM
+module. Each record (with the exception of `NUMENTRY`_) generates a single type
+table entry, which may be referenced by 0-based index from instructions,
+constants, metadata, type symbol table entries, or other type operator records.
+
+Entries within ``TYPE_BLOCK`` are constructed to ensure that each entry is
+unique (i.e., no two indices represent structurally equivalent types).
+
+.. _TYPE_CODE_NUMENTRY:
+.. _NUMENTRY:
+
+TYPE_CODE_NUMENTRY Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[NUMENTRY, numentries]``
+
+The ``NUMENTRY`` record (code 1) contains a single value which indicates the
+total number of type code entries in the type table of the module. If present,
+``NUMENTRY`` should be the first record in the block.
+
+TYPE_CODE_VOID Record
+^^^^^^^^^^^^^^^^^^^^^
+
+``[VOID]``
+
+The ``VOID`` record (code 2) adds a ``void`` type to the type table.
+
+TYPE_CODE_HALF Record
+^^^^^^^^^^^^^^^^^^^^^
+
+``[HALF]``
+
+The ``HALF`` record (code 10) adds a ``half`` (16-bit floating point) type to
+the type table.
+
+TYPE_CODE_FLOAT Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[FLOAT]``
+
+The ``FLOAT`` record (code 3) adds a ``float`` (32-bit floating point) type to
+the type table.
+
+TYPE_CODE_DOUBLE Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DOUBLE]``
+
+The ``DOUBLE`` record (code 4) adds a ``double`` (64-bit floating point) type to
+the type table.
+
+TYPE_CODE_LABEL Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[LABEL]``
+
+The ``LABEL`` record (code 5) adds a ``label`` type to the type table.
+
+TYPE_CODE_OPAQUE Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[OPAQUE]``
+
+The ``OPAQUE`` record (code 6) adds an ``opaque`` type to the type table. Note
+that distinct ``opaque`` types are not unified.
+
+TYPE_CODE_INTEGER Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[INTEGER, width]``
+
+The ``INTEGER`` record (code 7) adds an integer type to the type table. The
+single *width* field indicates the width of the integer type.
+
+TYPE_CODE_POINTER Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[POINTER, pointee type, address space]``
+
+The ``POINTER`` record (code 8) adds a pointer type to the type table. The
+operand fields are:
+
+* *pointee type*: The type index of the pointed-to type
+
+* *address space*: If supplied, the target-specific numbered address space where
+ the pointed-to object resides. Otherwise, the default address space is zero.
+
+TYPE_CODE_FUNCTION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[FUNCTION, vararg, ignored, retty, ...paramty... ]``
+
+The ``FUNCTION`` record (code 9) adds a function type to the type table. The
+operand fields are:
+
+* *vararg*: Non-zero if the type represents a varargs function
+
+* *ignored*: This value field is present for backward compatibility only, and is
+ ignored
+
+* *retty*: The type index of the function's return type
+
+* *paramty*: Zero or more type indices representing the parameter types of the
+ function
+
+TYPE_CODE_STRUCT Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[STRUCT, ispacked, ...eltty...]``
+
+The ``STRUCT`` record (code 10) adds a struct type to the type table. The
+operand fields are:
+
+* *ispacked*: Non-zero if the type represents a packed structure
+
+* *eltty*: Zero or more type indices representing the element types of the
+ structure
+
+TYPE_CODE_ARRAY Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[ARRAY, numelts, eltty]``
+
+The ``ARRAY`` record (code 11) adds an array type to the type table. The
+operand fields are:
+
+* *numelts*: The number of elements in arrays of this type
+
+* *eltty*: The type index of the array element type
+
+TYPE_CODE_VECTOR Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[VECTOR, numelts, eltty]``
+
+The ``VECTOR`` record (code 12) adds a vector type to the type table. The
+operand fields are:
+
+* *numelts*: The number of elements in vectors of this type
+
+* *eltty*: The type index of the vector element type
+
+TYPE_CODE_X86_FP80 Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[X86_FP80]``
+
+The ``X86_FP80`` record (code 13) adds an ``x86_fp80`` (80-bit floating point)
+type to the type table.
+
+TYPE_CODE_FP128 Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[FP128]``
+
+The ``FP128`` record (code 14) adds an ``fp128`` (128-bit floating point) type
+to the type table.
+
+TYPE_CODE_PPC_FP128 Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[PPC_FP128]``
+
+The ``PPC_FP128`` record (code 15) adds a ``ppc_fp128`` (128-bit floating point)
+type to the type table.
+
+TYPE_CODE_METADATA Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[METADATA]``
+
+The ``METADATA`` record (code 16) adds a ``metadata`` type to the type table.
+
+.. _CONSTANTS_BLOCK:
+
+CONSTANTS_BLOCK Contents
+------------------------
+
+The ``CONSTANTS_BLOCK`` block (id 11) ...
+
+.. _FUNCTION_BLOCK:
+
+FUNCTION_BLOCK Contents
+-----------------------
+
+The ``FUNCTION_BLOCK`` block (id 12) ...
+
+In addition to the record types described below, a ``FUNCTION_BLOCK`` block may
+contain the following sub-blocks:
+
+* `CONSTANTS_BLOCK`_
+* `VALUE_SYMTAB_BLOCK`_
+* `METADATA_ATTACHMENT`_
+
+.. _TYPE_SYMTAB_BLOCK:
+
+TYPE_SYMTAB_BLOCK Contents
+--------------------------
+
+The ``TYPE_SYMTAB_BLOCK`` block (id 13) contains entries which map between
+module-level named types and their corresponding type indices.
+
+.. _TST_CODE_ENTRY:
+
+TST_CODE_ENTRY Record
+^^^^^^^^^^^^^^^^^^^^^
+
+``[ENTRY, typeid, ...string...]``
+
+The ``ENTRY`` record (code 1) contains a variable number of values, with the
+first giving the type index of the designated type, and the remaining values
+giving the character codes of the type name. Each entry corresponds to a single
+named type.
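+
+For instance, the operands of such a record could be split as in this short
+Python sketch. The function name ``decode_tst_entry`` is hypothetical, and the
+sketch assumes each character code fits in a single byte.
+
```python
def decode_tst_entry(ops):
    """Split the operands of a TST_CODE_ENTRY record into (type index, name)."""
    type_index, *chars = ops
    return type_index, "".join(chr(c) for c in chars)


tst_entry = decode_tst_entry([7] + [ord(c) for c in "struct.foo"])
print(tst_entry)
```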
+
+.. _VALUE_SYMTAB_BLOCK:
+
+VALUE_SYMTAB_BLOCK Contents
+---------------------------
+
+The ``VALUE_SYMTAB_BLOCK`` block (id 14) ...
+
+.. _METADATA_BLOCK:
+
+METADATA_BLOCK Contents
+-----------------------
+
+The ``METADATA_BLOCK`` block (id 15) ...
+
+.. _METADATA_ATTACHMENT:
+
+METADATA_ATTACHMENT Contents
+----------------------------
+
+The ``METADATA_ATTACHMENT`` block (id 16) ...
diff --git a/docs/subsystems.rst b/docs/subsystems.rst
index 27dff6b..c4c3b6d 100644
--- a/docs/subsystems.rst
+++ b/docs/subsystems.rst
@@ -7,6 +7,7 @@ Subsystem Documentation
:hidden:
AliasAnalysis
+ BitCodeFormat
BranchWeightMetadata
Bugpoint
ExceptionHandling
@@ -58,7 +59,7 @@ Subsystem Documentation
Automatic bug finder and test-case reducer description and usage
information.
-* `LLVM Bitcode File Format <BitCodeFormat.html>`_
+* :ref:`bitcode_format`
This describes the file format and encoding used for LLVM "bc" files.