author | Bill Wendling <isanbard@gmail.com> | 2012-06-28 08:43:12 +0000 |
---|---|---|
committer | Bill Wendling <isanbard@gmail.com> | 2012-06-28 08:43:12 +0000 |
commit | 0ca9927a71fed311eea7459b4c85c98cc7ed0352 (patch) | |
tree | 4e950553c2a69c6db06092a5dc69138c0e2e6378 /docs | |
parent | 87dc7a4c8d21bd638465881f3b6d091f22d9767c (diff) | |
Sphinxify the bitcode format document.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159340 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/BitCodeFormat.html | 1490 |
-rw-r--r-- | docs/BitCodeFormat.rst | 1045 |
-rw-r--r-- | docs/subsystems.rst | 3 |
3 files changed, 1047 insertions, 1491 deletions
diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html deleted file mode 100644 index 6a670f5..0000000 --- a/docs/BitCodeFormat.html +++ /dev/null @@ -1,1490 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>LLVM Bitcode File Format</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> -<h1> LLVM Bitcode File Format</h1> -<ol> - <li><a href="#abstract">Abstract</a></li> - <li><a href="#overview">Overview</a></li> - <li><a href="#bitstream">Bitstream Format</a> - <ol> - <li><a href="#magic">Magic Numbers</a></li> - <li><a href="#primitives">Primitives</a></li> - <li><a href="#abbrevid">Abbreviation IDs</a></li> - <li><a href="#blocks">Blocks</a></li> - <li><a href="#datarecord">Data Records</a></li> - <li><a href="#abbreviations">Abbreviations</a></li> - <li><a href="#stdblocks">Standard Blocks</a></li> - </ol> - </li> - <li><a href="#wrapper">Bitcode Wrapper Format</a> - </li> - <li><a href="#llvmir">LLVM IR Encoding</a> - <ol> - <li><a href="#basics">Basics</a></li> - <li><a href="#MODULE_BLOCK">MODULE_BLOCK Contents</a></li> - <li><a href="#PARAMATTR_BLOCK">PARAMATTR_BLOCK Contents</a></li> - <li><a href="#TYPE_BLOCK">TYPE_BLOCK Contents</a></li> - <li><a href="#CONSTANTS_BLOCK">CONSTANTS_BLOCK Contents</a></li> - <li><a href="#FUNCTION_BLOCK">FUNCTION_BLOCK Contents</a></li> - <li><a href="#TYPE_SYMTAB_BLOCK">TYPE_SYMTAB_BLOCK Contents</a></li> - <li><a href="#VALUE_SYMTAB_BLOCK">VALUE_SYMTAB_BLOCK Contents</a></li> - <li><a href="#METADATA_BLOCK">METADATA_BLOCK Contents</a></li> - <li><a href="#METADATA_ATTACHMENT">METADATA_ATTACHMENT Contents</a></li> - </ol> - </li> -</ol> -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>, - <a href="http://www.reverberate.org">Joshua Haberman</a>, - and <a href="mailto:housel@acm.org">Peter S. 
Housel</a>. -</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="abstract">Abstract</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document describes the LLVM bitstream file format and the encoding of -the LLVM IR into it.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="overview">Overview</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -What is commonly known as the LLVM bitcode file format (also, sometimes -anachronistically known as bytecode) is actually two things: a <a -href="#bitstream">bitstream container format</a> -and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p> - -<p> -The bitstream format is an abstract encoding of structured data, very -similar to XML in some ways. Like XML, bitstream files contain tags, and nested -structures, and you can parse the file without having to understand the tags. -Unlike XML, the bitstream format is a binary encoding, and unlike XML it -provides a mechanism for the file to self-describe "abbreviations", which are -effectively size optimizations for the content.</p> - -<p>LLVM IR files may be optionally embedded into a <a -href="#wrapper">wrapper</a> structure that makes it easy to embed extra data -along with LLVM IR files.</p> - -<p>This document first describes the LLVM bitstream format, describes the -wrapper format, then describes the record structure used by LLVM IR files. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="bitstream">Bitstream Format</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The bitstream format is literally a stream of bits, with a very simple -structure. 
This structure consists of the following concepts: -</p> - -<ul> -<li>A "<a href="#magic">magic number</a>" that identifies the contents of - the stream.</li> -<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate - integers.</li> -<li><a href="#blocks">Blocks</a>, which define nested content.</li> -<li><a href="#datarecord">Data Records</a>, which describe entities within the - file.</li> -<li>Abbreviations, which specify compression optimizations for the file.</li> -</ul> - -<p>Note that the <a -href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be -used to dump and inspect arbitrary bitstreams, which is very useful for -understanding the encoding.</p> - -<!-- ======================================================================= --> -<h3> - <a name="magic">Magic Numbers</a> -</h3> - -<div> - -<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43). -The second two bytes are an application-specific magic number. Generic -bitcode tools can look at only the first two bytes to verify the file is -bitcode, while application-specific programs will want to look at all four.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="primitives">Primitives</a> -</h3> - -<div> - -<p> -A bitstream literally consists of a stream of bits, which are read in order -starting with the least significant bit of each byte. The stream is made up of a -number of primitive values that encode a stream of unsigned integer values. -These integers are encoded in two ways: either as <a href="#fixedwidth">Fixed -Width Integers</a> or as <a href="#variablewidth">Variable Width -Integers</a>. -</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="fixedwidth">Fixed Width Integers</a> -</h4> - -<div> - -<p>Fixed-width integer values have their low bits emitted directly to the file. - For example, a 3-bit integer value encodes 1 as 001. 
Fixed width integers - are used when there are a well-known number of options for a field. For - example, boolean values are usually encoded with a 1-bit wide integer. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="variablewidth">Variable Width Integers</a> -</h4> - -<div> - -<p>Variable-width integer (VBR) values encode values of arbitrary size, -optimizing for the case where the values are small. Given a 4-bit VBR field, -any 3-bit value (0 through 7) is encoded directly, with the high bit set to -zero. Values larger than N-1 bits emit their bits in a series of N-1 bit -chunks, where all but the last set the high bit.</p> - -<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a -vbr4 value. The first set of four bits indicates the value 3 (011) with a -continuation piece (indicated by a high bit of 1). The next word indicates a -value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value -27. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="char6">6-bit characters</a></h4> - -<div> - -<p>6-bit characters encode common characters into a fixed 6-bit field. They -represent the following characters with the following 6-bit values:</p> - -<div class="doc_code"> -<pre> -'a' .. 'z' — 0 .. 25 -'A' .. 'Z' — 26 .. 51 -'0' .. '9' — 52 .. 61 - '.' — 62 - '_' — 63 -</pre> -</div> - -<p>This encoding is only suitable for encoding characters and strings that -consist only of the above characters. It is completely incapable of encoding -characters not in the set.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="wordalign">Word Alignment</a></h4> - -<div> - -<p>Occasionally, it is useful to emit zero bits until the bitstream is a -multiple of 32 bits. 
This ensures that the bit position in the stream can be -represented as a multiple of 32-bit words.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="abbrevid">Abbreviation IDs</a> -</h3> - -<div> - -<p> -A bitstream is a sequential series of <a href="#blocks">Blocks</a> and -<a href="#datarecord">Data Records</a>. Both of these start with an -abbreviation ID encoded as a fixed-bitwidth field. The width is specified by -the current block, as described below. The value of the abbreviation ID -specifies either a builtin ID (which have special meanings, defined below) or -one of the abbreviation IDs defined for the current block by the stream itself. -</p> - -<p> -The set of builtin abbrev IDs is: -</p> - -<ul> -<li><tt>0 - <a href="#END_BLOCK">END_BLOCK</a></tt> — This abbrev ID marks - the end of the current block.</li> -<li><tt>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a></tt> — This - abbrev ID marks the beginning of a new block.</li> -<li><tt>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt> — This defines - a new abbreviation.</li> -<li><tt>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> — This ID - specifies the definition of an unabbreviated record.</li> -</ul> - -<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify -an <a href="#abbrev_records">abbreviated record encoding</a>.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="blocks">Blocks</a> -</h3> - -<div> - -<p> -Blocks in a bitstream denote nested regions of the stream, and are identified by -a content-specific id number (for example, LLVM IR uses an ID of 12 to represent -function bodies). Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a> -whose meaning is defined by Bitcode; block IDs 8 and greater are -application specific. 
Nested blocks capture the hierarchical structure of the data -encoded in it, and various properties are associated with blocks as the file is -parsed. Block definitions allow the reader to efficiently skip blocks -in constant time if the reader wants a summary of blocks, or if it wants to -efficiently skip data it does not understand. The LLVM IR reader uses this -mechanism to skip function bodies, lazily reading them on demand. -</p> - -<p> -When reading and encoding the stream, several properties are maintained for the -block. In particular, each block maintains: -</p> - -<ol> -<li>A current abbrev id width. This value starts at 2 at the beginning of - the stream, and is set every time a - block record is entered. The block entry specifies the abbrev id width for - the body of the block.</li> - -<li>A set of abbreviations. Abbreviations may be defined within a block, in - which case they are only defined in that block (neither subblocks nor - enclosing blocks see the abbreviation). Abbreviations can also be defined - inside a <tt><a href="#BLOCKINFO">BLOCKINFO</a></tt> block, in which case - they are defined in all blocks that match the ID that the BLOCKINFO block is - describing. -</li> -</ol> - -<p> -As sub blocks are entered, these properties are saved and the new sub-block has -its own set of abbreviations, and its own abbrev id width. When a sub-block is -popped, the saved values are restored. -</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK Encoding</a></h4> - -<div> - -<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>, - <align32bits>, blocklen<sub>32</sub>]</tt></p> - -<p> -The <tt>ENTER_SUBBLOCK</tt> abbreviation ID specifies the start of a new block -record. 
The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and -indicates the type of block being entered, which can be -a <a href="#stdblocks">standard block</a> or an application-specific block. -The <tt>newabbrevlen</tt> value is a 4-bit VBR, which specifies the abbrev id -width for the sub-block. The <tt>blocklen</tt> value is a 32-bit aligned value -that specifies the size of the subblock in 32-bit words. This value allows the -reader to skip over the entire block in one jump. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="END_BLOCK">END_BLOCK Encoding</a></h4> - -<div> - -<p><tt>[END_BLOCK, <align32bits>]</tt></p> - -<p> -The <tt>END_BLOCK</tt> abbreviation ID specifies the end of the current block -record. Its end is aligned to 32-bits to ensure that the size of the block is -an even multiple of 32-bits. -</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="datarecord">Data Records</a> -</h3> - -<div> -<p> -Data records consist of a record code and a number of (up to) 64-bit -integer values. The interpretation of the code and values is -application specific and may vary between different block types. -Records can be encoded either using an unabbrev record, or with an -abbreviation. In the LLVM IR format, for example, there is a record -which encodes the target triple of a module. The code is -<tt>MODULE_CODE_TRIPLE</tt>, and the values of the record are the -ASCII codes for the characters in the string. 
-</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="UNABBREV_RECORD">UNABBREV_RECORD Encoding</a></h4> - -<div> - -<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>, - op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p> - -<p> -An <tt>UNABBREV_RECORD</tt> provides a default fallback encoding, which is both -completely general and extremely inefficient. It can describe an arbitrary -record by emitting the code and operands as VBRs. -</p> - -<p> -For example, emitting an LLVM IR target triple as an unabbreviated record -requires emitting the <tt>UNABBREV_RECORD</tt> abbrevid, a vbr6 for the -<tt>MODULE_CODE_TRIPLE</tt> code, a vbr6 for the length of the string, which is -equal to the number of operands, and a vbr6 for each character. Because there -are no letters with values less than 32, each letter would need to be emitted as -at least a two-part VBR, which means that each letter would require at least 12 -bits. This is not an efficient encoding, but it is fully general. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="abbrev_records">Abbreviated Record Encoding</a></h4> - -<div> - -<p><tt>[<abbrevid>, fields...]</tt></p> - -<p> -An abbreviated record is a abbreviation id followed by a set of fields that are -encoded according to the <a href="#abbreviations">abbreviation definition</a>. -This allows records to be encoded significantly more densely than records -encoded with the <tt><a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> type, -and allows the abbreviation types to be specified in the stream itself, which -allows the files to be completely self describing. The actual encoding of -abbreviations is defined below. 
-</p> - -<p>The record code, which is the first field of an abbreviated record, -may be encoded in the abbreviation definition (as a literal -operand) or supplied in the abbreviated record (as a Fixed or VBR -operand value).</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="abbreviations">Abbreviations</a> -</h3> - -<div> -<p> -Abbreviations are an important form of compression for bitstreams. The idea is -to specify a dense encoding for a class of records once, then use that encoding -to emit many records. It takes space to emit the encoding into the file, but -the space is recouped (hopefully plus some) when the records that use it are -emitted. -</p> - -<p> -Abbreviations can be determined dynamically per client, per file. Because the -abbreviations are stored in the bitstream itself, different streams of the same -format can contain different sets of abbreviations according to the needs -of the specific stream. -As a concrete example, LLVM IR files usually emit an abbreviation -for binary operators. If a specific LLVM module contained no or few binary -operators, the abbreviation does not need to be emitted. -</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="DEFINE_ABBREV">DEFINE_ABBREV Encoding</a></h4> - -<div> - -<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1, - ...]</tt></p> - -<p> -A <tt>DEFINE_ABBREV</tt> record adds an abbreviation to the list of currently -defined abbreviations in the scope of this block. This definition only exists -inside this immediate block — it is not visible in subblocks or enclosing -blocks. Abbreviations are implicitly assigned IDs sequentially starting from 4 -(the first application-defined abbreviation ID). 
Any abbreviations defined in a -<tt>BLOCKINFO</tt> record for the particular block type -receive IDs first, in order, followed by any -abbreviations defined within the block itself. Abbreviated data records -reference this ID to indicate what abbreviation they are invoking. -</p> - -<p> -An abbreviation definition consists of the <tt>DEFINE_ABBREV</tt> abbrevid -followed by a VBR that specifies the number of abbrev operands, then the abbrev -operands themselves. Abbreviation operands come in three forms. They all start -with a single bit that indicates whether the abbrev operand is a literal operand -(when the bit is 1) or an encoding operand (when the bit is 0). -</p> - -<ol> -<li>Literal operands — <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -— Literal operands specify that the value in the result is always a single -specific value. This specific value is emitted as a vbr8 after the bit -indicating that it is a literal operand.</li> -<li>Encoding info without data — <tt>[0<sub>1</sub>, - encoding<sub>3</sub>]</tt> — Operand encodings that do not have extra - data are just emitted as their code. -</li> -<li>Encoding info with data — <tt>[0<sub>1</sub>, encoding<sub>3</sub>, -value<sub>vbr5</sub>]</tt> — Operand encodings that do have extra data are -emitted as their code, followed by the extra data. -</li> -</ol> - -<p>The possible operand encodings are:</p> - -<ul> -<li>Fixed (code 1): The field should be emitted as - a <a href="#fixedwidth">fixed-width value</a>, whose width is specified by - the operand's extra data.</li> -<li>VBR (code 2): The field should be emitted as - a <a href="#variablewidth">variable-width value</a>, whose width is - specified by the operand's extra data.</li> -<li>Array (code 3): This field is an array of values. The array operand - has no extra data, but expects another operand to follow it, indicating - the element type of the array. 
When reading an array in an abbreviated - record, the first integer is a vbr6 that indicates the array length, - followed by the encoded elements of the array. An array may only occur as - the last operand of an abbreviation (except for the one final operand that - gives the array's type).</li> -<li>Char6 (code 4): This field should be emitted as - a <a href="#char6">char6-encoded value</a>. This operand type takes no - extra data. Char6 encoding is normally used as an array element type. - </li> -<li>Blob (code 5): This field is emitted as a vbr6, followed by padding to a - 32-bit boundary (for alignment) and an array of 8-bit objects. The array of - bytes is further followed by tail padding to ensure that its total length is - a multiple of 4 bytes. This makes it very efficient for the reader to - decode the data without having to make a copy of it: it can use a pointer to - the data in the mapped in file and poke directly at it. A blob may only - occur as the last operand of an abbreviation.</li> -</ul> - -<p> -For example, target triples in LLVM modules are encoded as a record of the -form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>. 
Consider if the bitstream emitted -the following abbrev entry: -</p> - -<div class="doc_code"> -<pre> -[0, Fixed, 4] -[0, Array] -[0, Char6] -</pre> -</div> - -<p> -When emitting a record with this abbreviation, the above entry would be emitted -as: -</p> - -<div class="doc_code"> -<p> -<tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>, 0<sub>6</sub>, -1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt> -</p> -</div> - -<p>These values are:</p> - -<ol> -<li>The first value, 4, is the abbreviation ID for this abbreviation.</li> -<li>The second value, 2, is the record code for <tt>TRIPLE</tt> records within LLVM IR file <tt>MODULE_BLOCK</tt> blocks.</li> -<li>The third value, 4, is the length of the array.</li> -<li>The rest of the values are the char6 encoded values - for <tt>"abcd"</tt>.</li> -</ol> - -<p> -With this abbreviation, the triple is emitted with only 37 bits (assuming a -abbrev id width of 3). Without the abbreviation, significantly more space would -be required to emit the target triple. Also, because the <tt>TRIPLE</tt> value -is not emitted as a literal in the abbreviation, the abbreviation can also be -used for any other string value. -</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="stdblocks">Standard Blocks</a> -</h3> - -<div> - -<p> -In addition to the basic block structure and record encodings, the bitstream -also defines specific built-in block types. These block types specify how the -stream is to be decoded or other metadata. In the future, new standard blocks -may be added. Block IDs 0-7 are reserved for standard blocks. -</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="BLOCKINFO">#0 - BLOCKINFO Block</a></h4> - -<div> - -<p> -The <tt>BLOCKINFO</tt> block allows the description of metadata for other -blocks. 
The currently specified records are: -</p> - -<div class="doc_code"> -<pre> -[SETBID (#1), blockid] -[DEFINE_ABBREV, ...] -[BLOCKNAME, ...name...] -[SETRECORDNAME, RecordID, ...name...] -</pre> -</div> - -<p> -The <tt>SETBID</tt> record (code 1) indicates which block ID is being -described. <tt>SETBID</tt> records can occur multiple times throughout the -block to change which block ID is being described. There must be -a <tt>SETBID</tt> record prior to any other records. -</p> - -<p> -Standard <tt>DEFINE_ABBREV</tt> records can occur inside <tt>BLOCKINFO</tt> -blocks, but unlike their occurrence in normal blocks, the abbreviation is -defined for blocks matching the block ID we are describing, <i>not</i> the -<tt>BLOCKINFO</tt> block itself. The abbreviations defined -in <tt>BLOCKINFO</tt> blocks receive abbreviation IDs as described -in <tt><a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt>. -</p> - -<p>The <tt>BLOCKNAME</tt> record (code 2) can optionally occur in this block. The elements of -the record are the bytes of the string name of the block. llvm-bcanalyzer can use -this to dump out bitcode files symbolically.</p> - -<p>The <tt>SETRECORDNAME</tt> record (code 3) can also optionally occur in this block. The -first operand value is a record ID number, and the rest of the elements of the record are -the bytes for the string name of the record. llvm-bcanalyzer can use -this to dump out bitcode files symbolically.</p> - -<p> -Note that although the data in <tt>BLOCKINFO</tt> blocks is described as -"metadata," the abbreviations they contain are essential for parsing records -from the corresponding blocks. It is not safe to skip them. 
-</p> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="wrapper">Bitcode Wrapper Format</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper -structure. This structure contains a simple header that indicates the offset -and size of the embedded BC file. This allows additional information to be -stored alongside the BC file. The structure of this file header is: -</p> - -<div class="doc_code"> -<p> -<tt>[Magic<sub>32</sub>, Version<sub>32</sub>, Offset<sub>32</sub>, -Size<sub>32</sub>, CPUType<sub>32</sub>]</tt> -</p> -</div> - -<p> -Each of the fields are 32-bit fields stored in little endian form (as with -the rest of the bitcode file fields). The Magic number is always -<tt>0x0B17C0DE</tt> and the version is currently always <tt>0</tt>. The Offset -field is the offset in bytes to the start of the bitcode stream in the file, and -the Size field is the size in bytes of the stream. CPUType is a target-specific -value that can be used to encode the CPU of the target. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="llvmir">LLVM IR Encoding</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -LLVM IR is encoded into a bitstream by defining blocks and records. It uses -blocks for things like constant pools, functions, symbol tables, etc. It uses -records for things like instructions, global variable descriptors, type -descriptions, etc. This document does not describe the set of abbreviations -that the writer uses, as these are fully self-described in the file, and the -reader is not allowed to build in any knowledge of this. 
-</p> - -<!-- ======================================================================= --> -<h3> - <a name="basics">Basics</a> -</h3> - -<div> - -<!-- _______________________________________________________________________ --> -<h4><a name="ir_magic">LLVM IR Magic Number</a></h4> - -<div> - -<p> -The magic number for LLVM IR files is: -</p> - -<div class="doc_code"> -<p> -<tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt> -</p> -</div> - -<p> -When combined with the bitcode magic number and viewed as bytes, this is -<tt>"BC 0xC0DE"</tt>. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="ir_signed_vbr">Signed VBRs</a></h4> - -<div> - -<p> -<a href="#variablewidth">Variable Width Integer</a> encoding is an efficient way to -encode arbitrary sized unsigned values, but is an extremely inefficient for -encoding signed values, as signed values are otherwise treated as maximally large -unsigned values. -</p> - -<p> -As such, signed VBR values of a specific width are emitted as follows: -</p> - -<ul> -<li>Positive values are emitted as VBRs of the specified width, but with their - value shifted left by one.</li> -<li>Negative values are emitted as VBRs of the specified width, but the negated - value is shifted left by one, and the low bit is set.</li> -</ul> - -<p> -With this encoding, small positive and small negative values can both -be emitted efficiently. Signed VBR encoding is used in -<tt>CST_CODE_INTEGER</tt> and <tt>CST_CODE_WIDE_INTEGER</tt> records -within <tt>CONSTANTS_BLOCK</tt> blocks. 
-</p> - -</div> - - -<!-- _______________________________________________________________________ --> -<h4><a name="ir_blocks">LLVM IR Blocks</a></h4> - -<div> - -<p> -LLVM IR is defined with the following blocks: -</p> - -<ul> -<li>8 — <a href="#MODULE_BLOCK"><tt>MODULE_BLOCK</tt></a> — This is the top-level block that - contains the entire module, and describes a variety of per-module - information.</li> -<li>9 — <a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a> — This enumerates the parameter - attributes.</li> -<li>10 — <a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a> — This describes all of the types in - the module.</li> -<li>11 — <a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a> — This describes constants for a - module or function.</li> -<li>12 — <a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a> — This describes a function - body.</li> -<li>13 — <a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a> — This describes the type symbol - table.</li> -<li>14 — <a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a> — This describes a value symbol - table.</li> -<li>15 — <a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a> — This describes metadata items.</li> -<li>16 — <a href="#METADATA_ATTACHMENT"><tt>METADATA_ATTACHMENT</tt></a> — This contains records associating metadata with function instruction values.</li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="MODULE_BLOCK">MODULE_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>MODULE_BLOCK</tt> block (id 8) is the top-level block for LLVM -bitcode files, and each bitcode file must contain exactly one. 
In -addition to records (described below) containing information -about the module, a <tt>MODULE_BLOCK</tt> block may contain the -following sub-blocks: -</p> - -<ul> -<li><a href="#BLOCKINFO"><tt>BLOCKINFO</tt></a></li> -<li><a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a></li> -<li><a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a></li> -<li><a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a></li> -<li><a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a></li> -<li><a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a></li> -<li><a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a></li> -<li><a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a></li> -</ul> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_VERSION">MODULE_CODE_VERSION Record</a></h4> - -<div> - -<p><tt>[VERSION, version#]</tt></p> - -<p>The <tt>VERSION</tt> record (code 1) contains a single value -indicating the format version. Only version 0 is supported at this -time.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_TRIPLE">MODULE_CODE_TRIPLE Record</a></h4> - -<div> -<p><tt>[TRIPLE, ...string...]</tt></p> - -<p>The <tt>TRIPLE</tt> record (code 2) contains a variable number of -values representing the bytes of the <tt>target triple</tt> -specification string.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_DATALAYOUT">MODULE_CODE_DATALAYOUT Record</a></h4> - -<div> -<p><tt>[DATALAYOUT, ...string...]</tt></p> - -<p>The <tt>DATALAYOUT</tt> record (code 3) contains a variable number of -values representing the bytes of the <tt>target datalayout</tt> -specification string.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_ASM">MODULE_CODE_ASM Record</a></h4> - -<div> -<p><tt>[ASM, ...string...]</tt></p> - 
-<p>The <tt>ASM</tt> record (code 4) contains a variable number of -values representing the bytes of <tt>module asm</tt> strings, with -individual assembly blocks separated by newline (ASCII 10) characters.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME Record</a></h4> - -<div> -<p><tt>[SECTIONNAME, ...string...]</tt></p> - -<p>The <tt>SECTIONNAME</tt> record (code 5) contains a variable number -of values representing the bytes of a single section name -string. There should be one <tt>SECTIONNAME</tt> record for each -section name referenced (e.g., in global variable or function -<tt>section</tt> attributes) within the module. These records can be -referenced by the 1-based index in the <i>section</i> fields of -<tt>GLOBALVAR</tt> or <tt>FUNCTION</tt> records.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_DEPLIB">MODULE_CODE_DEPLIB Record</a></h4> - -<div> -<p><tt>[DEPLIB, ...string...]</tt></p> - -<p>The <tt>DEPLIB</tt> record (code 6) contains a variable number of -values representing the bytes of a single dependent library name -string, one of the libraries mentioned in a <tt>deplibs</tt> -declaration. There should be one <tt>DEPLIB</tt> record for each -library name referenced.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_GLOBALVAR">MODULE_CODE_GLOBALVAR Record</a></h4> - -<div> -<p><tt>[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr]</tt></p> - -<p>The <tt>GLOBALVAR</tt> record (code 7) marks the declaration or -definition of a global variable. 
The operand fields are:</p> - -<ul> -<li><i>pointer type</i>: The type index of the pointer type used to point to -this global variable</li> - -<li><i>isconst</i>: Non-zero if the variable is treated as constant within -the module, or zero if it is not</li> - -<li><i>initid</i>: If non-zero, the value index of the initializer for this -variable, plus 1.</li> - -<li><a name="linkage"><i>linkage</i></a>: An encoding of the linkage -type for this variable: - <ul> - <li><tt>external</tt>: code 0</li> - <li><tt>weak</tt>: code 1</li> - <li><tt>appending</tt>: code 2</li> - <li><tt>internal</tt>: code 3</li> - <li><tt>linkonce</tt>: code 4</li> - <li><tt>dllimport</tt>: code 5</li> - <li><tt>dllexport</tt>: code 6</li> - <li><tt>extern_weak</tt>: code 7</li> - <li><tt>common</tt>: code 8</li> - <li><tt>private</tt>: code 9</li> - <li><tt>weak_odr</tt>: code 10</li> - <li><tt>linkonce_odr</tt>: code 11</li> - <li><tt>available_externally</tt>: code 12</li> - <li><tt>linker_private</tt>: code 13</li> - </ul> -</li> - -<li><i>alignment</i>: The logarithm base 2 of the variable's requested -alignment, plus 1</li> - -<li><i>section</i>: If non-zero, the 1-based section index in the -table of <a href="#MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME</a> -entries.</li> - -<li><a name="visibility"><i>visibility</i></a>: If present, an -encoding of the visibility of this variable: - <ul> - <li><tt>default</tt>: code 0</li> - <li><tt>hidden</tt>: code 1</li> - <li><tt>protected</tt>: code 2</li> - </ul> -</li> - -<li><i>threadlocal</i>: If present, an encoding of the thread local storage -mode of the variable: - <ul> - <li><tt>not thread local</tt>: code 0</li> - <li><tt>thread local; default TLS model</tt>: code 1</li> - <li><tt>localdynamic</tt>: code 2</li> - <li><tt>initialexec</tt>: code 3</li> - <li><tt>localexec</tt>: code 4</li> - </ul> -</li> - -<li><i>unnamed_addr</i>: If present and non-zero, indicates that the variable -has <tt>unnamed_addr</tt></li> - -</ul> -</div> 
- -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_FUNCTION">MODULE_CODE_FUNCTION Record</a></h4> - -<div> - -<p><tt>[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]</tt></p> - -<p>The <tt>FUNCTION</tt> record (code 8) marks the declaration or -definition of a function. The operand fields are:</p> - -<ul> -<li><i>type</i>: The type index of the function type describing this function</li> - -<li><i>callingconv</i>: The calling convention number: - <ul> - <li><tt>ccc</tt>: code 0</li> - <li><tt>fastcc</tt>: code 8</li> - <li><tt>coldcc</tt>: code 9</li> - <li><tt>x86_stdcallcc</tt>: code 64</li> - <li><tt>x86_fastcallcc</tt>: code 65</li> - <li><tt>arm_apcscc</tt>: code 66</li> - <li><tt>arm_aapcscc</tt>: code 67</li> - <li><tt>arm_aapcs_vfpcc</tt>: code 68</li> - </ul> -</li> - -<li><i>isproto</i>: Non-zero if this entry represents a declaration -rather than a definition</li> - -<li><i>linkage</i>: An encoding of the <a href="#linkage">linkage type</a> -for this function</li> - -<li><i>paramattr</i>: If nonzero, the 1-based parameter attribute index -into the table of <a href="#PARAMATTR_CODE_ENTRY">PARAMATTR_CODE_ENTRY</a> -entries.</li> - -<li><i>alignment</i>: The logarithm base 2 of the function's requested -alignment, plus 1</li> - -<li><i>section</i>: If non-zero, the 1-based section index in the -table of <a href="#MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME</a> -entries.</li> - -<li><i>visibility</i>: An encoding of the <a href="#visibility">visibility</a> - of this function</li> - -<li><i>gc</i>: If present and nonzero, the 1-based garbage collector -index in the table of -<a href="#MODULE_CODE_GCNAME">MODULE_CODE_GCNAME</a> entries.</li> - -<li><i>unnamed_addr</i>: If present and non-zero, indicates that the function -has <tt>unnamed_addr</tt></li> - -</ul> -</div> - -<!-- _______________________________________________________________________ --> 
-<h4><a name="MODULE_CODE_ALIAS">MODULE_CODE_ALIAS Record</a></h4> - -<div> - -<p><tt>[ALIAS, alias type, aliasee val#, linkage, visibility]</tt></p> - -<p>The <tt>ALIAS</tt> record (code 9) marks the definition of an -alias. The operand fields are</p> - -<ul> -<li><i>alias type</i>: The type index of the alias</li> - -<li><i>aliasee val#</i>: The value index of the aliased value</li> - -<li><i>linkage</i>: An encoding of the <a href="#linkage">linkage type</a> -for this alias</li> - -<li><i>visibility</i>: If present, an encoding of the -<a href="#visibility">visibility</a> of the alias</li> - -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_PURGEVALS">MODULE_CODE_PURGEVALS Record</a></h4> - -<div> -<p><tt>[PURGEVALS, numvals]</tt></p> - -<p>The <tt>PURGEVALS</tt> record (code 10) resets the module-level -value list to the size given by the single operand value. Module-level -value list items are added by <tt>GLOBALVAR</tt>, <tt>FUNCTION</tt>, -and <tt>ALIAS</tt> records. After a <tt>PURGEVALS</tt> record is seen, -new value indices will start from the given <i>numvals</i> value.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="MODULE_CODE_GCNAME">MODULE_CODE_GCNAME Record</a></h4> - -<div> -<p><tt>[GCNAME, ...string...]</tt></p> - -<p>The <tt>GCNAME</tt> record (code 11) contains a variable number of -values representing the bytes of a single garbage collector name -string. There should be one <tt>GCNAME</tt> record for each garbage -collector name referenced in function <tt>gc</tt> attributes within -the module. 
These records can be referenced by 1-based index in the <i>gc</i> -fields of <tt>FUNCTION</tt> records.</p> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="PARAMATTR_BLOCK">PARAMATTR_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>PARAMATTR_BLOCK</tt> block (id 9) contains a table of -entries describing the attributes of function parameters. These -entries are referenced by 1-based index in the <i>paramattr</i> field -of module block <a name="MODULE_CODE_FUNCTION"><tt>FUNCTION</tt></a> -records, or within the <i>attr</i> field of function block <a -href="#FUNC_CODE_INST_INVOKE"><tt>INST_INVOKE</tt></a> and <a -href="#FUNC_CODE_INST_CALL"><tt>INST_CALL</tt></a> records.</p> - -<p>Entries within <tt>PARAMATTR_BLOCK</tt> are constructed to ensure -that each is unique (i.e., no two indicies represent equivalent -attribute lists). </p> - -<!-- _______________________________________________________________________ --> -<h4><a name="PARAMATTR_CODE_ENTRY">PARAMATTR_CODE_ENTRY Record</a></h4> - -<div> - -<p><tt>[ENTRY, paramidx0, attr0, paramidx1, attr1...]</tt></p> - -<p>The <tt>ENTRY</tt> record (code 1) contains an even number of -values describing a unique set of function parameter attributes. Each -<i>paramidx</i> value indicates which set of attributes is -represented, with 0 representing the return value attributes, -0xFFFFFFFF representing function attributes, and other values -representing 1-based function parameters. 
Each <i>attr</i> value is a -bitmap with the following interpretation: -</p> - -<ul> -<li>bit 0: <tt>zeroext</tt></li> -<li>bit 1: <tt>signext</tt></li> -<li>bit 2: <tt>noreturn</tt></li> -<li>bit 3: <tt>inreg</tt></li> -<li>bit 4: <tt>sret</tt></li> -<li>bit 5: <tt>nounwind</tt></li> -<li>bit 6: <tt>noalias</tt></li> -<li>bit 7: <tt>byval</tt></li> -<li>bit 8: <tt>nest</tt></li> -<li>bit 9: <tt>readnone</tt></li> -<li>bit 10: <tt>readonly</tt></li> -<li>bit 11: <tt>noinline</tt></li> -<li>bit 12: <tt>alwaysinline</tt></li> -<li>bit 13: <tt>optsize</tt></li> -<li>bit 14: <tt>ssp</tt></li> -<li>bit 15: <tt>sspreq</tt></li> -<li>bits 16–31: <tt>align <var>n</var></tt></li> -<li>bit 32: <tt>nocapture</tt></li> -<li>bit 33: <tt>noredzone</tt></li> -<li>bit 34: <tt>noimplicitfloat</tt></li> -<li>bit 35: <tt>naked</tt></li> -<li>bit 36: <tt>inlinehint</tt></li> -<li>bits 37–39: <tt>alignstack <var>n</var></tt>, represented as -the logarithm base 2 of the requested alignment, plus 1</li> -</ul> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="TYPE_BLOCK">TYPE_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>TYPE_BLOCK</tt> block (id 10) contains records which -constitute a table of type operator entries used to represent types -referenced within an LLVM module. Each record (with the exception of -<a href="#TYPE_CODE_NUMENTRY"><tt>NUMENTRY</tt></a>) generates a -single type table entry, which may be referenced by 0-based index from -instructions, constants, metadata, type symbol table entries, or other -type operator records. -</p> - -<p>Entries within <tt>TYPE_BLOCK</tt> are constructed to ensure that -each entry is unique (i.e., no two indicies represent structurally -equivalent types). 
</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_NUMENTRY">TYPE_CODE_NUMENTRY Record</a></h4> - -<div> - -<p><tt>[NUMENTRY, numentries]</tt></p> - -<p>The <tt>NUMENTRY</tt> record (code 1) contains a single value which -indicates the total number of type code entries in the type table of -the module. If present, <tt>NUMENTRY</tt> should be the first record -in the block. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_VOID">TYPE_CODE_VOID Record</a></h4> - -<div> - -<p><tt>[VOID]</tt></p> - -<p>The <tt>VOID</tt> record (code 2) adds a <tt>void</tt> type to the -type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_HALF">TYPE_CODE_HALF Record</a></h4> - -<div> - -<p><tt>[HALF]</tt></p> - -<p>The <tt>HALF</tt> record (code 10) adds a <tt>half</tt> (16-bit -floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_FLOAT">TYPE_CODE_FLOAT Record</a></h4> - -<div> - -<p><tt>[FLOAT]</tt></p> - -<p>The <tt>FLOAT</tt> record (code 3) adds a <tt>float</tt> (32-bit -floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_DOUBLE">TYPE_CODE_DOUBLE Record</a></h4> - -<div> - -<p><tt>[DOUBLE]</tt></p> - -<p>The <tt>DOUBLE</tt> record (code 4) adds a <tt>double</tt> (64-bit -floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_LABEL">TYPE_CODE_LABEL Record</a></h4> - -<div> - -<p><tt>[LABEL]</tt></p> - -<p>The <tt>LABEL</tt> record (code 5) adds a <tt>label</tt> type to -the type table. 
-</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_OPAQUE">TYPE_CODE_OPAQUE Record</a></h4> - -<div> - -<p><tt>[OPAQUE]</tt></p> - -<p>The <tt>OPAQUE</tt> record (code 6) adds an <tt>opaque</tt> type to -the type table. Note that distinct <tt>opaque</tt> types are not -unified. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_INTEGER">TYPE_CODE_INTEGER Record</a></h4> - -<div> - -<p><tt>[INTEGER, width]</tt></p> - -<p>The <tt>INTEGER</tt> record (code 7) adds an integer type to the -type table. The single <i>width</i> field indicates the width of the -integer type. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_POINTER">TYPE_CODE_POINTER Record</a></h4> - -<div> - -<p><tt>[POINTER, pointee type, address space]</tt></p> - -<p>The <tt>POINTER</tt> record (code 8) adds a pointer type to the -type table. The operand fields are</p> - -<ul> -<li><i>pointee type</i>: The type index of the pointed-to type</li> - -<li><i>address space</i>: If supplied, the target-specific numbered -address space where the pointed-to object resides. Otherwise, the -default address space is zero. -</li> -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_FUNCTION">TYPE_CODE_FUNCTION Record</a></h4> - -<div> - -<p><tt>[FUNCTION, vararg, ignored, retty, ...paramty... ]</tt></p> - -<p>The <tt>FUNCTION</tt> record (code 9) adds a function type to the -type table. 
The operand fields are</p> - -<ul> -<li><i>vararg</i>: Non-zero if the type represents a varargs function</li> - -<li><i>ignored</i>: This value field is present for backward -compatibility only, and is ignored</li> - -<li><i>retty</i>: The type index of the function's return type</li> - -<li><i>paramty</i>: Zero or more type indices representing the -parameter types of the function</li> -</ul> - -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_STRUCT">TYPE_CODE_STRUCT Record</a></h4> - -<div> - -<p><tt>[STRUCT, ispacked, ...eltty...]</tt></p> - -<p>The <tt>STRUCT </tt> record (code 10) adds a struct type to the -type table. The operand fields are</p> - -<ul> -<li><i>ispacked</i>: Non-zero if the type represents a packed structure</li> - -<li><i>eltty</i>: Zero or more type indices representing the element -types of the structure</li> -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_ARRAY">TYPE_CODE_ARRAY Record</a></h4> - -<div> - -<p><tt>[ARRAY, numelts, eltty]</tt></p> - -<p>The <tt>ARRAY</tt> record (code 11) adds an array type to the type -table. The operand fields are</p> - -<ul> -<li><i>numelts</i>: The number of elements in arrays of this type</li> - -<li><i>eltty</i>: The type index of the array element type</li> -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_VECTOR">TYPE_CODE_VECTOR Record</a></h4> - -<div> - -<p><tt>[VECTOR, numelts, eltty]</tt></p> - -<p>The <tt>VECTOR</tt> record (code 12) adds a vector type to the type -table. 
The operand fields are</p> - -<ul> -<li><i>numelts</i>: The number of elements in vectors of this type</li> - -<li><i>eltty</i>: The type index of the vector element type</li> -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_X86_FP80">TYPE_CODE_X86_FP80 Record</a></h4> - -<div> - -<p><tt>[X86_FP80]</tt></p> - -<p>The <tt>X86_FP80</tt> record (code 13) adds an <tt>x86_fp80</tt> (80-bit -floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_FP128">TYPE_CODE_FP128 Record</a></h4> - -<div> - -<p><tt>[FP128]</tt></p> - -<p>The <tt>FP128</tt> record (code 14) adds an <tt>fp128</tt> (128-bit -floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_PPC_FP128">TYPE_CODE_PPC_FP128 Record</a></h4> - -<div> - -<p><tt>[PPC_FP128]</tt></p> - -<p>The <tt>PPC_FP128</tt> record (code 15) adds a <tt>ppc_fp128</tt> -(128-bit floating point) type to the type table. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4><a name="TYPE_CODE_METADATA">TYPE_CODE_METADATA Record</a></h4> - -<div> - -<p><tt>[METADATA]</tt></p> - -<p>The <tt>METADATA</tt> record (code 16) adds a <tt>metadata</tt> -type to the type table. -</p> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="CONSTANTS_BLOCK">CONSTANTS_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>CONSTANTS_BLOCK</tt> block (id 11) ... -</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="FUNCTION_BLOCK">FUNCTION_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>FUNCTION_BLOCK</tt> block (id 12) ... 
-</p> - -<p>In addition to the record types described below, a -<tt>FUNCTION_BLOCK</tt> block may contain the following sub-blocks: -</p> - -<ul> -<li><a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a></li> -<li><a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a></li> -<li><a href="#METADATA_ATTACHMENT"><tt>METADATA_ATTACHMENT</tt></a></li> -</ul> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="TYPE_SYMTAB_BLOCK">TYPE_SYMTAB_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>TYPE_SYMTAB_BLOCK</tt> block (id 13) contains entries which -map between module-level named types and their corresponding type -indices. -</p> - -<!-- _______________________________________________________________________ --> -<h4><a name="TST_CODE_ENTRY">TST_CODE_ENTRY Record</a></h4> - -<div> - -<p><tt>[ENTRY, typeid, ...string...]</tt></p> - -<p>The <tt>ENTRY</tt> record (code 1) contains a variable number of -values, with the first giving the type index of the designated type, -and the remaining values giving the character codes of the type -name. Each entry corresponds to a single named type. -</p> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="VALUE_SYMTAB_BLOCK">VALUE_SYMTAB_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>VALUE_SYMTAB_BLOCK</tt> block (id 14) ... -</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="METADATA_BLOCK">METADATA_BLOCK Contents</a> -</h3> - -<div> - -<p>The <tt>METADATA_BLOCK</tt> block (id 15) ... -</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="METADATA_ATTACHMENT">METADATA_ATTACHMENT Contents</a> -</h3> - -<div> - -<p>The <tt>METADATA_ATTACHMENT</tt> block (id 16) ... 
-</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> -<a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> -<a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> -Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst new file mode 100644 index 0000000..d3995e7 --- /dev/null +++ b/docs/BitCodeFormat.rst @@ -0,0 +1,1045 @@ +.. _bitcode_format: + +.. role:: raw-html(raw) + :format: html + +======================== +LLVM Bitcode File Format +======================== + +.. contents:: + :local: + +Abstract +======== + +This document describes the LLVM bitstream file format and the encoding of the +LLVM IR into it. + +Overview +======== + +What is commonly known as the LLVM bitcode file format (also, sometimes +anachronistically known as bytecode) is actually two things: a `bitstream +container format`_ and an `encoding of LLVM IR`_ into the container format. + +The bitstream format is an abstract encoding of structured data, very similar to +XML in some ways. Like XML, bitstream files contain tags, and nested +structures, and you can parse the file without having to understand the tags. +Unlike XML, the bitstream format is a binary encoding, and unlike XML it +provides a mechanism for the file to self-describe "abbreviations", which are +effectively size optimizations for the content. + +LLVM IR files may be optionally embedded into a `wrapper`_ structure that makes +it easy to embed extra data along with LLVM IR files. 
+ +This document first describes the LLVM bitstream format, describes the wrapper +format, then describes the record structure used by LLVM IR files. + +.. _bitstream container format: + +Bitstream Format +================ + +The bitstream format is literally a stream of bits, with a very simple +structure. This structure consists of the following concepts: + +* A "`magic number`_" that identifies the contents of the stream. + +* Encoding `primitives`_ like variable bit-rate integers. + +* `Blocks`_, which define nested content. + +* `Data Records`_, which describe entities within the file. + +* Abbreviations, which specify compression optimizations for the file. + +Note that the `llvm-bcanalyzer <CommandGuide/html/llvm-bcanalyzer.html>`_ tool +can be used to dump and inspect arbitrary bitstreams, which is very useful for +understanding the encoding. + +.. _magic number: + +Magic Numbers +------------- + +The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``). The second +two bytes are an application-specific magic number. Generic bitcode tools can +look at only the first two bytes to verify the file is bitcode, while +application-specific programs will want to look at all four. + +.. _primitives: + +Primitives +---------- + +A bitstream literally consists of a stream of bits, which are read in order +starting with the least significant bit of each byte. The stream is made up of +a number of primitive values that encode a stream of unsigned integer values. +These integers are encoded in two ways: either as `Fixed Width Integers`_ or as +`Variable Width Integers`_. + +.. _Fixed Width Integers: +.. _fixed-width value: + +Fixed Width Integers +^^^^^^^^^^^^^^^^^^^^ + +Fixed-width integer values have their low bits emitted directly to the file. +For example, a 3-bit integer value encodes 1 as 001. Fixed width integers are +used when there are a well-known number of options for a field. 
For example, +boolean values are usually encoded with a 1-bit wide integer. + +.. _Variable Width Integers: +.. _Variable Width Integer: +.. _variable-width value: + +Variable Width Integers +^^^^^^^^^^^^^^^^^^^^^^^ + +Variable-width integer (VBR) values encode values of arbitrary size, optimizing +for the case where the values are small. Given a 4-bit VBR field, any 3-bit +value (0 through 7) is encoded directly, with the high bit set to zero. Values +larger than N-1 bits emit their bits in a series of N-1 bit chunks, where all +but the last set the high bit. + +For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4 +value. The first set of four bits indicates the value 3 (011) with a +continuation piece (indicated by a high bit of 1). The next word indicates a +value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value +27. + +.. _char6-encoded value: + +6-bit characters +^^^^^^^^^^^^^^^^ + +6-bit characters encode common characters into a fixed 6-bit field. They +represent the following characters with the following 6-bit values: + +:: + + 'a' .. 'z' --- 0 .. 25 + 'A' .. 'Z' --- 26 .. 51 + '0' .. '9' --- 52 .. 61 + '.' --- 62 + '_' --- 63 + +This encoding is only suitable for encoding characters and strings that consist +only of the above characters. It is completely incapable of encoding characters +not in the set. + +Word Alignment +^^^^^^^^^^^^^^ + +Occasionally, it is useful to emit zero bits until the bitstream is a multiple +of 32 bits. This ensures that the bit position in the stream can be represented +as a multiple of 32-bit words. + +Abbreviation IDs +---------------- + +A bitstream is a sequential series of `Blocks`_ and `Data Records`_. Both of +these start with an abbreviation ID encoded as a fixed-bitwidth field. The +width is specified by the current block, as described below. 
The value of the +abbreviation ID specifies either a builtin ID (which have special meanings, +defined below) or one of the abbreviation IDs defined for the current block by +the stream itself. + +The set of builtin abbrev IDs is: + +* 0 - `END_BLOCK`_ --- This abbrev ID marks the end of the current block. + +* 1 - `ENTER_SUBBLOCK`_ --- This abbrev ID marks the beginning of a new + block. + +* 2 - `DEFINE_ABBREV`_ --- This defines a new abbreviation. + +* 3 - `UNABBREV_RECORD`_ --- This ID specifies the definition of an + unabbreviated record. + +Abbreviation IDs 4 and above are defined by the stream itself, and specify an +`abbreviated record encoding`_. + +.. _Blocks: + +Blocks +------ + +Blocks in a bitstream denote nested regions of the stream, and are identified by +a content-specific id number (for example, LLVM IR uses an ID of 12 to represent +function bodies). Block IDs 0-7 are reserved for `standard blocks`_ whose +meaning is defined by Bitcode; block IDs 8 and greater are application +specific. Nested blocks capture the hierarchical structure of the data encoded +in it, and various properties are associated with blocks as the file is parsed. +Block definitions allow the reader to efficiently skip blocks in constant time +if the reader wants a summary of blocks, or if it wants to efficiently skip data +it does not understand. The LLVM IR reader uses this mechanism to skip function +bodies, lazily reading them on demand. + +When reading and encoding the stream, several properties are maintained for the +block. In particular, each block maintains: + +#. A current abbrev id width. This value starts at 2 at the beginning of the + stream, and is set every time a block record is entered. The block entry + specifies the abbrev id width for the body of the block. + +#. A set of abbreviations. Abbreviations may be defined within a block, in + which case they are only defined in that block (neither subblocks nor + enclosing blocks see the abbreviation). 
Abbreviations can also be defined + inside a `BLOCKINFO`_ block, in which case they are defined in all blocks + that match the ID that the ``BLOCKINFO`` block is describing. + +As sub blocks are entered, these properties are saved and the new sub-block has +its own set of abbreviations, and its own abbrev id width. When a sub-block is +popped, the saved values are restored. + +.. _ENTER_SUBBLOCK: + +ENTER_SUBBLOCK Encoding +^^^^^^^^^^^^^^^^^^^^^^^ + +:raw-html:`<tt>` +[ENTER_SUBBLOCK, blockid\ :sub:`vbr8`, newabbrevlen\ :sub:`vbr4`, <align32bits>, blocklen_32] +:raw-html:`</tt>` + +The ``ENTER_SUBBLOCK`` abbreviation ID specifies the start of a new block +record. The ``blockid`` value is encoded as an 8-bit VBR identifier, and +indicates the type of block being entered, which can be a `standard block`_ or +an application-specific block. The ``newabbrevlen`` value is a 4-bit VBR, which +specifies the abbrev id width for the sub-block. The ``blocklen`` value is a +32-bit aligned value that specifies the size of the subblock in 32-bit +words. This value allows the reader to skip over the entire block in one jump. + +.. _END_BLOCK: + +END_BLOCK Encoding +^^^^^^^^^^^^^^^^^^ + +``[END_BLOCK, <align32bits>]`` + +The ``END_BLOCK`` abbreviation ID specifies the end of the current block record. +Its end is aligned to 32-bits to ensure that the size of the block is an even +multiple of 32-bits. + +.. _Data Records: + +Data Records +------------ + +Data records consist of a record code and a number of (up to) 64-bit integer +values. The interpretation of the code and values is application specific and +may vary between different block types. Records can be encoded either using an +unabbrev record, or with an abbreviation. In the LLVM IR format, for example, +there is a record which encodes the target triple of a module. The code is +``MODULE_CODE_TRIPLE``, and the values of the record are the ASCII codes for the +characters in the string. + +.. 
_UNABBREV_RECORD: + +UNABBREV_RECORD Encoding +^^^^^^^^^^^^^^^^^^^^^^^^ + +:raw-html:`<tt>` +[UNABBREV_RECORD, code\ :sub:`vbr6`, numops\ :sub:`vbr6`, op0\ :sub:`vbr6`, op1\ :sub:`vbr6`, ...] +:raw-html:`</tt>` + +An ``UNABBREV_RECORD`` provides a default fallback encoding, which is both +completely general and extremely inefficient. It can describe an arbitrary +record by emitting the code and operands as VBRs. + +For example, emitting an LLVM IR target triple as an unabbreviated record +requires emitting the ``UNABBREV_RECORD`` abbrevid, a vbr6 for the +``MODULE_CODE_TRIPLE`` code, a vbr6 for the length of the string, which is equal +to the number of operands, and a vbr6 for each character. Because there are no +letters with values less than 32, each letter would need to be emitted as at +least a two-part VBR, which means that each letter would require at least 12 +bits. This is not an efficient encoding, but it is fully general. + +.. _abbreviated record encoding: + +Abbreviated Record Encoding +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[<abbrevid>, fields...]`` + +An abbreviated record is an abbreviation ID followed by a set of fields that are +encoded according to the `abbreviation definition`_. This allows records to be +encoded significantly more densely than records encoded with the +`UNABBREV_RECORD`_ type, and allows the abbreviation types to be specified in +the stream itself, which allows the files to be completely self-describing. The +actual encoding of abbreviations is defined below. + +The record code, which is the first field of an abbreviated record, may be +encoded in the abbreviation definition (as a literal operand) or supplied in the +abbreviated record (as a Fixed or VBR operand value). + +.. _abbreviation definition: + +Abbreviations +------------- + +Abbreviations are an important form of compression for bitstreams. The idea is +to specify a dense encoding for a class of records once, then use that encoding +to emit many records. 
It takes space to emit the encoding into the file, but +the space is recouped (hopefully plus some) when the records that use it are +emitted. + +Abbreviations can be determined dynamically per client, per file. Because the +abbreviations are stored in the bitstream itself, different streams of the same +format can contain different sets of abbreviations according to the needs of the +specific stream. As a concrete example, LLVM IR files usually emit an +abbreviation for binary operators. If a specific LLVM module contained no or +few binary operators, the abbreviation does not need to be emitted. + +.. _DEFINE_ABBREV: + +DEFINE_ABBREV Encoding +^^^^^^^^^^^^^^^^^^^^^^ + +:raw-html:`<tt>` +[DEFINE_ABBREV, numabbrevops\ :sub:`vbr5`, abbrevop0, abbrevop1, ...] +:raw-html:`</tt>` + +A ``DEFINE_ABBREV`` record adds an abbreviation to the list of currently defined +abbreviations in the scope of this block. This definition only exists inside +this immediate block --- it is not visible in subblocks or enclosing blocks. +Abbreviations are implicitly assigned IDs sequentially starting from 4 (the +first application-defined abbreviation ID). Any abbreviations defined in a +``BLOCKINFO`` record for the particular block type receive IDs first, in order, +followed by any abbreviations defined within the block itself. Abbreviated data +records reference this ID to indicate what abbreviation they are invoking. + +An abbreviation definition consists of the ``DEFINE_ABBREV`` abbrevid followed +by a VBR that specifies the number of abbrev operands, then the abbrev operands +themselves. Abbreviation operands come in three forms. They all start with a +single bit that indicates whether the abbrev operand is a literal operand (when +the bit is 1) or an encoding operand (when the bit is 0). + +#. Literal operands --- :raw-html:`<tt>` [1\ :sub:`1`, litvalue\ + :sub:`vbr8`] :raw-html:`</tt>` --- Literal operands specify that the value in + the result is always a single specific value. 
This specific value is emitted + as a vbr8 after the bit indicating that it is a literal operand. + +#. Encoding info without data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\ + :sub:`3`] :raw-html:`</tt>` --- Operand encodings that do not have extra data + are just emitted as their code. + +#. Encoding info with data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\ + :sub:`3`, value\ :sub:`vbr5`] :raw-html:`</tt>` --- Operand encodings that do + have extra data are emitted as their code, followed by the extra data. + +The possible operand encodings are: + +* Fixed (code 1): The field should be emitted as a `fixed-width value`_, whose + width is specified by the operand's extra data. + +* VBR (code 2): The field should be emitted as a `variable-width value`_, whose + width is specified by the operand's extra data. + +* Array (code 3): This field is an array of values. The array operand has no + extra data, but expects another operand to follow it, indicating the element + type of the array. When reading an array in an abbreviated record, the first + integer is a vbr6 that indicates the array length, followed by the encoded + elements of the array. An array may only occur as the last operand of an + abbreviation (except for the one final operand that gives the array's + type). + +* Char6 (code 4): This field should be emitted as a `char6-encoded value`_. + This operand type takes no extra data. Char6 encoding is normally used as an + array element type. + +* Blob (code 5): This field is emitted as a vbr6, followed by padding to a + 32-bit boundary (for alignment) and an array of 8-bit objects. The array of + bytes is further followed by tail padding to ensure that its total length is a + multiple of 4 bytes. This makes it very efficient for the reader to decode + the data without having to make a copy of it: it can use a pointer to the data + in the mapped in file and poke directly at it. A blob may only occur as the + last operand of an abbreviation. 
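The primitives that these operand encodings refer to can be sketched in a few lines of Python. This is an illustrative reader, not LLVM's implementation: the names `BitReader`, `read_fixed`, `read_vbr`, and `read_char6` are hypothetical. It assumes only what the text states, namely that bits are consumed starting at the least significant bit of each byte and that the high bit of each VBR chunk is a continuation flag.

```python
import string

# Char6 alphabet in index order: 'a'..'z', 'A'..'Z', '0'..'9', '.', '_'
CHAR6 = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"


class BitReader:
    """Hypothetical helper: reads bits LSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_fixed(self, width: int) -> int:
        """Read a fixed-width unsigned integer of `width` bits."""
        value = 0
        for i in range(width):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i  # the first bit read is the lowest bit
            self.pos += 1
        return value

    def read_vbr(self, width: int) -> int:
        """Read a variable-width integer made of `width`-bit chunks."""
        high_bit = 1 << (width - 1)
        value = 0
        shift = 0
        while True:
            chunk = self.read_fixed(width)
            value |= (chunk & (high_bit - 1)) << shift
            if not chunk & high_bit:  # no continuation flag: last chunk
                return value
            shift += width - 1


def read_char6(reader: BitReader) -> str:
    """Read one 6-bit field and map it to its character."""
    return CHAR6[reader.read_fixed(6)]


if __name__ == "__main__":
    # 0x3B holds the two vbr4 chunks 1011 and 0011 from the text's example.
    print(BitReader(bytes([0x3B])).read_vbr(4))
```

The byte `0x3B` used in the demo carries the vbr4 chunks `1011` then `0011`, which is the encoding the Variable Width Integers section gives for the value 27: the first chunk contributes 3 and sets the continuation bit, and the second contributes `3 << 3`.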
+ +For example, target triples in LLVM modules are encoded as a record of the form +``[TRIPLE, 'a', 'b', 'c', 'd']``. Consider if the bitstream emitted the +following abbrev entry: + +:: + + [0, Fixed, 4] + [0, Array] + [0, Char6] + +When emitting a record with this abbreviation, the above entry would be emitted +as: + +:raw-html:`<tt><blockquote>` +[4\ :sub:`abbrevwidth`, 2\ :sub:`4`, 4\ :sub:`vbr6`, 0\ :sub:`6`, 1\ :sub:`6`, 2\ :sub:`6`, 3\ :sub:`6`] +:raw-html:`</blockquote></tt>` + +These values are: + +#. The first value, 4, is the abbreviation ID for this abbreviation. + +#. The second value, 2, is the record code for ``TRIPLE`` records within LLVM IR + file ``MODULE_BLOCK`` blocks. + +#. The third value, 4, is the length of the array. + +#. The rest of the values are the char6 encoded values for ``"abcd"``. + +With this abbreviation, the triple is emitted with only 37 bits (assuming an +abbrev id width of 3). Without the abbreviation, significantly more space would +be required to emit the target triple. Also, because the ``TRIPLE`` value is +not emitted as a literal in the abbreviation, the abbreviation can also be used +for any other string value. + +.. _standard blocks: +.. _standard block: + +Standard Blocks +--------------- + +In addition to the basic block structure and record encodings, the bitstream +also defines specific built-in block types. These block types specify how the +stream is to be decoded, or carry other metadata. In the future, new standard blocks +may be added. Block IDs 0-7 are reserved for standard blocks. + +.. _BLOCKINFO: + +#0 - BLOCKINFO Block +^^^^^^^^^^^^^^^^^^^^ + +The ``BLOCKINFO`` block allows the description of metadata for other blocks. +The currently specified records are: + +:: + + [SETBID (#1), blockid] + [DEFINE_ABBREV, ...] + [BLOCKNAME, ...name...] + [SETRECORDNAME, RecordID, ...name...] + +The ``SETBID`` record (code 1) indicates which block ID is being described.
+``SETBID`` records can occur multiple times throughout the block to change which +block ID is being described. There must be a ``SETBID`` record prior to any +other records. + +Standard ``DEFINE_ABBREV`` records can occur inside ``BLOCKINFO`` blocks, but +unlike their occurrence in normal blocks, the abbreviation is defined for blocks +matching the block ID we are describing, *not* the ``BLOCKINFO`` block +itself. The abbreviations defined in ``BLOCKINFO`` blocks receive abbreviation +IDs as described in `DEFINE_ABBREV`_. + +The ``BLOCKNAME`` record (code 2) can optionally occur in this block. The +elements of the record are the bytes of the string name of the block. +llvm-bcanalyzer can use this to dump out bitcode files symbolically. + +The ``SETRECORDNAME`` record (code 3) can also optionally occur in this block. +The first operand value is a record ID number, and the rest of the elements of +the record are the bytes for the string name of the record. llvm-bcanalyzer can +use this to dump out bitcode files symbolically. + +Note that although the data in ``BLOCKINFO`` blocks is described as "metadata," +the abbreviations they contain are essential for parsing records from the +corresponding blocks. It is not safe to skip them. + +.. _wrapper: + +Bitcode Wrapper Format +====================== + +Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper +structure. This structure contains a simple header that indicates the offset +and size of the embedded BC file. This allows additional information to be +stored alongside the BC file. The structure of this file header is: + +:raw-html:`<tt><blockquote>` +[Magic\ :sub:`32`, Version\ :sub:`32`, Offset\ :sub:`32`, Size\ :sub:`32`, CPUType\ :sub:`32`] +:raw-html:`</blockquote></tt>` + +Each field is a 32-bit value stored in little-endian form (as with the +rest of the bitcode file fields). The Magic number is always ``0x0B17C0DE`` and +the version is currently always ``0``.
The Offset field is the offset in bytes +to the start of the bitcode stream in the file, and the Size field is the size +in bytes of the stream. CPUType is a target-specific value that can be used to +encode the CPU of the target. + +.. _encoding of LLVM IR: + +LLVM IR Encoding +================ + +LLVM IR is encoded into a bitstream by defining blocks and records. It uses +blocks for things like constant pools, functions, symbol tables, etc. It uses +records for things like instructions, global variable descriptors, type +descriptions, etc. This document does not describe the set of abbreviations +that the writer uses, as these are fully self-described in the file, and the +reader is not allowed to build in any knowledge of this. + +Basics +------ + +LLVM IR Magic Number +^^^^^^^^^^^^^^^^^^^^ + +The magic number for LLVM IR files is: + +:raw-html:`<tt><blockquote>` +[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`] +:raw-html:`</blockquote></tt>` + +When combined with the bitcode magic number and viewed as bytes, this is +``"BC 0xC0DE"``. + +Signed VBRs +^^^^^^^^^^^ + +`Variable Width Integer`_ encoding is an efficient way to encode arbitrary-sized +unsigned values, but is extremely inefficient for encoding signed values, as +signed values are otherwise treated as maximally large unsigned values. + +As such, signed VBR values of a specific width are emitted as follows: + +* Positive values are emitted as VBRs of the specified width, but with their + value shifted left by one. + +* Negative values are emitted as VBRs of the specified width, but the negated + value is shifted left by one, and the low bit is set. + +With this encoding, small positive and small negative values can both be emitted +efficiently. Signed VBR encoding is used in ``CST_CODE_INTEGER`` and +``CST_CODE_WIDE_INTEGER`` records within ``CONSTANTS_BLOCK`` blocks.
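The signed VBR rule above can be sketched in a few lines of Python. This is an illustrative sketch, not LLVM's code: the function names are hypothetical, Python's unbounded integers stand in for fixed-width ones, and any special handling a real writer applies to the most negative fixed-width value is not modelled. The `vbr_chunks` helper assumes the usual VBR layout, in which each chunk carries `width - 1` payload bits, low bits first, and the chunk's high bit marks a continuation.

```python
def encode_signed_vbr_value(value):
    """Map a signed integer to the unsigned value that is then VBR-encoded."""
    if value >= 0:
        return value << 1          # positive: shift left, low bit clear
    return ((-value) << 1) | 1     # negative: shift the magnitude, set low bit


def decode_signed_vbr_value(encoded):
    """Invert encode_signed_vbr_value."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


def vbr_chunks(value, width):
    """Split an unsigned value into VBR chunks of `width` bits, first chunk first."""
    payload_bits = width - 1
    payload_mask = (1 << payload_bits) - 1
    continue_bit = 1 << payload_bits
    chunks = []
    while True:
        chunk = value & payload_mask
        value >>= payload_bits
        if value:
            chunks.append(chunk | continue_bit)  # more chunks follow
        else:
            chunks.append(chunk)                 # final chunk
            return chunks


if __name__ == "__main__":
    for v in (5, -5, 0, -1):
        mapped = encode_signed_vbr_value(v)
        print(v, mapped, vbr_chunks(mapped, 6))
```

Because the sign lives in the low bit, both 5 and -5 map to small unsigned values (10 and 11) that fit in a single vbr6 chunk.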
+ +LLVM IR Blocks +^^^^^^^^^^^^^^ + +LLVM IR is defined with the following blocks: + +* 8 --- `MODULE_BLOCK`_ --- This is the top-level block that contains the entire + module, and describes a variety of per-module information. + +* 9 --- `PARAMATTR_BLOCK`_ --- This enumerates the parameter attributes. + +* 10 --- `TYPE_BLOCK`_ --- This describes all of the types in the module. + +* 11 --- `CONSTANTS_BLOCK`_ --- This describes constants for a module or + function. + +* 12 --- `FUNCTION_BLOCK`_ --- This describes a function body. + +* 13 --- `TYPE_SYMTAB_BLOCK`_ --- This describes the type symbol table. + +* 14 --- `VALUE_SYMTAB_BLOCK`_ --- This describes a value symbol table. + +* 15 --- `METADATA_BLOCK`_ --- This describes metadata items. + +* 16 --- `METADATA_ATTACHMENT`_ --- This contains records associating metadata + with function instruction values. + +.. _MODULE_BLOCK: + +MODULE_BLOCK Contents +--------------------- + +The ``MODULE_BLOCK`` block (id 8) is the top-level block for LLVM bitcode files, +and each bitcode file must contain exactly one. In addition to records +(described below) containing information about the module, a ``MODULE_BLOCK`` +block may contain the following sub-blocks: + +* `BLOCKINFO`_ +* `PARAMATTR_BLOCK`_ +* `TYPE_BLOCK`_ +* `TYPE_SYMTAB_BLOCK`_ +* `VALUE_SYMTAB_BLOCK`_ +* `CONSTANTS_BLOCK`_ +* `FUNCTION_BLOCK`_ +* `METADATA_BLOCK`_ + +MODULE_CODE_VERSION Record +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[VERSION, version#]`` + +The ``VERSION`` record (code 1) contains a single value indicating the format +version. Only version 0 is supported at this time. + +MODULE_CODE_TRIPLE Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[TRIPLE, ...string...]`` + +The ``TRIPLE`` record (code 2) contains a variable number of values representing +the bytes of the ``target triple`` specification string. 
+ +MODULE_CODE_DATALAYOUT Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[DATALAYOUT, ...string...]`` + +The ``DATALAYOUT`` record (code 3) contains a variable number of values +representing the bytes of the ``target datalayout`` specification string. + +MODULE_CODE_ASM Record +^^^^^^^^^^^^^^^^^^^^^^ + +``[ASM, ...string...]`` + +The ``ASM`` record (code 4) contains a variable number of values representing +the bytes of ``module asm`` strings, with individual assembly blocks separated +by newline (ASCII 10) characters. + +.. _MODULE_CODE_SECTIONNAME: + +MODULE_CODE_SECTIONNAME Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[SECTIONNAME, ...string...]`` + +The ``SECTIONNAME`` record (code 5) contains a variable number of values +representing the bytes of a single section name string. There should be one +``SECTIONNAME`` record for each section name referenced (e.g., in global +variable or function ``section`` attributes) within the module. These records +can be referenced by the 1-based index in the *section* fields of ``GLOBALVAR`` +or ``FUNCTION`` records. + +MODULE_CODE_DEPLIB Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[DEPLIB, ...string...]`` + +The ``DEPLIB`` record (code 6) contains a variable number of values representing +the bytes of a single dependent library name string, one of the libraries +mentioned in a ``deplibs`` declaration. There should be one ``DEPLIB`` record +for each library name referenced. + +MODULE_CODE_GLOBALVAR Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr]`` + +The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a +global variable. 
The operand fields are: + +* *pointer type*: The type index of the pointer type used to point to this + global variable + +* *isconst*: Non-zero if the variable is treated as constant within the module, + or zero if it is not + +* *initid*: If non-zero, the value index of the initializer for this variable, + plus 1. + +.. _linkage type: + +* *linkage*: An encoding of the linkage type for this variable: + * ``external``: code 0 + * ``weak``: code 1 + * ``appending``: code 2 + * ``internal``: code 3 + * ``linkonce``: code 4 + * ``dllimport``: code 5 + * ``dllexport``: code 6 + * ``extern_weak``: code 7 + * ``common``: code 8 + * ``private``: code 9 + * ``weak_odr``: code 10 + * ``linkonce_odr``: code 11 + * ``available_externally``: code 12 + * ``linker_private``: code 13 + +* *alignment*: The logarithm base 2 of the variable's requested alignment, plus 1 + +* *section*: If non-zero, the 1-based section index in the table of + `MODULE_CODE_SECTIONNAME`_ entries. + +.. _visibility: + +* *visibility*: If present, an encoding of the visibility of this variable: + * ``default``: code 0 + * ``hidden``: code 1 + * ``protected``: code 2 + +* *threadlocal*: If present, an encoding of the thread local storage mode of the + variable: + * ``not thread local``: code 0 + * ``thread local; default TLS model``: code 1 + * ``localdynamic``: code 2 + * ``initialexec``: code 3 + * ``localexec``: code 4 + +* *unnamed_addr*: If present and non-zero, indicates that the variable has + ``unnamed_addr`` + +.. _FUNCTION: + +MODULE_CODE_FUNCTION Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]`` + +The ``FUNCTION`` record (code 8) marks the declaration or definition of a +function.
The operand fields are: + +* *type*: The type index of the function type describing this function + +* *callingconv*: The calling convention number: + * ``ccc``: code 0 + * ``fastcc``: code 8 + * ``coldcc``: code 9 + * ``x86_stdcallcc``: code 64 + * ``x86_fastcallcc``: code 65 + * ``arm_apcscc``: code 66 + * ``arm_aapcscc``: code 67 + * ``arm_aapcs_vfpcc``: code 68 + +* *isproto*: Non-zero if this entry represents a declaration rather than a + definition + +* *linkage*: An encoding of the `linkage type`_ for this function + +* *paramattr*: If nonzero, the 1-based parameter attribute index into the table + of `PARAMATTR_CODE_ENTRY`_ entries. + +* *alignment*: The logarithm base 2 of the function's requested alignment, plus + 1 + +* *section*: If non-zero, the 1-based section index in the table of + `MODULE_CODE_SECTIONNAME`_ entries. + +* *visibility*: An encoding of the `visibility`_ of this function + +* *gc*: If present and nonzero, the 1-based garbage collector index in the table + of `MODULE_CODE_GCNAME`_ entries. + +* *unnamed_addr*: If present and non-zero, indicates that the function has + ``unnamed_addr`` + +MODULE_CODE_ALIAS Record +^^^^^^^^^^^^^^^^^^^^^^^^ + +``[ALIAS, alias type, aliasee val#, linkage, visibility]`` + +The ``ALIAS`` record (code 9) marks the definition of an alias. The operand +fields are + +* *alias type*: The type index of the alias + +* *aliasee val#*: The value index of the aliased value + +* *linkage*: An encoding of the `linkage type`_ for this alias + +* *visibility*: If present, an encoding of the `visibility`_ of the alias + +MODULE_CODE_PURGEVALS Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[PURGEVALS, numvals]`` + +The ``PURGEVALS`` record (code 10) resets the module-level value list to the +size given by the single operand value. Module-level value list items are added +by ``GLOBALVAR``, ``FUNCTION``, and ``ALIAS`` records. After a ``PURGEVALS`` +record is seen, new value indices will start from the given *numvals* value. + +.. 
_MODULE_CODE_GCNAME: + +MODULE_CODE_GCNAME Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[GCNAME, ...string...]`` + +The ``GCNAME`` record (code 11) contains a variable number of values +representing the bytes of a single garbage collector name string. There should +be one ``GCNAME`` record for each garbage collector name referenced in function +``gc`` attributes within the module. These records can be referenced by 1-based +index in the *gc* fields of ``FUNCTION`` records. + +.. _PARAMATTR_BLOCK: + +PARAMATTR_BLOCK Contents +------------------------ + +The ``PARAMATTR_BLOCK`` block (id 9) contains a table of entries describing the +attributes of function parameters. These entries are referenced by 1-based index +in the *paramattr* field of module block `FUNCTION`_ records, or within the +*attr* field of function block ``INST_INVOKE`` and ``INST_CALL`` records. + +Entries within ``PARAMATTR_BLOCK`` are constructed to ensure that each is unique +(i.e., no two indices represent equivalent attribute lists). + +.. _PARAMATTR_CODE_ENTRY: + +PARAMATTR_CODE_ENTRY Record +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[ENTRY, paramidx0, attr0, paramidx1, attr1...]`` + +The ``ENTRY`` record (code 1) contains an even number of values describing a +unique set of function parameter attributes. Each *paramidx* value indicates +which set of attributes is represented, with 0 representing the return value +attributes, 0xFFFFFFFF representing function attributes, and other values +representing 1-based function parameters.
Each *attr* value is a bitmap with the +following interpretation: + +* bit 0: ``zeroext`` +* bit 1: ``signext`` +* bit 2: ``noreturn`` +* bit 3: ``inreg`` +* bit 4: ``sret`` +* bit 5: ``nounwind`` +* bit 6: ``noalias`` +* bit 7: ``byval`` +* bit 8: ``nest`` +* bit 9: ``readnone`` +* bit 10: ``readonly`` +* bit 11: ``noinline`` +* bit 12: ``alwaysinline`` +* bit 13: ``optsize`` +* bit 14: ``ssp`` +* bit 15: ``sspreq`` +* bits 16-31: ``align n`` +* bit 32: ``nocapture`` +* bit 33: ``noredzone`` +* bit 34: ``noimplicitfloat`` +* bit 35: ``naked`` +* bit 36: ``inlinehint`` +* bits 37-39: ``alignstack n``, represented as the logarithm + base 2 of the requested alignment, plus 1 + +.. _TYPE_BLOCK: + +TYPE_BLOCK Contents +------------------- + +The ``TYPE_BLOCK`` block (id 10) contains records which constitute a table of +type operator entries used to represent types referenced within an LLVM +module. Each record (with the exception of `NUMENTRY`_) generates a single type +table entry, which may be referenced by 0-based index from instructions, +constants, metadata, type symbol table entries, or other type operator records. + +Entries within ``TYPE_BLOCK`` are constructed to ensure that each entry is +unique (i.e., no two indices represent structurally equivalent types). + +.. _TYPE_CODE_NUMENTRY: +.. _NUMENTRY: + +TYPE_CODE_NUMENTRY Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[NUMENTRY, numentries]`` + +The ``NUMENTRY`` record (code 1) contains a single value which indicates the +total number of type code entries in the type table of the module. If present, +``NUMENTRY`` should be the first record in the block. + +TYPE_CODE_VOID Record +^^^^^^^^^^^^^^^^^^^^^ + +``[VOID]`` + +The ``VOID`` record (code 2) adds a ``void`` type to the type table. + +TYPE_CODE_HALF Record +^^^^^^^^^^^^^^^^^^^^^ + +``[HALF]`` + +The ``HALF`` record (code 10) adds a ``half`` (16-bit floating point) type to +the type table.
+ +TYPE_CODE_FLOAT Record +^^^^^^^^^^^^^^^^^^^^^^ + +``[FLOAT]`` + +The ``FLOAT`` record (code 3) adds a ``float`` (32-bit floating point) type to +the type table. + +TYPE_CODE_DOUBLE Record +^^^^^^^^^^^^^^^^^^^^^^^ + +``[DOUBLE]`` + +The ``DOUBLE`` record (code 4) adds a ``double`` (64-bit floating point) type to +the type table. + +TYPE_CODE_LABEL Record +^^^^^^^^^^^^^^^^^^^^^^ + +``[LABEL]`` + +The ``LABEL`` record (code 5) adds a ``label`` type to the type table. + +TYPE_CODE_OPAQUE Record +^^^^^^^^^^^^^^^^^^^^^^^ + +``[OPAQUE]`` + +The ``OPAQUE`` record (code 6) adds an ``opaque`` type to the type table. Note +that distinct ``opaque`` types are not unified. + +TYPE_CODE_INTEGER Record +^^^^^^^^^^^^^^^^^^^^^^^^ + +``[INTEGER, width]`` + +The ``INTEGER`` record (code 7) adds an integer type to the type table. The +single *width* field indicates the width of the integer type. + +TYPE_CODE_POINTER Record +^^^^^^^^^^^^^^^^^^^^^^^^ + +``[POINTER, pointee type, address space]`` + +The ``POINTER`` record (code 8) adds a pointer type to the type table. The +operand fields are + +* *pointee type*: The type index of the pointed-to type + +* *address space*: If supplied, the target-specific numbered address space where + the pointed-to object resides. Otherwise, the default address space is zero. + +TYPE_CODE_FUNCTION Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[FUNCTION, vararg, ignored, retty, ...paramty... ]`` + +The ``FUNCTION`` record (code 9) adds a function type to the type table. 
The +operand fields are + +* *vararg*: Non-zero if the type represents a varargs function + +* *ignored*: This value field is present for backward compatibility only, and is + ignored + +* *retty*: The type index of the function's return type + +* *paramty*: Zero or more type indices representing the parameter types of the + function + +TYPE_CODE_STRUCT Record +^^^^^^^^^^^^^^^^^^^^^^^ + +``[STRUCT, ispacked, ...eltty...]`` + +The ``STRUCT`` record (code 10) adds a struct type to the type table. The +operand fields are + +* *ispacked*: Non-zero if the type represents a packed structure + +* *eltty*: Zero or more type indices representing the element types of the + structure + +TYPE_CODE_ARRAY Record +^^^^^^^^^^^^^^^^^^^^^^ + +``[ARRAY, numelts, eltty]`` + +The ``ARRAY`` record (code 11) adds an array type to the type table. The +operand fields are + +* *numelts*: The number of elements in arrays of this type + +* *eltty*: The type index of the array element type + +TYPE_CODE_VECTOR Record +^^^^^^^^^^^^^^^^^^^^^^^ + +``[VECTOR, numelts, eltty]`` + +The ``VECTOR`` record (code 12) adds a vector type to the type table. The +operand fields are + +* *numelts*: The number of elements in vectors of this type + +* *eltty*: The type index of the vector element type + +TYPE_CODE_X86_FP80 Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[X86_FP80]`` + +The ``X86_FP80`` record (code 13) adds an ``x86_fp80`` (80-bit floating point) +type to the type table. + +TYPE_CODE_FP128 Record +^^^^^^^^^^^^^^^^^^^^^^ + +``[FP128]`` + +The ``FP128`` record (code 14) adds an ``fp128`` (128-bit floating point) type +to the type table. + +TYPE_CODE_PPC_FP128 Record +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[PPC_FP128]`` + +The ``PPC_FP128`` record (code 15) adds a ``ppc_fp128`` (128-bit floating point) +type to the type table. + +TYPE_CODE_METADATA Record +^^^^^^^^^^^^^^^^^^^^^^^^^ + +``[METADATA]`` + +The ``METADATA`` record (code 16) adds a ``metadata`` type to the type table. + +.. 
_CONSTANTS_BLOCK: + +CONSTANTS_BLOCK Contents +------------------------ + +The ``CONSTANTS_BLOCK`` block (id 11) ... + +.. _FUNCTION_BLOCK: + +FUNCTION_BLOCK Contents +----------------------- + +The ``FUNCTION_BLOCK`` block (id 12) ... + +In addition to the record types described below, a ``FUNCTION_BLOCK`` block may +contain the following sub-blocks: + +* `CONSTANTS_BLOCK`_ +* `VALUE_SYMTAB_BLOCK`_ +* `METADATA_ATTACHMENT`_ + +.. _TYPE_SYMTAB_BLOCK: + +TYPE_SYMTAB_BLOCK Contents +-------------------------- + +The ``TYPE_SYMTAB_BLOCK`` block (id 13) contains entries which map between +module-level named types and their corresponding type indices. + +.. _TST_CODE_ENTRY: + +TST_CODE_ENTRY Record +^^^^^^^^^^^^^^^^^^^^^ + +``[ENTRY, typeid, ...string...]`` + +The ``ENTRY`` record (code 1) contains a variable number of values, with the +first giving the type index of the designated type, and the remaining values +giving the character codes of the type name. Each entry corresponds to a single +named type. + +.. _VALUE_SYMTAB_BLOCK: + +VALUE_SYMTAB_BLOCK Contents +--------------------------- + +The ``VALUE_SYMTAB_BLOCK`` block (id 14) ... + +.. _METADATA_BLOCK: + +METADATA_BLOCK Contents +----------------------- + +The ``METADATA_BLOCK`` block (id 15) ... + +.. _METADATA_ATTACHMENT: + +METADATA_ATTACHMENT Contents +---------------------------- + +The ``METADATA_ATTACHMENT`` block (id 16) ... diff --git a/docs/subsystems.rst b/docs/subsystems.rst index 27dff6b..c4c3b6d 100644 --- a/docs/subsystems.rst +++ b/docs/subsystems.rst @@ -7,6 +7,7 @@ Subsystem Documentation :hidden: AliasAnalysis + BitCodeFormat BranchWeightMetadata Bugpoint ExceptionHandling @@ -58,7 +59,7 @@ Subsystem Documentation Automatic bug finder and test-case reducer description and usage information. -* `LLVM Bitcode File Format <BitCodeFormat.html>`_ +* :ref:`bitcode_format` This describes the file format and encoding used for LLVM "bc" files. |