author | Chris Lattner <sabre@nondot.org> | 2007-05-12 07:49:15 +0000 |
---|---|---|
committer | Chris Lattner <sabre@nondot.org> | 2007-05-12 07:49:15 +0000 |
commit | daeb63c22064a4f25f6df2b04c34a5d3aa6af873 (patch) | |
tree | 11bc39fb4d86c7f6d26f620f2c0d2f54dba29e11 /docs/BitCodeFormat.html | |
parent | 3a1716db5818feb96054dcce325e8840063d10b7 (diff) | |
continued description
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37003 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/BitCodeFormat.html')
-rw-r--r-- | docs/BitCodeFormat.html | 113 |
1 files changed, 108 insertions, 5 deletions
diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html index b84cd0e..16171d3 100644 --- a/docs/BitCodeFormat.html +++ b/docs/BitCodeFormat.html @@ -18,6 +18,7 @@ <li><a href="#abbrevid">Abbreviation IDs</a></li> <li><a href="#blocks">Blocks</a></li> <li><a href="#datarecord">Data Records</a></li> + <li><a href="#abbreviations">Abbreviations</a></li> </ol> </li> <li><a href="#llvmir">LLVM IR Encoding</a></li> @@ -213,12 +214,14 @@ The set of builtin abbrev IDs is: current block.</li> <li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the beginning of a new block.</li> -<li>2 - DEFINE_ABBREV - This defines a new abbreviation.</li> -<li>3 - UNABBREV_RECORD - This ID specifies the definition of an unabbreviated - record.</li> +<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new + abbreviation.</li> +<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the + definition of an unabbreviated record.</li> </ul> -<p>Abbreviation IDs 4 and above are defined by the stream itself.</p> +<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify +an <a href="#abbrev_records">abbreviated record encoding</a>.</p> </div> @@ -303,11 +306,111 @@ multiple of 32-bits.</p> </div> <div class="doc_text"> +<p> +Data records consist of a record code and a number of (up to) 64-bit integer +values. The interpretation of the code and values is application specific and +there are multiple different ways to encode a record (with an unabbrev record +or with an abbreviation). In the LLVM IR format, for example, there is a record +which encodes the target triple of a module. 
The code is MODULE_CODE_TRIPLE, +and the values of the record are the ascii codes for the characters in the +string.</p> + +</div> + +<!-- _______________________________________________________________________ --> +<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD +Encoding</a></div> + +<div class="doc_text"> + +<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>, + op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p> + +<p>An UNABBREV_RECORD provides a default fallback encoding, which is both +completely general and also extremely inefficient. It can describe an arbitrary +record, by emitting the code and operands as vbrs.</p> + +<p>For example, emitting an LLVM IR target triple as an unabbreviated record +requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the +MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to +the number of operands), and a vbr6 for each character. Since there are no +letters with value less than 32, each letter would need to be emitted as at +least a two-part VBR, which means that each letter would require at least 12 +bits. This is not an efficient encoding, but it is fully general.</p> +</div> + +<!-- _______________________________________________________________________ --> +<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record +Encoding</a></div> + +<div class="doc_text"> + +<p><tt>[<abbrevid>, fields...]</tt></p> + +<p>An abbreviated record is a abbreviation id followed by a set of fields that +are encoded according to the <a href="#abbreviations">abbreviation +definition</a>. This allows records to be encoded significantly more densely +than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> +type, and allows the abbreviation types to be specified in the stream itself, +which allows the files to be completely self describing. The actual encoding +of abbreviations is defined below. 
+</p> + +</div> + +<!-- ======================================================================= --> +<div class="doc_subsection"><a name="abbreviations">Abbreviations</a> +</div> + +<div class="doc_text"> <p> -blah +Abbreviations are an important form of compression for bitstreams. The idea is +to specify a dense encoding for a class of records once, then use that encoding +to emit many records. It takes space to emit the encoding into the file, but +the space is recouped (hopefully plus some) when the records that use it are +emitted. </p> +<p> +Abbreviations can be determined dynamically per client, per file. Since the +abbreviations are stored in the bitstream itself, different streams of the same +format can contain different sets of abbreviations if the specific stream does +not need it. As a concrete example, LLVM IR files usually emit an abbreviation +for binary operators. If a specific LLVM module contained no or few binary +operators, the abbreviation does not need to be emitted. +</p> +</div> + +<!-- _______________________________________________________________________ --> +<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV + Encoding</a></div> + +<div class="doc_text"> + +<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1, + ...]</tt></p> + +<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed +by a VBR that specifies the number of abbrev operands, then the abbrev +operands themselves. Abbreviation operands come in three forms. They all start +with a single bit that indicates whether the abbrev operand is a literal operand +(when the bit is 1) or an encoding operand (when the bit is 0).</p> + +<ol> +<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> - +Literal operands specify that the value in the result +is always a single specific value. 
This specific value is emitted as a vbr8 +after the bit indicating that it is a literal operand.</li> +<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt> + - blah +</li> +<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>, +value<sub>vbr5</sub>]</tt> - + +</li> +</ol> + </div> |
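The bit-cost estimate in the UNABBREV_RECORD hunk can be checked with a small Python sketch. It is an illustration, not code from the LLVM sources. It assumes the usual VBR layout, in which each N-bit chunk carries N-1 data bits, least-significant chunk first, with the top bit of a chunk flagging that another chunk follows. The function names, the 2-bit abbrev ID width, and the record code value `EXAMPLE_CODE` are placeholders chosen for the example.

```python
# Hypothetical helpers for estimating the size of an UNABBREV_RECORD.
# Assumption: vbrN uses N-bit chunks with N-1 data bits each, low chunk first,
# and the high bit of a chunk set when more chunks follow.

EXAMPLE_CODE = 2  # placeholder record code, standing in for MODULE_CODE_TRIPLE


def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits each."""
    if width < 2:
        raise ValueError("width must be at least 2")
    if value < 0:
        raise ValueError("value must be non-negative")
    data_bits = width - 1
    continue_bit = 1 << data_bits
    mask = continue_bit - 1
    chunks = []
    while value >= continue_bit:
        # Emit the low data bits and mark that another chunk follows.
        chunks.append((value & mask) | continue_bit)
        value >>= data_bits
    chunks.append(value)  # final chunk has the continue bit clear
    return chunks


def unabbrev_record_bits(code, operands, abbrev_id_width=2):
    """Count the bits in [UNABBREV_RECORD, code vbr6, numops vbr6, ops vbr6...].

    `abbrev_id_width` is an assumed width for the abbrev ID field.
    """
    fields = [code, len(operands)] + list(operands)
    return abbrev_id_width + sum(6 * len(vbr_chunks(f, 6)) for f in fields)
```

For a 17-character triple such as `i686-pc-linux-gnu`, calling `unabbrev_record_bits(EXAMPLE_CODE, [ord(c) for c in "i686-pc-linux-gnu"])` gives 2 + 6 + 6 + 17 × 12 = 218 bits under these assumptions, which agrees with the 12-bits-per-letter figure in the patch text.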