path: root/docs/BitCodeFormat.html
author Chris Lattner <sabre@nondot.org> 2007-05-12 07:49:15 +0000
committer Chris Lattner <sabre@nondot.org> 2007-05-12 07:49:15 +0000
commit daeb63c22064a4f25f6df2b04c34a5d3aa6af873 (patch)
tree 11bc39fb4d86c7f6d26f620f2c0d2f54dba29e11 /docs/BitCodeFormat.html
parent 3a1716db5818feb96054dcce325e8840063d10b7 (diff)
continued description
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37003 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/BitCodeFormat.html')
-rw-r--r-- docs/BitCodeFormat.html 113
1 files changed, 108 insertions, 5 deletions
diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html
index b84cd0e..16171d3 100644
--- a/docs/BitCodeFormat.html
+++ b/docs/BitCodeFormat.html
@@ -18,6 +18,7 @@
<li><a href="#abbrevid">Abbreviation IDs</a></li>
<li><a href="#blocks">Blocks</a></li>
<li><a href="#datarecord">Data Records</a></li>
+ <li><a href="#abbreviations">Abbreviations</a></li>
</ol>
</li>
<li><a href="#llvmir">LLVM IR Encoding</a></li>
@@ -213,12 +214,14 @@ The set of builtin abbrev IDs is:
current block.</li>
<li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
beginning of a new block.</li>
-<li>2 - DEFINE_ABBREV - This defines a new abbreviation.</li>
-<li>3 - UNABBREV_RECORD - This ID specifies the definition of an unabbreviated
- record.</li>
+<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
+ abbreviation.</li>
+<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
+ definition of an unabbreviated record.</li>
</ul>
-<p>Abbreviation IDs 4 and above are defined by the stream itself.</p>
+<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
+an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
</div>
@@ -303,11 +306,111 @@ multiple of 32-bits.</p>
</div>
<div class="doc_text">
+<p>
+Data records consist of a record code and a number of (up to) 64-bit integer
+values. The interpretation of the code and values is application-specific, and
+there are multiple different ways to encode a record (with an unabbrev record
+or with an abbreviation). In the LLVM IR format, for example, there is a record
+which encodes the target triple of a module. The code is MODULE_CODE_TRIPLE,
+and the values of the record are the ASCII codes for the characters in the
+string.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
+ op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
+
+<p>An UNABBREV_RECORD provides a default fallback encoding, which is both
+completely general and also extremely inefficient. It can describe an arbitrary
+record, by emitting the code and operands as vbrs.</p>
+
+<p>For example, emitting an LLVM IR target triple as an unabbreviated record
+requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
+MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
+the number of operands), and a vbr6 for each character. Since there are no
+letters with value less than 32, each letter would need to be emitted as at
+least a two-part VBR, which means that each letter would require at least 12
+bits. This is not an efficient encoding, but it is fully general.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>
+
+<p>An abbreviated record is an abbreviation ID followed by a set of fields that
+are encoded according to the <a href="#abbreviations">abbreviation
+definition</a>. This lets records be encoded significantly more densely
+than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
+type. Because the abbreviation types are specified in the stream itself,
+the files remain completely self-describing. The actual encoding
+of abbreviations is defined below.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
+</div>
+
+<div class="doc_text">
<p>
-blah
+Abbreviations are an important form of compression for bitstreams. The idea is
+to specify a dense encoding for a class of records once, then use that encoding
+to emit many records. It takes space to emit the encoding into the file, but
+the space is recouped (hopefully plus some) when the records that use it are
+emitted.
</p>
+<p>
+Abbreviations can be determined dynamically per client, per file. Since the
+abbreviations are stored in the bitstream itself, different streams of the same
+format can contain different sets of abbreviations, and a specific stream can
+omit any it does not need. As a concrete example, LLVM IR files usually emit an
+abbreviation for binary operators. If a specific LLVM module contains few or no
+binary operators, the abbreviation does not need to be emitted.
+</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
+ Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
+ ...]</tt></p>
+
+<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
+by a VBR that specifies the number of abbrev operands, then the abbrev
+operands themselves. Abbreviation operands come in three forms. They all start
+with a single bit that indicates whether the abbrev operand is a literal operand
+(when the bit is 1) or an encoding operand (when the bit is 0).</p>
+
+<ol>
+<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
+Literal operands specify that the value in the result
+is always a single specific value. This specific value is emitted as a vbr8
+after the bit indicating that it is a literal operand.</li>
+<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
+ - blah
+</li>
+<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
+value<sub>vbr5</sub>]</tt> -
+
+</li>
+</ol>
+
</div>
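
Illustration (not part of the patch): the cost claim in the UNABBREV_RECORD text above, that every character of a target triple needs at least 12 bits, can be checked with a small sketch. It assumes the usual bitstream VBR layout, in which each vbr6 chunk carries 5 data bits plus a high continuation bit and the least significant chunk comes first. The helper names, the 2-bit abbreviation ID width, and the record code value 2 are assumptions chosen for the example and are not taken from the patch.

```python
def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits.

    Assumed layout: the low (width - 1) bits of each chunk hold data and
    the high bit is set when another chunk follows.
    """
    if width < 2 or value < 0:
        raise ValueError("width must be >= 2 and value must be >= 0")
    data_bits = width - 1
    data_mask = (1 << data_bits) - 1
    continue_bit = 1 << data_bits
    chunks = []
    while True:
        chunk = value & data_mask
        value >>= data_bits
        if value:
            chunks.append(chunk | continue_bit)  # more chunks follow
        else:
            chunks.append(chunk)                 # final chunk
            return chunks


def unabbrev_record_bits(code, operands, abbrev_id_width):
    """Count the bits used by [UNABBREV_RECORD, code, numops, op0, ...].

    `abbrev_id_width` is the width of the abbreviation ID field, which is
    set by the enclosing block; the caller supplies it (hypothetical here).
    """
    bits = abbrev_id_width                            # the abbrev ID itself
    bits += 6 * len(vbr_chunks(code, 6))              # record code as vbr6
    bits += 6 * len(vbr_chunks(len(operands), 6))     # numops as vbr6
    for op in operands:
        bits += 6 * len(vbr_chunks(op, 6))            # each operand as vbr6
    return bits


# Hypothetical values for illustration only.
triple = "i686"
total = unabbrev_record_bits(2, [ord(c) for c in triple], abbrev_id_width=2)
print(total)
```

Under these assumptions the four-character string costs 62 bits: 2 for the abbreviation ID, 6 each for the code and operand count, and 12 for each character, since every printable character is at least 32 and so spills into a second vbr6 chunk.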