diff options
author | Stephen Hines <srhines@google.com> | 2014-04-23 16:57:46 -0700 |
---|---|---|
committer | Stephen Hines <srhines@google.com> | 2014-04-24 15:53:16 -0700 |
commit | 36b56886974eae4f9c5ebc96befd3e7bfe5de338 (patch) | |
tree | e6cfb69fbbd937f450eeb83bfb83b9da3b01275a /docs/TableGen | |
parent | 69a8640022b04415ae9fac62f8ab090601d8f889 (diff) | |
download | external_llvm-36b56886974eae4f9c5ebc96befd3e7bfe5de338.zip external_llvm-36b56886974eae4f9c5ebc96befd3e7bfe5de338.tar.gz external_llvm-36b56886974eae4f9c5ebc96befd3e7bfe5de338.tar.bz2 |
Update to LLVM 3.5a.
Change-Id: Ifadecab779f128e62e430c2b4f6ddd84953ed617
Diffstat (limited to 'docs/TableGen')
-rw-r--r-- | docs/TableGen/BackEnds.rst | 428 | ||||
-rw-r--r-- | docs/TableGen/Deficiencies.rst | 31 | ||||
-rw-r--r-- | docs/TableGen/LangIntro.rst | 601 | ||||
-rw-r--r-- | docs/TableGen/LangRef.rst | 3 | ||||
-rw-r--r-- | docs/TableGen/index.rst | 309 |
5 files changed, 1372 insertions, 0 deletions
diff --git a/docs/TableGen/BackEnds.rst b/docs/TableGen/BackEnds.rst new file mode 100644 index 0000000..42de41d --- /dev/null +++ b/docs/TableGen/BackEnds.rst @@ -0,0 +1,428 @@ +================= +TableGen BackEnds +================= + +.. contents:: + :local: + +Introduction +============ + +TableGen backends are at the core of TableGen's functionality. The source files +provide the semantics to a generated (in memory) structure, but it's up to the +backend to print this out in a way that is meaningful to the user (normally a +C program including a file or a textual list of warnings, options and error +messages). + +TableGen is used by both LLVM and Clang with very different goals. LLVM uses it +as a way to automate the generation of massive amounts of information regarding +instructions, schedules, cores and architecture features. Some backends generate +output that is consumed by more than one source file, so they need to be created +in a way that is easy to use pre-processor tricks. Some backends can also print +C code structures, so that they can be directly included as-is. + +Clang, on the other hand, uses it mainly for diagnostic messages (errors, +warnings, tips) and attributes, so more on the textual end of the scale. + +LLVM BackEnds +============= + +.. warning:: + This document is raw. Each section below needs three sub-sections: description + of its purpose with a list of users, output generated from generic input, and + finally why it needed a new backend (in case there's something similar). + +Overall, each backend will take the same TableGen file type and transform into +similar output for different targets/uses. There is an implicit contract between +the TableGen files, the back-ends and their users. + +For instance, a global contract is that each back-end produces macro-guarded +sections. Based on whether the file is included by a header or a source file, +or even in which context of each file the include is being used, you have +todefine a macro just before including it, to get the right output: + +.. code-block:: c++ + + #define GET_REGINFO_TARGET_DESC + #include "ARMGenRegisterInfo.inc" + +And just part of the generated file would be included. This is useful if +you need the same information in multiple formats (instantiation, initialization, +getter/setter functions, etc) from the same source TableGen file without having +to re-compile the TableGen file multiple times. + +Sometimes, multiple macros might be defined before the same include file to +output multiple blocks: + +.. code-block:: c++ + + #define GET_REGISTER_MATCHER + #define GET_SUBTARGET_FEATURE_NAME + #define GET_MATCHER_IMPLEMENTATION + #include "ARMGenAsmMatcher.inc" + +The macros will be undef'd automatically as they're used, in the include file. + +On all LLVM back-ends, the ``llvm-tblgen`` binary will be executed on the root +TableGen file ``<Target>.td``, which should include all others. This guarantees +that all information needed is accessible, and that no duplication is needed +in the TbleGen files. + +CodeEmitter +----------- + +**Purpose**: CodeEmitterGen uses the descriptions of instructions and their fields to +construct an automated code emitter: a function that, given a MachineInstr, +returns the (currently, 32-bit unsigned) value of the instruction. + +**Output**: C++ code, implementing the target's CodeEmitter +class by overriding the virtual functions as ``<Target>CodeEmitter::function()``. + +**Usage**: Used to include directly at the end of ``<Target>CodeEmitter.cpp``, and +with option `-mc-emitter` to be included in ``<Target>MCCodeEmitter.cpp``. + +RegisterInfo +------------ + +**Purpose**: This tablegen backend is responsible for emitting a description of a target +register file for a code generator. It uses instances of the Register, +RegisterAliases, and RegisterClass classes to gather this information. + +**Output**: C++ code with enums and structures representing the register mappings, +properties, masks, etc. + +**Usage**: Both on ``<Target>BaseRegisterInfo`` and ``<Target>MCTargetDesc`` (headers +and source files) with macros defining in which they are for declaration vs. +initialization issues. + +InstrInfo +--------- + +**Purpose**: This tablegen backend is responsible for emitting a description of the target +instruction set for the code generator. (what are the differences from CodeEmitter?) + +**Output**: C++ code with enums and structures representing the register mappings, +properties, masks, etc. + +**Usage**: Both on ``<Target>BaseInstrInfo`` and ``<Target>MCTargetDesc`` (headers +and source files) with macros defining in which they are for declaration vs. + +AsmWriter +--------- + +**Purpose**: Emits an assembly printer for the current target. + +**Output**: Implementation of ``<Target>InstPrinter::printInstruction()``, among +other things. + +**Usage**: Included directly into ``InstPrinter/<Target>InstPrinter.cpp``. + +AsmMatcher +---------- + +**Purpose**: Emits a target specifier matcher for +converting parsed assembly operands in the MCInst structures. It also +emits a matcher for custom operand parsing. Extensive documentation is +written on the ``AsmMatcherEmitter.cpp`` file. + +**Output**: Assembler parsers' matcher functions, declarations, etc. + +**Usage**: Used in back-ends' ``AsmParser/<Target>AsmParser.cpp`` for +building the AsmParser class. + +Disassembler +------------ + +**Purpose**: Contains disassembler table emitters for various +architectures. Extensive documentation is written on the +``DisassemblerEmitter.cpp`` file. + +**Output**: Decoding tables, static decoding functions, etc. + +**Usage**: Directly included in ``Disassembler/<Target>Disassembler.cpp`` +to cater for all default decodings, after all hand-made ones. + +PseudoLowering +-------------- + +**Purpose**: Generate pseudo instruction lowering. + +**Output**: Implements ``ARMAsmPrinter::emitPseudoExpansionLowering()``. + +**Usage**: Included directly into ``<Target>AsmPrinter.cpp``. + +CallingConv +----------- + +**Purpose**: Responsible for emitting descriptions of the calling +conventions supported by this target. + +**Output**: Implement static functions to deal with calling conventions +chained by matching styles, returning false on no match. + +**Usage**: Used in ISelLowering and FastIsel as function pointers to +implementation returned by a CC sellection function. + +DAGISel +------- + +**Purpose**: Generate a DAG instruction selector. + +**Output**: Creates huge functions for automating DAG selection. + +**Usage**: Included in ``<Target>ISelDAGToDAG.cpp`` inside the target's +implementation of ``SelectionDAGISel``. + +DFAPacketizer +------------- + +**Purpose**: This class parses the Schedule.td file and produces an API that +can be used to reason about whether an instruction can be added to a packet +on a VLIW architecture. The class internally generates a deterministic finite +automaton (DFA) that models all possible mappings of machine instructions +to functional units as instructions are added to a packet. + +**Output**: Scheduling tables for GPU back-ends (Hexagon, AMD). + +**Usage**: Included directly on ``<Target>InstrInfo.cpp``. + +FastISel +-------- + +**Purpose**: This tablegen backend emits code for use by the "fast" +instruction selection algorithm. See the comments at the top of +lib/CodeGen/SelectionDAG/FastISel.cpp for background. This file +scans through the target's tablegen instruction-info files +and extracts instructions with obvious-looking patterns, and it emits +code to look up these instructions by type and operator. + +**Output**: Generates ``Predicate`` and ``FastEmit`` methods. + +**Usage**: Implements private methods of the targets' implementation +of ``FastISel`` class. + +Subtarget +--------- + +**Purpose**: Generate subtarget enumerations. + +**Output**: Enums, globals, local tables for sub-target information. + +**Usage**: Populates ``<Target>Subtarget`` and +``MCTargetDesc/<Target>MCTargetDesc`` files (both headers and source). + +Intrinsic +--------- + +**Purpose**: Generate (target) intrinsic information. + +OptParserDefs +------------- + +**Purpose**: Print enum values for a class. + +CTags +----- + +**Purpose**: This tablegen backend emits an index of definitions in ctags(1) +format. A helper script, utils/TableGen/tdtags, provides an easier-to-use +interface; run 'tdtags -H' for documentation. + +Clang BackEnds +============== + +ClangAttrClasses +---------------- + +**Purpose**: Creates Attrs.inc, which contains semantic attribute class +declarations for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``. +This file is included as part of ``Attr.h``. + +ClangAttrParserStringSwitches +----------------------------- + +**Purpose**: Creates AttrParserStringSwitches.inc, which contains +StringSwitch::Case statements for parser-related string switches. Each switch +is given its own macro (such as ``CLANG_ATTR_ARG_CONTEXT_LIST``, or +``CLANG_ATTR_IDENTIFIER_ARG_LIST``), which is expected to be defined before +including AttrParserStringSwitches.inc, and undefined after. + +ClangAttrImpl +------------- + +**Purpose**: Creates AttrImpl.inc, which contains semantic attribute class +definitions for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``. +This file is included as part of ``AttrImpl.cpp``. + +ClangAttrList +------------- + +**Purpose**: Creates AttrList.inc, which is used when a list of semantic +attribute identifiers is required. For instance, ``AttrKinds.h`` includes this +file to generate the list of ``attr::Kind`` enumeration values. This list is +separated out into multiple categories: attributes, inheritable attributes, and +inheritable parameter attributes. This categorization happens automatically +based on information in ``Attr.td`` and is used to implement the ``classof`` +functionality required for ``dyn_cast`` and similar APIs. + +ClangAttrPCHRead +---------------- + +**Purpose**: Creates AttrPCHRead.inc, which is used to deserialize attributes +in the ``ASTReader::ReadAttributes`` function. + +ClangAttrPCHWrite +----------------- + +**Purpose**: Creates AttrPCHWrite.inc, which is used to serialize attributes in +the ``ASTWriter::WriteAttributes`` function. + +ClangAttrSpellings +--------------------- + +**Purpose**: Creates AttrSpellings.inc, which is used to implement the +``__has_attribute`` feature test macro. + +ClangAttrSpellingListIndex +-------------------------- + +**Purpose**: Creates AttrSpellingListIndex.inc, which is used to map parsed +attribute spellings (including which syntax or scope was used) to an attribute +spelling list index. These spelling list index values are internal +implementation details exposed via +``AttributeList::getAttributeSpellingListIndex``. + +ClangAttrVisitor +------------------- + +**Purpose**: Creates AttrVisitor.inc, which is used when implementing +recursive AST visitors. + +ClangAttrTemplateInstantiate +---------------------------- + +**Purpose**: Creates AttrTemplateInstantiate.inc, which implements the +``instantiateTemplateAttribute`` function, used when instantiating a template +that requires an attribute to be cloned. + +ClangAttrParsedAttrList +----------------------- + +**Purpose**: Creates AttrParsedAttrList.inc, which is used to generate the +``AttributeList::Kind`` parsed attribute enumeration. + +ClangAttrParsedAttrImpl +----------------------- + +**Purpose**: Creates AttrParsedAttrImpl.inc, which is used by +``AttributeList.cpp`` to implement several functions on the ``AttributeList`` +class. This functionality is implemented via the ``AttrInfoMap ParsedAttrInfo`` +array, which contains one element per parsed attribute object. + +ClangAttrParsedAttrKinds +------------------------ + +**Purpose**: Creates AttrParsedAttrKinds.inc, which is used to implement the +``AttributeList::getKind`` function, mapping a string (and syntax) to a parsed +attribute ``AttributeList::Kind`` enumeration. + +ClangAttrDump +------------- + +**Purpose**: Creates AttrDump.inc, which dumps information about an attribute. +It is used to implement ``ASTDumper::dumpAttr``. + +ClangDiagsDefs +-------------- + +Generate Clang diagnostics definitions. + +ClangDiagGroups +--------------- + +Generate Clang diagnostic groups. + +ClangDiagsIndexName +------------------- + +Generate Clang diagnostic name index. + +ClangCommentNodes +----------------- + +Generate Clang AST comment nodes. + +ClangDeclNodes +-------------- + +Generate Clang AST declaration nodes. + +ClangStmtNodes +-------------- + +Generate Clang AST statement nodes. + +ClangSACheckers +--------------- + +Generate Clang Static Analyzer checkers. + +ClangCommentHTMLTags +-------------------- + +Generate efficient matchers for HTML tag names that are used in documentation comments. + +ClangCommentHTMLTagsProperties +------------------------------ + +Generate efficient matchers for HTML tag properties. + +ClangCommentHTMLNamedCharacterReferences +---------------------------------------- + +Generate function to translate named character references to UTF-8 sequences. + +ClangCommentCommandInfo +----------------------- + +Generate command properties for commands that are used in documentation comments. + +ClangCommentCommandList +----------------------- + +Generate list of commands that are used in documentation comments. + +ArmNeon +------- + +Generate arm_neon.h for clang. + +ArmNeonSema +----------- + +Generate ARM NEON sema support for clang. + +ArmNeonTest +----------- + +Generate ARM NEON tests for clang. + +AttrDocs +-------- + +**Purpose**: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is +used for documenting user-facing attributes. + +How to write a back-end +======================= + +TODO. + +Until we get a step-by-step HowTo for writing TableGen backends, you can at +least grab the boilerplate (build system, new files, etc.) from Clang's +r173931. + +TODO: How they work, how to write one. This section should not contain details +about any particular backend, except maybe ``-print-enums`` as an example. This +should highlight the APIs in ``TableGen/Record.h``. + diff --git a/docs/TableGen/Deficiencies.rst b/docs/TableGen/Deficiencies.rst new file mode 100644 index 0000000..a00aecd --- /dev/null +++ b/docs/TableGen/Deficiencies.rst @@ -0,0 +1,31 @@ +===================== +TableGen Deficiencies +===================== + +.. contents:: + :local: + +Introduction +============ + +Despite being very generic, TableGen has some deficiencies that have been +pointed out numerous times. The common theme is that, while TableGen allows +you to build Domain-Specific-Languages, the final languages that you create +lack the power of other DSLs, which in turn increase considerably the size +and complexity of TableGen files. + +At the same time, TableGen allows you to create virtually any meaning of +the basic concepts via custom-made back-ends, which can pervert the original +design and make it very hard for newcomers to understand it. + +There are some in favour of extending the semantics even more, but making sure +back-ends adhere to strict rules. Others suggesting we should move to more +powerful DSLs designed with specific purposes, or even re-using existing +DSLs. + +Known Problems +============== + +TODO: Add here frequently asked questions about why TableGen doesn't do +what you want, how it might, and how we could extend/restrict it to +be more use friendly. diff --git a/docs/TableGen/LangIntro.rst b/docs/TableGen/LangIntro.rst new file mode 100644 index 0000000..f139f35 --- /dev/null +++ b/docs/TableGen/LangIntro.rst @@ -0,0 +1,601 @@ +============================== +TableGen Language Introduction +============================== + +.. contents:: + :local: + +.. warning:: + This document is extremely rough. If you find something lacking, please + fix it, file a documentation bug, or ask about it on llvmdev. + +Introduction +============ + +This document is not meant to be a normative spec about the TableGen language +in and of itself (i.e. how to understand a given construct in terms of how +it affects the final set of records represented by the TableGen file). For +the formal language specification, see :doc:`LangRef`. + +TableGen syntax +=============== + +TableGen doesn't care about the meaning of data (that is up to the backend to +define), but it does care about syntax, and it enforces a simple type system. +This section describes the syntax and the constructs allowed in a TableGen file. + +TableGen primitives +------------------- + +TableGen comments +^^^^^^^^^^^^^^^^^ + +TableGen supports C++ style "``//``" comments, which run to the end of the +line, and it also supports **nestable** "``/* */``" comments. + +.. _TableGen type: + +The TableGen type system +^^^^^^^^^^^^^^^^^^^^^^^^ + +TableGen files are strongly typed, in a simple (but complete) type-system. +These types are used to perform automatic conversions, check for errors, and to +help interface designers constrain the input that they allow. Every `value +definition`_ is required to have an associated type. + +TableGen supports a mixture of very low-level types (such as ``bit``) and very +high-level types (such as ``dag``). This flexibility is what allows it to +describe a wide range of information conveniently and compactly. The TableGen +types are: + +``bit`` + A 'bit' is a boolean value that can hold either 0 or 1. + +``int`` + The 'int' type represents a simple 32-bit integer value, such as 5. + +``string`` + The 'string' type represents an ordered sequence of characters of arbitrary + length. + +``bits<n>`` + A 'bits' type is an arbitrary, but fixed, size integer that is broken up + into individual bits. This type is useful because it can handle some bits + being defined while others are undefined. + +``list<ty>`` + This type represents a list whose elements are some other type. The + contained type is arbitrary: it can even be another list type. + +Class type + Specifying a class name in a type context means that the defined value must + be a subclass of the specified class. This is useful in conjunction with + the ``list`` type, for example, to constrain the elements of the list to a + common base class (e.g., a ``list<Register>`` can only contain definitions + derived from the "``Register``" class). + +``dag`` + This type represents a nestable directed graph of elements. + +To date, these types have been sufficient for describing things that TableGen +has been used for, but it is straight-forward to extend this list if needed. + +.. _TableGen expressions: + +TableGen values and expressions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TableGen allows for a pretty reasonable number of different expression forms +when building up values. These forms allow the TableGen file to be written in a +natural syntax and flavor for the application. The current expression forms +supported include: + +``?`` + uninitialized field + +``0b1001011`` + binary integer value + +``07654321`` + octal integer value (indicated by a leading 0) + +``7`` + decimal integer value + +``0x7F`` + hexadecimal integer value + +``"foo"`` + string value + +``[{ ... }]`` + usually called a "code fragment", but is just a multiline string literal + +``[ X, Y, Z ]<type>`` + list value. <type> is the type of the list element and is usually optional. + In rare cases, TableGen is unable to deduce the element type in which case + the user must specify it explicitly. + +``{ a, b, c }`` + initializer for a "bits<3>" value + +``value`` + value reference + +``value{17}`` + access to one bit of a value + +``value{15-17}`` + access to multiple bits of a value + +``DEF`` + reference to a record definition + +``CLASS<val list>`` + reference to a new anonymous definition of CLASS with the specified template + arguments. + +``X.Y`` + reference to the subfield of a value + +``list[4-7,17,2-3]`` + A slice of the 'list' list, including elements 4,5,6,7,17,2, and 3 from it. + Elements may be included multiple times. + +``foreach <var> = [ <list> ] in { <body> }`` + +``foreach <var> = [ <list> ] in <def>`` + Replicate <body> or <def>, replacing instances of <var> with each value + in <list>. <var> is scoped at the level of the ``foreach`` loop and must + not conflict with any other object introduced in <body> or <def>. Currently + only ``def``\s are expanded within <body>. + +``foreach <var> = 0-15 in ...`` + +``foreach <var> = {0-15,32-47} in ...`` + Loop over ranges of integers. The braces are required for multiple ranges. + +``(DEF a, b)`` + a dag value. The first element is required to be a record definition, the + remaining elements in the list may be arbitrary other values, including + nested ```dag``' values. + +``!strconcat(a, b)`` + A string value that is the result of concatenating the 'a' and 'b' strings. + +``str1#str2`` + "#" (paste) is a shorthand for !strconcat. It may concatenate things that + are not quoted strings, in which case an implicit !cast<string> is done on + the operand of the paste. + +``!cast<type>(a)`` + A symbol of type *type* obtained by looking up the string 'a' in the symbol + table. If the type of 'a' does not match *type*, TableGen aborts with an + error. !cast<string> is a special case in that the argument must be an + object defined by a 'def' construct. + +``!subst(a, b, c)`` + If 'a' and 'b' are of string type or are symbol references, substitute 'b' + for 'a' in 'c.' This operation is analogous to $(subst) in GNU make. + +``!foreach(a, b, c)`` + For each member 'b' of dag or list 'a' apply operator 'c.' 'b' is a dummy + variable that should be declared as a member variable of an instantiated + class. This operation is analogous to $(foreach) in GNU make. + +``!head(a)`` + The first element of list 'a.' + +``!tail(a)`` + The 2nd-N elements of list 'a.' + +``!empty(a)`` + An integer {0,1} indicating whether list 'a' is empty. + +``!if(a,b,c)`` + 'b' if the result of 'int' or 'bit' operator 'a' is nonzero, 'c' otherwise. + +``!eq(a,b)`` + 'bit 1' if string a is equal to string b, 0 otherwise. This only operates + on string, int and bit objects. Use !cast<string> to compare other types of + objects. + +Note that all of the values have rules specifying how they convert to values +for different types. These rules allow you to assign a value like "``7``" +to a "``bits<4>``" value, for example. + +Classes and definitions +----------------------- + +As mentioned in the :doc:`introduction <index>`, classes and definitions (collectively known as +'records') in TableGen are the main high-level unit of information that TableGen +collects. Records are defined with a ``def`` or ``class`` keyword, the record +name, and an optional list of "`template arguments`_". If the record has +superclasses, they are specified as a comma separated list that starts with a +colon character ("``:``"). If `value definitions`_ or `let expressions`_ are +needed for the class, they are enclosed in curly braces ("``{}``"); otherwise, +the record ends with a semicolon. + +Here is a simple TableGen file: + +.. code-block:: llvm + + class C { bit V = 1; } + def X : C; + def Y : C { + string Greeting = "hello"; + } + +This example defines two definitions, ``X`` and ``Y``, both of which derive from +the ``C`` class. Because of this, they both get the ``V`` bit value. The ``Y`` +definition also gets the Greeting member as well. + +In general, classes are useful for collecting together the commonality between a +group of records and isolating it in a single place. Also, classes permit the +specification of default values for their subclasses, allowing the subclasses to +override them as they wish. + +.. _value definition: +.. _value definitions: + +Value definitions +^^^^^^^^^^^^^^^^^ + +Value definitions define named entries in records. A value must be defined +before it can be referred to as the operand for another value definition or +before the value is reset with a `let expression`_. A value is defined by +specifying a `TableGen type`_ and a name. If an initial value is available, it +may be specified after the type with an equal sign. Value definitions require +terminating semicolons. + +.. _let expression: +.. _let expressions: +.. _"let" expressions within a record: + +'let' expressions +^^^^^^^^^^^^^^^^^ + +A record-level let expression is used to change the value of a value definition +in a record. This is primarily useful when a superclass defines a value that a +derived class or definition wants to override. Let expressions consist of the +'``let``' keyword followed by a value name, an equal sign ("``=``"), and a new +value. For example, a new class could be added to the example above, redefining +the ``V`` field for all of its subclasses: + +.. code-block:: llvm + + class D : C { let V = 0; } + def Z : D; + +In this case, the ``Z`` definition will have a zero value for its ``V`` value, +despite the fact that it derives (indirectly) from the ``C`` class, because the +``D`` class overrode its value. + +.. _template arguments: + +Class template arguments +^^^^^^^^^^^^^^^^^^^^^^^^ + +TableGen permits the definition of parameterized classes as well as normal +concrete classes. Parameterized TableGen classes specify a list of variable +bindings (which may optionally have defaults) that are bound when used. Here is +a simple example: + +.. code-block:: llvm + + class FPFormat<bits<3> val> { + bits<3> Value = val; + } + def NotFP : FPFormat<0>; + def ZeroArgFP : FPFormat<1>; + def OneArgFP : FPFormat<2>; + def OneArgFPRW : FPFormat<3>; + def TwoArgFP : FPFormat<4>; + def CompareFP : FPFormat<5>; + def CondMovFP : FPFormat<6>; + def SpecialFP : FPFormat<7>; + +In this case, template arguments are used as a space efficient way to specify a +list of "enumeration values", each with a "``Value``" field set to the specified +integer. + +The more esoteric forms of `TableGen expressions`_ are useful in conjunction +with template arguments. As an example: + +.. code-block:: llvm + + class ModRefVal<bits<2> val> { + bits<2> Value = val; + } + + def None : ModRefVal<0>; + def Mod : ModRefVal<1>; + def Ref : ModRefVal<2>; + def ModRef : ModRefVal<3>; + + class Value<ModRefVal MR> { + // Decode some information into a more convenient format, while providing + // a nice interface to the user of the "Value" class. + bit isMod = MR.Value{0}; + bit isRef = MR.Value{1}; + + // other stuff... + } + + // Example uses + def bork : Value<Mod>; + def zork : Value<Ref>; + def hork : Value<ModRef>; + +This is obviously a contrived example, but it shows how template arguments can +be used to decouple the interface provided to the user of the class from the +actual internal data representation expected by the class. In this case, +running ``llvm-tblgen`` on the example prints the following definitions: + +.. code-block:: llvm + + def bork { // Value + bit isMod = 1; + bit isRef = 0; + } + def hork { // Value + bit isMod = 1; + bit isRef = 1; + } + def zork { // Value + bit isMod = 0; + bit isRef = 1; + } + +This shows that TableGen was able to dig into the argument and extract a piece +of information that was requested by the designer of the "Value" class. For +more realistic examples, please see existing users of TableGen, such as the X86 +backend. + +Multiclass definitions and instances +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +While classes with template arguments are a good way to factor commonality +between two instances of a definition, multiclasses allow a convenient notation +for defining multiple definitions at once (instances of implicitly constructed +classes). For example, consider an 3-address instruction set whose instructions +come in two forms: "``reg = reg op reg``" and "``reg = reg op imm``" +(e.g. SPARC). In this case, you'd like to specify in one place that this +commonality exists, then in a separate place indicate what all the ops are. + +Here is an example TableGen fragment that shows this idea: + +.. code-block:: llvm + + def ops; + def GPR; + def Imm; + class inst<int opc, string asmstr, dag operandlist>; + + multiclass ri_inst<int opc, string asmstr> { + def _rr : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"), + (ops GPR:$dst, GPR:$src1, GPR:$src2)>; + def _ri : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"), + (ops GPR:$dst, GPR:$src1, Imm:$src2)>; + } + + // Instantiations of the ri_inst multiclass. + defm ADD : ri_inst<0b111, "add">; + defm SUB : ri_inst<0b101, "sub">; + defm MUL : ri_inst<0b100, "mul">; + ... + +The name of the resultant definitions has the multidef fragment names appended +to them, so this defines ``ADD_rr``, ``ADD_ri``, ``SUB_rr``, etc. A defm may +inherit from multiple multiclasses, instantiating definitions from each +multiclass. Using a multiclass this way is exactly equivalent to instantiating +the classes multiple times yourself, e.g. by writing: + +.. code-block:: llvm + + def ops; + def GPR; + def Imm; + class inst<int opc, string asmstr, dag operandlist>; + + class rrinst<int opc, string asmstr> + : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"), + (ops GPR:$dst, GPR:$src1, GPR:$src2)>; + + class riinst<int opc, string asmstr> + : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"), + (ops GPR:$dst, GPR:$src1, Imm:$src2)>; + + // Instantiations of the ri_inst multiclass. + def ADD_rr : rrinst<0b111, "add">; + def ADD_ri : riinst<0b111, "add">; + def SUB_rr : rrinst<0b101, "sub">; + def SUB_ri : riinst<0b101, "sub">; + def MUL_rr : rrinst<0b100, "mul">; + def MUL_ri : riinst<0b100, "mul">; + ... + +A ``defm`` can also be used inside a multiclass providing several levels of +multiclass instantiations. + +.. code-block:: llvm + + class Instruction<bits<4> opc, string Name> { + bits<4> opcode = opc; + string name = Name; + } + + multiclass basic_r<bits<4> opc> { + def rr : Instruction<opc, "rr">; + def rm : Instruction<opc, "rm">; + } + + multiclass basic_s<bits<4> opc> { + defm SS : basic_r<opc>; + defm SD : basic_r<opc>; + def X : Instruction<opc, "x">; + } + + multiclass basic_p<bits<4> opc> { + defm PS : basic_r<opc>; + defm PD : basic_r<opc>; + def Y : Instruction<opc, "y">; + } + + defm ADD : basic_s<0xf>, basic_p<0xf>; + ... + + // Results + def ADDPDrm { ... + def ADDPDrr { ... + def ADDPSrm { ... + def ADDPSrr { ... + def ADDSDrm { ... + def ADDSDrr { ... + def ADDY { ... + def ADDX { ... + +``defm`` declarations can inherit from classes too, the rule to follow is that +the class list must start after the last multiclass, and there must be at least +one multiclass before them. + +.. code-block:: llvm + + class XD { bits<4> Prefix = 11; } + class XS { bits<4> Prefix = 12; } + + class I<bits<4> op> { + bits<4> opcode = op; + } + + multiclass R { + def rr : I<4>; + def rm : I<2>; + } + + multiclass Y { + defm SS : R, XD; + defm SD : R, XS; + } + + defm Instr : Y; + + // Results + def InstrSDrm { + bits<4> opcode = { 0, 0, 1, 0 }; + bits<4> Prefix = { 1, 1, 0, 0 }; + } + ... + def InstrSSrr { + bits<4> opcode = { 0, 1, 0, 0 }; + bits<4> Prefix = { 1, 0, 1, 1 }; + } + +File scope entities +------------------- + +File inclusion +^^^^^^^^^^^^^^ + +TableGen supports the '``include``' token, which textually substitutes the +specified file in place of the include directive. The filename should be +specified as a double quoted string immediately after the '``include``' keyword. +Example: + +.. code-block:: llvm + + include "foo.td" + +'let' expressions +^^^^^^^^^^^^^^^^^ + +"Let" expressions at file scope are similar to `"let" expressions within a +record`_, except they can specify a value binding for multiple records at a +time, and may be useful in certain other cases. File-scope let expressions are +really just another way that TableGen allows the end-user to factor out +commonality from the records. + +File-scope "let" expressions take a comma-separated list of bindings to apply, +and one or more records to bind the values in. Here are some examples: + +.. code-block:: llvm + + let isTerminator = 1, isReturn = 1, isBarrier = 1, hasCtrlDep = 1 in + def RET : I<0xC3, RawFrm, (outs), (ins), "ret", [(X86retflag 0)]>; + + let isCall = 1 in + // All calls clobber the non-callee saved registers... + let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, + MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, + XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, EFLAGS] in { + def CALLpcrel32 : Ii32<0xE8, RawFrm, (outs), (ins i32imm:$dst,variable_ops), + "call\t${dst:call}", []>; + def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst, variable_ops), + "call\t{*}$dst", [(X86call GR32:$dst)]>; + def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops), + "call\t{*}$dst", []>; + } + +File-scope "let" expressions are often useful when a couple of definitions need +to be added to several records, and the records do not otherwise need to be +opened, as in the case with the ``CALL*`` instructions above. + +It's also possible to use "let" expressions inside multiclasses, providing more +ways to factor out commonality from the records, specially if using several +levels of multiclass instantiations. This also avoids the need of using "let" +expressions within subsequent records inside a multiclass. + +.. code-block:: llvm + + multiclass basic_r<bits<4> opc> { + let Predicates = [HasSSE2] in { + def rr : Instruction<opc, "rr">; + def rm : Instruction<opc, "rm">; + } + let Predicates = [HasSSE3] in + def rx : Instruction<opc, "rx">; + } + + multiclass basic_ss<bits<4> opc> { + let IsDouble = 0 in + defm SS : basic_r<opc>; + + let IsDouble = 1 in + defm SD : basic_r<opc>; + } + + defm ADD : basic_ss<0xf>; + +Looping +^^^^^^^ + +TableGen supports the '``foreach``' block, which textually replicates the loop +body, substituting iterator values for iterator references in the body. +Example: + +.. code-block:: llvm + + foreach i = [0, 1, 2, 3] in { + def R#i : Register<...>; + def F#i : Register<...>; + } + +This will create objects ``R0``, ``R1``, ``R2`` and ``R3``. ``foreach`` blocks +may be nested. If there is only one item in the body the braces may be +elided: + +.. code-block:: llvm + + foreach i = [0, 1, 2, 3] in + def R#i : Register<...>; + +Code Generator backend info +=========================== + +Expressions used by code generator to describe instructions and isel patterns: + +``(implicit a)`` + an implicitly defined physical register. This tells the dag instruction + selection emitter the input pattern's extra definitions matches implicit + physical register definitions. + diff --git a/docs/TableGen/LangRef.rst b/docs/TableGen/LangRef.rst index bd28a90..e3db3aa 100644 --- a/docs/TableGen/LangRef.rst +++ b/docs/TableGen/LangRef.rst @@ -74,6 +74,9 @@ TableGen also has two string-like literals: TokString: '"' <non-'"' characters and C-like escapes> '"' TokCodeFragment: "[{" <shortest text not containing "}]"> "}]" +:token:`TokCodeFragment` is essentially a multiline string literal +delimited by ``[{`` and ``}]``. + .. note:: The current implementation accepts the following C-like escapes:: diff --git a/docs/TableGen/index.rst b/docs/TableGen/index.rst new file mode 100644 index 0000000..0860afa --- /dev/null +++ b/docs/TableGen/index.rst @@ -0,0 +1,309 @@ +======== +TableGen +======== + +.. contents:: + :local: + +.. toctree:: + :hidden: + + BackEnds + LangRef + LangIntro + Deficiencies + +Introduction +============ + +TableGen's purpose is to help a human develop and maintain records of +domain-specific information. Because there may be a large number of these +records, it is specifically designed to allow writing flexible descriptions and +for common features of these records to be factored out. This reduces the +amount of duplication in the description, reduces the chance of error, and makes +it easier to structure domain specific information. + +The core part of TableGen parses a file, instantiates the declarations, and +hands the result off to a domain-specific `backend`_ for processing. + +The current major users of TableGen are :doc:`../CodeGenerator` +and the +`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_. + +Note that if you work on TableGen much, and use emacs or vim, that you can find +an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and +``llvm/utils/vim`` directories of your LLVM distribution, respectively. + +.. _intro: + + +The TableGen program +==================== + +TableGen files are interpreted by the TableGen program: `llvm-tblgen` available +on your build directory under `bin`. It is not installed in the system (or where +your sysroot is set to), since it has no use beyond LLVM's build process. + +Running TableGen +---------------- + +TableGen runs just like any other LLVM tool. The first (optional) argument +specifies the file to read. If a filename is not specified, ``llvm-tblgen`` +reads from standard input. + +To be useful, one of the `backends`_ must be used. These backends are +selectable on the command line (type '``llvm-tblgen -help``' for a list). For +example, to get a list of all of the definitions that subclass a particular type +(which can be useful for building up an enum list of these records), use the +``-print-enums`` option: + +.. code-block:: bash + + $ llvm-tblgen X86.td -print-enums -class=Register + AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX, + ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP, + MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D, + R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15, + R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI, + RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7, + XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5, + XMM6, XMM7, XMM8, XMM9, + + $ llvm-tblgen X86.td -print-enums -class=Instruction + ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri, + ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8, + ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm, + ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr, + ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ... + +The default backend prints out all of the records. + +If you plan to use TableGen, you will most likely have to write a `backend`_ +that extracts the information specific to what you need and formats it in the +appropriate way. + +Example +------- + +With no other arguments, `llvm-tblgen` parses the specified file and prints out all +of the classes, then all of the definitions. This is a good way to see what the +various definitions expand to fully. Running this on the ``X86.td`` file prints +this (at the time of this writing): + +.. code-block:: llvm + + ... + def ADD32rr { // Instruction X86Inst I + string Namespace = "X86"; + dag OutOperandList = (outs GR32:$dst); + dag InOperandList = (ins GR32:$src1, GR32:$src2); + string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}"; + list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]; + list<Register> Uses = []; + list<Register> Defs = [EFLAGS]; + list<Predicate> Predicates = []; + int CodeSize = 3; + int AddedComplexity = 0; + bit isReturn = 0; + bit isBranch = 0; + bit isIndirectBranch = 0; + bit isBarrier = 0; + bit isCall = 0; + bit canFoldAsLoad = 0; + bit mayLoad = 0; + bit mayStore = 0; + bit isImplicitDef = 0; + bit isConvertibleToThreeAddress = 1; + bit isCommutable = 1; + bit isTerminator = 0; + bit isReMaterializable = 0; + bit isPredicable = 0; + bit hasDelaySlot = 0; + bit usesCustomInserter = 0; + bit hasCtrlDep = 0; + bit isNotDuplicable = 0; + bit hasSideEffects = 0; + bit neverHasSideEffects = 0; + InstrItinClass Itinerary = NoItinerary; + string Constraints = ""; + string DisableEncoding = ""; + bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 }; + Format Form = MRMDestReg; + bits<6> FormBits = { 0, 0, 0, 0, 1, 1 }; + ImmType ImmT = NoImm; + bits<3> ImmTypeBits = { 0, 0, 0 }; + bit hasOpSizePrefix = 0; + bit hasAdSizePrefix = 0; + bits<4> Prefix = { 0, 0, 0, 0 }; + bit hasREX_WPrefix = 0; + FPFormat FPForm = ?; + bits<3> FPFormBits = { 0, 0, 0 }; + } + ... + +This definition corresponds to the 32-bit register-register ``add`` instruction +of the x86 architecture. ``def ADD32rr`` defines a record named +``ADD32rr``, and the comment at the end of the line indicates the superclasses +of the definition. The body of the record contains all of the data that +TableGen assembled for the record, indicating that the instruction is part of +the "X86" namespace, the pattern indicating how the instruction is selected by +the code generator, that it is a two-address instruction, has a particular +encoding, etc. The contents and semantics of the information in the record are +specific to the needs of the X86 backend, and are only shown as an example. + +As you can see, a lot of information is needed for every instruction supported +by the code generator, and specifying it all manually would be unmaintainable, +prone to bugs, and tiring to do in the first place. Because we are using +TableGen, all of the information was derived from the following definition: + +.. code-block:: llvm + + let Defs = [EFLAGS], + isCommutable = 1, // X = ADD Y,Z --> X = ADD Z,Y + isConvertibleToThreeAddress = 1 in // Can transform into LEA. + def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst), + (ins GR32:$src1, GR32:$src2), + "add{l}\t{$src2, $dst|$dst, $src2}", + [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>; + +This definition makes use of the custom class ``I`` (extended from the custom +class ``X86Inst``), which is defined in the X86-specific TableGen file, to +factor out the common features that instructions of its class share. A key +feature of TableGen is that it allows the end-user to define the abstractions +they prefer to use when describing their information. + +Each ``def`` record has a special entry called "NAME". This is the name of the +record ("``ADD32rr``" above). In the general case ``def`` names can be formed +from various kinds of string processing expressions and ``NAME`` resolves to the +final value obtained after resolving all of those expressions. The user may +refer to ``NAME`` anywhere she desires to use the ultimate name of the ``def``. +``NAME`` should not be defined anywhere else in user code to avoid conflicts. + +Syntax +====== + +TableGen has a syntax that is loosely based on C++ templates, with built-in +types and specification. In addition, TableGen's syntax introduces some +automation concepts like multiclass, foreach, let, etc. + +Basic concepts +-------------- + +TableGen files consist of two key parts: 'classes' and 'definitions', both of +which are considered 'records'. + +**TableGen records** have a unique name, a list of values, and a list of +superclasses. The list of values is the main data that TableGen builds for each +record; it is this that holds the domain specific information for the +application. The interpretation of this data is left to a specific `backend`_, +but the structure and format rules are taken care of and are fixed by +TableGen. + +**TableGen definitions** are the concrete form of 'records'. These generally do +not have any undefined values, and are marked with the '``def``' keyword. + +.. code-block:: llvm + + def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true", + "Enable ARMv8 FP">; + +In this example, FeatureFPARMv8 is ``SubtargetFeature`` record initialised +with some values. The names of the classes are defined via the +keyword `class` either on the same file or some other included. Most target +TableGen files include the generic ones in ``include/llvm/Target``. + +**TableGen classes** are abstract records that are used to build and describe +other records. These classes allow the end-user to build abstractions for +either the domain they are targeting (such as "Register", "RegisterClass", and +"Instruction" in the LLVM code generator) or for the implementor to help factor +out common properties of records (such as "FPInst", which is used to represent +floating point instructions in the X86 backend). TableGen keeps track of all of +the classes that are used to build up a definition, so the backend can find all +definitions of a particular class, such as "Instruction". + +.. code-block:: llvm + + class ProcNoItin<string Name, list<SubtargetFeature> Features> + : Processor<Name, NoItineraries, Features>; + +Here, the class ProcNoItin, receiving parameters `Name` of type `string` and +a list of target features is specializing the class Processor by passing the +arguments down as well as hard-coding NoItineraries. + +**TableGen multiclasses** are groups of abstract records that are instantiated +all at once. Each instantiation can result in multiple TableGen definitions. +If a multiclass inherits from another multiclass, the definitions in the +sub-multiclass become part of the current multiclass, as if they were declared +in the current multiclass. + +.. code-block:: llvm + + multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend, + dag address, ValueType sty> { + def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)), + (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset") + Base, Offset, Extend)>; + + def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)), + (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset") + Base, Offset, Extend)>; + } + + defm : ro_signed_pats<"B", Rm, Base, Offset, Extend, + !foreach(decls.pattern, address, + !subst(SHIFT, imm_eq0, decls.pattern)), + i8>; + + + +See the :doc:`TableGen Language Introduction <LangIntro>` for more generic +information on the usage of the language, and the +:doc:`TableGen Language Reference <LangRef>` for more in-depth description +of the formal language specification. + +.. _backend: +.. _backends: + +TableGen backends +================= + +TableGen files have no real meaning without a back-end. The default operation +of running ``llvm-tblgen`` is to print the information in a textual format, but +that's only useful for debugging of the TableGen files themselves. The power +in TableGen is, however, to interpret the source files into an internal +representation that can be generated into anything you want. + +Current usage of TableGen is to create include huge files with tables that you +can either include directly (if the output is in the language you're coding), +or be used in pre-processing via macros surrounding the include of the file. + +Direct output can be used if the back-end already prints a table in C format +or if the output is just a list of strings (for error and warning messages). +Pre-processed output should be used if the same information needs to be used +in different contexts (like Instruction names), so your back-end should print +a meta-information list that can be shaped into different compile-time formats. + +See the `TableGen BackEnds <BackEnds.html>`_ for more information. + +TableGen Deficiencies +===================== + +Despite being very generic, TableGen has some deficiencies that have been +pointed out numerous times. The common theme is that, while TableGen allows +you to build Domain-Specific-Languages, the final languages that you create +lack the power of other DSLs, which in turn increase considerably the size +and complecity of TableGen files. + +At the same time, TableGen allows you to create virtually any meaning of +the basic concepts via custom-made back-ends, which can pervert the original +design and make it very hard for newcomers to understand the evil TableGen +file. + +There are some in favour of extending the semantics even more, but making sure +back-ends adhere to strict rules. Others are suggesting we should move to less, +more powerful DSLs designed with specific purposes, or even re-using existing +DSLs. + +Either way, this is a discussion that will likely span across several years, +if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_ +document. |