Diffstat (limited to 'docs')
28 files changed, 795 insertions, 228 deletions
diff --git a/docs/CMake.rst b/docs/CMake.rst
index 6eab04b..8459081 100644
--- a/docs/CMake.rst
+++ b/docs/CMake.rst
@@ -168,8 +168,8 @@ LLVM-specific variables

 **LLVM_TARGETS_TO_BUILD**:STRING
   Semicolon-separated list of targets to build, or *all* for building all
-  targets. Case-sensitive. For Visual C++ defaults to *X86*. On the other cases
-  defaults to *all*. Example: ``-DLLVM_TARGETS_TO_BUILD="X86;PowerPC"``.
+  targets. Case-sensitive. Defaults to *all*. Example:
+  ``-DLLVM_TARGETS_TO_BUILD="X86;PowerPC"``.

 **LLVM_BUILD_TOOLS**:BOOL
   Build LLVM tools. Defaults to ON. Targets for building each tool are generated
@@ -204,7 +204,7 @@ LLVM-specific variables
   tests.

 **LLVM_APPEND_VC_REV**:BOOL
-  Append version control revision info (svn revision number or git revision id)
+  Append version control revision info (svn revision number or Git revision id)
   to LLVM version string (stored in the PACKAGE_VERSION macro). For this to work
   cmake must be invoked before the build. Defaults to OFF.

@@ -271,6 +271,10 @@ LLVM-specific variables
 **LLVM_USE_INTEL_JITEVENTS**:BOOL
   Enable building support for Intel JIT Events API. Defaults to OFF

+**LLVM_ENABLE_ZLIB**:BOOL
+  Build with zlib to support compression/uncompression in LLVM tools.
+  Defaults to ON.
+
 Executing the test suite
 ========================

diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst
index b5d4180..75415ab 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -1038,6 +1038,24 @@ for your target. It has the following strengths:
   are used to manipulate the input immediate (in this case, take the high or low
   16-bits of the immediate).

+* When using the 'Pat' class to map a pattern to an instruction that has one
+  or more complex operands (like e.g. `X86 addressing mode`_), the pattern may
+  either specify the operand as a whole using a ``ComplexPattern``, or else it
+  may specify the components of the complex operand separately. The latter is
+  done e.g. for pre-increment instructions by the PowerPC back end:
+
+  ::
+
+    def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
+                    "stwu $rS, $dst", LdStStoreUpd, []>,
+                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
+
+    def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
+              (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
+
+  Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the
+  complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction.
+
 * While the system does automate a lot, it still allows you to write custom C++
   code to match special cases if there is something that is hard to express.

diff --git a/docs/CommandGuide/index.rst b/docs/CommandGuide/index.rst
index ac8a944..b3b4bc3 100644
--- a/docs/CommandGuide/index.rst
+++ b/docs/CommandGuide/index.rst
@@ -50,3 +50,4 @@ Developer Tools
    tblgen
    lit
    llvm-build
+   llvm-readobj
diff --git a/docs/CommandGuide/llc.rst b/docs/CommandGuide/llc.rst
index 70354b0..e6a5976 100644
--- a/docs/CommandGuide/llc.rst
+++ b/docs/CommandGuide/llc.rst
@@ -69,6 +69,14 @@ End-user Options

    llvm-as < /dev/null | llc -march=xyz -mcpu=help

+.. option:: -filetype=<output file type>
+
+ Specify what kind of output ``llc`` should generated. Options are: ``asm``
+ for textual assembly ( ``'.s'``), ``obj`` for native object files (``'.o'``)
+ and ``null`` for not emitting anything (for performance testing).
+
+ Note that not all targets support all options.
+
 .. option:: -mattr=a1,+a2,-a3,...

  Override or control specific attributes of the target, such as whether SIMD
diff --git a/docs/CommandGuide/lli.rst b/docs/CommandGuide/lli.rst
index 7cc1284..a9aaf31 100644
--- a/docs/CommandGuide/lli.rst
+++ b/docs/CommandGuide/lli.rst
@@ -50,7 +50,7 @@ GENERAL OPTIONS



-**-load**\ =\ *puginfilename*
+**-load**\ =\ *pluginfilename*

  Causes **lli** to load the plugin (shared object) named *pluginfilename* and use
  it for optimization.
diff --git a/docs/CommandGuide/llvm-link.rst b/docs/CommandGuide/llvm-link.rst
index e4f2228..3bcfa68 100644
--- a/docs/CommandGuide/llvm-link.rst
+++ b/docs/CommandGuide/llvm-link.rst
@@ -1,5 +1,5 @@
-llvm-link - LLVM linker
-=======================
+llvm-link - LLVM bitcode linker
+===============================

 SYNOPSIS
 --------
@@ -13,23 +13,9 @@ DESCRIPTION
 into a single LLVM bitcode file. It writes the output file to standard output,
 unless the :option:`-o` option is used to specify a filename.

-:program:`llvm-link` attempts to load the input files from the current
-directory. If that fails, it looks for each file in each of the directories
-specified by the :option:`-L` options on the command line. The library search
-paths are global; each one is searched for every input file if necessary. The
-directories are searched in the order they were specified on the command line.
-
 OPTIONS
 -------

-.. option:: -L directory
-
- Add the specified ``directory`` to the library search path. When looking for
- libraries, :program:`llvm-link` will look in path name for libraries. This
- option can be specified multiple times; :program:`llvm-link` will search
- inside these directories in the order in which they were specified on the
- command line.
-
 .. option:: -f

  Enable binary output on terminals. Normally, :program:`llvm-link` will refuse
@@ -48,8 +34,8 @@ OPTIONS

 .. option:: -d

- If specified, :program:`llvm-link` prints a human-readable version of the output
- bitcode file to standard error.
+ If specified, :program:`llvm-link` prints a human-readable version of the
+ output bitcode file to standard error.

 .. option:: -help

@@ -67,8 +53,4 @@ EXIT STATUS
 If :program:`llvm-link` succeeds, it will exit with 0. Otherwise, if an error
 occurs, it will exit with a non-zero value.
-SEE ALSO
---------
-
-gccld
diff --git a/docs/CommandGuide/llvm-readobj.rst b/docs/CommandGuide/llvm-readobj.rst
new file mode 100644
index 0000000..b1918b5
--- /dev/null
+++ b/docs/CommandGuide/llvm-readobj.rst
@@ -0,0 +1,86 @@
+llvm-readobj - LLVM Object Reader
+=================================
+
+SYNOPSIS
+--------
+
+:program:`llvm-readobj` [*options*] [*input...*]
+
+DESCRIPTION
+-----------
+
+The :program:`llvm-readobj` tool displays low-level format-specific information
+about one or more object files. The tool and its output is primarily designed
+for use in FileCheck-based tests.
+
+OPTIONS
+-------
+
+If ``input`` is "``-``" or omitted, :program:`llvm-readobj` reads from standard
+input. Otherwise, it will read from the specified ``filenames``.
+
+.. option:: -help
+
+ Print a summary of command line options.
+
+.. option:: -version
+
+ Display the version of this program
+
+.. option:: -file-headers, -h
+
+ Display file headers.
+
+.. option:: -sections, -s
+
+ Display all sections.
+
+.. option:: -section-data, -sd
+
+ When used with ``-sections``, display section data for each section shown.
+
+.. option:: -section-relocations, -sr
+
+ When used with ``-sections``, display relocations for each section shown.
+
+.. option:: -section-symbols, -st
+
+ When used with ``-sections``, display symbols for each section shown.
+
+.. option:: -relocations, -r
+
+ Display the relocation entries in the file.
+
+.. option:: -symbols, -t
+
+ Display the symbol table.
+
+.. option:: -dyn-symbols
+
+ Display the dynamic symbol table (only for ELF object files).
+
+.. option:: -unwind, -u
+
+ Display unwind information.
+
+.. option:: -expand-relocs
+
+ When used with ``-relocations``, display each relocation in an expanded
+ multi-line format.
+
+.. option:: -dynamic-table
+
+ Display the ELF .dynamic section table (only for ELF object files).
+
+.. option:: -needed-libs
+
+ Display the needed libraries (only for ELF object files).
+
+.. option:: -program-headers
+
+ Display the ELF program headers (only for ELF object files).
+
+EXIT STATUS
+-----------
+
+:program:`llvm-readobj` returns 0.
diff --git a/docs/CommandGuide/tblgen.rst b/docs/CommandGuide/tblgen.rst
index 1858ee4..1c46828 100644
--- a/docs/CommandGuide/tblgen.rst
+++ b/docs/CommandGuide/tblgen.rst
@@ -23,6 +23,8 @@ file to read as input.
 OPTIONS
 -------

+.. program:: tblgen
+
 .. option:: -help

  Print a summary of command line options.
@@ -56,7 +58,7 @@ OPTIONS

 .. option:: -print-enums

- Print enumeration values for a class
+ Print enumeration values for a class.

 .. option:: -print-sets

diff --git a/docs/CommandLine.rst b/docs/CommandLine.rst
index 073958b..ea0c369 100644
--- a/docs/CommandLine.rst
+++ b/docs/CommandLine.rst
@@ -2,6 +2,9 @@
 CommandLine 2.0 Library Manual
 ==============================

+.. contents::
+   :local:
+
 Introduction
 ============

diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst
index 87add67..e9a7bc8 100644
--- a/docs/CompilerWriterInfo.rst
+++ b/docs/CompilerWriterInfo.rst
@@ -20,11 +20,15 @@ ARM

 * `ABI <http://www.arm.com/products/DevTools/ABI.html>`_

+* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053a/IHI0053A_acle.pdf>`_
+
 AArch64
 -------

 * `ARMv8 Instruction Set Overview <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.genc010197a/index.html>`_

+* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053a/IHI0053A_acle.pdf>`_
+
 Itanium (ia64)
 --------------

@@ -107,6 +111,12 @@ OS X
 * `Mach-O Runtime Architecture <http://developer.apple.com/documentation/Darwin/RuntimeArchitecture-date.html>`_
 * `Notes on Mach-O ABI <http://www.unsanity.org/archives/000044.php>`_

+NVPTX
+=====
+
+* `CUDA Documentation <http://docs.nvidia.com/cuda/index.html>`_ includes the PTX
+  ISA and Driver API documentation
+
 Miscellaneous Resources
 =======================

diff --git a/docs/DeveloperPolicy.rst b/docs/DeveloperPolicy.rst
index 43bdc85..0655559 100644
--- a/docs/DeveloperPolicy.rst
+++ b/docs/DeveloperPolicy.rst
@@ -260,7 +260,7 @@ quality patches. If you would like commit access, please send an email to
    from, e.g. "J. Random Hacker <hacker@yoyodyne.com>".

 #. A "password hash" of the password you want to use, e.g. "``2ACR96qjUqsyM``".
-   Note that you don't ever tell us what your password is, you just give it to
+   Note that you don't ever tell us what your password is; you just give it to
    us in an encrypted form. To get this, run "``htpasswd``" (a utility that
    comes with apache) in crypt mode (often enabled with "``-d``"), or find a web
    page that will do it for you.
@@ -269,17 +269,17 @@ Once you've been granted commit access, you should be able to check out an
 LLVM tree with an SVN URL of "https://username@llvm.org/..." instead of the
 normal anonymous URL of "http://llvm.org/...". The first time you commit you'll
 have to type in your password. Note that you may get a warning from SVN about an
-untrusted key, you can ignore this. To verify that your commit access works,
+untrusted key; you can ignore this. To verify that your commit access works,
 please do a test commit (e.g. change a comment or add a blank line). Your first
 commit to a repository may require the autogenerated email to be approved by a
-mailing list. This is normal, and will be done when the mailing list owner has
+mailing list. This is normal and will be done when the mailing list owner has
 time.

 If you have recently been granted commit access, these policies apply:

 #. You are granted *commit-after-approval* to all parts of LLVM. To get
    approval, submit a `patch`_ to `llvm-commits
-   <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_. When approved
+   <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_. When approved,
    you may commit it yourself.

 #. You are allowed to commit patches without approval which you think are
@@ -291,7 +291,7 @@ If you have recently been granted commit access, these policies apply:
 #. You are allowed to commit patches without approval to those portions of LLVM
    that you have contributed or maintain (i.e., have been assigned
    responsibility for), with the proviso that such commits must not break the
-   build. This is a "trust but verify" policy and commits of this nature are
+   build. This is a "trust but verify" policy, and commits of this nature are
    reviewed after they are committed.

 #. Multiple violations of these policies or a single egregious violation may
@@ -300,7 +300,7 @@ If you have recently been granted commit access, these policies apply:
 In any case, your changes are still subject to `code review`_ (either before or
 after they are committed, depending on the nature of the change). You are
 encouraged to review other peoples' patches as well, but you aren't required
-to.
+to do so.

 .. _discuss the change/gather consensus:

diff --git a/docs/ExtendingLLVM.rst b/docs/ExtendingLLVM.rst
index 3d8e9ee..3ae676a 100644
--- a/docs/ExtendingLLVM.rst
+++ b/docs/ExtendingLLVM.rst
@@ -45,7 +45,7 @@ function and then be turned into an instruction if warranted.
    what the restrictions are. Talk to other people about it so that you are
    sure it's a good idea.

-#. ``llvm/include/llvm/Intrinsics*.td``:
+#. ``llvm/include/llvm/IR/Intrinsics*.td``:

    Add an entry for your intrinsic. Describe its memory access characteristics
    for optimization (this controls whether it will be DCE'd, CSE'd, etc). Note
diff --git a/docs/Extensions.rst b/docs/Extensions.rst
new file mode 100644
index 0000000..062804a
--- /dev/null
+++ b/docs/Extensions.rst
@@ -0,0 +1,39 @@
+===============
+LLVM Extensions
+===============
+
+.. contents::
+   :local:
+   :depth: 1
+
+.. toctree::
+   :hidden:
+
+Introduction
+============
+
+This document describes extensions to tools and formats LLVM seeks compatibility
+with.
+
+Machine-specific Assembly Syntax
+================================
+
+X86/COFF-Dependent
+------------------
+
+The following additional relocation type is supported:
+
+**@IMGREL** (AT&T syntax only) generates an image-relative relocation that
+corresponds to the COFF relocation types ``IMAGE_REL_I386_DIR32NB`` (32-bit) or
+``IMAGE_REL_AMD64_ADDR32NB`` (64-bit).
+
+.. code-block:: gas
+
+  .text
+  fun:
+    mov foo@IMGREL(%ebx, %ecx, 4), %eax
+
+  .section .pdata
+    .long fun@IMGREL
+    .long (fun@imgrel + 0x3F)
+    .long $unwind$fun@imgrel
diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst
index 539c75e..5a8a297 100644
--- a/docs/GettingStarted.rst
+++ b/docs/GettingStarted.rst
@@ -229,6 +229,8 @@ uses the package and provides other details.
 +--------------------------------------------------------------+-----------------+---------------------------------------------+
 | `libtool <http://savannah.gnu.org/projects/libtool>`_        | 1.5.22          | Shared library manager\ :sup:`4`            |
 +--------------------------------------------------------------+-----------------+---------------------------------------------+
+| `zlib <http://zlib.net>`_                                    | >=1.2.3.4       | Compression library\ :sup:`5`               |
++--------------------------------------------------------------+-----------------+---------------------------------------------+

 .. note::

@@ -243,6 +245,8 @@ uses the package and provides other details.
    #. If you want to make changes to the configure scripts, you will need GNU
       autoconf (2.60), and consequently, GNU M4 (version 1.4 or higher). You will
       also need automake (1.9.6). We only use aclocal from that package.
+   #. Optional, adds compression/uncompression capabilities to selected LLVM
+      tools.

 Additionally, your compilation host is expected to have the usual plethora of
 Unix utilities. Specifically:
@@ -521,13 +525,13 @@ By placing it in the ``llvm/projects``, it will be automatically configured by
 the LLVM configure script as well as automatically updated when you run ``svn
 update``.

-GIT mirror
+Git Mirror
 ----------

-GIT mirrors are available for a number of LLVM subprojects. These mirrors sync
+Git mirrors are available for a number of LLVM subprojects. These mirrors sync
 automatically with each Subversion commit and contain all necessary git-svn
 marks (so, you can recreate git-svn metadata locally). Note that right now
-mirrors reflect only ``trunk`` for each project. You can do the read-only GIT
+mirrors reflect only ``trunk`` for each project. You can do the read-only Git
 clone of LLVM via:

 .. code-block:: console
@@ -538,10 +542,23 @@ If you want to check out clang too, run:

 .. code-block:: console

-  % git clone http://llvm.org/git/llvm.git
   % cd llvm/tools
   % git clone http://llvm.org/git/clang.git

+If you want to check out compiler-rt too, run:
+
+.. code-block:: console
+
+  % cd llvm/projects
+  % git clone http://llvm.org/git/compiler-rt.git
+
+If you want to check out the Test Suite Source Code (optional), run:
+
+.. code-block:: console
+
+  % cd llvm/projects
+  % git clone http://llvm.org/git/test-suite.git
+
 Since the upstream repository is in Subversion, you should use ``git pull
 --rebase`` instead of ``git pull`` to avoid generating a non-linear history in
 your clone. To configure ``git pull`` to pass ``--rebase`` by default on the
@@ -626,8 +643,10 @@ To set up clone from which you can submit code using ``git-svn``, run:
   % git config svn-remote.svn.fetch :refs/remotes/origin/master
   % git svn rebase -l

+Likewise for compiler-rt and test-suite.
+
 To update this clone without generating git-svn tags that conflict with the
-upstream git repo, run:
+upstream Git repo, run:

 .. code-block:: console

@@ -638,39 +657,26 @@ upstream git repo, run:
      git checkout master &&
      git svn rebase -l)

+Likewise for compiler-rt and test-suite.
+
 This leaves your working directories on their master branches, so you'll need to
 ``checkout`` each working branch individually and ``rebase`` it on top of its
 parent branch.

-For those who wish to be able to update an llvm repo in a simpler fashion,
-consider placing the following git script in your path under the name
-``git-svnup``:
-
-.. code-block:: bash
-
-  #!/bin/bash
-
-  STATUS=$(git status -s | grep -v "??")
-
-  if [ ! -z "$STATUS" ]; then
-    STASH="yes"
-    git stash >/dev/null
-  fi
-
-  git fetch
-  OLD_BRANCH=$(git rev-parse --abbrev-ref HEAD)
-  git checkout master 2> /dev/null
-  git svn rebase -l
-  git checkout $OLD_BRANCH 2> /dev/null
+For those who wish to be able to update an llvm repo/revert patches easily using
+git-svn, please look in the directory for the scripts ``git-svnup`` and
+``git-svnrevert``.

-  if [ ! -z $STASH ]; then
-    git stash pop >/dev/null
-  fi
+To perform the aforementioned update steps go into your source directory and
+just type ``git-svnup`` or ``git svnup`` and everything will just work.

-Then to perform the aforementioned update steps go into your source directory
-and just type ``git-svnup`` or ``git svnup`` and everything will just work.
+If one wishes to revert a commit with git-svn, but do not want the git hash to
+escape into the commit message, one can use the script ``git-svnrevert`` or
+``git svnrevert`` which will take in the git hash for the commit you want to
+revert, look up the appropriate svn revision, and output a message where all
+references to the git hash have been replaced with the svn revision.

-To commit back changes via git-svn, use ``dcommit``:
+To commit back changes via git-svn, use ``git svn dcommit``:

 .. code-block:: console

@@ -991,7 +997,7 @@ Optional Configuration Items
 ----------------------------

 If you're running on a Linux system that supports the `binfmt_misc
-<http://www.tat.physik.uni-tuebingen.de/~rguenth/linux/binfmt_misc.html>`_
+<http://en.wikipedia.org/wiki/binfmt_misc>`_
 module, and you have root access on the system, you can set your system up to
 execute LLVM bitcode files directly. To do this, use commands like this (the
 first command may not be required if you are already using the module):
diff --git a/docs/GettingStartedVS.rst b/docs/GettingStartedVS.rst
index 4c80f2c..a80a9e2 100644
--- a/docs/GettingStartedVS.rst
+++ b/docs/GettingStartedVS.rst
@@ -137,15 +137,18 @@ Here's the short story for getting up and running quickly with LLVM:

    .. code-block:: bat

-     C:\..\llvm> llvm-lit test
+     C:\..\llvm> python ..\build\bin\llvm-lit --param build_config=Win32 --param build_mode=Debug --param llvm_site_config=../build/test/lit.site.cfg test

-   Note that quite a few of these test will fail.
+   This example assumes that Python is in your PATH variable, you
+   have built a Win32 Debug version of llvm with a standard out of
+   line build. You should not see any unexpected failures, but will
+   see many unsupported tests and expected failures.

    A specific test or test directory can be run with:

    .. code-block:: bat

-     C:\..\llvm> llvm-lit test/path/to/test
+     C:\..\llvm> python ..\build\bin\llvm-lit --param build_config=Win32 --param build_mode=Debug --param llvm_site_config=../build/test/lit.site.cfg test/path/to/test


 An Example Using the LLVM Tool Chain
diff --git a/docs/LLVMBuild.rst b/docs/LLVMBuild.rst
index d9215dd..040b044 100644
--- a/docs/LLVMBuild.rst
+++ b/docs/LLVMBuild.rst
@@ -123,8 +123,8 @@ the file format is below:
   boolean_property_name = 1 (or 0)

 LLVMBuild files are expected to define a strict set of sections and
-properties. An typical component description file for a library
-component would look typically look like the following example:
+properties. A typical component description file for a library
+component would look like the following example:

 .. code-block:: ini

diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 03004f6..382314e 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -719,12 +719,17 @@ Currently, only the following parameter attributes are defined:
 ``nest``
     This indicates that the pointer parameter can be excised using the
     :ref:`trampoline intrinsics <int_trampoline>`. This is not a valid
-    attribute for return values.
-``nobuiltin``
-    This indicates that the callee function at a call site is not
-    recognized as a built-in function. LLVM will retain the original call
-    and not replace it with equivalent code based on the semantics of the
-    built-in function.
+    attribute for return values and can only be applied to one parameter.
+
+``returned``
+    This indicates that the value of the function always returns the value
+    of the parameter as its return value. This is an optimization hint to
+    the code generator when generating the caller, allowing tail call
+    optimization and omission of register saves and restores in some cases;
+    it is not checked or enforced when generating the callee. The parameter
+    and the function return type must be valid operands for the
+    :ref:`bitcast instruction <i_bitcast>`. This is not a valid attribute for
+    return values and can only be applied to one parameter.

 .. _gc:

@@ -764,10 +769,10 @@ inlined, has a stack alignment of 4, and which shouldn't use SSE instructions:
 .. code-block:: llvm

    ; Target-independent attributes:
-   #0 = attributes { alwaysinline alignstack=4 }
+   attributes #0 = { alwaysinline alignstack=4 }

    ; Target-dependent attributes:
-   #1 = attributes { "no-sse" }
+   attributes #1 = { "no-sse" }

    ; Function @f has attributes: alwaysinline, alignstack=4, and "no-sse".
    define void @f() #0 #1 { ... }
@@ -814,6 +819,12 @@ example:
 ``naked``
     This attribute disables prologue / epilogue emission for the function.
     This can have very system-specific consequences.
+``nobuiltin``
+    This indicates that the callee function at a call site is not
+    recognized as a built-in function. LLVM will retain the original call
+    and not replace it with equivalent code based on the semantics of the
+    built-in function. This is only valid at call sites, not on function
+    declarations or definitions.
 ``noduplicate``
     This attribute indicates that calls to the function cannot be
     duplicated. A call to a ``noduplicate`` function may be moved
@@ -2857,9 +2868,9 @@ All globals of this sort should have a section specified as
 The '``llvm.used``' Global Variable
 -----------------------------------

-The ``@llvm.used`` global is an array with i8\* element type which has
-:ref:`appending linkage <linkage_appending>`. This array contains a list of
-pointers to global variables and functions which may optionally have a
+The ``@llvm.used`` global is an array which has
+ :ref:`appending linkage <linkage_appending>`. This array contains a list of
+pointers to global variables, functions and aliases which may optionally have a
 pointer cast formed of bitcast or getelementptr. For example, a legal
 use of it is:

@@ -2873,13 +2884,13 @@ use of it is:
       i8* bitcast (i32* @Y to i8*)
       ], section "llvm.metadata"

-If a global variable appears in the ``@llvm.used`` list, then the
-compiler, assembler, and linker are required to treat the symbol as if
-there is a reference to the global that it cannot see. For example, if a
-variable has internal linkage and no references other than that from the
-``@llvm.used`` list, it cannot be deleted. This is commonly used to
-represent references from inline asms and other things the compiler
-cannot "see", and corresponds to "``attribute((used))``" in GNU C.
+If a symbol appears in the ``@llvm.used`` list, then the compiler, assembler,
+and linker are required to treat the symbol as if there is a reference to the
+symbol that it cannot see. For example, if a variable has internal linkage and
+no references other than that from the ``@llvm.used`` list, it cannot be
+deleted. This is commonly used to represent references from inline asms and
+other things the compiler cannot "see", and corresponds to
+"``attribute((used))``" in GNU C.

 On some targets, the code generator must emit a directive to the
 assembler or object file to prevent the assembler and linker from
@@ -4534,7 +4545,7 @@ The '``load``' instruction is used to read from memory.
 Arguments:
 """"""""""

-The argument to the '``load``' instruction specifies the memory address
+The argument to the ``load`` instruction specifies the memory address
 from which to load. The pointer must point to a :ref:`first
 class <t_firstclass>` type. If the ``load`` is marked as ``volatile``,
 then the optimizer is not allowed to modify the number or order of
@@ -4555,14 +4566,14 @@ any defined semantics for atomic loads.

 The optional constant ``align`` argument specifies the alignment of the
 operation (that is, the alignment of the memory address). A value of 0
-or an omitted ``align`` argument means that the operation has the abi
+or an omitted ``align`` argument means that the operation has the ABI
 alignment for the target. It is the responsibility of the code emitter
 to ensure that the alignment information is correct. Overestimating the
 alignment results in undefined behavior. Underestimating the alignment
 may produce less efficient code. An alignment of 1 is always safe.

 The optional ``!nontemporal`` metadata must reference a single
-metatadata name <index> corresponding to a metadata node with one
+metatadata name ``<index>`` corresponding to a metadata node with one
 ``i32`` entry of value 1. The existence of the ``!nontemporal``
 metatadata on the instruction tells the optimizer and code generator
 that this load is not expected to be reused in the cache. The code
@@ -4570,7 +4581,7 @@ generator may select special instructions to save cache bandwidth, such
 as the ``MOVNT`` instruction on x86.

 The optional ``!invariant.load`` metadata must reference a single
-metatadata name <index> corresponding to a metadata node with no
+metatadata name ``<index>`` corresponding to a metadata node with no
 entries. The existence of the ``!invariant.load`` metatadata on the
 instruction tells the optimizer and code generator that this load
 address points to memory which does not change value during program
@@ -4618,10 +4629,10 @@ The '``store``' instruction is used to write to memory.
 Arguments:
 """"""""""

-There are two arguments to the '``store``' instruction: a value to store
-and an address at which to store it. The type of the '``<pointer>``'
+There are two arguments to the ``store`` instruction: a value to store
+and an address at which to store it. The type of the ``<pointer>``
 operand must be a pointer to the :ref:`first class <t_firstclass>` type of
-the '``<value>``' operand. If the ``store`` is marked as ``volatile``,
+the ``<value>`` operand. If the ``store`` is marked as ``volatile``,
 then the optimizer is not allowed to modify the number or order of
 execution of this ``store`` with other :ref:`volatile
 operations <volatile>`.
@@ -4638,18 +4649,18 @@ has undefined behavior if the alignment is not set to a value which is
 at least the size in bytes of the pointee. ``!nontemporal`` does not
 have any defined semantics for atomic stores.

-The optional constant "align" argument specifies the alignment of the
+The optional constant ``align`` argument specifies the alignment of the
 operation (that is, the alignment of the memory address). A value of 0
-or an omitted "align" argument means that the operation has the abi
+or an omitted ``align`` argument means that the operation has the ABI
 alignment for the target. It is the responsibility of the code emitter
 to ensure that the alignment information is correct. Overestimating the
-alignment results in an undefined behavior. Underestimating the
+alignment results in undefined behavior. Underestimating the
 alignment may produce less efficient code. An alignment of 1 is always
 safe.

-The optional !nontemporal metadata must reference a single metatadata
-name <index> corresponding to a metadata node with one i32 entry of
-value 1. The existence of the !nontemporal metatadata on the instruction
+The optional ``!nontemporal`` metadata must reference a single metatadata
+name ``<index>`` corresponding to a metadata node with one ``i32`` entry of
+value 1. The existence of the ``!nontemporal`` metatadata on the instruction
 tells the optimizer and code generator that this load is not expected to
 be reused in the cache. The code generator may select special
 instructions to save cache bandwidth, such as the MOVNT instruction on
@@ -4658,8 +4669,8 @@ x86.
 Semantics:
 """"""""""

-The contents of memory are updated to contain '``<value>``' at the
-location specified by the '``<pointer>``' operand. If '``<value>``' is
+The contents of memory are updated to contain ``<value>`` at the
+location specified by the ``<pointer>`` operand. If ``<value>`` is
 of scalar type then the number of bytes written does not exceed the
 minimum number of bytes needed to hold all bits of the type. For
 example, storing an ``i24`` writes at most three bytes. When writing a
@@ -8342,6 +8353,46 @@ strings. This can be useful for special purpose optimizations that want
 to look for these annotations. These have no other defined use; they are
 ignored by code generation and optimization.

+'``llvm.ptr.annotation.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use '``llvm.ptr.annotation``' on a
+pointer to an integer of any width. *NOTE* you must specify an address space for
+the pointer. The identifier for the default address space is the integer
+'``0``'.
+
+::
+
+    declare i8* @llvm.ptr.annotation.p<address space>i8(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
+    declare i16* @llvm.ptr.annotation.p<address space>i16(i16* <val>, i8* <str>, i8* <str>, i32 <int>)
+    declare i32* @llvm.ptr.annotation.p<address space>i32(i32* <val>, i8* <str>, i8* <str>, i32 <int>)
+    declare i64* @llvm.ptr.annotation.p<address space>i64(i64* <val>, i8* <str>, i8* <str>, i32 <int>)
+    declare i256* @llvm.ptr.annotation.p<address space>i256(i256* <val>, i8* <str>, i8* <str>, i32 <int>)
+
+Overview:
+"""""""""
+
+The '``llvm.ptr.annotation``' intrinsic.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to an integer value of arbitrary bitwidth
+(result of some expression), the second is a pointer to a global string, the
+third is a pointer to a global string which is the source file name, and the
+last argument is the line number. It returns the value of the first argument.
+
+Semantics:
+""""""""""
+
+This intrinsic allows annotation of a pointer to an integer with arbitrary
+strings. This can be useful for special purpose optimizations that want to look
+for these annotations. These have no other defined use; they are ignored by code
+generation and optimization.
+
 '``llvm.annotation.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

diff --git a/docs/NVPTXUsage.rst b/docs/NVPTXUsage.rst
new file mode 100644
index 0000000..5451619
--- /dev/null
+++ b/docs/NVPTXUsage.rst
@@ -0,0 +1,276 @@
+=============================
+User Guide for NVPTX Back-end
+=============================
+
+.. contents::
+   :local:
+   :depth: 3
+
+
+Introduction
+============
+
+To support GPU programming, the NVPTX back-end supports a subset of LLVM IR
+along with a defined set of conventions used to represent GPU programming
+concepts. This document provides an overview of the general usage of the back-
+end, including a description of the conventions used and the set of accepted
+LLVM IR.
+
+.. note::
+
+   This document assumes a basic familiarity with CUDA and the PTX
+   assembly language. Information about the CUDA Driver API and the PTX assembly
+   language can be found in the `CUDA documentation
+   <http://docs.nvidia.com/cuda/index.html>`_.
+
+
+
+Conventions
+===========
+
+Marking Functions as Kernels
+----------------------------
+
+In PTX, there are two types of functions: *device functions*, which are only
+callable by device code, and *kernel functions*, which are callable by host
+code. By default, the back-end will emit device functions. Metadata is used to
+declare a function as a kernel function. This metadata is attached to the
+``nvvm.annotations`` named metadata object, and has the following format:
+
+.. code-block:: llvm
+
+   !0 = metadata !{<function-ref>, metadata !"kernel", i32 1}
+
+The first parameter is a reference to the kernel function. The following
+example shows a kernel function calling a device function in LLVM IR. The
+function ``@my_kernel`` is callable from host code, but ``@my_fmad`` is not.
+
+.. code-block:: llvm
+
+   define float @my_fmad(float %x, float %y, float %z) {
+     %mul = fmul float %x, %y
+     %add = fadd float %mul, %z
+     ret float %add
+   }
+
+   define void @my_kernel(float* %ptr) {
+     %val = load float* %ptr
+     %ret = call float @my_fmad(float %val, float %val, float %val)
+     store float %ret, float* %ptr
+     ret void
+   }
+
+   !nvvm.annotations = !{!1}
+   !1 = metadata !{void (float*)* @my_kernel, metadata !"kernel", i32 1}
+
+When compiled, the PTX kernel functions are callable by host-side code.
+ + +Address Spaces +-------------- + +The NVPTX back-end uses the following address space mapping: + + ============= ====================== + Address Space Memory Space + ============= ====================== + 0 Generic + 1 Global + 2 Internal Use + 3 Shared + 4 Constant + 5 Local + ============= ====================== + +Every global variable and pointer type is assigned to one of these address +spaces, with 0 being the default address space. Intrinsics are provided which +can be used to convert pointers between the generic and non-generic address +spaces. + +As an example, the following IR will define an array ``@g`` that resides in +global device memory. + +.. code-block:: llvm + + @g = internal addrspace(1) global [4 x i32] [ i32 0, i32 1, i32 2, i32 3 ] + +LLVM IR functions can read and write to this array, and host-side code can +copy data to it by name with the CUDA Driver API. + +Note that since address space 0 is the generic space, it is illegal to have +global variables in address space 0. Address space 0 is the default address +space in LLVM, so the ``addrspace(N)`` annotation is *required* for global +variables. + + +NVPTX Intrinsics +================ + +Address Space Conversion +------------------------ + +'``llvm.nvvm.ptr.*.to.gen``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +These are overloaded intrinsics. You can use these on any pointer types. + +.. code-block:: llvm + + declare i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)*) + declare i8* @llvm.nvvm.ptr.shared.to.gen.p0i8.p3i8(i8 addrspace(3)*) + declare i8* @llvm.nvvm.ptr.constant.to.gen.p0i8.p4i8(i8 addrspace(4)*) + declare i8* @llvm.nvvm.ptr.local.to.gen.p0i8.p5i8(i8 addrspace(5)*) + +Overview: +""""""""" + +The '``llvm.nvvm.ptr.*.to.gen``' intrinsics convert a pointer in a non-generic +address space to a generic address space pointer. 
+ +Semantics: +"""""""""" + +These intrinsics modify the pointer value to be a valid generic address space +pointer. + + +'``llvm.nvvm.ptr.gen.to.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +These are overloaded intrinsics. You can use these on any pointer types. + +.. code-block:: llvm + + declare i8 addrspace(1)* @llvm.nvvm.ptr.gen.to.global.p1i8.p0i8(i8*) + declare i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8*) + declare i8 addrspace(4)* @llvm.nvvm.ptr.gen.to.constant.p4i8.p0i8(i8*) + declare i8 addrspace(5)* @llvm.nvvm.ptr.gen.to.local.p5i8.p0i8(i8*) + +Overview: +""""""""" + +The '``llvm.nvvm.ptr.gen.to.*``' intrinsics convert a pointer in the generic +address space to a pointer in the target address space. Note that these +intrinsics are only useful if the target address space of +the pointer is known. It is not legal to use address space conversion +intrinsics to convert a pointer from one non-generic address space to another +non-generic address space. + +Semantics: +"""""""""" + +These intrinsics modify the pointer value to be a valid pointer in the target +non-generic address space. + + +Reading PTX Special Registers +----------------------------- + +'``llvm.nvvm.read.ptx.sreg.*``' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +.. 
code-block:: llvm + + declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() + declare i32 @llvm.nvvm.read.ptx.sreg.tid.y() + declare i32 @llvm.nvvm.read.ptx.sreg.tid.z() + declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x() + declare i32 @llvm.nvvm.read.ptx.sreg.ntid.y() + declare i32 @llvm.nvvm.read.ptx.sreg.ntid.z() + declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() + declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() + declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.z() + declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() + declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.y() + declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.z() + declare i32 @llvm.nvvm.read.ptx.sreg.warpsize() + +Overview: +""""""""" + +The '``@llvm.nvvm.read.ptx.sreg.*``' intrinsics provide access to the PTX +special registers, in particular the kernel launch bounds. These registers +map in the following way to CUDA builtins: + + ============= ===================================== + CUDA Builtin PTX Special Register Intrinsic + ============= ===================================== + ``threadIdx`` ``@llvm.nvvm.read.ptx.sreg.tid.*`` + ``blockIdx`` ``@llvm.nvvm.read.ptx.sreg.ctaid.*`` + ``blockDim`` ``@llvm.nvvm.read.ptx.sreg.ntid.*`` + ``gridDim`` ``@llvm.nvvm.read.ptx.sreg.nctaid.*`` + ============= ===================================== + + +Barriers +-------- + +'``llvm.nvvm.barrier0``' +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +.. code-block:: llvm + + declare void @llvm.nvvm.barrier0() + +Overview: +""""""""" + +The '``@llvm.nvvm.barrier0()``' intrinsic emits a PTX ``bar.sync 0`` +instruction, equivalent to the ``__syncthreads()`` call in CUDA. + + +Other Intrinsics +---------------- + +For the full set of NVPTX intrinsics, please see the +``include/llvm/IR/IntrinsicsNVVM.td`` file in the LLVM source tree. + + +Executing PTX +============= + +The most common way to execute PTX assembly on a GPU device is to use the CUDA +Driver API. 
This API is a low-level interface to the GPU driver and allows for +JIT compilation of PTX code to native GPU machine code. + +Initializing the Driver API: + +.. code-block:: c++ + + CUdevice device; + CUcontext context; + + // Initialize the driver API + cuInit(0); + // Get a handle to the first compute device + cuDeviceGet(&device, 0); + // Create a compute device context + cuCtxCreate(&context, 0, device); + +JIT compiling a PTX string to a device binary: + +.. code-block:: c++ + + CUmodule module; + CUfunction function; + + // JIT compile a null-terminated PTX string + cuModuleLoadData(&module, (void*)PTXString); + + // Get a handle to the "myfunction" kernel function + cuModuleGetFunction(&function, module, "myfunction"); + +For full examples of executing PTX assembly, please see the `CUDA Samples +<https://developer.nvidia.com/cuda-downloads>`_ distribution. diff --git a/docs/Passes.rst b/docs/Passes.rst index 9cb8ba0..d279eca 100644 --- a/docs/Passes.rst +++ b/docs/Passes.rst @@ -1018,8 +1018,8 @@ possible, it transforms the individual ``alloca`` instructions into nice clean scalar SSA form. This combines a simple scalar replacement of aggregates algorithm with the -:ref:`mem2reg <passes-mem2reg>` algorithm because often interact, especially -for C++ programs. As such, iterating between ``scalarrepl``, then +:ref:`mem2reg <passes-mem2reg>` algorithm because they often interact, +especially for C++ programs. As such, iterating between ``scalarrepl``, then :ref:`mem2reg <passes-mem2reg>` until we run out of things to promote works well. diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst index 4fc4597..7864165 100644 --- a/docs/ProgrammersManual.rst +++ b/docs/ProgrammersManual.rst @@ -626,6 +626,33 @@ SmallVectors are most useful when on the stack. SmallVector also provides a nice portable and efficient replacement for ``alloca``. +.. note:: + + Prefer to use ``SmallVectorImpl<T>`` as a parameter type. 
+ + In APIs that don't care about the "small size" (most?), prefer to use + the ``SmallVectorImpl<T>`` class, which is basically just the "vector + header" (and methods) without the elements allocated after it. Note that + ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the + conversion is implicit and costs nothing. E.g. + + .. code-block:: c++ + + // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>. + hardcodedSmallSize(SmallVector<Foo, 2> &Out); + // GOOD: Clients can pass any SmallVector<Foo, N>. + allowsAnySmallSize(SmallVectorImpl<Foo> &Out); + + void someFunc() { + SmallVector<Foo, 8> Vec; + hardcodedSmallSize(Vec); // Error. + allowsAnySmallSize(Vec); // Works. + } + + Even though it has "``Impl``" in the name, this is so widely used that + it really isn't "private to the implementation" anymore. A name like + ``SmallVectorHeader`` would be more appropriate. + .. _dss_vector: <vector> @@ -989,7 +1016,9 @@ coupled with a good choice of :ref:`sequential container <ds_sequential>`. This combination provides the several nice properties: the result data is contiguous in memory (good for cache locality), has few allocations, is easy to address (iterators in the final vector are just indices or pointers), and can be -efficiently queried with a standard binary or radix search. +efficiently queried with a standard binary search (e.g. +``std::lower_bound``; if you want the whole range of elements comparing +equal, use ``std::equal_range``). .. _dss_smallset: diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst index 822b55f..080a81a 100644 --- a/docs/ReleaseNotes.rst +++ b/docs/ReleaseNotes.rst @@ -64,6 +64,12 @@ Non-comprehensive list of changes in this release attributes, which are useful for passing information to code generation. See :doc:`HowToUseAttributes` for more details. +* TableGen's syntax for instruction selection patterns has been simplified. 
+ Instead of specifying types indirectly with register classes, you should now + specify types directly in the input patterns. See ``SparcInstrInfo.td`` for + examples of the new syntax. The old syntax using register classes still + works, but it will be removed in a future LLVM release. + * ... next change ... .. NOTE @@ -103,15 +109,25 @@ Loop Vectorizer We've continued the work on the loop vectorizer. The loop vectorizer now has the following features: -- Loops with unknown trip count. -- Runtime checks of pointers -- Reductions, Inductions -- If Conversion -- Pointer induction variables -- Reverse iterators -- Vectorization of mixed types -- Vectorization of function calls -- Partial unrolling during vectorization +- Loops with unknown trip counts. +- Runtime checks of pointers. +- Reductions, Inductions. +- Min/Max reductions of integers. +- If Conversion. +- Pointer induction variables. +- Reverse iterators. +- Vectorization of mixed types. +- Vectorization of function calls. +- Partial unrolling during vectorization. + +The loop vectorizer is now enabled by default for -O3. + +SLP Vectorizer +-------------- + +LLVM now has a new SLP vectorizer. The new SLP vectorizer is not enabled by +default but can be enabled using the clang flag -fslp-vectorize. The BB-vectorizer +can also be enabled using the command line flag -fslp-vectorize-aggressive. R600 Backend ------------ diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst index 16fa7f0..8574795 100644 --- a/docs/SourceLevelDebugging.rst +++ b/docs/SourceLevelDebugging.rst @@ -2080,23 +2080,23 @@ array to be: HeaderData.atoms[0].form = DW_FORM_data4; This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is - encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have - multiple matching DIEs in a single file, which could come up with an inlined - function for instance. 
Future tables could include more information about the - DIE such as flags indicating if the DIE is a function, method, block, - or inlined. +encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have +multiple matching DIEs in a single file, which could come up with an inlined +function for instance. Future tables could include more information about the +DIE such as flags indicating if the DIE is a function, method, block, +or inlined. The KeyType for the DWARF table is a 32 bit string table offset into the - ".debug_str" table. The ".debug_str" is the string table for the DWARF which - may already contain copies of all of the strings. This helps make sure, with - help from the compiler, that we reuse the strings between all of the DWARF - sections and keeps the hash table size down. Another benefit to having the - compiler generate all strings as DW_FORM_strp in the debug info, is that - DWARF parsing can be made much faster. +".debug_str" table. The ".debug_str" is the string table for the DWARF which +may already contain copies of all of the strings. This helps make sure, with +help from the compiler, that we reuse the strings between all of the DWARF +sections and keeps the hash table size down. Another benefit to having the +compiler generate all strings as DW_FORM_strp in the debug info, is that +DWARF parsing can be made much faster. After a lookup is made, we get an offset into the hash data. The hash data - needs to be able to deal with 32 bit hash collisions, so the chunk of data - at the offset in the hash data consists of a triple: +needs to be able to deal with 32 bit hash collisions, so the chunk of data +at the offset in the hash data consists of a triple: .. code-block:: c @@ -2105,7 +2105,7 @@ After a lookup is made, we get an offset into the hash data. The hash data HashData[hash_data_count] If "str_offset" is zero, then the bucket contents are done. 
99.9% of the - hash data chunks contain a single item (no 32 bit hash collision): +hash data chunks contain a single item (no 32 bit hash collision): .. code-block:: none diff --git a/docs/TableGen/LangRef.rst b/docs/TableGen/LangRef.rst index c9e1efb..bd28a90 100644 --- a/docs/TableGen/LangRef.rst +++ b/docs/TableGen/LangRef.rst @@ -286,7 +286,7 @@ given values. .. productionlist:: SimpleValue: "(" `DagArg` `DagArgList` ")" DagArgList: `DagArg` ("," `DagArg`)* - DagArg: `Value` [":" `TokVarName`] + DagArg: `Value` [":" `TokVarName`] | `TokVarName` The initial :token:`DagArg` is called the "operator" of the dag. diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst index 4d8c8ce..79cedee 100644 --- a/docs/TestingGuide.rst +++ b/docs/TestingGuide.rst @@ -224,16 +224,7 @@ Below is an example of legal RUN lines in a ``.ll`` file: ; RUN: diff %t1 %t2 As with a Unix shell, the RUN lines permit pipelines and I/O -redirection to be used. However, the usage is slightly different than -for Bash. In general, it's useful to read the code of other tests to figure out -what you can use in yours. The major differences are: - -- You can't do ``2>&1``. That will cause :program:`lit` to write to a file - named ``&1``. Usually this is done to get stderr to go through a pipe. You - can do that with ``|&`` so replace this idiom: - ``... 2>&1 | FileCheck`` with ``... |& FileCheck`` -- You can only redirect to a file, not to another descriptor and not - from a here document. +redirection to be used. There are some quoting rules that you must pay attention to when writing your RUN lines. In general nothing needs to be quoted. :program:`lit` won't @@ -243,7 +234,7 @@ everything enclosed as one value. In general, you should strive to keep your RUN lines as simple as possible, using them only to run tools that generate textual output you can then examine. 
-The recommended way to examine output to figure out if the test passes it using +The recommended way to examine output to figure out if the test passes is using the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in RUN lines is deprecated - please do not send or commit patches that use it.]* diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst index e2d3667..d565c21 100644 --- a/docs/Vectorizers.rst +++ b/docs/Vectorizers.rst @@ -6,10 +6,10 @@ Auto-Vectorization in LLVM :local: LLVM has two vectorizers: The :ref:`Loop Vectorizer <loop-vectorizer>`, -which operates on Loops, and the :ref:`Basic Block Vectorizer -<bb-vectorizer>`, which optimizes straight-line code. These vectorizers +which operates on Loops, and the :ref:`SLP Vectorizer +<slp-vectorizer>`, which optimizes straight-line code. These vectorizers focus on different optimization opportunities and use different techniques. -The BB vectorizer merges multiple scalars that are found in the code into +The SLP vectorizer merges multiple scalars that are found in the code into vectors while the Loop Vectorizer widens instructions in the original loop to operate on multiple consecutive loop iterations. @@ -21,19 +21,13 @@ The Loop Vectorizer Usage ----- -LLVM's Loop Vectorizer is now available and will be useful for many people. -It is not enabled by default, but can be enabled through clang using the -command line flag: +LLVM's Loop Vectorizer is now enabled by default for -O3. +We plan to enable parts of the Loop Vectorizer on -O2 and -Os in future releases. +The vectorizer can be disabled using the command line: .. code-block:: console - $ clang -fvectorize -O3 file.c - -If the ``-fvectorize`` flag is used then the loop vectorizer will be enabled -when running with ``-O3``, ``-O2``. When ``-Os`` is used, the loop vectorizer -will only vectorize loops that do not require a major increase in code size. - -We plan to enable the Loop Vectorizer by default as part of the LLVM 3.3 release. 
+ $ clang ... -fno-vectorize file.c Command line flags ^^^^^^^^^^^^^^^^^^ @@ -299,25 +293,15 @@ And Linpack-pc with the same configuration. Result is Mflops, higher is better. .. image:: linpack-pc.png -.. _bb-vectorizer: +.. _slp-vectorizer: -The Basic Block Vectorizer -========================== - -Usage ------- - -The Basic Block Vectorizer is not enabled by default, but it can be enabled -through clang using the command line flag: - -.. code-block:: console - - $ clang -fslp-vectorize file.c +The SLP Vectorizer +================== Details ------- -The goal of basic-block vectorization (a.k.a. superword-level parallelism) is +The goal of SLP vectorization (a.k.a. superword-level parallelism) is to combine similar independent instructions within simple control-flow regions into vector instructions. Memory accesses, arithmetic operations, comparison operations and some math functions can all be vectorized using this technique @@ -329,10 +313,50 @@ into vector operations. .. code-block:: c++ - int foo(int a1, int a2, int b1, int b2) { - int r1 = a1*(a1 + b1)/b1 + 50*b1/a1; - int r2 = a2*(a2 + b2)/b2 + 50*b2/a2; - return r1 + r2; + void foo(int a1, int a2, int b1, int b2, int *A) { + A[0] = a1*(a1 + b1)/b1 + 50*b1/a1; + A[1] = a2*(a2 + b2)/b2 + 50*b2/a2; } +The SLP-vectorizer has two phases, bottom-up, and top-down. The top-down vectorization +phase is more aggressive, but takes more time to run. + +Usage +------ + +The SLP Vectorizer is not enabled by default, but it can be enabled +through clang using the command line flag: + +.. code-block:: console + + $ clang -fslp-vectorize file.c + +LLVM has a second basic block vectorization phase +which is more compile-time intensive (The BB vectorizer). This optimization +can be enabled through clang using the command line flag: + +.. 
code-block:: console + + $ clang -fslp-vectorize-aggressive file.c + + +The SLP vectorizer is in early development stages but can already vectorize +and accelerate many programs in the LLVM test suite. + +======================= ============ +Benchmark Name Gain +======================= ============ +Misc/flops-7 -32.70% +Misc/matmul_f64_4x4 -23.23% +Olden/power -21.45% +Misc/flops-4 -14.90% +ASC_Sequoia/AMGmk -13.85% +TSVC/LoopRerolling-flt -11.76% +Misc/flops-6 -9.70% +Misc/flops-5 -8.54% +Misc/flops -8.12% +TSVC/NodeSplitting-dbl -6.96% +Misc-C++/sphereflake -6.74% +Ptrdist/yacr2 -6.31% +======================= ============ diff --git a/docs/WritingAnLLVMBackend.rst b/docs/WritingAnLLVMBackend.rst index 6d6c2a1..a03a5e4 100644 --- a/docs/WritingAnLLVMBackend.rst +++ b/docs/WritingAnLLVMBackend.rst @@ -760,7 +760,7 @@ target description file (``IntRegs``). def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr), "ld [$addr], $dst", - [(set IntRegs:$dst, (load ADDRrr:$addr))]>; + [(set i32:$dst, (load ADDRrr:$addr))]>; The fourth parameter is the input source, which uses the address operand ``MEMrr`` that is defined earlier in ``SparcInstrInfo.td``: @@ -788,7 +788,7 @@ class is defined: def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr), "ld [$addr], $dst", - [(set IntRegs:$dst, (load ADDRri:$addr))]>; + [(set i32:$dst, (load ADDRri:$addr))]>; Writing these definitions for so many similar instructions can involve a lot of cut and paste. 
In ``.td`` files, the ``multiclass`` directive enables the @@ -803,11 +803,11 @@ pattern ``F3_12`` is defined to create 2 instruction classes each time def rr : F3_1 <2, Op3Val, (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), !strconcat(OpcStr, " $b, $c, $dst"), - [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>; + [(set i32:$dst, (OpNode i32:$b, i32:$c))]>; def ri : F3_2 <2, Op3Val, (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c), !strconcat(OpcStr, " $b, $c, $dst"), - [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>; + [(set i32:$dst, (OpNode i32:$b, simm13:$c))]>; } So when the ``defm`` directive is used for the ``XOR`` and ``ADD`` @@ -856,7 +856,7 @@ format instruction having three operands. def XNORrr : F3_1<2, 0b000111, (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), "xnor $b, $c, $dst", - [(set IntRegs:$dst, (not (xor IntRegs:$b, IntRegs:$c)))]>; + [(set i32:$dst, (not (xor i32:$b, i32:$c)))]>; The instruction templates in ``SparcInstrFormats.td`` show the base class for ``F3_1`` is ``InstSP``. @@ -1124,7 +1124,7 @@ a pattern with the store DAG operator. .. code-block:: llvm def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src), - "st $src, [$addr]", [(store IntRegs:$src, ADDRrr:$addr)]>; + "st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>; ``ADDRrr`` is a memory mode that is also defined in ``SparcInstrInfo.td``: @@ -1185,7 +1185,7 @@ instruction. SDValue CPTmp0; SDValue CPTmp1; - // Pattern: (st:void IntRegs:i32:$src, + // Pattern: (st:void i32:i32:$src, // ADDRrr:i32:$addr)<<P:Predicate_store>> // Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src) // Pattern complexity = 13 cost = 1 size = 0 diff --git a/docs/index.rst b/docs/index.rst index 8f22ef2..6b182da 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -22,7 +22,6 @@ Several introductory papers and presentations. :hidden: LangRef - GetElementPtr :doc:`LangRef` Defines the LLVM intermediate representation. 
@@ -48,10 +47,6 @@ Several introductory papers and presentations. .. __: http://llvm.org/pubs/2002-12-LattnerMSThesis.html -:doc:`GetElementPtr` - Answers to some very frequent questions about LLVM's most frequently - misunderstood instruction. - `Publications mentioning LLVM <http://llvm.org/pubs>`_ .. @@ -72,7 +67,6 @@ representation. CMake HowToBuildOnARM CommandGuide/index - DeveloperPolicy GettingStarted GettingStartedVS FAQ @@ -87,6 +81,7 @@ representation. ReleaseNotes Passes YamlIO + GetElementPtr :doc:`GettingStarted` Discusses how to get up and running quickly with the LLVM infrastructure. @@ -108,9 +103,6 @@ representation. Tutorials about using LLVM. Includes a tutorial about making a custom language with LLVM. -:doc:`DeveloperPolicy` - The LLVM project's policy towards developers and their contributions. - :doc:`LLVM Command Guide <CommandGuide/index>` A reference manual for the LLVM command line utilities ("man" pages for LLVM tools). @@ -149,25 +141,9 @@ representation. :doc:`YamlIO` A reference guide for using LLVM's YAML I/O library. -IRC -=== - -Users and developers of the LLVM project (including subprojects such as Clang) -can be found in #llvm on `irc.oftc.net <irc://irc.oftc.net/llvm>`_. - -This channel has several bots. - -* Buildbot reporters - - * llvmbb - Bot for the main LLVM buildbot master. - http://lab.llvm.org:8011/console - * bb-chapuni - An individually run buildbot master. http://bb.pgr.jp/console - * smooshlab - Apple's internal buildbot master. - -* robot - Bugzilla linker. %bug <number> - -* clang-bot - A `geordi <http://www.eelis.net/geordi/>`_ instance running - near-trunk clang instead of gcc. +:doc:`GetElementPtr` + Answers to some very frequent questions about LLVM's most frequently + misunderstood instruction. Programming Documentation ========================= @@ -184,6 +160,7 @@ For developers of applications which use LLVM as a library. 
ExtendingLLVM HowToSetUpLLVMStyleRTTI ProgrammersManual + Extensions :doc:`LLVM Language Reference Manual <LangRef>` Defines the LLVM intermediate representation and the assembly form of the @@ -196,6 +173,9 @@ For developers of applications which use LLVM as a library. Introduction to the general layout of the LLVM sourcebase, important classes and APIs, and some tips & tricks. +:doc:`Extensions` + LLVM-specific extensions to tools and formats LLVM seeks compatibility with. + :doc:`CommandLine` Provides information on using the command line parsing library. @@ -248,6 +228,7 @@ For API clients and LLVM developers. WritingAnLLVMPass TableGen/LangRef HowToUseAttributes + NVPTXUsage :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -316,6 +297,10 @@ For API clients and LLVM developers. :doc:`HowToUseAttributes` Answers some questions about the new Attributes infrastructure. +:doc:`NVPTXUsage` + This document describes using the NVPTX back-end to compile GPU kernels. + + Development Process Documentation ================================= @@ -324,12 +309,16 @@ Information about LLVM's development process. .. toctree:: :hidden: + DeveloperPolicy MakefileGuide Projects LLVMBuild HowToReleaseLLVM Packaging +:doc:`DeveloperPolicy` + The LLVM project's policy towards developers and their contributions. + :doc:`Projects` How-to guide and templates for new projects that *use* the LLVM infrastructure. The templates (directory organization, Makefiles, and test @@ -349,46 +338,75 @@ Information about LLVM's development process. :doc:`Packaging` Advice on packaging LLVM into a distribution. +Community +========= + +LLVM has a thriving community of friendly and helpful developers. +The two primary communication mechanisms in the LLVM community are mailing +lists and IRC. + Mailing Lists -============= +------------- If you can't find what you need in these docs, try consulting the mailing lists. 
-`LLVM Announcements List`__ - This is a low volume list that provides important announcements regarding - LLVM. It gets email about once a month. - - .. __: http://lists.cs.uiuc.edu/mailman/listinfo/llvm-announce - -`Developer's List`__ +`Developer's List (llvmdev)`__ This list is for people who want to be included in technical discussions of LLVM. People post to this list when they have questions about writing code for or using the LLVM tools. It is relatively low volume. .. __: http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -`Bugs & Patches Archive`__ - This list gets emailed every time a bug is opened and closed, and when people - submit patches to be included in LLVM. It is higher volume than the LLVMdev - list. - - .. __: http://lists.cs.uiuc.edu/pipermail/llvmbugs/ - -`Commits Archive`__ +`Commits Archive (llvm-commits)`__ This list contains all commit messages that are made when LLVM developers - commit code changes to the repository. It is useful for those who want to - stay on the bleeding edge of LLVM development. This list is very high volume. + commit code changes to the repository. It also serves as a forum for + patch review (i.e. send patches here). It is useful for those who want to + stay on the bleeding edge of LLVM development. This list is very high + volume. .. __: http://lists.cs.uiuc.edu/pipermail/llvm-commits/ -`Test Results Archive`__ +`Bugs & Patches Archive (llvmbugs)`__ + This list gets emailed every time a bug is opened and closed. It is + higher volume than the LLVMdev list. + + .. __: http://lists.cs.uiuc.edu/pipermail/llvmbugs/ + +`Test Results Archive (llvm-testresults)`__ A message is automatically sent to this list by every active nightly tester when it completes. As such, this list gets email several times each day, making it a high volume list. .. 
__: http://lists.cs.uiuc.edu/pipermail/llvm-testresults/ +`LLVM Announcements List (llvm-announce)`__ + This is a low volume list that provides important announcements regarding + LLVM. It gets email about once a month. + + .. __: http://lists.cs.uiuc.edu/mailman/listinfo/llvm-announce + +IRC +--- + +Users and developers of the LLVM project (including subprojects such as Clang) +can be found in #llvm on `irc.oftc.net <irc://irc.oftc.net/llvm>`_. + +This channel has several bots. + +* Buildbot reporters + + * llvmbb - Bot for the main LLVM buildbot master. + http://lab.llvm.org:8011/console + * bb-chapuni - An individually run buildbot master. http://bb.pgr.jp/console + * smooshlab - Apple's internal buildbot master. + +* robot - Bugzilla linker. %bug <number> + +* clang-bot - A `geordi <http://www.eelis.net/geordi/>`_ instance running + near-trunk clang instead of gcc. + + Indices and tables ================== diff --git a/docs/tutorial/LangImpl1.rst b/docs/tutorial/LangImpl1.rst index aa619cf..a2c5eee 100644 --- a/docs/tutorial/LangImpl1.rst +++ b/docs/tutorial/LangImpl1.rst @@ -55,7 +55,7 @@ in the various pieces. The structure of the tutorial is: - Because a lot of people are interested in using LLVM as a JIT, we'll dive right into it and show you the 3 lines it takes to add JIT support. LLVM is also useful in many other ways, but this is one - simple and "sexy" way to shows off its power. :) + simple and "sexy" way to show off its power. :) - `Chapter #5 <LangImpl5.html>`_: Extending the Language: Control Flow - With the language up and running, we show how to extend it with control flow operations (if/then/else and a 'for' loop). This |
