aboutsummaryrefslogtreecommitdiffstats
path: root/docs/Frontend
diff options
context:
space:
mode:
Diffstat (limited to 'docs/Frontend')
-rw-r--r--docs/Frontend/PerformanceTips.rst161
1 files changed, 161 insertions, 0 deletions
diff --git a/docs/Frontend/PerformanceTips.rst b/docs/Frontend/PerformanceTips.rst
new file mode 100644
index 0000000..d8c0465
--- /dev/null
+++ b/docs/Frontend/PerformanceTips.rst
@@ -0,0 +1,161 @@
+=====================================
+Performance Tips for Frontend Authors
+=====================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Abstract
+========
+
+The intended audience of this document is developers of language frontends
+targeting LLVM IR. This document is home to a collection of tips on how to
+generate IR that optimizes well. As with any optimizer, LLVM has its strengths
+and weaknesses. In some cases, surprisingly small changes in the source IR
+can have a large effect on the generated code.
+
+Avoid loads and stores of large aggregate type
+================================================
+
+LLVM currently does not optimize well loads and stores of large :ref:`aggregate
+types <t_aggregate>` (i.e. structs and arrays). As an alternative, consider
+loading individual fields from memory.
+
+Aggregates that are smaller than the largest (performant) load or store
+instruction supported by the targeted hardware are well supported. These can
+be an effective way to represent collections of small packed fields.
+
+Prefer zext over sext when legal
+==================================
+
+On some architectures (X86_64 is one), sign extension can involve an extra
+instruction whereas zero extension can be folded into a load. LLVM will try to
+replace a sext with a zext when it can be proven safe, but if you have
+information in your source language about the range of a integer value, it can
+be profitable to use a zext rather than a sext.
+
+Alternatively, you can :ref:`specify the range of the value using metadata
+<range-metadata>` and LLVM can do the sext to zext conversion for you.
+
+Zext GEP indices to machine register width
+============================================
+
+Internally, LLVM often promotes the width of GEP indices to machine register
+width. When it does so, it will default to using sign extension (sext)
+operations for safety. If your source language provides information about
+the range of the index, you may wish to manually extend indices to machine
+register width using a zext instruction.
+
+Other things to consider
+=========================
+
+#. Make sure that a DataLayout is provided (this will likely become required in
+ the near future, but is certainly important for optimization).
+
+#. Add nsw/nuw flags as appropriate. Reasoning about overflow is
+ generally hard for an optimizer so providing these facts from the frontend
+ can be very impactful. For languages which need overflow semantics,
+ consider using the :ref:`overflow intrinsics <int_overflow>`.
+
+#. Use fast-math flags on floating point operations if legal. If you don't
+ need strict IEEE floating point semantics, there are a number of additional
+ optimizations that can be performed. This can be highly impactful for
+ floating point intensive computations.
+
+#. Use inbounds on geps. This can help to disambiguate some aliasing queries.
+
+#. Add noalias/align/dereferenceable/nonnull to function arguments and return
+ values as appropriate
+
+#. Mark functions as readnone/readonly or noreturn/nounwind when known. The
+ optimizer will try to infer these flags, but may not always be able to.
+ Manual annotations are particularly important for external functions that
+ the optimizer can not analyze.
+
+#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
+ analysis), prefer GEPs
+
+#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
+ intrinsics where possible. Common profitable uses are for stack like data
+ structures (thus allowing dead store elimination) and for describing
+ life times of allocas (thus allowing smaller stack sizes).
+
+#. Use pointer aliasing metadata, especially tbaa metadata, to communicate
+ otherwise-non-deducible pointer aliasing facts
+
+#. Use the "most-private" possible linkage types for the functions being defined
+ (private, internal or linkonce_odr preferably)
+
+#. Mark invariant locations using !invariant.load and TBAA's constant flags
+
+#. Prefer globals over inttoptr of a constant address - this gives you
+ dereferencability information. In MCJIT, use getSymbolAddress to provide
+ actual address.
+
+#. Be wary of ordered and atomic memory operations. They are hard to optimize
+ and may not be well optimized by the current optimizer. Depending on your
+ source language, you may consider using fences instead.
+
+#. If calling a function which is known to throw an exception (unwind), use
+ an invoke with a normal destination which contains an unreachable
+ instruction. This form conveys to the optimizer that the call returns
+ abnormally. For an invoke which neither returns normally or requires unwind
+ code in the current function, you can use a noreturn call instruction if
+ desired. This is generally not required because the optimizer will convert
+ an invoke with an unreachable unwind destination to a call instruction.
+
+#. If you language uses range checks, consider using the IRCE pass. It is not
+ currently part of the standard pass order.
+
+#. For languages with numerous rarely executed guard conditions (e.g. null
+ checks, type checks, range checks) consider adding an extra execution or
+ two of LoopUnswith and LICM to your pass order. The standard pass order,
+ which is tuned for C and C++ applications, may not be sufficient to remove
+ all dischargeable checks from loops.
+
+#. Use profile metadata to indicate statically known cold paths, even if
+ dynamic profiling information is not available. This can make a large
+ difference in code placement and thus the performance of tight loops.
+
+#. When generating code for loops, try to avoid terminating the header block of
+ the loop earlier than necessary. If the terminator of the loop header
+ block is a loop exiting conditional branch, the effectiveness of LICM will
+ be limited for loads not in the header. (This is due to the fact that LLVM
+ may not know such a load is safe to speculatively execute and thus can't
+ lift an otherwise loop invariant load unless it can prove the exiting
+ condition is not taken.) It can be profitable, in some cases, to emit such
+ instructions into the header even if they are not used along a rarely
+ executed path that exits the loop. This guidance specifically does not
+ apply if the condition which terminates the loop header is itself invariant,
+ or can be easily discharged by inspecting the loop index variables.
+
+#. In hot loops, consider duplicating instructions from small basic blocks
+ which end in highly predictable terminators into their successor blocks.
+ If a hot successor block contains instructions which can be vectorized
+ with the duplicated ones, this can provide a noticeable throughput
+ improvement. Note that this is not always profitable and does involve a
+ potentially large increase in code size.
+
+#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
+ of predecessors). Among other issues, the register allocator is known to
+ perform badly with confronted with such structures. The only exception to
+ this guidance is that a unified return block with high in-degree is fine.
+
+p.s. If you want to help improve this document, patches expanding any of the
+above items into standalone sections of their own with a more complete
+discussion would be very welcome.
+
+
+Adding to this document
+=======================
+
+If you run across a case that you feel deserves to be covered here, please send
+a patch to `llvm-commits
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_ for review.
+
+If you have questions on these items, please direct them to `llvmdev
+<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_. The more relevant
+context you are able to give to your question, the more likely it is to be
+answered.
+