Update aosp/master LLVM for rebase to r222494.

Change-Id: Ic787f5e0124df789bd26f3f24680f45e678eef2d
author: Stephen Hines <srhines@google.com> 2014-12-01 14:51:49 -0800
committer: Stephen Hines <srhines@google.com> 2014-12-02 16:08:10 -0800
commit: 37ed9c199ca639565f6ce88105f9e39e898d82d0 (patch)
tree: 8fb36d3910e3ee4c4e1b7422f4f017108efc52f5 /docs/Atomics.rst
parent: d2327b22152ced7bc46dc629fc908959e8a52d03 (diff)
1 files changed, 44 insertions, 22 deletions
diff --git a/docs/Atomics.rst b/docs/Atomics.rst
index 5f17c61..6c8303b 100644
--- a/docs/Atomics.rst
+++ b/docs/Atomics.rst
@@ -18,16 +18,16 @@ clarified in the IR.
 The atomic instructions are designed specifically to provide readable IR and
 optimized code generation for the following:
 
-* The new C++0x ``<atomic>`` header.  (`C++0x draft available here
-  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here
+* The new C++11 ``<atomic>`` header.  (`C++11 draft available here
+  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
   <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
 
 * Proper semantics for Java-style memory, for both ``volatile`` and regular
   shared variables. (`Java Specification
-  <http://java.sun.com/docs/books/jls/third_edition/html/memory.html>`_)
+  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)
 
 * gcc-compatible ``__sync_*`` builtins. (`Description
-  <http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html>`_)
+  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)
 
 * Other scenarios with atomic semantics, including ``static`` variables with
   non-trivial constructors in C++.
@@ -115,7 +115,10 @@ memory operation can happen on any thread between the load and store.
 A ``fence`` provides Acquire and/or Release ordering which is not part of
 another operation; it is normally used along with Monotonic memory operations.
 A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load.
+Acquire load, and a Monotonic store following a Release fence is roughly
+equivalent to a Release store. SequentiallyConsistent fences behave as both
+an Acquire and a Release fence, and offer some additional complicated
+guarantees; see the C++11 standard for details.
 
 Frontends generating atomic instructions generally need to be aware of the
 target to some degree; atomic instructions are guaranteed to be lock-free, and
@@ -177,10 +180,10 @@ Unordered
 
 Unordered is the lowest level of atomicity. It essentially guarantees that races
 produce somewhat sane results instead of having undefined behavior.  It also
-guarantees the operation to be lock-free, so it do not depend on the data being
-part of a special atomic structure or depend on a separate per-process global
-lock.  Note that code generation will fail for unsupported atomic operations; if
-you need such an operation, use explicit locking.
+guarantees the operation to be lock-free, so it does not depend on the data
+being part of a special atomic structure or depend on a separate per-process
+global lock.  Note that code generation will fail for unsupported atomic
+operations; if you need such an operation, use explicit locking.
 
 Relevant standard
   This is intended to match the Java memory model for shared variables.
@@ -221,7 +224,7 @@ essentially guarantees that if you take all the operations affecting a specific
 address, a consistent ordering exists.
 
 Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those
+  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
   standards for the exact definition.
 
 Notes for frontends
@@ -251,8 +254,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
 other memory with normal loads and stores.
 
 Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be
-  used for C++0x/C1x ``memory_order_consume``.
+  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
+  used for C++11/C11 ``memory_order_consume``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -281,7 +284,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
 release a lock.
 
 Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_release``.
+  This corresponds to the C++11/C11 ``memory_order_release``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -307,7 +310,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
 barrier (for fences and operations which both read and write memory).
 
 Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+  This corresponds to the C++11/C11 ``memory_order_acq_rel``.
 
 Notes for frontends
   If you are writing a frontend which uses this directly, use with caution.
@@ -330,7 +333,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
 ordering exists between all SequentiallyConsistent operations.
 
 Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and
+  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
   the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
 
 Notes for frontends
@@ -368,6 +371,11 @@ Predicates for optimizer writers to query:
   that they return true for any operation which is volatile or at least
   Monotonic.
 
+* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on
+  orderings. They can be useful for passes that are aware of atomics, for
+  example to do DSE across a single atomic access, but not across a
+  release-acquire pair (see MemoryDependencyAnalysis for an example of this).
+
 * Alias analysis: Note that AA will return ModRef for anything Acquire or
   Release, and for the address accessed by any Monotonic operation.
 
@@ -389,7 +397,9 @@ operations:
 
 * DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
   be DSE'ed in some cases, but it's tricky to reason about, and not especially
-  important.
+  important. It is possible in some cases for DSE to operate across a stronger
+  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
+  MemoryDependencyAnalysis (which is also used by other passes like GVN).
 
 * Folding a load: Any atomic load from a constant global can be constant-folded,
   because it cannot be observed.  Similar reasoning allows scalarrepl with
@@ -400,7 +410,8 @@ Atomics and Codegen
 
 Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
 On architectures which use barrier instructions for all atomic ordering (like
-ARM), appropriate fences are split out as the DAG is built.
+ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
+``setInsertFencesForAtomic()`` was used.
 
 The MachineMemOperand for all atomic operations is currently marked as volatile;
 this is not correct in the IR sense of volatile, but CodeGen handles anything
@@ -415,11 +426,6 @@ error when given an operation which cannot be implemented.  (The LLVM code
 generator is not very helpful here at the moment, but hopefully that will
 change.)
 
-The implementation of atomics on LL/SC architectures (like ARM) is currently a
-bit of a mess; there is a lot of copy-pasted code across targets, and the
-representation is relatively unsuited to optimization (it would be nice to be
-able to optimize loops involving cmpxchg etc.).
-
 On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
 generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
 fences generate an ``MFENCE``, other fences do not cause any code to be
@@ -435,3 +441,19 @@ operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
 ``atomicrmw`` can be represented using a loop with LL/SC-style instructions
 which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
 on ARM, etc.).
+
+It is often easiest for backends to use AtomicExpandPass to lower some of the
+atomic constructs. Here are some lowerings it can do:
+
+* cmpxchg -> loop with load-linked/store-conditional
+  by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``,
+  ``emitStoreConditional()``
+* large loads/stores -> ll-sc/cmpxchg
+  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
+* strong atomic accesses -> monotonic accesses + fences
+  by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
+  and ``emitTrailingFence()``
+* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
+  by overriding ``expandAtomicRMWInIR()``
+
+For an example of all of these, look at the ARM backend.
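
Illustration (not part of the patch): two of the ideas this change documents can be sketched at the C++11 source level. The first pair of functions shows the fence equivalences from the ``fence`` paragraph: a Release fence followed by a relaxed (Monotonic) store, and a relaxed load followed by an Acquire fence. The last function shows the general shape of "atomic rmw -> loop with cmpxchg". All names here (``publish``, ``consume``, ``run_handoff``, ``fetch_max``) are hypothetical, and this is plain standard C++, not the AtomicExpandPass API or the code LLVM actually emits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; standard C++11 only, no LLVM APIs.
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready{false};  // flag guarding the payload

// Release fence followed by a relaxed (Monotonic) store: roughly
// equivalent to ready.store(true, std::memory_order_release).
void publish(int value) {
  payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
}

// Relaxed (Monotonic) load followed by an Acquire fence: roughly
// equivalent to ready.load(std::memory_order_acquire).
int consume() {
  while (!ready.load(std::memory_order_relaxed)) {
    std::this_thread::yield();
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  return payload;
}

// Runs one producer and one consumer; returns what the consumer read.
int run_handoff(int value) {
  payload = 0;
  ready.store(false, std::memory_order_relaxed);
  int seen = -1;
  std::thread consumer([&seen] { seen = consume(); });
  std::thread producer([value] { publish(value); });
  producer.join();
  consumer.join();
  // The fence pairing makes the write to payload visible to the consumer.
  assert(seen == value);
  return seen;
}

// Atomic "max" written as a compare-exchange loop. Returns the value
// that was stored before the call. When the stored value is already
// >= candidate, nothing is written and only a relaxed load is performed,
// which is a simplification compared to a true seq_cst read-modify-write.
int fetch_max(std::atomic<int> &target, int candidate) {
  int observed = target.load(std::memory_order_relaxed);
  while (observed < candidate &&
         !target.compare_exchange_weak(observed, candidate,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
    // A failed compare_exchange_weak reloads `observed`; retry with it.
  }
  return observed;
}
```

For instance, ``run_handoff(42)`` starts both threads and returns the value the consumer read, while ``fetch_max(counter, 9)`` on a ``std::atomic<int>`` raises ``counter`` to 9 only if it currently holds something smaller. The weak compare-exchange is used because it may fail spuriously on LL/SC targets and is already inside a retry loop.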