From 5289142cc84c0e2df25d455c1d741bdd0e8b9b1e Mon Sep 17 00:00:00 2001
From: Eli Friedman
These intrinsic functions expand the "universal IR" of LLVM to represent hardware constructs for atomic operations and memory synchronization. This provides an interface to the hardware, not an interface to the programmer. It is aimed at a low enough level to allow any programming models or APIs (Application Programming Interfaces) which need atomic behaviors to map cleanly onto it. It is also modeled primarily on hardware behavior. Just as hardware provides a "universal IR" for source languages, it also provides a starting point for developing a "universal" atomic operation and synchronization IR.
These do not form an API such as high-level threading libraries, software transactional memory systems, atomic primitives, and intrinsic functions as found in BSD, GNU libc, atomic_ops, APR, and other system and application libraries. The hardware interface provided by LLVM should allow a clean implementation of all of these APIs and parallel programming models. No one model or paradigm should be selected above others unless the hardware itself ubiquitously does so.
  declare void @llvm.memory.barrier(i1 <ll>, i1 <ls>, i1 <sl>, i1 <ss>, i1 <device>)
The llvm.memory.barrier intrinsic guarantees ordering between specific pairs of memory access types.
The llvm.memory.barrier intrinsic requires five boolean arguments. Each of the first four arguments enables a specific barrier type, as listed below. The fifth argument specifies whether the barrier also applies to I/O, device, or uncached memory.
This intrinsic causes the system to enforce some ordering constraints upon the loads and stores of the program. This barrier does not indicate when any events will occur; it only enforces an order in which they occur. For any of the specified pairs of load and store operations (e.g. load-load, or store-load), all of the first operations preceding the barrier will complete before any of the second operations following the barrier begin. Specifically, the semantics for each pairing are as follows:

  ll: load-load barrier. All loads before the barrier complete before any load after it begins.
  ls: load-store barrier. All loads before the barrier complete before any store after it begins.
  sl: store-load barrier. All stores before the barrier complete before any load after it begins.
  ss: store-store barrier. All stores before the barrier complete before any store after it begins.
These semantics are applied with a logical "and" behavior when more than one is enabled in a single memory barrier intrinsic.
Backends may implement stronger barriers than those requested when they do not support as fine-grained a barrier as requested. Some architectures do not need all types of barriers; on such architectures, these become no-ops.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 4, i32* %ptr

%result1 = load i32* %ptr       ; yields {i32}:result1 = 4
           call void @llvm.memory.barrier(i1 false, i1 true, i1 false, i1 false, i1 true)
                                ; guarantee the above finishes
           store i32 8, i32* %ptr   ; before this begins
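For readers more familiar with C, the store-ordering side of such a barrier can be sketched with a C11 release fence. This is an illustrative analogy only, not part of the LLVM interface; the names `data`, `ready`, `publish`, and `read_data` are invented for the example:

```c
#include <stdatomic.h>

/* Publish a value, then a ready flag. The release fence orders the
   data store before the flag store, much like a store-store barrier. */
static _Atomic int data;
static _Atomic int ready;

void publish(int v) {
    atomic_store_explicit(&data, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* store-store ordering */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int read_data(void) {
    /* single-threaded demo read; a real consumer would first observe
       `ready` and issue a matching acquire fence */
    return atomic_load_explicit(&data, memory_order_relaxed);
}
```

A consumer thread would pair this with `atomic_thread_fence(memory_order_acquire)` after reading `ready`, which corresponds to the load-load/load-store side of the barrier.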
This is an overloaded intrinsic. You can use llvm.atomic.cmp.swap on any integer bit width and for different address spaces. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.cmp.swap.i8.p0i8(i8* <ptr>, i8 <cmp>, i8 <val>)
  declare i16 @llvm.atomic.cmp.swap.i16.p0i16(i16* <ptr>, i16 <cmp>, i16 <val>)
  declare i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* <ptr>, i32 <cmp>, i32 <val>)
  declare i64 @llvm.atomic.cmp.swap.i64.p0i64(i64* <ptr>, i64 <cmp>, i64 <val>)
This intrinsic loads a value from memory and compares it to a given value. If they are equal, it stores a new value into the memory.
The llvm.atomic.cmp.swap intrinsic takes three arguments. The result as well as both cmp and val must be integer values with the same bit width. The ptr argument must be a pointer to a value of this integer type. While any bit width integer may be used, targets may only lower representations they support in hardware.
This entire intrinsic must be executed atomically. It first loads the value in memory pointed to by ptr and compares it with the value cmp. If they are equal, val is stored into the memory. The loaded value is yielded in all cases. This provides the equivalent of an atomic compare-and-swap operation within the SSA framework.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 4, i32* %ptr

%val1    = add i32 4, 4
%result1 = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 4, i32 %val1)
                                      ; yields {i32}:result1 = 4
%stored1 = icmp eq i32 %result1, 4    ; yields {i1}:stored1 = true
%memval1 = load i32* %ptr             ; yields {i32}:memval1 = 8

%val2    = add i32 1, 1
%result2 = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 5, i32 %val2)
                                      ; yields {i32}:result2 = 8
%stored2 = icmp eq i32 %result2, 5    ; yields {i1}:stored2 = false
%memval2 = load i32* %ptr             ; yields {i32}:memval2 = 8
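The same always-yield-the-loaded-value semantics can be sketched in C11, mirroring the values above. The helper name `cmp_swap` is invented for the example; this is an analogy, not the intrinsic's definition:

```c
#include <stdatomic.h>

/* Returns the value that was in *p before the operation, as
   llvm.atomic.cmp.swap does: the store happens only when *p == expected. */
int cmp_swap(_Atomic int *p, int expected, int desired) {
    int old = expected;
    /* on failure, atomic_compare_exchange_strong writes the current
       value of *p back into `old` */
    atomic_compare_exchange_strong(p, &old, desired);
    return old;
}
```

Starting from 4, `cmp_swap(&v, 4, 8)` returns 4 and stores 8; a subsequent `cmp_swap(&v, 5, 2)` returns 8 and leaves the memory unchanged, matching the IR example.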
This is an overloaded intrinsic. You can use llvm.atomic.swap on any integer bit width. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.swap.i8.p0i8(i8* <ptr>, i8 <val>)
  declare i16 @llvm.atomic.swap.i16.p0i16(i16* <ptr>, i16 <val>)
  declare i32 @llvm.atomic.swap.i32.p0i32(i32* <ptr>, i32 <val>)
  declare i64 @llvm.atomic.swap.i64.p0i64(i64* <ptr>, i64 <val>)
This intrinsic loads the value stored in memory at ptr and yields it. It then stores val into the memory at ptr.
The llvm.atomic.swap intrinsic takes two arguments. Both the val argument and the result must be integers of the same bit width. The first argument, ptr, must be a pointer to a value of this integer type. The targets may only lower integer representations they support.
This intrinsic loads the value pointed to by ptr, yields it, and stores val back into ptr atomically. This provides the equivalent of an atomic swap operation within the SSA framework.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 4, i32* %ptr

%val1    = add i32 4, 4
%result1 = call i32 @llvm.atomic.swap.i32.p0i32(i32* %ptr, i32 %val1)
                                      ; yields {i32}:result1 = 4
%stored1 = icmp eq i32 %result1, 4    ; yields {i1}:stored1 = true
%memval1 = load i32* %ptr             ; yields {i32}:memval1 = 8

%val2    = add i32 1, 1
%result2 = call i32 @llvm.atomic.swap.i32.p0i32(i32* %ptr, i32 %val2)
                                      ; yields {i32}:result2 = 8
%stored2 = icmp eq i32 %result2, 8    ; yields {i1}:stored2 = true
%memval2 = load i32* %ptr             ; yields {i32}:memval2 = 2
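In C11 the equivalent operation is `atomic_exchange`; a minimal sketch (the wrapper name `atomic_swap_int` is ours):

```c
#include <stdatomic.h>

/* Unconditional atomic exchange: store `val` and return the old value,
   matching the semantics of llvm.atomic.swap. */
int atomic_swap_int(_Atomic int *p, int val) {
    return atomic_exchange(p, val);
}
```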
This is an overloaded intrinsic. You can use llvm.atomic.load.add on any integer bit width. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.load.add.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.add.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.add.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.add.i64.p0i64(i64* <ptr>, i64 <delta>)
This intrinsic adds delta to the value stored in memory at ptr. It yields the original value at ptr.
The intrinsic takes two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.
This intrinsic performs a series of operations atomically. It first loads the value stored at ptr, then adds delta and stores the result back to ptr. It yields the original value stored at ptr.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 4, i32* %ptr
%result1 = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 4)
                                ; yields {i32}:result1 = 4
%result2 = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 2)
                                ; yields {i32}:result2 = 8
%result3 = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 5)
                                ; yields {i32}:result3 = 10
%memval1 = load i32* %ptr       ; yields {i32}:memval1 = 15
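The fetch-and-add pattern maps directly onto C11's `atomic_fetch_add`; a sketch mirroring the values above (the wrapper name `fetch_add_int` is ours):

```c
#include <stdatomic.h>

/* Fetch-and-add: returns the value stored at *p before the addition,
   like llvm.atomic.load.add. */
int fetch_add_int(_Atomic int *p, int delta) {
    return atomic_fetch_add(p, delta);
}
```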
This is an overloaded intrinsic. You can use llvm.atomic.load.sub on any integer bit width and for different address spaces. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.load.sub.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.sub.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.sub.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.sub.i64.p0i64(i64* <ptr>, i64 <delta>)
This intrinsic subtracts delta from the value stored in memory at ptr. It yields the original value at ptr.
The intrinsic takes two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.
This intrinsic performs a series of operations atomically. It first loads the value stored at ptr, then subtracts delta and stores the result back to ptr. It yields the original value stored at ptr.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 8, i32* %ptr
%result1 = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 4)
                                ; yields {i32}:result1 = 8
%result2 = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 2)
                                ; yields {i32}:result2 = 4
%result3 = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 5)
                                ; yields {i32}:result3 = 2
%memval1 = load i32* %ptr       ; yields {i32}:memval1 = -3
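Likewise, fetch-and-subtract corresponds to C11's `atomic_fetch_sub`; a sketch with the same values as the example above (the wrapper name is ours):

```c
#include <stdatomic.h>

/* Fetch-and-subtract: returns the value stored at *p before the
   subtraction, like llvm.atomic.load.sub. Note that the running
   value can go negative, as in the IR example. */
int fetch_sub_int(_Atomic int *p, int delta) {
    return atomic_fetch_sub(p, delta);
}
```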
These are overloaded intrinsics. You can use llvm.atomic.load.and, llvm.atomic.load.nand, llvm.atomic.load.or, and llvm.atomic.load.xor on any integer bit width and for different address spaces. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.load.and.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.and.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.and.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.and.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.or.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.or.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.or.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.or.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.nand.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.nand.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.nand.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.nand.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.xor.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.xor.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.xor.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.xor.i64.p0i64(i64* <ptr>, i64 <delta>)
These intrinsics apply a bitwise operation (and, nand, or, xor) between delta and the value stored in memory at ptr. They yield the original value at ptr.
These intrinsics take two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.
These intrinsics perform a series of operations atomically. They first load the value stored at ptr, then apply the bitwise operation with delta and store the result back to ptr. They yield the original value stored at ptr.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 0x0F0F, i32* %ptr
%result0 = call i32 @llvm.atomic.load.nand.i32.p0i32(i32* %ptr, i32 0xFF)
                                ; yields {i32}:result0 = 0x0F0F
%result1 = call i32 @llvm.atomic.load.and.i32.p0i32(i32* %ptr, i32 0xFF)
                                ; yields {i32}:result1 = 0xFFFFFFF0
%result2 = call i32 @llvm.atomic.load.or.i32.p0i32(i32* %ptr, i32 0x0F)
                                ; yields {i32}:result2 = 0xF0
%result3 = call i32 @llvm.atomic.load.xor.i32.p0i32(i32* %ptr, i32 0x0F)
                                ; yields {i32}:result3 = 0xFF
%memval1 = load i32* %ptr       ; yields {i32}:memval1 = 0xF0
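In C11, and/or/xor map directly onto `atomic_fetch_and`, `atomic_fetch_or`, and `atomic_fetch_xor`, but there is no standard fetch-nand, so a compare-and-swap loop is one common way to build it. This is a sketch of that lowering, not the LLVM implementation; the helper name `fetch_nand_int` is ours:

```c
#include <stdatomic.h>

/* Atomic fetch-nand built from a CAS loop: stores ~(*p & delta) and
   returns the original value, like llvm.atomic.load.nand. */
int fetch_nand_int(_Atomic int *p, int delta) {
    int old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, ~(old & delta)))
        ;   /* `old` is refreshed on each failed attempt */
    return old;
}
```

The loop retries only when another thread changed the value between the load and the exchange, so the returned value is always the one the nand was actually applied to.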
These are overloaded intrinsics. You can use llvm.atomic.load.max, llvm.atomic.load.min, llvm.atomic.load.umax, and llvm.atomic.load.umin on any integer bit width and for different address spaces. Not all targets support all bit widths, however.
  declare i8  @llvm.atomic.load.max.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.max.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.max.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.max.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.min.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.min.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.min.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.min.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.umax.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.umax.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.umax.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.umax.i64.p0i64(i64* <ptr>, i64 <delta>)
  declare i8  @llvm.atomic.load.umin.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.umin.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.umin.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.umin.i64.p0i64(i64* <ptr>, i64 <delta>)
These intrinsics take the signed or unsigned minimum or maximum of delta and the value stored in memory at ptr. They yield the original value at ptr.
These intrinsics take two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.
These intrinsics perform a series of operations atomically. They first load the value stored at ptr, then take the signed or unsigned min or max of delta and that value, and store the result back to ptr. They yield the original value stored at ptr.
%mallocP = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr     = bitcast i8* %mallocP to i32*
           store i32 7, i32* %ptr
%result0 = call i32 @llvm.atomic.load.min.i32.p0i32(i32* %ptr, i32 -2)
                                ; yields {i32}:result0 = 7
%result1 = call i32 @llvm.atomic.load.max.i32.p0i32(i32* %ptr, i32 8)
                                ; yields {i32}:result1 = -2
%result2 = call i32 @llvm.atomic.load.umin.i32.p0i32(i32* %ptr, i32 10)
                                ; yields {i32}:result2 = 8
%result3 = call i32 @llvm.atomic.load.umax.i32.p0i32(i32* %ptr, i32 30)
                                ; yields {i32}:result3 = 8
%memval1 = load i32* %ptr       ; yields {i32}:memval1 = 30
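C11 has no standard fetch-min/fetch-max, so on targets without a native instruction these intrinsics are typically lowered to a compare-and-swap loop. A sketch of that lowering for signed max (the helper name `fetch_max_int` is ours, and this is an illustration of the technique, not LLVM's code):

```c
#include <stdatomic.h>

/* Atomic signed max via a CAS loop: stores delta only while it is
   greater than the current value, and returns the value observed
   before the update, like llvm.atomic.load.max. */
int fetch_max_int(_Atomic int *p, int delta) {
    int old = atomic_load(p);
    while (old < delta &&
           !atomic_compare_exchange_weak(p, &old, delta))
        ;   /* `old` is refreshed on each failed attempt */
    return old;
}
```

The other three variants differ only in the comparison: `>` for min, and unsigned operands for umax/umin.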