draw: improve vertex fetch (v2)

The per-element fetch contains a number of calculations which are constant; these can be moved outside both the per-element loop and the main shader loop (llvm can mostly figure out on its own that they are constant, but this can have a significant compile time cost).

Similarly, swapping the fetch loops looks easier (outer loop per attrib, inner loop filling up the per-vertex elements). This way the aos->soa conversion can also be done per attrib and not just at the end, though again this doesn't really make much of a difference in the generated code. (This would also make it possible to vectorize the calculations leading to the fetches.)

There's also a minimal change simplifying the overflow math slightly.

All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with an old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR).

v2: adapt to other draw change.

No changes with piglit.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
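
For illustration only, the loop order described in this message (outer loop per attrib, inner loop per vertex, with the per-attrib constants computed once) could look roughly like the plain C sketch below. It is not the draw module's code, which emits LLVM IR. The struct, the function name and NUM_VERTS are hypothetical, and the bounds and overflow checks of the real fetch are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_VERTS 4   /* hypothetical number of vertices fetched per iteration */

/* Hypothetical per-attribute layout; not the draw module's real structs. */
struct attrib_layout {
   const uint8_t *buffer;  /* start of the vertex buffer */
   size_t stride;          /* bytes between consecutive vertices */
   size_t src_offset;      /* byte offset of this attribute within a vertex */
};

/* Outer loop per attribute, inner loop per vertex; result is out[attr][vert]. */
static void
fetch_attribs(const struct attrib_layout *attribs, size_t num_attribs,
              const uint32_t indices[NUM_VERTS],
              float out[][NUM_VERTS])
{
   for (size_t a = 0; a < num_attribs; a++) {
      /* Constant for every vertex of this attribute: computed once here
       * instead of once per element. */
      const uint8_t *base = attribs[a].buffer + attribs[a].src_offset;
      const size_t stride = attribs[a].stride;

      for (size_t v = 0; v < NUM_VERTS; v++) {
         /* Only the index-dependent part stays in the inner loop. */
         memcpy(&out[a][v], base + (size_t)indices[v] * stride, sizeof(float));
      }
   }
}
```

Because each row of out holds one attribute for all vertices, the data already ends up grouped per attrib, which is the property the message refers to when it says the aos->soa conversion can be done per attrib.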
author: Roland Scheidegger <sroland@vmware.com> 2016-10-15 03:53:48 +0200
committer: Roland Scheidegger <sroland@vmware.com> 2016-10-19 01:44:59 +0200
commit: aeceec54a86d26aad165a1ade67a8aba61ae080f (patch)
tree: 9d193becaaf3f48ae023ca881ae504baf5617d03 /src/gallium/auxiliary/gallivm/lp_bld_arit_overflow.c
parent: 0942fe548e935ccc849f44bd920649ef2b93a6a5 (diff)
download: external_mesa3d-aeceec54a86d26aad165a1ade67a8aba61ae080f.zip
external_mesa3d-aeceec54a86d26aad165a1ade67a8aba61ae080f.tar.gz
external_mesa3d-aeceec54a86d26aad165a1ade67a8aba61ae080f.tar.bz2
1 files changed, 24 insertions, 0 deletions
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit_overflow.c b/src/gallium/auxiliary/gallivm/lp_bld_arit_overflow.c
index 91247fd..152ad57 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit_overflow.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit_overflow.c
@@ -127,6 +127,30 @@ lp_build_uadd_overflow(struct gallivm_state *gallivm,
 }
 
 /**
+ * Performs unsigned subtraction of two integers and reports 
+ * overflow if detected.
+ *
+ * The values @a and @b must be of the same integer type. If
+ * an overflow is detected the IN/OUT @ofbit parameter is used:
+ * - if it's pointing to a null value, the overflow bit is simply
+ *   stored inside the variable it's pointing to,
+ * - if it's pointing to a valid value, then that variable,
+ *   which must be of i1 type, is ORed with the newly detected
+ *   overflow bit. This is done to allow chaining of a number of
+ *   overflow functions together without having to test the 
+ *   overflow bit after every single one.
+ */
+LLVMValueRef
+lp_build_usub_overflow(struct gallivm_state *gallivm,
+                       LLVMValueRef a,
+                       LLVMValueRef b,
+                       LLVMValueRef *ofbit)
+{
+   return build_binary_int_overflow(gallivm, "llvm.usub.with.overflow",
+                                    a, b, ofbit);
+}
+
+/**
  * Performs unsigned multiplication of  two integers and 
  * reports overflow if detected.
  *
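
The chaining behaviour described in the doc comment for lp_build_usub_overflow can be sketched with ordinary scalar C. This is only an analogue under stated assumptions: the helper names below are hypothetical, they operate on uint32_t values rather than LLVMValueRef, and a bool initialised to false stands in for the "null" overflow bit, since ORing into false is the same as storing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar analogue of lp_build_usub_overflow: returns a - b
 * (wrapping modulo 2^32) and ORs the overflow flag into *ofbit. */
static uint32_t
usub_overflow(uint32_t a, uint32_t b, bool *ofbit)
{
   /* Unsigned subtraction wraps exactly when b > a. */
   *ofbit = *ofbit || (a < b);
   return a - b;
}

/* Hypothetical scalar analogue of lp_build_uadd_overflow. */
static uint32_t
uadd_overflow(uint32_t a, uint32_t b, bool *ofbit)
{
   uint32_t sum = a + b;
   /* Unsigned addition wrapped exactly when the sum is below an operand. */
   *ofbit = *ofbit || (sum < a);
   return sum;
}

/* Chain two operations, (size - offset) + extra, and test the
 * accumulated overflow bit only once at the end. */
static bool
chain_overflows(uint32_t size, uint32_t offset, uint32_t extra,
                uint32_t *result)
{
   bool ofbit = false;   /* plays the role of the initially null bit */
   uint32_t t = usub_overflow(size, offset, &ofbit);
   *result = uadd_overflow(t, extra, &ofbit);
   return ofbit;
}
```

The point of the accumulated bit is that a caller can run several overflow-checked operations back to back and branch only once, instead of testing after each one.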