summaryrefslogtreecommitdiffstats
path: root/src/gallium/auxiliary/gallivm
Commit message (Collapse)AuthorAgeFilesLines
* draw: improve vertex fetch (v2)Roland Scheidegger2016-10-192-0/+30
| | | | | | | | | | | | | | | | | | | | | | | The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: print out time for jitting functions with GALLIVM_DEBUG=perfRoland Scheidegger2016-10-191-0/+11
| | | | | | | | Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: Use native packs and unpacks for the lerpsRoland Scheidegger2016-10-193-13/+156
| | | | | | | | | | | | | | | | | | | | | | | | For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: Use AVX2 gather instrinsics.Jose Fonseca2016-10-041-0/+95
| | | | | | v2: Use AVX2 gather for non aligned loads too. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Use 8 wide AoS sampling on AVX2.Roland Scheidegger2016-10-041-5/+6
| | | | | | | | | | v2: Make sure that with num_lods > 1 and min_filter != mag_filter we still enter the splitting path. So this case would still use 4-wide aos path (as a side note, the 4-wide aos sampling path could actually be improved quite a bit if we have avx2, by just doing the filtering with 256bit vectors). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: Basic AVX2 support.José Fonseca2016-10-044-28/+98
| | | | | | v2: pblendb -> pblendvb Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: support negation on 64-bit integersNicolai Hähnle2016-09-211-0/+4
| | | | | | | This should be analogous to 32-bit integers. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallivm/llvmpipe: prepare support for ARB_gpu_shader_int64.Dave Airlie2016-09-214-4/+498
| | | | | | | | | | | | | | | | This enables 64-bit integer support in gallivm and llvmpipe. v2: add conversion opcodes. v3: - PIPE_CAP_INT64 is not there yet - restrict DIV/MOD defaults to the CPU, as for 32 bits - TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64 Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallivm: add lp_build_alloca_undefNicolai Hähnle2016-08-172-0/+24
| | | | | Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallivm: add create_builder_at_entry helper functionNicolai Hähnle2016-08-171-23/+22
| | | | | | | Reduces code duplication. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* util: Move format_rgb9e5.h to src/utilJason Ekstrand2016-08-051-1/+1
| | | | | | | It's used from both mesa main and gallium. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: add helper lp_add_attr_dereferenceableMarek Olšák2016-07-132-0/+14
| | | | | | | | | Not sure if this is the right way to do it, but it seems to work. v2: make it a no-op on LLVM <= 3.5 Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallivm: set LLVMNoUnwindAttribute on all intrinsicsMarek Olšák2016-07-111-2/+4
| | | | | | | | | RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement: Application Files SGPRs VGPRs SpillSGPR SpillVGPR Code Size LDS Max Waves Waits unigine_valley 278 0.00 % -0.29 % 0.00 % 0.00 % 0.01 % 0.00 % 0.17 % 0.00 % Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9Roland Scheidegger2016-06-201-2/+4
| | | | | | | | | | | | | | | | | | | Apparently, these are deprecated. There's some AutoUpgrade feature which is supposed to promote these to cmp/select, which apparently doesn't work with jit code. It is possible it's not actually even meant to work (see the bug filed against llvm which couldn't provide an answer neither) but in any case this is meant to be only temporary unless the intrinsics are really illegal. So, just use the fallback code (which should be cmp/select, we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages to optimize this back to pmin/pmax in the end). This addresses https://llvm.org/bugs/show_bug.cgi?id=28176 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Tested-by: Aaron Watry <awatry@gmail.com>
* gallivm: Fix trivial sign warningsJan Vesely2016-06-138-21/+22
| | | | | | | v2: include whitespace fixes Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: more 64-bit integer prep work.Dave Airlie2016-06-111-8/+8
| | | | | | | This converts one other place to using the new helper. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* gallivm: make non-float return code bitcast consistent.Dave Airlie2016-06-111-12/+6
| | | | | | | | This just uses the same form across the fetches. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* gallium/gallivm: use 64-bit test instead of doubles.Dave Airlie2016-06-111-37/+36
| | | | | | | | | This just makes some generic code that currently emits double suitable for emitting 64-bit values. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* gallivm: Never emit llvm.fmuladd on LLVM 3.3.Jose Fonseca2016-06-102-1/+7
| | | | | | | | Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't handle llvm.fmuladd and instead it fall backs to a C function. Which in turn causes a segfault on Windows. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Use llvm.fmuladd.*.Jose Fonseca2016-06-105-47/+87
| | | | Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* util,gallivm: Explicitly enable/disable fma attribute.Jose Fonseca2016-06-102-0/+11
| | | | | | | | | | As suggested by Roland Scheidegger. Use the same logic as f16c, since fma requires VEX encoding. But disable FMA on LLVM 3.3 without MCJIT. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: initialize init_native_targets_once_flag correctlyFrederic Devernay2016-05-301-1/+1
| | | | Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* gallivm: eliminate a unnecessary AND with unorm lerpsRoland Scheidegger2016-05-271-10/+35
| | | | | | | | | Instead of doing a add and then mask out the upper bits, we can simply do a add with a half wide type (this, of course, assumes the hw can actually do it...), so we'll get the required zero in the upper bits automatically. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: improve dumping of bitcodeRoland Scheidegger2016-05-112-4/+9
| | | | | | | | | Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode. Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple modules can be dumped (albeit it might still overwrite previous modules, particularly the modules from draw tend to always have the same name). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: print declarations of intrinsics with GALLIVM_DEBUG=irRoland Scheidegger2016-05-101-0/+5
| | | | | | | Those aren't really interesting, however outputting them is helpful when trying to feed the IR to llvm llc (or opt) for debugging. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: use InternalLinkage instead of PrivateLinkage for texture functionsRoland Scheidegger2016-05-101-1/+1
| | | | | | | At least with MCJIT the disassembler will crash otherwise when trying to disassemble such functions. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: disable avx512 featuresRoland Scheidegger2016-05-101-0/+12
| | | | | | | | | | | | | | | We don't target this yet, and some llvm versions incorrectly enable it based on cpu string, causing crashes. (Albeit this is a losing battle, it is pretty much guaranteed when the next new feature comes along llvm will mistakenly enable it on some future cpu, thus we would have to proactively disable all new features as llvm adds them.) This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested) Tested-by: Timo Aaltonen <tjaalton@ubuntu.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com CC: <mesa-stable@lists.freedesktop.org>
* gallium: enable intel jitevents profilingTim Rowley2016-05-091-0/+9
| | | | | | | | LLVM when configured with "intel jitevents" enabled can inform VTune about dynamic code, so individual shaders are attributed profiling data and the resulting assembly can be examined. Acked-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: s/Elements/ARRAY_SIZE/Brian Paul2016-04-279-29/+29
| | | | Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: make sampling more robust against bogus coordinatesRoland Scheidegger2016-04-263-13/+43
| | | | | | | | | | | | | | | | | | | | | | Some cases (especially these using fract for coord wrapping) did not handle NaNs (or Infs) correctly - the following code assumed the fract result could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract result was NaN - which then could produce out-of-bound offsets. (Note that the explicit NaN behavior changes for min/max on x86 sse don't result in actual changes in the generated jit code, but may on other architectures. Found by looking through all the wrap functions.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955 No piglit changes. (v2: fix min/max typo in coord_mirror, add comment) Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Tested-by: Bruce Cherniak <bruce.cherniak@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallium: use unreachable instead of assertsGrazvydas Ignotas2016-04-251-1/+1
| | | | | | | Avoids warnings in release builds. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
* gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*Marek Olšák2016-04-221-3/+3
| | | | Acked-by: Jose Fonseca <jfonseca@vmware.com>
* gallium: merge PIPE_SWIZZLE_* and UTIL_FORMAT_SWIZZLE_*Marek Olšák2016-04-226-45/+45
| | | | | | | | Use PIPE_SWIZZLE_* everywhere. Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE. The new enum is called pipe_swizzle. Acked-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: fix bogus argument order to lp_build_sample_mipmap functionRoland Scheidegger2016-04-211-2/+2
| | | | | | | | | | | | | | | Screwed up since 0753b135f6e83b171d8a1b08aea967374f3542bc. (Only an issue with different min/mag filters, and then only in some cases, which is probably why it went unnoticed for quite a while. The effect should have simply been nearest mip filter instead of linear, iff min was nearest, mag was linear, and all pixels hit the mignifying path.) Fixes a bunch of dEQP failures. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
* gallivm: Avoid llvm::sys::getProcessTriple().Jose Fonseca2016-04-191-3/+3
| | | | | | | Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3 onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway, Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Remove lp_get_module_id.Jose Fonseca2016-04-194-12/+15
| | | | | | Just keep a copy of the module_name in gallivm. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Fix MCJIT with LLVM 3.3.Jose Fonseca2016-04-191-3/+3
| | | | | | | | | | | | One needs to call setJITMemoryManager for LLVM 3.3, instead of setMCJITMemoryManager. This regressed in commits 065256df/75ad4fe7 when trying to make the code to build with LLVM 3.6. Tested MCJIT with LLVM 3.3 to 3.6. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Make MCJIT a runtime option.Jose Fonseca2016-04-191-75/+72
| | | | | | | | | | | | | On the LLVM versions that support it, so we can easily switch between MCJIT/old-jit for testing. The new option is GALLIVM_MCJIT. Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes segfault, both on Linux and Windows. I'm almost certain this used to work, so there probably is a regression somewhere. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Use LLVMSetTarget.Jose Fonseca2016-04-191-3/+9
| | | | | | Instead of LLVM C++ interfaces. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Use LLVMPrintValueToString where available.Jose Fonseca2016-04-191-35/+10
| | | | | | | | | | | | | And llvm::raw_string_ostream where not (LLVM 3.3). Thereby eliminating yet another dependency on unstable LLVM interfaces. As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which was disabled, probably due to C++ issues.) Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: convert size query to using a set of parameters.Dave Airlie2016-04-194-43/+39
| | | | | | | | | | This isn't currently that easy to expand, so fix it up before expanding it later to include dynamic samplers. [airlied: use some local variables (Roland)] Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* gallivm: don't use vector selects with llvm 3.7Roland Scheidegger2016-04-181-3/+5
| | | | | | | | | | llvm 3.7 sometimes simply miscompiles vector selects. See https://bugs.freedesktop.org/show_bug.cgi?id=94972 This was fixed in llvm r249669 (https://llvm.org/bugs/show_bug.cgi?id=24532). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: Workaround LLVM PR 27332.Jose Fonseca2016-04-131-3/+14
| | | | | | | | | | | | | The credit for finding and isolating this bug goes to Vinson and Roland. The buggy LLVM versions were found by doing opt -instcombine llvm-pr27332.ll > /dev/null where llvm-pr27332.ll is the IR from https://llvm.org/bugs/show_bug.cgi?id=27332#c3 Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: use llvm.nearbyint instead of llvm.round.Roland Scheidegger2016-04-131-98/+1
| | | | | | | | | | | | | | | | We used to use sse roundps intrinsic directly, but switched to use the llvm intrinsics for rounding with e4f01da15d8c6ce3e8c77ff3ff3d2ce2574a3f7b. However, llvm semantics follows standard math lib round function which is specced to do roundNearestAwayFromZero but we really want roundNearestEven (moreoever, using round generates atrocious code since the cpu can't do it directly and it results in scalar calls to libm __roundf). So, use llvm.nearbyint instead, which does exactly the right thing, and even has the advantage of being available with llvm 3.3 too. (I've verified it actually generates a roundps instruction with llvm 3.3.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* gallivm: Introduce lp_format_intrinsic.Jose Fonseca2016-04-043-14/+54
| | | | | | | | | | For adding .v4f32 like suffixes to intrinsics, taking special care for scalar case, which was being often neglected. This fixes invalid IR when doing mipmap filtering on SSE2 (the only case where we'd use intrinsics with scalars.) Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Use llvm.fabs.Jose Fonseca2016-04-031-8/+3
| | | | | | Exactly the same code. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Prefer backend agnostic intrinsic for rounding.Jose Fonseca2016-04-031-7/+39
| | | | | | | | | We could unconditionally use these instrinsics, but performance with SSE2 would suck, as LLVM falls back to calling libm. lp_test_arit. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Add debug option to force SSE2.Jose Fonseca2016-04-031-11/+14
| | | | | | For simulating less capable machines. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Fix performance regressions due to vector selects.Jose Fonseca2016-04-031-22/+18
| | | | | | | | | LLVM often can't determine the mask elements are all ones/zeros, and there doesn't seem to be a good way to hint that. Thanks to Roland Scheidegger for spotting and analyzing the issue. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallivm: Remove lp_build_load_volatile.Jose Fonseca2016-04-032-12/+0
| | | | | | | No longer needed. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>