summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir/nir_opt_algebraic.py
Commit message (Collapse)AuthorAgeFilesLines
* nir: Rely on the fact that bcsel takes a well formed boolean.Kenneth Graunke2016-08-191-3/+3
| | | | | | | | | | | | | | | According to Connor, it's safe to assume that the first operand of bcsel, as well as the operand of b2f and b2i, must be well formed booleans. https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html With the previous improvements to a@bool handling, this now has no change in shader-db instruction counts on Broadwell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir/algebraic: Optimize common array indexing sequenceIan Romanick2016-08-171-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir/algebraic: Optimize fabs(u2f(x))Ian Romanick2016-07-191-0/+1
| | | | | | | | | | I noticed this when I tried to do frexp(float(some_unsigned)) in the ir_unop_find_lsb lowering pass. The code generated for frexp() uses fabs, and this resulted in an extra instruction. Ultimately I ended up not using frexp. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Add optimization for (a || True == True)Eric Anholt2016-07-121-0/+1
| | | | | | | | | | | | This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering producing some constants that were getting compared. total instructions in shared programs: 112276 -> 112198 (-0.07%) instructions in affected programs: 2239 -> 2161 (-3.48%) total estimated cycles in shared programs: 283102 -> 283038 (-0.02%) estimated cycles in affected programs: 2365 -> 2301 (-2.71%) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir/algebraic: Remove imprecise flog2 optimizationsJason Ekstrand2016-06-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | While mathematically correct, these two optimizations result in an expression with substantially lower precision than the original. For any positive finite floating-point value, log2(x) is well-defined and finite. More precisely, it is in the range [-150, 150] so any sum of logarithms log2(a) + log2(b) is also well-defined and finite as long as a and b are both positive and finite. However, if a and b are either very small or very large, their product may get flushed to infinity or zero causing log2(a * b) to be nowhere close to log2(a) + log2(b). This imprecision was causing incorrect rendering in Talos Principal because part of its HDR rendering process involves doing 8 texture operations, clamping the result to [0, 65000], taking a dot-product with a constant, and then taking the log2. This is done 6 or 8 times and summed to produce the final result which is written to a red texture. In cases where you have a region of the screen that is very dark, it can end up getting a result value of -inf which is not what is intended. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425 Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
* nir/algebraic: support for power-of-two optimizationsRob Clark2016-06-031-3/+12
| | | | | | | | | | | | Some optimizations, like converting integer multiply/divide into left/ right shifts, have additional constraints on the search expression. Like requiring that a variable is a constant power of two. Support these cases by allowing a fxn name to be appended to the search var expression (ie. "a#32(is_power_of_two)"). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir/algebraic: Separate ffma lowering from fusingJason Ekstrand2016-05-111-1/+1
| | | | | | | | The i965 driver has its own pass for fusing mul+add combinations that's much smarter than what nir_opt_algebraic can do so we don't want to get the nir_opt_algebraic one just because we didn't set lower_ffma. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Separate 32 and 64-bit fmod loweringSamuel Iglesias Gonsálvez2016-05-041-2/+3
| | | | | | | | Split 32-bit and 64-bit fmod lowering as the drivers might need to lower them separately inside NIR depending on the HW support. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* nir/algebraic: Support lowering for both 64 and 32-bit ldexpJason Ekstrand2016-04-281-9/+22
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* nir/opcodes: Make ldexp take an explicitly 32-bit intJason Ekstrand2016-04-281-1/+1
| | | | | | | | | There is no sense in having the double version of ldexp take a 64-bit integer. Instead, let's just take a 32-bit int all the time. This also matches what GLSL does where both variants of ldexp take a regular integer for the exponent argument. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* nir: Add lrp lowering for doubles in opt_algebraicSamuel Iglesias Gonsálvez2016-04-281-3/+6
| | | | | | | | | | | | | | | Some hardware (i965 on Broadwell generation, for example) does not support natively the execution of lrp instruction with double arguments. Add 'lower_flrp64' flag to lower this instruction in that case. v2: - Rename lower_flrp_double to lower_flrp64 (Jason) - Fix typo (Jason) - Adapt the code to define bit_size information in the opcodes. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir: rename lower_flrp to lower_flrp32Samuel Iglesias Gonsálvez2016-04-281-6/+6
| | | | | | | A later patch will add lower_flrp64 option to NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir/opt_algebraic: Fix some expressions with ambiguous bit sizesJason Ekstrand2016-04-271-3/+3
| | | | | Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir/search: Respect the bit_size parameter on nir_search_valueJason Ekstrand2016-04-271-1/+4
| | | | | Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir/algebraic: Add a mechanism for specifying the bit size of a valueJason Ekstrand2016-04-271-0/+4
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir/algebraic: Add lowering for ldexpJason Ekstrand2016-04-131-0/+31
| | | | | | | | | The algorithm used is different from both the naive suggestion from the GLSL spec and the one used in GLSL IR today. Unfortunately, the GLSL IR implementation that we have today doesn't handle denormals (for those that care) or the case where the float source is +-inf. Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Add more modulus opcodesJason Ekstrand2016-04-131-0/+1
| | | | | | | These are all needed for SPIR-V Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Merge redudant integer clamping.Markus Wick2016-04-111-1/+4
| | | | | | | | | Dolphin uses them a lot. Range tracking would be better in the long term, but this two lines works fine for now. Signed-off-by: Markus Wick <markus@selfnet.de> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir: Do basic constant reassociation.Kenneth Graunke2016-04-111-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many shaders contain expression trees of the form: const_1 * (value * const_2) Reorganizing these to (const_1 * const_2) * value will allow constant folding to combine the constants. Sometimes, these constants are 2 and 0.5, so we can remove a multiply altogether. Other times, it can create more immediate constants, which can actually hurt. Finding a good balance here is tricky. While much more could be done, this simple patch seems to have a lot of positive benefit while having a low downside. shader-db results on Broadwell: total instructions in shared programs: 8963768 -> 8961369 (-0.03%) instructions in affected programs: 438318 -> 435919 (-0.55%) helped: 1502 HURT: 245 total cycles in shared programs: 71527354 -> 71421516 (-0.15%) cycles in affected programs: 11541788 -> 11435950 (-0.92%) helped: 3445 HURT: 1224 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Simplify a bcsel to logical-orIan Romanick2016-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oddly, this did not affect the shader where I first noticed the pattern. That particular shader doesn't get its if-statement converted to a bcsel because there are two assignments in the else-statement. This led to me submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747. shader-db results: Sandy Bridge total instructions in shared programs: 8467384 -> 8467069 (-0.00%) instructions in affected programs: 36594 -> 36279 (-0.86%) helped: 46 HURT: 0 total cycles in shared programs: 117573448 -> 117568518 (-0.00%) cycles in affected programs: 339114 -> 334184 (-1.45%) helped: 46 HURT: 0 Ivy Bridge / Haswell / Broadwell / Skylake: total instructions in shared programs: 7774258 -> 7773999 (-0.00%) instructions in affected programs: 30874 -> 30615 (-0.84%) helped: 46 HURT: 0 total cycles in shared programs: 65739190 -> 65734530 (-0.01%) cycles in affected programs: 180380 -> 175720 (-2.58%) helped: 45 HURT: 1 No change on G45 or Ironlake. I also tried these expressions, but none of them affected any shaders in shader-db: (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)), (('bcsel', a, 'b@bool', False), ('iand', a, b)), (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)), Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Fix typo from commit 6702f1acde9.Matt Turner2016-03-301-1/+1
|
* nir: Propagate negates up multiplication chains.Matt Turner2016-03-301-0/+4
| | | | | | | | | | total instructions in shared programs: 7112159 -> 7088092 (-0.34%) instructions in affected programs: 1374915 -> 1350848 (-1.75%) helped: 7392 HURT: 621 GAINED: 2 LOST: 2
* nir/algebraic: Flag inexact optimizationsJason Ekstrand2016-03-231-59/+62
| | | | | | | | | | Many of our optimizations, while great for cutting shaders down to size, aren't really precision-safe. This commit tries to flag all of the inexact floating-point optimizations so they don't get run on values that are flagged "exact". It's a bit conservative and maybe flags some safe optimizations as unsafe but that's better than missing one. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* nir/algebraic: Fix fmin detection to match the specJason Ekstrand2016-03-231-1/+1
| | | | | | | The previous transformation got the arguments to fmin backwards. When NaNs are involved, the GLSL min/max aren't commutative so it matters. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* nir/algebraic: Get rid of an invlid fxor optimizationJason Ekstrand2016-03-231-1/+0
| | | | | | | The fxor opcode is required to return 1.0f or 0.0f but the input variable may not be 1.0f or 0.0f. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* nir/algebraic: Allow for flagging operations as being inexactJason Ekstrand2016-03-231-1/+8
| | | | Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* nir: Don't abs slt and friendsIan Romanick2016-03-221-0/+4
| | | | | | | No shader-db changes, but this is symmetric with the previous commit. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Don't abs the result of b2f or b2iIan Romanick2016-03-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the results below, 2 SIMD16 shaders in Trine are lost. G4X total instructions in shared programs: 4012279 -> 4011108 (-0.03%) instructions in affected programs: 116776 -> 115605 (-1.00%) helped: 339 HURT: 0 total cycles in shared programs: 84315862 -> 84313584 (-0.00%) cycles in affected programs: 1767232 -> 1764954 (-0.13%) helped: 274 HURT: 81 Ironlake total instructions in shared programs: 6399073 -> 6396998 (-0.03%) instructions in affected programs: 218050 -> 215975 (-0.95%) helped: 600 HURT: 0 total cycles in shared programs: 128892088 -> 128888810 (-0.00%) cycles in affected programs: 2867452 -> 2864174 (-0.11%) helped: 422 HURT: 137 Sandy Bridge total instructions in shared programs: 8462174 -> 8460759 (-0.02%) instructions in affected programs: 178529 -> 177114 (-0.79%) helped: 596 HURT: 0 total cycles in shared programs: 117542276 -> 117534098 (-0.01%) cycles in affected programs: 1239166 -> 1230988 (-0.66%) helped: 369 HURT: 150 Ivy Bridge total instructions in shared programs: 7775131 -> 7773410 (-0.02%) instructions in affected programs: 162903 -> 161182 (-1.06%) helped: 590 HURT: 0 total cycles in shared programs: 65759882 -> 65747268 (-0.02%) cycles in affected programs: 1004354 -> 991740 (-1.26%) helped: 467 HURT: 141 Haswell total instructions in shared programs: 7107786 -> 7106327 (-0.02%) instructions in affected programs: 140954 -> 139495 (-1.04%) helped: 590 HURT: 0 total cycles in shared programs: 64668028 -> 64655322 (-0.02%) cycles in affected programs: 967080 -> 954374 (-1.31%) helped: 452 HURT: 149 LOST: 2 GAINED: 0 Broadwell total instructions in shared programs: 8980029 -> 8978287 (-0.02%) instructions in affected programs: 197232 -> 195490 (-0.88%) helped: 715 HURT: 0 total cycles in shared programs: 70070448 -> 70055970 (-0.02%) cycles in affected programs: 975724 -> 961246 (-1.48%) helped: 471 HURT: 111 LOST: 2 GAINED: 0 Skylake total instructions in shared programs: 9115178 -> 9113436 (-0.02%) instructions in affected programs: 203012 -> 201270 (-0.86%) helped: 715 HURT: 0 total cycles in shared programs: 68848660 -> 68834004 (-0.02%) cycles in affected programs: 993888 -> 979232 (-1.47%) helped: 473 HURT: 116 LOST: 2 GAINED: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Simplify 0 < fabs(a)Ian Romanick2016-03-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sandy Bridge / Ivy Bridge / Haswell total instructions in shared programs: 8462180 -> 8462174 (-0.00%) instructions in affected programs: 564 -> 558 (-1.06%) helped: 6 HURT: 0 total cycles in shared programs: 117542462 -> 117542276 (-0.00%) cycles in affected programs: 9768 -> 9582 (-1.90%) helped: 12 HURT: 0 Broadwell / Skylake total instructions in shared programs: 8980833 -> 8980826 (-0.00%) instructions in affected programs: 626 -> 619 (-1.12%) helped: 7 HURT: 0 total cycles in shared programs: 70077900 -> 70077714 (-0.00%) cycles in affected programs: 9378 -> 9192 (-1.98%) helped: 12 HURT: 0 G45 and Ironlake showed no change. v2: Modify the comments to look more like a proof. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Simplify 0 >= b2f(a)Ian Romanick2016-03-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | This also prevented some regressions with other patches in my local tree. Broadwell / Skylake total instructions in shared programs: 8980835 -> 8980833 (-0.00%) instructions in affected programs: 45 -> 43 (-4.44%) helped: 1 HURT: 0 total cycles in shared programs: 70077904 -> 70077900 (-0.00%) cycles in affected programs: 122 -> 118 (-3.28%) helped: 1 HURT: 0 No changes on earlier platforms. v2: Modify the comments to look more like a proof. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Simplify i2b with negated or abs operandIan Romanick2016-03-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables removing ssa_201 and ssa_202 in sequences like: vec1 ssa_200 = flt ssa_199, ssa_194 vec1 ssa_201 = b2i ssa_200 vec1 ssa_202 = i2b -ssa_201 shader-db results: Sandy Bridge total instructions in shared programs: 8462257 -> 8462180 (-0.00%) instructions in affected programs: 3846 -> 3769 (-2.00%) helped: 35 HURT: 0 total cycles in shared programs: 117542934 -> 117542462 (-0.00%) cycles in affected programs: 20072 -> 19600 (-2.35%) helped: 20 HURT: 1 Ivy Bridge total instructions in shared programs: 7775252 -> 7775137 (-0.00%) instructions in affected programs: 3645 -> 3530 (-3.16%) helped: 35 HURT: 0 total cycles in shared programs: 65760522 -> 65760068 (-0.00%) cycles in affected programs: 21082 -> 20628 (-2.15%) helped: 25 HURT: 2 Haswell total instructions in shared programs: 7108666 -> 7108589 (-0.00%) instructions in affected programs: 3253 -> 3176 (-2.37%) helped: 35 HURT: 0 total cycles in shared programs: 64675726 -> 64675272 (-0.00%) cycles in affected programs: 21034 -> 20580 (-2.16%) helped: 26 HURT: 1 Broadwell / Skylake total instructions in shared programs: 8980912 -> 8980835 (-0.00%) instructions in affected programs: 3223 -> 3146 (-2.39%) helped: 35 HURT: 0 total cycles in shared programs: 70077926 -> 70077904 (-0.00%) cycles in affected programs: 21886 -> 21864 (-0.10%) helped: 21 HURT: 6 G45 and Ironlake showed no change. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Lower flrp with Boolean interpolator to bcselIan Romanick2016-03-221-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | On Intel platforms that don't set lower_flrp, using bcsel instead of flrp seems to be a small amount worse. On those platforms, the use of flrp, bcsel, and multiply of b2f is still an active area of research. In review, Matt suggested this is because bcsel turns into CMP+SEL, and because of the flag register we can't schedule instructions well. shader-db results: G4X / Ironlake total instructions in shared programs: 4016538 -> 4012279 (-0.11%) instructions in affected programs: 161556 -> 157297 (-2.64%) helped: 1077 HURT: 1 total cycles in shared programs: 84328296 -> 84315862 (-0.01%) cycles in affected programs: 4174570 -> 4162136 (-0.30%) helped: 926 HURT: 53 Unsurprisingly, no changes on later platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: Recognize open-coded extract_u16.Matt Turner2016-03-041-0/+5
| | | | | | | | No shader-db changes, but does recognize some extract_u16 which enables the next patch to optimize some code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir: Recognize open-coded extract_u8.Matt Turner2016-03-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack three bytes from an integer and convert each into a float: float((val >> 16u) & 0xffu) float((val >> 8u) & 0xffu) float((val >> 0u) & 0xffu) Instead of shifting, masking, and type converting like this: shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD mov(8) g17<1>F g16<8,8,1>UD shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD mov(8) g20<1>F g19<8,8,1>UD and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD mov(8) g22<1>F g21<8,8,1>UD i965 can simply extract a byte and convert to float in a single instruction: mov(8) g17<1>F g25.2<32,8,4>UB mov(8) g20<1>F g25.1<32,8,4>UB mov(8) g22<1>F g25.0<32,8,4>UB This patch implements the first step: recognizing byte extraction. A later patch will optimize out the conversion to float. instructions in affected programs: 28568 -> 27450 (-3.91%) helped: 7 cycles in affected programs: 210076 -> 203144 (-3.30%) helped: 7 This patch decreases the number of instructions in the two Unigine programs by: #1721: 4520 -> 4374 instructions (-3.23%) #1706: 3752 -> 3582 instructions (-4.53%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir: Recognize open-coded bitfield_reverse.Matt Turner2016-02-081-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Helps 11 shaders in UnrealEngine4 demos. I seriously hope they would have given us bitfieldReverse() if we exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that anyway?). instructions in affected programs: 4875 -> 4633 (-4.96%) cycles in affected programs: 270516 -> 244516 (-9.61%) I suspect there's a *lot* of room to improve nir_search/opt_algebraic's handling of this. We'd actually like to match, e.g., step2 by matching step1 once and then doing a pointer comparison for the second instance of step1, but unfortunately we generate an enormous tuple for instead. The .text size increases by 6.5% and the .data by 17.5%. text data bss dec hex filename 22957 45224 0 68181 10a55 nir_libnir_la-nir_opt_algebraic.o 24461 53160 0 77621 12f35 nir_libnir_la-nir_opt_algebraic.o I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is in a GL 4.0 context once we expose GL 4.0. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* nir: Recognize product of open-coded pow()s.Matt Turner2016-02-081-0/+2
| | | | | | Prevents regressions in the next commit. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* nir: Add opt_algebraic rules for xor with zero.Matt Turner2016-02-081-0/+2
| | | | | | | | instructions in affected programs: 668 -> 664 (-0.60%) helped: 4 Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* nir: Add lowering support for unpacking opcodes.Matt Turner2016-02-011-0/+28
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir: Add lowering support for packing opcodes.Matt Turner2016-02-011-0/+20
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir: Add opcodes to extract bytes or words.Matt Turner2016-02-011-0/+16
| | | | | | The uint versions zero extend while the int versions sign extend. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* nir: move to compiler/Emil Velikov2016-01-261-0/+285
Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>