summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().Francisco Jerez2016-09-141-1/+2
| | | | | | | | | | | fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Compare full register offsets in cmod propagation pass.Francisco Jerez2016-09-141-2/+1
| | | | | | | This could potentially have misoptimized a program in cases where inst->src[0] had a non-zero sub-GRF offset. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-1/+1
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).Francisco Jerez2016-05-271-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | The conditional mod of these instructions determines the semantics of the comparison itself (rather than being evaluated based on the result of the instruction as is usually the case for most other instructions that allow conditional mods), so it's in general not legal to propagate a conditional mod into a CMP instruction. This prevents cmod propagation from (mis)optimizing: cmp.z.f0 tmp, ... mov.z.f0 null, tmp into: cmp.z.f0 tmp, ... which gives the negation of the flag result of the original sequence. I could reproduce this easily with SIMD32 but I don't see any reason why the problem would be SIMD32-specific, it was most likely working by luck. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.Francisco Jerez2016-05-271-5/+5
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-1/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: don't propagate cmod when the exec sizes differConnor Abbott2015-11-231-1/+2
| | | | | | | | | | | | This can happen when the source of the compare was split by the SIMD lowering pass. Potentially, we could allow the case where the exec size of scan_inst is larger, and scan_inst has the right quarter selected, but doing that seems a little more risky. v2: Merge the bail condition into the the previous if/break block (Matt) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-1/+1
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove block arg from foreach_inst_in_block_*_starting_fromNeil Roberts2015-10-211-2/+1
| | | | | | | | | Since 49374fab5d793 these macros no longer actually use the block argument. I think this is worth doing to make the macros easier to use because they already have really long names and a confusing set of arguments. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Relax type check in cmod propagation.Matt Turner2015-04-011-1/+3
| | | | | | | | | | | The thing we want to avoid is int/float comparisons, but int/unsigned comparisons with 0 are equivalent. total instructions in shared programs: 6194829 -> 6193996 (-0.01%) instructions in affected programs: 117192 -> 116359 (-0.71%) helped: 471 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Ignore type in cmod prop if scan_inst is CMP.Matt Turner2015-03-181-12/+13
| | | | | | | | | | total instructions in shared programs: 6263270 -> 6203091 (-0.96%) instructions in affected programs: 2606529 -> 2546350 (-2.31%) helped: 14301 GAINED: 5 LOST: 3 Revewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Handle CMP.nz ... 0 and AND.nz ... 1 similarly in cmod propagationIan Romanick2015-03-171-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Espically on platforms that do not natively generate 0u and ~0u for Boolean results, we generate a lot of sequences where a CMP is followed by an AND with 1. emit_bool_to_cond_code does this, for example. On ILK, this results in a sequence like: add(8) g3<1>F g8<8,8,1>F -g4<0,1,0>F cmp.l.f0(8) g3<1>D g3<8,8,1>F 0F and.nz.f0(8) null g3<8,8,1>D 1D (+f0) iff(8) Jump: 6 The AND.nz is obviously redundant. By propagating the cmod, we can instead generate add.l.f0(8) null g8<8,8,1>F -g4<0,1,0>F (+f0) iff(8) Jump: 6 Existing code already handles the propagation from the CMP to the ADD. Shader-db results: GM45 (0x2A42): total instructions in shared programs: 3550829 -> 3550788 (-0.00%) instructions in affected programs: 10028 -> 9987 (-0.41%) helped: 24 Iron Lake (0x0046): total instructions in shared programs: 4993146 -> 4993105 (-0.00%) instructions in affected programs: 9675 -> 9634 (-0.42%) helped: 24 Ivy Bridge (0x0166): total instructions in shared programs: 6291870 -> 6291794 (-0.00%) instructions in affected programs: 17914 -> 17838 (-0.42%) helped: 48 Haswell (0x0426): total instructions in shared programs: 5779256 -> 5779180 (-0.00%) instructions in affected programs: 16694 -> 16618 (-0.46%) helped: 48 Broadwell (0x162E): total instructions in shared programs: 6823088 -> 6823014 (-0.00%) instructions in affected programs: 15824 -> 15750 (-0.47%) helped: 46 No chage on Sandy Bridge or on any platform when NIR is used. v2: Add unit tests suggested by Matt. Remove spurious writes_flag() check on scan_inst when scan_inst is known to be BRW_OPCODE_CMP (also suggested by Matt). v3: Fix some comments and remove some explicit int() casts in fs_reg constructors in the unit tests. Both suggested by Matt. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Silence unused parameter warningIan Romanick2015-03-091-2/+2
| | | | | | | | | | | | I don't this opt_cmod_propagation_local ever used the fs_visitor. brw_fs_cmod_propagation.cpp:52:40: warning: unused parameter 'v' [-Wunused-parameter] opt_cmod_propagation_local(fs_visitor *v, bblock_t *block) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Don't propagate cmod to inst with different type.Matt Turner2015-03-041-0/+4
| | | | | | Cc: 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89317 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.Kenneth Graunke2015-01-261-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | "MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions. Previously, we deleted MOV.nz instructions when the instruction generating the MOV's source also wrote the flag register (as the flag register already contains the desired value). However, we wouldn't delete CMP.nz instructions that served the same purpose. We also didn't attempt true cmod propagation on MOV.nz instructions, while we would for the equivalent CMP.nz form. This patch fixes both limitations, treating both forms equally. CMP.nz instructions will now be deleted (helping the NIR backend), and MOV.nz instructions will have their .nz propagated. No changes in shader-db without NIR. With NIR, total instructions in shared programs: 6006153 -> 5969364 (-0.61%) instructions in affected programs: 2087139 -> 2050350 (-1.76%) helped: 10704 HURT: 0 GAINED: 2 LOST: 2 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add support for removing MOV.NZ instructions.Matt Turner2015-01-231-3/+20
| | | | | | | | | | | | | | | | | | | | | | For some reason, we occasionally write the flag register with a MOV.NZ instruction: add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F mov.nz.f0(8) null g26<8,8,1>D A MOV.NZ instruction on the result of a CMP is like comparing for equality with true in C. It's useless. Removing it allows us to generate: add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F total instructions in shared programs: 5955701 -> 5951657 (-0.07%) instructions in affected programs: 302910 -> 298866 (-1.34%) GAINED: 1 LOST: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Allow flipping cond mod for negated arguments.Matt Turner2015-01-231-3/+6
| | | | | | | | | | | | | | | | | | | | | This allows us to apply the optimization in cases where the CMP's argument is negated, by flipping the conditional mod. For example, it allows us to optimize this: add(8) temp a b cmp.l.f0(8) null -temp 0.0 into add.g.f0(8) temp a b total instructions in shared programs: 5958360 -> 5955701 (-0.04%) instructions in affected programs: 466880 -> 464221 (-0.57%) GAINED: 0 LOST: 1 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Propagate cmod across flag read if it contains the same value.Matt Turner2015-01-231-2/+14
| | | | | | | | total instructions in shared programs: 5959463 -> 5958900 (-0.01%) instructions in affected programs: 70031 -> 69468 (-0.80%) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Add pass to propagate conditional modifiers.Matt Turner2015-01-231-0/+98
total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>