path: root/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
Commit message | Author | Age | Files | Lines
* i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap(). | Francisco Jerez | 2016-09-14 | 1 | -1/+2
  fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly, so it should generally be used instead.
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
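A byte-range model makes the difference concrete. The `Region` struct and both functions below are hypothetical simplifications written for illustration, not Mesa's actual `fs_inst::overwrites_reg()` or `regions_overlap()` signatures; the first function only approximates the failure mode the commit message describes, where just the starting offset of 'reg' is known.

```cpp
#include <cassert>

// Hypothetical model of a register region: a virtual register number plus
// the byte range [offset, offset + size) it covers.
struct Region {
   unsigned nr;
   unsigned offset; // bytes
   unsigned size;   // bytes
};

// Start-only check: only the first byte of the other region is known, so
// the question is whether the write covers that one byte.
bool overwrites_start(const Region &dst, unsigned nr, unsigned offset)
{
   return dst.nr == nr && offset >= dst.offset &&
          offset < dst.offset + dst.size;
}

// Full check: two half-open byte ranges in the same register overlap iff
// each one starts before the other ends.
bool regions_overlap_sketch(const Region &a, const Region &b)
{
   assert(a.size > 0 && b.size > 0);
   return a.nr == b.nr && a.offset < b.offset + b.size &&
          b.offset < a.offset + a.size;
}
```

With a write to bytes [32, 64) of register 5 and a read of bytes [0, 64), the start-only check misses the overlap while the range check catches it.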
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes. | Francisco Jerez | 2016-09-14 | 1 | -1/+2
  The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make matters worse, the unit reg_offset was expressed in was rather inconsistent: for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units.
  This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code, like
      x = r.reg_offset
  is rewritten to
      x = r.offset / reg_unit
  and each lvalue reference, like
      r.reg_offset = x
  is rewritten to
      r.offset = r.offset % reg_unit + x * reg_unit
  Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context). I'll clean those up later on in a second pass.
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
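The two rewrite rules can be checked with a small sketch. `Reg`, `get_reg_offset()` and `set_reg_offset()` are hypothetical names used only here; `reg_unit` is the per-use unit from the commit message (4, 16 or 32 bytes depending on context).

```cpp
#include <cassert>

// Hypothetical stand-in for a register that carries one byte offset.
struct Reg {
   unsigned offset; // bytes
};

// Rvalue rule: x = r.reg_offset  ->  x = r.offset / reg_unit
unsigned get_reg_offset(const Reg &r, unsigned reg_unit)
{
   assert(reg_unit != 0);
   return r.offset / reg_unit;
}

// Lvalue rule: r.reg_offset = x  ->  r.offset = r.offset % reg_unit + x * reg_unit
// The "% reg_unit" term keeps any byte offset below one unit intact.
void set_reg_offset(Reg &r, unsigned x, unsigned reg_unit)
{
   assert(reg_unit != 0);
   r.offset = r.offset % reg_unit + x * reg_unit;
}
```

For example, a register at byte 72 with 32-byte units reads back as unit 2; assigning unit 5 moves it to byte 168 while keeping the 8-byte sub-unit offset, so assigning a new unit index does not clobber it.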
* i965/fs: Restrict inequality that can only hold equal in saturate propagation. | Francisco Jerez | 2016-03-14 | 1 | -1/+1
  Should have no functional change. The IP value of an instruction that reads src_var cannot possibly be after the end of the live interval of the variable it's reading from, by the definition of live interval. Might save future readers a momentary WTF while trying to understand this code.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Allow saturate propagation to propagate negations into MADs. | Matt Turner | 2016-02-25 | 1 | -0/+4
  Allows us to transform
      mad     res src0 src1 src2
      mov.sat dst -res
  into
      mad.sat dst -src0 -src1 src2
  instructions in affected programs: 3712 -> 3688 (-0.65%)
  helped: 24
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow saturate propagation to propagate negations into ADDs. | Matt Turner | 2016-02-25 | 1 | -0/+11
  Allows us to transform
      add     res src0 src1
      mov.sat dst -res
  into
      add.sat dst -src0 -src1
  No shader-db changes.
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow saturate propagation to propagate negations into MULs. | Matt Turner | 2016-02-25 | 1 | -3/+14
  Allows us to transform
      mul     res src0 src1
      mov.sat dst -res
  into
      mul.sat dst src0 -src1
  instructions in affected programs: 45246 -> 45054 (-0.42%)
  helped: 162
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
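The three negation rewrites rely on simple sign identities. The sketch below models them on scalar floats, with `sat()` clamping to [0, 1] as a float destination's saturate does; the function names are hypothetical, and the MAD model assumes the Gen convention dst = src0 + src1 * src2, which is consistent with the `mad.sat dst -src0 -src1 src2` form above.

```cpp
#include <algorithm>
#include <cassert>

// Saturate for a float destination: clamp to [0, 1].
float sat(float x)
{
   return std::min(std::max(x, 0.0f), 1.0f);
}

// MUL: "mul res a b; mov.sat dst -res" vs "mul.sat dst a -b".
float mul_then_mov_sat_neg(float a, float b) { return sat(-(a * b)); }
float mul_sat_neg_src1(float a, float b)     { return sat(a * -b); }

// ADD: "add res a b; mov.sat dst -res" vs "add.sat dst -a -b".
float add_then_mov_sat_neg(float a, float b) { return sat(-(a + b)); }
float add_sat_neg_both(float a, float b)     { return sat(-a + -b); }

// MAD, assuming dst = src0 + src1 * src2:
// "mad res s0 s1 s2; mov.sat dst -res" vs "mad.sat dst -s0 -s1 s2".
float mad_then_mov_sat_neg(float s0, float s1, float s2)
{
   return sat(-(s0 + s1 * s2));
}
float mad_sat_neg_s0_s1(float s0, float s1, float s2)
{
   return sat(-s0 + -s1 * s2);
}
```

Negating one factor of a product, or both operands of a sum, is exact under IEEE round-to-nearest, so each pair agrees bit for bit rather than only approximately.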
* i965: Rename GRF to VGRF. | Matt Turner | 2015-11-13 | 1 | -3/+3
  The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register.
  Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number. | Matt Turner | 2015-11-13 | 1 | -1/+1
  In addition to combining another field, we get to replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg.
  Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k.
  Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove block arg from foreach_inst_in_block_*_starting_from | Neil Roberts | 2015-10-21 | 1 | -1/+1
  Since 49374fab5d793 these macros no longer actually use the block argument. I think this is worth doing to make the macros easier to use because they already have really long names and a confusing set of arguments.
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Consider type mismatches in saturate propagation. | Matt Turner | 2015-10-19 | 1 | -3/+12
  NIR considers bcsel to produce and consume unsigned types, leading to SEL instructions operating on unsigned types when the data is really floating-point. Prior to this patch, saturate propagation would happily transform
      (+f0) sel g20:UD, g30:UD, g40:UD
            mov.sat g50:F, g20:F
  into
      (+f0) sel.sat g20:UD, g30:UD, g40:UD
            mov g50:F, g20:F
  But since the meaning of .sat is dependent on the type of the destination register, this is not valid.
  Instead, allow saturate propagation to change the types of dest/source on instructions that are simply copying data in order to propagate the saturate modifier.
  Fixes bad code gen in 158 programs.
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
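Why the rewrite was invalid can be modeled by reinterpreting 32-bit patterns. In this sketch, which is a simplification and not Mesa code, `.sat` on a float destination clamps to [0, 1], while `.sat` on an unsigned destination is assumed to clamp to the type's own range, so it cannot change a value SEL merely copied. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret 32 bits as float or unsigned, as the :F and :UD types do.
std::uint32_t bits_of(float f)
{
   std::uint32_t u;
   std::memcpy(&u, &f, sizeof u);
   return u;
}

float float_of(std::uint32_t u)
{
   float f;
   std::memcpy(&f, &u, sizeof f);
   return f;
}

// .sat on a float destination clamps to [0, 1].
float sat_f(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

// Original: (+f0) sel g20:UD ...; mov.sat g50:F, g20:F
std::uint32_t sel_then_mov_sat_f(bool f0, std::uint32_t a, std::uint32_t b)
{
   std::uint32_t g20 = f0 ? a : b;
   return bits_of(sat_f(float_of(g20)));
}

// Invalid rewrite: sel.sat g20:UD ...; mov g50:F, g20:F
// A UD saturate cannot change a value that was only selected, so the
// float clamp is silently lost.
std::uint32_t sel_sat_ud_then_mov(bool f0, std::uint32_t a, std::uint32_t b)
{
   std::uint32_t g20 = f0 ? a : b;
   return g20;
}

// Fixed rewrite: the copy is retyped to :F so .sat keeps its float meaning.
std::uint32_t sel_sat_retyped_f(bool f0, std::uint32_t a, std::uint32_t b)
{
   float g20 = float_of(f0 ? a : b);
   return bits_of(sat_f(g20));
}
```

For the bits of 2.0f, the original sequence yields the bits of 1.0f while the invalid rewrite passes 2.0f through unchanged; retyping the copy to :F, as the fix does, restores the original result.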
* i965/fs: Consider MOV.SAT to interfere if it has a source modifier. | Matt Turner | 2015-02-19 | 1 | -4/+8
  The saturate propagation pass recognizes that the second instruction below does not interfere with an attempt to propagate the saturate modifier from instruction 3 to 1.
      1: add(8) dst0 src0 src1
      2: mov.sat(8) dst1 dst0
      3: mov.sat(8) dst2 dst0
  Unfortunately, we did not consider the case of instruction 2 having a source modifier on dst0. Take for instance:
      1: add(8) dst0 src0 src1
      2: mov.sat(8) dst1 -dst0
      3: mov.sat(8) dst2 dst0
  Consider such an instruction to interfere. Increases instruction counts in Anomaly 2, which could be a bug fix depending on the values the first instruction produces.
  instructions in affected programs: 53228 -> 53934 (1.33%)
  HURT: 360
  Cc: <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
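The interference rule that results from this commit together with the earlier mov.sat change (further down in this log) can be sketched as a predicate over the other readers of dst0. `Inst`, `Op`, and both functions are hypothetical; a single `src_negate` flag stands in for any source modifier.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, stripped-down instruction model; not Mesa's fs_inst.
enum class Op { MOV, ADD, MUL };

struct Inst {
   Op op;
   bool saturate;
   bool src_negate; // stands in for any source modifier on the read of dst0
};

// Another reader of dst0 only fails to interfere if it is a plain mov.sat:
// it saturates anyway, so an already-saturated dst0 gives it the same value.
bool reader_interferes(const Inst &reader)
{
   const bool plain_mov_sat =
      reader.op == Op::MOV && reader.saturate && !reader.src_negate;
   return !plain_mov_sat;
}

// Saturate may move onto dst0's producer only if no other reader interferes.
bool can_propagate(const std::vector<Inst> &other_readers)
{
   for (const Inst &r : other_readers)
      if (reader_interferes(r))
         return false;
   return true;
}
```

The reasoning: if dst0 is already saturated, a plain `mov.sat dst1 dst0` computes sat(sat(x)) = sat(x), the same value as before. With `-dst0` it does not: for x = -0.5, sat(-x) is 0.5 but sat(-sat(x)) is 0.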
* i965/fs: Use fs_inst::overwrites_reg() in saturate propagation. | Matt Turner | 2015-02-19 | 1 | -4/+4
  This is safer and matches the conditional_mod propagation pass.
  Cc: <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Add a comment explaining what saturate propagation does. | Matt Turner | 2014-12-16 | 1 | -0/+14
* i965/fs: Use const fs_reg & rather than a copy or pointer. | Matt Turner | 2014-12-01 | 1 | -1/+1
  Also while we're touching var_from_reg, just make it an inline function.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Don't invalidate live intervals in saturate propagation. | Matt Turner | 2014-09-27 | 1 | -2/+1
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Ignore mov.sat instructions in interference check in sat prop. | Matt Turner | 2014-09-27 | 1 | -1/+2
  When an instruction's result was consumed by multiple mov.sat instructions, we would decide that we couldn't move the saturate modifier because something else was using the result, even though it was just another mov.sat!
  total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
  instructions in affected programs: 75634 -> 74878 (-1.00%)
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Walk instructions in reverse in saturate propagation. | Matt Turner | 2014-09-27 | 1 | -3/+3
  When we find a mov.sat, we search backwards. We might as well search everything else backwards as well and potentially look at fewer instructions.
  This change enables the next patch.
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Don't iterate between blocks with inst->next/prev. | Matt Turner | 2014-09-24 | 1 | -6/+1
  When instruction lists are per-basic block, this won't work.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Remove cfg-invalidating parameter from invalidate_live_intervals. | Matt Turner | 2014-09-24 | 1 | -1/+1
  Everything has been converted to preserve the CFG.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use basic-block aware insertion/removal functions. | Matt Turner | 2014-08-22 | 1 | -1/+1
  To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction.
  cfg calculations: 202951 -> 80307 (-60.43%)
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Add and use foreach_block macro. | Matt Turner | 2014-08-18 | 1 | -3/+2
  Use this as an opportunity to rename 'block_num' to 'num'. block->num is clear, and block->block_num has always been redundant.
* i965: Add cfg to backend_visitor. | Matt Turner | 2014-07-21 | 1 | -5/+3
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/fs: Pass cfg to calculate_live_intervals(). | Matt Turner | 2014-07-01 | 1 | -2/+2
  We've often created the CFG immediately before, so use it when available.
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Add and use foreach_inst_in_block macros. | Matt Turner | 2014-07-01 | 1 | -3/+1
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Let sat-prop ignore live ranges if producer already has sat. | Matt Turner | 2014-06-30 | 1 | -4/+7
  This sequence (where both x and w are used afterwards) wasn't handled.
      mul.sat x, y, z
      ...
      mov.sat w, x
  We assumed that if x was used after the mov.sat, that we couldn't propagate the saturate modifier, but in fact x was already saturated.
  So ignore the live range check if the producing instruction already saturates its result. Cuts one instruction from hundreds of TF2 shaders.
  total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
  instructions in affected programs: 155248 -> 154568 (-0.44%)
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Loop from 0 to inst->sources, not 0 to 3. | Matt Turner | 2014-06-01 | 1 | -1/+1
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
  Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Don't propagate saturation modifiers if there are source modifiers. | Matt Turner | 2014-04-05 | 1 | -0/+2
  Which would lead to translating
      mad     vgrf9:F, vgrf3:F, u0:F, vgrf6:F
      mov.sat vgrf7:F, -vgrf9:F
  into
      mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
      mov     vgrf7:F, -vgrf9:F
  Fixes some lighting effects in Dota2.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Don't propagate saturate modifiers into partial writes. | Matt Turner | 2014-04-05 | 1 | -1/+2
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Fix off-by-one in saturate propagation. | Matt Turner | 2014-04-05 | 1 | -1/+1
  ip needs to be initialized to start_ip - 1, since the first thing in the main loop is ip++. Otherwise we would incorrectly propagate the saturate from the mov to the mad:
      mad     a, b, c, d
      mov.sat x, a
      add     y, z, a
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
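The off-by-one can be reproduced with a counter that is incremented at the top of the loop body. This is a hypothetical model of the described loop, not the pass itself, and it assumes propagation is only safe when the source's live range ends at or before the mov.sat.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: assign an ip to each of 'num_insts' instructions in a
// block, bumping the counter as the first statement of the loop body.
std::vector<int> assign_ips(int start_ip, int num_insts, bool fixed)
{
   assert(num_insts >= 0);
   std::vector<int> ips;
   int ip = fixed ? start_ip - 1 : start_ip; // the fix: start one early
   for (int i = 0; i < num_insts; i++) {
      ip++; // first thing in the loop body
      ips.push_back(ip);
   }
   return ips;
}

// Moving the saturate is only safe if the source is not live past the
// mov.sat, i.e. its live range ends at or before the mov.sat's ip.
bool safe_to_propagate(int live_end, int mov_sat_ip)
{
   return live_end <= mov_sat_ip;
}
```

For the three instructions above starting at ip 0, the live range of `a` ends at ip 2 (the add). With the fix, the mov.sat gets ip 1 and propagation is rejected; starting the counter at start_ip makes the mov.sat appear to sit at ip 2, so the check wrongly passes.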
* i965/fs: Fix register comparisons in saturate propagation. | Kenneth Graunke | 2014-03-14 | 1 | -0/+1
  opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file is GRF. But nothing ensured that inst->src[0].file was GRF.
  In the following program, this resulted in u1:F matching vgrf1:UW, and a saturate being incorrectly propagated from instruction 8 to instruction 1.
      { 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
      { 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
      { 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
      { 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
      { 4} 4: mov vgrf10+0.0:F, vgrf6:F
      { 3} 5: mov vgrf10+1.0:F, vgrf27:F
      { 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
      { 5} 7: mov vgrf32:F, u1:F
      { 5} 8: mov.sat vgrf12:F, u1:F
  From shader-db:
  total instructions in shared programs: 1841932 -> 1841957 (0.00%)
  instructions in affected programs: 5823 -> 5848 (0.43%)
  I inspected two of the 25 hurt shaders, and concluded that they were both hitting this bug, and not legitimately optimized.
  This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among others. The optimization pass didn't exist in 10.0, so this is only a candidate for 10.1.
  Cc: "10.1" <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
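The fix amounts to treating the register file as part of a register's identity. In this hypothetical model (`RegRef`, `File`, and both functions are illustrative names, not Mesa's), comparing only the number and offset lets a uniform and a VGRF with the same number compare equal, as u1 and vgrf1 did in the program above.

```cpp
#include <cassert>

// Hypothetical register files and register reference.
enum class File { VGRF, UNIFORM, HW_REG };

struct RegRef {
   File file;
   unsigned nr;
   unsigned offset;
};

// Buggy comparison: ignores which file the number belongs to.
bool same_reg_buggy(const RegRef &a, const RegRef &b)
{
   return a.nr == b.nr && a.offset == b.offset;
}

// Fixed comparison: numbers are only meaningful within one file.
bool same_reg(const RegRef &a, const RegRef &b)
{
   return a.file == b.file && a.nr == b.nr && a.offset == b.offset;
}
```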
* i965/fs: Add a saturation propagation optimization pass. | Matt Turner | 2014-01-28 | 1 | -0/+104
  Transforms, for example,
      mul     vgrf3, vgrf2, vgrf1
      mov.sat vgrf4, vgrf3
  into
      mul.sat vgrf3, vgrf2, vgrf1
      mov     vgrf4, vgrf3
  which gives register_coalescing an opportunity to remove the MOV instruction.
  total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
  instructions in affected programs: 798586 -> 788181 (-1.30%)
  GAINED: 0
  LOST: 4
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
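The core transform can be sketched on a toy straight-line IR. Everything below (`Inst`, `reads()`, `propagate_saturate()`) is hypothetical and deliberately simpler than the real pass: it has no live intervals, register files, partial writes, or source modifiers, and it conservatively refuses whenever any other instruction reads the mov.sat's source, which is stricter than the refinements described in later commits above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical straight-line IR: each instruction writes register 'dst'
// and reads the registers listed in 'srcs'.
struct Inst {
   bool is_mov;
   bool saturate;
   int dst;
   std::vector<int> srcs;
};

static bool reads(const Inst &inst, int reg)
{
   for (int s : inst.srcs)
      if (s == reg)
         return true;
   return false;
}

// For each "mov.sat dst, src": if nothing else reads src, move the saturate
// onto the instruction that produced src and leave a plain mov behind.
// Returns the number of saturates moved.
int propagate_saturate(std::vector<Inst> &insts)
{
   int progress = 0;
   for (std::size_t i = 0; i < insts.size(); i++) {
      Inst &mov = insts[i];
      if (!mov.is_mov || !mov.saturate || mov.srcs.size() != 1)
         continue;
      const int src = mov.srcs[0];

      // Any other reader would observe the saturated value, so give up.
      bool other_reader = false;
      for (std::size_t j = 0; j < insts.size(); j++)
         if (j != i && reads(insts[j], src))
            other_reader = true;
      if (other_reader)
         continue;

      // Walk backwards to the most recent write of src.
      for (std::size_t j = i; j-- > 0;) {
         if (insts[j].dst == src) {
            insts[j].saturate = true;
            mov.saturate = false;
            progress++;
            break;
         }
      }
   }
   return progress;
}
```

On the example above, the mul gains `.sat` and the mov loses it; adding a later instruction that reads vgrf3 makes the sketch leave both untouched.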