| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result. regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
| |
Should have no functional change. The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows us to transform
mad res src0 src1 src2
mov.sat dst -res
into
mad.sat dst -src0 -src1 src2
instructions in affected programs: 3712 -> 3688 (-0.65%)
helped: 24
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows us to transform
add res src0 src1
mov.sat dst -res
into
add.sat dst -src0 -src1
No shader-db changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows us to transform
mul res src0 src1
mov.sat dst -res
into
mul.sat dst src0 -src1
instructions in affected programs: 45246 -> 45054 (-0.42%)
helped: 162
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
Since 49374fab5d793 these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform
(+f0) sel g20:UD, g30:UD, g40:UD
mov.sat g50:F, g20:F
into
(+f0) sel.sat g20:UD, g30:UD, g40:UD
mov g50:F, g20:F
But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.
Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.
Fixes bad code gen in 158 programs.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The saturate propagation pass recognizes that the second instruction
below does not interfere with an attempt to propagate the saturate
modifier from instruction 3 to 1.
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 dst0
3: mov.sat(8) dst2 dst0
Unfortunately, we did not consider the case of instruction 2 having a
source modifier on dst0. Take for instance:
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 -dst0
3: mov.sat(8) dst2 dst0
Consider such an instruction to interfere. Increase instruction counts
in Anomaly 2, which could be a bug fix depending on the values the first
instruction produces.
instructions in affected programs: 53228 -> 53934 (1.33%)
HURT: 360
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
| |
This is safer and matches the conditional_mod propagation pass.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
| |
|
|
|
|
|
|
| |
Also while we're touching var_from_reg, just make it an inline function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
| |
When instruction lists are per-basic block, this won't work.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
| |
Everything has been converted to preserve the CFG.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
|
|
|
| |
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
| |
Use this as an opportunity to rename 'block_num' to 'num'. block->num is
clear, and block->block_num has always been redundant.
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
|
| |
We've often created the CFG immediately before, so use it when
available.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
| |
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sequence (where both x and w are used afterwards) wasn't handled.
mul.sat x, y, z
...
mov.sat w, x
We assumed that if x was used after the mov.sat, that we couldn't
propagate the saturate modifier, but in fact x was already saturated.
So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.
total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs: 155248 -> 154568 (-0.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Which would lead to translating
mad vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov.sat vgrf7:F, -vgrf9:F
into
mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov vgrf7:F, -vgrf9:F
Fixes some lighting effects in Dota2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
| |
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ip needs to be initialized to start_ip - 1, since the first thing in the
main loop is ip++. Otherwise we would incorrectly propagate the saturate
from the mov to the mad:
mad a, b, c, d
mov.sat x, a
add y, z, a
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF. But nothing ensured that inst->src[0].file was GRF.
In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.
{ 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{ 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{ 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{ 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{ 4} 4: mov vgrf10+0.0:F, vgrf6:F
{ 3} 5: mov vgrf10+1.0:F, vgrf27:F
{ 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
{ 5} 7: mov vgrf32:F, u1:F
{ 5} 8: mov.sat vgrf12:F, u1:F
From shader-db:
total instructions in shared programs: 1841932 -> 1841957 (0.00%)
instructions in affected programs: 5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.
This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others. The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
Transforms, for example,
mul vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3
into
mul.sat vgrf3, vgrf2, vgrf1
mov vgrf4, vgrf3
which gives register_coalescing an opportunity to remove the MOV
instruction.
total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0
LOST: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|