| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result. regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
is_nop_mov() was broken for LOAD_PAYLOAD instructions in two ways: On
the one hand the original destination register offset wasn't being
taken into account which would give incorrect results if it was
already non-zero, and on the other hand all source registers were
being treated as if they had a size of 32B, which is almost never the
case in SIMD16 programs for non-header sources.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for dropping fs_inst::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
| |
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
start_to -> dst_start
end_to -> dst_end
start_from -> src_start
end_from -> src_end
var_to -> dst_var
var_from -> src_var
reg_to -> dst_reg
reg_to_offset -> dst_reg_offset
reg_from -> src_reg
Not sure how these made sense to me before.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.
The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.
With that fixed, we can calculate interference more accurately.
total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs: 346282 -> 342880 (-0.98%)
helped: 1552
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
equals() returns false for registers with different types, using it
isn't appropriate to determine whether an is overwriting a register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason the loop that rewrites all occurrences of the
coalesced register was iterating over all possible offsets until it
would find one that compares equal to the offset of a source or
destination of any instruction in the program. Since the mapping
between old and new offsets is already available in the regs_to_offset
array and we know that the whole register has been coalesced we can
just look it up.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The register coalesce pass wasn't rewriting the destination and
sources of instructions that accessed the second half of a coalesced
register previously copied with a 16-wide MOV instruction. E.g.:
| ADD (16) vgrf0:f, vgrf0:f, 1.0:f
| MOV (16) vgrf1:f, vgrf0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
would get incorrectly register-coalesced into:
| ADD (16) vgrf1:f, vgrf1:f, 1.0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
The reason is that the mov[i] pointer was being left equal to NULL for
every other register. The fact that we've made it to the rewrite loop
implies that the whole register will be coalesced, so it doesn't seem
right not to update something that uses it depending on whether mov[i]
is NULL or not. Fixes an amount of texturing and image_load_store
piglit tests on my SIMD-lowering branch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
register_coalesce() was considering the exec_size of the MOV
instruction alone to decide whether the register at offset+1 of the
source VGRF was being copied to inst->dst.reg_offset+1 of the
destination VGRF, which is only a valid assumption if the move has a
32-bit execution type. Use regs_read() instead to find out the number
of registers copied by the instruction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths. With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
| |
There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel. Instead, we should be using exec_size.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a new is_copy_payload helper to fs_inst that takes the
place of the similarly named functions in cse and register coalesce. The
two is_copy_payload functions in CSE and register coalesce were subtly
different and potentially subtly broken. The new version unifies the two
and should be more correct.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among other hundred different things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.
v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappins were wrong.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
| |
The only trick is changing a break into a return true in register
coalescing, since the macro is actually a double loop, and break will do
something different than you expect. (Wish I'd realized that earlier!)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
| |
Everything has been converted to preserve the CFG.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
|
|
|
| |
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A similar attempt was made in commit 5ff1e446 and was reverted in commit
a39428cf after causing a regression in an ES 3 conformance test. The
test still passes after this commit.
total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
instructions in affected programs: 128247 -> 126278 (-1.54%)
GAINED: 0
LOST: 1
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
| |
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
|
| |
|
|
|
|
|
|
| |
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
This reverts commit 5ff1e446d44bb9d50f84883c7058635cb070e069.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77704
|
|
|
|
|
|
| |
This reverts commit 55de1c035cbca2b7087b3aa21a8c3dfc900a4ad9.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
| |
This reverts commit f770123f58b46459e8dbd27525162ee8ba89f30b.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78692
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We previously only allowed coalescing registers that interfere (i.e.,
whose live ranges overlap) if the destination register's live range was
entirely inside the source's live range. This is unnecessary -- we only
need to check for interfering writes in the intersection of their live
ranges.
total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
instructions in affected programs: 84751 -> 83734 (-1.20%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than any old control flow. Muchnick's algorithm just checks for
interfering writes between the MOV and the end of the program. Handling
this when you have backward branches is hard, so don't, but there's no
reason to bail if you see forward branches.
instructions in affected programs: 4270 -> 4248 (-0.52%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were starting at the beginning of the instruction list, rather than
with the MOV instruction itself. This allows us to coalesce after
control flow.
Excluding the shaders from an unreleased title, the shader-db results:
total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
instructions in affected programs: 678772 -> 669196 (-1.41%)
GAINED: 5
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
And avoid rewriting other instructions unnecessarily. Removes a few
self-moves we weren't able to handle because they were components of a
large VGRF.
instructions in affected programs: 830 -> 826 (-0.48%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
Otherwise there's nothing to do.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
This reverts commit f092e8951ce5212ba3cbb382ce3a6666eb6c9bed.
Didn't mean to push this...
|
|
|
|
| |
Otherwise there's nothing to do.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not setting this would prevented coalescing after a failed attempt if
the sources for both MOVs were the same.
total instructions in shared programs: 1654531 -> 1650224 (-0.26%)
instructions in affected programs: 423167 -> 418860 (-1.02%)
GAINED: 2
LOST: 0
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
| |
I think this was used for coalescing out partly dead large virtual
registers, but the patch that enabled that caused regressions and didn't
make it upstream.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's more likely that we won't find writes to all channels than one will
interfere, and calculating interference is more expensive. This change
will also help prepare for coalescing load_payload instructions'
operands.
Also update the live intervals for all channels, and not just the last
that we saw.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
| |
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
| |
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
The function has gotten large, and brw_fs.cpp is the largest source file
in the driver.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|