path: root/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
Commit message | Author | Age | Files | Lines
* i965/fs: Remove the width field from fs_reg | Jason Ekstrand | 2015-06-30 | 1 | -1/+0
  As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instruction's execution size based entirely on register widths. With the builder, we can easily set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use exec_size instead of dst.width for computing component size | Jason Ekstrand | 2015-06-30 | 1 | -1/+1
  There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  Acked-by: Francisco Jerez <currojerez@riseup.net>
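A minimal sketch of the computation this message describes, assuming 32-bit channels and eight channels per hardware register. The `Inst` struct and `component_size_regs` are hypothetical stand-ins, not the driver's types:

```cpp
#include <cassert>

// Hypothetical stand-in for an IR instruction; only the field used here.
struct Inst {
    unsigned exec_size;  // number of channels the instruction executes (8, 16, ...)
};

// Registers occupied by one logical component, derived from the
// instruction's execution size rather than from a per-register width.
// Assumes 8 channels fit in one register.
inline unsigned component_size_regs(const Inst &inst)
{
    assert(inst.exec_size % 8 == 0);
    return inst.exec_size / 8;
}
```

Under these assumptions a SIMD8 instruction yields one register per component and a SIMD16 instruction yields two.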
* i965/fs_inst: Add an is_copy_payload helper | Jason Ekstrand | 2015-05-06 | 1 | -16/+1
  This commit adds a new is_copy_payload helper to fs_inst that takes the place of the similarly named functions in cse and register coalesce. The two is_copy_payload functions in CSE and register coalesce were subtly different and potentially subtly broken. The new version unifies the two and should be more correct.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Factor out virtual GRF allocation to a separate object. | Francisco Jerez | 2015-02-10 | 1 | -4/+4
  Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR.
  v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add a MAX_GRF_SIZE define and use it various places | Jason Ekstrand | 2014-10-02 | 1 | -5/+5
  Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead. However, some FB write messages can validly be longer than this so we need something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for FB writes.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Fix a bug in register coalesce | Jason Ekstrand | 2014-09-30 | 1 | -0/+17
  This commit fixes a bug in register coalesce that happens when one register is moved to another the proper number of times but the channels are re-arranged. When this happens, the previous code would happily coalesce the registers regardless of the fact that the channel mappings were wrong.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
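The condition this fix guards against can be illustrated with a small check. This is only a sketch of the idea, not the driver's code: `dst_offset` is a hypothetical array where entry i records which destination channel source channel i was copied to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true only if the source channels land in the destination in
// their original order, i.e. channel i maps to dst_offset[0] + i.
// A full set of copies with shuffled channels fails this test and must
// not be coalesced. (Hypothetical representation.)
inline bool channels_in_order(const std::vector<int> &dst_offset)
{
    assert(!dst_offset.empty());  // assumed: a candidate copies at least one channel
    for (std::size_t i = 1; i < dst_offset.size(); i++) {
        if (dst_offset[i] != dst_offset[0] + static_cast<int>(i))
            return false;
    }
    return true;
}
```

Counting the copies alone would accept a mapping such as `{1, 0}`, because both channels are written exactly once; the ordering check rejects it.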
* i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode | Jason Ekstrand | 2014-09-30 | 1 | -10/+9
  This is actually the squash of a bunch of different changes. Individual commit titles follow:

  i965/fs: Always 2-align registers SIMD16 for gen <= 5

  i965/fs: Use the register width when applying offsets
    This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount.

  i965/fs: Change regs_read to be in hardware registers

  i965/fs: Change regs_written to be actual hardware registers

  i965/fs: Properly handle register widths in LOAD_PAYLOAD
    The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers.

  i965/fs: Properly set writemasks in LOAD_PAYLOAD

  i965/fs: Handle register widths in demote_pull_constants

  i965/fs: Get rid of implicit register doubling in the allocator

  i965/fs: Reserve enough registers for PLN instructions

  i965/fs: Make sources and destinations interfere in 16-wide

  i965/fs: Properly handle register widths in CSE

  i965/fs: Properly handle register widths in register_coalesce

  i965/fs: Properly handle widths in copy propagation

  i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD

  i965/fs: Properly handle register widths and odd register sizes in spilling

  i965/fs: Don't waste a register on texture lookups for gen >= 7
    Previously, we were wasting a register in SIMD16 mode because we could only allocate registers in pairs. Now that we can allocate and address odd-sized registers, let's get rid of this special-case.

  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
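The LOAD_PAYLOAD rule quoted above (each source's offset is the accumulation of the sizes of the sources before it) amounts to a running sum. A minimal sketch, assuming each source's size is already known in registers; `payload_offsets` and its argument are hypothetical names:

```cpp
#include <vector>

// Given the size of each payload source in registers, return the offset
// of each source within the assembled payload block. The offset of
// source i is the sum of the sizes of sources 0..i-1.
// (Hypothetical helper for illustration only.)
inline std::vector<unsigned> payload_offsets(const std::vector<unsigned> &src_sizes)
{
    std::vector<unsigned> offsets;
    offsets.reserve(src_sizes.size());
    unsigned running = 0;
    for (unsigned size : src_sizes) {
        offsets.push_back(running);  // this source starts where the previous ones end
        running += size;
    }
    return offsets;
}
```

With source sizes of 1, 2 and 2 registers, the sources start at offsets 0, 1 and 3.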
* i965/fs: A little harmless refactoring of register_coalesce | Jason Ekstrand | 2014-09-30 | 1 | -7/+7
  Just pass the visitor into is_copy_payload() and is_coalesce_candidate() instead of a register size and the virtual_grf_sizes array. Among other things, this makes the code more obvious because you don't have to figure out where src_size came from.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Don't use instruction list after calculating the cfg. | Matt Turner | 2014-09-24 | 1 | -6/+6
  The only trick is changing a break into a return true in register coalescing, since the macro is actually a double loop, and break will do something different than you expect. (Wish I'd realized that earlier!)
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
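The pitfall described here is general C++ behaviour and can be reproduced without the driver. In the sketch below, `FOREACH_BLOCK_AND_ITEM` is a hypothetical macro that, like the one the message refers to, expands to two nested loops; it is not the driver's macro.

```cpp
#include <vector>

// Hypothetical iteration macro that hides a double loop: an outer loop
// over blocks and an inner loop over the items of each block.
#define FOREACH_BLOCK_AND_ITEM(blocks, b, it) \
    for (const auto &b : blocks)              \
        for (int it : b)

using Blocks = std::vector<std::vector<int>>;

// break leaves only the hidden inner loop, so iteration resumes with
// the next block and more items are visited than the caller expects.
inline int visited_with_break(const Blocks &blocks, int stop_at)
{
    int visited = 0;
    FOREACH_BLOCK_AND_ITEM(blocks, block, item) {
        visited++;
        if (item == stop_at)
            break;
    }
    return visited;
}

// return leaves both loops at once.
inline int visited_with_return(const Blocks &blocks, int stop_at)
{
    int visited = 0;
    FOREACH_BLOCK_AND_ITEM(blocks, block, item) {
        visited++;
        if (item == stop_at)
            return visited;
    }
    return visited;
}
```

For blocks `{{1, 2, 3}, {4, 5}}` and a stop value of 2, the `break` version still walks the second block, while the `return` version stops immediately.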
* i965: Remove cfg-invalidating parameter from invalidate_live_intervals. | Matt Turner | 2014-09-24 | 1 | -1/+1
  Everything has been converted to preserve the CFG.
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use basic-block aware insertion/removal functions. | Matt Turner | 2014-08-22 | 1 | -3/+4
  To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction.
  cfg calculations: 202951 -> 80307 (-60.43%)
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/fs: Relax interference check in register coalescing. | Matt Turner | 2014-07-15 | 1 | -11/+12
  A similar attempt was made in commit 5ff1e446 and was reverted in commit a39428cf after causing a regression in an ES 3 conformance test. The test still passes after this commit.
  total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
  instructions in affected programs: 128247 -> 126278 (-1.54%)
  GAINED: 0
  LOST: 1
  Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use typed foreach_in_list_safe instead of foreach_list_safe. | Matt Turner | 2014-07-01 | 1 | -3/+1
  Acked-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Use typed foreach_in_list instead of foreach_list. | Matt Turner | 2014-07-01 | 1 | -8/+3
  Acked-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Support register coalescing on LOAD_PAYLOAD operands. | Matt Turner | 2014-06-17 | 1 | -10/+54
* i965/fs: Loop from 0 to inst->sources, not 0 to 3. | Matt Turner | 2014-06-01 | 1 | -1/+1
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
  Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* Revert "i965/fs: Simplify interference scan in register coalescing." | Matt Turner | 2014-05-26 | 1 | -9/+13
  This reverts commit 5ff1e446d44bb9d50f84883c7058635cb070e069.
  Cc: "10.2" <mesa-stable@lists.freedesktop.org>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77704
* Revert "i965/fs: Give up in interference check if we see a WHILE." | Matt Turner | 2014-05-26 | 1 | -1/+1
  This reverts commit 55de1c035cbca2b7087b3aa21a8c3dfc900a4ad9.
  Cc: "10.2" <mesa-stable@lists.freedesktop.org>
* Revert "i965/fs: Reduce restrictions on interference in register coalescing." | Matt Turner | 2014-05-26 | 1 | -0/+13
  This reverts commit f770123f58b46459e8dbd27525162ee8ba89f30b.
  Cc: "10.2" <mesa-stable@lists.freedesktop.org>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78692
* i965/fs: Reduce restrictions on interference in register coalescing. | Matt Turner | 2014-04-18 | 1 | -13/+0
  We previously only allowed coalescing registers that interfere (i.e., whose live ranges overlap) if the destination register's live range was entirely inside the source's live range. This is unnecessary -- we only need to check for interfering writes in the intersection of their live ranges.
  total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
  instructions in affected programs: 84751 -> 83734 (-1.20%)
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
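The relaxed condition in this message can be sketched as an interval intersection followed by a scan for writes inside it. This is a simplified illustration rather than the driver's implementation, and the log above shows this particular commit was later reverted. `LiveRange`, `has_interfering_write` and the flat list of write positions are all hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Inclusive range of instruction indices over which a register is live.
// (Hypothetical representation.)
struct LiveRange {
    int start;
    int end;
};

// Returns true if any write to either register falls inside the
// intersection of the two live ranges. Writes outside the overlap are
// ignored, which is the relaxation the message describes.
// write_ips lists the instruction indices of such writes (assumed to
// exclude the copy being coalesced).
inline bool has_interfering_write(LiveRange dst, LiveRange src,
                                  const std::vector<int> &write_ips)
{
    const int lo = std::max(dst.start, src.start);
    const int hi = std::min(dst.end, src.end);
    if (lo > hi)
        return false;  // ranges do not overlap at all
    return std::any_of(write_ips.begin(), write_ips.end(),
                       [lo, hi](int ip) { return ip >= lo && ip <= hi; });
}
```

Note that the destination range no longer has to sit entirely inside the source range; only writes within the overlap matter in this sketch.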
* i965/fs: Give up in interference check if we see a WHILE. | Matt Turner | 2014-04-18 | 1 | -1/+1
  Rather than any old control flow. Muchnick's algorithm just checks for interfering writes between the MOV and the end of the program. Handling this when you have backward branches is hard, so don't, but there's no reason to bail if you see forward branches.
  instructions in affected programs: 4270 -> 4248 (-0.52%)
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Simplify interference scan in register coalescing. | Matt Turner | 2014-04-18 | 1 | -13/+9
  We were starting at the beginning of the instruction list, rather than with the MOV instruction itself. This allows us to coalesce after control flow.
  Excluding the shaders from an unreleased title, the shader-db results:
  total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
  instructions in affected programs: 678772 -> 669196 (-1.41%)
  GAINED: 5
  LOST: 0
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Unindent can_coalesce_vars(). | Matt Turner | 2014-04-18 | 1 | -27/+28
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Recognize nop-MOV instructions early. | Matt Turner | 2014-04-18 | 1 | -3/+17
  And avoid rewriting other instructions unnecessarily. Removes a few self-moves we weren't able to handle because they were components of a large VGRF.
  instructions in affected programs: 830 -> 826 (-0.48%)
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Only sweep NOPs if register coalescing made progress. | Matt Turner | 2014-04-18 | 1 | -7/+9
  Otherwise there's nothing to do.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* Revert "i965/fs: Only sweep NOPs if register coalescing made progress." | Matt Turner | 2014-04-15 | 1 | -8/+7
  This reverts commit f092e8951ce5212ba3cbb382ce3a6666eb6c9bed.
  Didn't mean to push this...
* i965/fs: Only sweep NOPs if register coalescing made progress. | Matt Turner | 2014-04-15 | 1 | -7/+8
  Otherwise there's nothing to do.
* i965/fs: Reset reg_from when we can't coalesce. | Matt Turner | 2014-04-11 | 1 | -0/+1
  Not setting this would prevent coalescing after a failed attempt if the sources for both MOVs were the same.
  total instructions in shared programs: 1654531 -> 1650224 (-0.26%)
  instructions in affected programs: 423167 -> 418860 (-1.02%)
  GAINED: 2
  LOST: 0
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/fs: Remove left-over 'removed' variable. | Matt Turner | 2014-04-07 | 1 | -13/+8
  I think this was used for coalescing out partly dead large virtual registers, but the patch that enabled that caused regressions and didn't make it upstream.
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/fs: Check for interference after finding all channels. | Matt Turner | 2014-04-07 | 1 | -11/+26
  It's more likely that we won't find writes to all channels than one will interfere, and calculating interference is more expensive. This change will also help prepare for coalescing load_payload instructions' operands.
  Also update the live intervals for all channels, and not just the last that we saw.
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/fs: Split out can_coalesce_vars() function. | Matt Turner | 2014-04-05 | 1 | -44/+47
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/fs: Split out is_coalesce_candidate() function. | Matt Turner | 2014-04-05 | 1 | -14/+23
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/fs: Split fs_visitor::register_coalesce() into its own file. | Matt Turner | 2014-04-05 | 1 | -0/+208
  The function has gotten large, and brw_fs.cpp is the largest source file in the driver.
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>