summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Use exec_size instead of dst.width for computing component sizeJason Ekstrand2015-06-301-2/+2
| | | | | | | | There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965: Rename backend_visitor to backend_shaderJason Ekstrand2015-05-281-9/+9
| | | | | | | | The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+3
| | | | | Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use device_info instead of the context in instruction schedulingJason Ekstrand2015-04-221-11/+10
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make scheduler cycle estimates use the proper stage name.Kenneth Graunke2015-02-191-5/+6
| | | | | | | | | | | | | | | Previously, the vec4 backend labeled shaders as "vec4" - now it uses the specific names "VS" and "GS". The FS backend now correctly prints "VS" for vertex shaders (rather than "fs"). It also prints "FS" instead of "fs" for fragment shaders; preserving that behavior didn't seem essential. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/vec4: Don't infer MRF dependencies for send from GRF instructions.Francisco Jerez2015-02-101-14/+18
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix the scheduler to take into account reads and writes of ↵Francisco Jerez2015-02-101-5/+10
| | | | | | | | multiple registers. v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Remove dependency of fs_inst on the visitor class.Francisco Jerez2015-02-101-4/+4
| | | | | | The fs_visitor argument of fs_inst::regs_read() wasn't used at all. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Factor out virtual GRF allocation to a separate object.Francisco Jerez2015-02-101-5/+5
| | | | | | | | | | | | | Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Don't make instructions with a null dest a barrier to scheduling.Matt Turner2015-01-231-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we properly track accumulator dependencies, the scheduler is able to schedule instructions between the mach and mov in the common the integer multiplication pattern: mul acc0, x, y mach null, x, y mov dest, acc0 Since a null destination implies no dependency on the destination, we can also safely schedule instructions (that don't write the accumulator) between the mul and mach. GAINED: 103 LOST: 43 Causes one program to spill (643 -> 1076 instructions). I committed this patch last year (commit 42a26cb5) but reverted it (commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program (bug 78648). Tapani reapplied this patch and could not reproduce the bug with current Mesa. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Remove tabs from instruction scheduler.Matt Turner2014-12-021-98/+98
| | | | Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965/fs: Use instruction execution sizes instead of heuristicsJason Ekstrand2014-09-301-3/+1
| | | | | Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs_reg: Allocate double the number of vgrfs in SIMD16 modeJason Ekstrand2014-09-301-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is actually the squash of a bunch of different changes. Individual commit titles follow: i965/fs: Always 2-align registers SIMD16 for gen <= 5 i965/fs: Use the register width when applying offsets This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount. i965/fs: Change regs_read to be in hardware registers i965/fs: Change regs_written to be actual hardware registers i965/fs: Properly handle register widths in LOAD_PAYLOAD The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers. i965/fs: Properly set writemasks in LOAD_PAYLOAD i965/fs: Handle register widths in demote_pull_constants i965/fs: Get rid of implicit register doubling in the allocator i965/fs: Reserve enough registers for PLN instructions i965/fs: Make sources and destinations interfere in 16-wide i965/fs: Properly handle register widths in CSE i965/fs: Properly handle register widths in register_coalesce i965/fs: Properly handle widths in copy propagation i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD i965/fs: Properly handle register widths and odd register sizes in spilling i965/fs: Don't waste a register on texture lookups for gen >= 7 Previously, we were waisting a register in SIMD16 mode because we could only allocate registers in pairs. Now that we can allocate and address odd-sized registers, let's get rid of this special-case. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make instruction lists local to the bblocks.Matt Turner2014-09-241-5/+5
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Remove now unneeded calls to calculate_cfg().Matt Turner2014-09-241-1/+0
| | | | | | | Now that nothing invalidates the CFG, we can calculate_cfg() immediately after emit_fb_writes()/emit_thread_end() and never again. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Remove cfg-invalidating parameter from invalidate_live_intervals.Matt Turner2014-09-241-2/+2
| | | | | | Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Preserve the CFG in instruction scheduling.Matt Turner2014-09-241-32/+42
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Remove artificial dependency between math instructions.Matt Turner2014-07-081-1/+2
| | | | | | ... on Gen6+. I'm not actually sure which class Gen6 fits into. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Track dependencies in instruction scheduling per reg offset.Matt Turner2014-07-081-8/+15
| | | | | | | | | | | | | | | | | | | | | Previously instruction scheduling tracked dependencies on a per-register basis. This meant that there was an artificial dependency between interpolation instructions writing into the same virtual register. Instruction scheduling would insert a number of instructions between the two instructions in this example, when they are actually independent. linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F This lead to cases where the first texture coordinate is interpolated at the beginning of the shader, but the second is done immediately before the texture operation that uses it as a source. After this change, the artificial dependency is removed and the interpolation instructions are scheduled together. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use typed foreach_in_list_safe instead of foreach_list_safe.Matt Turner2014-07-011-2/+1
| | | | Acked-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Use typed foreach_in_list instead of foreach_list.Matt Turner2014-07-011-18/+8
| | | | Acked-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Use brw->gen in some generation checks.Matt Turner2014-06-111-5/+6
| | | | | | | Will simplify the automated conversion if we want to allow compiling the driver for a single generation. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965/fs: Loop from 0 to inst->sources, not 0 to 3.Matt Turner2014-06-011-5/+5
| | | | | | Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* Revert "i965: Don't make instructions with a null dest a barrier to scheduling."Matt Turner2014-05-261-8/+4
| | | | | | | This reverts commit 42a26cb5e441a01d5288b299980f23affaad53fe. Cc: "10.2" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78648
* i965: Don't treat HW_REGs as barriers if they're immediates.Matt Turner2014-05-251-4/+12
| | | | | | | | We had a handful of cases where we'd used brw_imm_*() to generate an immediate, rather than fs_reg(). We shouldn't do that but we shouldn't limit scheduling flexibility on account of immediate arguments either. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Relax accumulator dependency scheduling on Gen < 6Iago Toral Quiroga2014-05-131-59/+25
| | | | | | | | | | | Many instructions implicitly update the accumulator on Gen < 6. The instruction scheduling code just calls add_barrier_deps() for each accumulator access on these platforms, but a large class of operations don't actually update the accumulator -- mostly move and logical instructions. Teaching the scheduling code about this would allow more flexibility to schedule instructions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740 Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Don't make instructions with a null dest a barrier to scheduling.Matt Turner2014-04-161-4/+8
| | | | | | | | | | | | | | | | Now that we properly track accumulator dependencies, the scheduler is able to schedule instructions between the mach and mov in the common the integer multiplication pattern: mul acc0, x, y mach null, x, y mov dest, acc0 Since a null destination implies no dependency on the destination, we can also safely schedule instructions (that don't write the accumulator) between the mul and mach. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add writes_accumulator flagJuha-Pekka Heikkila2014-04-161-0/+94
| | | | | | | | | | | | | | | | | | | | | | | Our hardware has an "accumulator" register, which can be used to store intermediate results across multiple instructions. Many instructions can implicitly write a value to the accumulator in addition to their normal destination register. This is enabled by the "AccWrEn" flag. This patch introduces a new flag, inst->writes_accumulator, which allows us to express the AccWrEn notion in the IR. It also creates a n ALU2_ACC macro to easily define emitters for instructions that implicitly write the accumulator. Previously, we only supported implicit accumulator writes from the ADDC, SUBB, and MACH instructions. We always enabled them on those instructions, and left them disabled for other instructions. To take advantage of the MAC (multiply-accumulate) instruction, we need to be able to set AccWrEn on other types of instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
* i965/vec4: Rename depends_on_flags() to reads_flag().Matt Turner2014-03-241-2/+2
| | | | | | To be consistent with the fs backend. Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/vec4: Add and use vec4_instruction::writes_flag().Matt Turner2014-03-241-2/+2
| | | | | | | | To be consistent with the fs backend. Also the instruction scheduler incorrectly considered SEL with a conditional modifier to read the flag register. Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Disassemble 3-src operands widths' correctly.Matt Turner2014-03-101-8/+8
| | | | | | | <4,1,1> isn't a real thing. We meant <4,4,1>, i.e., each component of the whole register. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move compiler debugging output to stderr.Eric Anholt2014-02-221-7/+9
| | | | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Replace 8-wide and 16-wide with SIMD8 and SIMD16.Eric Anholt2014-01-171-3/+3
| | | | | | | | Those are the terms used in the docs, and think "n-wide" was something I just happened to say. Note that shader-db needs updating for the INTEL_DEBUG=fs parsing. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Make the first pre-allocation heuristic be the post heuristic.Eric Anholt2013-11-221-2/+2
| | | | | | | | | | | | | | | | | | | | I recently made us try two different things that tried to reduce register pressure so that we would be more likely to allocate successfully. But now that we have the logic for trying two, we can make the first thing we try be the normal, not-prioritizing-register-pressure heuristic. This means one less scheduling pass in the common case of that heuristic not producing spills, plus the best schedule we know how to produce, if that one happens to succeed. This is important, because our register allocation produces a lot of possibly avoidable dependencies for the post-register-allocation schedule, despite ra_set_allocate_round_robin(). GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31) nexuiz: No difference (n=5) lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86) minecraft apitrace: No difference (n=15) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/vec4: Add invalidate_live_intervals method.Matt Turner2013-11-201-1/+1
| | | | | Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add missing break in SHADER_OPCODE_GEN7_SCRATCH_READ case.Vinson Lee2013-11-151-0/+2
| | | | | | | | | Fixes "Missing break in switch" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Cc: "10.0" <mesa-stable@lists.freedesktop.org>
* i965: Initialize schedule_node::delay.Vinson Lee2013-11-141-0/+1
| | | | | | | Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/fs: Try a different pre-scheduling heuristic if the first spills.Eric Anholt2013-11-121-39/+46
| | | | | | | | | | | | | | | | | | | | | | | Since LIFO fails on some shaders in one particular way, and non-LIFO systematically fails in another way on different kinds of shaders, try them both, and pick whichever one successfully register allocates first. Slightly prefer non-LIFO in case we produce extra dependencies in register allocation, since it should start out with fewer stalls than LIFO. This is madness, but I haven't come up with another way to get unigine tropics to not spill while keeping other programs from not spilling and retaining the non-unigine performance wins from texture-grf. total instructions in shared programs: 1626728 -> 1626288 (-0.03%) instructions in affected programs: 1015 -> 575 (-43.35%) GAINED: 50 LOST: 0 Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445 Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Ignore actual latency pre-reg-alloc.Eric Anholt2013-11-121-21/+29
| | | | | | | | | | | | | | | | We care about depth-until-program-end, as a proxy for "make sure I schedule those early instructions that open up the other things that can make progress while keeping register pressure low", not actual latency (since we're relying on the post-register-alloc scheduling to actually schedule for the hardware). total instructions in shared programs: 1609931 -> 1609931 (0.00%) instructions in affected programs: 0 -> 0 GAINED: 55 LOST: 43 Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Prefer things we know reduce reg pressure when pre-scheduling.Eric Anholt2013-11-121-0/+144
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, the best thing we had was to schedule the things unblocked by the last chosen instruction, on the hope that it would be consuming two values at the end of their live intervals while only producing one new value. But that's just a guess, and we can do counting of usage of registers to know when an instruction would (almost surely) reduce register pressure. The only failure mode I know of in this new dominant heuristic is that inside of a loop when scheduling the iterator (for example), choosing the last use of the iterator doesn't actually reduce the live interval of the iterator. But it doesn't seem to matter in shader-db: total instructions in shared programs: 1618700 -> 1618700 (0.00%) instructions in affected programs: 0 -> 0 GAINED: 13 LOST: 0 Note: The new functions are made virtual because I expect we'll soon lift the pre-regalloc scheduling heuristic over to the vec4 backend. Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gen7: Add instruction latency estimates for untyped atomics and reads.Francisco Jerez2013-11-041-0/+39
| | | | | | | | The latency information has been obtained empirically from measurements taken on Haswell and Ivy Bridge. Acked-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add a 'has_side_effects' back-end instruction predicate.Francisco Jerez2013-11-041-1/+5
| | | | | | | | | | | | | This patch fixes the three dead code elimination passes and the VEC4/FS instruction scheduling passes so they leave instructions with side effects alone. At some point it might be interesting to have the instruction scheduler calculate the exact memory dependencies between atomic ops, but they're rare enough that it seems unlikely that it will make any practical difference. Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/fs: Use reads_flag and writes_flag methods in the scheduler.Matt Turner2013-10-301-12/+4
| | | | | Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/fs: Use the gen7 scratch read opcode when possible.Eric Anholt2013-10-301-0/+12
| | | | | | | | | | This avoids a lot of message setup we had to do otherwise. Improves GLB2.7 performance with register spilling force enabled by 1.6442% +/- 0.553218% (n=4). v2: Use BRW_PREDICATE_NONE, improve a comment (by Paul). Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/fs: Prefer more-critical instructions of the same age in LIFO scheduling.Eric Anholt2013-10-301-15/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | When faced with a million instructions that all became candidates at the same time (none of which individually reduce register pressure), the ones on the critical path are more likely to be the ones that will free up some candidates soon. shader-db: total instructions in shared programs: 1681070 -> 1681070 (0.00%) instructions in affected programs: 0 -> 0 GAINED: 40 LOST: 74 Fixes indistinguishable-from-hanging behavior in GLES3conform's uniform_buffer_object_max_uniform_block_size test, regressed by c3c9a8c85758796a26b48e484286e6b6f5a5299a. Given that 93bd627d5a6c485948b94488e6cd53a06b7ebdcf was unlocked by that commit, the net effect on 16-wide program count is still quite positive, and I think this should give us more stable scheduling (less dependency on original instruction emit order). v2: Comment suggestions by Paul Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70943 Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Compute the node's delay time for scheduling.Eric Anholt2013-10-301-0/+28
| | | | | | | | | | | | This is a step in doing scheduling as described in Muchnick (p538). A difference is that our latency function is only specific to one instruction (it doesn't describe, for example, the different latency between WAR of a send's arguments and RAW of a send's destination), but that's changeable later. We also don't separately compute the postorder traversal of the graph, since we can use the setting of the delay field as the "visited" flag. Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Handle deallocation of some private ralloc contexts explicitly.Francisco Jerez2013-10-291-1/+1
| | | | | | | | | These ralloc contexts belong to a specific object and are being deallocated manually from the class destructor. Now that we've hooked up destructors to ralloc there's no reason for them to be children of any other context, and doing so might to lead to double frees under some circumstances. The class destructor has all the responsibility of freeing class memory resources now.
* i965/fs: Stop trying to hack around MRF dep chains on gen7+ LIFO scheduling.Eric Anholt2013-10-251-1/+1
| | | | | | | | | | | | | | This was a hack to avoid choosing to schedule all texturing before consumption of any texture results due to the way dependency chains worked out in the presence of MRFs. On gen7, we don't have MRFs, so the problem doesn't apply, and this was just badly constraining our scheduling. total instructions in shared programs: 1615306 -> 1612534 (-0.17%) instructions in affected programs: 9958 -> 7186 (-27.84%) GAINED: 259 LOST: 9 Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Try not to reverse-schedule things when doing LIFO scheduling.Eric Anholt2013-10-251-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LIFO plan was simple: Take the most recently made available instructions, and pick those first. But because of the order we were pushing things onto our list of available-to-schedule instructions, it meant that when a set of instructions was made available at the same time (for example, everything at the start of the program that didn't depend on other instructions) we'd schedule them in reverse order. If you had 10 texture calls in a row in your program, each with independent argument setup, we'd set up the last texture call's args and execute it first, even though we wouldn't be able to consume its results until we'd finished the other 9 texture calls (assuming consumption of texture results happens near each texture call, and combines it with another texture result, which is normal for a convolution shader). To fix this, walk the list for doing LIFO in the order that instructions were originally generated in the program, but choose to push newly-made-available instructions to the other end of the list instead. total instructions in shared programs: 1587242 -> 1586290 (-0.06%) instructions in affected programs: 7801 -> 6849 (-12.20%) GAINED: 76 LOST: 67 Thanks to Chia-I Wu for pointing out the bug in my first version of the patch that made it a huge loss. Reviewed-by: Matt Turner <mattst88@gmail.com>