summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().Francisco Jerez2016-09-141-1/+3
| | | | | | | | | | | fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.Francisco Jerez2016-09-141-2/+2
| | | | | | | | | | | | | | The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.Francisco Jerez2016-09-141-4/+4
| | | | | | | | | | | | | | This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Don't CSE render target messages with different target index.Francisco Jerez2016-08-251-0/+1
| | | | | | | | | We weren't checking the fs_inst::target field when comparing whether two instructions are equal. For FB writes it doesn't matter because they aren't CSE-able anyway, but this would have become a problem with FB reads which are expression-like instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Define logical framebuffer read opcode and lower it to physical reads.Francisco Jerez2016-08-251-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Let CSE handle logical sampler sends as expressions.Francisco Jerez2016-05-291-0/+13
| | | | | | | This will prevent some shader-db regressions when we start plumbing logical sends through the optimizer. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.Francisco Jerez2016-05-271-3/+3
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Expose arbitrary channel execution groups to the IR.Francisco Jerez2016-05-271-2/+2
| | | | | | | | | | This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove extract virtual opcodes.Francisco Jerez2016-05-271-2/+0
| | | | | | | These can be easily represented in the IR as a MOV instruction with strided source so they seem rather redundant. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Hide varying pull constant load message setup behind logical opcode.Francisco Jerez2016-05-271-1/+1
| | | | | | | | This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.Francisco Jerez2016-05-271-1/+2
| | | | | | | | | | | | | If the LOAD_PAYLOAD instruction only has header sources it's possible for the number of registers written to be less than or equal to the SIMD component size, in which case it would take the single-MOV path at the bottom which would cause the channel enable masks to be applied incorrectly to the header contents and/or cause it to write past the end of the allocated temporary. If the instruction is either LOAD_PAYLOAD or doesn't write exactly one component the MOV path is going to mess up the program so just don't use it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: fix dst width calculation in CSEConnor Abbott2016-05-101-1/+2
| | | | | | | | v2 (Sam): - Fix line width (Topi). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: add PACK opcodeConnor Abbott2016-05-101-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Don't CSE negated multiplies with saturation.Matt Turner2016-02-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | It's not correct to CSE these multiplies mul.sat dst1, -a, b mul.sat dst2, a, b by emitting a negated MOV from dst1 to dst2: mul.sat dst1, -a, b mov dst2, -dst1 Take 2.0*2.0 for example. The first multiply would produce 0.0 and the second would produce 1.0. Fixes bad generated code in 18 to 22 shaders: instructions in affected programs: 432 -> 464 (7.41%) helped: 4 HURT: 18 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Implement support for extract_word.Matt Turner2016-02-011-0/+2
| | | | | | The vec4 backend will lower it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: respect force_sechalf/force_writemask_all in CSEConnor Abbott2015-11-231-0/+2
| | | | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Introduce a MOV_INDIRECT opcode.Kenneth Graunke2015-11-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The geometry and tessellation control shader stages both read from multiple URB entries (one per vertex). The thread payload contains several URB handles which reference these separate memory segments. In GLSL, these inputs are represented as per-vertex arrays; the outermost array index selects which vertex's inputs to read. This array index does not necessarily need to be constant. To handle that, we need to use indirect addressing on GRFs to select which of the thread payload registers has the appropriate URB handle. (This is before we can even think about applying the pull model!) This patch introduces a new opcode which performs a MOV from a source using VxH indirect addressing (which allows each of the 8 SIMD channels to select distinct data.) Based on a patch by Jason Ekstrand. v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it a bit more generic. Use regs_read() instead of hacking up the register allocator. (Suggested by Jason Ekstrand.) v3: Fix regs_read() to be more accurate for small unaligned regions. Also rebase on Matt's work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3] Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-1/+2
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-3/+3
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-1/+1
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use immediate storage in inherited brw_reg.Matt Turner2015-11-131-8/+8
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Disable CSE optimization for untyped & typed surface readsJordan Justen2015-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | An untyped surface read is volatile because it might be affected by a write. In the ES31-CTS.compute_shader.resources-max test, two back to back read/modify/writes of an SSBO variable looked something like this: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r3 = untyped_surface_read(ssbo_float) r4 = r3 + 1 untyped_surface_write(ssbo_float, r4) And after CSE, we had: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r4 = r1 + 1 untyped_surface_write(ssbo_float, r4) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Define virtual instruction to calculate the high 32 bits of a multiply.Francisco Jerez2015-08-061-0/+1
| | | | | | | | | | | This instruction will translate to the MUL/MACH sequence that computes the high 32-bits of the result of a 64-bit multiply. Before Gen8 integer operations that used the accumulator were limited to 8-wide, but the SIMD lowering pass can easily be hooked up to sidestep this limitation, we just need a virtual opcode to represent the MUL/MACH sequence in the IR. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Switch opt_cse() to the fs_builder constructor from instruction.Francisco Jerez2015-07-291-8/+8
| | | | Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Relax fs_builder channel group assertion when force_writemask_all ↵Francisco Jerez2015-07-011-2/+2
| | | | | | | | | | | | | | | | | | | is on. This assertion was meant to catch code inadvertently escaping the control flow jail determined by the group of channel enable signals selected by some caller, however it seems useful to be able to increase the default execution size as long as force_writemask_all is enabled, because force_writemask_all is an explicit indication that there is no longer a one-to-one correspondence between channels and SIMD components so the restriction doesn't apply. In addition reorder the calls to fs_builder::group and ::exec_all in a couple of places to make sure that we don't temporarily break this invariant in the future for instructions with exec_size higher than the dispatch width. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Remove the width field from fs_regJason Ekstrand2015-06-301-5/+1
| | | | | | | | | | | | | As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use exec_size instead of dst.width for computing component sizeJason Ekstrand2015-06-301-1/+1
| | | | | | | | There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Add a builder argument to offset()Jason Ekstrand2015-06-301-1/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Migrate opt_cse to the IR builder.Francisco Jerez2015-06-091-15/+12
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Don't drop force_writemask_all and _sechalf when copying a CSE ↵Francisco Jerez2015-06-091-1/+2
| | | | | | | | | | temporary. LOAD_PAYLOAD instructions need the same treatment as any other generator instructions, at least FB writes and typed surface messages will need a payload built with non-zero execution controls. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Take into account all instruction fields in CSE instructions_match().Francisco Jerez2015-06-091-8/+12
| | | | | | | | Most of these fields affect the behaviour of the instruction so it could actually break the program if we CSE a pair of otherwise matching instructions with different values of these fields. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Rework the fs_visitor LOAD_PAYLOAD instructionJason Ekstrand2015-05-061-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly reworked instruction is far more straightforward than the original. Before, the LOAD_PAYLOAD instruction was lowered by a the complicated and broken-by-design pile of heuristics to try and guess force_writemask_all, exec_size, and a number of other factors on the sources. Instead, we use the header_size on the instruction to denote which sources are "header sources". Header sources are required to be a single physical hardware register that is copied verbatim. The registers that follow are considered the actual payload registers and have a width that correspond's to the LOAD_PAYLOAD's exec_size and are treated as being per-channel. This gives us a fairly straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. This patch commit itself is made up of a bunch of smaller changes squashed together. Individual change descriptions follow: i965/fs: Rework fs_visitor::LOAD_PAYLOAD We rework LOAD_PAYLOAD to verify that all of the sources that count as headers are, indeed, exactly one register and that all of the non-header sources match the destination width. We then take the exec_size for LOAD_PAYLOAD directly from the destination width. i965/fs: Make destinations of load_payload have the appropreate width i965/fs: Rework fs_visitor::lower_load_payload v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions i965/fs_cse: Support the new-style LOAD_PAYLOAD i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD i965/fs: Simplify setup_color_payload Previously, setup_color_payload was a a big helper function that did a lot of gen-specific special casing for setting up the color sources of the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more sane, most of that complexity isn't needed anymore. Instead, we can do a simple fixup pass for color clamps and then just stash sources directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the right thing with respect to COMPR4. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Make LOAD_PAYLOAD take a header sizeJason Ekstrand2015-05-061-1/+2
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs_inst: Add an is_copy_payload helperJason Ekstrand2015-05-061-19/+3
| | | | | | | | | | This commit adds a new is_copy_payload helper to fs_inst that takes the place of the similarly named functions in cse and register coalesce. The two is_copy_payload functions in CSE and register coalesce were subtly different and potentially subtly broken. The new version unifies the two and should be more correct. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Change header_present to header_size in backend_instructionJason Ekstrand2015-05-061-1/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs_cse: Factor out code to create copy instructionsJason Ekstrand2015-05-061-37/+38
| | | | | | | | v2: Get rid of the block parameter and make src a const reference Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.Francisco Jerez2015-05-041-0/+1
| | | | | | | | v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+1
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Correct mistake in determining whether a MUL is negated.Matt Turner2015-04-141-1/+1
| | | | | | | | | a * b is equivalent to -a * -b, and the previous code was failing at that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89961 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow CSE to handle MULs with negated arguments.Matt Turner2015-03-311-5/+37
| | | | | | | | | | | | | | | | | | | | | mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of mul x, -y. With NIR: total instructions in shared programs: 6167779 -> 6161193 (-0.11%) instructions in affected programs: 983511 -> 976925 (-0.67%) helped: 4106 HURT: 16 GAINED: 18 LOST: 7 Without NIR: total instructions in shared programs: 6192323 -> 6185299 (-0.11%) instructions in affected programs: 987875 -> 980851 (-0.71%) helped: 4146 HURT: 16 GAINED: 16 LOST: 0
* i965: De-duplicate is_expression_commutative() functions.Kenneth Graunke2015-03-151-23/+1
| | | | | | | | Create a backend_inst::is_commutative() method to replace two static functions that did the exact same thing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Factor out virtual GRF allocation to a separate object.Francisco Jerez2015-02-101-1/+1
| | | | | | | | | | | | | Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Don't consider null dst instructions as matching non-null dst.Matt Turner2015-01-151-1/+2
| | | | | | | | | | | | | | | | | | | When performing common subexpression elimination on instructions with non-null destinations we emit a MOV to copy the result to a new register that must have no other uses. In the case of: cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f ... cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f we put the first instruction in the AEB and decided that we could reuse its result when we found the second. Unfortunately, that meant that we'd emit a MOV from the first's destination, which is null. Don't do anything if the entry's destination is null and the instruction's destination is non-null. Tested-by: Tapani Pälli <tapani.palli@intel.com>
* i965: Consider SEL.{GE,L} to be commutative operations.Matt Turner2015-01-081-4/+11
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Perform CSE on MOV ..., VF instructions.Matt Turner2014-12-051-5/+11
| | | | | | | | | | | Safe from causing optimization loops, since we don't constant propagate VF arguments. (for this and the previous patch): total instructions in shared programs: 4289075 -> 4271932 (-0.40%) instructions in affected programs: 1616779 -> 1599636 (-1.06%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Combine offset/texture_offset fields.Matt Turner2014-11-211-1/+1
| | | | | | | | texture_offset was only used by some texturing operations, and offset was only used by spill/unspill and some URB operations. These fields are never used at the same time. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Perform CSE on MAD instructions with final arguments switched.Matt Turner2014-10-291-1/+5
| | | | | | | | Multiplication is commutative. instructions in affected programs: 48314 -> 47954 (-0.75%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Allow CSE on Gen4-5 unary math.Kenneth Graunke2014-10-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+ math instruction: it's a single instruction (SEND) with a GRF source. The difference is that it also implicitly clobbers a message register. The only visible effect is that CSE will remove the MRF-clobbering from later math operations. This should be fine; compute_to_mrf and remove_redundant_mrf_writes don't look at the values populated by implied writes, so they can't rely on those values being present. Less interference may actually help those passes make more progress. Binary math is still problematic, since it involves a separate MOV instruction to load the second operand. We continue disallowing CSE for binary math operations. total instructions in shared programs: 3340303 -> 3340100 (-0.01%) instructions in affected programs: 26927 -> 26724 (-0.75%) Nothing hurt, gained, or lost. ~6% reduction on a few shaders. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>