summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.Francisco Jerez2016-09-141-14/+0
| | | | | | | Also changed the argument names since 'src' and 'dst' don't make that much sense outside of the context of copy propagation. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Change region_contained_in() to use byte units.Francisco Jerez2016-09-141-15/+10
| | | | | | | | | | | This makes the function less annoying to use and more accurate -- We shouldn't propagate a copy into a register region that wasn't fully contained in the destination of the copy (IOW, a source region that wasn't fully defined by the copy) just because the number of registers written and read by each instruction happened to get rounded up to the same GRF multiple. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Simplify copy propagation LOAD_PAYLOAD ACP setup.Francisco Jerez2016-09-141-3/+2
| | | | | | By keeping track of 'offset' in byte units. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().Francisco Jerez2016-09-141-2/+4
| | | | | | | | | | | fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Fix can_propagate_from() source/destination overlap check.Francisco Jerez2016-09-141-2/+2
| | | | | | | | | The previous overlap condition only made sure that the VGRF numbers or GRF-aligned offsets were different without taking the amount of data written and read by the instruction into consideration. Use the regions_overlap() helper instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.Francisco Jerez2016-09-141-6/+8
| | | | | | | | | | | | | The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.Francisco Jerez2016-09-141-11/+13
| | | | | | | | | | | | | | The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-14/+5
| | | | | | | | | | | | | | | | | The fs_reg::subreg_offset and ::offset fields are now redundant, the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-1/+1
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* Revert "i965/fs: Allow scalar source regions on SNB math instructions."Francisco Jerez2016-06-031-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5. Apparently the hardware spec text I quoted in the commit message was outright lying about scalar source math being supported on SNB, the hardware seems to load 32 contiguous bits of data for each channel regardless of the regioning mode. Fixes regressions in the following CTS tests (which we didn't catch early due to CTS being temporarily disabled in our CI system): es2-cts.gtf.gl.atan.atan_vec3_frag_xvary es2-cts.gtf.gl.cos.cos_vec2_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvary es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_float_frag_xvary es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_vec3_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346 Reported-by: Mark Janes <mark.a.janes@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Allow scalar source regions on SNB math instructions.Francisco Jerez2016-05-311-7/+2
| | | | | | | | | | | | | | | | I haven't found any evidence that this isn't supported by the hardware, in fact according to the SNB hardware spec: "The supported regioning modes for math instructions are align16, align1 with the following restrictions: - Scalar source is supported. [...] - Source and destination offset must be the same, except the case of scalar source." Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Generalize regions_overlap() from copy propagation to handle ↵Francisco Jerez2016-05-291-12/+4
| | | | | | | | | | | | non-VGRF files. This will be useful in several places. The only externally visible difference (other than non-VGRF files being supported now) is that the region sizes are now passed in byte units instead of in GRF units because the loss of precision would have become a problem in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Allow constant propagation into logical send sources.Francisco Jerez2016-05-291-0/+34
| | | | | | | Logical sends are eventually lowered into a series of copies so they can take almost anything as source. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix multiple ACP interference during copy propagation.Francisco Jerez2016-05-271-6/+2
| | | | | | | | | | This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8. It's possible for multiple ACP entries to interfere with a given VGRF write, so we need to continue iterating even if an overlapping entry has already been found. Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Avoid constant propagation when the type sizes don't match.Francisco Jerez2016-05-271-0/+8
| | | | | | | | | | | | The case where the source type of the instruction is smaller than the immediate type could be handled by calculating the portion of the immediate read by the instruction (assuming that the source channels are aligned with the destination channels of the copy) and then representing the same value as an immediate of the source type (assuming such an immediate type exists), but the code below doesn't do that, so just bail for the moment. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix off-by-one region overlap comparison in copy propagation.Francisco Jerez2016-05-271-2/+2
| | | | | | | | | | | | | | This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but the blame is mine because the pseudocode I sent in my review comment for the original patch suggesting to do things this way already had the off-by-one error. This may have caused copy propagation to be unnecessarily strict while checking whether VGRF writes interfere with any ACP entries and possibly miss valid optimization opportunities in cases where multiple copy instructions write sequential locations of the same VGRF. Cc: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* i965/fs: fix copy/constant propagation regioning checksIago Toral Quiroga2016-05-161-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were not accounting for subreg_offset in the check for the start of the region. Also, fs_reg::regs_read() already takes the stride into account, so we should not multiply its result by the stride again. This was making copy-propagation fail to copy-propagate cases that would otherwise be safe to copy-propagate. Again, this was observed in fp64 code, since there we use stride > 1 often. v2 (Sam): - Rename function and add comment (Jason, Curro). - Assert that register files and number are the same (Jason). - Fix code to take into account the assumption that src.subreg_offset is strictly less than the reg_offset unit (Curro). - Don't pass the registers by value to the function, use 'const fs_reg &' instead (Curro). - Remove obsolete comment in the commit log (Curro). v3 (Sam): - Remove the assert and put the condition in the return (Curro). - Fix function name (Curro). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: fix copy propagation from load payloadIago Toral Quiroga2016-05-161-1/+1
| | | | | | | | We were not considering the case where the load payload is writing to a destination with a reg_offset > 0. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: fix copy propagation of partially invalidated entriesIago Toral Quiroga2016-05-161-8/+27
| | | | | | | | | | | | | | | We were not invalidating entries with a src that reads more than one register when we find writes that overwrite any register read by entry->src after the first. This leads to incorrect copy propagation because we re-use entries from the ACP that have been partially invalidated. Same thing for entries with a dst that writes to more than one register. v2 (Sam): - Improve code by defining regions_overlap() and using it instead of a loop (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Reindent register offset calculation of try_copy_propagate().Francisco Jerez2016-05-161-23/+23
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Simplify and fix register offset calculation of try_copy_propagate().Francisco Jerez2016-05-161-31/+9
| | | | | | | | | | | | | | | try_copy_propagate() was special-casing UNIFORM registers (the BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the top of the function) and then failing to take into account the possibility of the instruction reading from a non-zero offset of the destination of the copy. The VGRF/ATTR handling takes it into account correctly, and there is no reason we couldn't use the exact same logic for the UNIFORM file aside from the fact that uniforms represent reg_offset in different units. We can work around that easily by defining an additional constant with the right unit reg_offset is expressed in. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: disallow type change in copy-propagation if types have different sizesIago Toral Quiroga2016-05-161-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the semantics of source modifiers are type-dependent, the type of the original source of the copy must be kept unmodified while propagating it into some instruction, which implies that we need to have the guarantee that the meaning of the instruction is going to remain the same after we have changed the types. Whenthe size of the new type is different from the size of the old type the new and old instructions cannot possibly be equivalent because the new instruction will be reading more data than the old one was. Prevents that we turn this: load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf mov(8) vgrf18:DF, vgrf17:DF 1sthalf load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf mov(8) vgrf22:UD, vgrf21:UD 1sthalf into: load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf where the semantics of the last instruccion have changed. v2 (Curro): - Update commit log and add comment to explain the problem better. - Simplify the condition. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Fix copy propagation of load payload for double operandsIago Toral Quiroga2016-05-161-1/+3
| | | | | | | | | | | | | | | | Specifically, consider the size of the data type of the operand to compute the number of registers written. v2 (Sam): - Fix line width (Jordan). - Add an assert (Jordan). - Use REG_SIZE in the calculation of regs_written (Curro) v3 (Sam): - Fix assert and calculation of regs_written (Curro). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Fix propagation of copies with strided source.Francisco Jerez2016-05-161-10/+20
| | | | | | | | | | | | This has likely been broken since we started propagating copies not matching the offset of the instruction exactly (1728e74957a62b1b4b9fbb62a7de2c12b77c8a75). The copy source stride needs to be taken into account to find out the offset at the origin that corresponds to the offset at the destination of the copy which is being read by the instruction. This has led to program miscompilation on both my SIMD32 branch and Igalia's FP64 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: add PACK opcodeConnor Abbott2016-05-101-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: don't propagate 64-bit immediatesConnor Abbott2016-05-101-0/+2
| | | | | | | | | They can only be used with 1-src instructions, which practically (since we should've constant-propagated away all 1-src instructions with 64-bit immediates in NIR) means that they must be kept in separate MOV's and can't be propagated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Pass devinfo pointer to is_3src() helpers.Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | | | This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Don't constant-fold RCPJason Ekstrand2016-03-221-15/+0
| | | | | | No shader-db changes on Broadwell Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Don't try copy propagation if constant propagation succeeded.Francisco Jerez2016-03-061-2/+1
| | | | | | | It cannot get any better. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: fix new gcc6 warningsRob Clark2016-02-181-1/+1
| | | | | | | | | | | | | src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning: ‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function] fs_copy_prop_dataflow::dump_block_data() const ^~~~~~~~~~~~~~~~~~~~~ From looking at git history, it looks like this is intended to be unused (ie. just for adding on-demand debug prints) Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-0/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Prevent implicit upcasts to brw_reg.Matt Turner2015-11-241-2/+2
| | | | | | | Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: add stride restrictions for copy propagationConnor Abbott2015-11-231-1/+55
| | | | | | | | | | There are various restrictions on what the hstride can be that depend on the Gen, and now that we're using hstride == 2 for packing/unpacking doubles, we're going to run into these restrictions a lot more often. Pull them out into a separate function, and move the one restriction we checked previously into it. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-1/+2
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-13/+13
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-9/+9
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use immediate storage in inherited brw_reg.Matt Turner2015-11-131-6/+6
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Replace default case with list of enum values.Matt Turner2015-11-021-3/+4
| | | | | | | If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow copy propagating into new surface access opcodesKristian Høgsberg Kristensen2015-10-231-0/+15
| | | | | Reviewed-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965: Extract can_change_source_types() functions.Matt Turner2015-10-191-13/+2
| | | | | | | | | Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Fix copy propagation type changes.Kenneth Graunke2015-09-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | commit 472ef9a02f2e5c5d0caa2809cb736a0f4f0d4693 introduced code to change the types of SEL and MOV instructions for moves that simply "copy bits around". It didn't account for type conversion moves, however. So it would happily turn this: mov(8) vgrf6:D, -vgrf5:D mov(8) vgrf7:F, vgrf6:UD into this: mov(8) vgrf6:D, -vgrf5:D mov(8) vgrf7:D, -vgrf5:D which erroneously drops the conversion to float. Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Define virtual instruction to calculate the high 32 bits of a multiply.Francisco Jerez2015-08-061-0/+1
| | | | | | | | | | | This instruction will translate to the MUL/MACH sequence that computes the high 32-bits of the result of a 64-bit multiply. Before Gen8 integer operations that used the accumulator were limited to 8-wide, but the SIMD lowering pass can easily be hooked up to sidestep this limitation, we just need a virtual opcode to represent the MUL/MACH sequence in the IR. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Make sure that the type sizes are compatible during copy propagation.Francisco Jerez2015-07-291-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | It's surprising that we weren't checking for this already. A future patch will cause code like the following to be emitted: MOV(16) tmp<1>:uw, src MOV(8) dst<1>:ud, tmp<8,8,1>:ud The second MOV comes from the expansion of a LOAD_PAYLOAD header copy, so I don't have control over its types. Copy propagation will happily turn this into: MOV(8) dst<1>:ud, src Which has different semantics. Fix it by preventing propagation in cases where a single channel of the instruction would span several channels of the copy (this requirement could in fact be relaxed if the copy is just a trivial memcpy, but this case is unusual enough that I don't think it matters in practice). I'm deliberately only checking if the type of the instruction is larger than the original, because the converse case seems to be handled correctly already in the code below. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Remove the width field from fs_regJason Ekstrand2015-06-301-4/+0
| | | | | | | | | | | | | As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Unrestrict constant propagation into integer multiply.Matt Turner2015-05-181-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source like on earlier platforms, so we can constant propagate into it without worry. Integer multiplies (not into the accumulator, which is done for imul_high) are lowered in lower_integer_multiplication(), so it's safe there as well. On Broadwell, fragment shaders only: total instructions in shared programs: 4377769 -> 4377451 (-0.01%) instructions in affected programs: 48064 -> 47746 (-0.66%) helped: 156 On Broadwell, vertex shaders only: total instructions in shared programs: 2858885 -> 2856313 (-0.09%) instructions in affected programs: 26380 -> 23808 (-9.75%) helped: 134 On Broadwell, vertex shaders only (with INTEL_USE_NIR=1): total instructions in shared programs: 2911688 -> 2865984 (-1.57%) instructions in affected programs: 1421715 -> 1376011 (-3.21%) helped: 6186 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Allow copy propagation on ATTR file registers.Kenneth Graunke2015-05-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | This especially helps with NIR because we currently emit MOVs at the top of the shader to copy from various ATTR registers to a giant VGRF array of all inputs. (This could potentially be done better, but since there's only ever one write to each register, it should be trivial to copy propagate away...) With NIR - only vertex shaders: total instructions in shared programs: 3129373 -> 2889581 (-7.66%) instructions in affected programs: 3119717 -> 2879925 (-7.69%) helped: 20833 Without NIR - only vertex shaders: total instructions in shared programs: 2745901 -> 2724483 (-0.78%) instructions in affected programs: 693426 -> 672008 (-3.09%) helped: 3516 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs_inst: Get rid of the effective_width fieldJason Ekstrand2015-05-061-4/+3
| | | | | | | | The effective_width field was an ill-concieved hack to get around issues in the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far more sane, this field can die. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+1
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>