summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965/vec4: Fix copy propagation for non-register-aligned regions.Francisco Jerez2016-09-141-3/+6
| | | | | | | | | | | | | | This prevents it from trying to propagate a copy through a register-misaligned region. MOV instructions with a misaligned destination shouldn't be treated as a direct GRF copy, because they only define the destination GRFs partially. Also fix the interference check implemented with is_channel_updated() to consider overlapping regions with different register offset to interfere, since the writemask check implemented in the function is only valid under the assumption that the source and destination regions are aligned component by component. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().Francisco Jerez2016-09-141-2/+2
| | | | | | | | | | This makes sure that overlap checks are done correctly throughout the back-end when the '*this' register starts before the register/size pair provided as argument, and is actually less annoying to use than in_range() at this point since regions_overlap() takes its size arguments in bytes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte ↵Francisco Jerez2016-09-141-1/+1
| | | | | | | | | | | | | | | units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_written with ::size_written field ↵Francisco Jerez2016-09-141-1/+1
| | | | | | | | | | | | | | | | in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset ↵Francisco Jerez2016-09-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | expressed in bytes. The dst/src_reg::offset field in byte units introduced in the previous patch is a more straightforward alternative to an offset representation split between ::reg_offset and ::subreg_offset fields. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple FS back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. v2: Fix division by the wrong reg_unit in the UNIFORM case of convert_to_hw_regs(). (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-2/+2
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Pass devinfo pointer to is_3src() helpers.Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | | | This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Propagate swizzles correctly during copy propagation.Francisco Jerez2016-03-061-66/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies the code that iterates over the per-component values found in the matching copy_entry struct and checks whether the register regions that were copied to each component are similar enough to be treated as a single (reswizzled) value which can be propagated into the current instruction. Aside from being scattered between opt_copy_propagation(), try_copy_propagate(), and try_constant_propagate(), what I found terribly confusing about the preexisting logic was that opt_copy_propagation() tried to reorder the array of values according to the swizzle of the instruction source, which meant one would have had to invert the reordering applied at the top level in order to find out which component to take from each value (we were just taking the i-th component from the i-th value, which is not correct in general). The saturate mask was also being swizzled incorrectly. This consolidates the logic for matching multiple components of a copy_entry into a single function which returns the result as a regular src_reg on success, as if the copy had been performed with a single MOV instruction copying all components of the src_reg into the destination. Fixes several ARB_vertex_program MOV test-cases from: https://cgit.freedesktop.org/~kwg/piglit/log/?h=arb_program Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Don't try copy propagation if constant propagation succeeded.Francisco Jerez2016-03-061-2/+1
| | | | | | | It cannot get any better. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Use swizzle() to swizzle immediates during constant propagation.Francisco Jerez2016-03-061-18/+1
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: don't copy ATTR into 3src instructions with complex swizzlesMatt Turner2016-02-051-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | The vec4 backend, at the end, does this: if (inst->is_3src()) { for (int i = 0; i < 3; i++) { if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0) assert(brw_is_single_value_swizzle(inst->src[i].swizzle)); So make sure that we use the same conditions when trying to copy-propagate. UNIFORMs will be converted to vstride 0 in convert_to_hw_regs, but so will ATTRs when interleaved (as will happen in a GS with multiple attributes). Since the vstride is not set at copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if they have non-scalar swizzles and are interleaved. Fixes assertion errors in dolphin-generated geometry shaders (or misrendering on opt builds) on Sandybridge or on IVB/HSW with INTEL_DEBUG=nodualobj. Co-authored-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418 Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-3/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Prevent implicit upcasts to brw_reg.Matt Turner2015-11-241-2/+2
| | | | | | | Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-7/+7
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-3/+3
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Unwrap some lines.Matt Turner2015-11-131-2/+1
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use immediate storage in inherited brw_reg.Matt Turner2015-11-131-5/+5
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Extract can_change_source_types() functions.Matt Turner2015-10-191-14/+2
| | | | | | | | | Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/gs: Allow src0 immediates in GS_OPCODE_SET_WRITE_OFFSET.Kenneth Graunke2015-09-261-0/+7
| | | | | | | | | | | | | | | | | | | GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special strides. We can easily make the generator handle constant src[0] arguments by instead generating a MOV with the product of both operands. This isn't necessarily a win in and of itself - instead of a MUL, we generate a MOV, which should be basically the same cost. However, we can probably avoid the earlier MOV to put src[0] into a register. shader-db statistics for geometry shaders only: total instructions in shared programs: 3207 -> 3173 (-1.06%) instructions in affected programs: 3207 -> 3173 (-1.06%) helped: 11 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: check swizzle before discarding a uniform on a 3src operandAlejandro Piñeiro2015-09-241-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this commit, copy propagation is discarded if it involves a uniform with an instruction that has 3 sources. But 3 sourced instructions can access scalar values. For example, this is what vec4_visitor::fix_3src_operand() is already doing: if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle)) return src; Shader-db results (unfiltered) on NIR: total instructions in shared programs: 6259650 -> 6241985 (-0.28%) instructions in affected programs: 812755 -> 795090 (-2.17%) helped: 7930 HURT: 0 Shader-db results (unfiltered) on IR: total instructions in shared programs: 6445822 -> 6441788 (-0.06%) instructions in affected programs: 296630 -> 292596 (-1.36%) helped: 2533 HURT: 0 v2: - Updated commit message, using Matt Turner suggestions - Move the check after we've created the final value, as Jason Ekstrand suggested - Clean up the condition v3: - Move the check back to the original place, to keep things tidy, as suggested by Jason Ekstrand v4: - Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: refactor brw_vec4_copy_propagation.Alejandro Piñeiro2015-09-221-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now it is more similar to brw_fs_copy_propagation, with three clear stages: 1) Build up the value we are propagating as if it were the source of a single MOV: 2) Check that we can propagate that value 3) Build the final value Previously everything was somewhat messed up, making the implementation on some specific cases, like knowing if you can propagate from a previous instruction even with type mismatches, even messier (for example, with the need of maintaining more of one has_source_modifiers). The refactoring clears stuff, and gives support to this mentioned use case without doing anything extra (for example, only one has_source_modifiers is used). Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1683842 -> 1669037 (-0.88%) instructions in affected programs: 739837 -> 725032 (-2.00%) helped: 6237 HURT: 0 v2: using 'arg' index to get the from inst was wrong v3: rebased against last change on the previous patch of the series v4: don't need to track instructions on struct copy_entry, as we only set the source on a direct copy v5: change the approach for a refactoring v6: tweaked comments Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4: Change types as needed to propagate source modifiers using ↵Alejandro Piñeiro2015-09-191-2/+28
| | | | | | | | | | | | | | | | | | | | | | current instruction SEL and MOV instructions, as long as they don't have source modifiers, are just copying bits around. So those kind of instruction could be propagated even if there are type mismatches. This is needed because NIR generates integer SEL and MOV instructions whenever it doesn't know what else to generate. This commit adds support for copy propagation using current instruction as reference. Equivalent to commit 472ef9 but for vec4. v2: include check for saturate, as Jason Ekstrand suggested v3: check that the dst.type and the src type are the same, in order to solve (among others) the following deqp regression with v2: dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Define virtual instruction to calculate the high 32 bits of a multiply.Francisco Jerez2015-08-061-0/+1
| | | | | | | | | | | This instruction will translate to the MUL/MACH sequence that computes the high 32-bits of the result of a 64-bit multiply. Before Gen8 integer operations that used the accumulator were limited to 8-wide, but the SIMD lowering pass can easily be hooked up to sidestep this limitation, we just need a virtual opcode to represent the MUL/MACH sequence in the IR. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+1
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add a devinfo field to backend_visitor and use it for gen checksJason Ekstrand2015-04-221-9/+11
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix handling of multiple register reads and writes during copy ↵Francisco Jerez2015-03-231-3/+6
| | | | | | propagation. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Trivial copy propagate clean-up.Francisco Jerez2015-03-231-5/+6
| | | | | | | Fix typo and punctuation in a comment, break long line and add space before curly bracket. Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965/vec4: Add argument index and type checks to SEL saturate propagation.Francisco Jerez2015-03-231-1/+4
| | | | | | | SEL saturate propagation already implicitly relies on these assumptions. Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965/vec4: Fix broken saturate mask check in copy propagation.Francisco Jerez2015-03-231-4/+11
| | | | | | | | | | | try_copy_propagate() was checking the bit of the saturate mask for the arg-th component of the source to decide whether the whole source should be saturated (WTF?). We need to swizzle the original saturate mask and check that for all enabled channels the saturate flag is either set or unset, as we cannot saturate a subset of destination components only. Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965/vec4: Don't lose copy propagation saturate bits for not written components.Francisco Jerez2015-03-231-1/+1
| | | | Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* Revert "i965/vec4: Don't lose the saturate modifier in copy propagation."Francisco Jerez2015-03-231-1/+1
| | | | | | | | | This reverts commit 0dfec59a2785cf7a87ee5128889ecebe810b611b. The change prevented propagation of copies with the saturate flag set, making the whole saturate mask tracking completely useless. A proper fix follows. Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965/vec4: Some more trivial swizzle clean-up.Francisco Jerez2015-03-231-3/+3
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Don't lose the saturate modifier in copy propagation.Andrey Sudnik2015-03-051-1/+1
| | | | | | | Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89224 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/vec4: Fix constant propagation across different types.Francisco Jerez2015-02-191-0/+10
| | | | | | | | | | If the source type differs from the original type of the constant we need to bit-cast it before propagating, otherwise the original type information will be lost. If the constant was a vector float there isn't much we can do, because the result of bit-casting the component values of a vector float cannot itself be represented as an immediate. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Silence unused parameter warningsIan Romanick2015-02-171-2/+2
| | | | | | | | | | | | | brw_vec4_copy_propagation.cpp:243:59: warning: unused parameter 'reg' [-Wunused-parameter] int arg, struct copy_entry *entry, int reg) ^ brw_vec4_generator.cpp:869:57: warning: unused parameter 'inst' [-Wunused-parameter] vec4_generator::generate_unpack_flags(vec4_instruction *inst, ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Handle negated unsigned immediate values in constant propagation.Francisco Jerez2015-02-101-11/+13
| | | | | | | | | Negation of UD/UW sources behaves the same as for D/W sources, taking the two's complement of the source, except for bitwise logical operations on Gen8 and up which take the one's complement. Fixes crash in a GLSL shader with subtraction of two unsigned values. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Factor out virtual GRF allocation to a separate object.Francisco Jerez2015-02-101-4/+4
| | | | | | | | | | | | | Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Use abs/negate functions in const propagation.Matt Turner2015-02-031-13/+5
| | | | | | No changes in shader-db. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Fix negate with unsigned integersIago Toral Quiroga2015-02-031-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For code such as: uint tmp1 = uint(in0); uint tmp2 = -tmp1; float out0 = float(tmp2); We produce code like: mov(8) g5<1>.xF -g9<4,4,1>.xUD which does not produce correct results. This code produces the results we would expect if tmp1 and tmp2 were signed integers instead. It seems that a similar problem was detected and addressed when using negations with unsigned integers as part of condionals, but it looks like the problem has a wider impact than that. This patch fixes the problem by preventing copy-propagation of negated UD registers in all scenarios, not only in conditionals. Fixes the following 24 dEQP tests: dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_* dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_* Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Add is_3src() to backend_instruction.Matt Turner2015-01-231-5/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Allow constant propagation of VF immediates.Matt Turner2014-12-291-1/+27
| | | | | | | | | | total instructions in shared programs: 5877951 -> 5877012 (-0.02%) instructions in affected programs: 155923 -> 154984 (-0.60%) Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the next commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/vec4: Add parameter to skip doing constant propagation.Matt Turner2014-12-291-2/+2
| | | | | | | | | | | | | | | | After CSEing some MOV ..., VF instructions we have code like mov tmp, [1F, 2F, 3F, 4F]VF mov r10, tmp mov r11, tmp ... use r10 use r11 We want to copy propagate tmp into the uses of r10 and r11, but *not* constant propagate the VF immediate into the uses of tmp. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/vec4: Don't use instruction list after calculating the cfg.Matt Turner2014-09-241-1/+2
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Remove cfg-invalidating parameter from invalidate_live_intervals.Matt Turner2014-09-241-1/+1
| | | | | | Everything has been converted to preserve the CFG. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vec4: Allow propagation of instructions with saturate flag to selAbdiel Janulgue2014-08-311-27/+58
| | | | | | | | | | | | | | | | | When sel conditon is bounded within 0 and 1.0. This allows code as: mov.sat a b sel.ge dst a 0.25F To be propagated as: sel.ge.sat dst b 0.25F v3: - Syntax clarifications in inst->saturate assignment - Remove extra parenthesis when assigning src_reg value from copy_entry (Matt Turner) v4: - Take channels into consideration when propagating saturated instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965: Use basic-block aware insertion/removal functions.Matt Turner2014-08-221-1/+1
| | | | | | | | | To avoid invalidating and recreating the control flow graph. Also stop invalidating the CFG in places we didn't add or remove an instruction. cfg calculations: 202951 -> 80307 (-60.43%) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Bail on vec4 copy propagation for scratch writes with source modifiersAnuj Phogat2014-08-141-0/+4
| | | | | | | | | Fixes Khronos GLES3 CTS test: dynamic_expression_array_access_vertex Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Return NONE from brw_swap_cmod on unknown input.Matt Turner2014-08-121-1/+1
| | | | | | | | Comparing ~0u with a packed enum (i.e., 1 byte) always evaluates to false. Shouldn't gcc warn about this? Reported-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Don't copy propagate abs into Broadwell logic instructions.Kenneth Graunke2014-07-151-6/+3
| | | | | | | | | | | | It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>