summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vec4.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965: Silence unused parameter warningsIan Romanick2016-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter] gl_shader_stage shader_type, ^ brw_nir.c: In function ‘brw_nir_lower_vs_inputs’: brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo, ^ brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter] uint32_t sampler, ^ brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter] vec4_visitor::gs_emit_vertex(int stream_id) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch>
* i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread ↵Francisco Jerez2016-09-211-0/+8
| | | | | | | | | | | | | | | | | | dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
* i965/vec4: Assert that ATTR regions are register-aligned.Francisco Jerez2016-09-141-0/+1
| | | | | | | It might be useful to actually handle this once copy propagation becomes smarter about register-misaligned offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Assign correct destination offset to rewritten instruction in ↵Francisco Jerez2016-09-141-2/+1
| | | | | | | | | | | | register coalesce. Because the pass already checks that the destination offset of each 'scan_inst' that needs to be rewritten matches 'inst->src[0].offset' exactly, the final offset of the rewritten instruction is just the original destination offset of the copy. This is in preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Don't coalesce registers with overlapping writes not matching the ↵Francisco Jerez2016-09-141-4/+6
| | | | | | | | MOV source. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Compare full register offsets in opt_register_coalesce nop move ↵Francisco Jerez2016-09-141-1/+1
| | | | | | | | check. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Check that the write offsets match when setting dependency controls.Francisco Jerez2016-09-141-0/+2
| | | | | | | | | For simplicity just assume that two writes to the same GRF with different sub-GRF offsets will potentially interfere and break the dependency control chain. This is in preparation for adding sub-GRF offset support to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Change opt_vector_float to keep track of the last offset seen in ↵Francisco Jerez2016-09-141-3/+3
| | | | | | | | | bytes. This simplifies things slightly and makes the pass more correct in presence of sub-GRF offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().Francisco Jerez2016-09-141-7/+8
| | | | | | | This should also have the side effect of fixing convert_to_hw_regs() to handle sub-GRF register offsets. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/ir: Update several stale comments.Francisco Jerez2016-09-141-4/+4
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/ir: Don't print ARF subnr values twice.Francisco Jerez2016-09-141-4/+0
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Print src/dst_reg::offset field consistently for all register files.Francisco Jerez2016-09-141-6/+15
| | | | | | | C.f. 'i965/fs: Print fs_reg::offset field consistently for all register files.'. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().Francisco Jerez2016-09-141-6/+8
| | | | | | | | | | This makes sure that overlap checks are done correctly throughout the back-end when the '*this' register starts before the register/size pair provided as argument, and is actually less annoying to use than in_range() at this point since regions_overlap() takes its size arguments in bytes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte ↵Francisco Jerez2016-09-141-10/+20
| | | | | | | | | | | | | | | units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_written with ::size_written field ↵Francisco Jerez2016-09-141-3/+3
| | | | | | | | | | | | | | | | in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ↵Francisco Jerez2016-09-141-2/+2
| | | | | | | | | | | | | | | | ::regs_written. This is in preparation for dropping vec4_instruction::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset ↵Francisco Jerez2016-09-141-29/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | expressed in bytes. The dst/src_reg::offset field in byte units introduced in the previous patch is a more straightforward alternative to an offset representation split between ::reg_offset and ::subreg_offset fields. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple FS back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. v2: Fix division by the wrong reg_unit in the UNIFORM case of convert_to_hw_regs(). (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-3/+3
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/vec4: Ignore swizzle of VGRF for use by var_range_end().Matt Turner2016-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | var_range_end(v, n) loops over the n components of variable number v and finds the maximum value, giving the last use of any component of v. Therefore it expects v to correspond to the variable associated with the .x channel of the VGRF. var_from_reg() however returns the variable for the first channel of the VGRF, post-swizzle. So, if the last register had a swizzle with y, z, or w in the swizzle component, we would read out of bounds. For any other register, we would read liveness information from the next register. The fix is to convert the src_reg to a dst_reg in order to call the dst_reg version of var_from_reg() that doesn't consider the swizzle. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Make opt_vector_float reset at the top of each blockJason Ekstrand2016-08-101-80/+82
| | | | | | | | | | | The pass isn't really control-flow aware and you can get into case where it tries to combine instructions from different blocks. This can actually lead to an assertion failure when removing unneeded instructions if part of the vector is set in one block and part in another. This prevents regressions in the next commit. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
* i965/fs: calculate first non-payload GRF using attrib slotsJuan A. Suarez Romero2016-05-171-0/+1
| | | | | | | | | | When computing where the first non-payload GRF starts, we can't rely on the number of attributes, as each attribute can be using 1 or 2 slots depending on whether they are a dvec3/4 or other. Instead, we need to use the number of slots used by the attributes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: use attribute slots to calculate URB read lengthJuan A. Suarez Romero2016-05-171-3/+9
| | | | | | | | | | Do not use total attributes because a dvec3/dvec4 attribute requires two slots. So rather use total attribute slots. v2: do not use loop to calculate required attribute slots (Kenneth Graunke) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Stop setting dispatch_grf_start_reg from the visitorJason Ekstrand2016-05-141-0/+2
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Pass devinfo pointer to is_3src() helpers.Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | | | This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Pass devinfo pointer to brw_instruction_name().Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | A future series will implement support for an instruction that happens to have the same opcode number as another instruction we support already on a disjoint set of hardware generations. In order to disambiguate which instruction it is brw_instruction_name() will need some way to find out which device we are generating code for. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Properly handle integer types in opt_vector_float().Kenneth Graunke2016-04-201-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, opt_vector_float() always interpreted MOV sources as floating point, and always created a MOV with a F-type destination. This meant that we could mess up sequences of integer loads, such as: mov vgrf6.0.x:D, 0D mov vgrf6.0.y:D, 1D mov vgrf6.0.z:D, 2D mov vgrf6.0.w:D, 3D Here, integer 0/1/2/3 become approximately 0.0f, so we generated: mov vgrf6.0:F, [0F, 0F, 0F, 0F] which is clearly wrong. We can properly handle this by converting integer values to float (rather than bitcasting), and emitting a type converting MOV: mov vgrf6.0:D, [0F, 1F, 2F, 3F] To do this, see first see if the integer values (converted to float) are representable. If so, we use a D-type MOV. If not, we then try the floating point values and an F-type MOV. We make zero not impose type restrictions. This is important because 0D would imply a D-type MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D, where we want to use an F-type MOV. Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend. This recently became visible due to changes in opt_vector_float() which made it optimize more cases, but it was a pre-existing bug. Apparently it also manages to turn more integer loads into VFs, producing the following shader-db statistics on Haswell: total instructions in shared programs: 7084195 -> 7082191 (-0.03%) instructions in affected programs: 246027 -> 244023 (-0.81%) helped: 1937 total cycles in shared programs: 65669642 -> 65651968 (-0.03%) cycles in affected programs: 531064 -> 513390 (-3.33%) helped: 1177 v2: Handle the type of zero better. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make opt_vector_float() only handle non-type-conversion MOVs.Kenneth Graunke2016-04-201-2/+5
| | | | | | | | | | | | | | | | We don't handle this properly - we'd have to perform the type conversion before trying to convert the value to a VF. While we could do that, it doesn't seem particularly useful - most vector loads should be consistently typed (all float or all integer). As a special case, we do allow type-converting MOVs of integer 0, as it's represented the same regardless of the type. I believe this case does actually come up. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Fold vectorize_mov() back into the one caller.Kenneth Graunke2016-04-201-24/+16
| | | | | | | | | | After the previous patch, this helper is only called in one place. So, just fold it back in - there are a lot of parameters here and not much code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rework opt_vector_float() control flow.Kenneth Graunke2016-04-201-27/+34
| | | | | | | | | | | | | This reworks opt_vector_float() so that there's only one place that flushes out any accumulated state and emits a VF. v2: Don't break the sequence for non-representable numbers - just skip recording their values. Only break it for non-MOVs or register changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Handle MOV_INDIRECT in pack_uniform_registersJason Ekstrand2016-04-151-0/+18
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECTJason Ekstrand2016-04-151-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Use can_do_writemask in can_reswizzleJason Ekstrand2016-04-151-3/+5
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Move can_do_writemask to vec4_instructionJason Ekstrand2016-04-151-0/+28
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Get rid of the uniform_size arrayJason Ekstrand2016-04-141-8/+0
| | | | | Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constantsJason Ekstrand2016-04-141-1/+1
| | | | | | | | | | | | | | | This commit moves us to an instruction based model rather than a register-based model for indirects. This is more accurate anyway as we have to emit instructions to resolve the reladdr. It's also a lot simpler because it gets rid of the recursive reladdr problem by design. One side-effect of this is that we need a whole new algorithm in move_uniform_array_access_to_pull_constants. This new algorithm is much more straightforward than the old one and is fairly similar to what we're already doing in the FS backend. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove the RCP+RSQ algebraic optimizationsJason Ekstrand2016-03-221-11/+0
| | | | | | | | | NIR already has this optimization and it can do much better than the little peephole in the backend. No shader-db change on Haswell or Broadwell. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.Francisco Jerez2016-03-141-0/+1
| | | | | | | | | | | | | Bug found by the liveness analysis validation pass that will be introduced in a later commit. The no-op MOV check in opt_register_coalesce() was removing instructions which makes the cached liveness analysis calculation inconsistent with the shader IR. We were failing to set progress to true in that case though, which means that invalidate_live_intervals() wouldn't necessarily be called at the end of the function. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: add opportunistic behaviour to opt_vector_float()Juan A. Suarez Romero2016-03-041-21/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | opt_vector_float() transforms several scalar MOV operations to a single vectorial MOV. This is done when those MOV covers all the components of the destination register. So something like: mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf3.0.z:D, 0D is transformed in: mov vgrf3.0:F, [0F, 0F, 0F, 1F] But there are cases where not all the components are written. For example, in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf4.0.xy:D, 1065353216D mov vgrf4.0.w:D, 0D mov vgrf6.0:UD, u4.xyzw:UD Nor vgrf3 nor vgrf4 .z components are written, so the optimization is not applied. But it could be applied anyway with the components covered, using a writemask to select the ones written. So we could transform it in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F] mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F] mov vgrf6.0:UD, u4.xyzw:UD This commit does precisely that: opportunistically apply opt_vector_float() when possible. total instructions in shared programs: 7124660 -> 7114784 (-0.14%) instructions in affected programs: 443078 -> 433202 (-2.23%) helped: 4998 HURT: 0 total cycles in shared programs: 64757760 -> 64728016 (-0.05%) cycles in affected programs: 1401686 -> 1371942 (-2.12%) helped: 3243 HURT: 38 v2: change vectorize_mov() signature (Matt). v3: take in account predicates (Juan). v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
* i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.Kenneth Graunke2016-02-261-3/+3
| | | | | | | | | | | | Now that each stage is directly calling brw_nir_lower_io(), and we have per-stage helper functions, it makes sense to just call the relevant one directly, rather than going through multiple switch statements. This also eliminates stupid function parameters, such as the two that only apply to vertex attributes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Lower min/max after optimization on Gen4/5.Matt Turner2016-02-171-0/+38
| | | | | | | | | | | | | | | | | | | Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more. On Ironlake: total instructions in shared programs: 6426035 -> 6422753 (-0.05%) instructions in affected programs: 326604 -> 323322 (-1.00%) helped: 1411 total cycles in shared programs: 129184700 -> 129101586 (-0.06%) cycles in affected programs: 18950290 -> 18867176 (-0.44%) helped: 2419 HURT: 328 Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Fix gl_DrawID in the vec4 backend.Kenneth Graunke2016-02-141-5/+5
| | | | | | | | | | | | brw_draw_upload.c uploads VertexID/InstanceID first, then DrawID. So we need to assign the attribute mapping in that order as well. Fixes the following Pigit tests with the vec4 backend: - arb_shader_draw_parameters-drawid vertexid - arb_shader_draw_parameters-drawid-indirect basevertex Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Rename optimizer debug 00 filenameBen Widawsky2016-02-121-1/+1
| | | | | | | | This allows ls, and scripts to get the file names in the correct order of optimization. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Drop support for ATTR as an instruction destination.Kenneth Graunke2016-02-091-16/+0
| | | | | | | | | This is no longer necessary...and it doesn't make much sense to have inputs as destinations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Apply VS attribute workarounds in NIR.Kenneth Graunke2016-02-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch re-implements the pre-Haswell VS attribute workarounds. Instead of emitting shader code in the vec4 backend, we now simply call a NIR pass to emit the necessary code. This simplifies the vec4 backend. Beyond deleting code, it removes the primary use of ATTR as a destination. It also eliminates the requirement that the vec4 VS backend express the ATTR file in terms of VERT_ATTRIB_* locations, giving us a bit more flexibility. This approach is a little different: rather than munging the attributes at the top, we emit code to fix them up when they're accessed. However, we run the optimizer afterwards, so CSE should eliminate the redundant math. It may even be able to fuse it with other calculations based on the input value. shader-db does not handle non-default NOS settings, so I have no statistics about this patch. Note that the scalar backend does not implement VS attribute workarounds, as they are unnecessary on hardware which allows SIMD8 VS. v2: Do one multiply for FIXED rescaling and select components from either the original or scaled copy, rather than multiplying each component separately (suggested by Matt Turner). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Explicitly write the "TR DS Cache Disable" bit at TCS EOT.Kenneth Graunke2016-02-091-1/+1
| | | | | | | | | | | | | | | | Bit 0 of the Patch Header is "TR DS Cache Disable". Setting that bit disables the DS Cache for tessellator-output topologies resulting in stitch-transition regions (but leaves it enabled for other cases). We probably shouldn't leave this to chance - the URB could contain garbage - which could result in the cache randomly being turned on or off. This patch makes the final EOT write 0 to the first DWord (which only contains this one bit). This ensures the cache is always on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs/generator: Take an actual shader stage rather than a stringJason Ekstrand2016-01-151-1/+1
| | | | | | Cc: "11.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make an is_scalar boolean in brw_compile_vs().Kenneth Graunke2016-01-141-5/+5
| | | | | | | | Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can help with line-wrapping. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move 3-src subnr swizzle handling into the vec4 backend.Kenneth Graunke2016-01-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr. In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7]. This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly. This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add support for gl_DrawIDARB and enable extensionKristian Høgsberg Kristensen2015-12-291-1/+12
| | | | | | | | | | We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARBKristian Høgsberg Kristensen2015-12-291-2/+5
| | | | | | | We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>