external_mesa3d.git

	Commit message	Author	Age	Files	Lines
*	i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.	Matt Turner	2016-12-16	1	-5/+5
	Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 6014da50ec41d1ad43fec94a625962ac3f2f10cb)
*	i965: Fix GPU hang related to multiple render targets and alpha testing	Anuj Phogat	2016-11-09	1	-0/+6
	This patch should have been part of commit e592f7df. When there are multiple render targets with alpha testing enabled and the fragment shader doesn't write to draw buffer zero, the GPU hangs on SKL. No GPU hang is seen on HSW. The simulator gives a warning for all gen6+ h/w: "Illegal render target write message length 0xa expected 0xc". This patch fixes the GPU hang as well as the simulator warning with the new piglit test fbo-mrt-alphatest-no-buffer-zero-write: https://patchwork.freedesktop.org/patch/118212 No regressions in Jenkins CI system. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (cherry picked from commit b9df2251c17e3ce52fa55c81f492591e08c3ee04)
*	i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.	Kenneth Graunke	2016-10-27	1	-13/+0
	Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3. v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> (cherry picked from commit 59864e8e02057cc6fa0448a8af067a3cf53389da)
*	i965: Make split_virtual_grfs() call compact_virtual_grfs().	Kenneth Graunke	2016-10-27	1	-0/+6
	Post-splitting, VGRFs have a maximum size (MAX_VGRF_SIZE). This is required by the register allocator, as we have to create classes for each size of VGRF. We can (and do) allocate virtual registers larger than MAX_VGRF_SIZE, but we must ensure that they are splittable. split_virtual_grfs() asserts that the post-splitting register size is in range. Unfortunately, these trip for completely dead registers which are too large - we only set split points for live registers. So dead ones are never split, and if they happened to be too large, they'd trip asserts. To fix this, call compact_virtual_grfs() to eliminate dead registers before splitting. v2: Add a comment written by Iago. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> (cherry picked from commit 27715c73ff84349466f62df0023863acd477f262)
*	i965: Introduce downcast helpers for prog_data structures.	Kenneth Graunke	2016-10-05	1	-24/+19
	Similar to brw_context(...), intel_texture_object(...), and so on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
*	i965: add MAYBE_UNUSED to assert param	Timothy Arceri	2016-10-05	1	-1/+1
	This fixes an unused variable warning on release builds. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks.	Kenneth Graunke	2016-10-02	1	-5/+1
	There's an assert right above this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/ir: Test thread dispatch packing assumptions.	Francisco Jerez	2016-09-21	1	-0/+30
	Not [originally] intended for upstream. Should cause a GPU hang if some thread is executed with a non-contiguous dispatch mask breaking assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or Piglit regressions, while replacing brw_stage_has_packed_dispatch() with a dummy implementation that unconditionally returns true on top of this patch causes multiple GPU hangs. v2: Refactor into a separate function instead of emitting the test code directly from emit_nir_code(), drop VEC4 test and clean up slightly for upstream. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread dispatch.	Francisco Jerez	2016-09-21	1	-0/+8
	The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
*	i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg	Jason Ekstrand	2016-09-21	1	-1/+1
	The state register sr0 is really a collection of dwords not a SIMD8 anything. It's much more convenient for brw_sr0_reg to return the particular dword you're looking for rather than a giant blob you have to massage into what you want. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> [ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
*	i965/nir: Roll set_default_interpolation into lower_fs_inputs	Jason Ekstrand	2016-09-15	1	-39/+1
	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Use NIR for handling forced per-sample interpolation	Jason Ekstrand	2016-09-15	1	-37/+3
	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode	Jason Ekstrand	2016-09-15	1	-0/+26
	From the ARB_gpu_shader5 spec: The built-in functions interpolateAtCentroid() and interpolateAtSample() will sample variables as though they were declared with the "centroid" or "sample" qualifiers, respectively. When running with persample dispatch forced by the API, we interpolate anything that isn't flat as if it's qualified by "sample". In order to keep interpolateAtCentroid() consistent with the "centroid" qualifier, we need to make interpolateAtCentroid() do sample interpolation instead. Nothing in the GLSL spec guarantees that the result of interpolateAtCentroid is uniform across samples in any way, so this is a perfectly fine thing to do. Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS tests that specifically validate consistency between the "sample" qualifier and interpolateAtSample(). Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/ir: Update several stale comments.	Francisco Jerez	2016-09-14	1	-11/+7
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/ir: Don't print ARF subnr values twice.	Francisco Jerez	2016-09-14	1	-4/+0
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Print fs_reg::offset field consistently for all register files.	Francisco Jerez	2016-09-14	1	-16/+22
	The offset printing code in fs_visitor::dump_instruction() was doing things differently for sources and destinations and for each register file -- In some cases it would be added to the base register number fs_reg::nr, in other cases it would follow the base register separated with a plus sign, in other cases (uniforms) it would do both (!). The sub-register offset was also being printed or not rather inconsistently. Fix it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Get rid of fs_inst::set_smear().	Francisco Jerez	2016-09-14	1	-26/+12
	component() was generally a better alternative because of several issues set_smear() had:
	- It wouldn't take the original stride and offset of the register into account, which means that set_smear() on the result of e.g. another set_smear() call or an offset() call would give a bogus region as result.
	- It was an inherently destructive operation. See the 'nir_intrinsic_shader_clock' hunk below for how this could lead to subtle bugs in cases where set_smear() was called multiple times on the same register like 'r.set_smear(0), r.set_smear(1)' with the expectation that each call would return a separate value instead of a reference to the same subsequently mutated object.
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
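The two problems listed for set_smear() can be reproduced in a few lines. This is only a sketch under stated assumptions: `Reg`, `set_smear_like()` and `component_like()` are hypothetical stand-ins, not the real fs_reg API, and offsets are taken to be in bytes with the stride in elements.

```cpp
#include <cassert>

// Hypothetical, heavily simplified register region (not the real fs_reg).
struct Reg {
    unsigned offset;     // byte offset from the start of the register
    unsigned stride;     // distance between channels, in elements
    unsigned type_size;  // element size in bytes
};

// Mimics the old behaviour: mutates its argument in place, ignores the
// previous offset and stride, and hands back a reference to the same object.
inline Reg &set_smear_like(Reg &r, unsigned i)
{
    assert(r.type_size > 0);
    r.offset = i * r.type_size;
    r.stride = 0;
    return r;
}

// Mimics a component()-style helper: takes the register by value, honours
// the existing offset and stride, and returns an independent scalar region.
inline Reg component_like(Reg r, unsigned i)
{
    assert(r.type_size > 0);
    r.offset += i * r.stride * r.type_size;
    r.stride = 0;
    return r;
}
```

With a register at byte offset 64, stride 2 and 4-byte elements, `component_like(r, 1)` yields offset 72 and leaves `r` untouched, whereas two successive `set_smear_like()` calls on `r` return references to the same object, so the first result silently changes when the second call runs.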
*	i965/fs: Use region_contained_in() in compute-to-mrf coalescing pass.	Francisco Jerez	2016-09-14	1	-3/+2
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Simplify a bunch of fs_inst::size_written calculations by using component_size().	Francisco Jerez	2016-09-14	1	-1/+1
	Using component_size() is easier and generally more correct because it takes into account the register type and stride for you. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Simplify and fix buggy stride/offset calculations using subscript().	Francisco Jerez	2016-09-14	1	-50/+15
	These were bashing the 'offset' and 'stride' values of several registers without taking the previous value into account, which probably didn't matter in practice for optimize_frontfacing_ternary() because the 'tmp' register already had a known region, but it would have given the wrong region as result in the other cases in lower_integer_multiplication(). subscript(..., i) is a more straightforward way to take the i-th field of a given type from each channel of a register which should give the right answer as result regardless of the original 'offset' and 'stride' parameters of the register region. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Simplify get_fpu_lowered_simd_width() by using inequalities instead of rounding.	Francisco Jerez	2016-09-14	1	-2/+2
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Fix signedness of the return value of fs_inst::size_read().	Francisco Jerez	2016-09-14	1	-1/+1
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Switch mask_relative_to() used in compute-to-mrf to byte units.	Francisco Jerez	2016-09-14	1	-10/+10
	This makes the helper function less annoying to use and somewhat more accurate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Fix bogus sub-MRF offset calculation in compute-to-mrf.	Francisco Jerez	2016-09-14	1	-6/+6
	The 'scan_inst->dst.offset % REG_SIZE' term in the final 'scan_inst->dst.offset' calculation is obviously bogus. The offset from the start of the copy destination register 'inst->dst' where the destination of the generating instruction 'scan_inst' would be written to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst' relative to the source of the copy instruction (AKA rel_offset in the code below). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Take into account copy register offset during compute-to-mrf.	Francisco Jerez	2016-09-14	1	-1/+1
	This was dropping 'inst->dst.offset' on the floor. Nothing in the code above seems to guarantee that it's zero and in that case the offset of the register being coalesced into wouldn't be taken into account while rewriting the generating instruction. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().	Francisco Jerez	2016-09-14	1	-6/+0
	fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
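The overlap check the message argues for is an ordinary interval test once both regions carry an explicit size. The sketch below illustrates that idea only: `Region` and `regions_overlap_sketch()` are hypothetical names, and the real regions_overlap() signature is not reproduced here.

```cpp
#include <cassert>

// Hypothetical byte-granular register region (not the real fs_reg).
struct Region {
    int file;         // register file identifier
    unsigned nr;      // register number within the file
    unsigned offset;  // start of the region in bytes
    unsigned size;    // length of the region in bytes
};

// Two regions overlap if they live in the same register and their
// half-open byte intervals [offset, offset + size) intersect.
inline bool regions_overlap_sketch(const Region &a, const Region &b)
{
    assert(a.size > 0 && b.size > 0);
    if (a.file != b.file || a.nr != b.nr)
        return false;
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

A destination that starts 32 bytes into a 64-byte region is reported as overlapping, which is the case a start-only comparison cannot see; regions that merely touch end to start are not.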
*	i965/fs: Don't consider LOAD_PAYLOAD with stride > 1 source to behave like a raw copy.	Francisco Jerez	2016-09-14	1	-1/+1
	Noticed the problem by inspection while typing in the previous commit. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Don't consider LOAD_PAYLOAD with sub-GRF offset to behave like a raw copy.	Francisco Jerez	2016-09-14	1	-1/+1
	This was likely the original intention, and at least register coalesce relies on it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Take into account misalignment in regs_written() and regs_read().	Francisco Jerez	2016-09-14	1	-25/+1
	There was a workaround for this in fs_inst::size_read() for the SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file only. We should take this possibility into account for the sources and destinations of all instructions on all optimization passes that need to quantize dataflow in 32B increments by adding the amount of misalignment to the size read or written from the regs_read() and regs_written() helpers respectively. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
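Adding the misalignment before rounding up is what makes a region that straddles a 32B boundary count as two registers. A minimal sketch of that arithmetic follows, assuming a 32-byte register size; `kRegSize` and `regs_spanned()` are hypothetical names, and the real helpers may account for further details.

```cpp
#include <cassert>

// Assumed register size in bytes, taken from the "32B increments" above.
constexpr unsigned kRegSize = 32;

// Number of whole registers touched by 'size' bytes starting at byte
// 'offset': the misalignment (offset % kRegSize) is added to the size
// before rounding up to a register count.
inline unsigned regs_spanned(unsigned offset, unsigned size)
{
    assert(kRegSize > 0);
    const unsigned misalignment = offset % kRegSize;
    return (misalignment + size + kRegSize - 1) / kRegSize;
}
```

For example, 32 bytes starting at offset 16 span two registers, while the same 32 bytes starting at offset 0 span one; ignoring the misalignment would report one in both cases.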
*	i965/fs: Return more accurate read size for LINTERP from fs_inst::size_read.	Francisco Jerez	2016-09-14	1	-1/+1
	The LINTERP virtual instruction only reads three scalar components from the first 16B of the second source; we can now teach size_read() about it since its return value is represented with byte granularity. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Return more accurate read size from fs_inst::size_read for IMM and UNIFORM files.	Francisco Jerez	2016-09-14	1	-1/+1
	Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.	Francisco Jerez	2016-09-14	1	-24/+22
	The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.	Francisco Jerez	2016-09-14	1	-36/+37
	The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
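The two rewrite rules for regs_written can be written out as plain functions. This is a sketch rather than code from the patch: `div_round_up()` is a local stand-in for the DIV_ROUND_UP macro, and the function names and explicit `reg_unit` parameter are hypothetical.

```cpp
#include <cassert>

// Local stand-in for DIV_ROUND_UP: integer division rounding upwards.
inline unsigned div_round_up(unsigned a, unsigned b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// rvalue rule: 'x = i.regs_written' becomes
// 'x = DIV_ROUND_UP(i.size_written, reg_unit)'.
inline unsigned regs_written_from_size(unsigned size_written, unsigned reg_unit)
{
    return div_round_up(size_written, reg_unit);
}

// lvalue rule: 'i.regs_written = x' becomes 'i.size_written = x * reg_unit'.
inline unsigned size_written_from_regs(unsigned regs_written, unsigned reg_unit)
{
    return regs_written * reg_unit;
}
```

Note that the round trip is lossy in one direction only: converting a register count to bytes and back returns the same count, but a byte size such as 40 with a 32-byte unit rounds up to two registers.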
*	i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.	Francisco Jerez	2016-09-14	1	-14/+14
	This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.	Francisco Jerez	2016-09-14	1	-17/+15
	The fs_reg::subreg_offset and ::offset fields are now redundant; the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
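The subreg_offset rule reads as follows when spelled out on plain integers. This is an illustrative sketch only: `round_down_to()` is a local stand-in for the ROUND_DOWN_TO macro (implemented here with division, which is an assumption about its behaviour), and the accessor names are hypothetical.

```cpp
#include <cassert>

// Local stand-in for ROUND_DOWN_TO: largest multiple of 'unit' <= 'value'.
inline unsigned round_down_to(unsigned value, unsigned unit)
{
    assert(unit > 0);
    return value / unit * unit;
}

// rvalue rule: 'x = r.subreg_offset' becomes 'x = r.offset % reg_unit'.
inline unsigned get_subreg_offset(unsigned offset, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return offset % reg_unit;
}

// lvalue rule: 'r.subreg_offset = x' becomes
// 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.
// Returns the new byte offset instead of mutating a register.
inline unsigned set_subreg_offset(unsigned offset, unsigned x, unsigned reg_unit)
{
    return round_down_to(offset, reg_unit) + x;
}
```

With a 32-byte unit, byte offset 72 has a sub-register offset of 8, and setting the sub-register offset to 4 gives byte offset 68 while the whole-register part (64) is preserved.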
*	i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.	Francisco Jerez	2016-09-14	1	-27/+29
	The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make matters worse, the unit reg_offset was expressed in was rather inconsistent: for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
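The reg_offset rewrite rule can be checked on plain integers. The sketch below is not code from the patch: the accessor names are hypothetical, and `reg_unit` is passed explicitly because, as the message notes, its value depended on the register file and back-end.

```cpp
#include <cassert>

// rvalue rule: 'x = r.reg_offset' becomes 'x = r.offset / reg_unit'.
inline unsigned get_reg_offset(unsigned offset, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return offset / reg_unit;
}

// lvalue rule: 'r.reg_offset = x' becomes
// 'r.offset = r.offset % reg_unit + x * reg_unit'.
// The sub-unit remainder of the old byte offset is kept, and only the
// whole-unit part is replaced. Returns the new byte offset.
inline unsigned set_reg_offset(unsigned offset, unsigned x, unsigned reg_unit)
{
    assert(reg_unit > 0);
    return offset % reg_unit + x * reg_unit;
}
```

With a 32-byte unit, byte offset 72 corresponds to reg_offset 2, and setting reg_offset to 5 produces byte offset 168 (the 8-byte remainder plus 5 * 32). The same byte offset read with a 4-byte unit gives 18, which is the unit inconsistency the single byte-based field removes.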
*	i965/fs: Fail the shader compile instead of asserting when we can't spill	Jason Ekstrand	2016-09-08	1	-2/+3
	Blorp doesn't handle spilling so we set allow_spilling to false in that case. The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide. This commit makes it so that we fail the 16-wide compile and successfully fall back to 8-wide instead of just assert-failing when trying to compile the 16-wide shader. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
*	intel: s/brw_device_info/gen_device_info/	Jason Ekstrand	2016-09-03	1	-15/+15
	Generated by:
	sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/*.c
	sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/*.h
	sed -i -e 's/brw_device_info/gen_device_info/g' */i965/*.c
	sed -i -e 's/brw_device_info/gen_device_info/g' */i965/*.cpp
	sed -i -e 's/brw_device_info/gen_device_info/g' */i965/*.h
	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/fs: Define logical framebuffer read opcode and lower it to physical reads.	Francisco Jerez	2016-08-25	1	-0/+24
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Define framebuffer read virtual opcode.	Francisco Jerez	2016-08-25	1	-0/+2
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.	Francisco Jerez	2016-08-25	1	-1/+2
	This will be required for the next commit since the non-coherent path makes use of the fragment coordinates implicitly, so they need to be calculated. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.	Francisco Jerez	2016-08-25	1	-1/+2
	The result of a framebuffer fetch from a multisample FBO is inherently per-sample, so the spec requires at least those sections of the shader that depend on the framebuffer fetch result to be executed once per sample. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Switch to per-subspan discard jumps.	Francisco Jerez	2016-08-18	1	-3/+1
	ANY4H is more efficient than ANY8H and ANY16H because it makes sure that whenever a whole subspan hits a discard statement it gets disabled by the EU until the end of the program, regardless of whether the discard condition is uniform across all channels of the SIMD8-16 thread. OTOH ANY8H/ANY16H would cause the rest of the program to be executed for all channels if only one of the channels hadn't taken the discard branch, potentially increasing the bandwidth and ALU usage of the program unnecessarily. This change increases the FPS of a simple micro-benchmark that discards a bunch of fragments and then does a single costly texturing operation by over 3x. I've just re-verified the FPS change on HSW and SKL, but I expect all platforms from Gen6 up to get a similar benefit. Note that we could potentially be more aggressive and use the NORMAL predicate to discard individual channels, but that would need to happen post-scheduling because the scheduler currently doesn't care to reorder HALT instructions with respect to other instructions, and the NORMAL predicate would cause the results of subsequent derivative computations to become undefined -- If the scheduler didn't reorder HALT instructions it would actually be safe to switch to NORMAL because the behavior of derivative computations after a non-uniform discard statement is undefined by the GLSL spec, but that would make the optimization implemented by one of the following commits somewhat more difficult. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/fs: Estimate maximum sampler message execution size more accurately.	Francisco Jerez	2016-08-16	1	-37/+72
	The current logic used to determine the execution size of sampler messages was based on special-casing several argument and opcode combinations, which unsurprisingly missed the possibility that some messages could exceed the payload size limit or not depending on the number of coordinate components present. In particular:
	- The TXL, TXB and TEX messages (the latter on non-FS stages only) would attempt to use SIMD16 on Gen7+ hardware even if a shadow reference was present and the texture was a cubemap array, causing it to overflow the maximum supported sampler payload size and crash.
	- The TG4_OFFSET message with shadow comparison was falling back to SIMD8 regardless of the number of coordinate components, which is unnecessary when two coordinates or less are present.
	Both cases have been handled incorrectly ever since cubemap arrays and texture gather were respectively enabled (the current logic used by the SIMD lowering pass is almost unchanged from the previous no16 fall-back logic used pre-SIMD lowering times). Fixes the following GL4.5 conformance test on Gen7-8 (the bug also affects Gen9+ in principle, but SKL passes the test by luck because it manages to use the TXL_LZ message instead of TXL): GL45-CTS.texture_cube_map_array.sampling
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Return zero from fs_inst::components_read for non-present sources.	Francisco Jerez	2016-08-16	1	-2/+5
	This makes it easier for the caller to find out how many scalar components are actually read by the instruction. As a bonus we no longer need to special-case BAD_FILE in the implementation of fs_inst::regs_read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Lower TEX to TXL during NIR translation.	Francisco Jerez	2016-08-16	1	-10/+0
	This simplifies the code slightly and will allow the SIMD lowering pass to find out easily what the actual texturing opcode is in order to determine the maximum execution size of texturing instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: fix comparison warning	Timothy Arceri	2016-08-01	1	-1/+1
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Fix move_interpolation_to_top() pass.	Kenneth Graunke	2016-07-29	1	-21/+29
	The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae was entirely broken. A missing "break" made the load_interpolated_input case always fall through to "default" and hit a "continue", making it not actually move any load_interpolated_input intrinsics at all. It would only move the simple load_barycentric_* intrinsics, which don't emit any code anyway, making it basically useless. The initial version I sent of the pass worked, but I apparently failed to verify that the simplified version in v2 actually worked. With the obvious fix applied (so we actually tried to move load_interpolated_input intrinsics), I discovered a second bug: we weren't moving the offset SSA def to the top, breaking SSA validation. The new version of the pass actually moves load_interpolated_input intrinsics and all their dependencies, as intended. Papers over GPU hangs on Ivybridge and Baytrail caused by the recent NIR FS input rework by restoring the old behavior. (I'm not honestly sure why they hang with PLN not at the top.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Include VUE handles for GS with invocations > 1.	Kenneth Graunke	2016-07-21	1	-1/+1
	We always resort to the pull model for instanced GS inputs. So, we'd better include the VUE handles, or else we can't actually pull anything. Ian reports that on his branch with OES_geometry_shader enabled, this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests:
	- instanced.draw_2_instances_geometry_2_invocations
	- instanced.draw_2_instances_geometry_8_invocations
	- instanced.draw_4_instances_geometry_2_invocations
	- instanced.draw_4_instances_geometry_8_invocations
	- instanced.draw_8_instances_geometry_2_invocations
	- instanced.draw_8_instances_geometry_8_invocations
	- instanced.geometry_2_invocations
	- instanced.geometry_32_invocations
	- instanced.geometry_8_invocations
	- instanced.geometry_max_invocations
	- instanced.geometry_output_different_2_invocations
	- instanced.geometry_output_different_32_invocations
	- instanced.geometry_output_different_8_invocations
	- instanced.geometry_output_different_max_invocations
	- instanced.invocation_output_vary_by_attribute
	- instanced.invocation_output_vary_by_texture
	- instanced.invocation_output_vary_by_uniform
	- query.primitives_generated_instanced
	Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: bring back type_size_vec4_times_4()	Timothy Arceri	2016-07-21	1	-0/+13
	We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>