external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	i965/fs: Readd opt_drop_redundant_mov_to_flags().	Matt Turner	2016-04-21	2	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit b449366587b5f3f64c6fb45fe22c39e4bc8a4309. I removed the pass thinking that it was now not useful, but that was not true. I believe I ran shader-db on HSW and saw no results, but HSW does not use the unlit centroid workaround code and as a result does not emit redundant MOV_DISPATCH_TO_FLAGS instructions. On IVB, the shader-db results are: total instructions in shared programs: 6650806 -> 6646303 (-0.07%) instructions in affected programs: 106893 -> 102390 (-4.21%) helped: 793 total cycles in shared programs: 56195538 -> 56103720 (-0.16%) cycles in affected programs: 873048 -> 781230 (-10.52%) helped: 553 HURT: 209 On SNB, the shader-db results are: total instructions in shared programs: 7173074 -> 7168541 (-0.06%) instructions in affected programs: 119757 -> 115224 (-3.79%) helped: 799 total cycles in shared programs: 98128032 -> 98072938 (-0.06%) cycles in affected programs: 1437104 -> 1382010 (-3.83%) helped: 454 HURT: 237 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/blorp: Do not emit pma stall on gen9+	Topi Pohjolainen	2016-04-21	1	-1/+3
\| \| \| \| \| \| \|	This was left out from the original gen8 upload introduction. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: automake: remove gratuitous "+" during variable assignment	Emil Velikov	2016-04-21	1	-2/+2
\| \| \| \| \| \| \|	There is not initial assignment, thus appending to it does not work. Fixes: b27c85c4c08 "i965: add build rule for brw_nir_trig_workarounds.c" Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
*	i965: add build rule for brw_nir_trig_workarounds.c on Android	Rob Herring	2016-04-21	4	-2/+49
\| \| \| \| \| \| \| \| \| \| \|	Commit bfd17c76c126 ("i965: Port INTEL_PRECISE_TRIG=1 to NIR.") added a generated file brw_nir_trig_workarounds.c which broke the Android build. Add the necessary makefiles to the Android build. Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
*	i965/tiled_memcpy: don't unconditionally use __builtin_bswap32	Jonathan Gray	2016-04-21	1	-1/+14
\| \| \| \| \| \| \| \| \| \|	Use the defines Mesa configure sets to indicate presence of the bswap32 builtins. This lets i965 work on OpenBSD again after the changes that were made in 0a5d8d9af42fd77fce1492d55f958da97816961a. Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/blorp: Reduce the urb size requirement for vertex buffer	Topi Pohjolainen	2016-04-21	1	-5/+4
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Reduce the size of vertex buffer	Topi Pohjolainen	2016-04-21	1	-12/+19
\| \| \| \| \| \| \| \| \| \| \|	Previously the vertex buffer consisted of eight floats per vertex of which six where constants. These can be as easily provided by vertex fetcher as it is capable of filling vertex elements with constant one and zero. This reduces the size of the vertex buffer from 3 * 8 * 4 = 96 to 3 * 2 * 4 = 24 bytes. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Do not tricker urb re-configuration unnecessarily	Topi Pohjolainen	2016-04-21	2	-1/+5
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Skip re-emitting urb config whenever possible	Topi Pohjolainen	2016-04-21	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \|	Otherwise clearing with blorp will regress performance in some synthetic test cases. v2: Used vsize >= 2 instead of vsize > 0, and updated the comment. Review by Ken in one of the earlier patches revealed this. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Prepare to switch from compute pipeline	Topi Pohjolainen	2016-04-21	1	-0/+2
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Skip uploading state/options not needed for clears	Topi Pohjolainen	2016-04-21	3	-17/+37
\| \| \| \| \| \| \| \| \|	In case there is no source it means the program does a simple clear or a resolve. In such case there is no need to program sampling state or enable pixel kill in fragment shader. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Re-introduce clear programs	Topi Pohjolainen	2016-04-21	5	-4/+473
\| \| \| \| \| \| \|	This partially reverts 2f28a0dc23165123cf1e8b5942acad37878edd8a Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Move check for srgb into is_color_fast_clear_compatible()	Topi Pohjolainen	2016-04-21	2	-17/+19
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Expose check for fast clear compatibility	Topi Pohjolainen	2016-04-21	2	-20/+25
\| \| \| \| \| \| \|	Also add the additional render format check to the same utility. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Expose fast clear value setup	Topi Pohjolainen	2016-04-21	2	-5/+10
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Expose non-fast clear rectangle calculation	Topi Pohjolainen	2016-04-21	2	-10/+21
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Expose resolve clear rectangle calculation	Topi Pohjolainen	2016-04-21	2	-7/+15
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/meta: Expose fast clear rectangle calculation	Topi Pohjolainen	2016-04-21	2	-19/+33
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Declare input to mcs alignment calculation constant	Topi Pohjolainen	2016-04-21	2	-2/+2
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Switch the order of render and texture targets	Topi Pohjolainen	2016-04-21	2	-1/+5
\| \| \| \| \| \| \| \|	On gen8 color resolving won't work anymore if the target isn't the first entry in the binding table. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Reduce scope for generator and its inputs	Topi Pohjolainen	2016-04-21	3	-26/+25
\| \| \| \| \| \| \|	Generator is only needed for getting the assembly. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Add support for disabling color blending	Topi Pohjolainen	2016-04-21	4	-0/+19
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Add support for setting fast clear operation	Topi Pohjolainen	2016-04-21	4	-0/+5
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Enable blits on gen8	Topi Pohjolainen	2016-04-21	3	-6/+9
\| \| \| \| \| \| \| \|	v2 (Ken): Moved switch cases for gen8/9 in texel_fetch() to earlier patch adding gen8/9 sampling support. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Prepare stencil sampling for gen8	Topi Pohjolainen	2016-04-21	2	-3/+4
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Add check for supported sample numbers	Topi Pohjolainen	2016-04-21	2	-2/+17
\| \| \| \| \| \| \| \|	v2 (Ken): Fix the condition on using meta for stencil blits: use_blorp -> !use_blorp Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Add support for sampling 3D textures	Topi Pohjolainen	2016-04-21	3	-6/+23
\| \| \| \| \| \| \| \| \| \| \|	This patch adds additional MOV instruction for all blorp programs that use SHADER_OPCODE_TXF. Alternative is to augment blorp program key to tell if z-coordinate is needed, add condition to the blorp blit compiler and to produce a variant with and without the MOV. This seems a little overkill. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Add support for source swizzle	Topi Pohjolainen	2016-04-21	5	-9/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to support cases where gen9 uses RGBA format to back client requested RGB, one needs to have means to force alpha channel to one when user requested RGB surface is used as blit source. v2 (Ken): Use helper for constructing the swizzle (this should be changed to use brw_get_texture_swizzle() as a follow-up). Also calculate the swizzle for CopyTexSubImage. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Pipeline upload support for gen8	Topi Pohjolainen	2016-04-21	3	-0/+699
\| \| \| \| \| \| \| \|	v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state Add emission of pma stall. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/gen8: Expose pma stall emission	Topi Pohjolainen	2016-04-21	2	-4/+8
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Allow texture surface state setup to be used by blorp	Topi Pohjolainen	2016-04-21	4	-6/+12
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Prepare sampling for gen9	Topi Pohjolainen	2016-04-21	1	-2/+14
\| \| \| \| \| \| \| \|	v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These were wrongly introduced in blit-enabling patch. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Prepare render target write for gen8	Topi Pohjolainen	2016-04-21	5	-8/+10
\| \| \| \| \| \| \| \|	v2 (Ken): Use payload directly instead of retyping it into vec8. Drop the implied header, it isn't used for gen6+ anyway. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp/gen6: Prepare vertex buffer setup logic for gen8	Topi Pohjolainen	2016-04-21	1	-8/+22
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp/gen7: Expose state setup applicable to gen8	Topi Pohjolainen	2016-04-21	2	-11/+50
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Use 8k chunk size for urb allocation	Topi Pohjolainen	2016-04-21	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks), which meant VS URB data would start at an offset of 16kB. However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the push constant region. This means that the PS push constant and VS URB data regions overlap, which can lead to corruption. v2 (Ken): Better description of the change, and do not change vs_size from 2 to 1. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp/gen7: Prepare re-using for gen8	Topi Pohjolainen	2016-04-21	1	-2/+4
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/blorp: Let compiler calculate the vertex buffer size	Topi Pohjolainen	2016-04-21	1	-21/+10
\| \| \| \| \| \| \| \| \|	Currently the size is sizeof(float) times too large. One reserves GEN6_BLORP_VBO_SIZE many floats whereas GEN6_BLORP_VBO_SIZE stands for the size of vertex buffer in bytes. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/gen8: Expose state base address setup	Topi Pohjolainen	2016-04-21	2	-2/+4
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/gen8: Expose surface state helpers	Topi Pohjolainen	2016-04-21	2	-25/+41
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/gen9: Use correct size for DS_STATE	Topi Pohjolainen	2016-04-21	1	-4/+18
\| \| \| \| \|	Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Fix interpolateAtSample() on single sampled buffers.	Kenneth Graunke	2016-04-20	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests: - interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer - interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo - interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Fix gl_SampleMaskIn[] in per-sample shading mode.	Kenneth Graunke	2016-04-20	3	-2/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The coverage mask is not sufficient - in per-sample mode, we also need to AND with a mask representing the samples being processed by the current fragment shader invocation. Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests: sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8} sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8} sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8} sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8} Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Only enable oMask output when there's a multisample FBO.	Kenneth Graunke	2016-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ARB_sample_shading specification says that setting gl_SampleMask bits to 0 means that the corresponding sample "should be considered uncovered for the purposes of multisample fragment operations (Section 4.1.3)." The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment Operations") specifies: "No changes to the fragment alpha or coverage values are made at this step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS is not one." oMask output alters coverage masks and can kill pixels. We need to disable it in the above case, which conveniently corresponds to key->multisample_fbo being false. Khronos bug #12188 also spells this out clearly: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188 Fixes two Piglit tests: tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0 tests/spec/arb_sample_shading/builtin-gl-sample-mask 0 Fixes 21 ES3 conformance tests: ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5 ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4 ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5 ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7 Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests: sample_mask.discard_half_per_pixel.default_framebuffer sample_mask.discard_half_per_pixel.singlesample_rbo sample_mask.discard_half_per_pixel.singlesample_texture sample_mask.discard_half_per_sample.default_framebuffer sample_mask.discard_half_per_sample.singlesample_rbo sample_mask.discard_half_per_sample.singlesample_texture sample_mask.discard_half_per_two_samples.default_framebuffer sample_mask.discard_half_per_two_samples.singlesample_rbo sample_mask.discard_half_per_two_samples.singlesample_texture Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.	Kenneth Graunke	2016-04-20	3	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm going to need a key entry meaning "we have a multisample FBO, and multisampling is enabled" in an upcoming patch. This is basically wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID system value is read. The only use of wm_key->compute_sample_id is in emit_sampleid_setup(), which is only called when handling the SAMPLE_ID system value. So we can just eliminate the check and generalize the field. v2: Also update the Vulkan driver. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Delete now dead persample_2x FS program key flag.	Kenneth Graunke	2016-04-20	2	-5/+0
\| \| \| \| \| \| \| \| \| \|	This was only used by the old gl_SampleID calculations. The new code doesn't need to handle 2x specially. v2: Delete it from the Vulkan driver, too. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Simplify gl_SampleID setup on Gen8+.	Kenneth Graunke	2016-04-20	1	-5/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Gen7+, the thread payload provides the sample ID - we can read it in two instructions, without any elaborate calculations. We don't even need a state dependency - this will properly produce zero in the non-MSAA case. Unfortunately, we need the state flag anyway, so we may as well continue to use it to produce a single MOV 0 instead of SHR/AND. For some reason, the sample ID field is always zero on Gen7/7.5, so we can't use this yet. However, it works fine on Gen8+. So, land the code and use it where it's working, and leave a TODO for later. v2: Fix register types in the comment (caught by Matt Turner!). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Flip key->compute_sample_id check.	Kenneth Graunke	2016-04-20	1	-7/+7
\| \| \| \| \| \| \|	This just moves the simple case first. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Properly handle integer types in opt_vector_float().	Kenneth Graunke	2016-04-20	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, opt_vector_float() always interpreted MOV sources as floating point, and always created a MOV with a F-type destination. This meant that we could mess up sequences of integer loads, such as: mov vgrf6.0.x:D, 0D mov vgrf6.0.y:D, 1D mov vgrf6.0.z:D, 2D mov vgrf6.0.w:D, 3D Here, integer 0/1/2/3 become approximately 0.0f, so we generated: mov vgrf6.0:F, [0F, 0F, 0F, 0F] which is clearly wrong. We can properly handle this by converting integer values to float (rather than bitcasting), and emitting a type converting MOV: mov vgrf6.0:D, [0F, 1F, 2F, 3F] To do this, see first see if the integer values (converted to float) are representable. If so, we use a D-type MOV. If not, we then try the floating point values and an F-type MOV. We make zero not impose type restrictions. This is important because 0D would imply a D-type MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D, where we want to use an F-type MOV. Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend. This recently became visible due to changes in opt_vector_float() which made it optimize more cases, but it was a pre-existing bug. Apparently it also manages to turn more integer loads into VFs, producing the following shader-db statistics on Haswell: total instructions in shared programs: 7084195 -> 7082191 (-0.03%) instructions in affected programs: 246027 -> 244023 (-0.81%) helped: 1937 total cycles in shared programs: 65669642 -> 65651968 (-0.03%) cycles in affected programs: 531064 -> 513390 (-3.33%) helped: 1177 v2: Handle the type of zero better. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Make opt_vector_float() only handle non-type-conversion MOVs.	Kenneth Graunke	2016-04-20	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't handle this properly - we'd have to perform the type conversion before trying to convert the value to a VF. While we could do that, it doesn't seem particularly useful - most vector loads should be consistently typed (all float or all integer). As a special case, we do allow type-converting MOVs of integer 0, as it's represented the same regardless of the type. I believe this case does actually come up. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>