external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: Introduce downcast helpers for prog_data structures.	Kenneth Graunke	2016-10-05	1	-0/+17
\| \| \| \| \| \| \|	Similar to brw_context(...), intel_texture_object(...), and so on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
*	i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread ↵	Francisco Jerez	2016-09-21	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
*	intel: s/brw_device_info/gen_device_info/	Jason Ekstrand	2016-09-03	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.h sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.c sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.cpp sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	intel: Add a new "common" library for more code sharing	Jason Ekstrand	2016-09-03	1	-1/+1
\| \| \| \| \| \| \|	The first thing to go in this new library is brw_device_info. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Allocate space in the binding table for non-coherent FB fetch.	Francisco Jerez	2016-08-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use. The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.	Francisco Jerez	2016-08-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the following changes in this series are specific to the non-coherent path, so I need some way to tell whether the coherent or non-coherent path is in use. The flag defaults to the value of the gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be overridden easily on hardware that supports both framebuffer fetch extensions in order to test the non-coherent path, like: MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch (Of course trying to force-enable the coherent framebuffer fetch extension on hardware without native support won't work and lead to assertion failures). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Implement the WaPreventHSTessLevelsInterference workaround.	Kenneth Graunke	2016-08-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Fixes several GL44-CTS.tessellation_shader (and GL45 and ES31) subcases: - vertex_spacing - tessellation_shader_point_mode.points_verification - tessellation_shader_quads_tessellation.inner_tessellation_level_rounding Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
*	i965: Fix encode_slm_size() to take a generation, not a device info.	Kenneth Graunke	2016-06-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the Vulkan driver, we have the generation number (a compile time constant) but not necessarily the brw_device_info struct. I meant to rework the function to take a generation number instead of a brw_device_info pointer to accomodate this. But I forgot, and left it taking a brw_device_info pointer, while making Vulkan pass the generation number (8, 9, ...) directly. This led to crashes. Brown paper bag fix for commit 87d062a94080373995170f51063a9649. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Fix shared local memory size for Gen9+.	Kenneth Graunke	2016-06-12	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Skylake changes the representation of shared local memory size: Size \| 0 kB \| 1 kB \| 2 kB \| 4 kB \| 8 kB \| 16 kB \| 32 kB \| 64 kB \| ------------------------------------------------------------------- Gen7-8 \| 0 \| none \| none \| 1 \| 2 \| 4 \| 8 \| 16 \| ------------------------------------------------------------------- Gen9+ \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \| 6 \| 7 \| The old formula would substantially underallocate the amount of space. This fixes GPU hangs on Skylake when running with full thread counts. v2: Fix the Vulkan driver too, use a helper function, and fix the table in the comments and commit message. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/fs Add a wm_prog_data bit for has_side_effects	Jason Ekstrand	2016-06-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This is more accurate than calling _mesa_active_fragment_shader_has_side_effects because it looks at whether or not the SSBOs, images, or atomic buffers are actually written rather than just existing in the program. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
*	i965: Remove old CS local ID handling	Jordan Justen	2016-06-01	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	The old method pushed data for each channels uvec3 data of gl_LocalInvocationID. The new method pushes 1 dword of data that is a 'thread local ID' value. Based on that value, we can generate gl_LocalInvocationIndex and gl_LocalInvocationID with some calculations. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Add CS push constant info to brw_cs_prog_data	Jordan Justen	2016-06-01	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need information about push constants in a few places for the GL driver, and another couple places for the vulkan driver. When we add support for uploading both a common (cross-thread) set of push constants, combined with the previous per-thread push constant data, things are going to get even more complicated. To simplify things, we add push constant info into the cs prog_data struct. The cross-thread constant support is added as of Haswell. To support it we need to make sure all push constants with uniform values are added to earlier registers. The register that varies per thread and holds the thread invocation's unique local ID needs to be added last. For now we add the code that would calculate cross-thread constatn information for hsw+, but we force it (cross_thread_supported) off until the other parts of the driver support it. v4: * Support older local ID push constant layout as well. (Jason) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Store number of threads in brw_cs_prog_data	Jordan Justen	2016-06-01	1	-0/+1
\| \| \| \| \| \|	Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Add uniform for a CS thread local base ID	Jordan Justen	2016-06-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	v4: * Force thread_local_id_index to -1 for now, and have fs_visitor::setup_cs_payload look at thread_local_id_index. This enables us to more easily cut over from the old local ID layout to the new layout, as suggested by Jason. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/fs: Implement SIMD32 register allocation support.	Francisco Jerez	2016-05-27	1	-1/+1
\| \| \| \|	Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Invoke lowering pass for YUV textures	Kristian Høgsberg Kristensen	2016-05-24	1	-0/+7
\| \| \| \|	Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Support textures with multiple planes	Kristian Høgsberg Kristensen	2016-05-24	1	-0/+1
\| \| \| \|	Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Delete brw_wm_prog_key::render_to_fbo and drawable_height.	Kenneth Graunke	2016-05-20	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we handle flipping and other gl_FragCoord transformations via a uniform, these key fields have no users. This patch actually eliminates the associated recompiles. The Tomb Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable number. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Add an allow_spilling flag to brw_compile_fs	Jason Ekstrand	2016-05-17	1	-0/+1
\| \| \| \| \| \| \| \| \|	This allows us to disable spilling for blorp shaders since blorp state setup doesn't handle spilling. Without this, blorp fails hard if you run with INTEL_DEBUG=spill. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Francisco Jerez <currojerez@riseup.net>
*	i965/fs: calculate first non-payload GRF using attrib slots	Juan A. Suarez Romero	2016-05-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	When computing where the first non-payload GRF starts, we can't rely on the number of attributes, as each attribute can be using 1 or 2 slots depending on whether they are a dvec3/4 or other. Instead, we need to use the number of slots used by the attributes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Organize prog_data by ksp number rather than SIMD width	Jason Ekstrand	2016-05-14	1	-5/+7
\| \| \| \| \| \| \| \| \| \|	The hardware packets organize kernel pointers and GRF start by slots that don't map directly to dispatch width. This means that all of the state setup code has to re-arrange the data from prog_data into these slots. This logic has been duplicated 4 times in the GL driver and one more time in the Vulkan driver. Let's just put it all in brw_fs.cpp. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Rework the persample shading key/prog_data bits	Jason Ekstrand	2016-05-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit reworks and simplifies the way we handle persample shading in the shader key and prog_data. The previous approach had three different key bits that had slightly different and hard-to-decern meanings while the new bits are far more clear. This commit changes it to two easily understood bits that communicate everything we need: 1) key->persample_interp: means that the user has requested persample interpolation through the API. This is equivalent to having SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high enough that you actually get multiple per-sample invocations. 2) key->multisample_fbo: means that the shader will be running on an actual multi-sampled framebuffer. This commit also adds a new "persample_dispatch" bit to prog_data which indicates that the shader should be run in persample mode. This way the state setup code doesn't have to look at the fragment program or GL state and can just pull that data out of the prog_data. In theory, this shuffle could mean more recompiles. However, in practice, we were shoving enough state into the key before that we were probably hitting a recompile on every per-sample shader anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add support for GL_ARB_cull_distance	Kristian Høgsberg Kristensen	2016-05-13	1	-0/+2
\| \| \| \| \|	Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.	Kenneth Graunke	2016-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm going to need a key entry meaning "we have a multisample FBO, and multisampling is enabled" in an upcoming patch. This is basically wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID system value is read. The only use of wm_key->compute_sample_id is in emit_sampleid_setup(), which is only called when handling the SAMPLE_ID system value. So we can just eliminate the check and generalize the field. v2: Also update the Vulkan driver. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Delete now dead persample_2x FS program key flag.	Kenneth Graunke	2016-04-20	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	This was only used by the old gl_SampleID calculations. The new code doesn't need to handle 2x specially. v2: Delete it from the Vulkan driver, too. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Add a flat_inputs field to prog_data	Jason Ekstrand	2016-04-06	1	-0/+6
\| \| \| \| \|	Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.	Kenneth Graunke	2016-04-04	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SIN and COS instructions on Intel hardware can produce values slightly outside of the [-1.0, 1.0] range for a small set of values. Obviously, this can break everyone's expectations about trig functions. According to an internal presentation, the COS instruction can produce a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One suggested workaround is to multiply by 0.99997, scaling down the amplitude slightly. Apparently this also minimizes the error function, reducing the maximum error from 0.00006 to about 0.00003. When enabled, fixes 16 dEQP precision tests dEQP-GLES31.functional.shaders.builtin_functions.precision. {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}. at the cost of making every sin and cos call more expensive (about twice the number of cycles on recent hardware). Enabling this option has been shown to reduce GPUTest Volplosion performance by about 10%. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/gs: Pass VerticesIn though prog_data	Jason Ekstrand	2016-02-11	1	-0/+2
\| \| \| \|	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Pass usage of depth, W, and sample mask through prog_data	Jason Ekstrand	2016-02-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	We really need to stop pulling information directly out of shaders for state setup. For one thing, if we want any sort of an on-disk shader cache, having all of this metadata in one place is going to be crucial. Also, passing it all through prog_data cleans up the compiler <-> state setup API substantially. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Move brw_compiler_create() to new brw_compiler.c.	Matt Turner	2016-02-01	1	-0/+3
\| \| \| \| \| \| \|	A future patch will want to use designated initalizers, which aren't available in C++, but this is C. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Implement a drirc workaround for broken dual color blending.	Kenneth Graunke	2016-01-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenGL's dual color blending feature was specified so that an implementation could support both multiple render targets (MRT) and dual source blending. Fragment shader outputs specify both "location" (the render target number) and "index" (either color 0 or 1). I believe DirectX only has the notion of "location" - if using dual color blending, location 0 or 1 will specify the operands. If not, then location means the render target index. The two features can't be used together. As such, some applications mistakenly try to use <loc = 0, index = 0> and <loc = 1, index = 0> in a shader used for dual color blending with a single render target, rather than the correct <loc = 0, index = 0> and <loc = 0, index = 1>. In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug. Unigine is aware of the problem, and quickly developed a fix, but has not bothered to change the download link on their website to a working copy in over a year. People were still using the broken version and complaining. We tried working around this by disabling dual color blending, but that apparently hurts performance, and people were once again unhappy. On i965, dual source blending is achieved by using different framebuffer write messages than normal rendering. So, we have to compile different code for the two cases. We're not being pedantic: we actually have to know in order to function. Normally, dual source blending is detectable in the shader: if a shader has an output with index = 1, then it's meant for blending, not MRT. With the broken inputs, they're indistinguishable, so we can only tell by looking at the current GL state. This patch implements a new drirc workaround: export dual_color_blend_by_location=true which makes the i965 driver detect when OpenGL state is configured for dual source blending, and recompile the fragment shader to use the right messages. In that case, we allow either location = 1 or index = 1 to specify the second source for the blending equations. It also re-enables GL_ARB_blend_func_extended for Unigine. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
*	i965: Add support for gl_DrawIDARB and enable extension	Kristian Høgsberg Kristensen	2015-12-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB	Kristian Høgsberg Kristensen	2015-12-29	1	-0/+2
\| \| \| \| \| \| \|	We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Handle mix-and-match TCS/TES with separate shader objects.	Kenneth Graunke	2015-12-22	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GL_ARB_separate_shader_objects allows the application to mix-and-match TCS and TES programs separately. This means that the interface between the two stages isn't known until the final SSO pipeline is in place. This isn't a great match for our hardware: the TCS and TES have to agree on the Patch URB entry layout. Since we store data as per-patch slots followed by per-vertex slots, changing the number of per-patch slots can significantly alter the layout. This can easily happen with SSO. To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead bitfields in the TCS/TES program keys, introducing program recompiles. brw_upload_programs() decides the layout for both TCS and TES, and passes it to brw_upload_tcs/tes(), which store it in the key. When creating the NIR for a shader specialization, we override nir->info.inputs_read (and friends) to the program key's values. Since everything uses those, no further compiler changes are needed. This also replaces the hack in brw_create_nir(). To avoid recompiles, brw_precompile_tes() looks to see if there's a TCS in the linked shader. If so, it accounts for the TCS outputs, just as brw_upload_programs() would. This eliminates all recompiles in the non-SSO case. In the SSO case, there should only be recompiles when using a TCS and TES that have different input/output interfaces. Fixes Piglit's mix-and-match-tcs-tes test. v2: Pull the brw_upload_programs code into a brw_upload_tess_programs() helper function (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Create and set a new brw_tcs_prog_data::outputs_written field.	Kenneth Graunke	2015-12-22	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	When the application hasn't supplied a TCS, and we have to create one, we need to know what VS outputs to copy to TES inputs. To do this, we create a new program key field, and set it to the TES InputsRead bitfield. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Implement gl_PatchVerticesIn by baking it into brw_tcs_prog_key.	Kenneth Graunke	2015-12-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hardware provides us no decent way of getting at the number of input vertices in the patch topology from the tessellation control shader. It's actually very surprising - normally this sort of information would be available in the thread payload. For the precompile, we guess that the number of vertices will be the same for both the input and output patches. This usually seems to be the case. On Gen8+, we could pass in an extra push constant containing this value. We may be able to do that on Haswell too. It's quite a bit trickier on Ivybridge, however. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Add tessellation control shaders.	Kenneth Graunke	2015-12-22	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Add tessellation evaluation shaders	Kenneth Graunke	2015-12-22	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TES is essentially a post-tessellator VS, which has access to the entire TCS output patch, and a special gl_TessCoord input. Otherwise, they're very straightforward. This patch implements SIMD8 tessellation evaluation shaders for Gen8+. The tessellator can generate a lot of geometry, so operating in SIMD8 mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only 2 vertices per thread). I have another patch which implements SIMD4x2 mode for older hardware (or via an environment variable override). We currently handle all inputs via the pull model. v2: Improve comments (suggested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Add tessellation shader VUE map code.	Kenneth Graunke	2015-12-14	1	-2/+18
\| \| \| \| \| \| \|	Based on a patch by Chris Forbes, but largely rewritten by Ken. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Move brw_cs_fill_local_id_payload() to libi965_compiler	Kristian Høgsberg Kristensen	2015-12-11	1	-0/+7
\| \| \| \| \| \|	This is a helper function for setting up the local invocation ID payload according to the cs_prog_data generated by the compiler. It's intended to be available to users of libi965_compiler so move it there.
*	i965: Calculate appropriate L3 partition weights for the current pipeline state.	Francisco Jerez	2015-12-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This calculates a rather conservative partitioning of the L3 cache based on the shaders currently bound to the pipeline and whether they use SLM, atomics, images or scratch space. The result is intended to be fine-tuned later on based on other pipeline state. Note that the L3 partitioning calculated for VLV in the non-SLM non-DC case differs from the hardware defaults in that it doesn't include a DC partition and has twice as much RO cache space -- This is an intentional functional change that improves performance in several bandwidth-bound benchmarks on VLV (5% significance): SynMark OglTexFilterAniso by 14.18%, SynMark OglTexFilterTri by 7.15%, Unigine Heaven by 4.91%, SynMark OglShMapPcf by 2.15%, GpuTest Fur by 1.83%, SynMark OglDrvRes by 1.80%, SynMark OglVsTangent by 1.71%, and a few other benchmarks from the Finnish system by less than 1%. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
*	i965: Add backend structures for tess stages	Chris Forbes	2015-12-07	1	-0/+18
\| \| \| \| \| \|	Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Add enums for 3DSTATE_TE field values.	Kenneth Graunke	2015-11-18	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \|	3DSTATE_TE has partitioning, output topology, and domain fields, each of which has several enumerated values. We'll also need to switch on the domain, so enums (rather than #defines) seem like a natural fit. I chose to put these in brw_compiler.h because they'll be stored in struct brw_tes_prog_data, which will live there. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
*	i965: Add missing stdio.h include to brw_compiler.h.	Kenneth Graunke	2015-11-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This is needed for the FILE * type in brw_print_vue_map(). Apparently, all files that include brw_compiler.h already pick this up via some include chain, so this isn't actually a build fix. However, I have patches which introduce new consumers of brw_compiler.h that fail to build because of the missing #include. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Convert scalar_* flags to a scalar_stage array.	Kenneth Graunke	2015-11-16	1	-2/+1
\| \| \| \| \| \| \| \| \|	I was going to add scalar_tcs and scalar_tes flags, and then thought better of it and decided to convert this to an array. Simpler. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.	Kenneth Graunke	2015-11-13	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	I've been carrying around a patch to do this for the last few months, and it's been exceedingly useful for debugging GS and tessellation problems. I've caught lots of bugs by inspecting the interface expectations of two adjacent stages. It's not that much spam, so I figure we may as well just print it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Add a sampler program key for whether the texture is 16x MSAA	Neil Roberts	2015-11-05	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When 16x MSAA is used for sampling with texelFetch the compiler needs to use a different instruction which passes more arguments for the MCS data. Previously on skl+ it was unconditionally using this new instruction. However since 16x MSAA is probably going to be pretty rare, it is probably worthwhile to avoid using this instruction for the other sample counts. In order to do that this patch adds a new member to brw_sampler_prog_key_data to track when a sampler refers to a buffer with 16 samples. Note that this isn't done for the vec4 backend because it wouldn't change how many registers it uses. Acked-by: Ben Widawsky <ben@bwidawsk.net>
*	i965: Update stale comment about unused VUE map slots.	Kenneth Graunke	2015-10-28	1	-3/+1
\| \| \| \| \| \| \|	I changed this from COUNT to PAD in commit 268008f98c3810b9f276df985dc93ef. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965: Make brw_varying_to_offset take a const pointer to the VUE map.	Kenneth Graunke	2015-10-24	1	-2/+2
\| \| \| \| \| \| \|	It doesn't modify it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Implement ARB_shader_stencil_export (gen9+)	Ben Widawsky	2015-10-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>