external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: get outputs read from nir info	Timothy Arceri	2016-10-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/sync: Replace 'intel' prefix with 'brw'	Chad Versace	2016-10-05	1	-1/+1
\| \| \| \| \| \| \|	This is yet another patch for the great renaming begun long ago. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/gen8+: Enable GL_OES_viewport_array	Anuj Phogat	2016-10-04	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch causes 2 regressions in khronos' gles cts tests on various intel platforms. Failing tests: ES3-CTS.functional.state_query.integers.viewport_getinteger ES3-CTS.functional.state_query.integers.viewport_getfloat Here is an explanation of what's causing the failures: CTS tests are not clamping the x, y location of the viewport's bottom-left corner as recommended by ARB_viewport_array and OES_viewport_array: "The location of the viewport's bottom-left corner, given by (x,y), are clamped to be within the implementation-dependent viewport bounds range. The viewport bounds range [min, max] tuple may be determined by calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES" Khronos CTS merge request to fix the test case: https://gitlab.khronos.org/opengl/cts/merge_requests/399 V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+ Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
*	i965: Only emit 1 viewport when possible.	Kenneth Graunke	2016-10-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: get rid of duplicated values from gen_device_info	Lionel Landwerlin	2016-09-23	1	-17/+5
\| \| \| \| \| \| \| \|	Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	intel/i965: make gen_device_info mutable	Lionel Landwerlin	2016-09-23	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change, it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Rename intelScreen to screen.	Kenneth Graunke	2016-09-20	1	-22/+22
\| \| \| \| \| \| \| \|	"intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Rename __DRIScreen pointers to "dri_screen".	Kenneth Graunke	2016-09-20	1	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I want to use "screen" as the variable name for a struct intel_screen pointer. This means that we can't use it for __DRIscreen pointers. Sometimes we called it "screen", sometimes "sPriv", sometimes "driScrnPriv", and sometimes "psp" (Pointer to Screen Private?). The last one is particularly confusing because we use "psp" to refer to the Gen4 PIPELINED_STATE_POINTERS packet as well. Let's be consistent. "dri_screen" is clear, and it's not used often enough that I'm worried about the verbosity. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	Revert "i965: Drop the maximum 3D texture size to 512 on Sandy Bridge"	Jason Ekstrand	2016-09-12	1	-10/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 6ba88bce64b343761aabe3a6c7ee285c6020a959. The commit was erroneous because GL has a separate limit, GL_MAX_FRAMEBUFFER_LAYERS which guards the number of layers you are allowed to render into. The GL 4.5 spec says: "The framebuffer attachment point attachment is said to be framebuffer attachment complete if [...] all of the following conditions are true: [...] If image is a three-dimensional, one- or two-dimensional array, or cube map array texture and the attachment is layered, the depth or layer count of the texture is less than or equal to the value of the implementation-dependent limit MAX_FRAMEBUFFER_LAYERS." and goes on to say that "framebuffer complete" requires all attachments to be "framebuffer attachment complete". On Sandy Bridge, we set GL_MAX_FRAMEBUFFER_LAYERS to 512 so creating a 3D texture bigger than 512 is fine; you just can't render into all of the slices at once. Fixes ES3-CTS.gtf.GL3Tests.npot_textures.npot_tex_image on Sandy Bridge Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chadversary@chromium.org>
*	i965/rbc: Clarify rational given for shader image resolves	Topi Pohjolainen	2016-09-12	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Original commit added documentation explaining lossless compression case: commit 56f29911ec9da25c78fbd3d4945d499e65ca4b5a Author: Topi Pohjolainen <topi.pohjolainen@intel.com> Date: Tue Feb 2 10:00:41 2016 +0200 i965: Add a flag telling color resolve pass to ignore CCS_E It, however, easily gives the impression that the sole purpose of the intel_miptree_resolve_color() is to address lossless compression. Original intention is to document the lack of INTEL_MIPTREE_IGNORE_CCS_E flag given for the resolve call. This patch fixes this along with a typo found spotted further down. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Track non-compressible sampling of renderbuffers	Topi Pohjolainen	2016-09-12	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v3: - Actually set the flags when needed instead of falsely overwriting them (Jason). - Use more generic name for flag (dropped RENDERBUFFER) - Consult also shader images v4: - Consult only lossless compressd shader images v5: - Check the existence of renderbuffer before considering if it matches the given miptree Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	intel: Rename brw_get_device_name/info to gen_get_device_name/info	Jason Ekstrand	2016-09-03	1	-1/+1
\| \| \| \| \|	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	intel: s/brw_device_info/gen_device_info/	Jason Ekstrand	2016-09-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/*/.h sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.c sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.cpp sed -i -e 's/brw_device_info/gen_device_info/g' */i965/.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: enable OES_primitive_bounding_box with the no-op implementation	Ilia Mirkin	2016-08-30	1	-0/+3
\| \| \| \| \|	Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/blorp: Add a blorp_context struct and init/finish funcs	Jason Ekstrand	2016-08-29	1	-0/+7
\| \| \| \| \|	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/gen7: Copy stencil when sampling the stencil texture	Jordan Justen	2016-08-26	1	-0/+5
\| \| \| \| \|	Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965/hsw: Don't advertise more than 64 threads for compute shaders	Jordan Justen	2016-08-26	1	-14/+25
\| \| \| \| \| \| \| \| \| \| \|	thread_width_max in the GPGPU walker command limits us to a maximum of 64 threads. This fixes a crash on Haswell in the OpenGLES 3.1 conformance test suite which tests the advertised limits of the max invocation counts. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Embrace "unlimited" GTT mmap support	Chris Wilson	2016-08-26	1	-15/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From about kernel 4.9, GTT mmaps are virtually unlimited. A new parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the feature so query it and use it to avoid limiting tiled allocations to only fit within the mappable aperture. A couple of caveats: - fence support is still limited by stride to 262144 and the stride needs to be a multiple of tile_width (as before, and same limitation as the current 3D pipeline in hardware) - the max_gtt_map_object_size forcing untiled may be hiding a few bugs in handling of large objects, though none were spotted in piglits. See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps"). v2: Include some commentary on mmap virtual space vs CPU addressable space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
*	i965: Resolve color for non-coherent FB fetch at UpdateState time.	Francisco Jerez	2016-08-25	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \|	This is required because the sampler unit used to fetch from the framebuffer is unable to interpret non-color-compressed fast-cleared single-sample texture data. Roughly the same limitation applies for surfaces bound to texture or image units, but unlike texture sampling, non-coherent framebuffer fetch is by definition non-coherent with previous rendering, so the brw_render_cache_set_check_flush() call can be omitted except after resolve. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add an isl_device to the brw_context	Jason Ekstrand	2016-07-15	1	-0/+2
\| \| \| \| \|	Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chad.versace@intel.com>
*	glsl: add driconf to zero-init unintialized vars	Rob Clark	2016-07-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some games are sloppy.. perhaps because it is defined behavior for DX or perhaps because nv blob driver defaults things to zero. So add driconf param to force uninitialized variables to default to zero. This issue was observed with rust, from steam store. But has surfaced elsewhere in the past. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	glsl/mesa: split gl_shader in two	Timothy Arceri	2016-06-30	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two distinctly different uses of this struct. The first is to store GL shader objects. The second is to store information about a shader stage thats been linked. The two uses actually share few fields and there is clearly confusion about their use. For example the linked shaders map one to one with a program so can simply be destroyed along with the program. However previously we were calling reference counting on the linked shaders. We were also creating linked shaders with a name even though it is always 0 and called the driver version of the _mesa_new_shader() function unnecessarily for GL shader objects. Acked-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Drop the maximum 3D texture size to 512 on Sandy Bridge	Jason Ekstrand	2016-06-22	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RenderTargetViewExtent field of RENDER_SURFACE_STATE is supposed to be set to the depth of a 3-D texture when rendering. Unfortunatley, that field is only 9 bits on Sandy Bridge and prior so we can't actually bind a 3-D texturing for rendering if it has depth > 512. On Ivy Bridge, this field was bumpped to 11 bits so we can go all the way up to 2048. On Iron Lake and prior, we don't support layered rendering and we use OffsetX/Y hacks to render to particular layers so 2048 is ok there too. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
*	i965: Use a uniform for gl_PatchVerticesIn in the TCS on Gen8+.	Kenneth Graunke	2016-06-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We still need to recompile the passthrough shader when this value changes, as it also affects the output vertex count. But otherwise, we can eliminate recompiles on Gen8+. We probably want to do this for Gen7 as well, but that requires rewriting the input release code to use a loop, which is a trade-off I'd need to consider in more detail. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Cc: mesa-stable@lists.freedesktop.org
*	i965: Use a uniform for gl_PatchVerticesIn in the TES.	Kenneth Graunke	2016-06-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes three GL44-CTS.tessellation_shader subtests: - max_patch_vertices - single.max_patch_vertices - tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn These use gl_PatchVerticesIn in the TES, but don't link against a TCS (which would allow the linker to lower it to a constant). We had no handling for the system value in the backend, so it would just assert fail. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Cc: mesa-stable@lists.freedesktop.org
*	i965: Check return value of screen->image.loader->getBuffers (v2)	Tomasz Figa	2016-06-14	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The images struct is an uninitialized local variable on the stack. If the callback returns 0, the struct might not have been updated and so should be considered uninitialized. Currently the code ignores the return value, which (depending on stack contents) might end up in reading a non-zero value from images.image_mask and dereferencing further fields. Another solution would be to initialize image_mask with 0, but checking the return value seems more sensible and it is what Gallium is doing. v2: fix typos in commit message, fix indentation, remove unnecessary parentheses and pointer dereference to keep line length reasonable. Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
*	i965/compiler: Bring back the INTEL_PRECISE_TRIG environment variable	Jason Ekstrand	2016-06-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	This was removed in d9546b0c5d and replced with the precise_trig driconf option. However, we still need precise trig in the Vulkan driver so this commit brings back the environment variable and compiler->precise_trig is effectively the logical OR of the two. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96484 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>
*	i965: Don't leak scratch BOs for TCS/TES.	Kenneth Graunke	2016-06-13	1	-0/+4
\| \| \| \| \| \| \|	These need to be freed too. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Integrate precise trig into configuration infrastructure	Gurchetan Singh	2016-06-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this change, to enable precise SIN and COS instructions on Intel hardware, one can put <option name="precise_trig" value="true"/> in the proper drirc file. V2: Make option name more generic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
*	i965: Enable cross-thread constants and compact local IDs for hsw+	Jordan Justen	2016-06-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. One complication is that cross-thread constants are loaded into registers before per-thread constants. Previously, our local IDs were loaded before the uniform data and treated as 'payload' data, even though they were actually pushed into the registers like the other uniform data. Therefore, in this patch we simultaneously enable a newer layout where each thread now uses a single uniform slot for a unique local ID for the thread. This uniform is handled specially to make sure it is added last into the uniform push constant registers. This minimizes our usage of push constant registers, and maximizes our ability to use cross-thread constants for registers. To swap from the old to the new layout, we also need to flip some lowering pass switches to let our driver handle the lowering instead. We also no longer force thread_local_id_index to -1. v4: * Minimize size of patch that switches from the old local ID layout to the new layout (Jason) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	glsl: Add glsl LowerCsDerivedVariables option	Jordan Justen	2016-06-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	v2: * Move lower flag to context constants. (Ken) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/gen9: Configure rbc buffers as plain for non-rbc tex views	Topi Pohjolainen	2016-06-01	1	-2/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes rendering in Shadow of Mordor with rbc. Application writes RGBA_UNORM texture filling it with values the application wants to later on treat as SRGB_ALPHA. Intel driver enables lossless compression for the buffer by the time of writing. However, the driver fails to make sure the buffer can be sampled as something else later on and unfortunately there is restriction in the hardware for using lossless compression for srgb formats which looks to extend itself to the sampling engine also. Requesting srgb to linear conversion on top of compressed buffer results the color values to be pretty much garbage. Fortunately none of tracked benchmarks showed a regression with this. v2 (Matt): Add missing space Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Update compute workgroup size limit calculation for SIMD32.	Francisco Jerez	2016-05-27	1	-11/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This should have the side effect of enabling the ARB_compute_shader extension on Gen8+ hardware and all Gen7 platforms that didn't previously expose it (VLV and IVB GT1) due to the number of hardware threads per subslice being insufficient in SIMD16 mode. v2: Bump workgroup size limit for GLES too (Jordan). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	mesa: Implement glGet*(GL_PRIMITIVE_RESTART_FOR_PATCHES_SUPPORTED).	Kenneth Graunke	2016-05-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Technically, this was introduced with GL 4.4. However, I believe it was intended to be retroactive. As far as I know, AMD has never supported primitive restart with patches, while NVidia and Intel do. This necessitated the need for a query which would allow applications to figure out whether this was usable or not. I decided to expose it everywhere ARB_tessellation_shader is exposed. (It's also in both OES and EXT_tessellation_shader.) Enable this for i965 and Gallium drivers which expose the capability. v2: Fix a bug in the state_tracker code (caught by Ilia Mirkin). Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10364 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Use blorp for all clears	Jason Ekstrand	2016-05-14	1	-1/+0
\| \| \| \| \| \| \|	We used to use a meta path on gen8 but we haven't since c7cf17ae758. We might as well delete the meta path since blorp works on all gens. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Reimplement ARB_transform_feedback2 on Haswell and later.	Kenneth Graunke	2016-05-09	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My old implementation accumulated <start, end> pairs in a buffer, and eventually processed that data on the CPU. This meant flushing the batchbuffer and waiting for it to completely execute before we could map it, resulting in really long stalls. We could also run out of space in the buffer, and have to do this early. Instead, we can use Haswell's MI_MATH command to do the (end - start) subtraction, as well as the multiplication by 2 or 3 to convert from the number of primitives written to the number of vertices written. We still need to CS stall to read the counters, but otherwise everything is completely pipelined - there's no CPU<->GPU synchronization required. It also uses only 80 bytes in the buffer, no matter what. Improves performance in Manhattan on Skylake GT3e at 800x600 by 6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance by 2.82103% +/- 0.148596% (n=84). v2: Fix number of primitives -> number of vertices calculation for GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by Jordan Justen. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Implement ARB_query_buffer_object for HSW+	Jordan Justen	2016-05-04	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: * Declare loop index variable at loop site (idr) * Make arrays of MI_MATH instructions 'static const' (idr) * Remove commented debug code (idr) * Updated comment in set_query_availability (Ken) * Replace switch with if/else in hsw_result_to_gpr0 (Ken) * Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on hsw and gen8 (Ken) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	dri/i965: extend GLES3 sRGB workaround to cover all formats	Haixia Shi	2016-04-12	1	-4/+3
\| \| \| \| \| \| \| \| \| \|	It is incorrect to assume BGRA byte order for the GLES3 sRGB workaround. v2: use _mesa_get_srgb_format_linear to handle all formats Signed-off-by: Haixia Shi <hshi@chromium.org> Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/chv: Display proper branding	Ben Widawsky	2016-03-11	1	-3/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"Braswell" is a Cherryview based thing. It unfortunately requires extra information to determine its marketing name. Unlike all previous products, and hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to brand string. I put up a fight about adding any complexity to our GL renderer string code for a very long time. However, a wise man made a comment to me that I couldn't argue with: if a user installs Windows on their hardware, the brand string should be the same as what we display in Linux. The Windows driver apparently does this check, so we should too. Note that I did manage to find a good use for this info anyway in the compute shader thread counts. v2: memcpy instead of strncpy, and some minor changes (Matt) Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
*	i965/chv: Check that compute threads are above threshold	Ben Widawsky	2016-03-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	The way we are organizing this code, the statically configured max_cs_threads should always be the minimum value we actually support (ie. are aware of). As a result, we can fall back to that if we get invalid numbers from the kernel (ie. when the query succeeds, but the result is lower than expected). I was originally planning to use an assert, but there is no reason to be so mean. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
*	i965/chv: Use kernel provided info for max_cs_threads	Ben Widawsky	2016-03-11	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	With the previous patches, the code can find out the actual number of available compute threads. It is enabled only for Cherryview since that is the only platform I know for a fact has shipped devices which can benefit from this. It seems like other platforms /might/ benefit from this because of fused configurations which /might/ have shipped. Fallback code is still there. v2: Some minor adjustments from Matt Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
*	i965: Set MaxFramebufferWidth/Height to 16384, not viewport.	Kenneth Graunke	2016-03-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width} started hitting assertion failures when emitting SURFACE_STATE, after commit e8fd60e7891c7 where Samuel increased the maximum viewport size to 32768, from 16384. MaxFramebufferWidth/Height were being set to the maximum viewport size, but are actually limited by the SURFACE_STATE width/height field range, which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is exposed). So, reduce these to 16384 explicitly. Fixes assert fails in the above mentioned dEQP tests. (Those tests still fail, however.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
*	i965/formatquery: Respond queries SAMPLES and NUM_SAMPLE_COUNTS	Eduardo Lima Mitev	2016-03-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	This effectively disables old QuerySamplesForFormat driver hook, since it is never called by Mesa anymore. v2: Call brw_query_samples_for_format() with a dummy buffer to calculate num samples, to avoid modifying the original buffer. Reviewed-by: Dave Airlie <airlied@redhat.com>
*	i965: Move brw_query_samples_for_format() to brw_queryformat.c	Eduardo Lima Mitev	2016-03-03	1	-38/+0
\| \| \| \| \| \| \|	Now that there is a dedicated source file for internal format queries, this function belongs there. Reviewed-by: Dave Airlie <airlied@redhat.com>
*	i965: Add boilerplate function for QueryInternalFormat driver hook	Eduardo Lima Mitev	2016-03-03	1	-0/+1
\| \| \| \| \| \| \|	By default, we call back the driver's hook fallback function that has generic implementations for the all the queries. Reviewed-by: Dave Airlie <airlied@redhat.com>
*	i965: set ctx->Const.MaxViewport{Width,Height} to 32k	Samuel Iglesias Gonsálvez	2016-03-02	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From ARB_viewport_array spec: " * On GL3-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least [-16384, 16383]. * On GL4-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least [-32768, 32767]." This range is set using ctx->Const.MaxViewportWidth value, so just bump those constants to 32k for gen7+ which can support OpenGL 4.0. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Set compute shader shared memory max to 64k	Jordan Justen	2016-02-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See Ivy Bridge PRM, Volume 2, Part 2, 1.8.4 INTERFACE_DESCRIPTOR_DATA: DWORD 5, bits 20:16: "This field indicates how much shared local memory the thread group requires. The amount is specified in 4k blocks, but only powers of 2 are allowed: 0, 4k, 8k, 16k, 32k and 64k per half-slice." For Haswell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA: DWORD 5, bits 20:16: With text identical to the Ivy Bridge PRM. For Broadwell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA: DWORD 6, bits 20:16: With text identical to the Ivy Bridge PRM. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
*	i965: Add a few assertions on lossless compression	Topi Pohjolainen	2016-02-16	1	-0/+4
\| \| \| \| \| \| \| \| \|	v2 (Ben): Use combination of msaa_layout and number of samples instead of introducing explicit type for lossless compression (intel_miptree_is_lossless_compressed()). Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
*	i965: Add a flag telling color resolve pass to ignore CCS_E	Topi Pohjolainen	2016-02-16	1	-1/+11
\| \| \| \| \| \| \| \| \|	v2 (Ben): Use combination of msaa_layout and number of samples instead of introducing explicit type for lossless compression (intel_miptree_is_lossless_compressed()). Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
*	i965: fix MAX_COMPUTE_SHARED_SIZE constant value	Samuel Pitoiset	2016-02-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	MAX_COMPUTE_SHARED_SIZE should be set to 32768. This fixes a regression introduced in be27f77 (mesa: do not use a constant for MAX_COMPUTE_SHARED_SIZE). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94139 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>