external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: remove remaining tabs in brw_draw.c	Timothy Arceri	2016-10-06	1	-13/+13
\| \| \| \|	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: get inputs read from nir info	Timothy Arceri	2016-10-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This is a step towards dropping the GLSL IR version of do_set_program_inouts() in i965 and moving towards native nir support. This is important because we want to eventually convert to nir and use its optimisations passes before we can call this GLSL IR pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Eliminate brw->vs.prog_data pointer.	Kenneth Graunke	2016-10-05	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
*	i965/rbc: Allocate mcs directly	Topi Pohjolainen	2016-09-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	such as we do for compressed msaa. In case of non-compressed simgle sampled buffers the allocation of mcs is deferred until there is actually a clear operation that needs the mcs. In case of render buffer compression the mcs buffer always needed and there is no real reason to defer the allocation. By doing it directly allows to drop quite a bit unnecessary complexity. Patch leaves brw_predraw_set_aux_buffers() a no-op. Subsequent patches will re-use it and it seemed cleaner to leave it instead of removing and re-introducing. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	util: Move _mesa_fsl/util_last_bit into util/bitscan.h	Mathias Fröhlich	2016-08-09	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	As requested with the initial creation of util/bitscan.h now move other bitscan related functions into util. v2: Split into two patches. Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>
*	i965: Use bitmask/ffs to iterate used vertex attributes.	Mathias Fröhlich	2016-06-16	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	Replaces an iterate and test bit in a bitmask loop by a loop only iterating over the bits set in the bitmask. v2: Use _mesa_bit_scan{,64} instead of open coding. v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
*	i965/draw: Account for BaseInstance in VBO bounds	Jason Ekstrand	2016-05-23	1	-1/+3
\| \| \| \| \|	Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/draw: Stop relying on min_index == -1 for invalid index bounds	Jason Ekstrand	2016-05-23	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \|	The vbo layer passes an index_bounds_valid flag that we should be using instead. This also fixes a bug when min_index == -1 and basevertex != 0 where we were actually comparing min_index + basevertex == -1 which was false and we were getting the wrong buffer-sizing path. Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Set render state for lossless compressed	Topi Pohjolainen	2016-05-12	1	-1/+6
\| \| \| \| \| \| \| \| \|	v2: Add support for blorp and removed the support for meta v3 (Ben): Add assertion on compressed non-fast clear - must be partial clear. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
*	i965: Deferred allocation of mcs for lossless compressed	Topi Pohjolainen	2016-05-12	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now mcs was associated to single sampled buffers only for fast clear purposes and it was therefore the responsibility of the clear logic to allocate the aux buffer when needed. Now that normal 3D render or blorp blit may render with mcs enabled also, they need to prepare the mcs just as well. v2: Do not enable for scanout buffers v3 (Ben): - Fix typo in commit message. - Check for gen < 9 and return early in brw_predraw_set_aux_buffers() - Check for gen < 9 and return early in intel_miptree_prepare_mcs() v4: Check for msaa_layput and number of samples to determine if lossless compression is to used. Otherwise one cannot distuingish between fast clear with and without compression. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
*	i965: Reimplement ARB_transform_feedback2 on Haswell and later.	Kenneth Graunke	2016-05-09	1	-8/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My old implementation accumulated <start, end> pairs in a buffer, and eventually processed that data on the CPU. This meant flushing the batchbuffer and waiting for it to completely execute before we could map it, resulting in really long stalls. We could also run out of space in the buffer, and have to do this early. Instead, we can use Haswell's MI_MATH command to do the (end - start) subtraction, as well as the multiplication by 2 or 3 to convert from the number of primitives written to the number of vertices written. We still need to CS stall to read the counters, but otherwise everything is completely pipelined - there's no CPU<->GPU synchronization required. It also uses only 80 bytes in the buffer, no matter what. Improves performance in Manhattan on Skylake GT3e at 800x600 by 6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance by 2.82103% +/- 0.148596% (n=84). v2: Fix number of primitives -> number of vertices calculation for GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by Jordan Justen. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Move get_hw_prim_for_gl_prim to brw_util.c	Jason Ekstrand	2016-04-06	1	-29/+0
\| \| \| \| \| \| \| \|	It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in a file that's linked into libi965_compiler.la. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Validate textures before altering driver state	Topi Pohjolainen	2016-02-12	1	-9/+9
\| \| \| \| \| \| \| \| \| \|	Validation may kick off copies and subsequently color resolves. Color resolves (and the copies themselves if ending up in meta path) will overwrite the internal driver state but are not prepared to restore it. Instead of adding that capability the validation can be simply performed before the state is updated. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Reemit vertex state between indirect multi draws	Kristian Høgsberg Kristensen	2015-12-29	1	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \|	If we're doing an indirect draw, prims[i].basevertex is always 0 and the real base vertex value is in the indirect parameter buffer. We try to avoid flagging BRW_NEW_VERTICES if prims[i].basevertex doesn't change, which then breaks down for indirect draws. Thus, if a program uses base vertex or base instance, and the draw call is indirect, always flag BRW_NEW_VERTICES. A new piglit test, spec/ARB_shader_draw_parameters/drawid-indirect-vertexid tests this. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Add support for gl_DrawIDARB and enable extension	Kristian Høgsberg Kristensen	2015-12-29	1	-0/+12
\| \| \| \| \| \| \| \| \| \|	We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB	Kristian Høgsberg Kristensen	2015-12-29	1	-2/+2
\| \| \| \| \| \| \|	We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Add state bits for tess stages	Chris Forbes	2015-12-07	1	-2/+5
\| \| \| \| \| \|	Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Add backend structures for tess stages	Chris Forbes	2015-12-07	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Drop #include of main/glheader.h.	Matt Turner	2015-11-24	1	-1/+0
\| \| \| \| \| \|	It's never used. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n.	Kenneth Graunke	2015-11-11	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inspired by a patch by Fabian Bieler. Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid topology type); I instead chose to make a macro that takes an argument. He also took the number of patch vertices from _mesa_prim (which was set to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to avoid the need for the VBO patch. v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match the documentation (suggested by Ian). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	mesa/i965: Refactor brw_is_front_buffer_{drawing,reading} to common code	Ian Romanick	2015-10-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	There are multiple similar implementations of these functions, and a later patch was going to add another. v2: Move removing intel_framebuffer to a different patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Use C99 initializers for primitive arrays	Ian Romanick	2015-10-06	1	-24/+24
\| \| \| \| \| \| \| \| \|	Using C99 initializers for the primitive arrays makes things more readable. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Fix typos in license	Ian Romanick	2015-09-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	grep -lr 'sub license' \| while read f; do \ sed --in-place -e 's/sub license/sublicense/' $f ;\ done grep -lr 'NON-INFRINGEMENT' \| while read f; do \ sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\ done As noted by Matt, both of these changes match the MIT license text found at http://opensource.org/licenses/MIT. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
*	i965: Remove horizontal bars from file header comments	Ian Romanick	2015-09-10	1	-4/+2
\| \| \| \| \| \| \|	Why was that ever a thing? Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
*	i965: Resolve GCC sign-compare warning.	Rhys Kidd	2015-08-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_draw_destroy': mesa/src/mesa/drivers/dri/i965/brw_draw.c:630:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (i = 0; i < brw->vb.nr_buffers; i++) { ^ mesa/src/mesa/drivers/dri/i965/brw_draw.c:636:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (i = 0; i < brw->vb.nr_enabled; i++) { ^ Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
*	i965: Resolve GCC sign-compare warning.	Rhys Kidd	2015-08-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_postdraw_set_buffers_need_resolve': mesa/src/mesa/drivers/dri/i965/brw_draw.c:390:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int i = 0; i < fb->_NumColorDrawBuffers; i++) { ^ Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
*	vbo: pass the stream from DrawTransformFeedbackStream to drivers	Marek Olšák	2015-08-06	1	-1/+2
\| \| \| \|	Reviewed-by: Dave Airlie <airlied@redhat.com>
*	i965: Trivial formatting changes in brw_draw.c	Ian Romanick	2015-08-03	1	-51/+51
\| \| \| \| \| \|	Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
*	mesa: Rename _mesa_lookup_enum_by_nr() to _mesa_enum_to_string().	Kenneth Graunke	2015-07-20	1	-4/+4
\| \| \| \| \| \| \|	Generated by sed; no manual changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Brian Paul <brianp@vmware.com>
*	i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().	Matt Turner	2015-07-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)" respectively to allow declaring local variables used by intervening OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be in the same control flow. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
*	i965: Rename intel_emit* to reflect their new location in brw_pipe_control	Chris Wilson	2015-06-24	1	-2/+2
\| \| \| \| \|	Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Assert that the GL primitive isn't out of range.	Matt Turner	2015-06-23	1	-1/+3
\| \| \| \| \| \| \| \|	Coverity sees the if (mode >= BRW_PRIM_OFFSET (128)) test and assumes that the else-branch might execute for mode to up 127, which out be out of bounds. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Use predicate enable bit for conditional rendering w/o stalling	Neil Roberts	2015-05-12	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/state: Don't use brw->state.dirty.brw	Jordan Justen	2015-03-31	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in .[ch] .[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/state: Rename brw_clear_dirty_bits to brw_render_state_finished	Jordan Justen	2015-03-31	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/state: Rename brw_upload_state to brw_upload_render_state	Jordan Justen	2015-03-31	1	-3/+4
\| \| \| \| \| \|	Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Do Sandybridge workaround flushes before each primitive.	Kenneth Graunke	2015-02-17	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sandybridge requires the post-sync non-zero workaround in a ton of places, and if you ever miss one, the GPU usually hangs. Currently, we try to track exactly when a workaround flush is necessary (via the brw->batch.need_workaround_flush flag). This is tricky to get right, and we've botched it several times in the past. This patch unconditionally performs the post-sync non-zero flush at the start of each primitive's state upload (including BLORP). We drop the needs_workaround_flush flag, and drop all the other callers, as the flush has already been performed. We have no data to indicate that simply flushing all the time will hurt performance, and it has the potential to help stability. v2: Add post-sync workaround to initial GPU state upload to be extra cautious (suggested by Chad Versace). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
*	i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.	Kenneth Graunke	2014-12-31	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a partial revert of c89306983c07e5a88c0d636267e5ccf263cb4213. It split the {start,base}_vertex_location handling into several steps: 1. Set brw->draw.start_vertex_location = prim[i].start and brw->draw.base_vertex_location = prim[i].basevertex. (This happened once per _mesa_prim, in the main drawing loop.) 2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset appropriately. (This happened in brw_prepare_shader_draw_parameters, which was called just after brw_prepare_vertices, as part of state upload, and only happened when BRW_NEW_VERTICES was flagged.) 3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim). If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on the second (or later) primitives, we would do step #1, but not #2. The first _mesa_prim would get correct values, but subsequent ones would only get the first half of the summation. The reason I originally did this was because I needed the value of gl_BaseVertexARB to exist in a buffer object prior to uploading 3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value of 3DPRIMITIVE's "Base Vertex Location" field, which was computed as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) + brw->vb.start_vertex_bias. The latter value wasn't available until after brw_prepare_vertices, and the former weren't available in the state upload code at all. Hence the awkward split. However, I believe that including brw->vb.start_vertex_bias was a mistake. It's an extra bias we apply when uploading vertex data into VBOs, to move [min_index, max_index] to [0, max_index - min_index]. >From the GL_ARB_shader_draw_parameters specification: "<gl_BaseVertexARB> holds the integer value passed to the <baseVertex> parameter to the command that resulted in the current shader invocation. In the case where the command has no <baseVertex> parameter, the value of <gl_BaseVertexARB> is zero." I conclude that gl_BaseVertexARB should only include the baseVertex parameter from glDrawElements, not any internal biases we add for optimization purposes. With that in mind, gl_BaseVertexARB only needs prim[i].start or prim[i].basevertex. We can simply store that, and go back to computing start_vertex_location and base_vertex_location in brw_emit_prim(), like we used to. This is much simpler, and should actually fix two bugs. Fixes missing geometry in Unvanquished. Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.	Kenneth Graunke	2014-12-31	1	-9/+4
\| \| \| \| \| \| \|	This makes it show up via ARB_debug_output and is also less code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965: Compute VS attribute WA bits earlier and check if they changed.	Kenneth Graunke	2014-12-04	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BRW_NEW_VERTICES is flagged every time we draw a primitive. Having the brw_vs_prog atom depend on BRW_NEW_VERTICES meant that we had to compute the VS program key and do a program cache lookup for every single primitive. This is painfully expensive. The workaround bit computation is almost entirely based on the vertex attribute arrays (brw->vb.inputs[i]), which are set by brw_merge_inputs. The only thing it uses the VS program for is to see which VS inputs are actually read. brw_merge_inputs() happens once per primitive, and can safely look at the currently bound vertex program, as it doesn't change in the middle of a draw. This patch moves the workaround bit computation to brw_merge_inputs(), right after assigning brw->vb.inputs[i], and stores the previous WA bit values in the context. If they've actually changed from the last draw (which is uncommon), we signal that we need a new vertex program, causing brw_vs_prog to compute a new key. Improves performance in Gl32Batch7 by 13.6123% +/- 0.739652% (n=166) on Haswell GT3e. I'm told Baytrail shows similar gains. v2: Introduce a new BRW_NEW_VS_ATTRIB_WORKAROUNDS dirty bit, rather than reusing BRW_NEW_VERTEX_PROGRAM (suggested by Chris Forbes). This prevents unnecessary re-emission of surface/sampler related atoms (and an SOL atom on Sandybridge). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965: Just return void from brw_try_draw_prims	Ian Romanick	2014-12-02	1	-5/+2
\| \| \| \| \| \| \| \| \| \|	Note from Ken: "We used to use the return value to indicate whether software fallbacks were necessary, but we haven't in years." Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Fix reference counting in new basevertex upload code.	Kenneth Graunke	2014-09-12	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the non-indirect draw case, we call intel_upload_data to upload gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload buffer, and increments the upload BO reference count. So, we need to unreference it when making brw->draw.draw_params_bo point at something else, or else we'll retain a reference to stale upload buffers and hold on to them forever. This also means that the indirect case should increment the reference count on the indirect draw buffer when making brw->draw.draw_params_bo point at it. That way, both paths increment the reference count, so we can safely unreference it every time. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "10.3" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Make gl_BaseVertex available in a buffer object.	Kenneth Graunke	2014-09-10	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \|	This will be used for GL_ARB_shader_draw_parameters, as well as fixing gl_VertexID, which is supposed to include gl_BaseVertex's value. For indirect draws, we simply point at the indirect buffer; for normal draws, we upload the value via the upload buffer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Calculate start/base_vertex_location after preparing vertices.	Kenneth Graunke	2014-09-10	1	-9/+8
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404	Jordan Justen	2014-09-04	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Modify dirty bit handling to support 2 pipelines.	Paul Berry	2014-09-01	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hardware state for compute shaders is almost entirely orthogonal to the hardware state for 3D rendering. To avoid sending unnecessary state to the hardware, we'll need to have a separate set of state atoms for the compute pipeline and the 3D pipeline. That means we need to maintain two separate sets of dirty bits to determine which state atoms need to be run. But the dirty bits are not completely independent; for example, if BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but we also need to re-run compute state atoms that depend on BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the next time the compute pipeline is run. To accomplish this, we record two sets of dirty bits, one for each pipeline. When bits are dirtied (via SET_DIRTY_BIT() or SET_DIRTY_ALL()) we set them to the dirty state in both pipelines. When brw_state_upload() is run, we clear the dirty bits just for the pipeline that was run. Note that since the number of pipelines is known at compile time to be 2, the compiler should unroll the loops in SET_DIRTY_BIT() and SET_DIRTY_ALL(). Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Create a macro for setting a dirty bit.	Paul Berry	2014-09-01	1	-6/+6
\| \| \| \| \| \| \|	This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Move pre-draw resolve buffers to dd::UpdateState	Kristian Høgsberg	2014-08-15	1	-40/+0
\| \| \| \| \| \| \| \| \| \| \|	No functional change except for glBegin/glEnd style rendering, where we now do the resolves at glBegin time instead of FLUSH_VERTICES time. This is also the reason for this change, so that when we later switch fast clear resolve to use meta, we won't be doing meta operations in the middle of a begin/end sequence. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add a mechanism for sending native primitives into the driver	Kristian Høgsberg	2014-08-15	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \|	The brw_draw_prims() function is the draw entry point into the driver, and takes struct _mesa_prim for input. We want to be able to feed native primitives into the driver, and to that end we introduce BRW_PRIM_OFFSET, which lets use describe geometry using the native GEN primitive types. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Emit a performance warning on conditional rendering.	Kenneth Graunke	2014-08-08	1	-0/+5
\| \| \| \| \| \| \| \| \|	We have a CPU-side implementation of conditional rendering; it really should be done on the GPU. It's not necessarily that hard, but nobody has gotten to fixing it yet. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>