summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen7_gs_state.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Create a shader_dispatch_mode enum to replace VS/GS fields.Kenneth Graunke2015-06-011-1/+1
| | | | | | | | | | | | | We used to store the GS dispatch mode in brw_gs_prog_data while separately storing the VS dispatch mode in brw_vue_prog_data::simd8. This patch introduces an enum to represent all possible dispatch modes, and stores it in brw_vue_prog_data::dispatch_mode, unifying the two. Based on a suggestion by Matt Turner. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965: Rename brw_vec4_prog_data/key to brw_bue_prog_data/keyKristian Høgsberg2014-12-101-1/+1
| | | | | | | | | | | These structs aren't vec4 specific, they are shared by shader stages operating on Vertex URB Entries (VUEs). VUEs are the data structures in the URB that hold vertex data between the pipeline geometry stages. Using vue in the name instead of vec4 makes a lot more sense, especially when we add scalar vertex shader support. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-3/+3
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gs: Reuse gen6 constant push buffers setup code in gen7+.Samuel Iglesias Gonsalvez2014-09-191-33/+0
| | | | | | | | The code required for gen6 and gen7+ is almost the same, so reuse it. Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Use single dispatch mode as fallback to dual object mode when possible.Iago Toral Quiroga2014-09-191-3/+1
| | | | | | | | | | | | | | | | | | Currently, when a geometry shader can't use dual object mode we fall back to dual instance mode, however, when invocations == 1, single dispatch mode is more performant and equally efficient in terms of register pressure. Single dispatch mode requires that the driver can handle interleaving of input registers, but this is already supported (dual instance mode has the same requirement). However, to take full advantage of single dispatch mode to reduce register pressure we would also need the ability to store two separate vec4 output values into vec8 registers, which would approximately double our capacity to store temporary values, but currently the vec4 visitor and generator classes do not support this, so at the moment register pressure in single and dual instance modes is the same. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke2014-09-031-2/+2
| | | | | | | | All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/gen6+: Merge VS/GS and WM push constant buffer upload paths.Eric Anholt2014-07-021-3/+3
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.Eric Anholt2014-07-021-1/+1
| | | | | | | I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move push constant state packets to push constant update time.Eric Anholt2014-05-021-14/+15
| | | | | | | -0.553779% +/- 0.423394% effect on cairo-perf-trace runtime on glamor (n=612) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop unnecessary state flag for units on NEW_BINDING_TABLE.Eric Anholt2014-05-021-1/+0
| | | | | | | | | | | Commit 30259856a8a82a55c030df1ad052e505c61144bc moved the state packets to table generation time, but forgot to make this change. Apparently the performance win there was about not reemitting the table pointers on unrelated state changes. No performance difference on cairo on glamor (n=118). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen7+: Move sampler state packets to the stage sampler state table update.Eric Anholt2014-05-021-7/+1
| | | | | | | | | | | | Now that we have the stage state coming into our setup of sampler states, it's easy to drop an identifier into it of which stage the stage_state is, and then look up which packet to emit in a little table. No performance difference on cairo on glamor (n=492). v2: Don't forget to do the workaround flush on IVB. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move binding table update packets to binding table setup time.Eric Anholt2014-03-101-6/+0
| | | | | | | | | | | | | | | | | | | This keeps us from needing to reemit all the other stage state just because a surface changed. Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869% (n=296). [v1] v2: - Drop binding table packets from Gen8 unit state as well. - Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table, cutting even more code. v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2). Signed-off-by: Eric Anholt <eric@anholt.net> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2, v3] Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
* i965: support instanced GS on gen7Jordan Justen2014-02-201-0/+2
| | | | | | | | | | v3: * Properly prevent dual object mode execution when the invocation count > 1 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/gen7: Emit workaround flush when changing GS enable state.Paul Berry2013-11-181-0/+16
| | | | | | | | | | v2: Don't go to extra work to avoid extraneous flushes. (Previous experiments in the kernel have suggested that flushing the pipeline when it is already empty is extremely cheap). Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Tell the unit states how many binding table entries we have.Eric Anholt2013-11-051-1/+3
| | | | | | | | | | Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to dynamically assign our binding table indices, we didn't really track our binding table count per shader, so we never filled in these fields. Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader.Paul Berry2013-10-241-1/+3
| | | | | | Not yet enabled. Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/gs: Set the REORDER bit in 3DSTATE_GS.Paul Berry2013-10-151-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | Ivy Bridge's "reorder enable" bit gives us a binary choice for the order in which vertices from triangle strips are delivered to the geometry shader. Neither choice follows the OpenGL spec, but setting the bit is better, because it gets triangle orientation correct. Haswell replaces the "reorder enable" bit with a new "reorder mode" bit (which occupies the same location in the command packet). This bit gives us a different binary choice, which affects both triangle strips and triangle strips with adjacency. Setting the bit ("reorder trailing") gives the proper order according to the OpenGL spec. So in either case we want to set the bit. On Ivy Bridge, fixes piglit test "triangle-strip-orientation". On Haswell, fixes piglit tests "glsl-1.50-geometry-primitive-types {GL_TRIANGLE_STRIP,GL_TRIANGLE_STRIP_ADJACENCY}" and "glsl-1.50-geometry-tri-strip-ordering-with-prim-restart *". v2: Rename the bit to "REORDER_TRAILING" for consistency with Haswell docs. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gs: Fix incorrect numbering of DWORDs in 3DSTATE_GSPaul Berry2013-10-011-3/+4
| | | | | | | | | | | | | | | In commit 247f90c77e8f3894e963d796628246ba0bde27b5 (i965/gs: Set control data header size/format appropriately for EndPrimitive()), I incorrectly numbered the DWORDs in the 3DSTATE_GS command starting from 1 instead of starting from 0. This caused the control data format to be programmed into the wrong DWORD, resulting in corruption in some geometry shaders that used an output type of points. This patch numbers the DWORDs starting from 0, as we do for all other commands, which causes the control data format to be programmed into the correct DWORD. Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965/gs: Implement basic gl_PrimitiveIDIn functionality.Paul Berry2013-09-171-0/+2
| | | | | | | | | | | | | | | | | If the geometry shader refers to the built-in variable gl_PrimitiveIDIn, we need to set a bit in 3DSTATE_GS to tell the hardware to dispatch primitive ID to r1, and we need to leave room for it when allocating registers. Note: this feature doesn't yet work properly when software primitive restart is in use (the primitive ID counter will incorrectly reset with each primitive restart, since software primitive restart works by performing multiple draw calls). I plan to address that in a future patch series. Fixes piglit test "spec/glsl-1.50/execution/geometry/primitive-id-in". Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Set control data header size/format appropriately for EndPrimitive().Paul Berry2013-09-111-14/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The gen7 geometry shader uses a "control data header" at the beginning of the output URB entry to store either (a) flag bits (1 bit/vertex) indicating whether EndPrimitive() was called after each vertex, or (b) stream ID bits (2 bits/vertex) indicating which stream each vertex should be sent to (when multiple transform feedback streams are in use). Fortunately, OpenGL only requires separate streams to be supported when the output type is points, and EndPrimitive() only has an effect when the output type is line_strip or triangle_strip, so it's not a problem that these two uses of the control data header are mutually exclusive. This patch modifies do_vec4_gs_prog() to determine the correct hardware settings for configuring the control data header, and modifies upload_gs_state() to propagate these settings to the hardware. In addition, it modifies do_vec4_gs_prog() to ensure that the output URB entry is large enough to contain both the output vertices *and* the control data header. Finally, it modifies vec4_gs_visitor so that it accounts for the size of the control data header when computing the offset within the URB where output vertex data should be stored. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> v2: Fixed incorrect handling of IVB/HSW differences. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Add a state atom to set up geometry shader state.Paul Berry2013-09-111-0/+144
v2: Do not attempt to share the code that uploads 3DSTATE_BINDING_TABLE_POINTERS_GS, 3DSTATE_SAMPLER_STATE_POINTERS_GS, or 3DSTATE_GS with VS. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> v3: Add _NEW_TRANSFORM to gen7_gs_state. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>