summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen7_vs_state.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-051-6/+8
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: Drop _NEW_TRANSFORM from 3DSTATE_VS atom on Gen7.Kenneth Graunke2016-10-041-1/+1
| | | | | | | | | | | | | | The atom that uploads push constants listens to _NEW_TRANSFORM for legacy clip plane handling. On Sandybridge, the gen6_vs_state atom emits 3DSTATE_CONSTANT_VS as well as 3DSTATE_VS, so it needs to listen to the same set of conditions. However, it looks like Gen7 doesn't need this. The push constant atom emits 3DSTATE_CONSTANT_VS directly, and the gen7_vs_state atom that emits 3DSTATE_VS doesn't have a dependency on ctx->Transform. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* anv/gen7: Make use of local variable prog_dataAnuj Phogat2016-10-041-2/+2
| | | | | Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-1/+2
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix cross-primitive scratch corruption when changing the per-thread ↵Francisco Jerez2016-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocation. I haven't found any mention of this in the hardware docs, but experimentally what seems to be going on is that when the per-thread scratch slot size is changed between two pipelined draw calls, shader invocations using the old and new scratch size setting may end up being executed in parallel, causing their scratch offset calculations to be based in a different partitioning of the scratch space, which can cause their thread-local scratch space to overlap leading to cross-thread scratch corruption. I've been experimenting with alternative workarounds, like emitting a PIPE_CONTROL with DC flush and CS stall between draw (or dispatch compute) calls using different per-thread scratch allocation settings, or avoiding reuse of the scratch BO if the per-thread scratch allocation doesn't exactly match the original. Both seem to be as effective as this workaround, but they have potential performance implications, while this should be basically for free. Fixes over 40 failures in our CI system with spilling forced on (including CTS, dEQP and Piglit failures) on a number of different platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test seems to be able to reproduce this bug consistently in the vertex shader on at least Gen4, Gen8 and Gen9 with spilling forced on. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-0/+1
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
* i965: Extract push constant state to a new fileBen Widawsky2016-02-171-75/+0
| | | | | | | | | | | Every stage has a corresponding 3DSTATE_CONSTANT_XS packet, so having the code to create and emit push constant buffers in genX_vs_state.c is a little strange. Moving it to a separate file seems more logical. v2 [Ken]: Rebase on master, explain motivation in the commit message. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Don't tell the hardware about our UAV access.Francisco Jerez2015-10-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hardware documentation relating to the UAV HW-assisted coherency mechanism and UAV access enable bits is scarce and sometimes contradictory, and there's quite some guesswork behind this commit, so let me summarize the background first: HSW and later hardware have infrastructure to support a stricter form of data coherency between shader invocations from separate primitives. The mechanism is controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and _PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on the 3DPRIMITIVE command. Regardless of whether "UAV Coherency Required" is set, the hardware fixed-function units will increment a per-stage semaphore for each request received if "Accesses UAV" is set for the same or any lower stage. An implicit DC flush is emitted by the lowermost stage with "Accesses UAV" set once it's done processing the request, this also happens regardless of the value of "UAV Coherency Required". The completion of the DC flush will cause the same stage and all previous ones to decrement the semaphore, marking the UAV accesses for the primitive as coherent with L3. The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline stall before any threads are dispatched for the first FF stage with "Accesses UAV" set until the semaphore is cleared for the same stage. Effectively this guarantees that UAV memory accesses performed by previous primitives from any stage will be strictly ordered (and thanks to the implicit DC flush visible in memory) with UAV accesses from the following primitives. None of this is required by the usual image, atomic counter and SSBO GL APIs which have very relaxed cross-primitive coherency and ordering requirements, so we don't actually ever set the "UAV Coherency Required" bit -- Ordering with respect to shader invocations from previous stages on the same primitive where there is a data dependency is of course already guaranteed as the spec requires, regardless of this mechanism being enabled. We do set the "Accesses UAV" bits though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which this patch partially reverts), mainly because of comments like the following from the BDW PRM: > 3DSTATE_GS >[...] > 12 Accesses UAV > Format: Enable > This field must be set when GS has a UAV access. There are similar comments in the documentation for the other 3DSTATE_*S commands. The "must" part is misleading and unjustified AFAIK. Most of the "Accesses UAV" bits don't seem to have any side effects other than the implicit DC flushes and the related book-keeping in anticipation for a subsequent primitive with "UAV Coherency Required" set, so in most cases they are unnecessary and may incur a performance penalty. There is an exception though. On Gen8+ the PS_EXTRA UAV access bit influences the calculation of the PS UAV-only and ThreadDispatchEnable signals which on previous generations were set explicitly by the driver, so we cannot always avoid enabling it on the PS stage. The primary motivation for this change is that in fact the hardware coherency mechanism is buggy and will cause a rather non-deterministic hang on Gen8 when VS is the only stage with "Accesses UAV" set and the processing of a request terminates immediately after the implicit DC flush is sent for a previous primitive with no additional vertices being emitted for the second primitive, what will cause the hardware to skip sending a second DC flush and cause the VS to stall indefinitely waiting for a response from the DC (BDWGFX HSD 1912017). This hardware bug can be reproduced on current master with the spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit subtest (if you have the patience to run it a few dozen times). The proposed workaround is to insert CS STALLs speculatively between 3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage only. Because this would affect one of the hottest paths in the driver and likely decrease performance even further due to the unnecessary serialization, and because we don't actually need the implicit DC flushes, it seems better to just disable them. Cc: 11.0 <mesa-stable@lists.freedesktop.org>
* i965/gen7-8: Poke the 3DSTATE UAV access enable bits.Francisco Jerez2015-08-111-5/+8
| | | | | | v2: Set the PS UAV-only bit on HSW (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Trivial formatting changes in gen7_vs_state.cIan Romanick2015-08-031-5/+5
| | | | | | Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
* i965/gen9: Implement Push Constant Buffer workaroundBen Widawsky2015-06-221-7/+41
| | | | | | | | | | | | | | | | | | | | | | | This implements a workaround (exact excerpt as a comment in the code). The docs specify [clearly, after you struggle for a while] that the offset isn't relative to state base. This actually makes sense. This fixes hangs on SKL. Buffer #0 is meant to be used for normal uniforms. Buffer #1 is typically used for gather constants when using RS. Buffer #1-#3 could be used to push a bunch of UBO data which would just be somewhere in memory, and not relative to the dynamic state. NOTE: I've moved away from the ternary operator for the new gen9 conditions. Admittedly it's probably not great to do this, but I really want to fix this all up in the subsequent patch and doing it here makes that diff a lot nicer. I want to split out the gen8/9 code to make the function a bit more readable, but to keep this easily cherry-pickable I am doing this fix first. If we decide not to merge the cleanup patch then I can revisit this. Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-311-1/+1
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/skl: Force a BINDING_TABLE_POINTER_* after push constant commandNeil Roberts2015-01-301-0/+7
| | | | | | | | | | | | | According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This patch just makes it set the BRW_NEW_SURFACES state when uploading the push constants to ensure the binding tables will be updated. This fixes the fbo-blending-formats Piglit test and possibly others. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop BRW_NEW_VERTEX_PROGRAM from Gen7+ 3DSTATE_VS atoms.Kenneth Graunke2014-12-041-1/+0
| | | | | | | | | | | We don't access brw->vertex_program or ctx->_Shader since the previous commit, so we don't need this dirty bit. I think it's still necessary on Gen6 because it still conflates constant uploading with unit state uploading. We can fix that later. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Store floating point mode choice in brw_stage_prog_data.Kenneth Graunke2014-12-041-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | We use IEEE mode for GLSL programs, but need to use ALT mode for ARB programs so that 0^0 == 1. The choice is based entirely on the shader source language. Previously, our code to determine which mode we wanted was duplicated in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8). The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing as well - we use CurrentProgram (non-derived state), but _Shader (derived state). It also relies on knowing that ARB programs don't use gl_shader_program structures today. The compiler already makes this assumption in a few places, but I'd rather keep that assumption out of the state upload code. With this patch, we select the mode at compile time, and store that choice in prog_data. The state upload code simply uses that decision. This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-3/+3
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke2014-09-031-2/+2
| | | | | | | | All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.Eric Anholt2014-07-021-1/+1
| | | | | | | I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove unneeded VS workaround stalls on Baytrail.Greg Hunt2014-06-261-1/+1
| | | | | | | | | | | | | According to the workarounds list, these stalls aren't needed on production Baytrail systems. Piglit confirms that as well. These cause a small slowdown when we are sending a large number of small batches to the GPU. Removing these improves performance by up to 5% on some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom, Multithread, ShMapPcf). Signed-off-by: Gregory Hunt <greg.hunt@mobica.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move push constant state packets to push constant update time.Eric Anholt2014-05-021-6/+2
| | | | | | | -0.553779% +/- 0.423394% effect on cairo-perf-trace runtime on glamor (n=612) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Merge gen8_upload_constant_state into gen7_upload_constant_state.Eric Anholt2014-05-021-3/+13
| | | | | | | The two paths are really similar, and the extra conditionals will be dwarfed by the cost of the actual upload. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Refactor gen7_upload_constant_state to look more like gen8.Eric Anholt2014-05-021-25/+15
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop unnecessary state flag for units on NEW_BINDING_TABLE.Eric Anholt2014-05-021-1/+0
| | | | | | | | | | | Commit 30259856a8a82a55c030df1ad052e505c61144bc moved the state packets to table generation time, but forgot to make this change. Apparently the performance win there was about not reemitting the table pointers on unrelated state changes. No performance difference on cairo on glamor (n=118). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen7+: Move sampler state packets to the stage sampler state table update.Eric Anholt2014-05-021-7/+1
| | | | | | | | | | | | Now that we have the stage state coming into our setup of sampler states, it's easy to drop an identifier into it of which stage the stage_state is, and then look up which packet to emit in a little table. No performance difference on cairo on glamor (n=492). v2: Don't forget to do the workaround flush on IVB. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa/sso: rename Shader to the pointer _ShaderGregory Hainaut2014-03-251-1/+1
| | | | | | | | | | | | | | | | Basically a sed but shaderapi.c and get.c. get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior shaderapi.c => the old api stil update the Shader object directly V2: formatting improvement V3 (idr): * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c. * Trivial reformatting. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Move binding table update packets to binding table setup time.Eric Anholt2014-03-101-6/+0
| | | | | | | | | | | | | | | | | | | This keeps us from needing to reemit all the other stage state just because a surface changed. Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869% (n=296). [v1] v2: - Drop binding table packets from Gen8 unit state as well. - Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table, cutting even more code. v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2). Signed-off-by: Eric Anholt <eric@anholt.net> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2, v3] Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
* i965: Only emit VS state pipe control workaround on IVB and BYT.Kenneth Graunke2014-02-271-1/+2
| | | | | | | | | | | | | | According to the BSpec's 3D workarounds page, this is unnecessary on shipping Haswell hardware, and was never necessary on Broadwell. It unfortunately doesn't say anything about Baytrail. The workaround database confirms those results for Ivybridge, Haswell, and Broadwell. Baytrail is less clear - one page says it's necessary, while the other says it isn't. For now, be conservative and leave it enabled. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* mesa: Replace ctx->Shader.Current{Vertex,Fragment,Geometry}Program with an ↵Paul Berry2014-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | array. These are replaced with ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In patches to follow, this will allow us to replace a lot of ad-hoc logic with a variable index into the array. With the exception of the changes to mtypes.h, this patch was generated entirely by the command: find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \ -print0 | xargs -0 sed -i \ -e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \ -e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \ -e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g' Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Brian Paul <brianp@vmware.com>
* i965: Tell the unit states how many binding table entries we have.Eric Anholt2013-11-051-1/+3
| | | | | | | | | | Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to dynamically assign our binding table indices, we didn't really track our binding table count per shader, so we never filled in these fields. Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen7: Extract a function for setting up a shader stage's constants.Paul Berry2013-09-111-25/+36
| | | | | | | | This will allow us to reuse some code when setting up the geometry shader stage. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make sure constants re-sent after constant buffer reallocation.Paul Berry2013-08-311-1/+2
| | | | | | | | | | | | | | | | | | | | The hardware requires that after constant buffers for a stage are allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} command, and prior to execution of a 3DPRIMITIVE, the corresponding stage's constant buffers must be reprogrammed using a 3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command. Previously we didn't need to worry about this, because we only programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged). But now that we reallocate the constant buffers whenever geometry shaders are switched on and off, we need to make sure the constant buffers are reprogrammed. We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to brw->state.dirty.brw. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move data from brw->vs into a base class if gs will also need it.Paul Berry2013-08-311-8/+10
| | | | | | | | | This paves the way for sharing the code that will set up the vertex and geometry shader pipeline state. v2: Rename the base class to brw_stage_state. Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965/vec4: Allow for dispatch_grf_start_reg to vary.Paul Berry2013-08-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data). For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behvaiour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1. This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data. This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1. v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint". Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gen7: Set MOCS L3 cacheability for IVB/BYT (v2)Ville Syrjälä2013-08-211-3/+1
| | | | | | | | | | | | | | | | IVB/BYT also has the same L3 cacheability control in MOCS as HSW, so let's make use of it. pts/xonotic and pts/reaction @ 1920x1080 gain ~4% on my IVB GT2. Most other things show less gains/no regressions, except furmark which loses some 10 points. I didn't have a BYT at hand for testing. v2: Don't check (brw->gen == 7) in gen7 functions. (chadv) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Upload separate per-stage sampler state tables.Kenneth Graunke2013-08-191-1/+1
| | | | | | | | | | | | | | Also upload separate sampler default/texture border color entries. At the moment, this is completely idiotic: both tables contain exactly the same contents, so we're simply wasting batch space and CPU time. However, soon we'll only upload data for textures actually /used/ in a particular stage, which will usually make the VS table empty and very likely eliminate all redundancy. This is just a stepping stone. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Split sampler count variable to be per-stage.Kenneth Graunke2013-08-191-1/+1
| | | | | | | | | | | Currently, we only have a single sampler state table shared among all stages, so we just copy wm.sampler_count into vs.sampler_count. In the future, each shader stage will have its own SAMPLER_STATE table, at which point we'll need these separate sampler counts. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/hsw: Change L3 MOCS of 3DSTATE_CONSTANT_VS/PSChad Versace2013-07-181-1/+3
| | | | | | | | | | Change from "not cacheable" to "cacheable" in L3. Do so for the draw upload path and blorp. In blorp, change only the PS packet, because the VS packet is disabled. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Delete intel_context entirely.Kenneth Graunke2013-07-091-1/+1
| | | | | | | | | | This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::is_<platform> flags to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Shorten context base class dereference chains.Kenneth Graunke2013-07-091-2/+2
| | | | | | | | | ctx->DrawBuffer is much more sensible than brw->intel.ctx.DrawBuffer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Pass brw_context to functions rather than intel_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | | | | | | | | This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context. Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw." Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts.Paul Berry2013-04-111-3/+3
| | | | | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Update max VS/PS threads shift offsets for Haswell.Kenneth Graunke2012-03-301-1/+3
| | | | | | | These now start at bit 23 instead of bit 24/25. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Split the VS binding table to a separate table.Eric Anholt2012-02-211-1/+2
| | | | | | | | This is a step toward making the samplers/binding tables reflect sampler uniform mappings instead of embedding those in the programs. No significant performance difference on the microbenchmark (n=10). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Emit Ivybridge VS workaround flushes.Kenneth Graunke2012-02-151-0/+2
| | | | | | | | | | | | | I recently discovered this text in the BSpec. It seems wise to comply, though I haven't observed it to fix anything yet. Fixes a regression in glean/fbo since 28cfa1fa213fe. NOTE: This is a candidate for stable release branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45221 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Remove BRW_NEW_CURBE_OFFSETS dirty bit from Gen7 atoms.Kenneth Graunke2012-01-091-2/+1
| | | | | | | | | | The BRW_NEW_CURBE_OFFSETS dirty bit is only flagged by the brw_curbe_offsets state atom which is only used on Gen4-5. Since it's never flagged, there's no reason to depend on it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Remove BRW_NEW_URB_FENCE dirty bit from Gen6+ atoms.Kenneth Graunke2012-01-091-1/+0
| | | | | | | | | | The BRW_NEW_URB_FENCE dirty bit is only flagged by the brw_recalculate_urb_fence state atom which isn't used on Gen6+. Since it's never flagged, there's no reason to depend on it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Put a proper sampler count in 3DSTATE_VS.Kenneth Graunke2011-11-101-1/+2
| | | | | | | See similar code for 3DSTATE_WM. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>