summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vs_state.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-051-11/+11
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-1/+2
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix cross-primitive scratch corruption when changing the per-thread ↵Francisco Jerez2016-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocation. I haven't found any mention of this in the hardware docs, but experimentally what seems to be going on is that when the per-thread scratch slot size is changed between two pipelined draw calls, shader invocations using the old and new scratch size setting may end up being executed in parallel, causing their scratch offset calculations to be based in a different partitioning of the scratch space, which can cause their thread-local scratch space to overlap leading to cross-thread scratch corruption. I've been experimenting with alternative workarounds, like emitting a PIPE_CONTROL with DC flush and CS stall between draw (or dispatch compute) calls using different per-thread scratch allocation settings, or avoiding reuse of the scratch BO if the per-thread scratch allocation doesn't exactly match the original. Both seem to be as effective as this workaround, but they have potential performance implications, while this should be basically for free. Fixes over 40 failures in our CI system with spilling forced on (including CTS, dEQP and Piglit failures) on a number of different platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test seems to be able to reproduce this bug consistently in the vertex shader on at least Gen4, Gen8 and Gen9 with spilling forced on. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-0/+1
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-311-1/+1
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop BRW_NEW_VERTEX_PROGRAM and _NEW_TRANSFORM from Gen4 VS state.Kenneth Graunke2014-12-041-3/+2
| | | | | | | | | | These stopped being necessary in commit ab973403e445cd8211dba4e87e0. v2: Update commit message with a better explanation (thanks to Eric Anholt for doing the git archaeology). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Store floating point mode choice in brw_stage_prog_data.Kenneth Graunke2014-12-041-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | We use IEEE mode for GLSL programs, but need to use ALT mode for ARB programs so that 0^0 == 1. The choice is based entirely on the shader source language. Previously, our code to determine which mode we wanted was duplicated in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8). The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing as well - we use CurrentProgram (non-derived state), but _Shader (derived state). It also relies on knowing that ARB programs don't use gl_shader_program structures today. The compiler already makes this assumption in a few places, but I'd rather keep that assumption out of the state upload code. With this patch, we select the mode at compile time, and store that choice in prog_data. The state upload code simply uses that decision. This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make Gen4-5 and Gen8+ ALT checks use ctx->_Shader too.Kenneth Graunke2014-12-041-1/+1
| | | | | | | | | | | | | Commit c0347705 changed the Gen6-7 code to use ctx->_Shader rather than ctx->Shader, but neglected to change the Gen4-5 or Gen8+ code. This might fix SSO related bugs, but ALT mode is only used for ARB programs, so if there's an actual problem, it's likely no one would run into it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Move CACHE_NEW_SAMPLER to BRW_NEW_SAMPLER_STATE_TABLE.Kenneth Graunke2014-11-291-3/+3
| | | | | | | | | | | | | | | | | | | This flag signifies that we've emitted a new SAMPLER_STATE table. Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't a great name. Putting it in the BRW_NEW_* hierarchy would make more sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose. When this flag is raised, the pointer to the SAMPLER_STATE table has changed, so we need to re-issue any packets which point to it (unit state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the per-stage variants on Gen7+). Saves 2 * sizeof(void *) bytes per context, as we remove useless aux_compare/aux_free function pointers. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move some /* CACHE_NEW_SAMPLER */ comments.Kenneth Graunke2014-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong. The number of samplers used by each program is actually computed at draw time (brw_try_draw_prims), based purely on the currently bound shader programs (gl_program::SamplersUsed). CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table. Although this could indicate that the number of samplers has changed, it could also simply mean that the contents of the table has changed (i.e. we've bound different textures). The real reason these atoms depend on CACHE_NEW_SAMPLER is because they include a pointer to the SAMPLER_STATE table. This was not commented. So, move the comments to the appropriate place. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Combine CACHE_NEW_*_UNIT into BRW_NEW_GEN4_UNIT_STATE.Kenneth Graunke2014-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | On Gen4-5, unit state is specified as indirect state, rather than commands. If any unit state changes, we upload it via brw_state_batch and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which updates pointers to all unit state at once. Since there's only one command and state atom (brw_psp_urb_cs) that needs to know about this, there's no benefit to having six separate flags. We can combine CACHE_NEW_*_UNIT into a single flag. We also haven't cached these in a long time, so it doesn't make sense to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix. This also saves 12 * sizeof(void *) bytes of memory per context, as we remove useless aux_compare/aux_free functions for each CACHE bit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-6/+7
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404Jordan Justen2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke2014-09-031-4/+4
| | | | | | | | All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Create a macro for setting a dirty bit.Paul Berry2014-09-011-1/+1
| | | | | | | This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Move dispatch_grf_start_reg and first_curbe_grf into stage_prog_data.Eric Anholt2014-07-021-1/+1
| | | | | | | I wanted to access this value from stage-generic code, so stop storing it under two different names. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use unreachable() instead of unconditional assert().Matt Turner2014-07-011-2/+2
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* mesa: Replace ctx->Shader.Current{Vertex,Fragment,Geometry}Program with an ↵Paul Berry2014-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | array. These are replaced with ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In patches to follow, this will allow us to replace a lot of ad-hoc logic with a variable index into the array. With the exception of the changes to mtypes.h, this patch was generated entirely by the command: find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \ -print0 | xargs -0 sed -i \ -e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \ -e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \ -e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g' Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Brian Paul <brianp@vmware.com>
* i965: Use the new drm_intel_bo offset64 field.Kenneth Graunke2014-01-201-2/+2
| | | | | | | | | | | | | | | | | | | libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to replace the old 'unsigned long offset' field. To preserve ABI, libdrm continues to store the presumed offset in both locations. On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses. However, with a 32-bit userspace, the 'unsigned long offset' field will only be 32-bit, which is not large enough to hold this value. We need to use a proper uint64_t (like the kernel does). Technically, a lot of this code doesn't affect Broadwell, so we could leave it using the old field. But it makes sense to just switch to the new, properly typed field. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/alanh@tungstengraphics.com/alanh@vmware.com/ s/jens@tungstengraphics.com/jowen@vmware.com/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g s/keithw\?@tungstengraphics.com/keithw@vmware.com/g s/michel@tungstengraphics.com/daenzer@vmware.com/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/zack@tungstengraphics.com/zackr@vmware.com/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <brianp@vmware.com>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-5/+5
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Tell the unit states how many binding table entries we have.Eric Anholt2013-11-051-1/+2
| | | | | | | | | | Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to dynamically assign our binding table indices, we didn't really track our binding table count per shader, so we never filled in these fields. Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move data from brw->vs into a base class if gs will also need it.Paul Berry2013-08-311-11/+14
| | | | | | | | | This paves the way for sharing the code that will set up the vertex and geometry shader pipeline state. v2: Rename the base class to brw_stage_state. Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965/vec4: Allow for dispatch_grf_start_reg to vary.Paul Berry2013-08-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control, which determines the register where the hardware delivers data sourced from the URB (push constants followed by per-vertex input data). For vertex shaders, we always set dispatch_grf_start_reg to 1, since R1 is always the first register available for push constants in vertex shaders. For geometry shaders, we'll need the flexibility to set dispatch_grf_start_reg to different values depending on the behvaiour of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to set it to 2 to allow the primitive ID to be delivered to the thread in R1. This patch eliminates the assumption that dispatch_grf_start_reg is always 1. In vec4_visitor, we record the regnum that was passed to vec4_visitor::setup_uniforms() in prog_data for later use. In vec4_generator, we consult this value when converting an abstract UNIFORM register to a concrete hardware register. And in the code that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the value recorded in prog_data. This will allow us to set dispatch_grf_start_reg to the appropriate value when compiling geometry shaders. Vertex shaders will continue to always use a dispatch_grf_start_reg of 1. v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint". Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Upload separate per-stage sampler state tables.Kenneth Graunke2013-08-191-2/+2
| | | | | | | | | | | | | | Also upload separate sampler default/texture border color entries. At the moment, this is completely idiotic: both tables contain exactly the same contents, so we're simply wasting batch space and CPU time. However, soon we'll only upload data for textures actually /used/ in a particular stage, which will usually make the VS table empty and very likely eliminate all redundancy. This is just a stepping stone. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Split sampler count variable to be per-stage.Kenneth Graunke2013-08-191-2/+2
| | | | | | | | | | | Currently, we only have a single sampler state table shared among all stages, so we just copy wm.sampler_count into vs.sampler_count. In the future, each shader stage will have its own SAMPLER_STATE table, at which point we'll need these separate sampler counts. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/vs: set up sampler state pointer for Gen4/5.Chris Forbes2013-07-311-6/+21
| | | | | | | | | | Fixes broken filter and lod selection for vertex texturing. (txs/txf only worked properly because they ignore the sampler state completely) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org
* i965: Gen4/5: use IEEE floating point mode for GLSL shaders.Chris Forbes2013-07-141-1/+8
| | | | | | | Fixes isinf(), isnan() from GLSL 1.30 Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-3/+2
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::is_<platform> flags to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::batch to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts.Paul Berry2013-04-111-6/+8
| | | | | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vs: Remove support for the old parameter layout.Kenneth Graunke2012-11-011-11/+1
| | | | | | Only the old backend used it. Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Fix incorrect comment about single program flow on Ironlake.Kenneth Graunke2011-12-051-1/+1
| | | | | | | | The code forces single program flow to be enabled on Ironlake, or equivalently, disables multiple program flow. The comment was reversed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Use 0 for the number of binding table entries in 3DSTATE_(VS|WM).Kenneth Graunke2011-11-101-6/+1
| | | | | | | | | | | | | | | | | | | | | | These fields control how many entries the hardware prefetches into the state cache, so they only impact performance, not correctness. However, it's not clear how to use this in a way that's beneficial. According to the documentation, kernels "using a large number" of entries may wish to program this to zero to avoid thrashing the cache; it's unclear how many is too many. Also, Ironlake's WM was missing this feature entirely---the count had to be zero. The dirty bit tracking to handle this complicates the surface state and binding table setup; removing it should simplify things and make future refactoring easier. So just set 0 for the number of entries rather than trying to compute and track it. Appears to have no impact on Nexuiz and OpenArena on Sandybridge. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/gen4: Move unit state setup to emit() time.Eric Anholt2011-10-291-2/+2
| | | | | | | It is only needed in time for brw_psp_urb_cbs(), which is also an emit(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Paul Berry <stereotype441@gmail.com>
* i965: Rename (vs|wm)_max_threads to max_(vs|wm)_threads for consistency.Kenneth Graunke2011-10-251-1/+1
| | | | | | | | | The inconsistency between vs_max_threads and max_vs_entries was rather annoying. I could never seem to remember which one was reversed, which made it harder to find quickly. "Max __ Threads" seems more natural. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965 new VS: don't share clip plane constants in pre-GEN6Paul Berry2011-09-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In pre-GEN6, when using clip planes, both the vertex shader and the clipper need access to the client-supplied clip planes, since the vertex shader needs them to set the clip flags, and the clipper needs them to determine where to insert new vertices. With the old VS backend, we used a clever optimization to avoid placing duplicate copies of these planes in the CURBE: we used the same block of memory for both the clipper and vertex shader constants, with the clip planes at the front of it, and then we instructed the clipper to read just the initial part of this block containing the clip planes. This optimization was tricky, of dubious value, and not completely working in the new VS backend, so I've removed it. Now, when using the new VS backend, separate parts of the CURBE are used for the clipper and the vertex shader. Note that this doesn't affect the number of push constants available to the vertex shader, it simply causes the CURBE to occupy a few more bytes of URB memory. The old VS backend is unaffected. GEN6+, which does clipping entirely in hardware, is also unaffected. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vs: Fix setup of scratch space pointer on pre-gen6.Eric Anholt2011-09-061-0/+10
| | | | | | | We were failing to relocate, so on the first draw run our scratch would tend to get written to 0x0. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add remaining scratch space setup emit to unit states.Eric Anholt2011-08-161-0/+10
|
* i965: Add a type argument to brw_state_batch().Eric Anholt2011-07-111-1/+2
| | | | | | | | | I want to make brw_state_dump.c handle more than just the last statechange, so I want to keep track of what's in the batch state. By using AUB file numbering for most of these packets, this may be reusable for aub dumping. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen4: Fix GPU hangs since the program streaming change.Eric Anholt2011-07-091-1/+1
| | | | | | | | | | This was tricky. We were doing a use-before-initialize of grf_reg_count, but the value usually got overwritten anyway -- when we didn't have to do a relocation (typical), or on gen5 when we didn't have relocations at all. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen4: Remove old VS unit state key structure.Eric Anholt2011-06-201-12/+0
| | | | We're streaming VS state out now, not caching it.
* i965: Use state streaming on programs, and state base address on gen5+.Eric Anholt2011-06-181-9/+9
| | | | | | | | | | There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
* i965/gen4: Move VS state to state streaming.Eric Anholt2011-04-291-82/+48
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Apply a workaround for the Ironlake "vertex flashing".Eric Anholt2011-03-041-1/+8
| | | | | | | | | | | | | | This is an awful hack and will hurt performance on Ironlake, but we're at a loss as to what's going wrong otherwise. This is the only common variable we've found that avoids the problem on 4 applications (CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other variables we've tried appear to only be confounding. Neither the specifications nor the hardware team have been able to provide any enlightenment, despite much searching. https://bugs.freedesktop.org/show_bug.cgi?id=29172 Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper) Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
* intel: Annotate debug printout checks with unlikely().Eric Anholt2010-11-031-1/+1
| | | | | | | This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
* Drop GLcontext typedef and use struct gl_context insteadKristian Høgsberg2010-10-131-1/+1
|