summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_gs_state.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Only emit 1 viewport when possible.Kenneth Graunke2016-10-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-0/+1
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-311-1/+1
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Combine CACHE_NEW_*_UNIT into BRW_NEW_GEN4_UNIT_STATE.Kenneth Graunke2014-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | On Gen4-5, unit state is specified as indirect state, rather than commands. If any unit state changes, we upload it via brw_state_batch and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which updates pointers to all unit state at once. Since there's only one command and state atom (brw_psp_urb_cs) that needs to know about this, there's no benefit to having six separate flags. We can combine CACHE_NEW_*_UNIT into a single flag. We also haven't cached these in a long time, so it doesn't make sense to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix. This also saves 12 * sizeof(void *) bytes of memory per context, as we remove useless aux_compare/aux_free functions for each CACHE bit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-4/+4
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404Jordan Justen2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Create a macro for setting a dirty bit.Paul Berry2014-09-011-1/+1
| | | | | | | This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Set the maximum VPIndexIan Romanick2014-01-201-0/+2
| | | | | | | | | | | | | At various stages the hardware clamps the gl_ViewportIndex to these values. Setting them to zero effectively makes gl_ViewportIndex be ignored. This is acutally useful in blorp (so that we don't have to modify all of the viewport / scissor state). v2: Use INTEL_MASK to create GEN6_CLIP_MAX_VP_INDEX_MASK. Suggested by Ken. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/alanh@tungstengraphics.com/alanh@vmware.com/ s/jens@tungstengraphics.com/jowen@vmware.com/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g s/keithw\?@tungstengraphics.com/keithw@vmware.com/g s/michel@tungstengraphics.com/daenzer@vmware.com/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/zack@tungstengraphics.com/zackr@vmware.com/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <brianp@vmware.com>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-5/+5
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: rename legacy gs structs and functions to ff_gs.Paul Berry2013-08-311-8/+9
| | | | | | | | "ff" is for "fixed function". This frees up the name "gs" to refer to user-defined geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-2/+1
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/gen4: Move unit state setup to emit() time.Eric Anholt2011-10-291-2/+2
| | | | | | | It is only needed in time for brw_psp_urb_cbs(), which is also an emit(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Paul Berry <stereotype441@gmail.com>
* i965: Add a type argument to brw_state_batch().Eric Anholt2011-07-111-1/+2
| | | | | | | | | I want to make brw_state_dump.c handle more than just the last statechange, so I want to keep track of what's in the batch state. By using AUB file numbering for most of these packets, this may be reusable for aub dumping. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use state streaming on programs, and state base address on gen5+.Eric Anholt2011-06-181-10/+9
| | | | | | | | | | There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
* i965/gen4: Move the GS state to state streaming.Eric Anholt2011-04-291-87/+40
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel: Annotate debug printout checks with unlikely().Eric Anholt2010-11-031-1/+1
| | | | | | | This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
* intel: Convert remaining dri_bo_emit_reloc to drm_intel_bo_emit_reloc.Eric Anholt2010-06-081-5/+3
| | | | | The new API makes so much more sense, I'd like to forget how the old one worked.
* intel: Change dri_bo_* to drm_intel_bo* to consistently use new API.Eric Anholt2010-06-081-3/+3
| | | | | The slightly less mechanical change of converting the emit_reloc calls will follow.
* intel: Clean up chipset name and gen num for IronlakeZhenyu Wang2010-04-211-1/+1
| | | | | | | | | Rename old IGDNG to Ironlake, and set 'gen' number for Ironlake as 5, so tracking the features with generation num instead of special is_ironlake flag. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
* Merge branch 'mesa_7_7_branch'Brian Paul2010-01-251-1/+0
|\ | | | | | | | | | | | | | | | | | | | | Conflicts: src/mesa/drivers/dri/intel/intel_screen.c src/mesa/drivers/dri/intel/intel_swapbuffers.c src/mesa/drivers/dri/r300/r300_emit.c src/mesa/drivers/dri/r300/r300_ioctl.c src/mesa/drivers/dri/r300/r300_tex.c src/mesa/drivers/dri/r300/r300_texstate.c
| * i965: Remove unnecessary headers.Vinson Lee2010-01-221-1/+0
| |
* | i965: Allow for variable-sized auxdata in the state cache.Eric Anholt2010-01-191-2/+1
| | | | | | | | | | | | Everything has been constant-sized until now, but constant buffer handling changes will make us want some additional variable sized array.
* | intel: Replace IS_IGDNG checks with intel->is_ironlake or needs_ff_sync.Eric Anholt2009-12-221-1/+2
|/ | | | Saves ~480 bytes of code.
* i965: Add support for 2 threads in the GS.Eric Anholt2009-09-041-1/+4
| | | | | This brings noop vertex shader throughput from 6.8M verts/sec to 10.4M verts/sec using GL_QUADs on my GM45.
* i965: add support for new chipsetsXiang, Haihao2009-07-131-0/+3
| | | | | | | | | | 1. new PCI ids 2. fix some 3D commands on new chipset 3. fix send instruction on new chipset 4. new VUE vertex header 5. ff_sync message (added by Zou Nan Hai <nanhai.zou@intel.com>) 6. the offset in JMPI is in unit of 64bits on new chipset 7. new cube map layout
* mesa: added "main/" prefix to includes, remove some -I paths from ↵Brian Paul2008-09-181-1/+1
| | | | Makefile.template
* intel: track bufmgr move to libdrm_intel and bufmgr_fake irq emit/wait change.Eric Anholt2008-09-101-5/+5
|
* Revert "Revert "Merge branch 'drm-gem'""Dave Airlie2008-08-241-7/+6
| | | | This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
* Revert "Merge branch 'drm-gem'"Dave Airlie2008-08-241-6/+7
| | | | | | | | This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03. Conflicts: src/mesa/drivers/dri/i965/brw_wm_surface_state.c
* intel-gem: Update to new check_aperture API for classic mode.Eric Anholt2008-08-081-2/+1
| | | | | | To do this, I had to clean up some of 965 state upload stuff. We may end up over-emitting state in the aperture overflow case, but that should be rare, and I'd rather have the simplification of state management.
* [intel-gem] Chase domain flag renaming in the DRM.Eric Anholt2008-06-111-1/+1
| | | | This is an API breakage only.
* [intel] Convert drivers to using libdrm bufmgr code.Eric Anholt2008-06-031-5/+5
|
* GEM: Make dri_emit_reloc take GEM domain flags instead of TTM flags.Eric Anholt2008-05-071-1/+1
| | | | | | The GEM flags are much more descriptive for what we need. Since this makes bufmgr_fake rather device-specific, move it to the intel common directory. We've wanted to do device-specific stuff to it before.
* i965: initial attempt at fixing the aperture overflowDave Airlie2008-04-181-2/+3
| | | | | | | | | Makes state emission into a 2 phase, prepare sets things up and accounts the size of all referenced buffer objects. The emit stage then actually does the batchbuffer touching for emitting the objects. There is an assert in dri_emit_reloc if a reloc occurs for a buffer that hasn't been accounted yet.
* [intel] Convert relocations to not be cleared out on buffer submit.Eric Anholt2008-01-031-19/+17
| | | | | | | | We have two consumers of relocations. One is static state buffers, which want the same relocation every time. The other is the batchbuffer, which gets thrown out immediately after submit. This lets us reduce repeated computation for static state buffers, and clean up the code by moving relocations nearer to where the state buffer is computed.
* [965] Convert GS unit to use a cache key instead of brw_cache_data.Eric Anholt2008-01-021-23/+62
|
* [965] Replace the state cache suballocator with direct dri_bufmgr use.Eric Anholt2007-12-141-4/+20
| | | | | | | | | | | | | | | | | | | | | | | The user-space suballocator that was used avoided relocation computations by using the general and surface state base registers and allocating those types of buffers out of pools built on top of single buffer objects. It also avoided calls into the buffer manager for these small state allocations, since only one buffer object was being used. However, the buffer allocation cost appears to be low, and with relocation caching, computing relocations for buffers is essentially free. Additionally, implementing the suballocator required a don't-fence-subdata flag to disable waiting on buffer maps so that writing new data didn't block on rendering using old data, and careful handling when mapping to update old data (which we need to do for unavoidable relocations with FBOs). More importantly, when the suballocator filled, it had no replacement algorithm and just threw out all of the contents and forced them to be recomputed, which is a significant cost. This is the first step, which just changes the buffer type, but doesn't yet improve the hash table to not result in full recompute on overflow. Because the buffers are all allocated out of the general buffer allocator, we can no longer use the general/surface state bases to avoid relocations, and they are set to 0 instead.
* [965] Replace various alignment code with a shared ALIGN() macro.Eric Anholt2007-10-041-1/+2
| | | | | | | | In the process, fix some alignment issues: - Scratch space allocation was aligned into units of 1KB, while the allocation wanted units of bytes, so we never allocated enough space for scratch. - GRF register count was programmed as ALIGN(val - 1, 16) / 16 instead of ALIGN(val, 16) / 16 - 1, which overcounted for val != 16n+1.
* Add Intel i965G/Q DRI driver.Eric Anholt2006-08-091-0/+89
This driver comes from Tungsten Graphics, with a few further modifications by Intel.