summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/intel_batchbuffer.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: Roll intel_reg.h into brw_defines.hJason Ekstrand2016-08-191-1/+0
| | | | | | | | More than half of the stuff in intel_reg.h had nothing whatsoever to do with registers and really belongs in brw_defines.h anyway. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make room in the batch epilogue for three more pipe controls.Francisco Jerez2016-07-071-5/+5
| | | | | | | | | Review carefully, it sucks to have to keep track of the number of command packet dwords emitted in the batch epilogue manually. The MI_REPORT_PERF_COUNT_BATCH_DWORDS calculation was obviously wrong. Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Don't inline intel_batchbuffer_require_space().Matt Turner2016-03-301-26/+2
| | | | | | | | | | | | It's called by the inline intel_batchbuffer_begin() function which itself is used in BEGIN_BATCH. So in sequence of code emitting multiple packets, we have inlined this ~200 byte function multiple times. Making it an out-of-line function presumably improved icache usage. Improves performance of Gl32Batch7 by 3.39898% +/- 0.358674% (n=155) on Ivybridge. Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
* i965: Work around L3 state leaks during context switches.Francisco Jerez2015-12-091-1/+5
| | | | | | | | | | | | | | | | | | This is going to require some rather intrusive kernel changes to fix properly, in the meantime (and forever on at least pre-v4.1 kernels) we'll have to restore the hardware defaults at the end of every batch in which the L3 configuration was changed to avoid interfering with the DDX and GL clients that use an older non-L3-aware version of Mesa. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> v2: Optimize look-up of the default configuration by assuming it's the first entry of the L3 config array in order to avoid an FPS regression in GpuTest Triangle and SynMark OglBatch2-7 on most affected platforms. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i915, i965: Silence unused parameter warnings in intel_batchbuffer_advanceIan Romanick2015-09-101-0/+2
| | | | | | | | | | | | | These only occurred in release builds, but they occurred in every file that included intel_batchbuffer.h. Lots of spam. :( intel_batchbuffer.h: In function 'intel_batchbuffer_advance': intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter] intel_batchbuffer_advance(struct brw_context *brw) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Optimize batchbuffer macros.Matt Turner2015-07-151-17/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously OUT_BATCH was just a macro around an inline function which does brw->batch.map[brw->batch.used++] = dword; When making consecutive calls to intel_batchbuffer_emit_dword() the compiler isn't able to recognize that we're writing consecutive memory locations or that it doesn't need to write batch.used back to memory each time. We can avoid both of these problems by making a local pointer to the next location in the batch in BEGIN_BATCH(). Cuts 18k from the .text size. text data bss dec hex filename 4946956 195152 26192 5168300 4edcac i965_dri.so before 4928956 195152 26192 5150300 4e965c i965_dri.so after This series (including commit c0433948) improves performance of Synmark OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
* i965: Add and use USED_BATCH macro.Matt Turner2015-07-151-3/+6
| | | | | | | The next patch will replace the .used field with an on-demand calculation of batchbuffer usage. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
* i965: Split batch emission from relocation functions.Matt Turner2015-07-151-16/+18
| | | | | | | So that everything writing to the batch between BEGIN_BATCH() and ADVANCE_BATCH() goes through OUT_BATCH. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
* i965: Set brw->batch.emit only #ifdef DEBUG.Matt Turner2015-07-091-1/+1
| | | | | | | | | | | | | | | It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more importantly prevents a larger code size regression in the next commit when the .used field is replaced and calculated on demand. text data bss dec hex filename 4945468 195152 26192 5166812 4ed6dc i965_dri.so before 4943740 195152 26192 5165084 4ed01c i965_dri.so after And surround the emit and total fields with #ifdef DEBUG to prevent such mistakes from happening again. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/hsw: Implement end of batch workaroundBen Widawsky2015-07-091-0/+4
| | | | | | | | | | | | | | | | | | | | This patch can cause an infinite recursion if the previous patch titled, "i965: Track finished batch state" isn't present (backporters take notice). v2: Sent out the wrong patch originally. This patches switches the order of flushes, doing the generic flush before the CC_STATE, and the required workaround flush afterwards v3: Only perform workaround for render ring Add text to the BATCH_RESERVE comments v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two blocks. http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Reserve more batch space to accomodate Gen6 perfmonitors.Kenneth Graunke2015-07-061-2/+2
| | | | | | | | | | | | | | | Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's actually 5 DWords on Gen6-7. We've been reserving insufficient space for performance monitoring on Sandybridge, which means it would likely break if you used that functionality. (Thankfully, no one does...) Also, the existing number of 146 was the result of me flubbing up the arithmetic: it should have actually been 140. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965: Transplant PIPE_CONTROL routines to brw_pipe_controlChris Wilson2015-06-241-10/+0
| | | | | | | | | Start trimming the fat from intel_batchbuffer.c. First by moving the set of routines for emitting PIPE_CONTROLS (along with the lore concerning hardware workarounds) to a separate brw_pipe_control.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop some more dead code from the old CACHED_BATCH feature.Eric Anholt2014-03-181-1/+0
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Create a helper function for emitting PIPE_CONTROL writes.Kenneth Graunke2014-01-201-0/+3
| | | | | | | | | | | | | | | | | | | There are a lot of places that use PIPE_CONTROL to write a value to a buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT). Creating a single function to do this seems convenient. As part of this refactor, we now set the PPGTT/GTT selection bit correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms. This is correct for Sandybridge, but actually part of the address on Ivybridge and later! Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to adjust that in substantially fewer places, giving us confidence that we've hit them all. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Create a helper function for emitting PIPE_CONTROL flushes.Kenneth Graunke2014-01-201-0/+1
| | | | | | | | | | | | | | | These days, we need to emit PIPE_CONTROL flushes all over the place. Being able to do that via a single function call seems convenient. Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to do this in substantially fewer places. v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by Eric Anholt). Drop unlikely() from BLT_RING check. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Introduce an OUT_RELOC64 macro.Kenneth Graunke2014-01-201-0/+10
| | | | | | | | | Broadwell uses 48-bit addresses. The first DWord is the low 32 bits, and the second DWord is the high 16 bits. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Delete intel_batchbuffer_emit_reloc_fenced.Kenneth Graunke2014-01-201-5/+0
| | | | | | | | Nothing in i965 uses it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Remove CACHED_BATCH support altogether.Kenneth Graunke2014-01-171-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Using an unoptimized variant of glamor spending 50% of its CPU time in brw_draw_prims() (and hitting the cache *very* frequently): N Min Max Median Avg Stddev x 200 29200 40500 34900 34750 958.43256 + 200 31000 40300 34700 34622 916.35941 No difference proven at 95.0% confidence Similarly, no difference on GLB2.7: N Min Max Median Avg Stddev x 63 64.1 71.36 70.69 70.113175 1.6782026 + 63 63.6 71.18 70.75 70.223651 1.6044186 No difference proven at 95.0% confidence v2: Rebase on master (by anholt) v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH didn't have the asserts about batchbuffer usage that ADVANCE_BATCH does, so we started assertion failing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Replace OUT_RELOC_FENCED with OUT_RELOC.Kenneth Graunke2013-12-091-4/+0
| | | | | | | | | | | | | | On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently ignores the fenced flag: /* We never use HW fences for rendering on 965+ */ if (bufmgr_gem->gen >= 4) need_fence = false; Thanks to Eric for noticing this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Replace non-standard INLINE macro with "inline".Kenneth Graunke2013-12-051-7/+7
| | | | | | These are identical: main/compiler.h defines INLINE to "inline". Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Reserve batchbuffer space for a closing MI_REPORT_PERF_COUNT.Kenneth Graunke2013-11-211-1/+7
| | | | | | | | | | | | | | | | | | | In order to use the Observability Architecture effectively, we'll need to take snapshots of the OA counters via MI_REPORT_PERF_COUNT at the start and end of each batch. Experimentation reveals that we need to flush before and after each MI_REPORT_PERF_COUNT to get working values. For simplicitly, I chose to use intel_batchbuffer_emit_mi_flush(), which unfortunately expands to triple pipe controls on Sandybridge. We may want to start computing per-generation reserved batch space to avoid the insanity of Sandybridge's PIPE_CONTROL cost. That said, much of this cost existed before I rewrote the query object support to use hardware contexts, so it's at least not entirely new. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Start and stop OA counters as necessary.Kenneth Graunke2013-11-211-1/+2
| | | | | | | | | | | | | We need to start OA at the beginning of each batch where monitors are active. OACONTROL isn't part of the hardware context, so to avoid leaving counters enabled for other applications, we turn them off at the end of the batch too. We also need to start them at BeginPerfMonitor time (unless they've already been started). We stop them when the monitor last ends as well. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Introduce a "render ring prelude" hook.Kenneth Graunke2013-11-211-0/+5
| | | | | | | | | | The new intel_batchbuffer_emit_render_ring_prelude() hook will be called when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a place to emit state that should go at the start of each render ring batch, with minimal overhead. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Introduce an UNKNOWN_RING state.Kenneth Graunke2013-11-211-3/+8
| | | | | | | | | | | | | | | | | | | When we first create a batch buffer, it's empty. We don't actually know what ring it will be targeted at until the first BEGIN_BATCH or BEGIN_BATCH_BLT macro. Previously, one could determine the state of the batch by checking brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs. unknown). This should be functionally equivalent, but the tri-state enum is a bit clearer. v2: Catch three explicit require_space callers (thanks to Carl and Eric). v3: Split the boolean -> enum change from the UNKNOWN_RING change. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Convert brw->batch.is_blit to a BLT_RING/RENDER_RING enum.Kenneth Graunke2013-11-211-9/+11
| | | | | | | | Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more obvious than passing true or false. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/gen7: Emit workaround flush when changing GS enable state.Paul Berry2013-11-181-0/+1
| | | | | | | | | | v2: Don't go to extra work to avoid extraneous flushes. (Previous experiments in the kernel have suggested that flushing the pipeline when it is already empty is extremely cheap). Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Add missing state reset at the end of blorp.Eric Anholt2013-08-301-0/+1
| | | | | | | | | | | | | These are things that happen to be occurring because of the batch flush at the start of the blorp op (which exists to prevent batch space or aperture space overflow), but the intention was for this sequence of state resets at the end of blorp to be everything necessary for the next draw call. Found when debugging the next commit, by comparing brw_new_batch() and intel_batchbuffer_reset() to brw_blorp_exec(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Delete the BATCH_LOCALS macro.Kenneth Graunke2013-08-011-4/+0
| | | | | | | | | This hasn't done anything in a long time, and it's only used in a couple places...which means we couldn't use it without doing a bunch of work anyway. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-2/+1
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::batch to brw_context.Kenneth Graunke2013-07-091-12/+8
| | | | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Pass brw_context to functions rather than intel_context.Kenneth Graunke2013-07-091-35/+38
| | | | | | | | | | | | | | This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context. Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw." Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Replace #include "intel_context.h" with brw_context.h.Kenneth Graunke2013-07-091-1/+1
| | | | | | | | | | | brw_context.h includes intel_context.h, but additionally makes the brw_context structure available. Switching this allows us to start using brw_context in more places. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Replace maxBatchSize variable with BATCH_SZ define.Kenneth Graunke2013-07-031-1/+1
| | | | | | | | maxBatchSize was only ever initialized to BATCH_SZ, and a few places used BATCH_SZ directly anyway. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Move the remaining intel code to the i965 directory.Eric Anholt2013-06-261-0/+173
| | | | | | | | | Now that i915's forked off, they don't need to live in a shared directory. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chad Versace <chad.versace@linux.intel.com> Acked-by: Adam Jackson <ajax@redhat.com> (and I hear second hand that idr is OK with it, too)
* [965] Convert the driver to dri_bufmgr interface and enable TTM.Eric Anholt2007-12-071-133/+0
| | | | | | | | | | | | | This is currently believed to work but be a significant performance loss. Performance recovery should be soon to follow. The dri_bo_fake_disable_backing_store() call was added to allow backing store disable like bufmgr_fake.c did, which is a significant performance win (though it's missing the no-fence-subdata part). This commit is a squash merge of the 965-ttm branch, which had some history I wanted to avoid pulling due to noisiness and brokenness at many points for git-bisecting.
* Replace bmBufferOffset usage in batchbuffer setup with OUT_RELOC.Eric Anholt2007-10-041-0/+6
| | | | This is in preparation for 965 TTM.
* Revert "WIP 965 conversion to dri_bufmgr."Eric Anholt2007-09-271-2/+2
| | | | | | | This reverts commit b2f1aa2389473ed09170713301b042661d70a48e. Somehow I ended up with my branch's save-this-while-I-work-on-master commit actually on master.
* WIP 965 conversion to dri_bufmgr.Eric Anholt2007-09-271-2/+2
|
* Use unsigned long batchbuffer offset, fixes x64 warnings.Keith Whitwell2006-10-131-1/+1
|
* Cope with memory pool fragmentation by allowing a second attempt atKeith Whitwell2006-09-071-2/+2
| | | | | | | | rendering operations to take place after evicting all resident buffers. Cope better with memory allocation failures throughout the driver and improve tracking of failures.
* Add Intel i965G/Q DRI driver.Eric Anholt2006-08-091-0/+127
This driver comes from Tungsten Graphics, with a few further modifications by Intel.