path: root/src/mesa/drivers/dri/i965/brw_performance_monitor.c
Commit message | Author | Age | Files | Lines
* i965: Make room in the batch epilogue for three more pipe controls. | Francisco Jerez | 2016-07-07 | 1 | -5/+5

  Review carefully, it sucks to have to keep track of the number of command packet dwords emitted in the batch epilogue manually. The MI_REPORT_PERF_COUNT_BATCH_DWORDS calculation was obviously wrong.

  Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* Remove wrongly repeated words in comments | Giuseppe Bilotta | 2016-06-23 | 1 | -1/+1

  Clean up misrepetitions ('if if', 'the the' etc) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in fly-by

  v2:
  * proper commit message and non-joke title;
  * replace two 'as is' followed by 'is' to 'as-is'.

  v3:
  * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it)

  Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
  Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965: Use offset instead of index in brw_store_register_mem64 | Jordan Justen | 2016-05-04 | 1 | -3/+2

  This matches the byte based offset of brw_load_register_mem*.

  The function is also moved into intel_batchbuffer.c like brw_load_register_mem*.

  Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add and use USED_BATCH macro. | Matt Turner | 2015-07-15 | 1 | -3/+3

  The next patch will replace the .used field with an on-demand calculation of batchbuffer usage.

  Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
* i965: Rename intel_emit* to reflect their new location in brw_pipe_control | Chris Wilson | 2015-06-24 | 1 | -4/+4

  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Don't write past the end of the application supplied buffer | Ian Romanick | 2015-03-09 | 1 | -7/+12

  Both the AMD and Intel APIs provide a dataSize parameter, and this function would merrily ignore it. Neither API specifies what to do when the buffer isn't big enough. I take the easy route of writing all the complete bits of data that will fit. With more complete specs, we could probably do something different.

  I noticed this while looking into an unused parameter warning. The warning was actually useful!

      brw_performance_monitor.c: In function 'brw_get_perf_monitor_result':
      brw_performance_monitor.c:1261:37: warning: unused parameter 'data_size' [-Wunused-parameter]
          GLsizei data_size,
          ^

  v2: Fix checks to include offset in the calculation. Noticed by Jan.

  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
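The policy this commit describes (write only the complete pieces of data that fit, and include the running offset in every size check) might look roughly like the sketch below. It is an illustration only, not the code from brw_performance_monitor.c: the helper name `write_complete_values` and the flat array of 32-bit values are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the i965 implementation): copy as many complete
 * 32-bit values as fit into an application buffer of data_size bytes.
 * Returns the number of bytes actually written. */
static size_t
write_complete_values(void *data, size_t data_size,
                      const uint32_t *values, size_t count)
{
   size_t offset = 0;

   assert(data != NULL || data_size == 0);

   for (size_t i = 0; i < count; i++) {
      /* The check accounts for the bytes already written. Since offset
       * never exceeds data_size, the subtraction cannot wrap. */
      if (data_size - offset < sizeof(values[i]))
         break;

      memcpy((char *) data + offset, &values[i], sizeof(values[i]));
      offset += sizeof(values[i]);
   }

   return offset;
}
```

With a 10-byte buffer and three values, this writes two values (8 bytes) and leaves the last 2 bytes of the buffer untouched rather than emitting a partial value.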
* i965: Silence unused parameter warning | Ian Romanick | 2015-03-09 | 1 | -0/+1

  All dd functions take a gl_context as the first parameter. Instead of removing it, just silence the warning.

      brw_performance_monitor.c: In function 'brw_new_perf_monitor':
      brw_performance_monitor.c:1354:41: warning: unused parameter 'ctx' [-Wunused-parameter]
          brw_new_perf_monitor(struct gl_context *ctx)
          ^

  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Carl Worth <cworth@cworth.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Silence many 'static' is not at beginning of declaration warnings | Ian Romanick | 2015-03-09 | 1 | -13/+13

  What a useful warning. #ThanksGCC

      brw_performance_monitor.c:153:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_counter gen5_raw_chaps_counters[] = {
          ^
      brw_performance_monitor.c:185:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static int gen5_oa_snapshot_layout[] =
          ^
      brw_performance_monitor.c:221:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_group gen5_groups[] = {
          ^
      brw_performance_monitor.c:240:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_counter gen6_raw_oa_counters[] = {
          ^
      brw_performance_monitor.c:281:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static int gen6_oa_snapshot_layout[] =
          ^
      brw_performance_monitor.c:317:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_counter gen6_statistics_counters[] = {
          ^
      brw_performance_monitor.c:332:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static int gen6_statistics_register_addresses[] = {
          ^
      brw_performance_monitor.c:346:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_group gen6_groups[] = {
          ^
      brw_performance_monitor.c:356:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_counter gen7_raw_oa_counters[] = {
          ^
      brw_performance_monitor.c:402:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static int gen7_oa_snapshot_layout[] =
          ^
      brw_performance_monitor.c:470:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_counter gen7_statistics_counters[] = {
          ^
      brw_performance_monitor.c:493:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static int gen7_statistics_register_addresses[] = {
          ^
      brw_performance_monitor.c:515:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
          const static struct gl_perf_monitor_group gen7_groups[] = {
          ^

  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Carl Worth <cworth@cworth.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* util: Move Mesa's bitset.h to util/. | Eric Anholt | 2015-02-20 | 1 | -1/+1

  Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* i965: Use safer pointer arithmetic in gather_oa_results() | Chad Versace | 2014-12-22 | 1 | -1/+1

  This patch reduces the likelihood of pointer arithmetic overflow bugs in gather_oa_results(), like the one fixed by b69c7c5dac.

  I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I get nervous when I see code patterns like this:

      (void*) + (int) * (int)

  I smell 32-bit overflow all over this code.

  This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any potential overflow.

  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
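To illustrate why retyping one operand helps, here is a small sketch. The helper names `snapshot_offset` and `snapshot_address` are invented for the example and are not taken from gather_oa_results(). With `snapshot_size` declared as `ptrdiff_t`, the `int` index is converted before the multiplication, so the product is computed at `ptrdiff_t` width. That only widens the arithmetic on platforms where `ptrdiff_t` is wider than `int`, which is the usual case on 64-bit systems.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of the index-th snapshot. Because
 * snapshot_size is a ptrdiff_t, 'index' is converted to ptrdiff_t before
 * the multiply instead of the product being computed in int. */
static ptrdiff_t
snapshot_offset(int index, ptrdiff_t snapshot_size)
{
   assert(index >= 0 && snapshot_size > 0);
   return index * snapshot_size;
}

/* Hypothetical helper: address of the index-th snapshot in a mapped buffer.
 * The caller must ensure the result stays inside the buffer. */
static void *
snapshot_address(void *base, int index, ptrdiff_t snapshot_size)
{
   return (char *) base + snapshot_offset(index, snapshot_size);
}
```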
* util: Move ralloc to a new src/util directory. | Kenneth Graunke | 2014-08-04 | 1 | -1/+1

  For a long time, we've wanted a place to put utility code which isn't directly tied to Mesa or Gallium internals. This patch creates a new src/util directory for exactly that purpose, and builds the contents as libmesautil.la.

  ralloc seemed like a good first candidate. These days, it's directly used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl didn't make much sense.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

  v2 (Jason Ekstrand): More realloc uses and some scons fixes

  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* i965: Use unreachable() instead of unconditional assert(). | Matt Turner | 2014-07-01 | 1 | -3/+2

  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Check calloc return value in gather_statistics_results() | Juha-Pekka Heikkila | 2014-06-26 | 1 | -1/+14

  Check calloc return value and report on error, also later skip results handling if there was no memory to store results to.

  Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Use binary literals counter select. | Matt Turner | 2014-05-15 | 1 | -2/+2

  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix OACONTROL assertion failures on Ironlake. | Kenneth Graunke | 2013-12-03 | 1 | -4/+8

  I guarded half of the callers to start/stop_oa_counters with generation checks, but missed the other half (which were added later).

  OACONTROL doesn't exist on Ironlake, so we better not write it. Also, there's no need---Ironlake's performance counters are always running.

  This patch moves the generation checks into start/stop_oa_counters, rather than requiring the caller to do them.

  Fixes assertion failures in Piglit's AMD_performance_monitor/measure.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Take "bookend" OA snapshots at the start/end of each batch. | Kenneth Graunke | 2013-11-21 | 1 | -7/+357

  Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep.

  To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts. I call these "bookend" snapshots.

  Since there can be multiple performance monitors active at a time, we store the bookend snapshots in a global BO, shared by all monitors.

  For monitors that span multiple batches, acquiring results involves adding up three segments:

      BeginPerfMonitor   --> End of Batch 1      ("head")
      Start of Batch 2   --> End of Batch 2
      ...                                        ("middle")
      Start of Batch N-1 --> End of Batch N-1
      Start of Batch N   --> EndPerfMonitor      ("tail")

  Monitors that refer to bookend BO snapshots are considered "unresolved". We delay resolving them (and adding up deltas to obtain the results) as long as possible to avoid blocking on mapping monitor->oa_bo.

  We can also run out of space in the bookend BO, at which point we have to resolve all unresolved monitors. Then we can throw away the snapshots and begin writing at the beginning of the buffer.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
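The head/middle/tail summation can be sketched for a single counter as below. This is a simplified model, not the driver's code: the `bookend` structure, the function name, and the idea of passing snapshots as plain arrays are assumptions for illustration. It treats the counter as a 32-bit unsigned value (as the OA counters are described elsewhere in this log), so each delta is taken modulo 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of batch-level snapshots of one counter. */
struct bookend {
   uint32_t start;  /* value at the start of the batch */
   uint32_t end;    /* value at the end of the batch */
};

/* Sum the deltas for a monitor spanning num_batches batches.
 * batches[0] is the batch containing BeginPerfMonitor and
 * batches[num_batches - 1] the batch containing EndPerfMonitor. */
static uint64_t
accumulate_segments(uint32_t monitor_begin, uint32_t monitor_end,
                    const struct bookend *batches, size_t num_batches)
{
   /* A monitor contained in one batch needs no bookends at all. */
   if (num_batches <= 1)
      return (uint32_t) (monitor_end - monitor_begin);

   assert(batches != NULL);

   /* Head: BeginPerfMonitor --> end of the first batch. */
   uint64_t total = (uint32_t) (batches[0].end - monitor_begin);

   /* Middle: every batch fully covered by the monitor. */
   for (size_t i = 1; i + 1 < num_batches; i++)
      total += (uint32_t) (batches[i].end - batches[i].start);

   /* Tail: start of the last batch --> EndPerfMonitor. */
   total += (uint32_t) (monitor_end - batches[num_batches - 1].start);

   return total;
}
```

Counter activity between batches (for example from other programs) falls outside every segment and is therefore excluded from the total.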
* i965: Add some plumbing for gathering OA results. | Kenneth Graunke | 2013-11-21 | 1 | -0/+91

  Currently, this only considers the monitor start and end snapshots. This is woefully insufficient, but allows me to add a bunch of the infrastructure now and flesh it out later.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Start and stop OA counters as necessary. | Kenneth Graunke | 2013-11-21 | 1 | -0/+50

  We need to start OA at the beginning of each batch where monitors are active. OACONTROL isn't part of the hardware context, so to avoid leaving counters enabled for other applications, we turn them off at the end of the batch too.

  We also need to start them at BeginPerfMonitor time (unless they've already been started). We stop them when the monitor last ends as well.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Add functions to start and stop the OA counters. | Kenneth Graunke | 2013-11-21 | 1 | -0/+42

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Take OA counter snapshots at Begin/EndPerfMonitor time. | Kenneth Graunke | 2013-11-21 | 1 | -1/+37

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Add a function to emit the MI_REPORT_PERF_COUNT packet. | Kenneth Graunke | 2013-11-21 | 1 | -0/+76

  MI_REPORT_PERF_COUNT writes a snapshot of the Observability Architecture counters to a buffer. Exactly how it works varies between generations: Ironlake requires two packets, Sandybridge has to use GGTT, and Ivybridge and later use PPGTT.

  v2: Assert that we didn't use more space than we reserved (suggested by Eric Anholt).

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Track the number of monitors that need OA counters. | Kenneth Graunke | 2013-11-21 | 1 | -1/+16

  Using the OA counters requires some per-batch work. When starting and ending a batch, it's useful to know whether any monitors are actually interested in OA data.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Enumerate Observability Architecture counters on Gen5+. | Kenneth Graunke | 2013-11-21 | 1 | -0/+297

  In addition to listing the counter names, we include several "remap" tables. Confusingly, counters are documented with names like "A23", are written to some buffer offset other than 23, and exposed by core Mesa under a counter ID that is different still.

  The first is inevitable; MI_REPORT_PERF_COUNT writes certain counters to fixed locations in the buffer. The latter could be avoided, but core Mesa uses the "Counters" array index as the ID for a counter. We could do remapping there, but it would just complicate the core Mesa code.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
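One way to picture such a remap table is shown below. The table contents and both identifiers are invented for this example and do not reproduce the real gen5/gen6/gen7 snapshot layouts; the sketch only assumes that entry i holds the dword index in the snapshot buffer where the counter with exposed ID i was written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical remap table: exposed counter ID -> dword index in the
 * snapshot written by MI_REPORT_PERF_COUNT. Values are made up. */
static const int example_snapshot_layout[] = { 3, 7, 5 };

/* Fetch the raw value for an exposed counter ID from a mapped snapshot. */
static uint32_t
read_counter(const uint32_t *snapshot, const int *layout,
             size_t num_counters, size_t counter_id)
{
   assert(counter_id < num_counters);
   return snapshot[layout[counter_id]];
}
```

Keeping the table in the driver means the core code can continue to use the array index as the counter ID without knowing anything about buffer offsets.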
* i965: Expose pipeline statistics registers via performance monitors. | Kenneth Graunke | 2013-11-21 | 1 | -5/+141

  This is fairly simple:
  - At BeginPerfMonitor time, take an opening snapshot.
  - At EndPerfMonitor time, take a closing snapshot.
  - The first time the application asks for results, subtract the two and store that value. Then free the BO containing the snapshots.
  - On subsequent requests for the results, just return the saved value.
  - On reset, throw away the results.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
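The lifecycle in the list above can be modelled in plain C as follows. This is a sketch under stated assumptions, not the driver's implementation: ordinary heap memory stands in for the BO that the GPU writes snapshots into, and `stats_monitor`, `NUM_STATS`, and the four function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_STATS 2  /* hypothetical number of 64-bit statistics registers */

struct stats_monitor {
   uint64_t *snapshots;          /* [0, N) opening, [N, 2N) closing */
   uint64_t results[NUM_STATS];  /* valid once resolved */
   bool resolved;
};

/* On reset, throw away snapshots and results. */
static void
stats_reset(struct stats_monitor *m)
{
   free(m->snapshots);
   memset(m, 0, sizeof(*m));
}

/* BeginPerfMonitor: take the opening snapshot. */
static bool
stats_begin(struct stats_monitor *m, const uint64_t *regs)
{
   stats_reset(m);
   m->snapshots = calloc(2 * NUM_STATS, sizeof(uint64_t));
   if (m->snapshots == NULL)
      return false;
   memcpy(m->snapshots, regs, NUM_STATS * sizeof(uint64_t));
   return true;
}

/* EndPerfMonitor: take the closing snapshot. */
static void
stats_end(struct stats_monitor *m, const uint64_t *regs)
{
   assert(m->snapshots != NULL);
   memcpy(m->snapshots + NUM_STATS, regs, NUM_STATS * sizeof(uint64_t));
}

/* First request subtracts and frees the snapshots; later requests
 * return the saved value. */
static uint64_t
stats_result(struct stats_monitor *m, int i)
{
   if (!m->resolved && m->snapshots != NULL) {
      for (int j = 0; j < NUM_STATS; j++)
         m->results[j] = m->snapshots[NUM_STATS + j] - m->snapshots[j];
      free(m->snapshots);
      m->snapshots = NULL;
      m->resolved = true;
   }
   return m->results[i];
}
```

The monitor must be zero-initialized before its first use so that `stats_reset` does not free an indeterminate pointer.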
* i965: Enumerate the pipeline statistics register counters on Gen6+. | Kenneth Graunke | 2013-11-21 | 1 | -0/+78

  For now, we only support these on Gen6+, since that's what currently uses hardware contexts. When we add Ironlake hardware context support, we can add pipeline statistics register support for that as well.

  In theory, we could support pipeline statistics counters even without hardware contexts, but it would be annoyingly painful.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Initialize performance monitor Groups/NumGroups. | Kenneth Graunke | 2013-11-21 | 1 | -1/+35

  Since we don't support any counters, there are zero groups.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Add macros for creating performance monitor counters and groups. | Kenneth Graunke | 2013-11-21 | 1 | -0/+26

  The Observability Architecture counters are 32-bit unsigned values, and the Pipeline Statistics Register counters are 64-bit unsigned values. These convenience macros make it easy to create those types of counters.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Add basic driver hooks and plumbing for AMD_performance_monitor. | Kenneth Graunke | 2013-11-21 | 1 | -0/+219

  These stub functions will be filled out in later patches.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* Revert "i965: Add support for GL_AMD_performance_monitor on Ironlake." | Kenneth Graunke | 2013-11-07 | 1 | -391/+0

  This reverts most of commit 0f2da773070c06b6d20ad264d3abb19c4dfd9761. (I chose to leave the additions to brw_defines.h.)

  My previous Ironlake implementation was somewhat broken: counter data was global, rather than per-context. This meant that performance monitors captured data from your compositor, 2D driver, and other 3D programs.

  Originally, I believed that Sandybridge and later had an easy way to avoid this problem (setting per-context flags in OACONTROL), while Ironlake did not. So I'd intended to leave it as a known limitation of performance monitoring support on Ironlake.

  However, this turned out not to be true. Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep.

  To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts.

  For occlusion queries, this batch bookending approach is fairly simple: only one occlusion query can be active at a time, and the result is a single integer.

  Performance monitors are more complex: an arbitrary number of monitors can be active at a time, each monitoring some subset of our ~30 observability counters. Individual monitors can be started and stopped at any point during the batch. Tracking where each monitor started/ended relative to batch flushes ends up being a pain. And you can run out of space in the buffer.

  Properly supporting this required some serious rearchitecting of the code. Rather than writing patches to try and morph a broken system into a working one (which operates quite differently), I decided it would be simplest to revert the old code and start fresh. Parts will look familiar, but other parts are new.

  I also decided it would be best to include Sandybridge and Ivybridge support from the start, since the newer platforms have added complexity that I wanted to make sure worked. They're also what most people care about these days.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add support for GL_AMD_performance_monitor on Ironlake. | Kenneth Graunke | 2013-09-26 | 1 | -0/+391

  Ironlake's counters are always enabled; userspace can simply send a MI_REPORT_PERF_COUNT packet to take a snapshot of them. This makes it easy to implement.

  The counters are documented in the source code for the intel-gpu-tools intel_perf_counters utility.

  v2: Adjust for core data structure changes. Add a table mapping buffer object offsets to exposed counters (which changes each generation). Finally, add report ID assertions to sanity check the BO layout (thanks to Carl Worth).

  v3: Update for core BeginPerfMonitor hook changes (requested by Brian).

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>