| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in
1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.
This also fixes amd_performance_monitor_measure piglit.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
This might avoid mistakes if the size is bumped in the future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To read out MP perf counters we use a compute shader and need to upload
input data like a 64-bits addr used to store the values and a sequence
ID for synchronization. Currently, this input data is uploaded as user
uniforms which means that it's sticked to c0[], but if a compute shader
from a real application is used, monitoring those performance counters
will just overwrite some data and miserably crash.
Instead, sticking the 64-bits addr and the sequence into the driver
constant buffer seems like much better and will allow to monitor
counters with GL 4.3 apps.
Tested on GF119 and GK110, but should not hurt anything on GK104.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
The GALLIUM_HUD does not yet expose a description for each events, but
this might be useful for developers who want to have a long description
of hw perf counters directly in the source code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
This might be useful to avoid breaking the current compute state when
monitoring MP perf counters because we use a compute kernel to read out
those counters. This has been initially suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Because compute support is not enabled by default for these chipsets,
NVF0_COMPUTE=1 needs to be used, along with GALLIUM_HUD to enable
performance counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
This is really verbose but most of the configuration will be reused
for SM35 (GK110).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
This mainly improves how we define the different list of queries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces pipe_grid_info which contains all information to
describe a launch_grid call. This will be used to implement indirect
compute in the same fashion as indirect draw.
Changes from v2:
- correctly initialize pipe_grid_info for nv50/nvc0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
v2. update for libdrm nouveau_drm::lib_version removal
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
|
|
| |
The destroy_query() helper was actually never called. This fixes
a memory leak while monitoring performance metrics.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
| |
I forgot to remove it when I refactored all performance metrics.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
| |
Those bits were related to old performance metrics support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
| |
These performance metrics will be re-introduced in an upcoming
patch that will follow the same design as Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
| |
inst_issued is performance metric not a hardware event on Kepler (SM30).
It will be re-introduced in an upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
| |
SM30 is the compute capability version for GK104/GK106/GK107.
This also introduces a new signal group selection called UNK0F.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
| |
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
|
|
| |
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. That's why, some MP counters
are different between these chipsets and we need to handle variants.
Signed-off-by: Samuel Pitoiet <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
This will help for handling HW SM queries variants on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.
Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory access have to be aligned to 128-bits. Note that this
doesn't happen when the card only has TPC.
This patch fixes the following dmesg fail:
gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offseted by the
slot id!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Queries which use more than one MP counters was misconfigured and
computing the final result was also wrong because sources need to
be configured on different hardware counters instead.
According to the blob, computing the result is now as follows:
FOR i..n
val += ctr[i] * pow(2, i)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x80000000.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and
has no effect on Fermi (although this enables MP counters on Kepler).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
The way we configure MP performance counters is going to pretty
different between Fermi and Kepler. Having two separate functions
is much better.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
Global performance counters (PCOUNTER) will be added to
nvc0_query_hw_pm.c/h files.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|