summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen7_gs_state.c
diff options
context:
space:
mode:
authorFrancisco Jerez <currojerez@riseup.net>2015-08-13 15:02:05 +0300
committerFrancisco Jerez <currojerez@riseup.net>2015-10-09 17:48:26 +0300
commit5346c1167064d6429c6338974c6342f8346fd34b (patch)
treec1a4f16a886c489c4b634e580b6d2c5d27e59865 /src/mesa/drivers/dri/i965/gen7_gs_state.c
parentbb59ba8634e3e5e3949103c6013918b8a4953111 (diff)
downloadexternal_mesa3d-5346c1167064d6429c6338974c6342f8346fd34b.zip
external_mesa3d-5346c1167064d6429c6338974c6342f8346fd34b.tar.gz
external_mesa3d-5346c1167064d6429c6338974c6342f8346fd34b.tar.bz2
i965: Don't tell the hardware about our UAV access.
The hardware documentation relating to the UAV HW-assisted coherency mechanism and UAV access enable bits is scarce and sometimes contradictory, and there's quite some guesswork behind this commit, so let me summarize the background first: HSW and later hardware have infrastructure to support a stricter form of data coherency between shader invocations from separate primitives. The mechanism is controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and _PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on the 3DPRIMITIVE command. Regardless of whether "UAV Coherency Required" is set, the hardware fixed-function units will increment a per-stage semaphore for each request received if "Accesses UAV" is set for the same or any lower stage. An implicit DC flush is emitted by the lowermost stage with "Accesses UAV" set once it's done processing the request, this also happens regardless of the value of "UAV Coherency Required". The completion of the DC flush will cause the same stage and all previous ones to decrement the semaphore, marking the UAV accesses for the primitive as coherent with L3. The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline stall before any threads are dispatched for the first FF stage with "Accesses UAV" set until the semaphore is cleared for the same stage. Effectively this guarantees that UAV memory accesses performed by previous primitives from any stage will be strictly ordered (and thanks to the implicit DC flush visible in memory) with UAV accesses from the following primitives. None of this is required by the usual image, atomic counter and SSBO GL APIs which have very relaxed cross-primitive coherency and ordering requirements, so we don't actually ever set the "UAV Coherency Required" bit -- Ordering with respect to shader invocations from previous stages on the same primitive where there is a data dependency is of course already guaranteed as the spec requires, regardless of this mechanism being enabled. We do set the "Accesses UAV" bits though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which this patch partially reverts), mainly because of comments like the following from the BDW PRM: > 3DSTATE_GS >[...] > 12 Accesses UAV > Format: Enable > This field must be set when GS has a UAV access. There are similar comments in the documentation for the other 3DSTATE_*S commands. The "must" part is misleading and unjustified AFAIK. Most of the "Accesses UAV" bits don't seem to have any side effects other than the implicit DC flushes and the related book-keeping in anticipation for a subsequent primitive with "UAV Coherency Required" set, so in most cases they are unnecessary and may incur a performance penalty. There is an exception though. On Gen8+ the PS_EXTRA UAV access bit influences the calculation of the PS UAV-only and ThreadDispatchEnable signals which on previous generations were set explicitly by the driver, so we cannot always avoid enabling it on the PS stage. The primary motivation for this change is that in fact the hardware coherency mechanism is buggy and will cause a rather non-deterministic hang on Gen8 when VS is the only stage with "Accesses UAV" set and the processing of a request terminates immediately after the implicit DC flush is sent for a previous primitive with no additional vertices being emitted for the second primitive, what will cause the hardware to skip sending a second DC flush and cause the VS to stall indefinitely waiting for a response from the DC (BDWGFX HSD 1912017). This hardware bug can be reproduced on current master with the spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit subtest (if you have the patience to run it a few dozen times). The proposed workaround is to insert CS STALLs speculatively between 3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage only. Because this would affect one of the hottest paths in the driver and likely decrease performance even further due to the unnecessary serialization, and because we don't actually need the implicit DC flushes, it seems better to just disable them. Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Diffstat (limited to 'src/mesa/drivers/dri/i965/gen7_gs_state.c')
-rw-r--r--src/mesa/drivers/dri/i965/gen7_gs_state.c4
1 files changed, 1 insertions, 3 deletions
diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c
index 497ecec..8d6d3fe 100644
--- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
@@ -59,9 +59,7 @@ upload_gs_state(struct brw_context *brw)
OUT_BATCH(((ALIGN(stage_state->sampler_count, 4)/4) <<
GEN6_GS_SAMPLER_COUNT_SHIFT) |
((brw->gs.prog_data->base.base.binding_table.size_bytes / 4) <<
- GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT) |
- (brw->is_haswell && prog_data->base.nr_image_params ?
- HSW_GS_UAV_ACCESS_ENABLE : 0));
+ GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT));
if (brw->gs.prog_data->base.base.total_scratch) {
OUT_RELOC(stage_state->scratch_bo,