author | Kenneth Graunke <kenneth@whitecape.org> | 2016-05-04 23:44:25 -0700 |
---|---|---|
committer | Kenneth Graunke <kenneth@whitecape.org> | 2016-05-09 15:00:01 -0700 |
commit | 96d43f2d087e23ab692d43fc48fe1be30e923ae0 (patch) | |
tree | df3d1532a3993a98e0141003d63e4fc4a06cd2b7 /src/mesa/drivers/dri/i965/brw_context.c | |
parent | fdb6c1887f7b61ef49fb89e0b0928f65b2edf29b (diff) | |
i965: Reimplement ARB_transform_feedback2 on Haswell and later.
My old implementation accumulated <start, end> pairs in a buffer,
and eventually processed that data on the CPU. This meant flushing
the batchbuffer and waiting for it to completely execute before we
could map it, resulting in really long stalls. We could also run out
of space in the buffer, and have to do this early.
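
For illustration, the old scheme described above can be modelled on the CPU roughly as follows. This is a minimal sketch, not the driver's code: `pair_buffer`, `record_pair`, `tally` and the `MAX_PAIRS` capacity are hypothetical names and values chosen for the example, and the comment in `tally` marks where a real driver would have to flush the batchbuffer and wait for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the real buffer size is not stated in the commit. */
#define MAX_PAIRS 4

struct pair_buffer {
   uint64_t counters[2 * MAX_PAIRS]; /* start0, end0, start1, end1, ... */
   size_t used;                      /* number of complete pairs stored */
   uint64_t prims_so_far;            /* primitives folded in by earlier tallies */
   unsigned tallies;                 /* how often the CPU had to process the buffer */
};

/* CPU-side processing of the stored pairs. In a real driver this is the
 * expensive step: the batch must be flushed and finished before the
 * buffer can be mapped and read. */
static void tally(struct pair_buffer *buf)
{
   for (size_t i = 0; i < buf->used; i++)
      buf->prims_so_far += buf->counters[2 * i + 1] - buf->counters[2 * i];
   buf->used = 0;
   buf->tallies++;
}

/* Store one <start, end> pair, tallying early if the buffer is full. */
static void record_pair(struct pair_buffer *buf, uint64_t start, uint64_t end)
{
   assert(end >= start);
   if (buf->used == MAX_PAIRS)
      tally(buf);
   buf->counters[2 * buf->used] = start;
   buf->counters[2 * buf->used + 1] = end;
   buf->used++;
}
```

With a capacity of four pairs, recording a fifth pair forces one early tally, which is the "run out of space" case mentioned above.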
Instead, we can use Haswell's MI_MATH command to do the (end - start)
subtraction, as well as the multiplication by 2 or 3 to convert from
the number of primitives written to the number of vertices written.
We still need to CS stall to read the counters, but otherwise everything
is completely pipelined - there's no CPU<->GPU synchronization required.
It also uses only 80 bytes in the buffer, no matter what.
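
The arithmetic that moves to the GPU is small. The sketch below emulates it in plain C purely to show the calculation; in the commit it is performed with MI_MATH, not on the CPU. The `prim_mode` enum, the function names, and the idea of carrying a running total between calls are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GL primitive modes used by transform feedback. */
enum prim_mode { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

/* Vertices written per primitive: points need no scaling, lines are
 * multiplied by 2 and triangles by 3 (the v2 fix: 3, not 4). */
static unsigned vertices_per_primitive(enum prim_mode mode)
{
   switch (mode) {
   case PRIM_POINTS:    return 1;
   case PRIM_LINES:     return 2;
   case PRIM_TRIANGLES: return 3;
   }
   assert(!"unknown primitive mode");
   return 0;
}

/* Fold one <start, end> counter snapshot into a running vertex total:
 * total += (end - start) * vertices_per_primitive. */
static uint64_t accumulate_vertices(uint64_t total, uint64_t start,
                                    uint64_t end, enum prim_mode mode)
{
   assert(end >= start);
   return total + (end - start) * vertices_per_primitive(mode);
}
```

For example, counter snapshots of 100 and 110 with triangles contribute (110 - 100) * 3 = 30 vertices. Because only a running total is kept, the storage needed does not grow with the number of snapshots.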
Improves performance in Manhattan on Skylake GT3e at 800x600 by
6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance
by 2.82103% +/- 0.148596% (n=84).
v2: Fix number of primitives -> number of vertices calculation for
GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by
Jordan Justen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Diffstat (limited to 'src/mesa/drivers/dri/i965/brw_context.c')
-rw-r--r-- | src/mesa/drivers/dri/i965/brw_context.c | 14 |
1 files changed, 10 insertions, 4 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 1380d41..26514a0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -372,13 +372,18 @@ brw_init_driver_functions(struct brw_context *brw,
    functions->NewTransformFeedback = brw_new_transform_feedback;
    functions->DeleteTransformFeedback = brw_delete_transform_feedback;
-   functions->GetTransformFeedbackVertexCount =
-      brw_get_transform_feedback_vertex_count;
-   if (brw->gen >= 7) {
+   if (brw->intelScreen->has_mi_math_and_lrr) {
+      functions->BeginTransformFeedback = hsw_begin_transform_feedback;
+      functions->EndTransformFeedback = hsw_end_transform_feedback;
+      functions->PauseTransformFeedback = hsw_pause_transform_feedback;
+      functions->ResumeTransformFeedback = hsw_resume_transform_feedback;
+   } else if (brw->gen >= 7) {
       functions->BeginTransformFeedback = gen7_begin_transform_feedback;
       functions->EndTransformFeedback = gen7_end_transform_feedback;
       functions->PauseTransformFeedback = gen7_pause_transform_feedback;
       functions->ResumeTransformFeedback = gen7_resume_transform_feedback;
+      functions->GetTransformFeedbackVertexCount =
+         brw_get_transform_feedback_vertex_count;
    } else {
       functions->BeginTransformFeedback = brw_begin_transform_feedback;
       functions->EndTransformFeedback = brw_end_transform_feedback;
@@ -494,7 +499,8 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.MaxTransformFeedbackSeparateComponents =
       BRW_MAX_SOL_BINDINGS / BRW_MAX_SOL_BUFFERS;
-   ctx->Const.AlwaysUseGetTransformFeedbackVertexCount = true;
+   ctx->Const.AlwaysUseGetTransformFeedbackVertexCount =
+      !brw->intelScreen->has_mi_math_and_lrr;
    int max_samples;
    const int *msaa_modes = intel_supported_msaa_modes(brw->intelScreen);