summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs.cpp
diff options
context:
space:
mode:
authorFrancisco Jerez <currojerez@riseup.net>2016-08-08 12:43:18 -0700
committerFrancisco Jerez <currojerez@riseup.net>2016-08-18 20:05:00 -0700
commitb2b621a0ec57f08586b9afcf666c0eadc0993ca0 (patch)
tree47a8e425f0ae10b1018e405124daf45ccbcc34da /src/mesa/drivers/dri/i965/brw_fs.cpp
parent01b321f2420d45e9353c94bcf5d96cae6c2deac2 (diff)
downloadexternal_mesa3d-b2b621a0ec57f08586b9afcf666c0eadc0993ca0.zip
external_mesa3d-b2b621a0ec57f08586b9afcf666c0eadc0993ca0.tar.gz
external_mesa3d-b2b621a0ec57f08586b9afcf666c0eadc0993ca0.tar.bz2
i965/fs: Switch to per-subspan discard jumps.
ANY4H is more efficient than ANY8H and ANY16H because it makes sure that whenever a whole subspan hits a discard statement it gets disabled by the EU until the end of the program, regardless of whether the discard condition is uniform across all channels of the SIMD8-16 thread. OTOH ANY8H/ANY16H would cause the rest of the program to be executed for *all* channels if only one of the channels hadn't taken the discard branch, potentially increasing the bandwidth and ALU usage of the program unnecessarily. This change increases the FPS by over 3x of a simple micro-benchmark that discards a bunch of fragments and then does a single costly texturing operation. I've just re-verified the FPS change on HSW and SKL, but I expect all platforms from Gen6 up to get a similar benefit. Note that we could potentially be more aggressive and use the NORMAL predicate to discard individual channels, but that would need to happen post-scheduling because the scheduler currently doesn't care to reorder HALT instructions with respect to other instructions, and the NORMAL predicate would cause the results of subsequent derivative computations to become undefined -- If the scheduler didn't reorder HALT instructions it would actually be safe to switch to NORMAL because the behavior of derivative computations after a non-uniform discard statement is undefined by the GLSL spec, but that would make the optimization implemented by one of the following commits somewhat more difficult. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Diffstat (limited to 'src/mesa/drivers/dri/i965/brw_fs.cpp')
-rw-r--r--src/mesa/drivers/dri/i965/brw_fs.cpp4
1 files changed, 1 insertions, 3 deletions
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b89c672..c23b558 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1397,9 +1397,7 @@ fs_visitor::emit_discard_jump()
fs_inst *discard_jump = bld.emit(FS_OPCODE_DISCARD_JUMP);
discard_jump->flag_subreg = 1;
- discard_jump->predicate = (dispatch_width == 8)
- ? BRW_PREDICATE_ALIGN1_ANY8H
- : BRW_PREDICATE_ALIGN1_ANY16H;
+ discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
discard_jump->predicate_inverse = true;
}