| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
::regs_written.
This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
This can be used on Broadwell by setting INTEL_SCALAR_TES=0.
More importantly, it will be used for Ivybridge and Haswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.
One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.
We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.
v2: Update comments (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first pass marked dead instructions as opcode = NOP, and a second
pass deleted those instructions so that the live ranges used in the
first pass wouldn't change.
But since we're walking the instructions in reverse order, we can just
do everything in one pass. The only thing we have to do is walk the
blocks in reverse as well.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
| |
Removes dead code from glsl-mat-from-int-ctor-03.shader_test.
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
Detected by Matt Turner while reviewing commit
a59359ecd22154cc2b3f88bb8c599f21af8a3934
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.
v2: added support to track each flag channel independently at vec4
live_variables, as v1 assumed that it was already doing it, as
pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5a06ee738 added a step to the generator to set up the message
header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7
instruction. That pseudo opcode is implemented in terms of multiple
actual opcodes, one of which writes to one of the source registers in
order to set up the message header. This causes problems because the
scheduler isn't aware that the source register is written to and it
can end up reorganising the instructions incorrectly such that the
write to the source register overwrites a needed value from a previous
instruction. This problem was presenting itself as a rendering error
in the weapon in Enemy Territory: Quake Wars.
Since commit 588859e1 there is an additional problem that the double
register allocated to include the message header would end up being
split into two. This wasn't happening previously because the code to
split registers was explicitly avoided for instructions that are
sending from the GRF.
This patch fixes both problems by splitting the code to set up the
message header into a new pseudo opcode so that it will be done
outside of the generator. This new opcode has the header register as a
destination so the scheduler can recognise that the register is
written to. This has the additional benefit that the scheduler can
optimise the message header slightly better by moving the mov
instructions further away from the send instructions.
On Skylake it appears to fix the following three Piglit tests without
causing any regressions:
gs-float-array-variable-index
gs-mat3x4-row-major
gs-mat4x3-row-major
I think we actually may need to do something similar for the fs
backend and possibly for message headers from regular texture sampling
but I'm not entirely sure.
v2: Make sure the exec-size is retained as 8 for the mov instruction
to initialise the header from g0. This was accidentally lost
during a rebase on top of 07c571a39fa1.
Split the patch into two so that the helper function is a separate
change.
Fix emitting the MOV instruction on Gen7.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
|
|
|
|
|
|
| |
dead_code_eliminate().
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%
total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|