| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This is copy-pasted almost line by line from the FS back-end. The
only reason it cannot be implemented in terms of backend_reg is that
the backend_reg::nr field doesn't have the same meaning for uniforms
on both back-ends. It could be easily deduplicated by using a
template function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
| |
Unlike the FS counterpart of this commit this was likely not (yet) a
bug, but let's fix it already in preparation for implementing support
for sub-GRF offsets in the VEC4 back-end.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
units.
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in bytes.
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
::regs_written.
This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
expressed in bytes.
The dst/src_reg::offset field in byte units introduced in the previous
patch is a more straightforward alternative to an offset
representation split between ::reg_offset and ::subreg_offset fields.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple FS
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
v2: Fix division by the wrong reg_unit in the UNIFORM case of
convert_to_hw_regs(). (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scalar immediates used to be handled correctly by swizzle() (as the
identity) but since commit 58fa9d47b536403c4e3ca5d6a2495691338388fd it
will corrupt the contents of the immediate. Vector immediates were
never handled correctly, but we had ad-hoc code to swizzle VF
immediates in the vec4 copy propagation pass. This takes care of
swizzling V and UV in addition.
v2: Don't implement swizzling of V/UV immediates (Matt). If you need
to swizzle an integer vector immediate in the future apply the
following diff to go back to v1:
--- a/src/mesa/drivers/dri/i965/brw_eu.c
+++ b/src/mesa/drivers/dri/i965/brw_eu.c
@@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod)
static unsigned
imm_shift(enum brw_reg_type type, unsigned i)
{
- assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V &&
- "Not implemented.");
-
if (type == BRW_REGISTER_TYPE_VF)
return 8 * (i & 3);
+ else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V)
+ return 4 * (i & 7);
else
return 0;
}
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:
mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all };
mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted };
mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all };
This is done in the generator since the vec4 backend can't handle align1
regioning. From the visitor's point of view, this is a single opcode:
hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD
Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination. However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.
In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.
It occured to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride. The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.
Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that. It might also be the end of control flow. This might
keep values alive too long. What we really want is to say "my source
and destinations interfere".
This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard. For
my vec4 case, we can determine this by switching on opcodes. For the
SIMD16 case, we just move the existing code there.
I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.
v2: Rebased by mattst88.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:
brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
^
Avoid this by marking brw_reg with the scope resolution operator.
|
|
|
|
|
|
|
| |
Cuts 1.5k of .text.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.
Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.
After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging. After that, everything is in fixed brw_regs.
This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files. In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.
Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit f17b78 added an alternative reads_flag(channel) that returned
if the instruction was reading a specific channel flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, but the flag
are used.
That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.
v2: added support to track each flag channel independently at vec4
live_variables, as v1 assumed that it was already doing it, as
pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Make them members of fs_inst/vec4_instruction for use elsewhere.
Also fix the fs version to check that dst.type == src[1].type and for
!saturate.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
| |
Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631
Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).
Notice that the hardware docs are not clear about this fact:
SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"
However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."
Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:
Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461
Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461
This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is useful for the upcoming texture support in NIR->vec4 pass,
as we found several cases where the brw_type is available, but not
the glsl_type.
Without this new constructor, the alternative would be:
dst_reg reg(MRF, <reg>)
reg.type = <brw_type>
reg.writemask = <mask>
Adding a new constructor makes code easier to read.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
| |
v2: Use set_ prefix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
v2: Use set_ prefix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
v2: Use set_ prefix.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
v2: Style fixes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
| |
constructors.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
It could be objected that swizzle_for_size() is "faster" than
brw_swizzle_for_size(). It's not measurably better in any reasonable
CPU-bound benchmark on VLV according to the Finnish benchmarking
system (including the SynMark2 DrvShComp shader compilation
benchmark).
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
| |
multiple registers.
v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
| |
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
| |
The only reason why you need a vec4_visitor to construct a
vec4_instruction is to initialize vec4_instruction::ir and
::annotation. Instead set them from vec4_visitor::emit() just like
fs_visitor does.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
One should be able to manipulate i965 IR without pulling the whole
FS/VEC4 visitor classes -- Optimization passes and other
transformations would ideally be visitor-agnostic. Among other issues
this avoids a circular dependency between the header file where such
visitor-agnostic code will be defined and the main FS/VEC4 header
where both IR (layer below) and visitor (layer above) happen to be
defined.
Reviewed-by: Matt Turner <mattst88@gmail.com>
|