summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_ir_vec4.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/vec4: Port regions_overlap() to the vec4 IR.Francisco Jerez2016-09-141-4/+58
| | | | | | | | | | This is copy-pasted almost line by line from the FS back-end. The only reason it cannot be implemented in terms of backend_reg is that the backend_reg::nr field doesn't have the same meaning for uniforms on both back-ends. It could be easily deduplicated by using a template function. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Take into account misalignment in regs_written() and regs_read().Francisco Jerez2016-09-141-4/+6
| | | | | | | | Unlike the FS counterpart of this commit this was likely not (yet) a bug, but let's fix it already in preparation for implementing support for sub-GRF offsets in the VEC4 back-end. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_read with ::size_read using byte ↵Francisco Jerez2016-09-141-2/+4
| | | | | | | | | | | | | | | units. The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace vec4_instruction::regs_written with ::size_written field ↵Francisco Jerez2016-09-141-1/+2
| | | | | | | | | | | | | | | | in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ↵Francisco Jerez2016-09-141-0/+26
| | | | | | | | | | | | | | | | ::regs_written. This is in preparation for dropping vec4_instruction::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset ↵Francisco Jerez2016-09-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | expressed in bytes. The dst/src_reg::offset field in byte units introduced in the previous patch is a more straightforward alternative to an offset representation split between ::reg_offset and ::subreg_offset fields. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple FS back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. v2: Fix division by the wrong reg_unit in the UNIFORM case of convert_to_hw_regs(). (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-3/+3
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/vec4: Move can_do_writemask to vec4_instructionJason Ekstrand2016-04-151-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add support for swizzling arbitrary immediates to (brw_)swizzle().Francisco Jerez2016-03-061-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scalar immediates used to be handled correctly by swizzle() (as the identity) but since commit 58fa9d47b536403c4e3ca5d6a2495691338388fd it will corrupt the contents of the immediate. Vector immediates were never handled correctly, but we had ad-hoc code to swizzle VF immediates in the vec4 copy propagation pass. This takes care of swizzling V and UV in addition. v2: Don't implement swizzling of V/UV immediates (Matt). If you need to swizzle an integer vector immediate in the future apply the following diff to go back to v1: --- a/src/mesa/drivers/dri/i965/brw_eu.c +++ b/src/mesa/drivers/dri/i965/brw_eu.c @@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod) static unsigned imm_shift(enum brw_reg_type type, unsigned i) { - assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V && - "Not implemented."); - if (type == BRW_REGISTER_TYPE_VF) return 8 * (i & 3); + else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V) + return 4 * (i & 7); else return 0; } Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Add src/dst interference for certain instructions with hazards.Kenneth Graunke2015-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When working on tessellation shaders, I created some vec4 virtual opcodes for creating message headers through a sequence like: mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted }; mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all }; mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted }; mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all }; This is done in the generator since the vec4 backend can't handle align1 regioning. From the visitor's point of view, this is a single opcode: hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD Normally, there's no hazard between sources and destinations - an instruction (naturally) reads its sources, then writes the result to the destination. However, when the virtual instruction generates multiple hardware instructions, we can get into trouble. In the above example, if the register allocator assigned vgrf7 and vgrf8 to the same hardware register, then we'd clobber the source with 0 in the first instruction, and read back the wrong value in the last one. It occured to me that this is exactly the same problem we have with SIMD16 instructions that use W/UW or B/UB types with 0 stride. The hardware implicitly decodes them as two SIMD8 instructions, and with the overlapping regions, the first would clobber the second. Previously, we handled that by incrementing the live range end IP by 1, which works, but is excessive: the next instruction doesn't actually care about that. It might also be the end of control flow. This might keep values alive too long. What we really want is to say "my source and destinations interfere". This patch creates new infrastructure for doing just that, and teaches the register allocator to add interference when there's a hazard. For my vec4 case, we can determine this by switching on opcodes. For the SIMD16 case, we just move the existing code there. I audited our existing virtual opcodes that generate multiple instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this treatment as well, but no others. v2: Rebased by mattst88. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Use scope operator to ensure brw_reg is interpreted as a type.Matt Turner2015-11-241-2/+2
| | | | | | | | | | | | | | | | | | | In the next patch, I make backend_reg's inheritance from brw_reg private, which confuses clang when it sees the type "struct brw_reg" in the derived class constructors, thinking it is referring to the privately inherited brw_reg: brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg' fs_reg::fs_reg(struct brw_reg reg) : ^ brw_shader.h:39:22: note: constrained by private inheritance here struct backend_reg : private brw_reg ^~~~~~~~~~~~~~~ brw_reg.h:232:8: note: member is declared here struct brw_reg { ^ Avoid this by marking brw_reg with the scope resolution operator.
* i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().Matt Turner2015-11-191-5/+0
| | | | | | | Cuts 1.5k of .text. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Combine register file field.Matt Turner2015-11-131-4/+4
| | | | | | | | The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-2/+4
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-4/+4
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Remove swizzle/writemask fields from src/dst_reg.Matt Turner2015-11-131-6/+1
| | | | | | | | Also allows us to handle HW_REGs in the swizzle() and writemask() functions. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove fixed_hw_reg field from backend_reg.Matt Turner2015-11-131-2/+2
| | | | | | | | Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Delete abs/negate fields from backend_reg.Matt Turner2015-11-131-1/+1
| | | | | | | | Instead use the ones provided by brw_reg. Also allows us to handle HW_REGs in the negate() functions. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.Kenneth Graunke2015-10-291-3/+0
| | | | | | | | | | | | | | | | | This patch makes the visitor convert registers to the HW_REG file at the very end, after register allocation, post-RA scheduling, and dependency control flagging. After that, everything is in fixed brw_regs. This simplifies the code generator, as it can just use the hardware registers rather than having to interpret our abstract files. In particular, interpreting the UNIFORM file meant reading prog_data to figure out where push constants are supposed to start. Having the part of the code that performs register allocation also translate everything to hardware registers seems sensible. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: check opcode on vec4_instruction::reads_flag(channel)Alejandro Piñeiro2015-10-231-2/+2
| | | | | | | | | | | | | | Commit f17b78 added an alternative reads_flag(channel) that returned if the instruction was reading a specific channel flag. By mistake it only took into account the predicate, but when the opcode is VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, but the flag are used. That mistake caused some regressions on old hw. More information on this bug: https://bugs.freedesktop.org/show_bug.cgi?id=92621 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: track and use independently each flag channelAlejandro Piñeiro2015-10-221-0/+21
| | | | | | | | | | | | | | | vec4_live_variables tracks now each flag channel independently, so vec4_dead_code_eliminate can update the writemask of null registers, based on which component are alive at the moment. This would allow vec4_cmod_propagation to optimize out several movs involving null registers. v2: added support to track each flag channel independently at vec4 live_variables, as v1 assumed that it was already doing it, as pointed by Francisco Jerez v3: general cleaningn after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Extract can_change_source_types() functions.Matt Turner2015-10-191-0/+1
| | | | | | | | | Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask neededAntia Puentes2015-09-231-1/+2
| | | | | | | | Gen6 MATH instructions can not execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generationIago Toral Quiroga2015-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example: https://bugs.freedesktop.org/show_bug.cgi?id=86469 https://bugs.freedesktop.org/show_bug.cgi?id=90631 Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected). Notice that the hardware docs are not clear about this fact: SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers" However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume th at MRF addresses above m15 wrap to legal MRF registers." Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests where helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware: Using MRFs 13..15 for spilling: crash: 2, fail: 113, pass: 6621, skip: 5461 Using MRFs 21..23 for spilling: crash: 2, fail: 12, pass: 6722, skip: 5461 This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Add a new dst_reg constructor accepting a brw_reg_typeAlejandro Piñeiro2015-08-031-0/+2
| | | | | | | | | | | | | | | This is useful for the upcoming texture support in NIR->vec4 pass, as we found several cases where the brw_type is available, but not the glsl_type. Without this new constructor, the alternative would be: dst_reg reg(MRF, <reg>) reg.type = <brw_type> reg.writemask = <mask> Adding a new constructor makes code easier to read. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Define consistent interface to enable instruction result saturation.Francisco Jerez2015-06-091-0/+11
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Define consistent interface to enable instruction conditional modifiers.Francisco Jerez2015-06-091-0/+11
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Define consistent interface to predicate an instruction.Francisco Jerez2015-06-091-0/+22
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Perform basic optimizations on the BROADCAST opcode.Francisco Jerez2015-05-041-0/+7
| | | | | | v2: Style fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add a devinfo field to backend_visitor and use it for gen checksJason Ekstrand2015-04-221-1/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Some more trivial swizzle clean-up.Francisco Jerez2015-03-231-5/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Pass argument by reference to src_reg/dst_reg conversion ↵Francisco Jerez2015-03-231-2/+2
| | | | | | constructors. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Remove swizzle_for_size() in favour of brw_swizzle_for_size().Francisco Jerez2015-03-231-3/+0
| | | | | | | | | | It could be objected that swizzle_for_size() is "faster" than brw_swizzle_for_size(). It's not measurably better in any reasonable CPU-bound benchmark on VLV according to the Finnish benchmarking system (including the SynMark2 DrvShComp shader compilation benchmark). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix signedness of dst_reg::writemask.Francisco Jerez2015-03-231-2/+3
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Don't use GL types in the IR data structures.Francisco Jerez2015-03-231-1/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix the scheduler to take into account reads and writes of ↵Francisco Jerez2015-02-101-0/+1
| | | | | | | | multiple registers. v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Implement equals() method for dst_reg too.Francisco Jerez2015-02-101-0/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Remove dependency of vec4_instruction on the visitor class.Francisco Jerez2015-02-101-2/+1
| | | | | | | | | The only reason why you need a vec4_visitor to construct a vec4_instruction is to initialize vec4_instruction::ir and ::annotation. Instead set them from vec4_visitor::emit() just like fs_visitor does. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move IR object definitions to separate header files.Francisco Jerez2015-02-101-0/+192
One should be able to manipulate i965 IR without pulling the whole FS/VEC4 visitor classes -- Optimization passes and other transformations would ideally be visitor-agnostic. Among other issues this avoids a circular dependency between the header file where such visitor-agnostic code will be defined and the main FS/VEC4 header where both IR (layer below) and visitor (layer above) happen to be defined. Reviewed-by: Matt Turner <mattst88@gmail.com>