summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_ir_fs.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Get rid of fs_inst::set_smear().Francisco Jerez2016-09-141-3/+0
| | | | | | | | | | | | | | | | | | | component() was generally a better alternative because of several issues set_smear() had: - It wouldn't take the original stride and offset of the register into account, which means that set_smear() on the result of e.g. another set_smear() call or an offset() call would give a bogus region as result. - It was an inherently destructive operation. See the 'nir_intrinsic_shader_clock' hunk below for how this could lead to subtle bugs in cases where set_smear() was called multiple times on the same register like 'r.set_smear(0), r.set_smear(1)' with the expectation that each call would return a separate value instead of a reference to the same subsequently mutated object. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Move region_contained_in to the IR header and fix for non-VGRF files.Francisco Jerez2016-09-141-0/+13
| | | | | | | Also changed the argument names since 'src' and 'dst' don't make that much sense outside of the context of copy propagation. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Simplify byte_offset().Francisco Jerez2016-09-141-8/+4
| | | | | | | | In the most common case this can now be implemented as a simple addition because the offset is already encoded as a single scalar value in bytes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Fix signedness of the return value of fs_inst::size_read().Francisco Jerez2016-09-141-1/+1
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().Francisco Jerez2016-09-141-1/+0
| | | | | | | | | | | fs_inst::overwrites_reg is rather easy to misuse because it cannot tell how large the register region starting at 'reg' is, so in cases where the destination region starts after 'reg' it may give a misleading result. regions_overlap() is somewhat more verbose to use but handles arbitrary overlap correctly so it should generally be used instead. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Take into account misalignment in regs_written() and regs_read().Francisco Jerez2016-09-141-4/+4
| | | | | | | | | | | | There was a workaround for this in fs_inst::size_read() for the SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file *only*. We should take this possibility into account for the sources and destinations of all instructions on all optimization passes that need to quantize dataflow in 32B increments by adding the amount of misalignment to the size read or written from the regs_read() and regs_written() helpers respectively. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Take into account trailing padding in regs_written() and regs_read().Francisco Jerez2016-09-141-2/+20
| | | | | | | | | | | | | | | | | | | | This fixes regs_written() and regs_read() to return a more accurate value when the padding left between components due to a stride value greater than one causes the region bounds given by size_written or size_read to overflow into the next register. This could become a problem in optimization passes that keep track of dataflow using fixed-size arrays with register granularity, because the overflow register (not actually accessed by the region) may not have been allocated at all which could lead to undefined memory access. An alternative to this would be to subtract the trailing padding already during the calculation of fs_inst::size_read and ::size_written, but that would break code that currently assumes that ::size_read and _written are whole multiples of the component size, and would be hard to maintain looking forward because size_written is assigned from a bunch of different places. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Handle fixed HW GRF subnr in reg_offset().Francisco Jerez2016-09-141-1/+2
| | | | | | | This will be useful later on when we start using reg_offset() on fixed hardware registers. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_read with ::size_read using byte units.Francisco Jerez2016-09-141-2/+4
| | | | | | | | | | | | | The previous regs_read value can be recovered by rewriting each reference of regs_read() like 'x = i.regs_read(j)' to 'x = DIV_ROUND_UP(i.size_read(j), reg_unit)'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.Francisco Jerez2016-09-141-1/+2
| | | | | | | | | | | | | | The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Add wrapper functions for fs_inst::regs_read and ::regs_written.Francisco Jerez2016-09-141-0/+26
| | | | | | | | | | | | | | This is in preparation for dropping fs_inst::regs_read and ::regs_written in favor of more accurate alternatives expressed in byte units. The main reason these wrappers are useful is that a number of optimization passes implement dataflow analysis with register granularity, so these helpers will come in handy once we've switched register offsets and sizes to the byte representation. The wrapper functions will also make sure that GRF misalignment (currently neglected by most of the back-end) is taken into account correctly in the calculation of regs_read and regs_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-11/+5
| | | | | | | | | | | | | | | | | The fs_reg::subreg_offset and ::offset fields are now redundant, the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-2/+2
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Teach regions_overlap() about COMPR4 MRF regions.Francisco Jerez2016-05-311-3/+17
| | | | | Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Generalize regions_overlap() from copy propagation to handle ↵Francisco Jerez2016-05-291-0/+38
| | | | | | | | | | | | non-VGRF files. This will be useful in several places. The only externally visible difference (other than non-VGRF files being supported now) is that the region sizes are now passed in byte units instead of in GRF units because the loss of precision would have become a problem in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Refactor offset() into a separate function taking the width as ↵Francisco Jerez2016-05-291-0/+19
| | | | | | | | | | | argument. This will be useful in the SIMD lowering pass to avoid having to construct a builder object of the known region width just to pass it as argument to offset(), which doesn't do anything with it other than taking the builder dispatch_width as region width. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix half() to handle more exotic register files.Francisco Jerez2016-05-271-21/+4
| | | | | | | | horiz_offset() is able to deal with a superset of the register files currently special-cased in half(). Just call horiz_offset() in all cases. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.Francisco Jerez2016-05-271-4/+10
| | | | | | | We'll hit these in some cases during SIMD lowering in 32-wide programs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.Francisco Jerez2016-05-271-12/+0
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Define methods to calculate the flag subset read or written by an ↵Francisco Jerez2016-05-271-2/+23
| | | | | | | | fs_inst. v2: Codestyle fixes (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Expose arbitrary channel execution groups to the IR.Francisco Jerez2016-05-271-11/+9
| | | | | | | | | | This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Generalize is_uniform() to is_periodic().Francisco Jerez2016-05-271-1/+30
| | | | | | This will be useful in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.Francisco Jerez2016-05-271-11/+17
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Fix and document component().Francisco Jerez2016-05-161-2/+5
| | | | | | | | | | This fixes a number of bugs of component() by reimplementing it in terms of horiz_offset(): Handling of base registers starting at a non-zero subreg_offset, handling of strided registers and overflow of subreg_offset into reg_offset. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: fix subreg_offset overflow in byte_offset()Iago Toral Quiroga2016-05-161-4/+6
| | | | | | | | | | | This can happen if the register already has a non-zero subreg_offset when byte_offset() is called. v2 (Sam): - Refactor byte_offset() (Jordan). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: handle uniforms in byte_offset()Connor Abbott2016-05-101-1/+5
| | | | | | | | v2: Do it only for uniforms (Iago) Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Introduce helper to extract a field from each channel of a register.Francisco Jerez2016-05-101-0/+28
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Get rid of reladdrJason Ekstrand2016-04-141-4/+1
| | | | | | | We aren't using it anymore. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add src/dst interference for certain instructions with hazards.Kenneth Graunke2015-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When working on tessellation shaders, I created some vec4 virtual opcodes for creating message headers through a sequence like: mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted }; mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all }; mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted }; mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all }; This is done in the generator since the vec4 backend can't handle align1 regioning. From the visitor's point of view, this is a single opcode: hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD Normally, there's no hazard between sources and destinations - an instruction (naturally) reads its sources, then writes the result to the destination. However, when the virtual instruction generates multiple hardware instructions, we can get into trouble. In the above example, if the register allocator assigned vgrf7 and vgrf8 to the same hardware register, then we'd clobber the source with 0 in the first instruction, and read back the wrong value in the last one. It occured to me that this is exactly the same problem we have with SIMD16 instructions that use W/UW or B/UB types with 0 stride. The hardware implicitly decodes them as two SIMD8 instructions, and with the overlapping regions, the first would clobber the second. Previously, we handled that by incrementing the live range end IP by 1, which works, but is excessive: the next instruction doesn't actually care about that. It might also be the end of control flow. This might keep values alive too long. What we really want is to say "my source and destinations interfere". This patch creates new infrastructure for doing just that, and teaches the register allocator to add interference when there's a hazard. For my vec4 case, we can determine this by switching on opcodes. For the SIMD16 case, we just move the existing code there. I audited our existing virtual opcodes that generate multiple instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this treatment as well, but no others. v2: Rebased by mattst88. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Use scope operator to ensure brw_reg is interpreted as a type.Matt Turner2015-11-241-1/+1
| | | | | | | | | | | | | | | | | | | In the next patch, I make backend_reg's inheritance from brw_reg private, which confuses clang when it sees the type "struct brw_reg" in the derived class constructors, thinking it is referring to the privately inherited brw_reg: brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg' fs_reg::fs_reg(struct brw_reg reg) : ^ brw_shader.h:39:22: note: constrained by private inheritance here struct backend_reg : private brw_reg ^~~~~~~~~~~~~~~ brw_reg.h:232:8: note: member is declared here struct brw_reg { ^ Avoid this by marking brw_reg with the scope resolution operator.
* i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().Matt Turner2015-11-191-5/+0
| | | | | | | | | | | | Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor implementations themselves. text data bss dec hex filename 5204535 214112 27784 5446431 531b1f i965_dri.so before 5193977 214112 27784 5435873 52f1e1 i965_dri.so after Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Combine register file field.Matt Turner2015-11-131-2/+2
| | | | | | | | The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-3/+6
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-3/+3
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-3/+3
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove fixed_hw_reg field from backend_reg.Matt Turner2015-11-131-2/+2
| | | | | | | | Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Delete abs/negate fields from backend_reg.Matt Turner2015-11-131-1/+1
| | | | | | | | Instead use the ones provided by brw_reg. Also allows us to handle HW_REGs in the negate() functions. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Replace default case with list of enum values.Matt Turner2015-11-021-3/+4
| | | | | | | If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Extract can_change_source_types() functions.Matt Turner2015-10-191-0/+1
| | | | | | | | | Make them members of fs_inst/vec4_instruction for use elsewhere. Also fix the fs version to check that dst.type == src[1].type and for !saturate. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Factor out source components calculation to a separate method.Francisco Jerez2015-07-291-0/+1
| | | | | | | | | | | | | | | | | | This cleans up fs_inst::regs_read() slightly by disentangling the calculation of "components" from the handling of message payload arguments. This will also simplify the SIMD lowering and logical send message lowering passes, because it will avoid expressions like 'regs_read * REG_SIZE / component_size' which are not only ugly, they may be inaccurate because regs_read rounds up the result to the closest register multiple so they could give incorrect results when the component size is lower than one register (e.g. uniforms). This didn't seem to be a problem right now because all such expressions happen to be dealing with per-channel GRFs only currently, but that's by no means obvious so better be safe than sorry. v2: Split PIXEL_X/Y and LINTERP into separate case blocks. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Factor out universally broken calculation of the register component ↵Francisco Jerez2015-07-161-0/+6
| | | | | | | | | | | | | | size. This in principle simple calculation was being open-coded in a number of places (in a series I haven't yet sent for review there will be a couple more), all of them were subtly broken in one way or another: None of them were handling the HW_REG case correctly as pointed out by Connor, and fs_inst::regs_read() was handling the stride=0 case rather naively. This patch solves both problems and factors out the calculation as a new fs_reg method. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Remove the width field from fs_regJason Ekstrand2015-06-301-14/+1
| | | | | | | | | | | | | As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Remove fs_inst constructors that don't take an explicit exec_sizeJason Ekstrand2015-06-301-8/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Move offset(fs_reg, unsigned) to brw_fs.hJason Ekstrand2015-06-301-21/+0
| | | | | | | | Shortly, offset() will depend on the builder so we need it moved to some place where it has access to that. Reviewed-by: Iago Toral Quiroga <itoral@igali.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Drop fs_inst::force_uncompressed.Francisco Jerez2015-06-091-1/+0
| | | | | | This is now unused. Saves a whole bit of memory per instruction. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Define consistent interface to enable instruction result saturation.Francisco Jerez2015-06-091-0/+11
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Define consistent interface to enable instruction conditional modifiers.Francisco Jerez2015-06-091-0/+11
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Define consistent interface to predicate an instruction.Francisco Jerez2015-06-091-0/+22
| | | | | | v2: Use set_ prefix. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Add set_sechalf() method.Matt Turner2015-05-181-0/+10
| | | | | | Used in the next commit. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>