path: root/src/mesa/drivers/dri/i965/brw_fs_builder.h
Commit message | Author | Age | Files | Lines
* i965: Introduce downcast helpers for prog_data structures. | Kenneth Graunke | 2016-10-05 | 1 | -1/+1

  Similar to brw_context(...), intel_texture_object(...), and so on.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes. | Francisco Jerez | 2016-09-14 | 1 | -4/+4

  The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

  For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
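The rewrite rule in this commit message can be sketched as a pair of helpers. This is only an illustration: inst_sketch, regs_written() and set_regs_written() are hypothetical names, and reg_unit is assumed here to be a 32-byte register, which the commit message itself does not state.

```cpp
#include <cassert>

// Hypothetical stand-in for fs_inst; it carries only the field discussed above.
struct inst_sketch {
   unsigned size_written; // number of bytes written by the instruction
};

// Assumption: one register is 32 bytes. The commit message only calls this reg_unit.
const unsigned reg_unit = 32;

// Integer division rounding up, in the spirit of DIV_ROUND_UP.
inline unsigned div_round_up(unsigned a, unsigned b)
{
   assert(b != 0);
   return (a + b - 1) / b;
}

// Replacement for an rvalue use such as 'x = i.regs_written'.
inline unsigned regs_written(const inst_sketch &i)
{
   return div_round_up(i.size_written, reg_unit);
}

// Replacement for an lvalue use such as 'i.regs_written = x'.
inline void set_regs_written(inst_sketch &i, unsigned x)
{
   i.size_written = x * reg_unit;
}
```

Note that the rvalue direction rounds up, so a partially written register still counts as one whole register.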
* glsl: Separate overlapping sentinel nodes in exec_list. | Matt Turner | 2016-07-26 | 1 | -1/+1

  I do appreciate the cleverness, but unfortunately it prevents a lot more cleverness in the form of additional compiler optimizations brought on by -fstrict-aliasing.

  No difference in OglBatch7 (n=20).

  Co-authored-by: Davin McCall <davmac@davmac.org>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: enable the emission of the DIM instruction | Samuel Iglesias Gonsálvez | 2016-07-14 | 1 | -0/+1

  v2 (Matt):
  - Take a DF source argument for the DIM instruction emission in the visitors.
  - Indentation.

  Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* Revert "i965/fs: Allow scalar source regions on SNB math instructions." | Francisco Jerez | 2016-06-03 | 1 | -2/+8

  This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.

  Apparently the hardware spec text I quoted in the commit message was outright lying about scalar source math being supported on SNB; the hardware seems to load 32 contiguous bits of data for each channel regardless of the regioning mode.

  Fixes regressions in the following CTS tests (which we didn't catch early due to CTS being temporarily disabled in our CI system):

  es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
  es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
  es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
  es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
  es2-cts.gtf.gl.cos.cos_float_frag_xvary
  es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
  es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
  es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
  es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
  es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary

  Cc: mesa-stable@lists.freedesktop.org
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
  Reported-by: Mark Janes <mark.a.janes@intel.com>
  Acked-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Allow scalar source regions on SNB math instructions. | Francisco Jerez | 2016-05-31 | 1 | -8/+2

  I haven't found any evidence that this isn't supported by the hardware; in fact, according to the SNB hardware spec:

  "The supported regioning modes for math instructions are align16, align1 with the following restrictions:
  - Scalar source is supported.
  [...]
  - Source and destination offset must be the same, except the case of scalar source."

  Cc: "12.0" <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Return 32 bit mask from fs_builder::sample_mask(). | Francisco Jerez | 2016-05-27 | 1 | -1/+3

  This doesn't actually handle the FS case; it just adds an assertion for the moment so I don't forget to update it later on for SIMD32 fragment shader dispatch.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Emit fixed-width null register regardless of the dispatch width. | Francisco Jerez | 2016-05-27 | 1 | -8/+4

  brw_null_vec() cannot handle widths over 16, but it doesn't really matter what width we specify for null registers because destination regions have no width field at the hardware level.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Expose arbitrary channel execution groups to the IR. | Francisco Jerez | 2016-05-27 | 1 | -3/+11

  This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends. | Francisco Jerez | 2016-05-27 | 1 | -42/+5

  The benefit is that we will be able to use the SIMD lowering pass to unroll math instructions of unsupported width and then remove some cruft from the generator.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: add null_reg_df | Iago Toral Quiroga | 2016-05-10 | 1 | -0/+7

  Probably not needed since we fix the dst type of comparisons automatically, but added for consistency with the rest of the null_reg_* functions.

  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: fix regs_written in LOAD_PAYLOAD for doubles | Connor Abbott | 2016-05-10 | 1 | -2/+6

  v2: Account for the stride of the dst (Iago)

  Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Make emit_minmax return an instruction*. | Matt Turner | 2016-02-17 | 1 | -3/+3

  And use it in brw_fs_nir.cpp.
* i965: Lower min/max after optimization on Gen4/5. | Matt Turner | 2016-02-17 | 1 | -8/+2

  Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more.

  On Ironlake:

  total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
  instructions in affected programs: 326604 -> 323322 (-1.00%)
  helped: 1411

  total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
  cycles in affected programs: 18950290 -> 18867176 (-0.44%)
  helped: 2419
  HURT: 328

  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Replace fs_reg(imm) constructors with brw_imm_*(). | Matt Turner | 2015-11-19 | 1 | -2/+2

  Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor implementations themselves.

     text    data     bss     dec     hex filename
  5204535  214112   27784 5446431  531b1f i965_dri.so before
  5193977  214112   27784 5435873  52f1e1 i965_dri.so after

  Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename GRF to VGRF. | Matt Turner | 2015-11-13 | 1 | -2/+2

  The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register.

  Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Replace nested ternary with if ladder. | Matt Turner | 2015-11-13 | 1 | -6/+7

  Since the types of the expression were

     bool ? src_reg : (bool ? brw_reg : brw_reg)

  the result of the second (nested) ternary would be implicitly converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,

     bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)

  In the next patch, I make backend_reg (the parent of src_reg) inherit from brw_reg, which changes this expression to return brw_reg, which throws away any fields that exist in the classes derived from brw_reg. I.e.,

     src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)

  Generally this code was gross, and wasn't actually shorter or easier to read than an if ladder.

  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
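The slicing hazard described in this message can be reproduced with two small classes. This is a sketch under assumptions: hw_reg and ir_reg are hypothetical stand-ins for brw_reg and src_reg, with a single extra field standing in for "any fields that exist in the derived classes".

```cpp
#include <cassert>

// Hypothetical stand-in for the base register type.
struct hw_reg {
   int nr = 0;
};

// Hypothetical stand-in for a derived IR register with an additional field.
struct ir_reg : hw_reg {
   int extra = 0;
   ir_reg() = default;
   ir_reg(const hw_reg &r) : hw_reg(r) {} // converting constructor from the base
};

// Once ir_reg derives from hw_reg, the conditional expression has type hw_reg,
// so 'a' is sliced to its base part before being converted back to ir_reg.
inline ir_reg pick_ternary(bool use_ir, const ir_reg &a, const hw_reg &b)
{
   return use_ir ? a : b;
}

// The if ladder returns each operand with its own type, so nothing is lost.
inline ir_reg pick_ladder(bool use_ir, const ir_reg &a, const hw_reg &b)
{
   if (use_ir)
      return a;
   else
      return ir_reg(b);
}

// Small self-check contrasting the two forms.
inline void check_slicing_sketch()
{
   ir_reg a;
   a.extra = 7;
   hw_reg b;
   assert(pick_ternary(true, a, b).extra == 0); // derived field dropped
   assert(pick_ladder(true, a, b).extra == 7);  // derived field preserved
}
```

The ternary form compiles without any warning, which is part of why the if ladder is the safer choice here.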
* i965/fs: Avoid scalar destinations in emit_uniformize() | Kristian Høgsberg Kristensen | 2015-10-23 | 1 | -4/+11

  The scalar destination registers break copy propagation. Instead, compute the results to a regular register and then reference a component when we later use the result as a source.

  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965/fs: Use greater-equal cmod to implement maximum. | Matt Turner | 2015-08-31 | 1 | -0/+2

  The docs specifically call out SEL with .l and .ge as the implementations of MIN and MAX respectively. Among other things, SEL with these conditional mods is commutative. See commit 3b7f683f.

  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Make the default builder 64-wide before entering the optimization loop. | Francisco Jerez | 2015-07-29 | 1 | -0/+3

  Not a typo. Replace the default builder with one of bogus width to catch cases in which optimization passes assume that the default dispatch width is good enough. The execution controls of instructions emitted during optimization should in general match the original code that is being manipulated. Many of the problems fixed in this series were caught by the assertions introduced in this patch.

  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Define a new fs_builder constructor taking an instruction as argument. | Francisco Jerez | 2015-07-29 | 1 | -0/+16

  We have a number of optimization passes that repeat the same pattern before inserting new instructions into the program based on some previous instruction: They point the default builder at the original instruction, then call exec_all() and group() to select the same execution controls the original instruction had, and then maybe call annotate() to clone the debug annotation from the original instruction. In fact an optimization pass missing any of these steps is likely to be broken if the intention was to emit new code based on a preexisting instruction, so let's make it easy for passes to do the right thing by having an fs_builder constructor that automates the task of setting up a builder to emit a given instruction provided as argument.

  The following patches fix all cases I've found in which we weren't explicitly initializing the execution controls of the emitted instructions, and clean-up optimization passes which were already doing the right thing to use the new constructor.

  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Handle zero-size allocations in fs_builder::vgrf(). | Francisco Jerez | 2015-07-29 | 1 | -4/+7

  This will be handy to avoid some ugly ternary operators in the next patch, like:

     fs_reg reg = (size == 0 ? null_reg_ud() : vgrf(..., size));

  Because a zero-size register allocation is guaranteed not to ever be read or written we can just return the null register. Another possibility would be to actually allocate a zero-size VGRF, which would involve defining a zero-size register class in the register allocator and a considerable amount of churn.

  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
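The idea of folding the zero-size case into the allocation helper might look roughly like the following. It is a minimal sketch, not the real fs_builder::vgrf(): reg_sketch, reg_file_sketch and allocator_sketch are hypothetical names, and the allocator is reduced to a list of sizes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register files: the null register and virtual GRFs.
enum class reg_file_sketch { null_reg, vgrf };

// Hypothetical register handle.
struct reg_sketch {
   reg_file_sketch file;
   unsigned nr;
};

// Hypothetical allocator standing in for the builder's vgrf() helper.
class allocator_sketch {
public:
   reg_sketch vgrf(unsigned size)
   {
      // A zero-size allocation can never be read or written, so hand back
      // the null register instead of creating a zero-size virtual register.
      if (size == 0)
         return {reg_file_sketch::null_reg, 0};

      sizes_.push_back(size);
      return {reg_file_sketch::vgrf, static_cast<unsigned>(sizes_.size() - 1)};
   }

   // Number of virtual registers actually allocated so far.
   unsigned count() const
   {
      return static_cast<unsigned>(sizes_.size());
   }

private:
   std::vector<unsigned> sizes_;
};

// Small self-check: zero-size requests do not consume a register number.
inline void check_zero_size_sketch()
{
   allocator_sketch alloc;
   assert(alloc.vgrf(0).file == reg_file_sketch::null_reg);
   assert(alloc.count() == 0);
}
```

Callers can then request a register of any size unconditionally, which is what removes the ternary quoted in the message.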
* i965/fs: Add builder emit method taking a variable number of source registers. | Francisco Jerez | 2015-07-29 | 1 | -3/+12

  And start using it in fs_builder::LOAD_PAYLOAD(). This will be used to emit logical send message opcodes which have an unusually large number of arguments.

  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Fix stride field for the result of emit_uniformize(). | Francisco Jerez | 2015-07-21 | 1 | -7/+9

  This is essentially the same problem fixed in an earlier patch for immediates. Setting the stride to zero will be particularly useful for my future SIMD lowering pass, because we will be able to just check whether the stride of a source register is zero and skip emitting the copies required to unzip it in that case.

  Instead of setting stride to zero in every caller of emit_uniformize() I've changed the function to return the result as its return value (previously it was being written into a caller-provided destination register), because this way we can enforce that the result is used with the correct regioning from the function itself.

  The changes to the prototype of its VEC4 counterpart are mainly for the sake of symmetry; VEC4 registers don't have stride.

  Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* i965/fs: Relax fs_builder channel group assertion when force_writemask_all is on. | Francisco Jerez | 2015-07-01 | 1 | -2/+2

  This assertion was meant to catch code inadvertently escaping the control flow jail determined by the group of channel enable signals selected by some caller. However, it seems useful to be able to increase the default execution size as long as force_writemask_all is enabled, because force_writemask_all is an explicit indication that there is no longer a one-to-one correspondence between channels and SIMD components, so the restriction doesn't apply.

  In addition, reorder the calls to fs_builder::group and ::exec_all in a couple of places to make sure that we don't temporarily break this invariant in the future for instructions with exec_size higher than the dispatch width.

  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Remove the width field from fs_reg | Jason Ekstrand | 2015-06-30 | 1 | -14/+5

  As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instruction's execution size based entirely on register widths. With the builder, we can easily set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go.

  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs_builder: Use the dispatch width for setting exec sizes | Jason Ekstrand | 2015-06-30 | 1 | -9/+11

  Previously we used dst.width but the two *should* be the same.

  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Remove fs_inst constructors that don't take an explicit exec_size | Jason Ekstrand | 2015-06-30 | 1 | -1/+1

  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Introduce FS IR builder. | Francisco Jerez | 2015-06-09 | 1 | -0/+652

  The purpose of this change is threefold:

  First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, which in turn will reduce the coupling between other components and the visitor, allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery.

  Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively.

  Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object.

  The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the helper functions used to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state.

  Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions; they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory; emit(ADD()) is no longer necessary.

  v2: Drop scalarizing VEC4 builder.
  v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags.
  v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program.

  Reviewed-by: Matt Turner <mattst88@gmail.com>
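The third point, a lightweight builder that carries the code generation parameters and is handed to helpers by value, can be illustrated with a toy model. Everything below is a simplified sketch with hypothetical names (emitted_inst, builder_sketch, emit_payload_sketch); it is not the fs_builder interface, and the insertion cursor is reduced to appending at the end of a vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of an emitted instruction.
struct emitted_inst {
   std::string opcode;
   unsigned exec_size; // number of channels the instruction executes
   unsigned group;     // first channel enable signal it applies to
};

// Hypothetical builder: a small copyable object holding the emission state.
class builder_sketch {
public:
   builder_sketch(std::vector<emitted_inst> *program, unsigned width)
      : program_(program), width_(width), group_(0)
   {
      assert(program != nullptr);
   }

   // Return a modified copy that selects the i-th group of n channels.
   builder_sketch group(unsigned n, unsigned i) const
   {
      assert(n <= width_);
      builder_sketch bld = *this;
      bld.width_ = n;
      bld.group_ = group_ + i * n;
      return bld;
   }

   // Emit right away; there is no separate "create, then emit" step.
   void emit(const std::string &opcode) const
   {
      program_->push_back({opcode, width_, group_});
   }

private:
   std::vector<emitted_inst> *program_;
   unsigned width_;
   unsigned group_;
};

// A helper that knows nothing about widths or halves: whatever builder it
// receives determines the execution controls of the instructions it emits.
inline void emit_payload_sketch(const builder_sketch &bld)
{
   bld.emit("MOV");
   bld.emit("ADD");
}
```

Under these assumptions, a caller that needs to split work into 8-wide chunks would call emit_payload_sketch(bld.group(8, 0)) and emit_payload_sketch(bld.group(8, 1)), and the helper itself stays unchanged.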