summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_reg.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: fix subnr overflow in suboffset()Iago Toral Quiroga2016-10-191-8/+5
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNELJason Ekstrand2016-09-211-0/+12
| | | | | | | | | | | | | | | | | | | On at least Sky Lake, ce0 does not contain the full story as far as enabled channels goes. It is possible to have completely disabled channels where the corresponding bits in ce0 are 1. In order to get the correct execution mask, you have to mask off those channels which were disabled from the beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL returning a completely dead channel. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Francisco Jerez <currojerez@riseup.net> [ Francisco Jerez: Fix a couple of typos, add mask register type assertion, clarify reason why ce0 can have bits set for disabled channels, clarify that this may only be a problem when thread dispatch doesn't pack channels tightly in the SIMD thread. Apply same treatment to Align16 path. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/reg: Make brw_sr0_reg take a subnr and return a vec1 regJason Ekstrand2016-09-211-12/+8
| | | | | | | | | | | The state register sr0 is really a collection of dwords not a SIMD8 anything. It's much more convenient for brw_sr0_reg to return the particular dword you're looking for rather than a giant blob you have to massage into what you want. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> [ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-2/+2
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: add helper for creating packing writemaskTimothy Arceri2016-07-211-0/+7
| | | | | | | | | | | For example where n=3 first_component=1 this will give us 0xE (WRITEMASK_YZW). V2: Add assert to check first component is <= 4 (Suggested by Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* i965: add helpers for creating component layout swizzleTimothy Arceri2016-07-211-0/+3
| | | | | | | | | This will be used to swizzle components to the beginning or end of the vector based on the component layout qualifier and whether we are doing a load or store. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* i965: Add missing types to type_sz().Matt Turner2016-06-021-1/+5
| | | | | | | | Coverity warns in multiple places about the potential for division by zero, caused by this function's default case. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Define brw_int_type() helper.Francisco Jerez2016-05-271-0/+20
| | | | | | | Intended as a (partial) inverse of type_sz(). Will be useful in the next commit and some other SIMD32 generator changes I have queued up. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Fix brw_regs_equal() for NaN and positive/negative zero.Kenneth Graunke2016-05-201-1/+2
| | | | | | | | We'd like the comparisons to mean "the exact same bits". Comparing doubles won't do that for NaN values or positive vs. negative zero. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Fix undefined df bits in brw_reg comparisons.Kenneth Graunke2016-05-141-8/+19
| | | | | | | | | | | | | | | | | | Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df" field to the brw_reg struct, adding an extra 4 bytes of data that isn't usually initialized (or may contain irrelevant garbage if the struct is mutated). This means that it's no longer safe to memcmp(). Instead, add a brw_regs_equal() function which ignores the extra df bits unless they matter. To keep the implementation cheap, we wrap the first set of fields in a union/struct so that we can use a single DWord comparison. v2: Drop unnecessary casts (caught by Francisco Jerez). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* i965: add brw_imm_dfConnor Abbott2016-05-101-0/+9
| | | | | | | | | | v2 (Iago) - Fixup accessibility in backend_reg Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Determine size of double precision float registerTopi Pohjolainen2016-05-101-0/+1
| | | | | | | | | | | | | | This is used to determine how many registers an instruction reads and writes as well as for offseting register region into a desired component. v2 (Connor): rebase on master Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Tapani P\344lli <tapani.palli@intel.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/hsw: Initialize SLM index in state registerJordan Justen2016-03-081-0/+16
| | | | | | | | | | | | | | | For Haswell, we need to initialize the SLM index in the state register. This can be copied out of the CS header dword 0. v2: * Use UW move to avoid changing upper 16-bits of sr0.1 (mattst88) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94081 Fixes: piglit arb_compute_shader/execution/shared-atomics.shader_test Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Cc: "11.2" <mesa-stable@lists.freedesktop.org> Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add support for swizzling arbitrary immediates to (brw_)swizzle().Francisco Jerez2016-03-061-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scalar immediates used to be handled correctly by swizzle() (as the identity) but since commit 58fa9d47b536403c4e3ca5d6a2495691338388fd it will corrupt the contents of the immediate. Vector immediates were never handled correctly, but we had ad-hoc code to swizzle VF immediates in the vec4 copy propagation pass. This takes care of swizzling V and UV in addition. v2: Don't implement swizzling of V/UV immediates (Matt). If you need to swizzle an integer vector immediate in the future apply the following diff to go back to v1: --- a/src/mesa/drivers/dri/i965/brw_eu.c +++ b/src/mesa/drivers/dri/i965/brw_eu.c @@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod) static unsigned imm_shift(enum brw_reg_type type, unsigned i) { - assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V && - "Not implemented."); - if (type == BRW_REGISTER_TYPE_VF) return 8 * (i & 3); + else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V) + return 4 * (i & 7); else return 0; } Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Pass symbolic swizzle to brw_swizzle() as a single argument.Francisco Jerez2016-03-061-11/+4
| | | | | | | | And replace brw_swizzle1() with brw_swizzle(). Seems slightly cleaner and will allow reusing brw_swizzle() in the vec4 back-end more easily. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+Marta Lofstedt2015-12-301-27/+0
| | | | | | | | | | | | | The imulExtended tests of the shader bitfield tests of the OpenGL ES 3.1 CTS, fail on gen8+, when BRW_REGISTER_TYPE_W is used for SHADER_OPECODE_MULH. Also, remove unused helper function: static inline bool type_is_signed(unsigned type) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add tessellation control shaders.Kenneth Graunke2015-12-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Add brw_imm_uv().Matt Turner2015-11-201-0/+9
|
* i965: Don't bother setting regioning on immediates.Matt Turner2015-11-201-6/+0
| | | | The region fields are unioned with the immediate storage.
* i965: Make brw_imm_vf4() take 8-bit restricted floats.Matt Turner2015-11-191-31/+7
| | | | | | | | | | | | | | | | | | | | This partially reverts commit bbf8239f92ecd79431dfa41402e1c85318e7267f. I didn't like that commit to begin with -- computing things at compile time is fine -- but for purposes of verifying that the resulting values are correct, looking up 0x00 and 0x30 in a table is a lot better than evaluating a recursive function. Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats directly (instead of only integral values that would be converted to restricted float), we can use this function as a replacement for the vector float src_reg/fs_reg constructors. brw_float_to_vf() is not currently an inline function, so it will not be evaluated at compile time. I'll address that in a follow-up patch. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Combine register file field.Matt Turner2015-11-131-2/+2
| | | | | | | | The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-5/+5
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add and use enum brw_reg_file.Matt Turner2015-11-131-12/+13
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Reorganize brw_reg fields.Matt Turner2015-11-131-8/+8
| | | | | | | | | Put fields that are meaningless with an immediate in the same storage with the immediate. This leaves fields type, file, nr, subnr in the first dword where there's now extra room for expansion. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-131-20/+20
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/brw_reg: Add a brw_VxH_indirect helperJason Ekstrand2015-11-111-0/+11
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Note that the UV immediate type is Gen6+.Matt Turner2015-10-221-1/+1
|
* i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generationIago Toral Quiroga2015-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example: https://bugs.freedesktop.org/show_bug.cgi?id=86469 https://bugs.freedesktop.org/show_bug.cgi?id=90631 Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected). Notice that the hardware docs are not clear about this fact: SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers" However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume th at MRF addresses above m15 wrap to legal MRF registers." Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests where helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware: Using MRFs 13..15 for spilling: crash: 2, fail: 113, pass: 6621, skip: 5461 Using MRFs 21..23 for spilling: crash: 2, fail: 12, pass: 6722, skip: 5461 This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move MRF register asserts out of brw_reg.hIago Toral Quiroga2015-09-211-3/+4
| | | | | | | | | | | | | | | In a later patch we will make BRW_MAX_MRF return a different value depending on the hardware generation, but it is inconvenient to add a gen parameter to the brw_reg functions only for the assertions, so move these to places where we have the hardware generation available. Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that would make sure that we catch all uses of MRF registers, even those coming from modules that generate native code directly, like blorp. Unfortunately, this is very late in the process which can make things harder to debug, so add asserts to the generator as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Add auxiliary func to build a writemask from a component sizeEduardo Lima Mitev2015-08-031-0/+6
| | | | | | | New method brw_writemask_for_size() will return a writemask with the first 'size' components activated. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* Delete duplicate function is_power_of_two() and use _mesa_is_pow_two()Anuj Phogat2015-07-291-1/+1
| | | | | | Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
* i965: Add notification registerJordan Justen2015-06-121-0/+16
| | | | | | | | | | | | This will be used by the wait instruction when implementing the barrier() function. v2: * Changes suggested by mattst88 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Document brw_mask_reg().Francisco Jerez2015-05-121-1/+5
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make the brw_inst helpers take a device_info instead of a contextJason Ekstrand2015-04-221-2/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Calculate delta_x and delta_y together.Matt Turner2015-04-211-0/+7
| | | | | | | | | | | | | This lets SIMD16 programs on G45 and Gen5 use the PLN instruction. On Ironlake: total instructions in shared programs: 5634757 -> 5518055 (-2.07%) instructions in affected programs: 1745837 -> 1629135 (-6.68%) helped: 11439 HURT: 4 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Make type_sz() return unsigned.Matt Turner2015-04-211-1/+1
| | | | | | Avoids annoying warnings when comparing with sizeof(...). Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4: Some more trivial swizzle clean-up.Francisco Jerez2015-03-231-4/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix signedness of brw_is_single_value_swizzle() argument.Francisco Jerez2015-03-231-1/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Define some useful swizzle helper functions.Francisco Jerez2015-03-231-0/+97
| | | | | | | | | | This defines helper functions implementing some common swizzle transformations that are usually open-coded in the compiler back-end, causing a lot of clutter. Some optimization passes will become almost trivial implemented in terms of these functions (e.g. vec4_visitor::opt_reduce_swizzle()). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Replace ud_reg_to_w() with a more general helper function.Francisco Jerez2015-02-191-0/+22
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Extract scalar region checking logicBen Widawsky2015-01-201-0/+13
| | | | | | | | | | | There are currently 2 users of this functionality. I have 2 more users coming up, and having a simple function makes the results much cleaner. The existing interface semantics was proposed by Matt. v2 (Ken): Rename to region_matches()/has_scalar_region(). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add QWORD sizes to type_sz macroBen Widawsky2015-01-201-0/+3
| | | | | | | | | | | | | | | | | | | GEN8 added the QWORD as a valid type for certain operations on the EU. In order to calculate the number of registers used one must have the type size as part of the equation. Quoting the formula in the code: regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32; Adding this separately for bisection since there is no simple way to add an assert in the type_sz function. NOTE: As a side note, I was confused for a while because it's impossible to calculate the region, ie. registers needed, without vstride. However, at this point these are all part of the IR, and so no vstride must exist. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/brw_reg: struct constructor now needs explicit negate and abs values.Andres Gomez2014-12-151-2/+20
| | | | | | | | | | | | | | | | | | | We were assuming, when constructing a new brw_reg struct, that the negate and abs register modifiers would not be present by default in the new register. Now, we force explicitly setting these values when constructing a new register. This will avoid problems like forgetting to properly set them when we are using a previous register to generate this new register, as it was happening in the dFdx and dFdy generation functions. Fixes piglit test shaders/glsl-deriv-varyings Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991 Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add functions to convert float <-> VF.Matt Turner2014-11-251-0/+4
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/brw_reg: Make the accumulator register take an explicit width.Jason Ekstrand2014-09-301-2/+3
| | | | | | | The big pile of patches I just pushed regresses about 25 piglit tests on SNB. This fixes the regressions. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/brw_reg: Add a firsthalf function and use it in the generatorJason Ekstrand2014-09-301-0/+6
| | | | | | | | Right now, this function is a no-op but it indicates that we intend to only use the first half of the 16-wide register. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Make null_reg_* const members of fs_visitor instead of globalsJason Ekstrand2014-09-301-0/+6
| | | | | | | | | We also set the register width equal to the dispatch width. Right now, this is effectively a no-op since we don't do anything with it. However, it will be important once we add an actual width field to fs_reg. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: forward-declare struct brw_context in brw_reg.hIlia Mirkin2014-07-091-0/+2
| | | | | | | | | | | | | | | | | | | | Commit 54e91e7420 introduced a function declaration that uses brw_context. While brw_context tends to get included in most files, it is not when compiling intel_asm_annotation.c resulting in the following warning: In file included from brw_shader.h:25:0, from brw_cfg.h:32, from intel_asm_annotation.c:24: brw_reg.h:122:39: warning: 'struct brw_context' declared inside parameter list [enabled by default] brw_reg.h:122:39: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] Add a forward-declaration for struct brw_context to avoid the issue. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use enum brw_reg_type for register types.Matt Turner2014-07-051-4/+4
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use unreachable() instead of unconditional assert().Matt Turner2014-07-011-2/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>