path: root/src/mesa/drivers/dri/i965/brw_eu_emit.c
Commit message, Author, Age, Files, Lines
* i965: fix unused variable warning in gen7_block_read_scratch()Timothy Arceri2016-10-051-2/+1
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNELJason Ekstrand2016-09-211-9/+30
| | | | | | | | | | | | | | | | | | | On at least Sky Lake, ce0 does not contain the full story as far as enabled channels goes. It is possible to have completely disabled channels where the corresponding bits in ce0 are 1. In order to get the correct execution mask, you have to mask off those channels which were disabled from the beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL returning a completely dead channel. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Francisco Jerez <currojerez@riseup.net> [ Francisco Jerez: Fix a couple of typos, add mask register type assertion, clarify reason why ce0 can have bits set for disabled channels, clarify that this may only be a problem when thread dispatch doesn't pack channels tightly in the SIMD thread. Apply same treatment to Align16 path. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>
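The masking step described above can be sketched as a small software model. This is only an illustration, not the generator code: `find_live_channel`, its parameters, and the `-1` return for "no live channel" are hypothetical, with `ce0` standing for the channel enable register and `dispatch_mask` for the value read from sr0.2 or sr0.3.

```c
#include <stdint.h>

/* Hypothetical model: return the index of the first channel that is both
 * enabled in ce0 and present in the dispatch/vector mask, or -1 if no
 * such channel exists. The -1 convention is an assumption made for this
 * sketch only. */
static int find_live_channel(uint32_t ce0, uint32_t dispatch_mask)
{
    /* Drop channels that were disabled from the beginning. */
    uint32_t live = ce0 & dispatch_mask;
    int channel = 0;

    if (live == 0)
        return -1;

    /* Scan for the lowest set bit. */
    while ((live & 1u) == 0) {
        live >>= 1;
        channel++;
    }
    return channel;
}
```

With `ce0 = 0xFF` and a dispatch mask of `0xF0`, the unmasked search would pick channel 0, which was never dispatched; the masked search picks channel 4.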
* intel: Rename brw_get_device_name/info to gen_get_device_name/infoJason Ekstrand2016-09-031-1/+1
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-71/+71
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Pass start_offset to brw_set_uip_jip().Matt Turner2016-08-311-11/+3
| | | | | | | | | | | | Without this, we would pass over the instructions in the SIMD8 program (which is located earlier in the buffer) when brw_set_uip_jip() is called to handle the SIMD16 program. The assertion about compacted control flow was bogus: halt, cont, break cannot be compacted because they have both JIP and UIP. Instead, we should never see a compacted instruction in this code at all. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/eu: Add codegen support for the Gen9+ render target read message.Francisco Jerez2016-08-251-0/+28
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/eu: Take into account the target cache argument in brw_set_dp_read_message.Francisco Jerez2016-08-251-2/+13
| | | | | | | | | | | brw_set_dp_read_message() was setting the data cache as send message SFID on Gen7+ hardware, ignoring the target cache specified by the caller. Some of the callers were passing a bogus target cache value as argument relying on brw_set_dp_read_message not to take it into account. Fix them too. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/eu: set DF imm value to the source of DIMSamuel Iglesias Gonsálvez2016-07-141-1/+2
| | | | | | | | | | | | | | | According to HSW's PRM, vol02b, the DIM instruction has the following restriction: "Restriction : src0 must be immediate. src0 must specify the :f (F, Float) type encoding but is an immediate 64-bit DF (Double Float) value. dst must have type DF." This commit allows uploading the immediate 64-bit DF value to the source of a DIM instruction even when it has the float type encoding. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
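To illustrate the restriction quoted above: the payload is the raw 64-bit pattern of the double, regardless of the type encoding written into the instruction. A minimal sketch, assuming an IEEE 754 binary64 `double`; `df_immediate_bits` is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 64-bit pattern of a double so it can be
 * stored as an immediate without first being narrowed to a 32-bit float. */
static uint64_t df_immediate_bits(double value)
{
    uint64_t bits;

    /* memcpy is the portable way to reinterpret the object representation. */
    memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```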
* i965: enable the emission of the DIM instructionSamuel Iglesias Gonsálvez2016-07-141-0/+1
| | | | | | | | | | v2 (Matt): - Take a DF source argument for the DIM instruction emission in the visitors. - Indentation. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* Revert "i965/fs: Allow scalar source regions on SNB math instructions."Francisco Jerez2016-06-031-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5. Apparently the hardware spec text I quoted in the commit message was outright lying about scalar source math being supported on SNB, the hardware seems to load 32 contiguous bits of data for each channel regardless of the regioning mode. Fixes regressions in the following CTS tests (which we didn't catch early due to CTS being temporarily disabled in our CI system): es2-cts.gtf.gl.atan.atan_vec3_frag_xvary es2-cts.gtf.gl.cos.cos_vec2_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvary es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_float_frag_xvary es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf es2-cts.gtf.gl.cos.cos_vec3_frag_xvary es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346 Reported-by: Mark Janes <mark.a.janes@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
* i965/eu: use simd8 when exec_size != EXECUTE_16Alejandro Piñeiro2016-06-021-2/+2
| | | | | | | | Among other things, fix a GPU hang when using INTEL_DEBUG=shader_time for any shader. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Allow scalar source regions on SNB math instructions.Francisco Jerez2016-05-311-2/+4
| | | | | | | | | | | | | | | | I haven't found any evidence that this isn't supported by the hardware, in fact according to the SNB hardware spec: "The supported regioning modes for math instructions are align16, align1 with the following restrictions: - Scalar source is supported. [...] - Source and destination offset must be the same, except the case of scalar source." Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/ir: Make BROADCAST emit an unmasked single-channel move.Francisco Jerez2016-05-271-3/+9
| | | | | | | | | | | | | | Alternatively we could have extended the current semantics to 32-wide mode by changing brw_broadcast() to emit multiple indexed MOV instructions in the generator copying the selected value to all destination registers, but it seemed rather silly to waste EU cycles unnecessarily copying the exact same value 32 times in the GRF. The vstride change in the Align16 path is required to avoid assertions in validate_reg() since the change causes the execution size of the MOV and SEL instructions to be equal to the source region width. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.Francisco Jerez2016-05-271-7/+12
| | | | | | | | | | | This makes FIND_LIVE_CHANNEL behave like a normal instruction for non-zero quarter control. On Gen8+ we just leave the quarter control field of the emitted FBL instruction set to the default value so the hardware applies the expected shift to the execution mask signals. On Gen7 we apply the offset manually by specifying a non-zero subregister offset in the source region of the FBL instruction. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.Francisco Jerez2016-05-271-8/+17
| | | | | | | | | | | | Due to a Gen7-specific hardware bug native 32-wide instructions get the lower 16 bits of the execution mask applied incorrectly to both halves of the instruction, so the MOV trick we currently use wouldn't work. Instead emit multiple 16-wide MOV instructions in 32-wide mode in order to cover the whole execution mask. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Implement scratch reads and writes of 4 GRFs at a time.Francisco Jerez2016-05-271-18/+13
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.Francisco Jerez2016-05-271-1/+4
| | | | | | | | Gen7 hardware expects the block size field in the message descriptor to be the number of registers minus one instead of the log2 of the number of registers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
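The two encodings mentioned in this message can be compared with a short sketch. The function names are hypothetical, and the assumption that the log2 encoding applies from Gen8 onwards is inferred from the commit subject rather than stated in the message.

```c
/* Hypothetical helper: floor(log2(n)) for n >= 1. */
static unsigned log2_floor(unsigned n)
{
    unsigned result = 0;

    while (n > 1) {
        n >>= 1;
        result++;
    }
    return result;
}

/* Hypothetical sketch of the block size field of the scratch message
 * descriptor: Gen7 takes "registers minus one", later generations are
 * assumed to take log2 of the register count. */
static unsigned scratch_block_size_field(int gen, unsigned num_regs)
{
    if (gen >= 8)
        return log2_floor(num_regs);
    return num_regs - 1;
}
```

The two encodings agree for 1 and 2 registers and first diverge at 4 registers (3 on Gen7, 2 under the log2 encoding), which is why the difference only shows up with larger blocks.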
* i965/eu: Set execution size explicitly for memory fence send message.Francisco Jerez2016-05-271-4/+7
| | | | | | | | We don't want to emit a 32-wide send message in 32-wide programs. The memory fence message should have the same effect regardless of the execution size (as long as it's valid) so just set it to one. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.Francisco Jerez2016-05-271-6/+6
| | | | | | | | | | | In SIMD32 programs the compiler is responsible for providing the appropriate half of the sample mask in the message header, so the first and third quarters both map to the first slot group of the provided 16-bit half, while the second and fourth quarters map to the second slot group -- IOW they should be equivalent to 1Q and 2Q modulo two. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
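The "modulo two" mapping from quarter control to slot group can be written out as a tiny sketch. The enum and function names are hypothetical and do not correspond to the driver's definitions.

```c
/* Hypothetical quarter-control values, numbered 0..3 for 1Q..4Q. */
enum toy_quarter { TOY_1Q = 0, TOY_2Q = 1, TOY_3Q = 2, TOY_4Q = 3 };

/* Hypothetical helper: 1Q and 3Q select the first slot group of the
 * provided 16-bit half of the sample mask, 2Q and 4Q the second. */
static unsigned slot_group_for_quarter(enum toy_quarter qtr)
{
    return (unsigned)qtr % 2;
}
```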
* i965/fs: Clean up remaining uses of dispatch_width in the generator.Francisco Jerez2016-05-271-2/+1
| | | | | | | | Most of these are bugs because the intended execution size of an instruction and the dispatch width of the shader aren't necessarily the same (especially in SIMD32 programs). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Use current exec size instead of p->compressed in surface message generation.Francisco Jerez2016-05-271-6/+8
| | | | | | | | | | This was kind of an abuse of p->compressed, dataport send message instructions are always uncompressed. Use the current execution size instead since p->compressed is on its way out. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.Francisco Jerez2016-05-271-13/+11
| | | | | | | | | p->compressed won't work for SIMD32, we should just be using the execution size value specified via p->current instead. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Fix a bunch of compression control bugs in the generator.Francisco Jerez2016-05-271-9/+8
| | | | | | | | | | Most of these were resetting quarter control to zero incorrectly even though everything they needed to do was disable instruction compression -- The brw_SAMPLE() case was doing the right thing but it can be simplified slightly by using the new compression control interface. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Mark fallthrough in switch statement.Matt Turner2016-05-251-0/+1
| | | | | Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
* i965: Fix JIP to skip over sibling do...while loops.Kenneth Graunke2016-05-161-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've apparently always been botching JIP for sequences such as: do cmp.f0.0 ... (+f0.0) break ... do ... while ... while Because the "do" instruction doesn't actually exist, the inner "while" is at the same depth as the "break". brw_find_next_block_end() thus mistook the inner "while" as the end of the loop containing the "break", and set the "break" to point to the wrong place. Only "while" instructions that jump before our instruction are relevant. We need to ignore the rest, as they're sibling control flow nodes (or children, but this was already handled by the depth == 0 check). See also commit 1ac1581f3889d5f7e6e231c05651f44fbd80f0b6. This prevents channel masks from being screwed up, and fixes GPU hangs(*) in dEQP-GLES31.functional.shaders.multisample_interpolation. interpolate_at_sample.centroid_qualified.multisample_texture_16. The test ended up executing code with no channels enabled, and that code contained FIND_LIVE_CHANNEL, which returned 8 (out of range for a SIMD8 program), which then was used in indirect GRF addressing, which randomly got a boolean value (0xFFFFFFFF), interpreted it as a sample ID, OR'd it into an indirect send message descriptor, which corrupted the message length, sending a pixel interpolator message with mlen 15, which is illegal. Whew :) (*) Technically, the test doesn't GPU hang currently, but only because another bug prevents it from issuing pixel interpolator messages entirely...with that fixed, it hangs. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
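The "only while instructions that jump before our instruction are relevant" rule can be sketched on a toy instruction stream indexed by position. The function and parameter names are hypothetical; the real code works on byte offsets and encoded jump fields.

```c
#include <stdbool.h>

/* Hypothetical predicate: a "while" at while_index whose backward jump
 * lands at while_index + jump_distance encloses the instruction at
 * start_index only if the landing point is at or before start_index.
 * jump_distance is negative for a backward jump. */
static bool while_jumps_before(int while_index, int jump_distance,
                               int start_index)
{
    return while_index + jump_distance <= start_index;
}
```

Numbering the sequence from the message as 0: cmp, 1: break, 2: ..., 3: inner loop body, 4: inner while, 5: ..., 6: outer while, the inner while jumps back to 3, which is after the break at 1, so it is a sibling and is skipped. The outer while jumps back to 0, so it is the real end of the block containing the break.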
* i965: Make a "does this while jump before our instruction?" helper.Kenneth Graunke2016-05-161-4/+12
| | | | | | | | I need to use this in an additional place. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: two-argument instructions can only use 32-bit immediatesIago Toral Quiroga2016-05-101-0/+2
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/eu: add support for DF immediatesConnor Abbott2016-05-101-7/+21
| | | | | | | | | v2 (Sam): - Remove 'however' from the comment (Topi) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/eu: Allow 3-src float ops with doublesTopi Pohjolainen2016-05-101-6/+18
| | | | | | | | | | v2: - set 3src_src_type for BRW_REGISTER_TYPE_DF (Connor) Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: fix invalid memory writeMarc-André Lureau2016-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed some heap corruption running virgl tests, and valgrind helped me to track it down to the following error: ==29272== Invalid write of size 4 ==29272== at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307) ==29272== by 0x9029A7D: brw_DO (brw_eu_emit.c:1750) ==29272== by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999) ==29272== by 0x904491F: brw_compile_fs (brw_fs.cpp:5685) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) ==29272== by 0x8C84325: _mesa_link_program (shaderapi.c:1042) ==29272== by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515) ==29272== by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880) ==29272== Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd ==29272== at 0x4C2AA98: calloc (vg_replace_malloc.c:711) ==29272== by 0x8ED11F7: ralloc_size (ralloc.c:113) ==29272== by 0x8ED1282: rzalloc_size (ralloc.c:134) ==29272== by 0x8ED14C0: rzalloc_array_size (ralloc.c:196) ==29272== by 0x9019C7B: brw_init_codegen (brw_eu.c:291) ==29272== by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124) ==29272== by 0x9044883: brw_compile_fs (brw_fs.cpp:5675) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) if_depth_in_loop is an array 
of size p->loop_stack_array_size, and push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1], thus the condition to grow the array should be p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently off by 2...) This can be reproduced by running the following test with virgl test server: LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner ./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
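The corrected growth condition can be sketched with a stand-alone stack. This is a simplified model using `realloc` instead of ralloc; `struct toy_loop_stack` and `toy_push_loop` are hypothetical names, and only the `if_depth_in_loop` array is modelled.

```c
#include <stdlib.h>

/* Hypothetical model of the loop bookkeeping discussed above. */
struct toy_loop_stack {
    int *if_depth_in_loop;  /* array_size elements */
    int array_size;
    int depth;
};

/* Push a loop level. The write below goes to index depth + 1 (relative to
 * the depth on entry), so the array must grow whenever
 * array_size <= depth + 1. Returns 0 on success, -1 on allocation failure. */
static int toy_push_loop(struct toy_loop_stack *s)
{
    if (s->array_size <= s->depth + 1) {
        int new_size = s->array_size > 0 ? s->array_size * 2 : 2;
        int *grown = realloc(s->if_depth_in_loop,
                             (size_t)new_size * sizeof(int));
        if (grown == NULL)
            return -1;
        s->if_depth_in_loop = grown;
        s->array_size = new_size;
    }

    s->depth++;
    s->if_depth_in_loop[s->depth] = 0;
    return 0;
}
```

A looser condition that lets `depth + 1` reach `array_size` before growing would write one element past the end of the allocation, which matches the "0 bytes after a block" report in the valgrind trace.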
* i965: Skip execution size adjustment for instructions of width 4Iago Toral Quiroga2016-03-171-1/+13
| | | | | | | | | | | | | | | | | | | | This code in brw_set_dest adjusts the execution size of any instruction with a dst.width < 8. However, we don't want to do this with instructions operating on doubles, since these will have a width of 4, but still need an execution size of 8 (for SIMD8). Unfortunately, we can't just check the size of the operands involved to detect if we are doing an operation on doubles, because we can have instructions that do operations on double operands interpreted as UD, operating on any of its 2 32-bit components. Previous commits have made it so we never emit instructions with a horizontal width of 4 that don't have the correct execution size set for gen6+, so we can skip it in this case, avoiding the conflicts with fp64 requirements. Expanding the same fix to other hardware generations requires many more changes but since we are not targeting fp64 support on them we don't really care for now. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
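The adjustment rule described above might look roughly like the following. This is a sketch under assumptions: the enum mirrors a log2-style width encoding, and `toy_adjusted_exec_size` is a hypothetical stand-in for the logic inside brw_set_dest, not a copy of it.

```c
/* Hypothetical width/execution-size encoding (log2 of the channel count). */
enum toy_exec_size {
    TOY_EXECUTE_1 = 0,
    TOY_EXECUTE_2 = 1,
    TOY_EXECUTE_4 = 2,
    TOY_EXECUTE_8 = 3,
    TOY_EXECUTE_16 = 4
};

/* Return the execution size an instruction ends up with. On gen6+ only
 * destinations narrower than 4 override the current size, so width-4
 * (double) destinations keep their SIMD8 execution size. Older
 * generations keep the original "narrower than 8" rule. */
static unsigned toy_adjusted_exec_size(int gen, unsigned dest_width,
                                       unsigned current_exec_size)
{
    unsigned threshold = gen >= 6 ? TOY_EXECUTE_4 : TOY_EXECUTE_8;

    if (dest_width < threshold)
        return dest_width;
    return current_exec_size;
}
```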
* i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channelIago Toral Quiroga2016-03-171-0/+3
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/eu: set execution size for SEND message in brw_send_indirect_messageIago Toral Quiroga2016-03-171-0/+3
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/eu: set correct execution size in brw_NOPIago Toral Quiroga2016-03-171-2/+3
| | | | | | | v2: NOP should have an execsize of 1 (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Pass symbolic swizzle to brw_swizzle() as a single argument.Francisco Jerez2016-03-061-1/+1
| | | | | | | | And replace brw_swizzle1() with brw_swizzle(). Seems slightly cleaner and will allow reusing brw_swizzle() in the vec4 back-end more easily. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Set dest type to UW for several send messagesJordan Justen2016-02-261-1/+4
| | | | | | | | | | | | | | Without this, on SIMD 16 the send instruction destination will appear to write more than one destination register, causing the simulator to report an error. Of course, the send instruction can actually write more than one destination register regardless of the type set for the destination, so this is a bit strange. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Move 3-src subnr swizzle handling into the vec4 backend.Kenneth Graunke2016-01-021-6/+5
| | | | | | | | | | | | | | | | | | | | | | | While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr. In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7]. This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly. This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make brw_set_message_descriptor() non-static.Kenneth Graunke2015-12-111-1/+1
| | | | | | | | I want to use this directly from brw_vec4_generator.cpp. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Fix JIP to properly skip over unrelated control flow.Kenneth Graunke2015-11-301-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've apparently always been botching JIP for sequences such as: do cmp.f0.0 ... (+f0.0) break ... if ... else ... endif ... while Normally, UIP is supposed to point to the final destination of the jump, while in nested control flow, JIP is supposed to point to the end of the current nesting level. It essentially bounces out of the current nested control flow, to an instruction that has a JIP which bounces out another level, and so on. In the above example, when setting JIP for the BREAK, we call brw_find_next_block_end(), which begins a search after the BREAK for the next ENDIF, ELSE, WHILE, or HALT. It ignores the IF and finds the ELSE, setting JIP there. This makes no sense at all. The break is supposed to skip over the whole if/else/endif block entirely. They have a sibling relationship, not a nesting relationship. This patch fixes brw_find_next_block_end() to track depth as it does its search, and ignore anything not at depth 0. So when it sees the IF, it ignores everything until after the ENDIF. That way, it finds the end of the right block. I noticed this while reading some assembly code. We believe jumping earlier is harmless, but makes the EU walk through a bunch of disabled instructions for no reason. I noticed that GLBenchmark Manhattan had a shader that contained a BREAK with a bogus JIP, but didn't measure any performance improvement (it's likely minuscule, if there is any). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
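The depth-tracking search described in this message can be sketched on a toy instruction array. The opcode enum and `toy_find_next_block_end` are hypothetical; the real function walks encoded instructions by byte offset, and this model deliberately covers only the if/else/endif case from this commit.

```c
/* Hypothetical opcodes for a toy instruction stream. */
enum toy_op {
    TOY_OTHER, TOY_BREAK, TOY_IF, TOY_ELSE, TOY_ENDIF, TOY_WHILE, TOY_HALT
};

/* Return the index of the first block-ending instruction after `start`
 * that sits at nesting depth 0 relative to `start`, or -1 if none is
 * found. Anything inside a sibling if/else/endif is skipped. */
static int toy_find_next_block_end(const enum toy_op *prog, int count,
                                   int start)
{
    int depth = 0;

    for (int i = start + 1; i < count; i++) {
        switch (prog[i]) {
        case TOY_IF:
            depth++;            /* entering a sibling block */
            break;
        case TOY_ENDIF:
            if (depth == 0)
                return i;
            depth--;            /* leaving the sibling block */
            break;
        case TOY_ELSE:
        case TOY_WHILE:
        case TOY_HALT:
            if (depth == 0)
                return i;
            break;
        default:
            break;
        }
    }
    return -1;
}
```

For the sequence in the message (cmp, break, if, ..., else, ..., endif, ..., while), a search starting at the break now returns the index of the final while instead of stopping at the else.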
* i965/gen9+: Switch thread scratch space to non-coherent stateless access.Francisco Jerez2015-11-261-2/+15
| | | | | | | | | | | | | | | | | | | | The thread scratch space is thread-local so using the full IA-coherent stateless surface index (255 since Gen8) is unnecessary and potentially expensive. On Gen8 and early steppings of Gen9 this is not a functional change because the kernel already sets bit 4 of HDC_CHICKEN0 which overrides all HDC memory access to be non-coherent in order to workaround a hardware bug. This happens to fix a full system hang when running any spilling code on a pre-production SKL GT4e machine I have on my desk (forcing all HDC access to non-coherent from the kernel up to stepping F0 might be a good idea though regardless of this patch), and improves performance of the OglPSBump2 SynMark benchmark run with INTEL_DEBUG=spill_fs by 33% (11 runs, 5% significance) on a production SKL GT2 (on which HDC IA-coherency is apparently functional so it wouldn't make sense to disable globally). Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Use BRW_MRF_COMPR4 macro in more places.Matt Turner2015-11-131-2/+2
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add and use enum brw_reg_file.Matt Turner2015-11-131-1/+1
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-131-26/+26
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for some time for "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
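As an illustration of the C11 feature this commit relies on, here is a minimal sketch of a struct with an unnamed union and an unnamed inner struct. `struct toy_reg` and its fields are hypothetical and much simpler than the real brw_reg layout.

```c
#include <stdint.h>

/* Hypothetical register description. Because the union and the inner
 * struct are unnamed, their members are accessed directly as reg.ud or
 * reg.swizzle, with no intermediate "dw1" or "bits" names. */
struct toy_reg {
    unsigned nr;
    union {
        struct {
            unsigned swizzle;
            unsigned writemask;
        };
        uint32_t ud;  /* unsigned immediate view of the same storage */
        int32_t d;    /* signed immediate view of the same storage */
    };
};

/* Hypothetical constructor for an unsigned immediate. */
static struct toy_reg toy_imm_ud(uint32_t value)
{
    struct toy_reg reg = { 0 };

    reg.ud = value;
    return reg;
}
```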
* i965: Fix invalid memory accesses after resizing brw_codegen's store tableKristian Høgsberg2015-10-301-4/+13
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Set correct field for indirect align16 addrimm.Matt Turner2015-10-291-1/+1
| | | | | | This has been wrong since the initial import of the i965 driver. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Don't use message headers for untyped readsKristian Høgsberg Kristensen2015-10-231-2/+1
| | | | | | | | | | | | We always set the mask to 0xffff, which is what it defaults to when no header is present. Let's drop the header instead. v2: Only remove header for untyped reads. Typed reads always need the header. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965/fs: Handle non-const sample number in interpolateAtSampleNeil Roberts2015-10-091-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a non-const sample number is given to interpolateAtSample it will now generate an indirect send message with the sample ID similar to how non-const sampler array indexing works. Previously non-const values were ignored and instead it ended up using a constant 0 value. The generator will try to determine if the sample ID is dynamically uniform via nir_src_is_dynamically_uniform. If not it will query the pixel interpolator in a loop, once for each different live sample number. The next live sample number is found using emit_uniformize. If multiple live channels have the same sample number then they will be handled in a single iteration of the loop. The loop is necessary because the indirect send message doesn't seem to have a way to specify a different value for each fragment. This fixes the following two Piglit tests: arb_gpu_shader5-interpolateAtSample-nonconst arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform v2: Handle dynamically non-uniform sample ids. v3: Remove the BREAK instruction and predicate the WHILE directly. Make the tokens arrays const. (Matt Turner) v4: Iterate over the live channels instead of each possible sample number. v5: Don't special case immediate values in brw_pixel_interpolator_query. Make a better wrapper for the function to set up the PI send instruction. Ensure that the SHL instructions are scalar. (Francisco Jerez). Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Fix MRF register number assertions for compr4.Kenneth Graunke2015-09-211-2/+2
| | | | | | | | | | compr4 is represented by setting the high bit on the MRF number. We need to mask it out before sanity checking the register number. Fixes ~8000 assert fails on Ironlake and G45. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
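The masking described above can be sketched as follows. The flag position (bit 7) and the names `TOY_MRF_COMPR4` and `toy_mrf_nr_is_valid` are assumptions made for illustration; the message only says that compr4 is the high bit of the MRF number.

```c
#include <stdbool.h>

/* Assumed position of the compr4 flag within the MRF number. */
#define TOY_MRF_COMPR4 (1u << 7)

/* Hypothetical range check: strip the compr4 flag before comparing the
 * register number against the number of available MRF registers. */
static bool toy_mrf_nr_is_valid(unsigned nr, unsigned max_mrf)
{
    return (nr & ~TOY_MRF_COMPR4) < max_mrf;
}
```

Without the mask, any register number carrying the flag is at least 128 and fails the check, which is consistent with the large number of assertion failures reported on hardware that uses compr4.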
* i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generationIago Toral Quiroga2015-09-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some bug reports about shaders failing to compile in gen6 because MRF 14 is used when we need to spill. For example: https://bugs.freedesktop.org/show_bug.cgi?id=86469 https://bugs.freedesktop.org/show_bug.cgi?id=90631 Discussion in bugzilla pointed to the fact that gen6 might actually have 24 MRF registers available instead of 16, so we could use other MRF registers and avoid these conflicts (we still need to investigate why some shaders need up to MRF 14 anyway, since this is not expected). Notice that the hardware docs are not clear about this fact: SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device Hardware" says "Number per Thread" - "24 registers" However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: "Normal threads should construct their messages in m1..m15. (...) Regardless of actual hardware implementation, the thread should not assume that MRF addresses above m15 wrap to legal MRF registers." Therefore experimentation was necessary to evaluate if we had these extra MRF registers available or not. This was tested in gen6 using MRF registers 21..23 for spilling and doing a full piglit run (all.py) forcing spilling of everything on the FS backend. It was also tested by doing spilling of everything on both the FS and the VS backends with a piglit run of shader.py. In both cases no regressions were observed. In fact, many of these tests were helped in the cases where we forced spilling, since that triggered the same underlying problem described in the bug reports. Here are some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on gen6 hardware: Using MRFs 13..15 for spilling: crash: 2, fail: 113, pass: 6621, skip: 5461 Using MRFs 21..23 for spilling: crash: 2, fail: 12, pass: 6722, skip: 5461 This patch sets the ground for later patches to implement spilling using MRF registers 21..23 in gen6. 
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
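A generation-dependent MRF limit of the kind this commit introduces might look like the sketch below. The function name is hypothetical, and the values (24 on gen6, 16 elsewhere) are taken from the discussion in the message rather than from the resulting macro.

```c
/* Hypothetical helper: number of MRF registers assumed to be usable on a
 * given hardware generation, per the experiment described above. */
static unsigned toy_max_mrf(int gen)
{
    return gen == 6 ? 24u : 16u;
}
```

With such a limit, spilling through m21..m23 passes the range check on gen6 while remaining out of range on other generations.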
* i965: Move MRF register asserts out of brw_reg.hIago Toral Quiroga2015-09-211-3/+6
| | | | | | | | | | | | | | | In a later patch we will make BRW_MAX_MRF return a different value depending on the hardware generation, but it is inconvenient to add a gen parameter to the brw_reg functions only for the assertions, so move these to places where we have the hardware generation available. Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that would make sure that we catch all uses of MRF registers, even those coming from modules that generate native code directly, like blorp. Unfortunately, this is very late in the process which can make things harder to debug, so add asserts to the generator as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>