summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_shader.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965: use EmitNoIndirectSampler for gen < 7Tapani Pälli2015-06-301-0/+4
| | | | | | Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
* Revert "i965: Delete linked GLSL IR when using NIR."Kenneth Graunke2015-06-281-4/+1
| | | | This reverts commit 104c8fc2c2aa5621261f80aa6b4f76c3163078f1.
* i965: Delete linked GLSL IR when using NIR.Tapani Pälli2015-06-241-1/+4
| | | | | | | | | | | This is based on Kenneth's patch to delete 'most of the IR'. Due to linker changes to clone variables, we can now free all of IR. Saves 58MB of memory when replaying a Dota 2 trace on Broadwell. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org
* i965: Remove the brw_context from the visitorsJason Ekstrand2015-06-231-4/+5
| | | | | | | As of this commit, nothing actually needs the brw_context. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Add compiler options to brw_compilerJason Ekstrand2015-06-231-1/+49
| | | | | | | | | | | | | This creates the options at screen cration time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/fs: Plumb compiler debug logging through brw_compilerJason Ekstrand2015-06-231-0/+26
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Plumb compiler debug logging through a function pointer in brw_compilerJason Ekstrand2015-06-231-0/+16
| | | | | | | | v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Initialize backend_shader::mem_ctx in its constructor.Matt Turner2015-06-231-0/+2
| | | | | | | We were initializing it in each subclasses' constructors for some reason. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Implement support for ir_barrierJordan Justen2015-06-121-0/+3
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Disallow saturation for MACH operations.Ben Widawsky2015-06-081-1/+0
| | | | | Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
* i965: Rename backend_visitor to backend_shaderJason Ekstrand2015-05-281-10/+10
| | | | | | | | The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Introduce the FIND_LIVE_CHANNEL pseudo-opcode.Francisco Jerez2015-05-041-0/+2
| | | | | | | | | | | | | This instruction calculates the index of an arbitrary channel enabled in the current execution mask. It's expected to be used as input for the BROADCAST opcode, but it's implemented as a separate instruction rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has no dependencies so it can always be CSE'ed with other instances of the same instruction within a basic block. v2: Whitespace fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Introduce the BROADCAST pseudo-opcode.Francisco Jerez2015-05-041-0/+3
| | | | | | | | | | | | | | | | | | | The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. This is in particular the case for UBO, sampler and image arrays, which can be indexed dynamically with the restriction that all active SIMD channels access the same index, provided to the shared unit as part of a single scalar field of the message descriptor. Simply taking the index value from the first channel as we were doing until now is incorrect, because it might contain an uninitialized value if the channel had previously been disabled by non-uniform control flow. v2: Minor style fixes. Improve commit message. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add memory fence opcode.Francisco Jerez2015-05-041-0/+3
| | | | | Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+8
| | | | | Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+3
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/cs: Support CS program precompileJordan Justen2015-05-021-0/+4
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/cs: Add CS_OPCODE_CS_TERMINATEJordan Justen2015-05-021-0/+2
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Add missing pixel_x/y to brw_instruction_name().Matt Turner2015-04-241-0/+5
| | | | Forgotten in commit 529064f6.
* i965: Add a brw_compiler structure and store the register sets in itJason Ekstrand2015-04-221-0/+13
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Use device_info instead of the context in instruction schedulingJason Ekstrand2015-04-221-2/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add a devinfo field to backend_visitor and use it for gen checksJason Ekstrand2015-04-221-1/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Remove the context parameter from brw_texture_offsetJason Ekstrand2015-04-221-7/+1
| | | | | | | | It wasn't really being used anyway. We used it to assert that gpu_shader5 is supported in the back-end but that should be caught by the front-end. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Emit ADDs for gl_FragCoord, not virtual opcodes.Matt Turner2015-04-211-5/+0
| | | | | | | | | | These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits ADDs directly. The virtual opcodes weren't providing anything useful. I'm going to repurpose these opcodes, so deleting and readding them makes it simpler to see what's going on. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/skl: Add the header for constant loads outside of the generatorNeil Roberts2015-04-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5a06ee738 added a step to the generator to set up the message header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 instruction. That pseudo opcode is implemented in terms of multiple actual opcodes, one of which writes to one of the source registers in order to set up the message header. This causes problems because the scheduler isn't aware that the source register is written to and it can end up reorganising the instructions incorrectly such that the write to the source register overwrites a needed value from a previous instruction. This problem was presenting itself as a rendering error in the weapon in Enemy Territory: Quake Wars. Since commit 588859e1 there is an additional problem that the double register allocated to include the message header would end up being split into two. This wasn't happening previously because the code to split registers was explicitly avoided for instructions that are sending from the GRF. This patch fixes both problems by splitting the code to set up the message header into a new pseudo opcode so that it will be done outside of the generator. This new opcode has the header register as a destination so the scheduler can recognise that the register is written to. This has the additional benefit that the scheduler can optimise the message header slightly better by moving the mov instructions further away from the send instructions. On Skylake it appears to fix the following three Piglit tests without causing any regressions: gs-float-array-variable-index gs-mat3x4-row-major gs-mat4x3-row-major I think we actually may need to do something similar for the fs backend and possibly for message headers from regular texture sampling but I'm not entirely sure. v2: Make sure the exec-size is retained as 8 for the mov instruction to initialise the header from g0. This was accidentally lost during a rebase on top of 07c571a39fa1. Split the patch into two so that the helper function is a separate change. Fix emitting the MOV instruction on Gen7. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058 Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965: Create NIR during LinkShader() and ProgramStringNotify().Kenneth Graunke2015-04-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile. We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time. Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already. shader-db runs ~9.4% faster on my i7-5600U, with a release build. v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Move lower_output_reads to brw_link_shader().Kenneth Graunke2015-04-111-0/+3
| | | | | | | | This makes it so emit_nir_code() doesn't modify the GLSL IR. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Move brw_link_shader's GLSL IR transformations into a helper.Kenneth Graunke2015-04-101-93/+99
| | | | | | | | This function was getting a bit large and unwieldy. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Change brw_shader to gl_shader in brw_link_shader().Kenneth Graunke2015-04-101-32/+31
| | | | | | | | | Nothing actually wanted brw_shader fields - we just had to type shader->base all over the place for no reason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Free dead GLSL IR one last time.Kenneth Graunke2015-04-061-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | While working on NIR's memory allocation model, I realized the GLSL IR memory model was broken. During glCompileShader, we allocate everything out of the _mesa_glsl_parse_state context, and reparent it to gl_shader at the end. During glLinkProgram, we allocate everything out of a temporary context, then reparent it to the exec_list containing the linked IR. But during brw_link_shader - the driver's final opportunity to do lowering and optimization - we just allocated everything out of the permanent context given to us by the linker. That memory stayed forever. Notably, passes like brw_fs_channel_expressions cause us to churn the majority of the code, so we really want to free dead IR here. Saves 125MB of memory when replaying a Dota 2 trace on Broadwell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Use brw_nir_cubemap_normalize for NIR shadersJason Ekstrand2015-04-031-1/+2
| | | | Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Define method to check whether a backend_reg is inside a given range.Francisco Jerez2015-03-231-0/+9
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: De-duplicate is_expression_commutative() functions.Kenneth Graunke2015-03-151-0/+22
| | | | | | | | Create a backend_inst::is_commutative() method to replace two static functions that did the exact same thing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Silence GCC maybe-uninitialized warning.Vinson Lee2015-03-091-1/+1
| | | | | | | | | | brw_shader.cpp: In function ‘bool brw_saturate_immediate(brw_reg_type, brw_reg*)’: brw_shader.cpp:618:31: warning: ‘sat_imm.brw_saturate_immediate(brw_reg_type, brw_reg*)::<anonymous union>::ud’ may be used uninitialized in this function [-Wmaybe-uninitialized] reg->dw1.ud = sat_imm.ud; ^ Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* Fix invalid extern "C" around header inclusion.Mark Janes2015-03-051-2/+0
| | | | | | | | | | | System headers may contain C++ declarations, which cannot be given C linkage. For this reason, include statements should never occur inside extern "C". This patch moves the C linkage statements to enclose only the declarations within a single header. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* i965: Consider scratch writes to have side effects.Matt Turner2015-03-021-0/+1
| | | | | | | | We could do better by tracking scratch reads and writes. Cc: 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88793 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4: Add and use byte-MOV instruction for unpack 4x8.Matt Turner2015-02-191-0/+2
| | | | | | | | | Previously we were using a B/UB source in an Align16 instruction, which is illegal. It for some reason works on all platforms, except Broadwell. Cc: "10.5" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86811 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: just avoid warnings with fp64Dave Airlie2015-02-201-0/+1
| | | | | | | This just fills in some blanks to avoid warnings in the i965 driver. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Signed-off-by: Dave Airlie <airlied@redhat.com>
* i965: Create backend_visitor fields for debugging messages.Kenneth Graunke2015-02-191-0/+3
| | | | | | | | | | | | | We introduce three new fields in backend_visitor: - debug_enabled: whether or not INTEL_DEBUG & DEBUG_<stage flag> - stage_name: "vertex", "fragment", etc. for use in messages - stage_abbrev: "VS", "FS", etc. for use in messages Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Allow exec_list sentinels as arguments to insert functions.Matt Turner2015-02-171-2/+4
| | | | | | | | | | | | | | | | | | To insert an instruction at the end of a basic block, we typically do something like inst = block->last_non_control_flow_inst(); inst->insert_after(block, new_inst); But blocks can consist of a single control flow instruction, so inst will actually be the exec_list's head sentinel. We shouldn't use it as if it were a regular instruction, but it is safe to insert something after it. This patch avoids assert-failing because an exec_list sentinel wasn't in the basic block's instruction list. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Let dump_instructions() work before calculate_cfg().Matt Turner2015-02-151-5/+12
| | | | Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965: Add an is_negative_one() method.Matt Turner2015-02-151-0/+16
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Move some asserts to unreachable.Eric Anholt2015-02-121-2/+2
| | | | | | | | | | If execution was supposed to be supported in this case, we'd run into trouble from completely uninitialized sat_imm values. v2: Drop the '!' before the string. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Add LINTERP/CINTERP to can_do_cmod().Matt Turner2015-02-111-0/+2
| | | | | | | | | | | | LINTERP is implemented as a PLN instruction or a LINE+MAC. PLN and MAC can do conditional mod. CINTERP is just a MOV. total instructions in shared programs: 5952103 -> 5950284 (-0.03%) instructions in affected programs: 324573 -> 322754 (-0.56%) helped: 1819 We lose the SIMD16 in one Unigine Heaven shader which appears six times in shader-db.
* i965: Allocate binding table space for shader images.Francisco Jerez2015-02-101-0/+7
| | | | | | | v2: Bump the number of supported image uniforms to 32 (Ken). Reviewed-by: Paul Berry <stereotype441@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Handle negated unsigned immediate values in constant propagation.Francisco Jerez2015-02-101-6/+2
| | | | | | | | | Negation of UD/UW sources behaves the same as for D/W sources, taking the two's complement of the source, except for bitwise logical operations on Gen8 and up which take the one's complement. Fixes crash in a GLSL shader with subtraction of two unsigned values. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add function to take the abs of immediates.Matt Turner2015-02-031-0/+39
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Add function to negate immediates.Matt Turner2015-02-031-0/+39
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Mark UB/B immediates as unreachable.Matt Turner2015-02-031-4/+1
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* glsl: Improve precision of mod(x,y)Iago Toral Quiroga2015-02-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement mod(x,y) as y * fract(x/y). This implementation has a down side though: it introduces precision errors due to the fract() operation. Even worse, since the result of fract() is multiplied by y, the larger y gets the larger the precision error we produce, so for large enough numbers the precision loss is significant. Some examples on i965: Operation Precision error ----------------------------------------------------- mod(-1.951171875, 1.9980468750) 0.0000000447 mod(121.57, 13.29) 0.0000023842 mod(3769.12, 321.99) 0.0000762939 mod(3769.12, 1321.99) 0.0001220703 mod(-987654.125, 123456.984375) 0.0160663128 mod( 987654.125, 123456.984375) 0.0312500000 This patch replaces the current lowering pass with a different one (MOD_TO_FLOOR) that follows the recommended implementation in the GLSL man pages: mod(x,y) = x - y * floor(x/y) This implementation eliminates the precision errors at the expense of an additional add instruction on some systems. On systems that can do negate with multiply-add in a single operation this new implementation would come at no additional cost. v2 (Ian Romanick) - Do not clone operands because when they are expressions we would be duplicating them and that can lead to suboptimal code. Fixes the following 16 dEQP tests: dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_* dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_* Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>