summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Use the builder dispatch_width for computing register offsetsJason Ekstrand2015-06-301-1/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Add a builder argument to offset()Jason Ekstrand2015-06-301-1/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Move offset(fs_reg, unsigned) to brw_fs.hJason Ekstrand2015-06-301-0/+21
| | | | | | | | Shortly, offset() will depend on the builder so we need it moved to some place where it has access to that. Reviewed-by: Iago Toral Quiroga <itoral@igali.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: emit constants only onceConnor Abbott2015-06-301-0/+2
| | | | | | | | | | | | | | | | | | | Before, we would lazily emit a MOV whenever we encountered a use of a constant. Now that we have a dedicated file for SSA values, we can instead only emit the MOV's once, which is more consistent and prevents us from relying on CSE to re-combine the constants when they aren't absorbed into the instruction. total instructions in shared programs: 6078991 -> 6073118 (-0.10%) instructions in affected programs: 402221 -> 396348 (-1.46%) helped: 1527 HURT: 0 GAINED: 8 LOST: 2 v2: split this out from the previous commit (Jason) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: use SSA values directlyConnor Abbott2015-06-301-0/+3
| | | | | | | | | | | Before, we would use registers, but set a magical "parent_instr" field to indicate that it was actually purely an SSA value (i.e., it wasn't involved in any phi nodes). Instead, just use SSA values directly, which lets us get rid of the hack and reduces memory usage since we're not allocating a nir_register for every value. It also makes our handling of load_const more consistent compared to the other instructions. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vs: Move compute_clip_distance() out of emit_urb_writes().Kenneth Graunke2015-06-281-1/+1
| | | | | | | | | | | | | | | | | Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by turning those into the modern gl_ClipDistance equivalents. This is unnecessary in Core Profile: if user clipping is enabled, but the shader doesn't write the corresponding gl_ClipDistance entry, results are undefined. Hence, it is also unnecessary for geometry shaders. This patch moves the call up to run_vs(). This is equivalent for VS, but removes the need to pass clip distances into emit_urb_writes(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Remove the brw_context from the visitorsJason Ekstrand2015-06-231-1/+1
| | | | | | | As of this commit, nothing actually needs the brw_context. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/vs: Pass the current set of clip planes through run() and run_vs()Jason Ekstrand2015-06-231-4/+4
| | | | | | | | | Previously, these were pulled out of the GL context conditionally based on whether we were running ff/ARB or a GLSL program. Now, we just pass them in so that the visitor doesn't have to grab them itself. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/fs: Add a do_rep_send flag to run_fsJason Ekstrand2015-06-231-1/+1
| | | | | | | Previously, we were pulling it from brw->do_rep_send Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Pull calls to get_shader_time_index out of the visitorJason Ekstrand2015-06-231-2/+5
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Use a single index per shader for shader_time.Jason Ekstrand2015-06-231-1/+2
| | | | | | | | | | Previously, each shader took 3 shader time indices which were potentially at arbirary points in the shader time buffer. Now, each shader gets a single index which refers to 3 consecutive locations in the buffer. This simplifies some of the logic at the cost of having a magic 3 a few places. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/fs: Make no16 non-variadicJason Ekstrand2015-06-231-1/+1
| | | | | | | We never used the fact that it was variadic anyway. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove the dependance on brw_context from the generatorsJason Ekstrand2015-06-231-1/+3
| | | | | Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Plumb compiler debug logging through a function pointer in brw_compilerJason Ekstrand2015-06-231-2/+2
| | | | | | | | v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Implement support for ir_barrierJordan Justen2015-06-121-0/+3
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* fs/reg_allocate: Remove the MRF hack helpers from fs_visitorJason Ekstrand2015-06-091-4/+0
| | | | | | | These are helpers that only exist in this one file. No reason to put them in the visitor. Reviewed-by: Neil Roberts <neil@linux.intel.com>
* i965/fs: Don't let the EOT send message interfere with the MRF hackJason Ekstrand2015-06-091-1/+2
| | | | | | | | | | | | | Previously, we just put the message for the EOT send as high in the file as it would go. This is because the register pre-filling hardware will stop all over the early registers in the file in preparation for the next thread while you're still sending the last message. However, if something happens to spill, then the MRF hack interferes with the EOT send message and, if things aren't scheduled nicely, will stomp on it. Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90520 Reviewed-by: Neil Roberts <neil@linux.intel.com>
* i965/fs: Remove dead IR construction code from the visitor.Francisco Jerez2015-06-091-73/+0
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate translation of NIR texturing instructions to the IR builder.Francisco Jerez2015-06-091-1/+2
| | | | | | v2: Don't remove assignments of base_ir just yet. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate translation of NIR intrinsics to the IR builder.Francisco Jerez2015-06-091-1/+2
| | | | | | v2: Use fs_builder::SEL instead of ::emit. Use set_condmod(). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate translation of NIR ALU instructions to the IR builder.Francisco Jerez2015-06-091-1/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate translation of NIR control flow to the IR builder.Francisco Jerez2015-06-091-1/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate NIR emit_percomp() to the IR builder.Francisco Jerez2015-06-091-1/+2
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate FS framebuffer writes to the IR builder.Francisco Jerez2015-06-091-1/+2
| | | | | | | | | | | | | | The explicit call to fs_builder::group() in emit_single_fb_write() is required by the builder (otherwise the assertion in fs_builder::emit() would fail) because the subsequent LOAD_PAYLOAD and FB_WRITE instructions are in some cases emitted with a non-native execution width. The previous code would always use the channel enables for the first quarter, which is dubious but probably worked in practice because FB writes are never emitted inside non-uniform control flow and we don't pass the kill-pixel mask via predication in the cases where we have to fall-back to SIMD8 writes. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate shader time to the IR builder.Francisco Jerez2015-06-091-2/+3
| | | | | | v2: Change null register destination type to UD so it can be compacted. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate pull constant loads to the IR builder.Francisco Jerez2015-06-091-4/+5
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Migrate Gen4 send dependency workarounds to the IR builder.Francisco Jerez2015-06-091-1/+1
| | | | | | v2: Change brw_null_reg() to bld.null_reg_f(). Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Allocate a common IR builder object in fs_visitor.Francisco Jerez2015-06-091-0/+2
| | | | | | | v2: Call fs_builder::at_end() to point the builder at the end of the program explicitly. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Remove the ir_visitor codeJason Ekstrand2015-05-281-48/+2
| | | | | | | Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove the old fragment program codeJason Ekstrand2015-05-281-26/+0
| | | | | | | Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make fs/vec4_visitor inherit from ir_visitor directlyJason Ekstrand2015-05-281-1/+1
| | | | | | | | This is using multiple inheritance in C++. However, ir_visitor is really just an interface with no data so it shouldn't be so bad. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename backend_visitor to backend_shaderJason Ekstrand2015-05-281-1/+1
| | | | | | | | The backend_shader class really is a representation of a shader. The fact that it inherits from ir_visitor is somewhat immaterial. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Lower integer multiplication after optimizations.Matt Turner2015-05-181-0/+1
| | | | | | | | | | | | 32-bit x 32-bit integer multiplication requires multiple instructions until Broadwell. This patch just lets us treat the MUL instruction in the FS backend like it operates on Broadwell, and after optimizations we lower it into a sequence of instructions on older platforms. Doing this will allow us to some extra optimization on integer multiplies. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Combine the fs_visitor constructors.Kenneth Graunke2015-05-141-20/+4
| | | | | | | | | | | | | | For scalar GS support, we either need to add a fourth constructor which takes the GS structures, or combine the existing two and pass the shader stage. Given that they're not significantly different, I opted for the latter. v2: Remove more stuff from the .h file (Jason and Jordan). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Rework the fs_visitor LOAD_PAYLOAD instructionJason Ekstrand2015-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly reworked instruction is far more straightforward than the original. Before, the LOAD_PAYLOAD instruction was lowered by a the complicated and broken-by-design pile of heuristics to try and guess force_writemask_all, exec_size, and a number of other factors on the sources. Instead, we use the header_size on the instruction to denote which sources are "header sources". Header sources are required to be a single physical hardware register that is copied verbatim. The registers that follow are considered the actual payload registers and have a width that correspond's to the LOAD_PAYLOAD's exec_size and are treated as being per-channel. This gives us a fairly straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. This patch commit itself is made up of a bunch of smaller changes squashed together. Individual change descriptions follow: i965/fs: Rework fs_visitor::LOAD_PAYLOAD We rework LOAD_PAYLOAD to verify that all of the sources that count as headers are, indeed, exactly one register and that all of the non-header sources match the destination width. We then take the exec_size for LOAD_PAYLOAD directly from the destination width. i965/fs: Make destinations of load_payload have the appropreate width i965/fs: Rework fs_visitor::lower_load_payload v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions i965/fs_cse: Support the new-style LOAD_PAYLOAD i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD i965/fs: Simplify setup_color_payload Previously, setup_color_payload was a a big helper function that did a lot of gen-specific special casing for setting up the color sources of the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more sane, most of that complexity isn't needed anymore. Instead, we can do a simple fixup pass for color clamps and then just stash sources directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the right thing with respect to COMPR4. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Make LOAD_PAYLOAD take a header sizeJason Ekstrand2015-05-061-1/+2
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Make emit_single_fb_write take an explicit exec_sizeJason Ekstrand2015-05-061-1/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Define helper function to copy an arbitrary live component from some ↵Francisco Jerez2015-05-041-0/+2
| | | | | | register. Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.Francisco Jerez2015-05-041-0/+1
| | | | | | | | v2: Save some CPU cycles by doing 'return progress' rather than 'depth++' in the discard jump special case. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Simplify generator code for untyped surface messages.Francisco Jerez2015-05-041-11/+0
| | | | | | | | | The generate_untyped_*() methods do nothing useful other than calling the corresponding function from brw_eu_emit.c. The calls to brw_mark_surface_used() will go away too in a future commit. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add brw_setup_tex_for_precompile. Use in VS, GS & FS.Jordan Justen2015-05-021-0/+3
| | | | | | Suggested-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Support compute programs in fs_visitorJordan Justen2015-05-021-0/+10
| | | | | | | | | | v2: * Clean out some unneeded code copied from run_fs (krh) * Always use NIR * Split shader time out into a separate commit Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/cs: Add generator support for CS_OPCODE_CS_TERMINATEJordan Justen2015-05-021-0/+1
| | | | | | | | | | v2: * Don't rely on brw_eu* to generate the send instruction. We now generate the send here, and drop the "i965/cs: Add support for the SEND message that terminates a CS thread" brw_eu* patch. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Add emit_cs_terminate to emit CS_OPCODE_CS_TERMINATEJordan Justen2015-05-021-0/+1
| | | | | | | | | | | | v2: * Do more work at the visitor level. g0 is loaded and sent to the generator now. v3: * Use Ken's comment explaining g0 usage Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Strip trailing constant zeroes in sample messagesNeil Roberts2015-05-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a send message is emitted with a message length that is less than required for the message then the remaining parameters default to zero. We can take advantage of this to save a register when a shader passes constant zeroes as the final coordinates to the sample function. I think this might be useful for GLES applications that are using 2D textures to simulate 1D textures. On Skylake it will be useful for shaders that do texelFetch(tex,something,0) which I think is fairly common. This helps more on Skylake because in that case the order of the instruction operands are u,v,lod,r which is good for 2D textures whereas before they were u,lod,v,r which is only good for 1D textures. On Haswell: total instructions in shared programs: 8535730 -> 8533261 (-0.03%) instructions in affected programs: 236968 -> 234499 (-1.04%) helped: 1174 On Skylake: total instructions in shared programs: 10345646 -> 10341237 (-0.04%) instructions in affected programs: 293011 -> 288602 (-1.50%) helped: 1218 Reviewed-by: Matt Turner <mattst88@gmail.com> v2: Applied suggestions by Kenneth Graunke: - Only apply on Gen5+ - Apply to all texture opcodes, not just TEX and TXF. Moved the optimisation into the loop as suggested by Matt Turner. Fix the array index when there is a header.
* i965: Rename brw_compile to brw_codegenJason Ekstrand2015-04-221-1/+1
| | | | | | | | | | | | This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file done Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add a devinfo field to the generator and use it for gen checksJason Ekstrand2015-04-221-0/+1
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Remove the GL context from the generatorJason Ekstrand2015-04-221-1/+0
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Calculate delta_x and delta_y together.Matt Turner2015-04-211-2/+1
| | | | | | | | | | | | | This lets SIMD16 programs on G45 and Gen5 use the PLN instruction. On Ironlake: total instructions in shared programs: 5634757 -> 5518055 (-2.07%) instructions in affected programs: 1745837 -> 1629135 (-6.68%) helped: 11439 HURT: 4 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Emit ADDs for gl_FragCoord, not virtual opcodes.Matt Turner2015-04-211-1/+0
| | | | | | | | | | These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits ADDs directly. The virtual opcodes weren't providing anything useful. I'm going to repurpose these opcodes, so deleting and readding them makes it simpler to see what's going on. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>