external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	i965/fs: Combine tex/fb_write operations (opt)	Ben Widawsky	2015-04-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Certain platforms support the ability to sample from a texture, and write it out to the file RT - thus saving a costly send instructions (note that this is a potnential win if one wanted to backport to a tag that didn't have the patch from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2), v2: Modify the algorithm. Instead of iterating in reverse through blocks and insts, since the last block/inst is the only thing which can benefit. Rebased on top of Ken's patching modifying is_last_send v3: Rebased over almost 2 months, and Incorporated feedback from Matt: Some comment typo fixes and rewordings. Whitespace Move the optimization pass outside of the optimize loop v4: Some cosmetic changes requested from Ken. These changes ensured that the optimization function always returned true when an optimization occurred, and false when one did not. This behavior did not exist with the original patch. As a result, having the separate helper function which Matt did not like no longer made sense, and so now I believe everyone should be happy. Benchmark (n=20) %diff OglBatch5 -1.4 OglBatch7 -1.79 OglFillTexMulti 5.57 OglFillTexSingle 1.16 OglShMapPcf 0.05 OglTexFilterAniso 3.01 OglTexFilterTri 1.94 No piglit regressions: (http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/) [*] I believe my measurements are incorrect for Batch5-7. If I add this new optimization, but never emit the new instruction I see similar results. v5: Remove declaration of combine_tex_header since v4 dropped that function (Ben) Remove check for impossible case of an empty block (Matt) Set dest earlier to avoid extra special-casing in generate_tex (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Implement SIMD16 texturing on Gen4.	Kenneth Graunke	2015-04-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows SIMD16 mode to work for a lot more programs. Texturing is also more efficient in SIMD16 mode than SIMD8. Several messages don't actually exist in SIMD8 mode, so we did SIMD16 messages and threw away half of the data. Now we compute real data in both halves. Also, the SIMD16 "sample" message doesn't require all three coordinate components to exist (like the SIMD8 one), so we can shorten the message lengths, cutting register usage a bit. I chose to implement the visitor functionality in a separate function, since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks seemed like a mess. The new code bails on a few cases where we'd have to do two SIMD8 messages - we just fall back to SIMD8 for now. Improves performance in "Shadowrun: Dragonfall - Director's Cut" by about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around in the first mission). v2: Add ir_txf to the has_lod case (caught by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/fs: Make emit_lrp return an fs_inst	Jason Ekstrand	2015-03-23	1	-2/+2
\| \| \| \| \|	Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965/fs: Make an emit_discard_jump() function to reduce duplication.	Kenneth Graunke	2015-03-19	1	-0/+1
\| \| \| \| \| \| \| \| \|	This is already copied in two places, and I want to copy it to a third place. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Carl Worth <cworth@cworth.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/nir: Sort uniforms direct-first and use two different uniform registers	Jason Ekstrand	2015-03-19	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we put all the uniforms into one big array. The problem with this approach is that, as soon as there was one indirect array acces, the backend would decide that the entire large array should be pull constants. This commit splits the array in half: first direct-only uniforms and then potentially-indirect uniforms. This may not be optimal, but it does let the backend promote things to push constants. Shader-db results on HSW: total instructions in shared programs: 4114840 -> 4112172 (-0.06%) instructions in affected programs: 43316 -> 40648 (-6.16%) helped: 116 HURT: 0 v2: Set param_size[num_direct_uniforms] only if we have indirect uniforms. This caused a bug that, strangely enough, only showed up on Broadwell vertex shaders. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs: Print spills:fills and number of promoted constants.	Matt Turner	2015-03-19	1	-0/+4
\| \| \| \| \|	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965/fs: Emit better b2f of an expression on GEN4 and GEN5	Ian Romanick	2015-03-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On platforms that do not natively generate 0u and ~0u for Boolean results, b2f expressions that look like f = b2f(expr cmp 0) will generate better code by pretending the expression is f = ir_triop_sel(0.0, 1.0, expr cmp 0) This is because the last instruction of "expr" can generate the condition code for the "cmp 0". This avoids having to do the "-(b & 1)" trick to generate 0u or ~0u for the Boolean result. This means code like mov(16) g16<1>F 1F mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F (+f0) sel(16) m6<1>F g16<8,8,1>F 0F will be generated instead of mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F and(16) g4<1>D g2<8,8,1>D 1D and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD v2: When the comparison is either == 0.0 or != 0.0 use the knowledge that the true (or false) case already results in zero would allow better code generation by possibly avoiding a load-immediate instruction. v3: Apply the optimization even when neither comparitor is zero. Shader-db results: GM45 (0x2A42): total instructions in shared programs: 3551002 -> 3550829 (-0.00%) instructions in affected programs: 33269 -> 33096 (-0.52%) helped: 121 Iron Lake (0x0046): total instructions in shared programs: 4993327 -> 4993146 (-0.00%) instructions in affected programs: 34199 -> 34018 (-0.53%) helped: 129 No change on other platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Tapani Palli <tapani.palli@intel.com>
*	i965/fs: Store a pointer to brw_sampler_prog_key_data in the visitor.	Kenneth Graunke	2015-03-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NIR backend hardcodes brw_wm_prog_key at the moment, which won't work when we support scalar VS. We could use get_tex(), but it's a static method. I was going to promote it to fs_visitor, but then realized that both parameters (stage and key) are already members. It then occured to me that we could just set up a pointer in the constructor, and skip having a function altogether. This patch also converts all existing users to use key_tex. v2: Make key_tex a "const brw_sampler_prog_key_data *" instead of non-const; word-wrap some lines. (Review comments from Topi.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Make get_timestamp() pass back the MOV rather than emitting it.	Kenneth Graunke	2015-03-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This makes another part of the INTEL_DEBUG=shader_time code emittable at arbitrary locations, rather than just at the end of the instruction stream. v2: Don't lose smear! Caught by Topi Pohjolainen. v3: Don't set smear on the destination of the MOV. Thanks Topi! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
*	i965/fs: Make emit_shader_time_write return rather than emit.	Kenneth Graunke	2015-03-09	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)). The advantage is that we can also insert a shader time write at an arbitrary location in the instruction stream, rather than being restricted to emitting at the end. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org
*	i965/fs: Silence unused parameter warning	Ian Romanick	2015-03-09	1	-1/+1
\| \| \| \| \| \| \| \| \|	brw_fs_visitor.cpp:2162:56: warning: unused parameter 'offset_components' [-Wunused-parameter] fs_reg offset_value, unsigned offset_components, ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/fs: Implement SIMD16 dual source blending.	Iago Toral Quiroga	2015-03-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	From the SNB PRM, volume 4, part 1, page 193: "The dual source render target messages only have SIMD8 forms due to maximum message length limitations. SIMD16 pixel shaders must send two of these messages to cover all of the pixels. Each message contains two colors (4 channels each) for each pixel in the message payload." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/nir: Resolve source modifiers on Gen8+ logic operations.	Kenneth Graunke	2015-03-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and negate changes meaning to bitwise-not (~, not -). This isn't what NIR expects, so we should resolve the source modifers via a MOV. +30 Piglits (fs-op-bit{and,or,xor}-not-abs-*). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Remove redundant discard jumps.	Kenneth Graunke	2015-02-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the previous optimization in place, some shaders wind up with multiple discard jumps in a row, or jumps directly to the next instruction. We can remove those. Without NIR on Haswell: total instructions in shared programs: 5777258 -> 5775872 (-0.02%) instructions in affected programs: 20312 -> 18926 (-6.82%) helped: 716 With NIR on Haswell: total instructions in shared programs: 5773163 -> 5771785 (-0.02%) instructions in affected programs: 21040 -> 19662 (-6.55%) helped: 717 v2: Use the CFG rather than the old instructions list. Presumably the placeholder halt will be in the last basic block. v3: Make sure placeholder_halt->prev isn't the head sentinel (caught twice by Eric Anholt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965/fs: Optimize (gl_FrontFacing ? x : y) where x and y are ±1.0.	Matt Turner	2015-02-24	1	-0/+1
\| \| \| \| \| \| \|	total instructions in shared programs: 5695356 -> 5689775 (-0.10%) instructions in affected programs: 486231 -> 480650 (-1.15%) helped: 2604 LOST: 1
*	i965/fs/nir: Optimize (gl_FrontFacing ? x : y) where x and y are ±1.0.	Matt Turner	2015-02-24	1	-0/+3
\| \| \| \| \| \| \| \|	total instructions in shared programs: 7756214 -> 7753873 (-0.03%) instructions in affected programs: 455452 -> 453111 (-0.51%) helped: 2333 Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965/fs: Remove type parameter from emit_vs_system_value().	Kenneth Graunke	2015-02-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Every VS system value has type D. We can always add this back if that changes, but for now, it's extra typing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965/fs: Add pass to combine immediates.	Matt Turner	2015-02-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5885407 -> 5940958 (0.94%) instructions in affected programs: 3617311 -> 3672862 (1.54%) helped: 3 HURT: 23556 GAINED: 31 LOST: 165 ... but will allow us to always emit MAD instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Add register classes up to MAX_VGRF_SIZE.	Francisco Jerez	2015-02-10	1	-3/+0
\| \| \| \| \| \| \|	In preparation for some send from GRF instructions that will require larger payloads. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Remove duplicate include of brw_shader.h	Francisco Jerez	2015-02-10	1	-1/+0
\| \| \| \| \| \| \|	The second one was inside an extern "C" block, luckily it was being discarded by the preprocessor. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Move IR object definitions to separate header files.	Francisco Jerez	2015-02-10	1	-225/+1
\| \| \| \| \| \| \| \| \| \| \| \|	One should be able to manipulate i965 IR without pulling the whole FS/VEC4 visitor classes -- Optimization passes and other transformations would ideally be visitor-agnostic. Among other issues this avoids a circular dependency between the header file where such visitor-agnostic code will be defined and the main FS/VEC4 header where both IR (layer below) and visitor (layer above) happen to be defined. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Factor out virtual GRF allocation to a separate object.	Francisco Jerez	2015-02-10	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Right now virtual GRF book-keeping and allocation is performed in each visitor class separately (among other hundred different things), leading to duplicated logic in each visitor and preventing layering as it forces any code that manipulates i965 IR and needs to allocate virtual registers to depend on the specific visitor that happens to be used to translate from GLSL IR. v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor). Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs_nir: Get rid of get_alu_src	Jason Ekstrand	2015-02-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Originally, get_alu_src was supposed to handle resolving swizzles and things like that. However, now that basically every instruction we have only takes scalar sources, we don't really need it anymore. The only case where it's still marginally useful is for the mov and vecN operations that are left over from SSA form. We can handle those cases as a special case easily enough. As a side-effect, we don't need the vec_to_movs pass anymore. v2 Jason Ekstrand <jason.ekstrand@intel.com>: - Rework the way we detect if we need an extra copy for swizzling. The old code involved a pile of confusing switch fall-throughs; we now use a loop. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Use NIR's scalarizing abilities and stop handling vectors	Jason Ekstrand	2015-02-03	1	-15/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we can scalarize with NIR, there's no need for all this code anymore. Let's get rid of it and just do scalar operations. v2: run copy prop before lowering phi nodes v3: Get rid of the "emit(...)->saturate = foo" pattern v4: Run alu_to_scalar as an optimization pass total instructions in shared programs: 5998321 -> 5974070 (-0.40%) instructions in affected programs: 732075 -> 707824 (-3.31%) helped: 3137 HURT: 191 GAINED: 18 LOST: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Add pass to propagate conditional modifiers.	Matt Turner	2015-01-23	1	-0/+1
\| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Add a pass to fixup 3-src instructions that have a null dest.	Matt Turner	2015-01-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	3-src instructions can only have GRF/MRF destinations. It's really difficult to deal with that restriction in dead code elimination (that wants to give instructions null destinations to show that their result isn't used) while allowing 3-src instructions to have conditional mod, so don't, and just give then a destination before register allocation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful	Ian Romanick	2015-01-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If try_replace_with_sel is able to replace the flow control with a SEL instruction, then there is no flow control... failing SIMD16 because of nonexistent flow control is wrong. No piglit regressions on any i965 platform in Jenkins. total instructions in shared programs: 4382707 -> 4382707 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 GAINED: 2089 LOST: 0 No other platforms affected in shader-db. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).	Kenneth Graunke	2015-01-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brw_fs_nir.cpp creates almost all of its registers via: fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components)); When we add SIMD16 support, we'll need to set reg->width = 16 and double the VGRF size...on pretty much every VGRF it allocates. This patch replaces that pattern with a new "vgrf" helper method: fs_reg reg = vgrf(num_components); The new function correctly takes reg_width into account. For now, reg_width is always 1, so this should have no functional change. v2: Just make vgrf() account for reg_width right away, rather than changing the behavior in the next patch. v3: Replace one last virtual_grf_alloc I missed. It's used in code that only runs for dispatch_width == 8, so it doesn't matter, but consistency is nice. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).	Kenneth Graunke	2015-01-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I dislike how fs_reg has a constructor that knows about fs_visitor. Apart from that, it stands alone, with no need to interact with the rest of the compiler. Which is sensible - a class that represents a register should do just that. Allocating virtual register numbers should be left up to the compiler (fs_visitor). This patch replaces the constructor with a new fs_visitor::vgrf method, eliminating fs_reg's dependency on fs_visitor. It ends up being no more code. v2: Rebase from May 2014 -> January 2015. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs_nir: Handle sample ID, position, and mask better	Jason Ekstrand	2015-01-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Before, we were emitting the full pile of setup instructions for sample_id and sample_pos every time they were used. With this commit, we emit them in their own pass once at the beginning of the shader and simply emit uses later on. When it comes time for setting up VS, we can put setup for its special values in the same pass. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	nir: Make load_const SSA-only	Jason Ekstrand	2015-01-15	1	-1/+0
\| \| \| \| \| \| \| \|	As it was, we weren't ever using load_const in a non-SSA way. This allows us to substantially simplify the load_const instruction. If we ever need a non-SSA constant load, we can do a load_const and an imov. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs_nir: Use an array rather than a hash table for register lookup	Jason Ekstrand	2015-01-15	1	-2/+2
\| \| \| \|	Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs_nir: Don't duplicate emit_general_interpolation	Jason Ekstrand	2015-01-15	1	-1/+0
\| \| \| \|	Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs: Don't take an ir_variable for emit_general_interpolation	Jason Ekstrand	2015-01-15	1	-1/+5
\| \| \| \| \| \| \| \| \| \|	Previously, emit_general_interpolation took an ir_variable and pulled the information it needed from that. This meant that in fs_fp, we were constructing a dummy ir_variable just to pass into it. This commit makes emit_general_interpolation take only the information it needs and gets rid of the fs_fp cruft. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs: add a NIR frontend	Connor Abbott	2015-01-15	1	-0/+45
\| \| \| \| \| \| \| \| \| \|	This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler. v2: Jason Ekstrand <jason.ekstrand@intel.com>: Make brw_fs_nir build again Only use NIR of INTEL_USE_NIR is set whitespace fixes
*	i965/fs: Don't pass through the coordinate type	Connor Abbott	2015-01-15	1	-2/+2
\| \| \| \|	All we really need is the number of components.
*	i965/fs: make emit_fragcoord_interpolation() not take an ir_variable	Connor Abbott	2015-01-15	1	-1/+2
\|
*	i965: Pass a shader stage abbreviation to fs_generator().	Kenneth Graunke	2015-01-14	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	A lot of messages hardcoded the string "FS", which is confusing on Broadwell, where we use this code for VS support as well. shader-db particularly got confused, as it reported two "FS SIMD8" shaders, and no vertex shaders at all. Craziness ensued. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: remove includes of sampler.h from extern "C" blocks	Mark Janes	2014-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \|	C linkage was removed from functions in program/sampler.cpp. However, some cpp files include program/sampler.h within extern "C" blocks, causing link errors for test_vec4_copy_propagation. Reviewed-by: Brian Paul <brianp@vmware.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965: Clean up fs_visitor::run and rename to run_fs	Kristian Høgsberg	2014-12-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	Now that fs_visitor::run is back to being only fragment shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT conditions and rename it to run_fs. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add fs_visitor::run_vs() to generate scalar vertex shader code	Kristian Høgsberg	2014-12-10	1	-2/+19
\| \| \| \| \| \| \| \| \|	This patch uses the previous refactoring to add a new run_vs() method that generates vertex shader code using the scalar visitor and optimizer. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Prepare for using the ATTR register file in the fs backend	Kristian Høgsberg	2014-12-10	1	-0/+3
\| \| \| \| \| \| \| \|	The scalar vertex shader will use the ATTR register file for vertex attributes. This patch adds support for the ATTR file to fs_visitor. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add SIMD8 URB write low-level IR instruction	Kristian Høgsberg	2014-12-10	1	-0/+1
\| \| \| \| \| \| \| \| \|	This is all we need from the generator for SIMD8 vertex shaders. This opcode is just the send instruction, all the hard work will happen in the visitor using LOAD_PAYLOAD. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Remove shader program argument and member from fs_generator	Kristian Høgsberg	2014-12-10	1	-2/+0
\| \| \| \| \| \| \| \|	Now that the caller passes in the shader debug name, we don't need this anymore. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Set shader name for generator from call site	Kristian Høgsberg	2014-12-10	1	-3/+4
\| \| \| \| \| \| \| \|	fs_generator no longer knows what stage it's generating code for, so we have to set the debug name of the shader from the call site. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Generalize fs_generator further	Kristian Høgsberg	2014-12-10	1	-4/+3
\| \| \| \| \| \| \| \|	This removes all stage specific data from the generator, and lets us create a generator for any stage. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Try to emit LINE instructions on Gen <= 5.	Matt Turner	2014-12-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The LINE instruction performs a multiply-add instruction (a * b + c) where b and c are scalar arguments. It reads b and c from offsets in src0 such that you can load them (it they're representable) as a vector-float immediate with a single instruction. Hurts some programs, but that'll all get better once we CSE the vector-float MOVs in the next patch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965/fs: Make brw_reg_from_fs_reg static and remove prototype.	Matt Turner	2014-12-05	1	-2/+0
\| \| \| \| \| \|	And move it above its first use in brw_fs_generator.cpp. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Add a negate() function.	Matt Turner	2014-12-05	1	-0/+8
\| \| \| \|	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Don't offset uniform registers in half().	Matt Turner	2014-12-03	1	-0/+4
\| \| \| \| \| \| \|	Half gives you the second half of a SIMD16 register, but if the register is a uniform it would incorrectly give you the next register. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>