external_mesa3d.git

	Commit message	Author	Age	Files	Lines
*	i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.	Francisco Jerez	2015-07-07	1	-1/+1
	From the hardware docs for the DO instruction: "Execution size is ignored for this instruction." My observation on ILK hardware contradicts the spec, though: channels over the execution size of a DO instruction won't enter the loop, and channels over the execution size of a WHILE instruction will exit the loop after the first iteration. The latter is consistent with the spec: there's no claim about the execution size being ignored for the WHILE instruction, so it's not completely unexpected that it has an influence on the evaluation of EMask. The execute_size argument of brw_DO() shouldn't have any effect on Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the execution size from the matching DO, so this patch should fix them too. The execution size of BREAK and CONT instructions was already being set correctly. Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with conditional and loop instructions 16-wide, e.g. shaders/glsl-fs-continue-inside-do-while. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs_generator: Use inst->exec_size for determining hardware reg widths	Jason Ekstrand	2015-06-30	1	-7/+7
	Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
*	i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads	Jason Ekstrand	2015-06-30	1	-6/+3
	Previously, we were allocating the payload with different sizes per gen and then figuring out the mlen in the generator based on gen. This meant, among other things, that the higher level passes knew nothing about it. Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Remove the dependance on brw_context from the generators	Jason Ekstrand	2015-06-23	1	-2/+3
	Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Plumb compiler debug logging through a function pointer in brw_compiler	Jason Ekstrand	2015-06-23	1	-11/+9
	v2 (Ken): Make shader_debug_log a printf-like function. v3 (Jason): Add a void * to pass the brw_context through. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: Implement support for ir_barrier	Jordan Justen	2015-06-12	1	-0/+11
	Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965: Use UW-typed immediate in multiply inst.	Matt Turner	2015-06-03	1	-1/+1
	Some hardware reads only the low 16 bits even if the type is UD, but other hardware like Cherryview can't handle this. Fixes spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple on Cherryview. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90830 Reviewed-by: Neil Roberts <neil@linux.intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965: Don't add base_binding_table_index if it's zero	Neil Roberts	2015-05-31	1	-1/+2
	When calculating the binding table index for non-constant sampler array indexing it needs to add the base binding table index which is a constant within the generated code. Often this base is zero so we can avoid a redundant instruction in that case. It looks like nothing in shader-db is doing non-constant sampler array indexing so this patch doesn't make any difference but it might be worth having anyway. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Ben Widawsky <ben@bwidawsk.net>
*	i965: Don't use a temporary when generating an indirect sample	Neil Roberts	2015-05-31	1	-13/+4
	Previously when generating the send instruction for a sample instruction with an indirect sampler it would use the destination register as a temporary store. This breaks when used in combination with the opt_sampler_eot optimisation because that forces the destination to be null. This patch fixes that by avoiding the temp register altogether. The reason the temporary register was needed was because it was trying to ensure the binding table index doesn't overflow a byte by and'ing it with 0xff. The result is then or'd with sampler_index<<8. This patch instead just and's the whole thing by 0xfff. This will ensure that a bogus sampler index won't overflow into the rest of the message descriptor but unlike the previous code it won't ensure that the binding table index doesn't overflow into the sampler index. It doesn't seem like that should matter very much though because if the shader is generating a bogus sampler index then it's going to just get garbage out either way. Instead of doing sampler_index<<8|(sampler_index+base_table_index) the new code avoids one operation by doing sampler_index*0x101+base_table_index which should be equivalent. However if we wanted to avoid the multiply for some reason we could do this by adding an extra or instruction still without needing the temporary register. This fixes a number of Piglit tests on Skylake that were using indirect samplers such as: spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
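	The equivalence claimed in this entry can be checked with a small C sketch. The function names are hypothetical and are not taken from the driver. The two forms agree as long as sampler_index fits in 4 bits and sampler_index + base_table_index fits in a byte, so the two fields do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the old form. The binding table index
 * (sampler index plus base) is masked to a byte and or'd with the
 * sampler index shifted into bits 8 and up. */
static inline uint32_t desc_shift_or(uint32_t sampler_index,
                                     uint32_t base_table_index)
{
    return ((sampler_index + base_table_index) & 0xff) |
           (sampler_index << 8);
}

/* Hypothetical helper: the new form. One multiply and one add, with
 * the whole result masked to 12 bits. */
static inline uint32_t desc_mul_add(uint32_t sampler_index,
                                    uint32_t base_table_index)
{
    return (sampler_index * 0x101 + base_table_index) & 0xfff;
}

/* Compare both forms over every input where the fields cannot overlap. */
static inline void check_equivalence(void)
{
    for (uint32_t s = 0; s < 16; s++)
        for (uint32_t b = 0; s + b < 256; b++)
            assert(desc_shift_or(s, b) == desc_mul_add(s, b));
}
```

	For example, with a sampler index of 3 and a base of 5 both helpers produce 0x308. Outside the checked range the two forms can differ, which matches the caveat in the commit message about bogus sampler indices.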
*	i965/fs: Rework compression control selection.	Matt Turner	2015-05-18	1	-3/+6
	The next commit uses an add(16) with a UW destination with a stride of 2, which needs compression control since it's writing two registers. The old code would have failed to set compression control correctly. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
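	As a rough illustration of the rule described here, the sketch below decides whether an instruction writes more than one register. It assumes 32-byte registers, and the function name and parameters are hypothetical rather than the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

enum { REG_SIZE_BYTES = 32 }; /* assumed size of one register */

/* Hypothetical helper: true when the destination region spans more
 * than one register, which is when compression control is needed.
 * type_size is the destination element size in bytes and dst_stride
 * is the destination stride in elements. */
static inline bool writes_two_registers(unsigned exec_size,
                                        unsigned type_size,
                                        unsigned dst_stride)
{
    assert(exec_size > 0 && type_size > 0 && dst_stride > 0);
    return exec_size * type_size * dst_stride > REG_SIZE_BYTES;
}
```

	Under these assumptions an add(16) with a UW (2-byte) destination and a stride of 2 covers 64 bytes and needs compression, while the same instruction with a stride of 1 fits in a single register and does not.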
*	i965/fs: set execution size to 8 with simd8 ddy instruction	Tapani Pälli	2015-05-13	1	-0/+1
	Commit dd5c825 changed how the execution size for instructions gets set. Previously it was based on destination register width, now it is set explicitly when emitting instructions. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90258
*	i965: Change header_present to header_size in backend_instruction	Jason Ekstrand	2015-05-06	1	-10/+10
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Introduce the FIND_LIVE_CHANNEL pseudo-opcode.	Francisco Jerez	2015-05-04	1	-0/+4
	This instruction calculates the index of an arbitrary channel enabled in the current execution mask. It's expected to be used as input for the BROADCAST opcode, but it's implemented as a separate instruction rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has no dependencies so it can always be CSE'ed with other instances of the same instruction within a basic block. v2: Whitespace fixes. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Introduce the BROADCAST pseudo-opcode.	Francisco Jerez	2015-05-04	1	-0/+4
	The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. This is in particular the case for UBO, sampler and image arrays, which can be indexed dynamically with the restriction that all active SIMD channels access the same index, provided to the shared unit as part of a single scalar field of the message descriptor. Simply taking the index value from the first channel as we were doing until now is incorrect, because it might contain an uninitialized value if the channel had previously been disabled by non-uniform control flow. v2: Minor style fixes. Improve commit message. Reviewed-by: Matt Turner <mattst88@gmail.com>
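	The behaviour of BROADCAST, together with the FIND_LIVE_CHANNEL entry above it, can be modelled on the CPU with a short C sketch. This is only a simulation under assumed names: the 8-channel width, the choice of the lowest enabled channel, and both function names are illustrative, not the hardware's or the driver's definition.

```c
#include <assert.h>
#include <stdint.h>

enum { SIMD_WIDTH = 8 }; /* assumed channel count for the model */

/* Model of FIND_LIVE_CHANNEL: return the index of some enabled
 * channel. This sketch picks the lowest set bit of the mask. */
static inline unsigned find_live_channel(uint32_t exec_mask)
{
    uint32_t live = exec_mask & ((1u << SIMD_WIDTH) - 1);
    assert(live != 0); /* at least one channel must be enabled */
    unsigned i = 0;
    while (!(live & (1u << i)))
        i++;
    return i;
}

/* Model of BROADCAST: pick the value held by the channel `index`
 * of the per-channel source. */
static inline uint32_t broadcast(const uint32_t src[SIMD_WIDTH],
                                 unsigned index)
{
    assert(index < SIMD_WIDTH);
    return src[index];
}
```

	If channels 0 and 1 were disabled before the surface index was computed, an execution mask of 0xfc makes find_live_channel return 2, so broadcast reads an initialised value instead of whatever happens to sit in channel 0.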
*	i965: Add memory fence opcode.	Francisco Jerez	2015-05-04	1	-0/+4
	Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Add typed surface access opcodes.	Francisco Jerez	2015-05-04	1	-0/+17
	Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Add untyped surface write opcode.	Francisco Jerez	2015-05-04	1	-0/+6
	Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Reorder sources of the untyped atomic opcode.	Francisco Jerez	2015-05-04	1	-2/+2
	This is consistent with the untyped surface read opcode. From now on all typed and untyped surface access opcodes will follow the same pattern: src[0] will be the message payload, src[1] will be the surface index and src[2] will be a control immediate (atomic operation for atomic opcodes and number of vector components for surface read and write opcodes). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Pass the number of components as a source of the untyped surface read ↵	Francisco Jerez	2015-05-04	1	-2/+3
	opcode. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Don't request untyped atomic writeback message if the destination is null.	Francisco Jerez	2015-05-04	1	-1/+1
	Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Simplify generator code for untyped surface messages.	Francisco Jerez	2015-05-04	1	-33/+9
	The generate_untyped_*() methods do nothing useful other than calling the corresponding function from brw_eu_emit.c. The calls to brw_mark_surface_used() will go away too in a future commit. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Fix the untyped surface opcodes to deal with indirect surface access.	Francisco Jerez	2015-05-04	1	-2/+2
	Change brw_untyped_atomic() and brw_untyped_surface_read() to take the surface index as a register instead of a constant and to use brw_send_indirect_message() to emit the indirect variant of send with a dynamically calculated message descriptor. This will be required to support variable indexing of image arrays for ARB_shader_image_load_store. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/cs: Add generator support for CS_OPCODE_CS_TERMINATE	Jordan Justen	2015-05-02	1	-0/+35
	v2: * Don't rely on brw_eu* to generate the send instruction. We now generate the send here, and drop the "i965/cs: Add support for the SEND message that terminates a CS thread" brw_eu* patch. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/skl: Force the exec size to 8 when initing header for SIMD4x2	Neil Roberts	2015-05-01	1	-0/+1
	On Gen9+ there needs to be a header when sampling using SIMD4x2. The header is set up by copying from the g0 register. Commit 07c571a39f tried to fix this mov instruction to always use an exec size of 8 because previously it was incorrectly using 4. It did this by casting the type of the destination register to vec8. This was done because there is code in brw_set_dest to guess the exec size based on the width of the dest register. However I misunderstood how this works because it is actually only used when the width is less than 8. That means the patch actually changed it to use the default exec size which on SIMD16 would be 16 and the MOV would clobber over the first register in the send message. This patch makes it additionally set the default exec size to 8. This is similar to how the message is set up in fs_generator::generate_tex. I think this wasn't picked up by any Piglit tests because we don't have any fragment shaders that hit this code path so nothing was using SIMD16. However the patch caused failures in deqp tests. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90153 Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Tapani Pälli <tapani.palli@intel.com>
*	i965: Rename brw_compile to brw_codegen	Jason Ekstrand	2015-04-22	1	-2/+2
	This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file; done Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Add a devinfo field to the generator and use it for gen checks	Jason Ekstrand	2015-04-22	1	-59/+57
	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/device_info: Add a supports_simd16_3src flag	Jason Ekstrand	2015-04-22	1	-19/+6
	This also involves moving revision checking to screen creation time and passing that into brw_get_device_info so that we can get the right device_info for early versions of SKL. Since the only place we used revision was to check for SIMD16 3-src instruction support, it's safe to remove the revision field from brw_context. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Make the annotation code take a device_info instead of a context	Jason Ekstrand	2015-04-22	1	-2/+3
	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Remove the GL context from the generator	Jason Ekstrand	2015-04-22	1	-10/+1
	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Remove the context field from brw_compiler	Jason Ekstrand	2015-04-22	1	-1/+1
	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Make the brw_inst helpers take a device_info instead of a context	Jason Ekstrand	2015-04-22	1	-27/+27
	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Combine pixel center calculation into one inst.	Matt Turner	2015-04-21	1	-0/+10
	The X and Y values come interleaved in g1 (.4-.11 inclusive), so we can calculate them together with a single add(32) instruction on some platforms like Broadwell and newer or in SIMD8 elsewhere. Note that I also moved the PIXEL_X/PIXEL_Y virtual opcodes from before LINTERP to after it. That's because the writes_accumulator_implicitly() function in backend_instruction tests for <= LINTERP for determining whether the instruction indeed writes the accumulator implicitly. The old FS_OPCODE_PIXEL_X/Y emitted ADD instructions, which did, but the new opcodes just emit MOVs, which don't. It doesn't matter, since we don't use these opcodes on Gen4/5 anymore, but in the case that we do...
	On Broadwell:
	total instructions in shared programs: 7192355 -> 7186224 (-0.09%)
	instructions in affected programs: 1190700 -> 1184569 (-0.51%)
	helped: 6131
	On Haswell:
	total instructions in shared programs: 6155979 -> 6152800 (-0.05%)
	instructions in affected programs: 652362 -> 649183 (-0.49%)
	helped: 3179
	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Calculate delta_x and delta_y together.	Matt Turner	2015-04-21	1	-3/+22
	This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.
	On Ironlake:
	total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
	instructions in affected programs: 1745837 -> 1629135 (-6.68%)
	helped: 11439
	HURT: 4
	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Emit ADDs for gl_FragCoord, not virtual opcodes.	Matt Turner	2015-04-21	1	-40/+0
	These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits ADDs directly. The virtual opcodes weren't providing anything useful. I'm going to repurpose these opcodes, so deleting and re-adding them makes it simpler to see what's going on. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Set compression only if writing two registers.	Matt Turner	2015-04-21	1	-1/+4
	We don't want to set compression control on a SIMD16 instruction operating on words or smaller. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Allow an execution size of 32.	Matt Turner	2015-04-21	1	-0/+1
	In a few commits, we'll start emitting an add(32) instruction on some platforms. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965: Replace guess_execution_size with something simpler.	Matt Turner	2015-04-21	1	-4/+18
	guess_execution_size() does two things:
	1. Cope with small destination registers.
	2. Cope with SIMD8 vs SIMD16 mode.
	This patch replaces the first with a simple if block in brw_set_dest: if the destination register width is less than 8, you probably want the execution size to match. (I didn't put this in the 3src block because it doesn't seem to matter.) Since only the FS compiler cares about SIMD16 mode, it's easy to just set the default execution size there. This pattern had already been proven in the Gen8+ generator, but we didn't port it back to the existing generator when we combined the two. This is based on a patch from Ken from about a year ago. I've rebased it and fixed a few bugs.
	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
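	The replacement rule described in this entry can be sketched in a few lines of C. The function name and parameters are hypothetical; the real check lives in brw_set_dest and operates on the driver's own register and instruction types.

```c
#include <assert.h>

/* Hypothetical helper: pick the execution size for an instruction.
 * A destination narrower than 8 channels forces a matching execution
 * size; otherwise the default set by the FS compiler (8 or 16) wins. */
static inline unsigned pick_exec_size(unsigned dst_width,
                                      unsigned default_exec_size)
{
    assert(dst_width > 0);
    assert(default_exec_size == 8 || default_exec_size == 16);
    return dst_width < 8 ? dst_width : default_exec_size;
}
```

	Note that a width-8 destination in SIMD16 mode yields 16, not 8, under this rule. That is the behaviour the later Skylake header fix in this log had to work around by setting the default execution size explicitly.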
*	i965/fs: Ensure delta_x/y are even-aligned registers on Gen6.	Matt Turner	2015-04-21	1	-1/+1
	The BSpec says this applies to Gen6 as well. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965/fs: Combine tex/fb_write operations (opt)	Ben Widawsky	2015-04-14	1	-0/+11
	Certain platforms support the ability to sample from a texture, and write it out to the file RT - thus saving a costly send instruction (note that this is a potential win if one wanted to backport to a tag that didn't have the patch from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04dbd2).
	v2: Modify the algorithm. Instead of iterating in reverse through blocks and insts, since the last block/inst is the only thing which can benefit. Rebased on top of Ken's patch modifying is_last_send.
	v3: Rebased over almost 2 months, and incorporated feedback from Matt: Some comment typo fixes and rewordings. Whitespace. Move the optimization pass outside of the optimize loop.
	v4: Some cosmetic changes requested from Ken. These changes ensured that the optimization function always returned true when an optimization occurred, and false when one did not. This behavior did not exist with the original patch. As a result, having the separate helper function which Matt did not like no longer made sense, and so now I believe everyone should be happy.
	Benchmark (n=20)	%diff
	OglBatch5	-1.4
	OglBatch7	-1.79
	OglFillTexMulti	5.57
	OglFillTexSingle	1.16
	OglShMapPcf	0.05
	OglTexFilterAniso	3.01
	OglTexFilterTri	1.94
	No piglit regressions: (http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/)
	[*] I believe my measurements are incorrect for Batch5-7. If I add this new optimization, but never emit the new instruction I see similar results.
	v5: Remove declaration of combine_tex_header since v4 dropped that function (Ben). Remove check for impossible case of an empty block (Matt). Set dest earlier to avoid extra special-casing in generate_tex (Matt).
	Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/skl: Use an exec size of 8 to initialise the message header	Neil Roberts	2015-04-14	1	-1/+1
	Commit e93566a15c61c33faa changed the message header code needed to make Skylake use SIMD4x2 so that it uses a register with width 4 instead of 8 as the source register in the send message. However it also changed the width for the dest in the MOV instruction which is used to initialise the header register with the values from g0. The width of the destination is used to determine the exec size in brw_set_dest so this would end up making the MOV have an exec size of 4. I think this would end up leaving the top half of the register uninitialised. The top half of the header has meaningful values so this probably isn't a good idea. This patch just casts the dest register for the MOV instruction back to a vec8 to fix it. It doesn't cause any changes to a Piglit run. Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
*	i965: Implement SIMD16 texturing on Gen4.	Kenneth Graunke	2015-04-06	1	-8/+20
	This allows SIMD16 mode to work for a lot more programs. Texturing is also more efficient in SIMD16 mode than SIMD8. Several messages don't actually exist in SIMD8 mode, so we did SIMD16 messages and threw away half of the data. Now we compute real data in both halves. Also, the SIMD16 "sample" message doesn't require all three coordinate components to exist (like the SIMD8 one), so we can shorten the message lengths, cutting register usage a bit. I chose to implement the visitor functionality in a separate function, since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks seemed like a mess. The new code bails on a few cases where we'd have to do two SIMD8 messages - we just fall back to SIMD8 for now. Improves performance in "Shadowrun: Dragonfall - Director's Cut" by about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around in the first mission). v2: Add ir_txf to the has_lod case (caught by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/generator: Get rid of the ! in the unreachable statement	Jason Ekstrand	2015-04-02	1	-1/+1
	Reviewed-by: Mark Janes <mark.a.janes@intel.com>
*	i965: Pass number of components explicitly to brw_untyped_atomic and ↵	Francisco Jerez	2015-03-20	1	-5/+4
	_surface_read. And calculate the message response size based on the number of components rather than the other way around. This simplifies their interface somewhat and allows the caller to request a writeback message with more than one vector component in SIMD4x2 mode. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Don't disable exec masking for sampler message sends.	Francisco Jerez	2015-03-20	1	-4/+4
	This was telling the sampler to do texture fetches for all channels in the non-constant surface index case, which could have reduced throughput unnecessarily when some of the channels were disabled by control flow. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Factor out logic to build a send message instruction with indirect ↵	Francisco Jerez	2015-03-20	1	-47/+16
	descriptor. This is going to be useful because the Gen7+ uniform and varying pull constant, texturing, typed and untyped surface read, write, and atomic generation code on the vec4 and fs back-end all require the same logic to handle conditionally indirect surface indices. In pseudocode:
	 | if (surface.file == BRW_IMMEDIATE_VALUE) {
	 |    inst = brw_SEND(p, dst, payload);
	 |    set_descriptor_control_bits(inst, surface, ...);
	 | } else {
	 |    inst = brw_OR(p, addr, surface, 0);
	 |    set_descriptor_control_bits(inst, ...);
	 |    inst = brw_SEND(p, dst, payload);
	 |    set_indirect_send_descriptor(inst, addr);
	 | }
	This patch abstracts out this frequently recurring pattern so we can now write:
	 | inst = brw_send_indirect_message(p, sfid, dst, payload, surface)
	 | set_descriptor_control_bits(inst, ...);
	without worrying about handling the immediate and indirect surface index cases explicitly. v2: Rebase. Improve documentation and commit message. (Topi) Preserve UW destination type cargo-cult. (Topi, Ken, Matt)
	Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
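	The control flow of the pseudocode in this entry can be modelled with a toy C sketch. Every type and function below is a stand-in invented for illustration; none of them is the driver's real brw_* API. The point it shows is that the helper returns whichever instruction should receive the descriptor control bits: the SEND itself for an immediate surface index, or the OR that builds the address register for an indirect one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins, not the driver's types. */
enum reg_file { FILE_IMMEDIATE, FILE_GRF };
struct reg { enum reg_file file; uint32_t value; };
enum opcode { OP_OR, OP_SEND };
struct inst { enum opcode op; uint32_t desc; bool indirect; };
struct program { struct inst insts[8]; unsigned count; };

/* Append a blank instruction to the toy program. */
static inline struct inst *emit(struct program *p, enum opcode op)
{
    assert(p->count < 8);
    struct inst *i = &p->insts[p->count++];
    i->op = op;
    i->desc = 0;
    i->indirect = false;
    return i;
}

/* Emit a send whose surface index may be immediate or in a register.
 * Returns the instruction the caller should set control bits on. */
static inline struct inst *send_indirect_message(struct program *p,
                                                 struct reg surface)
{
    if (surface.file == FILE_IMMEDIATE) {
        struct inst *send = emit(p, OP_SEND);
        send->desc = surface.value; /* index baked into the descriptor */
        return send;
    } else {
        struct inst *or_inst = emit(p, OP_OR); /* addr = surface | bits */
        struct inst *send = emit(p, OP_SEND);
        send->indirect = true;      /* descriptor comes from addr */
        return or_inst;
    }
}
```

	A caller then sets the control bits on the returned instruction without branching on the surface kind, which is the simplification the commit message describes.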
*	i965/skl: Break down SIMD16 3-source instructions when required.	Kenneth Graunke	2015-03-20	1	-0/+6
	Several steppings of Skylake fail when using SIMD16 with 3-source instructions (such as MAD). This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit tests. Based on a patch by Neil Roberts. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Neil Roberts <neil@linux.intel.com>
*	i965: Refactor SIMD16-to-2xSIMD8 checks.	Neil Roberts	2015-03-20	1	-4/+14
	The places that were checking whether 3-source instructions are supported have now been combined into a small helper function. This will be used in the next patch to add an additional restriction. Based on a patch by Kenneth Graunke. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/fs: Print spills:fills and number of promoted constants.	Matt Turner	2015-03-19	1	-8/+14
	Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
*	i965/fs: Implement SIMD16 dual source blending.	Iago Toral Quiroga	2015-03-09	1	-3/+12
	From the SNB PRM, volume 4, part 1, page 193: "The dual source render target messages only have SIMD8 forms due to maximum message length limitations. SIMD16 pixel shaders must send two of these messages to cover all of the pixels. Each message contains two colors (4 channels each) for each pixel in the message payload." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	Fix invalid extern "C" around header inclusion.	Mark Janes	2015-03-05	1	-3/+0
	System headers may contain C++ declarations, which cannot be given C linkage. For this reason, include statements should never occur inside extern "C". This patch moves the C linkage statements to enclose only the declarations within a single header. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>