external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: Silence unused parameter warnings	Ian Romanick	2016-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter] gl_shader_stage shader_type, ^ brw_nir.c: In function ‘brw_nir_lower_vs_inputs’: brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo, ^ brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter] uint32_t sampler, ^ brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter] vec4_visitor::gs_emit_vertex(int stream_id) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch>
*	i965: Enable ARB_shader_atomic_counter_ops	Ian Romanick	2016-10-04	1	-3/+13
\| \| \| \| \|	Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Refactor emission of atomic counter operations	Ian Romanick	2016-10-04	1	-15/+4
\| \| \| \| \| \| \|	This will make it easier to add more operations. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/vec4: Replace vec4_instruction::regs_written with ::size_written field ↵	Francisco Jerez	2016-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in bytes. The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/vec4: Replace dst/src_reg::reg_offset with dst/src_reg::offset ↵	Francisco Jerez	2016-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	expressed in bytes. The dst/src_reg::offset field in byte units introduced in the previous patch is a more straightforward alternative to an offset representation split between ::reg_offset and ::subreg_offset fields. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple FS back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. v2: Fix division by the wrong reg_unit in the UNIFORM case of convert_to_hw_regs(). (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/vec4: add support for packing vs/gs/tes outputs	Timothy Arceri	2016-07-21	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \|	Here we create a new output_generic_reg array with the ability to store the dst_reg for each component of user defined varyings. This is needed as the previous code only stored the dst_reg based on the varying location which meant packed varyings would overwrite each other. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
*	i965/vec4: add support for packing inputs	Timothy Arceri	2016-07-21	1	-0/+2
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	i965: Stop muging cube array lengths by 6	Jason Ekstrand	2016-07-20	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From the Sky Lake PRM: "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port Surfaces, the range of this field is [0,340], indicating the number of cube array elements (equal to the number of underlying 2D array elements divided by 6). For other surfaces, this field must be zero." In other words, the depth field for cube maps is in number of cubes not number of 2-D slices so we need to divide by 6. ISL will do this correctly for us assuming that we provide it with the correct array bounds which it expects to be in 2-D slices. It appears as if we've been doing this wrong ever since we first added cube map arrays for Sandy Bridge and the change to ISL made things slightly worse. While we're at it, we now need to remoe the shader hacks we've always done since they were only needed because we were setting the depth field six times too large. v2: Fix the vec4 backend as well (not sure how I missed this). Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Chris Forbes <chrisforbes@google.com>
*	i965: Use LZD to implement nir_op_find_lsb on Gen < 7	Ian Romanick	2016-07-19	1	-2/+24
\| \| \| \| \| \| \|	v2: Rebase on changes to previous two patches. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Use LZD to implement nir_op_ifind_msb on Gen < 7	Ian Romanick	2016-07-19	1	-11/+46
\| \| \| \| \| \| \| \| \|	v2: Retype LZD source as UD to avoid potential problems with 0x80000000. Suggested by Matt. Also update comment about problem values with LZD(abs(x)). Suggested by Curro. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Use LZD to implement nir_op_ufind_msb	Ian Romanick	2016-07-19	1	-0/+23
\| \| \| \| \| \| \| \| \| \|	This uses one less instruction. v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested by Curro. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Pass nir_src/nir_dest by reference.	Matt Turner	2016-05-20	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	Cuts 6K of .text. text data bss dec hex filename 5772372 264648 29320 6066340 5c90a4 lib/i965_dri.so before 5766074 264648 29320 6060042 5c780a lib/i965_dri.so after Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Silence unused parameter warnings	Ian Romanick	2016-05-18	1	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \|	The only place that actually used the type parameter was the GS visitor, and it was always passed glsl_type::int. Just remove the parameter. brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter] const glsl_type *type) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
*	nir: Switch the arguments to nir_foreach_function	Jason Ekstrand	2016-04-28	1	-2/+2
\| \| \| \| \| \| \| \| \|	This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_function(\([^,]\),\s\([^,]*\))/nir_foreach_function(\2, \1)/ Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	nir: Switch the arguments to nir_foreach_instr	Jason Ekstrand	2016-04-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_instr(\([^,]\),\s\([^,]*\))/nir_foreach_instr(\2, \1)/ and similar expressions for nir_foreach_instr_safe etc. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	i965/nir: fixup for new foreach_block()	Connor Abbott	2016-04-28	1	-4/+4
\| \| \| \|	Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	nir: rename nir_foreach_block() to nir_foreach_block_call()	Connor Abbott	2016-04-20	1	-1/+1
\| \| \| \|	Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965/vec4: Use the correct offset for the swizzle shift in push constants	Jason Ekstrand	2016-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \|	This was actually caught by Ken in review the first time around but somehow didn't get fixed before the patches were pushed. :-( Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
*	i965/vec4: Use nir_intrinsic_base in the load_uniform implementation	Jason Ekstrand	2016-04-20	1	-1/+1
\| \| \| \| \| \| \| \|	We shouldn't be reading the const_index directly Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
*	i965/vec4: Support full std140 layout for push constants	Jason Ekstrand	2016-04-15	1	-5/+25
\| \| \| \| \| \| \| \| \|	Up until now, we have been able to assume that all push constants are vec4-aligned because this is what the GL driver gives us. In Vulkan, we need to be able to support full std140 because we get the layout from the client. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Get rid of the uniform_size array	Jason Ekstrand	2016-04-14	1	-9/+0
\| \| \| \| \|	Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants	Jason Ekstrand	2016-04-14	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit moves us to an instruction based model rather than a register-based model for indirects. This is more accurate anyway as we have to emit instructions to resolve the reladdr. It's also a lot simpler because it gets rid of the recursive reladdr problem by design. One side-effect of this is that we need a whole new algorithm in move_uniform_array_access_to_pull_constants. This new algorithm is much more straightforward than the old one and is fairly similar to what we're already doing in the FS backend. Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Use UD rather than D for uniform indirects	Jason Ekstrand	2016-04-14	1	-1/+1
\| \| \| \|	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Implement the new imod and irem opcodes	Jason Ekstrand	2016-04-13	1	-0/+36
\| \| \| \|	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Port INTEL_PRECISE_TRIG=1 to NIR.	Kenneth Graunke	2016-04-11	1	-14/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the extra multiply visible to NIR's algebraic optimizations (for constant reassociation) as well as constant folding. This means that when the result of sin/cos are multiplied by an constant, we can eliminate the extra multiply altogether, reducing the cost of the workaround. It also means we only have to implement it one place, rather than in both backends. This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion, which has a ton of sin() calls, but always multiplies them by an immediate constant. The extra multiply gets folded away. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965/nir: Provide a default LOD for buffer textures	Jason Ekstrand	2016-04-04	1	-0/+4
\| \| \| \| \| \| \| \| \|	Our hardware requires an LOD for all texelFetch commands even if they are on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD is really rather meaningless. This commit allows other NIR producers to be more lazy and not provide one at all. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.	Kenneth Graunke	2016-04-04	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SIN and COS instructions on Intel hardware can produce values slightly outside of the [-1.0, 1.0] range for a small set of values. Obviously, this can break everyone's expectations about trig functions. According to an internal presentation, the COS instruction can produce a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One suggested workaround is to multiply by 0.99997, scaling down the amplitude slightly. Apparently this also minimizes the error function, reducing the maximum error from 0.00006 to about 0.00003. When enabled, fixes 16 dEQP precision tests dEQP-GLES31.functional.shaders.builtin_functions.precision. {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}. at the cost of making every sin and cos call more expensive (about twice the number of cycles on recent hardware). Enabling this option has been shown to reduce GPUTest Volplosion performance by about 10%. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
*	i965: Add an implemnetation of nir_op_fquantize2f16	Jason Ekstrand	2016-04-01	1	-0/+25
\| \| \| \|	Reviewed-by: Matt Turner <mattst88@gmail.com>
*	nir: rename nir_const_value fields to include bitsize information	Iago Toral Quiroga	2016-03-17	1	-22/+22
\| \| \| \| \|	Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
*	i965/vec4/nir: no need to use surface_access:: to call emit_untyped_atomic	Alejandro Piñeiro	2016-03-08	1	-6/+5
\| \| \| \| \| \| \|	Now that brw_vec4_visitor::emit_untyped_atomic was removed, there is no need to explicitly set it. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
*	i965/vec4/nir: remove emit_untyped_surface_read and emit_untyped_atomic at ↵	Alejandro Piñeiro	2016-03-08	1	-13/+23
\| \| \| \| \| \| \| \| \| \| \| \| \|	brw_vec4_visitor surface_access emit_untyped_read and emit_untyped_atomic provides the same functionality. v2: surface parameter of emit_untyped_atomic is a const, no need to specify default predicate on emit_untyped_atomic, use retype (Francisco Jerez). Reviewed-by: Francisco Jerez <currojerez@riseup.net>
*	i965: Push most TES inputs in vec4 mode.	Kenneth Graunke	2016-02-29	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(This is commit 4a1c8a3037cd29938b2a6e2c680c341e9903cfbe for vec4 mode.) Using the push model for inputs is much more efficient than pulling inputs - the hardware can simply copy a large chunk into URB registers at thread creation time, rather than having the thread send messages to request data from the L3 cache. Unfortunately, it's possible to have more TES inputs than fit in registers, so we have to fall back to the pull model in some cases. However, it turns out that most tessellation evaluation shaders are fairly simple, and don't use many inputs. An arbitrary cut-off of 24 vec4 slots (12 registers) should suffice. (I chose this instead of the 32 vec4 slots used in the scalar backend to avoid regressing a few Piglit tests due to the vec4 register allocator being too stupid to figure out what to do. We probably ought to fix that, but it's a separate issue.) Improves performance in GPUTest's tessmark_x64 microbenchmark by 41.5394% +/- 0.288519% (n = 115) at 1024x768 on my Clevo W740SU (with Iris Pro 5200). Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by 38.3576% +/- 0.759748% (n = 42). v2: Simplify abs/negate handling, as requested by Matt. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	nir: Remove the const_offset from nir_tex_instr	Jason Ekstrand	2016-02-10	1	-10/+11
\| \| \| \| \| \| \| \| \| \| \|	When NIR was originally drafted, there was no easy way to determine if something was constant or not. The result was that we had lots of special-casing for constant values such as this. Now that load_const instructions are SSA-only, it's really easy to find constants and this isn't really needed anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robclark@gmail.com>
*	i965/vec4: Plumb separate surfaces and samplers through from NIR	Jason Ekstrand	2016-02-09	1	-3/+12
\| \| \| \|	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	nir: Separate texture from sampler in nir_tex_instr	Jason Ekstrand	2016-02-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds the capability to NIR to support separate textures and samplers. As it currently stands, glsl_to_nir only sets the texture deref and leaves the sampler deref alone as it did before and nir_lower_samplers assumes this. Backends can still assume that they are combined and only look at only at the texture index. Or, if they wish, they can assume that they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir all set both texture and sampler index whenever a sampler is required (the two indices are the same in this case). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	nir/tex_instr: Rename sampler to texture	Jason Ekstrand	2016-02-09	1	-12/+12
\| \| \| \| \| \| \| \| \|	We're about to separate the two concepts. When we do, the sampler will become optional. Doing a rename first makes the separation a bit more safe because drivers that depend on GLSL or TGSI behaviour will be fine to just use the texture index all the time. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Implement nir_op_pack_uvec2_to_uint.	Matt Turner	2016-02-01	1	-0/+18
\| \| \| \| \| \| \|	And mark nir_op_pack_uvec4_to_uint unreachable, since it's only produced by lowering pack[SU]norm4x8 which the vec4 backend does not need. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/vec4: Spaces around operators.	Matt Turner	2016-01-19	1	-1/+1
\|
*	i965/vec4: Use UW type for multiply into accumulator on GEN8+	Jason Ekstrand	2016-01-15	1	-1/+5
\| \| \| \| \| \| \| \|	BDW adds the following restriction: "When multiplying DW x DW, the dst cannot be accumulator." Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	nir: Lower bitfield_extract.	Matt Turner	2016-01-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The OpenGL specifications for bitfieldExtract() says: The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Therefore passing bits=32, offset=0 is legal and defined in GLSL. But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits of the width operand, making them not able to implement the GLSL-specified behavior directly. This commit adds ubfe/ibfe operations from SM5 and a lowering pass for bitfield_extract to to handle the trivial case of <bits> = 32 as bitfieldExtract: bits > 31 ? value : bfe(value, offset, bits) Fixes: ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
*	glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.	Kenneth Graunke	2016-01-13	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	TGSI doesn't use these - it just translates ir_quadop_bitfield_insert directly. NIR can handle ir_quadop_bitfield_insert as well. These opcodes were only used for i965, and with Jason's recent patches, we can do this lowering in NIR (which also gains us SPIR-V handling). So there's not much point to retaining this GLSL IR lowering code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Add support for gl_DrawIDARB and enable extension	Kristian Høgsberg Kristensen	2015-12-29	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARB	Kristian Høgsberg Kristensen	2015-12-29	1	-0/+8
\| \| \| \| \| \| \|	We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	nir: Get rid of function overloads	Jason Ekstrand	2015-12-28	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When Connor originally drafted NIR, he copied the same function+overload system that GLSL IR had with a few names changed. However, this double-indirection is not really needed and has only served to confuse people. Instead, let's just have functions which may not have unique names and may or may not have an implementation. If someone wants to do overload resolving, they can hav a hash table based function+overload system in the overload resolving pass. There's no good reason to keep it in core NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> ir3 bits are Reviewed-by: Rob Clark <robclark@gmail.com>
*	i965: Add tessellation control shaders.	Kenneth Graunke	2015-12-22	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965/vec4: Optimize predicate handling for any/all.	Matt Turner	2015-12-18	1	-18/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For a select whose condition is any(v), instead of emitting cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D mov(8) g7<1>.xUD 0x00000000UD (+f0.any4h) mov(8) g7<1>.xUD 0xffffffffUD cmp.nz.f0(8) null<1>D g7<4,4,1>.xD 0D (+f0) sel(8) g8<1>UD g4<4,4,1>UD g3<4,4,1>UD we now emit cmp.nz.f0(8) null<1>D g1<0,4,1>D 0D (+f0.any4h) sel(8) g9<1>UD g4<4,4,1>UD g3<4,4,1>UD Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	nir: Delete bany, ball, fany, fall.	Matt Turner	2015-12-18	1	-14/+0
\| \| \| \| \| \| \| \| \| \| \|	As in the previous patches, these can be implemented as any(v) -> any_nequal(v, false) all(v) -> all_equal(v, true) and their removal simplifies the code in the next patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
*	nir: Get rid of *_indirect variants of input/output load/store intrinsics	Jason Ekstrand	2015-12-10	1	-55/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is some special-casing needed in a competent back-end. However, they can do their special-casing easily enough based on whether or not the offset is a constant. In the mean time, having the _indirect variants adds special cases a number of places where they don't need to be and, in general, only complicates things. To complicate matters, NIR had no way to convdert an indirect load/store to a direct one in the case that the indirect was a constant so we would still not really get what the back-ends wanted. The best solution seems to be to get rid of the _indirect variants entirely. This commit is a bunch of different changes squashed together: - nir: Get rid of *_indirect variants of input/output load/store intrinsics - nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect - nir/lower_io: Get rid of load/store_foo_indirect - i965/fs: Get rid of load/store_foo_indirect - i965/vec4: Get rid of load/store_foo_indirect - tgsi_to_nir: Get rid of load/store_foo_indirect - ir3/nir: Use the new unified io intrinsics - vc4: Do all uniform loads with byte offsets - vc4/nir: Use the new unified io intrinsics - vc4: Fix load_user_clip_plane crash - vc4: add missing src for store outputs - vc4: Fix state uniforms - nir/lower_clip: Update to the new load/store intrinsics - nir/lower_two_sided_color: Update to the new load intrinsic NIR and i965 changes are Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NIR indirect declarations and vc4 changes are Reviewed-by: Eric Anholt <eric@anholt.net> ir3 changes are Reviewed-by: Rob Clark <robdclark@gmail.com> NIR changes are Acked-by: Rob Clark <robdclark@gmail.com>
*	i965: Make uniform offsets be in terms of bytes	Jason Ekstrand	2015-12-07	1	-4/+8
\| \| \| \| \| \| \| \| \| \|	This commit pushes makes uniform offsets be terms of bytes starting with nir_lower_io. They get converted to be in terms of vec4s or floats when we cram them in the UNIFORM register file but reladdr remains in terms of bytes all the way down to the point where we lower it to a pull constant load. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/vec4: Use a stride of 1 and byte offsets for UBOs	Jason Ekstrand	2015-12-07	1	-13/+3
\| \| \| \| \| \|	Cc: "11.0" <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92909 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>