external_mesa3d.git

	Commit message	Author	Age	Files	Lines
*	radeonsi: fix an off-by-one error in the bounds check for max_vertices	Nicolai Hähnle	2016-12-15	1	-1/+1
	The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring.
	Cc: mesa-stable@lists.freedesktop.org
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
	Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
	(cherry picked from commit 88509518b01d7c1d7436a790bf9be5cf3c41a528)
*	radeonsi: do not kill GS with memory writes	Nicolai Hähnle	2016-12-15	1	-8/+22
	Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit.
	However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself.
	Cc: mesa-stable@lists.freedesktop.org
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
	(cherry picked from commit 7655bccce80c9690ecb850304d15238ef1e0d622)
*	radeonsi: wait for outstanding LDS instructions in memory barriers if needed	Marek Olšák	2016-12-14	1	-1/+17
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit 13c34cf8ca43d0f9c1e1a663e6a3783b0938dfd9)
*	radeonsi: wait for outstanding memory instructions in TCS barriers	Marek Olšák	2016-12-14	1	-1/+5
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit 16f49c16c79a67f174b92672d546f909425f7fc3)
*	radeonsi: allow specifying simm16 of emit_waitcnt at call sites	Marek Olšák	2016-12-14	1	-5/+7
	The next commit will use this.
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit 15e96c70b0b668a2626326d3572a247e41885c18)
*	radeonsi: fix isolines tess factor writes to control ring	Nicolai Hähnle	2016-12-14	1	-4/+12
	Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test.
	Cc: mesa-stable@lists.freedesktop.org
	(cherry picked from commit d3931a355fd5d309d5bcfe2655249f029e84d355)
	[Emil Velikov: there is no si_shader_key::part in branch]
	Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
*	radeonsi: apply a TC L1 write corruption workaround for SI	Marek Olšák	2016-12-14	1	-11/+23
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit 72e46c98896d0cb13fc7d70b7a4193a84d72a5fc)
*	radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips	Marek Olšák	2016-12-14	1	-2/+22
	All codepaths are handled except for clover.
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
*	radeonsi: consolidate max-work-group-size computation	Marek Olšák	2016-12-14	1	-24/+19
	The next commit will need this.
	Cc: 13.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	(cherry picked from commit ec36c63b4f417973a6d50d79281f4834682c4555)
*	radeonsi: fix 64-bit loads from LDS	Nicolai Hähnle	2016-10-24	1	-1/+1
	Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others.
	Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
	(cherry picked from commit 4a2dbfff05f7be271c2aa72e783e24b31906db51)
*	radeonsi: rename prefixes from radeon to si	Marek Olšák	2016-10-18	1	-46/+46
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
	Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: merge radeon_llvm_context and si_shader_context	Marek Olšák	2016-10-18	1	-271/+193
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
	Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: import all TGSI->LLVM code from gallium/radeon	Marek Olšák	2016-10-18	1	-2/+0
	Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
	Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: move LLVM ALU codegen into radeonsi	Marek Olšák	2016-10-18	1	-6/+3
	Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
	Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: unify the constant load paths	Nicolai Hähnle	2016-10-17	1	-28/+11
	Remove the split between direct and indirect.
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: fix indirect loads of 64 bit constants	Nicolai Hähnle	2016-10-17	1	-2/+2
	This fixes GL45-CTS.compute_shader.fp64-case3.
	Cc: mesa-stable@lists.freedesktop.org
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: shorten "shader->selector" to "sel" in si_shader_create	Marek Olšák	2016-10-17	1	-7/+8
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: implement TC-compatible HTILE	Marek Olšák	2016-10-13	1	-2/+16
	so that decompress blits aren't needed and depth texturing needs less memory bandwidth.
	Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers.
	This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
	I don't see a measurable increase in performance though.
	(I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now)
	Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: fix regression in image atomics	Nicolai Hähnle	2016-10-13	1	-1/+1
	Caused by a bad rebase when pushing commit 76a940893.
*	radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.*	Nicolai Hähnle	2016-10-13	1	-2/+7
	Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*
	Reviewed-by: Dave Airlie <airlied@redhat.com>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Use the new image load/store intrinsic signatures	Tom Stellard	2016-10-12	1	-14/+45
	This patch requires LLVM r284024 or newer.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: Add function for converting LLVM type to intrinsic string	Tom Stellard	2016-10-12	1	-10/+32
	The existing function only worked for integer types.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: Refactor image store/load intrinsic name creation	Tom Stellard	2016-10-12	1	-11/+18
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: support ARB_compute_variable_group_size	Nicolai Hähnle	2016-10-10	1	-14/+30
	Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch).
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: fix interpolateAt opcodes for .zw components	Marek Olšák	2016-10-05	1	-1/+1
	Not returning garbage in .zw seems pretty important.
	This fixes: GL45-CTS.shader_multisample_interpolation.render.interpolate_at__check.
	Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: interpolate colors after interpolation weight shuffling	Marek Olšák	2016-10-05	1	-48/+48
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: optionally run the LLVM IR verifier pass	Nicolai Hähnle	2016-10-04	1	-7/+21
	This is enabled automatically if shader printing is enabled, or separately by R600_DEBUG=checkir.
	Catch mal-formed IR before it crashes in a later pass.
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: don't declare LDS in PS when ds_bpermute is used	Marek Olšák	2016-10-04	1	-4/+3
	I guess this is not needed because dead code elimination removes the declaration.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interp	Marek Olšák	2016-10-04	1	-49/+7
	We can finally do this, because the opcodes are scalar now.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: simplify si_llvm_emit_ddxy	Marek Olšák	2016-10-04	1	-51/+29
	si_llvm_emit_ddxy is called once per element, so we don't have to generate code for 4 elements at once.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VI	Marek Olšák	2016-10-04	1	-5/+9
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: use a helper function for BuildGEP(0, x)	Marek Olšák	2016-10-04	1	-47/+35
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: remove obsolete shader definitions	Marek Olšák	2016-10-04	1	-12/+4
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: remove unnecessary #includes	Marek Olšák	2016-10-04	1	-5/+0
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: reload PS inputs with direct indexing at each use (v2)	Marek Olšák	2016-09-14	1	-16/+11
	The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute.
	26011 shaders in 14651 tests
	Totals:
	SGPRS: 1146340 -> 1132676 (-1.19 %)
	VGPRS: 727371 -> 711730 (-2.15 %)
	Spilled SGPRs: 2218 -> 2078 (-6.31 %)
	Spilled VGPRs: 369 -> 369 (0.00 %)
	Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
	Code Size: 35841268 -> 36009732 (0.47 %) bytes
	LDS: 767 -> 767 (0.00 %) blocks
	Max Waves: 222559 -> 224779 (1.00 %)
	Wait states: 0 -> 0 (0.00 %)
	v2: don't call load_input for fragment shaders in emit_declaration
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: get rid of constant buffer preloading	Marek Olšák	2016-09-14	1	-24/+14
	26011 shaders in 14651 tests
	Totals:
	SGPRS: 1152636 -> 1146340 (-0.55 %)
	VGPRS: 728198 -> 727371 (-0.11 %)
	Spilled SGPRs: 3776 -> 2218 (-41.26 %)
	Spilled VGPRs: 369 -> 369 (0.00 %)
	Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
	Code Size: 35835152 -> 35841268 (0.02 %) bytes
	LDS: 767 -> 767 (0.00 %) blocks
	Max Waves: 222372 -> 222559 (0.08 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: get rid of img/buf/sampler descriptor preloading (v2)	Marek Olšák	2016-09-14	1	-132/+47
	26011 shaders in 14651 tests
	Totals:
	SGPRS: 1251920 -> 1152636 (-7.93 %)
	VGPRS: 728421 -> 728198 (-0.03 %)
	Spilled SGPRs: 16644 -> 3776 (-77.31 %)
	Spilled VGPRs: 369 -> 369 (0.00 %)
	Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
	Code Size: 36001064 -> 35835152 (-0.46 %) bytes
	LDS: 767 -> 767 (0.00 %) blocks
	Max Waves: 222221 -> 222372 (0.07 %)
	Wait states: 0 -> 0 (0.00 %)
	v2: merge codepaths where possible
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: rename get_sampler_desc -> load_sampler_desc	Marek Olšák	2016-09-14	1	-11/+11
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: cosmetic changes in si_shader.c	Marek Olšák	2016-09-14	1	-3/+5
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: load streamout buffer descriptors before use (v2)	Marek Olšák	2016-09-14	1	-33/+14
	v2: inline the code and remove the conditional that's a no-op now
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: fix FP64 UBO loads with indirect uniform block indexing	Marek Olšák	2016-09-13	1	-2/+1
	No known tests.
	Cc: mesa-stable@lists.freedesktop.org
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: export SampleMask from pixel shaders at full rate	Marek Olšák	2016-09-13	1	-12/+51
	Heaven and Valley write gl_SampleMask and not Z.
	Use 16_ABGR instead of 32_ABGR if Z isn't written.
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: don't preload constants at the beginning of shaders	Marek Olšák	2016-09-12	1	-20/+11
	LLVM can CSE the loads, thus we can always re-load constants before each use. The decrease in SGPR spilling is huge. The best improvements are the dumbest ones.
	26011 shaders in 14651 tests
	Totals:
	SGPRS: 1453346 -> 1251920 (-13.86 %)
	VGPRS: 742576 -> 728421 (-1.91 %)
	Spilled SGPRs: 52298 -> 16644 (-68.17 %)
	Spilled VGPRs: 397 -> 369 (-7.05 %)
	Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
	Code Size: 36136488 -> 36001064 (-0.37 %) bytes
	LDS: 767 -> 767 (0.00 %) blocks
	Max Waves: 219315 -> 222221 (1.33 %)
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: fix Gather4 with integer formats	Marek Olšák	2016-09-05	1	-3/+96
	The closed compiler does the same thing.
	This fixes: GL45-CTS.texture_gather.-int- (18 tests)
	Reviewed-by: Dave Airlie <airlied@redhat.com>
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: fix a crash in imageSize for cubemap arrays	Marek Olšák	2016-09-05	1	-3/+1
	Sometimes it was f32, other times it was i32. Now it's always i32.
	This fixes: GL45-CTS.texture_cube_map_array.image_texture_size.texture_size_compute_sh
	Reviewed-by: Dave Airlie <airlied@redhat.com>
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: fix gl_PatchVerticesIn for tessellation evaluation shader	Marek Olšák	2016-09-05	1	-1/+6
	This fixes: GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: always use the same function signature for llvm.SI.export	Marek Olšák	2016-09-05	1	-4/+4
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: Don't use global variables for tess lds	Tom Stellard	2016-08-29	1	-9/+6
	We were allocating global variables for the maximum LDS size which made the compiler think we were using all of LDS, which isn't the case.
	Reviewed-By: Edward O'Callaghan <funfunctor@folklore1984.net>
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	gallium/radeon: add radeon_llvm_bound_index for bounds checking	Nicolai Hähnle	2016-08-17	1	-18/+1
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	gallium/radeon: use tgsi_scan_arrays for temp arrays	Nicolai Hähnle	2016-08-17	1	-1/+2
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
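
Several commit messages in this log quote before/after totals in the form "SGPRS: 1453346 -> 1251920 (-13.86 %)". As a reading aid, here is a minimal sketch of how such a percentage can be recomputed from the two totals. It assumes the figure is the plain relative change (after - before) / before, which matches the numbers quoted in the log; the helper names `pct_change` and `format_row` are hypothetical and are not part of Mesa or of any reporting tool.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        # Assumption: a 0 -> 0 row (e.g. "Wait states") is reported as 0.00 %.
        if after == 0:
            return 0.0
        raise ValueError("percentage change from a zero baseline is undefined")
    return (after - before) / before * 100.0


def format_row(name: str, before: int, after: int, unit: str = "") -> str:
    """Render one statistics row in the style used in the commit messages."""
    line = f"{name}: {before} -> {after} ({pct_change(before, after):.2f} %)"
    # Append the optional unit ("bytes", "blocks", ...) without a trailing space.
    return f"{line} {unit}".rstrip()


# Totals taken from the "don't preload constants" commit message in the log.
print(format_row("SGPRS", 1453346, 1251920))
print(format_row("Spilled SGPRs", 52298, 16644))
```

A negative value means the metric decreased. For register counts, spills and code size that is an improvement, while for "Max Waves" a positive value is the improvement.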