external_mesa3d.git

	Commit message	Author	Age	Files	Lines
*	radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips	Marek Olšák	2016-12-14	1	-0/+2
	All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
*	radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings	Marek Olšák	2016-10-13	1	-1/+0
	The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: support ARB_compute_variable_group_size	Nicolai Hähnle	2016-10-10	1	-1/+3
	Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: clean up lucky #include dependencies	Marek Olšák	2016-10-04	1	-32/+2
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
*	radeonsi: export SampleMask from pixel shaders at full rate	Marek Olšák	2016-09-13	1	-0/+2
	Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add DRAWID parameter to vertex shaders	Nicolai Hähnle	2016-08-09	1	-1/+3
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: pre-generate shader logs for ddebug	Marek Olšák	2016-07-26	1	-0/+7
	This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: move the shader key dumping to si_shader_dump	Marek Olšák	2016-07-26	1	-1/+0
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: report accurate SGPR and VGPR spills	Marek Olšák	2016-07-13	1	-0/+2
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: do compilation from si_create_shader_selector asynchronously	Marek Olšák	2016-07-05	1	-0/+1
	Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order.
	(We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.)
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: separate the compilation chunk of si_create_shader_selector	Marek Olšák	2016-07-05	1	-0/+7
	The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: don't interpolate colors if flatshading is enabled	Marek Olšák	2016-07-05	1	-1/+1
	Use v_interp_mov for those. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: enable the barycentric optimization in all cases	Marek Olšák	2016-07-05	1	-5/+2
	Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled	Marek Olšák	2016-07-05	1	-2/+2
	This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j), and the same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits	Marek Olšák	2016-07-05	1	-1/+2
	This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: enable WQM in PS prolog when needed	Nicolai Hähnle	2016-06-07	1	-0/+1
	WQM is needed when the PS prolog computes a VGPR that is consumed by a shader with (implicit or explicit) derivatives. Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be effective (otherwise it's just a no-op). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Cc: 12.0 <mesa-dev@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Remove LDS layout user SGPR's from TES.	Bas Nieuwenhuizen	2016-05-26	1	-7/+8
	They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Store inputs to memory when not using a TCS.	Bas Nieuwenhuizen	2016-05-26	1	-0/+1
	We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwise many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Add user SGPR for the layout of the offchip buffer.	Bas Nieuwenhuizen	2016-05-26	1	-2/+10
	Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Use correct parameter index for LS_OUT_LAYOUT.	Bas Nieuwenhuizen	2016-05-26	1	-3/+4
	This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: Add offchip tessellation parameters.	Bas Nieuwenhuizen	2016-05-26	1	-1/+2
	Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: decrease GS copy shader user SGPRs to 2	Marek Olšák	2016-04-22	1	-1/+1
	Const buffers are no longer used since the clip plane const buffer was moved to RW buffers. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: move default tess level constant buffer to RW buffers	Marek Olšák	2016-04-22	1	-0/+6
	Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: Add config parameter to si_shader_apply_scratch_relocs.	Bas Nieuwenhuizen	2016-04-21	1	-0/+1
	shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
*	radeonsi: add shared memory	Bas Nieuwenhuizen	2016-04-19	1	-0/+3
	Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca.
	v2:
	- Use ctx->i8.
	- Dropped null-check for declare_memory_region.
	- Changed memory region array to single region.
	Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
*	radeonsi: lower compute shader arguments	Bas Nieuwenhuizen	2016-04-19	1	-0/+9
	Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: use enums in si_shader.h	Marek Olšák	2016-04-18	1	-93/+119
	Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: implement set_shader_buffers	Nicolai Hähnle	2016-04-12	1	-56/+58
	Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
*	radeonsi: implement set_shader_images (v2)	Nicolai Hähnle	2016-03-21	1	-2/+2
	Whether DCC is disabled depends on the access flags with which the image is bound: image_load supports DCC, but store and atomic don't. v2: remove an unnecessary masking of images->desc.enabled_mask Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	radeonsi: allow dumping shader disassemblies to a file	Marek Olšák	2016-03-01	1	-1/+2
	Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
*	radeonsi: use re-Z	Marek Olšák	2016-03-01	1	-0/+1
	This can increase perf for shaders that kill pixels (kill, alpha-test, alpha-to-coverage). v2: add comments Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
*	radeonsi: implement binary shaders & shader cache in memory (v2)	Marek Olšák	2016-02-21	1	-1/+3
	v2: handle _mesa_hash_table_insert failure; other cosmetic changes. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: move some struct si_shader members to new struct si_shader_info	Marek Olšák	2016-02-21	1	-9/+12
	This will be part of shader binaries. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: use smaller types for some si_shader members	Marek Olšák	2016-02-21	1	-3/+5
	in order to decrease the shader size for a shader cache. v2: add & use SI_MAX_VS_OUTPUTS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: compile non-GS middle parts of shaders immediately if enabled	Marek Olšák	2016-02-21	1	-0/+11
	Still disabled. Only prologs & epilogs are compiled in draw calls, but each variant of those is compiled only once per process. VS is always compiled as hw VS. TES is always compiled as hw VS. LS and ES stages are always compiled on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add PS prolog	Marek Olšák	2016-02-21	1	-1/+13
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add PS epilog	Marek Olšák	2016-02-21	1	-0/+7
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add TCS epilog	Marek Olšák	2016-02-21	1	-0/+3
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add VS epilog	Marek Olšák	2016-02-21	1	-0/+4
	It only exports the primitive ID. Also used by TES when it's compiled as VS. The VS input location of the primitive ID input is v2. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add VS prolog	Marek Olšák	2016-02-21	1	-0/+9
	This is disabled with use_monolithic_shaders = true. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: first bits for non-monolithic shaders	Marek Olšák	2016-02-21	1	-1/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: add code for combining and uploading shaders from 3 shader parts	Marek Olšák	2016-02-21	1	-0/+9
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: separate out shader key bits for prologs & epilogs	Marek Olšák	2016-02-21	1	-22/+55
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: compute how many input VGPRs fragment shaders have	Marek Olšák	2016-02-21	1	-0/+2
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: compute how many input SGPRs and VGPRs shaders have	Marek Olšák	2016-02-21	1	-0/+2
	Prologs (shader binaries inserted before the API shader binary) need to know this, so that they won't change the input registers unintentionally. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: put image, fmask, and sampler descriptors into one array	Marek Olšák	2016-02-10	1	-4/+4
	The texture slot is expanded to 16 dwords containing 2 descriptors. Those can be:
	- Image and fmask, or
	- Image and sampler state
	By carefully choosing the locations, we can put all three into one slot, with the fmask and sampler state being mutually exclusive.
	This improves shaders in 2 ways:
	- 2 user SGPRs are unused, shaders can use them as temporary registers now
	- each pair of descriptors is always on the same cache line
	v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask at the same time
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: make LLVM IR dumping less messy	Marek Olšák	2016-02-09	1	-1/+2
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: remove useless code that handles dx10_clamp_mode	Marek Olšák	2016-02-09	1	-1/+0
	"enable-no-nans-fp-math" is a wrong string and there was a disagreement about fixing it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it	Marek Olšák	2016-02-09	1	-0/+1
	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	radeonsi: implement forcing per-sample_interpolation using the shader key only	Marek Olšák	2016-02-09	1	-31/+19
	It was partly a state and partly emulated by shader code, but since we want to do this in a fragment shader prolog, we need to put it into the shader key, which will be used to generate the prolog. This also removes the spi_ps_input states and moves the registers to the PS state. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
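
Note on the 2016-02-10 entry ("put image, fmask, and sampler descriptors into one array"): the message describes a 16-dword texture slot holding an image descriptor plus either an fmask or a sampler state, with the latter two mutually exclusive. A minimal sketch of such a layout follows. The log does not state the descriptor sizes or offsets, so the 8/8/4-dword sizes, the offsets, and every identifier below are assumptions chosen for illustration, not the driver's actual definitions.

```c
#include <stdint.h>
#include <string.h>

/* All names, sizes, and offsets here are hypothetical (illustration only). */
enum {
    SLOT_DWORDS    = 16, /* slot size stated in the commit message */
    IMAGE_DWORDS   = 8,  /* assumed image descriptor size */
    FMASK_DWORDS   = 8,  /* assumed fmask descriptor size */
    SAMPLER_DWORDS = 4,  /* assumed sampler state size */
    IMAGE_OFFSET   = 0,
    FMASK_OFFSET   = 8,  /* fmask occupies the second half of the slot */
    SAMPLER_OFFSET = 12  /* sampler overlaps the tail of the fmask region */
};

struct tex_slot {
    uint32_t dw[SLOT_DWORDS];
};

/* The image descriptor always lives in the first half of the slot. */
static void slot_set_image(struct tex_slot *s, const uint32_t *image)
{
    memcpy(&s->dw[IMAGE_OFFSET], image, IMAGE_DWORDS * sizeof(uint32_t));
}

/* Writing an fmask overwrites the region a sampler state would use. */
static void slot_set_fmask(struct tex_slot *s, const uint32_t *fmask)
{
    memcpy(&s->dw[FMASK_OFFSET], fmask, FMASK_DWORDS * sizeof(uint32_t));
}

/* Writing a sampler state overwrites the tail of the fmask region. */
static void slot_set_sampler(struct tex_slot *s, const uint32_t *sampler)
{
    memcpy(&s->dw[SAMPLER_OFFSET], sampler, SAMPLER_DWORDS * sizeof(uint32_t));
}
```

Because the fmask and sampler regions overlap in this sketch, a slot can hold only one of them at a time, which mirrors the "mutually exclusive" constraint in the message; the image half is never disturbed by either write.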