summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_state_shaders.c
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: update all GSVS ring descriptors for new buffer allocationsNicolai Hähnle2016-12-151-1/+6
| | | | | | | | | Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 7b5b3d63c5f33bbd49f4b11c282603baa9371c10)
* radeonsi: remove cb0_is_integer handlingMarek Olšák2016-10-191-4/+2
| | | | | | st/mesa does this for us. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settingsMarek Olšák2016-10-131-21/+32
| | | | | | | The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: disable ReZMarek Olšák2016-10-131-7/+4
| | | | | | | | | This is a serious performance fix. Discovered by luck. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354 Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix R600_DEBUG=precompile for shader-dbMarek Olšák2016-10-121-0/+6
| | | | | | | radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add assertions to validate interpolation flagsMarek Olšák2016-10-051-0/+34
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove unnecessary #includesMarek Olšák2016-10-041-2/+0
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: don't re-create shader PM4 states after scratch buffer updateMarek Olšák2016-10-041-14/+16
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-131-4/+3
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: also do VS_PARTIAL_FLUSH before updating VGT ring pointersMarek Olšák2016-09-051-0/+6
| | | | | | | ported from Vulkan Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flagsMarek Olšák2016-08-261-6/+6
| | | | | | there's no reason to separate these Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not boundMarek Olšák2016-08-171-0/+7
| | | | | | All VP DX9 ports benefit from this. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: skip unnecessary si_update_shaders callsMarek Olšák2016-08-031-0/+7
| | | | | | Small decrease in draw call overhead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: pre-generate shader logs for ddebugMarek Olšák2016-07-261-3/+17
| | | | | | | | This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: ensure sample locations are set for line and polygon smoothingNicolai Hähnle2016-07-231-2/+1
| | | | | | | Since commit d938b8c, the sample locations are no longer set unconditionally, so we need to set the atom to dirty on all chips, not just Polaris. Cc: 12.0 <mesa-stable@lists.freedesktop.org>
* gallium/u_queue: add optional cleanup callbackRob Clark2016-07-161-1/+2
| | | | | | | | | | | | Adds a second optional cleanup callback, called after the fence is signaled. This is needed if, for example, the queue has the last reference to the object that embeds the util_queue_fence. In this case we cannot drop the ref in the main callback, since that would result in the fence being destroyed before it is signaled. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: disable multi-threading when shader dumps are enabledNicolai Hähnle2016-07-081-0/+1
| | | | | | | Otherwise, shader dumps can become interleaved and unusable. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: use multi-threaded compilation in debug contextsNicolai Hähnle2016-07-081-4/+4
| | | | | | | | We only have to stay single-threaded when debug output must be synchronous. This yields better parallelism in shader-db runs for me. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: explicitly choose center locations for 1xAA on PolarisNicolai Hähnle2016-07-081-0/+4
| | | | | | | | | | | | | Unlike SC, the small primitive filter does not automatically use center locations in 1xAA mode, so this is needed to avoid artifacts caused by the small primitive filter discarding triangles that it shouldn't. As a side effect of how the effective number of samples is now calculated, this patch also avoids submitting the sample locations for line/poly smoothing when they're not really needed. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: do compilation from si_create_shader_selector asynchronouslyMarek Olšák2016-07-051-6/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much as time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't lock shader cache mutex during compilationMarek Olšák2016-07-051-6/+16
| | | | | | | | | | to allow multiple shaders to be compiled simultaneously. ALso, shader-db can again use all 4 cores. v2: Remove the pipe_mutex_unlock call in the error path. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
* radeonsi: separate the compilation chunk of si_create_shader_selectorMarek Olšák2016-07-051-80/+102
| | | | | | | The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: print LLVM IRs to ddebug logsMarek Olšák2016-07-051-1/+8
| | | | | | | Getting LLVM IRs of hanging shaders have never been easier. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't interpolate colors if flatshading is enabledMarek Olšák2016-07-051-0/+1
| | | | | | use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable the barycentric optimization in all casesMarek Olšák2016-07-051-8/+10
| | | | | | | | Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: compute only one set of interpolation (i,j) when MSAA is disabledMarek Olšák2016-07-051-0/+13
| | | | | | | This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: split ps.prolog.force_persample_interp into persp and linear bitsMarek Olšák2016-07-051-9/+12
| | | | | | This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable distributed tess on multi-SE parts onlyMarek Olšák2016-06-291-1/+1
| | | | | | | ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set optimal VGT_HS_OFFCHIP_PARAMMarek Olšák2016-06-291-10/+39
| | | | | | | ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: use r600_resource_referenceMarek Olšák2016-06-251-3/+1
| | | | | | Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use trapezoid distribution for tess on Fiji and PolarisNicolai Hähnle2016-06-201-3/+7
| | | | | | | This yields a small performance improvement in Unigine Heaven. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards.Bas Nieuwenhuizen2016-05-301-2/+6
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239 Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: always reserve output space for tess factorsMarek Olšák2016-05-271-1/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Dave Airlie <airlied@redhat.com>
* radeonsi: Allow TES distribution between shader engines.Bas Nieuwenhuizen2016-05-261-15/+24
| | | | | | | | | | | | | The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro. Smaller values in the ACCUM fields seem to decrease the performance advantage from this patch, higher values don't seem to matter. v2: Add distribution mode field enums. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Enable dynamic HS.Bas Nieuwenhuizen2016-05-261-1/+1
| | | | | | | | | | This allows running the TES on different CU's than the TCS which results in performance improvements. v2: Only write the control word from one invocation. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Store inputs to memory when not using a TCS.Bas Nieuwenhuizen2016-05-261-0/+3
| | | | | | | | | | | | | | | | | We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwisze many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add offchip tessellation parameters.Bas Nieuwenhuizen2016-05-261-0/+9
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add buffer for offchip storage between TCS and TES.Bas Nieuwenhuizen2016-05-261-0/+18
| | | | | | | | | | | The buffer is quite large, but should only be allocated if the application uses tessellation. Most non-games don't. v2: - Use the correct register for SI. - Add define for block size. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Change default behaviour for undefined COLOR0Axel Davy2016-05-181-0/+3
| | | | | | | | | d3d 9 needs COLOR0 to be 1.0 on all channels when undefined. 0.0 for the others is fine. GL behaviour is undefined. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: always allocate export memory for pixel shadersNicolai Hähnle2016-05-091-5/+10
| | | | | | | | Experiments with framebuffer-no-attachments type draw calls have shown that NULL exports stall terribly unless we ensure that export memory is allocated by the SPI. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix undefined behavior (memcpy arguments must be non-NULL)Nicolai Hähnle2016-05-071-1/+3
| | | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium: remove helpers converting to/from TGSI_PROCESSOR_*Marek Olšák2016-04-221-1/+1
| | | | Acked-by: Jose Fonseca <jfonseca@vmware.com>
* gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_*Marek Olšák2016-04-221-7/+7
| | | | Acked-by: Jose Fonseca <jfonseca@vmware.com>
* radeonsi: remove the shader parameter from si_set_ring_bufferMarek Olšák2016-04-221-10/+9
| | | | | | | | not used anymore this is a follow-up to the RW buffer cleanup. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: move default tess level constant buffer to RW buffersMarek Olšák2016-04-221-8/+7
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: rename and rearrange RW buffer slotsMarek Olšák2016-04-221-8/+8
| | | | | | | | | - use an enum - use a unique slot number regardless of the shader stage (the per-stage slots will go away for RW buffers) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Add config parameter to si_shader_apply_scratch_relocs.Bas Nieuwenhuizen2016-04-211-1/+1
| | | | | | | shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
* radeonsi: fold num_user_sgprs where it is possibleMarek Olšák2016-04-141-16/+4
| | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: fix SGPRS calculation once moreMarek Olšák2016-04-141-55/+12
| | | | | | | | | | | | | This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS, which bumped NUM_USER_SGPRS and uncovered this bug on SI. If this was fixed in LLVM, these workarounds wouldn't be needed. LLVM would have to look at the calling convention to know how many SGPR inputs are declared, and add VCC and the scratch wave offset (which is enabled even if we spill SGPRs but not VGPRs, oh well). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: move scissor and viewport states into gallium/radeonMarek Olšák2016-04-121-22/+3
| | | | | | Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Grigori Goronzy <greg@chown.ath.cx> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>