summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_state_draw.c
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: add a tess+GS hang workaround for VI dGPUsMarek Olšák2016-12-141-2/+10
| | | | | | | | ported from Vulkan Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit a816c7fe07bf16325c11bc692486ffb6d1e8b670)
* radeonsi: apply a tessellation bug workaround for SIMarek Olšák2016-12-141-0/+6
| | | | | | | | | | | Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 78c4528ae7709fbe94d917d034cfd60535b5dcf3) [Emil Velikov: resolve trivial conflict] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Conflicts: src/gallium/drivers/radeonsi/si_state_draw.c
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-141-2/+4
| | | | | | | | All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
* radeonsi: implement TC-compatible HTILEMarek Olšák2016-10-131-1/+2
| | | | | | | | | | | | | | | | | | | | | so that decompress blits aren't needed and depth texturing needs less memory bandwidth. Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers. This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now. I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use TC write-back instead of full cache invalidationMarek Olšák2016-10-121-3/+3
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC L2 write-back (flush) without cache invalidationMarek Olšák2016-10-121-19/+62
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove unnecessary #includesMarek Olšák2016-10-041-2/+0
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: separate IA_MULTI_VGT_PARAM and VGT_PRIMITIVE_TYPE emissionMarek Olšák2016-10-041-7/+10
| | | | | | | We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: move VGT_LS_HS_CONFIG to derived tess_stateMarek Olšák2016-10-041-26/+14
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: Fix primitive restart when index changesJames Legg2016-10-041-7/+7
| | | | | | | | | | | If primitive restart is enabled for two consecutive draws which use different primitive restart indices, then the first draw's primitive restart index was incorrectly used for the second draw. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025 Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix the VGT performance tweak for small instancesMarek Olšák2016-09-091-5/+6
| | | | | | | | Based on the VGT spec. The Vulkan driver doesn't do it optimally and they plan to fix it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove the cache_flush atomMarek Olšák2016-09-091-5/+5
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: skip redundant INDEX_TYPE writesMarek Olšák2016-09-071-20/+30
| | | | | | Ported from Vulkan. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add more unlikely() uses into si_draw_vboMarek Olšák2016-09-071-5/+5
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: skip draws with instance_count == 0Marek Olšák2016-09-071-3/+13
| | | | | | loosely ported from Vulkan Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix variable naming in si_emit_cache_flushMarek Olšák2016-09-051-31/+31
| | | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not usedMarek Olšák2016-09-051-1/+3
| | | | | | | for less noise in the HUD Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add HUD queries for counting VS/PS/CS partial flushesMarek Olšák2016-09-051-0/+8
| | | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix a badly implemented GS bug workaroundMarek Olšák2016-09-051-8/+13
| | | | | | | Limit it to geometry shaders and Hawaii. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: program additional multi draw parametersNicolai Hähnle2016-08-091-5/+25
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: program the DRAWID SGPRNicolai Hähnle2016-08-091-2/+6
| | | | | | | | | | Note that for indirect draws, the new MULTI firmware packets are required. There's also no need to reset last_{start_instance,sh_base_reg}, since resetting last_base_vertex is sufficient. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: remove an incorrect assertionNicolai Hähnle2016-08-091-2/+0
| | | | | | | | | Byte indices don't need any alignment, so remove this assertion (it got moved into a path where a piglit test hit it during the refactoring of commit 64ff23a58c). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: flush TC L2 cache for indirect draw dataNicolai Hähnle2016-08-091-0/+5
| | | | | | | | | This fixes a bug when indirect draw data is generated by transform feedback. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: add has_draw_indirect_multi flagNicolai Hähnle2016-08-081-1/+1
| | | | | | | | | | Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware. Versions for SI and CI already added as provided by the firmware team, but keep in mind that they won't currently be used since the radeon kernel module has no interface to query the firmware version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: transpose indirect/index draw dispatchNicolai Hähnle2016-08-081-45/+31
| | | | | | This allows better code sharing for indirect draw calls. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: move index buffer calculations in si_emit_draw_packets upNicolai Hähnle2016-08-081-9/+12
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: unify emitting PKT3_SET_BASE for indirect drawsNicolai Hähnle2016-08-081-16/+9
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: deal with high vertex buffer memory usage correctlyMarek Olšák2016-08-061-0/+7
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: take scratch buffer and draw indirect memory usage into accountMarek Olšák2016-08-061-0/+6
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: skip unnecessary si_update_shaders callsMarek Olšák2016-08-031-7/+13
| | | | | | Small decrease in draw call overhead. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove the DRAW_PREAMBLE packetNicolai Hähnle2016-07-161-6/+1
| | | | | | | According to firmware guys, the new sequence that we added for Polaris should work on all CIK parts, and should actually be faster on some parts. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: add a heuristic enabling DCC for scanout surfaces (v2)Marek Olšák2016-06-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DCC for displayable surfaces is allocated in a separate buffer and is enabled or disabled based on PS invocations from 2 frames ago (to let queries go idle) and the number of slow clears from the current frame. At least an equivalent of 5 fullscreen draws or slow clears must be done to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5) Pipeline statistic queries are always active if a color buffer that can have separate DCC is bound, even if separate DCC is disabled. That means the window color buffer is always monitored and DCC is enabled only when the situation is right. The tracking of per-texture queries in r600_common_context is quite ugly, but I don't see a better way. The first fast clear always enables DCC. DCC decompression can disable it. A later fast clear can enable it again. Enable/disable typically happens only once per frame. The impact is expected to be negligible because games usually don't have a high level of overdraw. DCC usually activates when too much blending is happening (smoke rendering) or when testing glClear performance and CMASK isn't supported (Stoney). v2: rename stuff, add assertions Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable distributed tess on multi-SE parts onlyMarek Olšák2016-06-291-1/+1
| | | | | | | ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set optimal VGT_HS_OFFCHIP_PARAMMarek Olšák2016-06-291-2/+3
| | | | | | | ported from Vulkan Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use conformant line rasterizationMarek Olšák2016-06-291-2/+4
| | | | | | | | | | AA lines are not completely correct (see TODO), but everything else should be. + 3 linestipple piglits Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use optimal WD settings for primitive restart on PolarisMarek Olšák2016-06-271-2/+10
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix fractional odd tessellation spacing for PolarisMarek Olšák2016-06-241-0/+19
| | | | | | | ported from Vulkan (and no source explains why this is needed) Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: drop the DRAW_PREAMBLE packet on PolarisNicolai Hähnle2016-06-241-1/+6
| | | | | | | | It will be removed from the firmware for the Polaris. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: use DRAW_(INDEX_)INDIRECT_MULTI on PolarisNicolai Hähnle2016-06-241-10/+36
| | | | | | | | The non-MULTI variants will be removed in Polaris firmware. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: add driver queries for compute/dma call stats and spillsMarek Olšák2016-06-141-0/+2
| | | | | | also print the average count per frame Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add a performance tweak for 4 SE partsMarek Olšák2016-06-061-0/+11
| | | | | | Ported from Vulkan. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: simplify PRIMGROUP_SIZE computation for tessellationMarek Olšák2016-06-061-9/+1
| | | | | | | | Ported from Vulkan. v2: keep the comment Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement global resetting of texture descriptorsMarek Olšák2016-06-011-1/+8
| | | | | | | it will be used by texture reallocation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: Allow TES distribution between shader engines.Bas Nieuwenhuizen2016-05-261-0/+8
| | | | | | | | | | | | | The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro. Smaller values in the ACCUM fields seem to decrease the performance advantage from this patch, higher values don't seem to matter. v2: Add distribution mode field enums. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Process multiple patches per threadgroup.Bas Nieuwenhuizen2016-05-261-15/+35
| | | | | | | | | | | | | | | | | | | | | | | Using more than 1 wave per threadgroup does increase performance generally. Not using too many patches per threadgroup also increases performance. Both catalyst and amdgpu-pro seem to use 40 patches as their maximum, but I haven't really seen any performance increase from limiting the number of patches to 40 instead of 64. Note that the trick where we overlap the input and output LDS does not work anymore as the insertion of the tess factors changes the patch stride. v2: - Add comment about LDS assumptions. - Add constant for buffer size. - Fix code style. v3: - Correct limits for not splitting patches between waves. - Set max num_patches to 40 as in the proprietary driver. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Remove LDS layout user SGPR's from TES.Bas Nieuwenhuizen2016-05-261-3/+1
| | | | | | | | They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add user SGPR for the layout of the offchip buffer.Bas Nieuwenhuizen2016-05-261-2/+7
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* Treewide: Remove Elements() macroJan Vesely2016-05-171-2/+2
| | | | | Signed-off-by: Jan Vesely <jano.vesely@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
* radeonsi: fix missing include for Elements.Dave Airlie2016-04-261-0/+1
| | | | | | Since u_blitter.h no longer defines this. Signed-off-by: Dave Airlie <airlied@redhat.com>
* radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen2016-04-191-2/+2
| | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>