path: root/src/gallium/drivers/radeonsi/si_state.c
Commit message | Author | Age | Files | Lines
* radeonsi: disable RB+ blend optimizations for dual source blending | Marek Olšák | 2016-12-14 | 1 | -0/+11
  This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing.
  Cc: 13.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  (cherry picked from commit 5e5573b1bf8565f38e9b770b5357d069e80ff00d)
* radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending | Marek Olšák | 2016-12-14 | 1 | -0/+4
  copied from Vulkan
  Cc: 13.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  (cherry picked from commit ff50c44a5fb4411715da828af5b8706c8a456d26)
* radeonsi: always set all blend registers | Marek Olšák | 2016-12-14 | 1 | -5/+5
  better safe than sorry
  Cc: 13.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  (cherry picked from commit 87b208a54e67b6b01845efa2ec20a96963399920)
* radeonsi: set VGT_GS_ONCHIP_CNTL on CIK and later | Marek Olšák | 2016-11-01 | 1 | -0/+8
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit e24dc4316487eeaa6ee8aa5c709546d814e96f03)
* radeonsi: remove cb0_is_integer handling | Marek Olšák | 2016-10-19 | 1 | -8/+1
  st/mesa does this for us.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: clear DB_RENDER_OVERRIDE | Marek Olšák | 2016-10-17 | 1 | -3/+1
  Vulkan doesn't set these fields even though it doesn't use HiS. HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC-compatible HTILE | Marek Olšák | 2016-10-13 | 1 | -4/+35
  so that decompress blits aren't needed and depth texturing needs less memory bandwidth.

  Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers.

  This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.

  I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now)

  Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use TC write-back instead of full cache invalidation | Marek Olšák | 2016-10-12 | 1 | -9/+3
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers | Marek Olšák | 2016-10-12 | 1 | -3/+4
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove unnecessary #includes | Marek Olšák | 2016-10-04 | 1 | -2/+0
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: don't set sampler buffer offsets in create_sampler_view | Marek Olšák | 2016-10-04 | 1 | -10/+2
  do it at bind time, so that pipe_sampler_view is immutable with regard to buffer reallocations and we don't have to remember all existing buffer views.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: track buffer bind history | Marek Olšák | 2016-10-04 | 1 | -4/+11
  similar to gl_buffer_object::UsageHistory
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: drop support for NULL sampler views | Marek Olšák | 2016-10-04 | 1 | -10/+1
  not used anymore. It was used when the polygon stipple texture was constant.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: don't check PIPE_BARRIER_MAPPED_BUFFER | Marek Olšák | 2016-10-04 | 1 | -4/+3
  Caches are always flushed at IB boundary.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: add save_qbo_state | Nicolai Hähnle | 2016-09-29 | 1 | -0/+12
  Save compute shader state that will be used for the ARB_query_buffer_object implementation.
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: remove the cache_flush atom | Marek Olšák | 2016-09-09 | 1 | -1/+0
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITE | Marek Olšák | 2016-09-08 | 1 | -5/+0
  not used in any useful way
  Reviewed-by: Eric Anholt <eric@anholt.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallium/radeon: remove VPORT_ZMIN/ZMAX from init config states | Marek Olšák | 2016-09-05 | 1 | -6/+0
  It's part of the viewport state now.
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: set VPORT_ZMIN/MAX registers correctly | Marek Olšák | 2016-09-05 | 1 | -1/+2
  Calculate depth ranges from viewport states and pipe_rasterizer_state::clip_halfz.

  The evergreend.h change is required to silence a warning.

  This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range

  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
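  The depth-range calculation this commit describes can be sketched roughly as below. This is a minimal illustration, not the patch itself: it assumes the Gallium convention that a viewport maps clip-space z through z * scale + translate, and the struct and function names (`vp_z`, `vp_depth_range`) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the z components of a viewport state. */
struct vp_z {
    float scale;     /* z scale */
    float translate; /* z translate */
};

/* Compute the [zmin, zmax] depth range covered by a viewport.
 * With clip_halfz, clip-space z spans [0, 1]; otherwise it spans [-1, 1]. */
static void vp_depth_range(const struct vp_z *vp, bool clip_halfz,
                           float *zmin, float *zmax)
{
    float z_near = clip_halfz ? vp->translate : vp->translate - vp->scale;
    float z_far = vp->translate + vp->scale;

    /* The scale can be negative, so order the two ends explicitly. */
    *zmin = z_near < z_far ? z_near : z_far;
    *zmax = z_near < z_far ? z_far : z_near;
}
```

  Ordering the two ends explicitly keeps the result valid for an inverted depth range, where the scale is negative.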
* radeonsi: fix texture format reinterpretation with DCC | Marek Olšák | 2016-09-05 | 1 | -0/+4
  DCC is limited in how texture formats can be reinterpreted using texture views. If we get a view format that is incompatible with the initial texture format with respect to DCC, disable DCC.

  There is a new piglit which tests all format combinations. What works and what doesn't was deduced by looking at the piglit failures.

  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix cubemaps viewed as 2D | Marek Olšák | 2016-09-05 | 1 | -0/+4
  This fixes: GL43-CTS.texture_view.view_sampling

  v2: fix a typo, merge both if statements

  Cc: mesa-stable@lists.freedesktop.org
  Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set more sampler settings | Marek Olšák | 2016-09-05 | 1 | -2/+6
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add support for cull distances. (v1.1) | Dave Airlie | 2016-08-30 | 1 | -3/+4
  This should be all that is required for cull distances to work on radeonsi.

  v1.1: whitespace cleanup, add docs, fix clipdist_mask usage.

  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Signed-off-by: Dave Airlie <airlied@redhat.com>
* radeonsi: fix up buffer descriptor upper-bound checking | Marek Olšák | 2016-08-17 | 1 | -1/+1
  st/mesa does this too, so we're safe.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium: change pipe_sampler_view::first_element/last_element -> offset/size | Marek Olšák | 2016-08-17 | 1 | -5/+5
  This is required by OpenGL. Our hardware supports this.

  Example: Bind RGBA32F with offset = 4 bytes.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97305
  Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
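  The RGBA32F example above can be made concrete with a small sketch. It assumes a 16-byte element for RGBA32F and uses hypothetical helper names; it only illustrates why an element-indexed range cannot express a 4-byte offset, and is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a byte offset be expressed as a whole
 * element index for elements of elem_size bytes? */
static bool offset_fits_first_element(uint32_t offset_bytes, uint32_t elem_size)
{
    return offset_bytes % elem_size == 0;
}

/* Hypothetical helper: convert an element-based range
 * [first_element, last_element] into a byte offset and byte size. */
static void elements_to_bytes(uint32_t first_element, uint32_t last_element,
                              uint32_t elem_size,
                              uint32_t *offset, uint32_t *size)
{
    *offset = first_element * elem_size;
    *size = (last_element - first_element + 1) * elem_size;
}
```

  With 16-byte elements, `offset_fits_first_element(4, 16)` is false, so the old fields could not describe that binding, while any old element range still converts to a byte offset and size.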
* radeonsi: simplify CB_TARGET_MASK logic | Marek Olšák | 2016-08-17 | 1 | -14/+7
  we can now rely on CB_COLORn_INFO to disable empty slots.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't set CB_COLOR1_INFO for dual src blending | Marek Olšák | 2016-08-17 | 1 | -7/+0
  Vulkan doesn't do this. The reason may be that CB_COLOR1_INFO.SOURCE_FORMAT from NI was moved to SPI_SHADER_COL_FORMAT for SI. I asked CB guys about this 2 days ago and they still haven't replied.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound | Marek Olšák | 2016-08-17 | 1 | -11/+0
  All VP DX9 ports benefit from this.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set CB_COLORn_INFO.ROUND_MODE | Marek Olšák | 2016-08-10 | 1 | -0/+5
  just do what the register spec says
  Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: set CB_COLORn_INFO.SIMPLE_FLOAT | Marek Olšák | 2016-08-10 | 1 | -0/+1
  This can help enable some blend optimizations (see the register spec). Vulkan always sets this.
  Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: disallow MIN/MAX blend equations for dual source blending | Marek Olšák | 2016-08-10 | 1 | -0/+10
  Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: only set dual source blending for MRT0 | Marek Olšák | 2016-08-10 | 1 | -0/+4
  This is the proper fix for Overlord and Witcher 2 hangs.

  The hang condition is that 1 app must write to MRT0 and MRT1 from a pixel shader while MRT1 is disabled in CB_TARGET_MASK (does this generate unflushable pixel quads? I don't know), and another app (e.g. Glamor) must enable dual source blending in both MRT0 and MRT1. The hw gets confused, which leads to corruption and hangs.

  Cc: 12.0 11.2 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
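  The rule in the subject line can be sketched as a per-target loop. This is an illustration under assumptions, not the 4-line patch: the function name, the 8-target limit constant, and the boolean arrays are hypothetical, and real blend state programs registers rather than flags.

```c
#include <stdbool.h>

#define MAX_COLOR_TARGETS 8 /* assumed number of MRT slots */

/* Decide which render targets get blending programmed.
 * When dual source blending is on, only MRT0 is programmed and
 * MRT1..7 are left disabled, which avoids the reported hang condition. */
static void select_blend_targets(const bool requested[MAX_COLOR_TARGETS],
                                 bool dual_src_blend,
                                 bool enabled[MAX_COLOR_TARGETS])
{
    for (int i = 0; i < MAX_COLOR_TARGETS; i++) {
        if (i >= 1 && dual_src_blend) {
            enabled[i] = false;
            continue;
        }
        enabled[i] = requested[i];
    }
}
```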
* radeonsi: skip unnecessary si_update_shaders calls | Marek Olšák | 2016-08-03 | 1 | -0/+6
  Small decrease in draw call overhead.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix Polaris MSAA regression | Nicolai Hähnle | 2016-07-23 | 1 | -15/+19
  The regression was introduced by commit d938b8c.

  The problem here is that in order to use the small primitive filter, we need to explicitly set the sample locations to 0. But the DB doesn't properly process the change of sample locations without a flush, and so we can end up with incorrect Z values.

  Instead of doing a flush, just disable the small primitive filter when MSAA is force-disabled.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908
  Cc: 12.0 <mesa-stable@lists.freedesktop.org>
* radeonsi: silence Coverity warning | Nicolai Hähnle | 2016-07-13 | 1 | -0/+2
  Coverity's analysis is too weak to understand that r600_init_flushed_depth(_, _, NULL) only returns true when flushed_depth_texture was assigned a non-NULL value.
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix bad assertion in si_emit_sample_mask | Nicolai Hähnle | 2016-07-09 | 1 | -1/+2
  The blitter sets mask == 1, which is fine since it doesn't use smoothing. Fixes a regression introduced in commit 5bcfbf91.
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: catch a potential state tracker error with non-MSAA FBs | Nicolai Hähnle | 2016-07-08 | 1 | -0/+6
  At least st/mesa ensures this, so I'd rather not handle deviations in radeonsi.
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: explicitly choose center locations for 1xAA on Polaris | Nicolai Hähnle | 2016-07-08 | 1 | -16/+29
  Unlike SC, the small primitive filter does not automatically use center locations in 1xAA mode, so this is needed to avoid artifacts caused by the small primitive filter discarding triangles that it shouldn't.

  As a side effect of how the effective number of samples is now calculated, this patch also avoids submitting the sample locations for line/poly smoothing when they're not really needed.

  Cc: 12.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: sample from flushed depth texture when required | Nicolai Hähnle | 2016-07-06 | 1 | -0/+19
  Note that this has no effect yet. A case where can_sample_z/s can be false in radeonsi will be added in a later patch.
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: replace is_flushing_texture with db_compatible | Nicolai Hähnle | 2016-07-06 | 1 | -2/+4
  This is a left-over of when I considered generalizing the separate stencil support. I do prefer the new name since it emphasizes what flushing vs. non-flushing means from a functional point-of-view, namely special handling of the texture format.

  v2: adjust r600_init_color_surface as well

  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: add a heuristic enabling DCC for scanout surfaces (v2) | Marek Olšák | 2016-06-29 | 1 | -0/+15
  DCC for displayable surfaces is allocated in a separate buffer and is enabled or disabled based on PS invocations from 2 frames ago (to let queries go idle) and the number of slow clears from the current frame.

  At least an equivalent of 5 fullscreen draws or slow clears must be done to enable DCC.
  (PS invocations / (width * height) + num_slow_clears >= 5)

  Pipeline statistic queries are always active if a color buffer that can have separate DCC is bound, even if separate DCC is disabled. That means the window color buffer is always monitored and DCC is enabled only when the situation is right.

  The tracking of per-texture queries in r600_common_context is quite ugly, but I don't see a better way.

  The first fast clear always enables DCC. DCC decompression can disable it. A later fast clear can enable it again. Enable/disable typically happens only once per frame.

  The impact is expected to be negligible because games usually don't have a high level of overdraw. DCC usually activates when too much blending is happening (smoke rendering) or when testing glClear performance and CMASK isn't supported (Stoney).

  v2: rename stuff, add assertions

  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
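  The enable condition quoted in this message, PS invocations / (width * height) + num_slow_clears >= 5, can be sketched as a small predicate. The function name and parameter types are hypothetical, and the use of integer division is an assumption; the commit message does not say how the driver rounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate for the heuristic described above:
 * enable separate DCC once the work done is equivalent to at least
 * 5 fullscreen draws or slow clears. */
static bool should_enable_separate_dcc(uint64_t ps_invocations,
                                       unsigned width, unsigned height,
                                       unsigned num_slow_clears)
{
    assert(width > 0 && height > 0);

    /* Assumption: whole fullscreen draws, i.e. integer division. */
    uint64_t fullscreen_draws = ps_invocations / ((uint64_t)width * height);

    return fullscreen_draws + num_slow_clears >= 5;
}
```

  For a 1920x1080 surface, three fullscreen draws' worth of PS invocations plus two slow clears meets the threshold, while four draws and no clears does not.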
* gallium/radeon: add state setup for a separate DCC buffer | Marek Olšák | 2016-06-29 | 1 | -1/+8
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: unreference framebuffer state with set_framebuffer_state | Marek Olšák | 2016-06-29 | 1 | -1/+1
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't advertise multisample shader images | Marek Olšák | 2016-06-29 | 1 | -0/+3
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable CU0 in each SE for LS-HS execution | Marek Olšák | 2016-06-29 | 1 | -2/+1
  Offchip-only tessellation allows this.
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use conformant line rasterization | Marek Olšák | 2016-06-29 | 1 | -1/+9
  AA lines are not completely correct (see TODO), but everything else should be.

  + 3 linestipple piglits

  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set PA_SU_SMALL_PRIM_FILTER_CNTL register on Polaris | Marek Olšák | 2016-06-28 | 1 | -0/+5
  This was missing.
  Cc: 12.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
* radeonsi: make si_is_format_supported static | Marek Olšák | 2016-06-25 | 1 | -5/+6
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Reviewed-by: Vedran Miletić <vedran@miletic.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: boolean -> bool, TRUE -> true, FALSE -> false | Marek Olšák | 2016-06-25 | 1 | -10/+10
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Reviewed-by: Vedran Miletić <vedran@miletic.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Implement POLYGON_OFFSET_UNITS_UNSCALED | Axel Davy | 2016-06-25 | 1 | -14/+18
  Empirical tests show that the polygon offset behaviour is entirely determined by the content of the PA_SU_POLY_OFFSET states, and not by the depth buffer format bound.

  PA_SU_POLY_OFFSET seems to directly set the parameters of the polygon offset formula, and setting 0 for PA_SU_POLY_OFFSET_DB_FMT_CNTL (i.e. setting the unorm depth bias behaviour with a scale of 2^0 = 1.0f) gives the unscaled behaviour.

  Signed-off-by: Axel Davy <axel.davy@ens.fr>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
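  The difference between scaled and unscaled units can be illustrated with the usual polygon offset formula, offset = max_slope * factor + r * units, where r is the smallest resolvable depth step. The sketch below is a simplified model, not hardware behaviour or code from the patch: the function name is hypothetical, and it assumes r = 2^-db_bits for a unorm depth buffer, so db_bits = 0 gives r = 1.0 and the units are applied unscaled.

```c
/* Hypothetical model of the polygon offset formula.
 * db_bits is the assumed number of unorm depth bits used for scaling;
 * db_bits == 0 models the unscaled case (scale of 2^0 = 1.0f). */
static float poly_offset_model(float max_slope, float factor,
                               float units, unsigned db_bits)
{
    float r = 1.0f;

    /* r = 2^-db_bits, computed by repeated halving. */
    for (unsigned i = 0; i < db_bits; i++)
        r *= 0.5f;

    return max_slope * factor + r * units;
}
```

  With db_bits set to 24, one unit is one 24-bit depth step; with db_bits set to 0, the same units value is added to the depth directly.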