summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_shader.c
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: fix an off-by-one error in the bounds check for max_verticesNicolai Hähnle2016-12-151-1/+1
| | | | | | | | | | | | The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 88509518b01d7c1d7436a790bf9be5cf3c41a528)
* radeonsi: do not kill GS with memory writesNicolai Hähnle2016-12-151-8/+22
| | | | | | | | | | | | Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit. However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 7655bccce80c9690ecb850304d15238ef1e0d622)
* radeonsi: wait for outstanding LDS instructions in memory barriers if neededMarek Olšák2016-12-141-1/+17
| | | | | | Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 13c34cf8ca43d0f9c1e1a663e6a3783b0938dfd9)
* radeonsi: wait for outstanding memory instructions in TCS barriersMarek Olšák2016-12-141-1/+5
| | | | | | Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 16f49c16c79a67f174b92672d546f909425f7fc3)
* radeonsi: allow specifying simm16 of emit_waitcnt at call sitesMarek Olšák2016-12-141-5/+7
| | | | | | | | The next commit will use this. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 15e96c70b0b668a2626326d3572a247e41885c18)
* radeonsi: fix isolines tess factor writes to control ringNicolai Hähnle2016-12-141-4/+12
| | | | | | | | | Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test. Cc: mesa-stable@lists.freedesktop.org (cherry picked from commit d3931a355fd5d309d5bcfe2655249f029e84d355) [Emil Velikov: there is no si_shader_key::part in branch] Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
* radeonsi: apply a TC L1 write corruption workaround for SIMarek Olšák2016-12-141-11/+23
| | | | | | Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72e46c98896d0cb13fc7d70b7a4193a84d72a5fc)
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-141-2/+22
| | | | | | | | All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
* radeonsi: consolidate max-work-group-size computationMarek Olšák2016-12-141-24/+19
| | | | | | | | The next commit will need this. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit ec36c63b4f417973a6d50d79281f4834682c4555)
* radeonsi: fix 64-bit loads from LDSNicolai Hähnle2016-10-241-1/+1
| | | | | | | | | | Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 4a2dbfff05f7be271c2aa72e783e24b31906db51)
* radeonsi: rename prefixes from radeon to siMarek Olšák2016-10-181-46/+46
| | | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: merge radeon_llvm_context and si_shader_contextMarek Olšák2016-10-181-271/+193
| | | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: import all TGSI->LLVM code from gallium/radeonMarek Olšák2016-10-181-2/+0
| | | | | | Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: move LLVM ALU codegen into radeonsiMarek Olšák2016-10-181-6/+3
| | | | | | Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: unify the constant load pathsNicolai Hähnle2016-10-171-28/+11
| | | | | | Remove the split between direct and indirect. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix indirect loads of 64 bit constantsNicolai Hähnle2016-10-171-2/+2
| | | | | | | This fixes GL45-CTS.compute_shader.fp64-case3. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: shorten "shader->selector" to "sel" in si_shader_createMarek Olšák2016-10-171-7/+8
| | | | | Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC-compatible HTILEMarek Olšák2016-10-131-2/+16
| | | | | | | | | | | | | | | | | | | | | so that decompress blits aren't needed and depth texturing needs less memory bandwidth. Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers. This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now. I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now) Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix regression in image atomicsNicolai Hähnle2016-10-131-1/+1
| | | | Caused by a bad rebase when pushing commit 76a940893.
* radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.*Nicolai Hähnle2016-10-131-2/+7
| | | | | | | Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic* Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Use the new image load/store intrinsic signaturesTom Stellard2016-10-121-14/+45
| | | | | | This patch requires LLVM r284024 or newer. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Add function for converting LLVM type to intrinsic stringTom Stellard2016-10-121-10/+32
| | | | | | The existing function only worked for integer types. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Refactor image store/load intrinsic name creationTom Stellard2016-10-121-11/+18
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: support ARB_compute_variable_group_sizeNicolai Hähnle2016-10-101-14/+30
| | | | | | | | Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix interpolateAt opcodes for .zw componentsMarek Olšák2016-10-051-1/+1
| | | | | | | | | | Not returning garbage in .zw seems pretty important. This fixes: GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.* Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: interpolate colors after interpolation weight shufflingMarek Olšák2016-10-051-48/+48
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: optionally run the LLVM IR verifier passNicolai Hähnle2016-10-041-7/+21
| | | | | | | | This is enabled automatically if shader printing is enabled, or separately by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later pass. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: don't declare LDS in PS when ds_bpermute is usedMarek Olšák2016-10-041-4/+3
| | | | | | | | I guess this is not needed because dead code elimination removes the declaration. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interpMarek Olšák2016-10-041-49/+7
| | | | | | | We can finally do this, because the opcodes are scalar now. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: simplify si_llvm_emit_ddxyMarek Olšák2016-10-041-51/+29
| | | | | | | | si_llvm_emit_ddxy is called once per element, so we don't have to generate code for 4 elements at once. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VIMarek Olšák2016-10-041-5/+9
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: use a helper function for BuildGEP(0, x)Marek Olšák2016-10-041-47/+35
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: remove obsolete shader definitionsMarek Olšák2016-10-041-12/+4
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: remove unnecessary #includesMarek Olšák2016-10-041-5/+0
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: reload PS inputs with direct indexing at each use (v2)Marek Olšák2016-09-141-16/+11
| | | | | | | | | | | | | | | | | | | | | The LLVM compiler can CSE interp intrinsics thanks to LLVMReadNoneAttribute. 26011 shaders in 14651 tests Totals: SGPRS: 1146340 -> 1132676 (-1.19 %) VGPRS: 727371 -> 711730 (-2.15 %) Spilled SGPRs: 2218 -> 2078 (-6.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35841268 -> 36009732 (0.47 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222559 -> 224779 (1.00 %) Wait states: 0 -> 0 (0.00 %) v2: don't call load_input for fragment shaders in emit_declaration Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: get rid of constant buffer preloadingMarek Olšák2016-09-141-24/+14
| | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1152636 -> 1146340 (-0.55 %) VGPRS: 728198 -> 727371 (-0.11 %) Spilled SGPRs: 3776 -> 2218 (-41.26 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 35835152 -> 35841268 (0.02 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222372 -> 222559 (0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: get rid of img/buf/sampler descriptor preloading (v2)Marek Olšák2016-09-141-132/+47
| | | | | | | | | | | | | | | | | | 26011 shaders in 14651 tests Totals: SGPRS: 1251920 -> 1152636 (-7.93 %) VGPRS: 728421 -> 728198 (-0.03 %) Spilled SGPRs: 16644 -> 3776 (-77.31 %) Spilled VGPRs: 369 -> 369 (0.00 %) Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread Code Size: 36001064 -> 35835152 (-0.46 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 222221 -> 222372 (0.07 %) Wait states: 0 -> 0 (0.00 %) v2: merge codepaths where possible Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: rename get_sampler_desc -> load_sampler_descMarek Olšák2016-09-141-11/+11
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: cosmetic changes in si_shader.cMarek Olšák2016-09-141-3/+5
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: load streamout buffer descriptors before use (v2)Marek Olšák2016-09-141-33/+14
| | | | | | v2: inline the code and remove the conditional that's a no-op now Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix FP64 UBO loads with indirect uniform block indexingMarek Olšák2016-09-131-2/+1
| | | | | | | No known tests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-131-12/+51
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't preload constants at the beginning of shadersMarek Olšák2016-09-121-20/+11
| | | | | | | | | | | | | | | | | | | | | | LLVM can CSE the loads, thus we can always re-load constants before each use. The decrease in SGPR spilling is huge. The best improvements are the dumbest ones. 26011 shaders in 14651 tests Totals: SGPRS: 1453346 -> 1251920 (-13.86 %) VGPRS: 742576 -> 728421 (-1.91 %) Spilled SGPRs: 52298 -> 16644 (-68.17 %) Spilled VGPRs: 397 -> 369 (-7.05 %) Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread Code Size: 36136488 -> 36001064 (-0.37 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 219315 -> 222221 (1.33 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix Gather4 with integer formatsMarek Olšák2016-09-051-3/+96
| | | | | | | | | | The closed compiler does the same thing. This fixes: GL45-CTS.texture_gather.*-int-* (18 tests) Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix a crash in imageSize for cubemap arraysMarek Olšák2016-09-051-3/+1
| | | | | | | | | | | Sometimes it was f32, other times it was i32. Now it's always i32. This fixes: GL45-CTS.texture_cube_map_array.image_texture_size.texture_size_compute_sh Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix gl_PatchVerticesIn for tessellation evaluation shaderMarek Olšák2016-09-051-1/+6
| | | | | | | | | This fixes: GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation .gl_PatchVerticesIn Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: always use the same function signature for llvm.SI.exportMarek Olšák2016-09-051-4/+4
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Don't use global variables for tess ldsTom Stellard2016-08-291-9/+6
| | | | | | | | | We were allocating global variables for the maximum LDS size which made the compiler think we were using all of LDS, which isn't the case. Reviewed-By: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: add radeon_llvm_bound_index for bounds checkingNicolai Hähnle2016-08-171-18/+1
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: use tgsi_scan_arrays for temp arraysNicolai Hähnle2016-08-171-1/+2
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>