summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_compute.c
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-141-0/+1
| | | | | | | | All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
* radeonsi: store group_size_variable in struct si_computeNicolai Hähnle2016-11-241-5/+8
| | | | | | | | | | | | | | | For compute shaders, we free the selector after the shader has been compiled, so we need to save this bit somewhere else. Also, make sure that this type of bug cannot re-appear, by NULL-ing the selector pointer after we're done with it. This bug has been there since the feature was added, but was only exposed in piglit arb_compute_variable_group_size-local-size by commit 9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated). Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 42d5e91a2ae235c007c5d17935be9bb1c4ff388e)
* radeonsi: use TC write-back instead of full cache invalidationMarek Olšák2016-10-121-1/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows itMarek Olšák2016-10-111-1/+6
| | | | | | | Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: support ARB_compute_variable_group_sizeNicolai Hähnle2016-10-101-1/+9
| | | | | | | | Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix texture border colors for compute shadersMarek Olšák2016-10-051-0/+12
| | | | | | | | There are VM faults without this. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove unnecessary #includesMarek Olšák2016-10-041-2/+0
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v3Tom Stellard2016-09-161-17/+222
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch switches non-TGSI compute shaders over to using the HSA ABI described here: https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md The HSA ABI provides a much cleaner interface for compute shaders and allows us to share more code in the compiler with the HSA stack. The main changes in this patch are: - We now pass the scratch buffer resource into the shader via user sgprs rather than using relocations. - Grid/Block sizes are now passed to the shader via the dispatch packet rather than at the beginning of the kernel arguments. Typically for HSA, the CP firmware will create the dispatch packet and set up the user sgprs automatically. However, in Mesa we let the driver do this work. The main reason for this is that I haven't researched how to get the CP to do all these things, and I'm not sure if it is supported for all GPUs. v2: - Add comments explaining why we are setting certain bits of the scratch resource descriptor. v3: - Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi/compute: Add some more debug printfsTom Stellard2016-09-161-0/+3
|
* radeonsi: flush TC L2 before using a compute indirect bufferMarek Olšák2016-09-091-2/+10
| | | | | | | There is no known test for this. Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove the cache_flush atomMarek Olšák2016-09-091-1/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't emit CS_PARTIAL_FLUSH if compute is not usedMarek Olšák2016-09-051-0/+1
| | | | | | | for less noise in the HUD Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flagsMarek Olšák2016-08-261-1/+1
| | | | | | there's no reason to separate these Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radeonsi: take compute shader and dispatch indirect memory usage into accountMarek Olšák2016-08-061-0/+6
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: set optimal settings in COMPUTE_RESOURCE_LIMITSMarek Olšák2016-07-191-2/+6
| | | | | | ported from Vulkan Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: remove unused code - radeon_llvm_util.*Marek Olšák2016-07-051-1/+0
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: use r600_resource_referenceMarek Olšák2016-06-251-4/+2
| | | | | | Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix a compute shader hang with big threadgroups on SI & CIMarek Olšák2016-06-241-0/+18
| | | | | | | ported from Vulkan Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium/radeon: add driver queries for compute/dma call stats and spillsMarek Olšák2016-06-141-0/+6
| | | | | | also print the average count per frame Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Fix memory leak in error path.Bas Nieuwenhuizen2016-04-261-0/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add config parameter to si_shader_apply_scratch_relocs.Bas Nieuwenhuizen2016-04-211-1/+1
| | | | | | | shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
* radeonsi: Consider input SGPR count for compute shader SGPR count.Bas Nieuwenhuizen2016-04-191-5/+11
| | | | | | | | si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then recompute the rsrc1 register to use the new SGPR count. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add CE synchronization for compute dispatches.Bas Nieuwenhuizen2016-04-191-0/+4
| | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: do not do two full flushes on every compute dispatchBas Nieuwenhuizen2016-04-191-15/+2
| | | | | | | | | | | | | | | | | | | | v2: Add more CS_PARTIAL_FLUSH events. Essentially every place with waits on finishing for pixel shaders also has a write after read hazard with compute shaders. Invalidating L2 waits implicitly on pixel and compute shaders, so, we don't need a CS_PARTIAL_FLUSH for switching FBO. v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2. According to Marek the INV_GLOBAL_L2 events don't wait for compute shaders to finish, so wait for them explicitly. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: split setting graphics and compute descriptorsBas Nieuwenhuizen2016-04-191-0/+3
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: split texture decompression for compute shadersBas Nieuwenhuizen2016-04-191-0/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: update predicate condition for compute dispatchesBas Nieuwenhuizen2016-04-191-0/+6
| | | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: implement TGSI compute dispatchBas Nieuwenhuizen2016-04-191-27/+77
| | | | | | | | | | v2: - Use radeon_set_sh_reg_seq. - Set predicate bit for conditional rendering. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: only emit compute shader state when switching shadersBas Nieuwenhuizen2016-04-191-59/+86
| | | | | | | | | | | | v2: - Do check if anything changed earlier - Use emitted_program instead of emitted_bo to prevent shaders with shader->bo = NULL confusing the check - Use radeon_set_sh_reg* Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: rework compute scratch bufferBas Nieuwenhuizen2016-04-191-93/+44
| | | | | | | | | | | | | | | Instead of having a scratch buffer per program, have one per context. Also removed the per kernel wave count calculations, but that only helped if the total number of waves in the dispatch was smaller than sctx->scratch_waves. v2: Fix style issue. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: do per cs setup for compute shaders once per csBas Nieuwenhuizen2016-04-191-32/+45
| | | | | | | | | | | | Also removes PKT3_CONTEXT_CONTROL as that is already being done by si_begin_new_cs, when emitting init_config. v2: - Use radeon_set_sh_reg_seq. - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+ Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: don't pass scratch buffer to user SGPRsBas Nieuwenhuizen2016-04-191-8/+0
| | | | | | | | As far as I can see we use relocations for clover too. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: split input upload off from si_launch_gridBas Nieuwenhuizen2016-04-191-41/+52
| | | | | | | | | | | | | | | | Also uses a dynamically allocated buffer using u_upload_alloc. The old buffer per program approach required serializing all dispatches of the same program. v2: - Clarified commit message. - Use radeon_set_sh_reg_seq. - Also upload input buffer for clover kernels, even when input_size is 0, as it contains grid parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: implement TGSI compute shader creationBas Nieuwenhuizen2016-04-191-18/+58
| | | | | | | v2: Moved scratch_enabled initialization after compile. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: allow dumping shader disassemblies to a fileMarek Olšák2016-03-011-1/+1
| | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* gallium: add a new interface for pipe_context::launch_grid()Samuel Pitoiset2016-02-131-16/+15
| | | | | | | | | | | | | This introduces pipe_grid_info which contains all information to describe a launch_grid call. This will be used to implement indirect compute in the same fashion as indirect draw. Changes from v2: - correctly initialize pipe_grid_info for nv50/nvc0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gallium/radeon: drop support for LLVM 3.5Marek Olšák2016-02-111-57/+1
| | | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> v2: adjust the comment in the amdgpu winsys
* radeonsi: make LLVM IR dumping less messyMarek Olšák2016-02-091-1/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* r600,compute: Plug few memory leaksJan Vesely2016-01-261-3/+0
| | | | | | | | | | | v2: drop inline keyword drop radeon_llvm_dispose_kernel_module wrapper v3: move definitions to .c file use in radeonsi Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* gallium/radeon: rename max_compute_units -> num_good_compute_unitsMarek Olšák2016-01-221-2/+2
| | | | | | radeon sets this correctly, but not amdgpu Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: adjust the parameters of si_shader_dumpMarek Olšák2016-01-071-4/+2
| | | | | | | The function will be extended to dump all binaries shaders will consist of, so si_shader* makes sense here. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move si_shader_dump call out of si_compile_llvmMarek Olšák2016-01-071-0/+3
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: inline si_shader_binary_readMarek Olšák2016-01-071-2/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move si_shader_dump call out of si_shader_binary_readMarek Olšák2016-01-071-3/+5
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't pass si_shader to si_compile_llvmMarek Olšák2016-01-071-1/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move si_shader_binary_upload out of si_compile_llvmMarek Olšák2016-01-071-0/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't pass si_shader to si_shader_binary_readMarek Olšák2016-01-071-1/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't pass si_shader to si_shader_binary_read_configMarek Olšák2016-01-071-2/+3
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add struct si_shader_configMarek Olšák2016-01-071-12/+12
| | | | | | | There will be 1 config per variant, which will be a union of configs from {prolog, main, epilog}. For now, just add the structure. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove unused parameter from si_shader_binary_read_configMarek Olšák2016-01-031-3/+2
| | | | | Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>