summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_shader.h
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-141-0/+2
| | | | | | | | All codepaths are handled except for clover. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 72d48fcd8eb5862c72d27e5462c289c5de65396e)
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settingsMarek Olšák2016-10-131-1/+0
| | | | | | | The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: support ARB_compute_variable_group_sizeNicolai Hähnle2016-10-101-1/+3
| | | | | | | | Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch). Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: clean up lucky #include dependenciesMarek Olšák2016-10-041-32/+2
| | | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: export SampleMask from pixel shaders at full rateMarek Olšák2016-09-131-0/+2
| | | | | | | Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add DRAWID parameter to vertex shadersNicolai Hähnle2016-08-091-1/+3
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: pre-generate shader logs for ddebugMarek Olšák2016-07-261-0/+7
| | | | | | | | This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move the shader key dumping to si_shader_dumpMarek Olšák2016-07-261-1/+0
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: report accurate SGPR and VGPR spillsMarek Olšák2016-07-131-0/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: do compilation from si_create_shader_selector asynchronouslyMarek Olšák2016-07-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much as time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: separate the compilation chunk of si_create_shader_selectorMarek Olšák2016-07-051-0/+7
| | | | | | | The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't interpolate colors if flatshading is enabledMarek Olšák2016-07-051-1/+1
| | | | | | use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable the barycentric optimization in all casesMarek Olšák2016-07-051-5/+2
| | | | | | | | Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: compute only one set of interpolation (i,j) when MSAA is disabledMarek Olšák2016-07-051-2/+2
| | | | | | | This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: split ps.prolog.force_persample_interp into persp and linear bitsMarek Olšák2016-07-051-1/+2
| | | | | | This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: enable WQM in PS prolog when neededNicolai Hähnle2016-06-071-0/+1
| | | | | | | | | | | | WQM is needed when the PS prolog computes a VGPR that is consumed by a shader with (implicit or explicit) derivatives. Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be effective (otherwise it's just a no-op). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Cc: 12.0 <mesa-dev@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Remove LDS layout user SGPR's from TES.Bas Nieuwenhuizen2016-05-261-7/+8
| | | | | | | | They are unused. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Store inputs to memory when not using a TCS.Bas Nieuwenhuizen2016-05-261-0/+1
| | | | | | | | | | | | | | | | | We need to copy the VS outputs to memory. I decided to do this using a shader key, as the value depends on other shaders. I also switch the fixed function TCS over to monolithic, as otherwisze many of the user SGPR's need to be passed to the epilog, which increases register pressure, or complexity to avoid that. The main body of the fixed function TCS is not that interesting to precompile anyway, since we do it on demand and it is very small. v2: Use u_bit_scan64. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add user SGPR for the layout of the offchip buffer.Bas Nieuwenhuizen2016-05-261-2/+10
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Use correct parameter index for LS_OUT_LAYOUT.Bas Nieuwenhuizen2016-05-261-3/+4
| | | | | | | | | This happens to be in the right position, but that changes when TCS/TES get new parameters. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add offchip tessellation parameters.Bas Nieuwenhuizen2016-05-261-1/+2
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: decrease GS copy shader user SGPRs to 2Marek Olšák2016-04-221-1/+1
| | | | | | | | const buffers are no longer used since the clip plane const buffer was moved to RW buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move default tess level constant buffer to RW buffersMarek Olšák2016-04-221-0/+6
| | | | | Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Add config parameter to si_shader_apply_scratch_relocs.Bas Nieuwenhuizen2016-04-211-0/+1
| | | | | | | shader->config is not updated for compute kernels. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
* radeonsi: add shared memoryBas Nieuwenhuizen2016-04-191-0/+3
| | | | | | | | | | | | | | | Declares the shared memory as a global variable so that LLVM is aware of it and it does not conflict with passes like AMDGPUPromoteAlloca. v2: - Use ctx->i8. - Dropped null-check for declare_memory_region. - Changed memory region array to single region. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: lower compute shader argumentsBas Nieuwenhuizen2016-04-191-0/+9
| | | | | | Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use enums in si_shader.hMarek Olšák2016-04-181-93/+119
| | | | | Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement set_shader_buffersNicolai Hähnle2016-04-121-56/+58
| | | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
* radeonsi: implement set_shader_images (v2)Nicolai Hähnle2016-03-211-2/+2
| | | | | | | | | Whether DCC is disabled depends on the access flags with which the image is bound: image_load supports DCC, but store and atomic don't. v2: remove an unnecessary masking of images->desc.enabled_mask Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: allow dumping shader disassemblies to a fileMarek Olšák2016-03-011-1/+2
| | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* radeonsi: use re-ZMarek Olšák2016-03-011-0/+1
| | | | | | | | | This can increase perf for shaders that kill pixels (kill, alpha-test, alpha-to-coverage). v2: add comments Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* radeonsi: implement binary shaders & shader cache in memory (v2)Marek Olšák2016-02-211-1/+3
| | | | | | | v2: handle _mesa_hash_table_insert failure other cosmetic changes Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: move some struct si_shader members to new struct si_shader_infoMarek Olšák2016-02-211-9/+12
| | | | | | This will be part of shader binaries. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use smaller types for some si_shader membersMarek Olšák2016-02-211-3/+5
| | | | | | | | in order to decrease the shader size for a shader cache. v2: add & use SI_MAX_VS_OUTPUTS Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: compile non-GS middle parts of shaders immediately if enabledMarek Olšák2016-02-211-0/+11
| | | | | | | | | | | | | | Still disabled. Only prologs & epilogs are compiled in draw calls, but each variant of those is compiled only once per process. VS is always compiled as hw VS. TES is always compiled as hw VS. LS and ES stages are always compiled on demand. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add PS prologMarek Olšák2016-02-211-1/+13
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add PS epilogMarek Olšák2016-02-211-0/+7
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add TCS epilogMarek Olšák2016-02-211-0/+3
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add VS epilogMarek Olšák2016-02-211-0/+4
| | | | | | | | | It only exports the primitive ID. Also used by TES when it's compiled as VS. The VS input location of the primitive ID input is v2. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add VS prologMarek Olšák2016-02-211-0/+9
| | | | | | This is disabled with use_monolithic_shaders = true. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: first bits for non-monolithic shadersMarek Olšák2016-02-211-1/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: add code for combining and uploading shaders from 3 shader partsMarek Olšák2016-02-211-0/+9
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: separate out shader key bits for prologs & epilogsMarek Olšák2016-02-211-22/+55
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: compute how many input VGPRs fragment shaders haveMarek Olšák2016-02-211-0/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: compute how many input SGPRs and VGPRs shaders haveMarek Olšák2016-02-211-0/+2
| | | | | | | Prologs (shader binaries inserted before the API shader binary) need to know this, so that they won't change the input registers unintentionally. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: put image, fmask, and sampler descriptors into one arrayMarek Olšák2016-02-101-4/+4
| | | | | | | | | | | | | | | | | | | The texture slot is expanded to 16 dwords containing 2 descriptors. Those can be: - Image and fmask, or - Image and sampler state By carefully choosing the locations, we can put all three into one slot, with the fmask and sampler state being mutually exclusive. This improves shaders in 2 ways: - 2 user SGPRs are unused, shaders can use them as temporary registers now - each pair of descriptors is always on the same cache line v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask at the same time Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: make LLVM IR dumping less messyMarek Olšák2016-02-091-1/+2
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: remove useless code that handles dx10_clamp_modeMarek Olšák2016-02-091-1/+0
| | | | | | | "enable-no-nans-fp-math" is a wrong string and there was a disagreement about fixing it. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns itMarek Olšák2016-02-091-0/+1
| | | | Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement forcing per-sample_interpolation using the shader key onlyMarek Olšák2016-02-091-31/+19
| | | | | | | | | | | It was partly a state and partly emulated by shader code, but since we want to do this in a fragment shader prolog, we need to put it into the shader key, which will be used to generate the prolog. This also removes the spi_ps_input states and moves the registers to the PS state. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>