summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_cs.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Fix compute shader crash.Kenneth Graunke2016-11-241-1/+1
| | | | | | | | | Fixes crashes when starting Deus Ex: Mankind Divided. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (cherry picked from commit ca76e6b5213c92432b9f3a641cb26f5861d53e09)
* i965: Eliminate brw->cs.prog_data pointer.Kenneth Graunke2016-10-051-5/+5
| | | | | | | | | | | | Just say no to: - brw->cs.base.prog_data = &brw->cs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_cs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-2/+3
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* intel/i965: make gen_device_info mutableLionel Landwerlin2016-09-231-1/+1
| | | | | | | | | | | | Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change, it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename intelScreen to screen.Kenneth Graunke2016-09-201-3/+3
| | | | | | | | "intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-1/+1
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* util: Move _mesa_fsl/util_last_bit into util/bitscan.hMathias Fröhlich2016-08-091-1/+1
| | | | | | | | | | | As requested with the initial creation of util/bitscan.h now move other bitscan related functions into util. v2: Split into two patches. Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>
* i965: make more effective use of SamplersUsedTimothy Arceri2016-07-051-2/+1
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* i965: Keep track of the per-thread scratch allocation in brw_stage_state.Francisco Jerez2016-06-131-25/+23
| | | | | | | | | | | | | | | | This will be used to find out what per-thread slot size a previously allocated scratch BO was used with in order to fix a hardware race condition without introducing additional stalls or memory allocations. Instead of calling brw_get_scratch_bo() manually from the various codegen functions, call a new helper function that keeps track of the per-thread scratch size and conditionally allocates a larger scratch BO. v2: Handle BO allocation manually instead of relying on brw_get_scratch_bo (Ken). Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Account for poor address calculations in Haswell CS scratch size.Kenneth Graunke2016-06-121-1/+20
| | | | | | | | | | | | | | | | Curro figured this out by investigating the simulator. Apparently there's also a workaround in the Windows driver. I'm not sure it's actually documented anywhere. We were underallocating the scratch buffer by a factor of 128/70. v2: Rename threads_per_subslice to scratch_ids_per_subslice (suggested by Jordan Justen). Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Allocate scratch space for the maximum number of compute threads.Kenneth Graunke2016-06-121-1/+3
| | | | | | | | | | | | | | | | | | | | We were allocating enough space for the number of threads per subslice, when we should have been allocating space for the number of threads in the entire GPU. Even though we currently run with a reduced thread count (due to a bug), we might still overflow the scratch buffer because the address calculation is based on the FFTID, which can depend on exactly which threads, EUs, and threads are executing. We need to allocate enough for every possible thread that could run. Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+. Earlier platforms need additional bug fixes. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Add uniform for a CS thread local base IDJordan Justen2016-06-011-0/+3
| | | | | | | | | | | | v4: * Force thread_local_id_index to -1 for now, and have fs_visitor::setup_cs_payload look at thread_local_id_index. This enables us to more easily cut over from the old local ID layout to the new layout, as suggested by Jason. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Shrink stage_prog_data param array lengthJordan Justen2016-05-291-1/+1
| | | | | | | | | | | | | | It appears we were over-allocating these arrays. Previously we would use nir->num_uniforms directly for scalar programs, and multiply it by 4 for vec4 programs. Instead we should have been dividing by 4 in both cases to convert from bytes to a gl_constant_value count. The size of gl_constant_value is 4 bytes. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* glsl: move to compiler/Emil Velikov2016-01-261-1/+1
| | | | | | Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>
* i965: Move brw_cs_fill_local_id_payload() to libi965_compilerKristian Høgsberg Kristensen2015-12-111-36/+0
| | | | | | This is a helper function for setting up the local invocation ID payload according to the cs_prog_data generated by the compiler. It's intended to be available to users of libi965_compiler so move it there.
* i965: Enable shared local memory for CS shared variablesJordan Justen2015-12-091-0/+13
| | | | | | | | | v3: * Check shared variable size at link time Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-0/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Compile brw_cs_fill_local_id_payload() as C.Matt Turner2015-11-241-0/+36
| | | | | | | | | | It's only called from C, it compiles as C, so just compile it as C. Notice the missing extern "C" on the definition of the function, which would screw things up if the prototype wasn't parsed before the definition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Push down inclusion of brw_program.h.Matt Turner2015-11-241-0/+1
| | | | | | | We were including it in headers, which then caused it to be included in tons of places it wasn't needed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Rename brw_foo_emit to brw_compile_fooJason Ekstrand2015-10-191-3/+3
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/cs: Rework cs_emit to take a nir_shader and a brw_compilerJason Ekstrand2015-10-191-2/+8
| | | | | | | This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Move brw_get_shader_time_index() call out of emit functionsKristian Høgsberg Kristensen2015-10-081-1/+5
| | | | | | | | | | brw_get_shader_time_index() is all tangled up in brw_context state and we can't call it from the compiler. Thanks the Jasons recent refactoring, we can just get the index and pass to the emit functions instead. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965: Move brw_dump_ir() out of brw_*_emit() functionsKristian Høgsberg Kristensen2015-10-081-0/+3
| | | | | | | We move these calls one level up into the codegen functions. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965: Move prog_data uniform setup to the codegen levelJason Ekstrand2015-10-021-1/+4
| | | | | | | | | | As of now, uniform setup is more-or-less unified between vec4 and fs and no longer requires the fs_visitor. This makes uniform setup more of a language/API thing than a backend compiler thing. This commit moves setting up the stage_prog_data.params arrays to the same place as we set up the rest of stage_prog_data. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move binding table setup to codegen time.Jason Ekstrand2015-10-021-0/+20
| | | | | | | | | Setting up binding tables really has little to do with the actual process of turning shaders into instructions; it's more part of setting up prog_data. This commit moves it out of the visitors and with the rest of the prog_data setup stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Pull stage_prog_data.nr_params out of the NIR shaderJason Ekstrand2015-10-021-2/+2
| | | | | | | | | | | | | Previously, we had a bunch of code in each stage to figure out how many slots we needed in stage_prog_data.param. This code was mostly identical across the stages and had been copied and pasted around. Unfortunately, this meant that any time you did something special, you had to add code for it to each of these places. In particular, none of the stages took subroutines into account; they were working entirely by accident. By taking this data from the NIR shader, we know the exact number of entries we need and everything goes a bit smoother. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Get rid of prog_data compare functionsJason Ekstrand2015-09-301-21/+0
| | | | | | They are no longer used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move perf_debug code to brw_codegen_*_prog()Kristian Høgsberg Kristensen2015-09-141-5/+26
| | | | | | | | | | We're trying to avoid a libdrm dependency in the core compiler, so let's move the perf_debug code one level up from the brw_*_emit() helpers to the brw_codegen_*_prog() helpers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965: Move compute shader code aroundKristian Høgsberg Kristensen2015-09-141-0/+194
This moves the compute shader code around in order to make the way the code is split up more consistent. There should be no functional changes. Typically we have a few files per stage: brw_vs.c, brw_wm.c brw_gs.c: code to drive code generation and implement precompiling and cache search. genX_<stage>_state.c gen specific implementation of the state emission for the shader stage. The brw_*_emit() functions are all in the same files as the visitor classes they use (with the exception of VS, which may use either vec4 or fs). To make compute follow this convention, we move the brw_cs_emit() function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and do this in C like the other similar files. Finally, move state setup and atoms to gen7_cs_state.c. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>