summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_shader.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: Refactor emission of atomic counter operationsIan Romanick2016-10-041-0/+3
| | | | | | | This will make it easier to add more operations. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().Francisco Jerez2016-09-141-1/+0
| | | | | | | | | | This makes sure that overlap checks are done correctly throughout the back-end when the '*this' register starts before the register/size pair provided as argument, and is actually less annoying to use than in_range() at this point since regions_overlap() takes its size arguments in bytes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/ir: Drop backend_instruction::regs_written field.Francisco Jerez2016-09-141-1/+0
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes.Francisco Jerez2016-09-141-0/+1
| | | | | | | | | | | | | | The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'. For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/ir: Remove backend_reg::reg_offset.Francisco Jerez2016-09-141-13/+2
| | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.Francisco Jerez2016-09-141-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units. This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'. Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-5/+5
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Move the type_size function declartaions to brw_nir.hJason Ekstrand2016-08-291-6/+0
| | | | | | Signed-of-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Allocate space in the binding table for non-coherent FB fetch.Francisco Jerez2016-08-251-1/+1
| | | | | | | | | | | | | | Unfortunately due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use. The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: bring back type_size_vec4_times_4()Timothy Arceri2016-07-211-0/+1
| | | | | | | | We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* glsl/mesa: split gl_shader in twoTimothy Arceri2016-06-301-2/+1
| | | | | | | | | | | | | | | | | There are two distinctly different uses of this struct. The first is to store GL shader objects. The second is to store information about a shader stage thats been linked. The two uses actually share few fields and there is clearly confusion about their use. For example the linked shaders map one to one with a program so can simply be destroyed along with the program. However previously we were calling reference counting on the linked shaders. We were also creating linked shaders with a name even though it is always 0 and called the driver version of the _mesa_new_shader() function unnecessarily for GL shader objects. Acked-by: Iago Toral Quiroga <itoral@igalia.com>
* mesa/glsl: stop using GL shader type internallyTimothy Arceri2016-06-161-1/+2
| | | | | | | | | | | | Instead use the internal gl_shader_stage enum everywhere. This makes things more consistent and gets rid of unnecessary conversions. Ideally it would be nice to remove the Type field from gl_shader altogether but currently it is used to differentiate between gl_shader and gl_shader_program in the ShaderObjects hash table. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: remove type_size_vec4_times_4()Timothy Arceri2016-06-151-1/+0
| | | | | | | | type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383 however since 3810c1561 we can just use type_size_scalar() and get the actual number of outputs we need. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: fix double-precision vertex inputs measurementJuan A. Suarez Romero2016-05-241-0/+1
| | | | | | | | | | | | | | | | | | | For double-precision vertex inputs we need to measure them in dvec4 terms, and for single-precision vertex inputs we need to measure them in vec4 terms. For the later case, we use type_size_vec4() function. For the former case, we had a wrong implementation based on type_size_vec4(). This commit introduces a proper type_size_dvec4() function, that we use to measure vertex inputs. Measuring double-precision vertex inputs as dvec4 is required because ARB_vertex_attrib_64bit states that these uses the same number of locations than the single-precision version. That is, two consecutives dvec4 would be located in location "x" and location "x+1", not "x+2". Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: take care of doubles when lowering VS inputsJuan A. Suarez Romero2016-05-171-0/+1
| | | | | | | Input attributes can require 2 vec4 or 1 vec4 depending on whether they are double-precision or not. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: add brw_imm_dfConnor Abbott2016-05-101-0/+1
| | | | | | | | | | v2 (Iago) - Fixup accessibility in backend_reg Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Rework passthrough TCS checks.Kenneth Graunke2016-05-051-0/+1
| | | | | | | | | | According to Timothy, using program_string_id == 0 to identify the passthrough TCS is going to be problematic for his shader cache work. So, change it to strcmp() the name at visitor creation time. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Pass devinfo pointer to is_3src() helpers.Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | | | This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Pass devinfo pointer to brw_instruction_name().Francisco Jerez2016-05-031-1/+2
| | | | | | | | | | | | A future series will implement support for an instruction that happens to have the same opcode number as another instruction we support already on a disjoint set of hardware generations. In order to disambiguate which instruction it is brw_instruction_name() will need some way to find out which device we are generating code for. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make a few tessellation related functions non-static.Kenneth Graunke2016-04-251-0/+4
| | | | | | | | Also, move them to brw_shader.cpp so they're in a location for code used by both the vec4 and fs worlds. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
* i965: Remove useless IR self-destruct backend_shader method.Francisco Jerez2016-03-131-1/+0
| | | | | | | | | | | From the point it's constructed the CFG contains the only existing copy of the program IR, and it never becomes invalid. Calling backend_shader::invalidate_cfg would have destroyed the program structure irrecoverably -- We weren't calling it at all for a good reason. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move brw_compiler_create() to new brw_compiler.c.Matt Turner2016-02-011-3/+0
| | | | | | | A future patch will want to use designated initalizers, which aren't available in C++, but this is C. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Add tessellation control shaders.Kenneth Graunke2015-12-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | The TCS is the first tessellation shader stage, and the most complicated. It has access to each of the control points in the input patch, and computes a new output patch. There is one logical invocation per output control point; all invocations run in parallel, and can communicate by reading and writing output variables. One of the main responsibilities of the TCS is to write the special gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which control how much new geometry the hardware tessellation engine will produce. Otherwise, it simply writes outputs that are passed along to the TES. We run in SIMD4x2 mode, handling two logical invocations per EU thread. The hardware doesn't properly manage the dispatch mask for us; it always initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block to handle an odd number of invocations, essentially falling back to SIMD4x1 on the last thread. v2: Update comments (requested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Add tessellation evaluation shadersKenneth Graunke2015-12-221-0/+3
| | | | | | | | | | | | | | | | | | | The TES is essentially a post-tessellator VS, which has access to the entire TCS output patch, and a special gl_TessCoord input. Otherwise, they're very straightforward. This patch implements SIMD8 tessellation evaluation shaders for Gen8+. The tessellator can generate a lot of geometry, so operating in SIMD8 mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only 2 vertices per thread). I have another patch which implements SIMD4x2 mode for older hardware (or via an environment variable override). We currently handle all inputs via the pull model. v2: Improve comments (suggested by Jordan Justen). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-5/+2
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Move brw_new_shader and brw_link_shader prototypes from brw_wm.h.Matt Turner2015-11-241-0/+3
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Prevent implicit upcasts to brw_reg.Matt Turner2015-11-241-1/+34
| | | | | | | Now that backend_reg inherits from brw_reg, we have to be careful to avoid the object slicing problem. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Use implicit backend_reg copy-constructor.Matt Turner2015-11-241-1/+1
| | | | | | | | | | | | | | In order to do this, we have to change the signature of the backend_reg(brw_reg) constructor to take a reference to a brw_reg in order to avoid unresolvable ambiguity about which constructor is actually being called in the other modifications in this patch. As far as I understand it, the rule in C++ is that if multiple constructors are available for parent classes, the one closest to you in the class heirarchy is closen, but if one of them didn't take a reference, that screws things up. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Add and use backend_reg::equals().Matt Turner2015-11-241-0/+2
| | | | Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Convert scalar_* flags to a scalar_stage array.Kenneth Graunke2015-11-161-2/+0
| | | | | | | | | I was going to add scalar_tcs and scalar_tes flags, and then thought better of it and decided to convert this to an array. Simpler. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Combine register file field.Matt Turner2015-11-131-13/+0
| | | | | | | | The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-131-2/+3
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename GRF to VGRF.Matt Turner2015-11-131-2/+2
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move BAD_FILE from the beginning of enum register_file.Matt Turner2015-11-131-1/+1
| | | | | | | | | I'm going to begin using brw_reg's file field in backend_reg and its derivatives, and in order to keep the hardware value for ARF as 0, we have to do something different. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-131-9/+0
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Remove fixed_hw_reg field from backend_reg.Matt Turner2015-11-131-2/+3
| | | | | | | | Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Delete type field from backend_reg.Matt Turner2015-11-131-1/+0
| | | | | | | | | Switching from an implicitly-sized type field to field with an explicit bit width is safe because we have fewer than 2^4 types, and gcc will warn if you attempt to set a value that will not fit. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Delete abs/negate fields from backend_reg.Matt Turner2015-11-131-3/+0
| | | | | | | | Instead use the ones provided by brw_reg. Also allows us to handle HW_REGs in the negate() functions. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make backend_reg inherit from brw_reg.Matt Turner2015-11-131-3/+3
| | | | | | | | Some fields (file, type, abs, negate) in brw_reg are shadowed by backend_reg. Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix scalar VS float[] and vec2[] output arrays.Kenneth Graunke2015-11-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scalar VS backend has never handled float[] and vec2[] outputs correctly (my original code was broken). Outputs need to be padded out to vec4 slots. In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However, this is wrong: type_size_scalar() for a float[2] would return 2, or for vec2[2] it would return 4. This looked like a single slot, even though in reality each array element would be stored in separate vec4 slots. Because of this bug, outputs[] and output_components[] would not get initialized for the second element's VARYING_SLOT, which meant emit_urb_writes() would skip writing them. Nothing used those values, and dead code elimination threw a party. To fix this, we introduce a new type_size_vec4_times_4() function which pads array elements correctly, but still counts in scalar components, generating correct indices in store_output intrinsics. Normally, varying packing avoids this problem by turning varyings into vec4s. So this doesn't actually fix any Piglit or dEQP tests today. However, if varying packing is disabled, things would be broken. Tessellation shaders can't use varying packing, so this fixes various tcs-input Piglit tests on a branch of mine. v2: Shorten the implementation of type_size_4x to a single line (caught by Connor Abbott), and rename it to type_size_vec4_times_4() (renaming suggested by Jason Ekstrand). Use type_size_vec4 rather than using type_size_vec4_times_4 and then dividing by 4. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/fs: Disable CSE optimization for untyped & typed surface readsJordan Justen2015-10-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | An untyped surface read is volatile because it might be affected by a write. In the ES31-CTS.compute_shader.resources-max test, two back to back read/modify/writes of an SSBO variable looked something like this: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r3 = untyped_surface_read(ssbo_float) r4 = r3 + 1 untyped_surface_write(ssbo_float, r4) And after CSE, we had: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r4 = r1 + 1 untyped_surface_write(ssbo_float, r4) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/gs: Do prog_data setup and other calculations in brw_compile_gsJason Ekstrand2015-10-211-0/+12
| | | | | | | | | | | | | | | | This commit moves the large pile of setup calculations we have to do for geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a couple of nice implications. First, it's less work that the caller of brw_compile_gs has to do. Second, it's consistent with the vertex and fragment stages. Finally, it allows us to put brw_gs_compile back behind the API boundary where it belongs. v2 (Jason Ekstrand): - Pull the changes to use nir info into a separate patch - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h so that we can use it for scalar GS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move the entire compiler API into a single fileJason Ekstrand2015-10-191-58/+0
| | | | | | | | | | | At this point, the compiler API has been substantially simplified. In the spirit of Kristian's making a compiler library, this commit makes a single header file that contains, more-or-less, the entire compiler API. There's still a bit of cleanup to do particularly in the area of geometry shaders. However, this gets us much closer to having a separate compiler. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use a const nir_shader in backend_shaderJason Ekstrand2015-10-191-2/+2
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Move brw_link_shader() and friends to new file brw_link.cppKristian Høgsberg Kristensen2015-10-081-0/+2
| | | | | | | | We want to use the rest of brw_shader.cpp with the rest of the compiler without pulling in the GLSL linking code. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
* i965: Generalize predicated break pass for use in vec4 backend.Matt Turner2015-10-051-1/+5
| | | | | | | instructions in affected programs: 44204 -> 43762 (-1.00%) helped: 221 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/shader: Get rid of the shader, prog, and shader_prog fieldsJason Ekstrand2015-10-021-7/+2
| | | | | | | | | | Unfortunately, we can't get rid of them entirely. The FS backend still needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still needs gl_shader_program for handling transfom feedback. However, the VS needs neither and we can substantially reduce the amount they are used. One day we will be free from their tyranny. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/backend_shader: Add a field to store the NIR shaderJason Ekstrand2015-10-021-0/+1
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/shader: Pull assign_common_binding_table_offsets out of backend_shaderJason Ekstrand2015-10-021-2/+9
| | | | | | | This really has nothing to do with the backend compiler and we'd like to eventually be able to set this up earlier in the compile process. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/shader: Get rid of the setup_vec4_uniform_value helperJason Ekstrand2015-10-021-4/+0
| | | | | | It's not used by anything anymore Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>