| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
This will make it easier to add more operations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
| |
This makes sure that overlap checks are done correctly throughout the
back-end when the '*this' register starts before the register/size
pair provided as argument, and is actually less annoying to use than
in_range() at this point since regions_overlap() takes its size
arguments in bytes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
| |
Signed-of-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.
The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
We will use this for output varyings. To make component
packing simpler we will just treat all varyings as vec4s.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two distinctly different uses of this struct. The first
is to store GL shader objects. The second is to store information
about a shader stage thats been linked.
The two uses actually share few fields and there is clearly confusion
about their use. For example the linked shaders map one to one with
a program so can simply be destroyed along with the program. However
previously we were calling reference counting on the linked shaders.
We were also creating linked shaders with a name even though it
is always 0 and called the driver version of the _mesa_new_shader()
function unnecessarily for GL shader objects.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead use the internal gl_shader_stage enum everywhere. This
makes things more consistent and gets rid of unnecessary
conversions.
Ideally it would be nice to remove the Type field from gl_shader
altogether but currently it is used to differentiate between
gl_shader and gl_shader_program in the ShaderObjects hash table.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For double-precision vertex inputs we need to measure them in dvec4
terms, and for single-precision vertex inputs we need to measure them in
vec4 terms.
For the later case, we use type_size_vec4() function. For the former
case, we had a wrong implementation based on type_size_vec4().
This commit introduces a proper type_size_dvec4() function, that we use
to measure vertex inputs.
Measuring double-precision vertex inputs as dvec4 is required because
ARB_vertex_attrib_64bit states that these uses the same number of
locations than the single-precision version. That is, two consecutives
dvec4 would be located in location "x" and location "x+1", not "x+2".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
v2 (Iago)
- Fixup accessibility in backend_reg
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
| |
According to Timothy, using program_string_id == 0 to identify the
passthrough TCS is going to be problematic for his shader cache work.
So, change it to strcmp() the name at visitor creation time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations. In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Also, move them to brw_shader.cpp so they're in a location for code
used by both the vec4 and fs worlds.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid. Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- We weren't calling it at all for a good
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
| |
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.
One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.
We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.
v2: Update comments (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input. Otherwise,
they're very straightforward.
This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread). I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).
We currently handle all inputs via the pull model.
v2: Improve comments (suggested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
| |
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
| |
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
| |
Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to do this, we have to change the signature of the
backend_reg(brw_reg) constructor to take a reference to a brw_reg in
order to avoid unresolvable ambiguity about which constructor is
actually being called in the other modifications in this patch.
As far as I understand it, the rule in C++ is that if multiple
constructors are available for parent classes, the one closest to you in
the class heirarchy is closen, but if one of them didn't take a
reference, that screws things up.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
| |
I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
| |
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.
Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.
After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
Switching from an implicitly-sized type field to field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.
In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.
Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.
To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.
Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.
v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An untyped surface read is volatile because it might be affected by a
write.
In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r3 = untyped_surface_read(ssbo_float)
r4 = r3 + 1
untyped_surface_write(ssbo_float, r4)
And after CSE, we had:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r4 = r1 + 1
untyped_surface_write(ssbo_float, r4)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves the large pile of setup calculations we have to do for
geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a
couple of nice implications. First, it's less work that the caller of
brw_compile_gs has to do. Second, it's consistent with the vertex and
fragment stages. Finally, it allows us to put brw_gs_compile back behind
the API boundary where it belongs.
v2 (Jason Ekstrand):
- Pull the changes to use nir info into a separate patch
- Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h
so that we can use it for scalar GS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
| |
At this point, the compiler API has been substantially simplified. In the
spirit of Kristian's making a compiler library, this commit makes a single
header file that contains, more-or-less, the entire compiler API.
There's still a bit of cleanup to do particularly in the area of geometry
shaders. However, this gets us much closer to having a separate compiler.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
|
|
|
|
|
|
|
|
| |
We want to use the rest of brw_shader.cpp with the rest of the compiler
without pulling in the GLSL linking code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
|
|
|
|
|
|
|
| |
instructions in affected programs: 44204 -> 43762 (-1.00%)
helped: 221
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transfom feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
It's not used by anything anymore
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|