| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're already checking that branch instructions are within the
contents of the shader and the proper PROG_END sequence is present.
The other thing we need in the presence of branching is to verify that
the shader doesn't overflow past the end of the uniforms stream.
To do that, we require that at the start of any basic block reading
uniforms have the following instructions:
load_imm temp, <offset within uniform stream>
add unif_addr, temp, unif
The instructions are generated by userspace, and the kernel verifies
that the load_imm is of the expected offset, and that the add adds it
to a uniform. We track which uniform in the stream that is, and at
draw call time fix up the uniform stream to have the address of the
start of the shader's uniforms for that draw call.
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
| |
This isn't used yet, it's just a first step toward loop validation.
During the main parsing of instructions, we need to know when we hit a new
basic block so that we can reset validated state.
|
|
|
|
|
| |
This reduces how much we need to pass around as arguments, which was
becoming more of a problem with looping validation.
|
| |
|
|
|
|
|
| |
This only happens when live variables are set up, which is not in the
normal dump, but is set up when we've failed to register allocate.
|
|
|
|
|
| |
Right now our CFG is always a trivial single basic block, but that will
change when enable loops.
|
|
|
|
|
|
|
|
| |
Basically we just treat each block independently. The only inter-block
scheduling I can think of that would be be interesting would be to move
texture result collection to after a short loop/if block that doesn't do
texturing. However, the kernel disallows that as part of its security
validation.
|
|
|
|
|
|
|
|
|
| |
We still decide which uniform to lower based on how many
instructions-that-need-lowering use that uniform, but now we emit a new
temporary uniform load in each of the basic blocks containing an
instruction being lowered.
This commit is best reviewed with diff -b.
|
|
|
|
|
|
| |
We need to apply the peephole pass to each of the blocks in the program.
We don't do dataflow analysis for SF across blocks, but we also don't
generate code that would need us to do so.
|
|
|
|
|
|
|
| |
The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
|
|
|
|
|
|
|
|
| |
We have the prior list_foreach() all over the code, but I need to move
where instructions live as part of adding support for control flow. Start
by just converting to a helper iterator macro. (The simpler
"qir_for_each_inst()" will be used for the for-each-inst-in-a-block
iterator macro later)
|
|
|
|
|
|
|
|
| |
This avoids a bunch of code gen regressions when enabling loops in vc4.
Prior to that, the GLSL that would have generated these optimizable phi
nodes was being lowered to csels between either (undef, a) or (a, a), and
those were being dealt with by nir_opt_undef and nir_opt_algebraic.
|
|
|
|
|
|
|
|
| |
The allocation has succeeded by that point, so it needs to be freed.
CovID: 1358929
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
| |
We're passed in a freshly dup()ed fd on screen create, so we should close
it on exit. Debugged by Hugh Cole-Baker.
|
|
|
|
|
|
| |
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did. This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
|
|
|
|
|
|
| |
Now that we're about to start generating control flow in our NIR, we want
this in place. It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
|
|
|
|
|
|
|
|
|
|
|
| |
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.
total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs: 1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
|
|
|
|
|
|
|
|
|
| |
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.
No change on shader-db.
|
|
|
|
|
|
|
|
| |
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects. Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
|
|
|
|
|
|
|
| |
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm. Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
|
|
|
|
|
|
|
| |
Noticed by code inspection. This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes. MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
|
|
|
|
|
| |
This isn't used since we switched to using the dst.pack field instead of
custom instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D3D9 has a different behaviour for depth bias.
For OGL/D3D1X, the depth bias unit is the
minimal resolvable value for the depth buffer,
which depends on the format (and has different
behaviour for float depth buffers).
For D3D9, the depth bias unit is 1.0f.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by
v2:
* proper commit message and non-joke title;
* replace two 'as is' followed by 'is' to 'as-is'.
v3:
* 'a integer' => 'an integer' and similar (originally spotted by
Jason Ekstrand, I fixed a few other similar ones while at it)
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
|
|
|
| |
This says how many window rectangles are supported by the
implementation, although it may not exceed PIPE_MAX_WINDOW_RECTANGLES.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The expected stride calculation is completely wrong. It should
ultimately be multiplying cpp and width rather than dividing. The width
also needs to be aligned to the tiling width first before converting to
stride bytes.
The whole stride check here is possibly pointless. Any buffers which
were allocated outside of vc4 may have strides with larger alignment
requirements.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:
external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;
HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
|
| |
Introduced in 8e2d0843c02daf5280184f179ae8ed440ac90d7f.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
| |
Now that vc4 automated code documentation can be generated with
doxygen, fix the warnings issued by Doxygen 1.8.11.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Push offset down to drivers when importing dmabuf. This is needed
to more fully support EGL_EXT_image_dma_buf_import when a non-zero
offset is specified.
Tesing has been done for freedreno, and compile tested following
gallium drivers:
nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware supports primitive restart on patch primitives, and other
hardware does not. Modern GL and ES include a query for this feature;
adding a capability bit will allow us to answer it.
As far as I know, AMD hardware does not support this feature, while
NVIDIA and Intel hardware does. However, most Gallium drivers do not
appear to support tessellation shaders yet. So, I've enabled it for
nvc0 and disabled it everywhere else.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
| |
We don't really support reading/writing of 3D textures since the hardware
doesn't do 3D, but we do need to make sure that a pipe_transfer for them
has enough space to store the image. This was previously not a problem
because the state tracker only mapped a slice at a time until
fb9fe352ea41c7e3633ba2c483c59b73c529845b. Fixes glean glsl1 tests, which
all have setup of a 3D texture at the start.
|
|
|
|
|
| |
This gets us precompile of vertex shaders at the state tracker level as
well.
|
|
|
|
|
| |
Now we have an immutable nir shader in our shader's CSO that we can clone
and lower/optimize.
|
|
|
|
|
|
|
| |
We always clamp fragment colors, since they're always 8-bit unorm, so
there's no need to have us compile separate shaders based on
GL_ARB_color_buffer_float. This gives us precompilation of fragment
programs to the vc4_shader_state_create() level.
|
|
|
|
|
|
| |
This allows the same pipe_shader_state to be referenced from multiple
contexts. Since our pipe_shader_state is treated as immutable (other than
the variant number) within the driver, this is no problem.
|
|
|
|
|
|
|
|
| |
This will be generated by glsl_to_nir, and it turns out that this is a
more code-efficient path than the floating point math, anyway.
No change on shader-db, but drops an instruction in piglit's
glsl-fs-frontfacing.
|
|
|
|
| |
This came from deriving from freedreno.
|
|
|
|
|
| |
This is apparently enabled as an error in Android builds, and the compiler
can't tell that the return value is safe.
|
|
|
|
|
|
|
|
|
| |
This lets us safely enable or disable the extension as needed
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
|
| |
This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
|
|
|
|
|
|
| |
Seeing the expansion of a QPU_GET_FIELD in an assert isn't very
informative, and it's hard find what's going wrong without getting a dump
of the instruction that failed.
|
|
|
|
|
| |
This has caught a couple of bugs during loop development so far, and I
should probably have written it long ago.
|
|
|
|
| |
Found by the upcoming QIR validate pass.
|
|
|
|
| |
In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
|
|
|
|
| |
Prevents a bug in the later control-flow support series.
|
|
|
|
|
|
|
|
| |
We should have already emitted a NOP due to the last instruction being a
TLB or VPM write. However, if you disable dead code elimination then you
might get dead code at the end, and that dead code might have the signal
bits set to something non-default, at which point you die in assertion
failure.
|
|
|
|
| |
Reviewed-by: Eric Anholt <eric@anholt.net>
|