| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should be a win for most loops, which tend to have uniform control
flow.
More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input. Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.
Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbing of register allocation caused a live variable to get stomped.
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8e5ec33f1151dd82402bdfdaa4fff7c284e49a1c)
|
|
|
|
|
|
|
|
| |
I had this exactly backwards, but apparently the piglit tests were all
landing in r0-r3 anyway.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 977d8b526b983c8d19df00af224033389f8ab7c8)
|
|
|
|
|
|
|
| |
Fixes piglit glsl-fs-shadow2D-clamp-z.
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 08d51487e3b8cfb14ca2ece9545b2e2ed344e3cc)
|
|
|
|
|
|
|
|
|
|
| |
It's much better to just skip the draw call entirely. Getting this
information out of register allocation will also be useful for
implementing threaded fragment shaders, which will need to retry
non-threaded if RA fails.
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4d019bd703e7c20d56d5b858577607115b4926a3)
|
|
|
|
|
|
|
|
| |
The 1/W was apparently not accurate enough, and we were getting sparklies
in the distance. The closed driver also did a N-R step here.
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 283d4d18e598793bbff7d9ba5a601bced9b36542)
|
|
|
|
|
| |
Piglit didn't manage to cover this because fbo-clear-formats uses
scissors, so we don't get fast clearing.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".
However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.
Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
|
|
|
|
|
|
|
|
| |
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We have to be careful to not smash the value they're clearing to, but
other than that we're fine. Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).
Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch. It also failed to count the statechanges from the GFXH-515
workaround.
This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more. Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.
Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
|
| |
|
|
|
|
|
| |
The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.
|
|
|
|
|
| |
I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.
|
| |
|
|
|
|
|
|
|
| |
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before. This commit is just to make QIR
code failing more intelligible when register allocation fails.
|
|
|
|
|
|
|
|
|
| |
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).
Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
|
|
|
|
|
|
| |
We would assertion fail in setting up the simulator the second time
around. This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
|
|
|
|
|
| |
Fixes 100 piglit tests since the assertions were added to nir.h. What's
amazing is that these tests used to pass, even when casting garbage.
|
|
|
|
|
|
|
|
|
|
| |
One of NIR's invariants is that control flow lists always start and end
with blocks. There's no good reason why we should return a cf_node from
these functions since we know that it's always a block. Making it a block
lets us remove a bunch of code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
|
|
| |
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Track rendering to each FBO independently and flush rendering only when
necessary. This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.
Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
by 17%
While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
|
|
|
|
|
|
| |
This is done in vc4_flush currently, but I'm going to make the job always
track the surfaces it might be rendering to instead of putting in the
destinations at flush time.
|
|
|
|
|
| |
This is a preparation step for having multiple jobs being queued up at the
same time.
|
|
|
|
|
| |
Drops some tricky logic in vc4_flush() trying to update the pointers, and
fixes a broken lack of unref for MSAA surfaces at context destroy time.
|
|
|
|
| |
For calling job_submit() directly, I need the skipping here.
|
|
|
|
|
| |
To implement job shuffling, I want to be able to call submit() on specific
jobs, turning vc4_flush() into the context's flush-all-jobs hook.
|
|
|
|
|
|
|
| |
It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the
logic out caught two bugs in it: We would try to do so on cubemaps (even
though we're only mapping 1 of the 6 slices), and we would break
persistent coherent mappings by trying to reallocate when we shouldn't.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The clear of Z or stencil will end up clearing the other as well, instead
of masking. There's no way around this that I know of, so if we are
clearing just one then we need to draw a quad.
Fixes a regression in the job-shuffling code, where the clear values move
to the job and don't just have the last clear's value laying around when
you do glClear(DEPTH) and then glClear(STENCIL) separately
(ext_framebuffer_multisample-clear 4 depth)).
This causes regressions in ext_framebuffer_multisample/multisample-blit
depth and ext_framebuffer_multisample/no-color depth, but these were
formerly false positives due to the reference image also being black. Now
the reference and test images are both being drawn, and it looks like
there's an incorrect resolve of depth during blitting to an MSAA FBO.
|
|
|
|
|
|
|
|
| |
not used in any useful way
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few weeks ago, Jose Fonseca suggested [0] we use .editorconfig files
to try and enforce the formatting of the code, to which Michel Dänzer
suggested [1] we start by importing the existing .dir-locals.el
settings. The first draft was discussed in the RFC [2].
These .editorconfig are a first step, one that has the advantage of
requiring little to no intervention from the devs once the settings
files are in place, but the settings are very limited. This does have
the advantage of applying while the code is being written.
This doesn't replace the need for more comprehensive formatting tools
such as clang-format & clang-tidy, but those reformat the code after
the fact.
[0] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121545.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121639.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123431.html
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
|
|
|
| |
This opcode isn't used yet, so it didn't affect anything. Caught by
Coverity, reported to me by imirkin.
|
|
|
|
|
| |
I missed this while adding loop support because the discard test inside a
loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
|
| |
|
|
|
|
|
| |
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
| |
v1 → v2:
- Fixed indentation (noted by Brian Paul)
- Removed second assert from nouveau's switch statements (suggested by
Brian Paul)
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
| |
Based vaguely on a patch by jonasarrow on github.
|
|
|
|
|
|
|
| |
We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with anything other class
requirements on that temp.
|
|
|
|
| |
Extracted from a patch by jonasarrow on github.
|
|
|
|
|
| |
Extracted and fixed up from a patch by jonasarrow on github. This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
|
|
|
|
| |
We need MUL rotates to do ddx/ddy support.
|
| |
|
|
|
|
| |
Caught problems in the upcoming DDX/DDY implementation.
|
|
|
|
|
| |
This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.
|
|
|
|
| |
Fixes glsl-routing in piglit and hangs in glbenchmark 2.0.2.
|
|
|
|
|
|
|
|
|
|
| |
Some hardware can't render to color/depth buffers of mixed bitness. When
that happens a fallback has to happen, but this allows the driver to
express that this isn't an optimal scenario. The purpose of this is to
remove such fbconfigs from the GLX/EGL config list.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|