summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
* vc4: In a loop break/continue, jump if everyone has taken the path.Eric Anholt2016-12-141-10/+17
| | | | | | | | | | | | | | | | | | This should be a win for most loops, which tend to have uniform control flow. More importantly, it exposes important information to live variables: that the break/continue here means that our jump target may have access to values that were live on our input. Previously, we were just setting the exec mask and letting control flow fall through, so an intervening def between the break and the end of the loop would appear to live variables as if it screened off the variable, when it didn't actually. Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a perturbing of register allocation caused a live variable to get stomped. Cc: 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 8e5ec33f1151dd82402bdfdaa4fff7c284e49a1c)
* vc4: Fix register class handling of DDX/DDY arguments.Eric Anholt2016-11-241-1/+1
| | | | | | | | I had this exactly backwards, but apparently the piglit tests were all landing in r0-r3 anyway. Cc: "13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 977d8b526b983c8d19df00af224033389f8ab7c8)
* vc4: Clamp the shadow comparison value.Eric Anholt2016-11-231-0/+9
| | | | | | | Fixes piglit glsl-fs-shadow2D-clamp-z. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 08d51487e3b8cfb14ca2ece9545b2e2ed344e3cc)
* vc4: Don't abort when a shader compile fails.Eric Anholt2016-11-236-8/+32
| | | | | | | | | | It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 4d019bd703e7c20d56d5b858577607115b4926a3)
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.Eric Anholt2016-11-091-1/+1
| | | | | | | | The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 283d4d18e598793bbff7d9ba5a601bced9b36542)
* vc4: Fix fast clear color packing for 565.Eric Anholt2016-10-161-3/+16
| | | | | Piglit didn't manage to cover this because fbo-clear-formats uses scissors, so we don't get fast clearing.
* vc4: Avoid loading from the texture during non-utile-aligned glTexImage().Eric Anholt2016-10-131-12/+34
| | | | | | | | | | | | | Previously, the plan was "if the width/height we have to load/store isn't the size the user is planning on writing, then we need to load the old contents out beforehand to prevent writing back undefined". However, when we're doing glTexImage() we often end up aligning the width/height into the padding of the texture, and we don't actually need to read out that padding. Improves x11perf -aatrapezoid100 performance from ~460/sec to ~700/sec.
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTSNicolai Hähnle2016-10-121-0/+1
| | | | | | | | This is a screen cap because drivers are expected to support it either for all shader types or for none of them. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Reviewed-by: Dave Airlie <airlied@redhat.com>
* vc4: Don't worry about partial Z/S clear if the other is already cleared.Eric Anholt2016-10-061-3/+7
| | | | | | | | | We have to be careful to not smash the value they're clearing to, but other than that we're fine. Avoids quad clears in Processing, which likes to do glClear(Z|S); glClear(Z). Improves performance of Processing's QuadRendering demo at 5000 quads by 5.46507% +/- 1.35576% (n=15 before, 32 after)
* vc4: Try to fix the HW-2116 workaround.Eric Anholt2016-10-061-9/+10
| | | | | | | | | | | | | | | We were incrementing the count at the end of vc4_start_draw(), except that that function returns immediately if we've already started drawing on this batch. It also failed to count the statechanges from the GFXH-515 workaround. This incidentally allows repeated glClear() to be coalesced, because the fast clears aren't counted in draw_calls_queued any more. Fixes most of the extra flushes in Processing, which emits glClear(Z|S); glClear(Z); glClear(C) during its frame setup. Improves performance of Processing's QuadRendering demo at 5000 quads by 3.33538% +/- 2.05846% (n=21 before, 15 after)
* vc4: Drop dead argument from vc4_start_draw().Eric Anholt2016-10-061-3/+3
|
* vc4: Fix fallback to quad clears of depth in GLX.Eric Anholt2016-10-064-25/+64
| | | | | The fix in the vc4-jobs series ended up triggering the fallback path on GLX apps that use depth but not stencil.
* vc4: Add the format name in miptree_debug.Eric Anholt2016-10-061-2/+4
| | | | | I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format of "0" didn't tell me much.
* vc4: Fix perf debug formatting on partial Z/S clear.Eric Anholt2016-10-061-1/+1
|
* vc4: Drop destination register when it's unused.Eric Anholt2016-10-061-1/+22
| | | | | | | This slightly reduces instructions on shader-db, but I think it's just perturbing register allocation -- the allocator should have always trivially colored these nodes, before. This commit is just to make QIR code failing more intelligible when register allocation fails.
* vc4: Fix live intervals analysis for screening defs in if statements.Eric Anholt2016-10-063-5/+20
| | | | | | | | | If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexcuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix simulator when more than one vc4_screen is opened.Eric Anholt2016-10-063-3/+39
| | | | | | We would assertion fail in setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.Eric Anholt2016-10-061-0/+2
| | | | | Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* nir: Make nir_foo_first/last_cf_node return a block insteadJason Ekstrand2016-10-061-4/+2
| | | | | | | | | | One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* vc4: use the new parent/child pools for transfersNicolai Hähnle2016-10-055-6/+11
| | | | Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* vc4: Emit perf debug when we fall back to quad clears.Eric Anholt2016-09-281-0/+2
|
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
* nir: Report progress from nir_lower_phis_to_scalar.Kenneth Graunke2016-09-141-2/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* nir: Report progress from nir_lower_alu_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: Implement job shufflingEric Anholt2016-09-148-194/+333
| | | | | | | | | | | | | | | Track rendering to each FBO independently and flush rendering only when necessary. This lets us avoid the overhead of storing and loading the frame when an application momentarily switches to rendering to some other texture in order to continue rendering the main scene. Improves glmark -b desktop:effect=shadow:windows=4 by 27% Improves glmark -b desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4 by 17% While I haven't tested other apps, this should help X rendering a lot, and I've heard GLBenchmark needed it too.
* vc4: Handle resolve skipping at job submit time.Eric Anholt2016-09-143-31/+37
| | | | | | This is done in vc4_flush currently, but I'm going to make the job always track the surfaces it might be rendering to instead of putting in the destinations at flush time.
* vc4: Move the render job state into a separate structure.Eric Anholt2016-09-1412-255/+287
| | | | | This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Always unref the current job surfaces at job reset time.Eric Anholt2016-09-143-36/+21
| | | | | Drops some tricky logic in vc4_flush() trying to update the pointers, and fixes a broken lack of unref for MSAA surfaces at context destroy time.
* vc4: Move job-submit skip cases to vc4_job_submit().Eric Anholt2016-09-142-12/+12
| | | | For calling job_submit() directly, I need the skipping here.
* vc4: Move bin CL trailer to job_submit() time.Eric Anholt2016-09-142-11/+14
| | | | | To implement job shuffling, I want to be able to call submit() on specific jobs, turning vc4_flush() into the context's flush-all-jobs hook.
* vc4: Simplify the DISCARD_RANGE handlingEric Anholt2016-09-141-12/+15
| | | | | | | It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the logic out caught two bugs in it: We would try to do so on cubemaps (even though we're only mapping 1 of the 6 slices), and we would break persistent coherent mappings by trying to reallocate when we shouldn't.
* vc4: Fix incorrect clearing of Z/stencil when cleared separately.Eric Anholt2016-09-143-15/+38
| | | | | | | | | | | | | | | | | The clear of Z or stencil will end up clearing the other as well, instead of masking. There's no way around this that I know of, so if we are clearing just one then we need to draw a quad. Fixes a regression in the job-shuffling code, where the clear values move to the job and don't just have the last clear's value laying around when you do glClear(DEPTH) and then glClear(STENCIL) separately (ext_framebuffer_multisample-clear 4 depth)). This causes regressions in ext_framebuffer_multisample/multisample-blit depth and ext_framebuffer_multisample/no-color depth, but these were formerly false positives due to the reference image also being black. Now the reference and test images are both being drawn, and it looks like there's an incorrect resolve of depth during blitting to an MSAA FBO.
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITEMarek Olšák2016-09-081-5/+0
| | | | | | | | not used in any useful way Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gallium: switch drivers to the slab allocator in src/utilMarek Olšák2016-09-063-8/+8
|
* Introduce .editorconfigEric Engestrom2016-08-312-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | A few weeks ago, Jose Fonseca suggested [0] we use .editorconfig files to try and enforce the formatting of the code, to which Michel Dänzer suggested [1] we start by importing the existing .dir-locals.el settings. The first draft was discussed in the RFC [2]. These .editorconfig are a first step, one that has the advantage of requiring little to no intervention from the devs once the settings files are in place, but the settings are very limited. This does have the advantage of applying while the code is being written. This doesn't replace the need for more comprehensive formatting tools such as clang-format & clang-tidy, but those reformat the code after the fact. [0] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121545.html [1] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121639.html [2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123431.html Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
* vc4: Add missing break statement.Eric Anholt2016-08-311-0/+1
| | | | | This opcode isn't used yet, so it didn't affect anything. Caught by Coverity, reported to me by imirkin.
* vc4: Handle discards while in control flow.Eric Anholt2016-08-292-6/+28
| | | | | I missed this while adding loop support because the discard test inside a loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
* vc4: Mark when we add discards while lowering blend state.Eric Anholt2016-08-291-0/+1
|
* gallium: Use enum pipe_shader_type in set_sampler_views()Kai Wasserbäch2016-08-291-2/+3
| | | | | Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Brian Paul <brianp@vmware.com>
* gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)Kai Wasserbäch2016-08-291-1/+1
| | | | | | | | | | | v1 → v2: - Fixed indentation (noted by Brian Paul) - Removed second assert from nouveau's switch statements (suggested by Brian Paul) Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
* vc4: Add support for fddx/fddyEric Anholt2016-08-251-0/+52
| | | | Based vaguely on a patch by jonasarrow on github.
* vc4: Add register allocation support for MUL output rotation.Eric Anholt2016-08-252-0/+14
| | | | | | | We need the source to be in r0-r3, so make a new register class for it. It will be up to the surrounding passes to make sure that the r0-r3 allocation of its source won't conflict with anything other class requirements on that temp.
* vc4: Add support for MUL output rotation.Eric Anholt2016-08-256-0/+51
| | | | Extracted from a patch by jonasarrow on github.
* vc4: Add support for the 2-bit LOAD_IMM variants.Eric Anholt2016-08-256-0/+58
| | | | | Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
* vc4: Add QPU scheduling to handle MUL rotate sources.Eric Anholt2016-08-251-0/+13
| | | | We need MUL rotates to do ddx/ddy support.
* vc4: Add disassembly for constant MUL rotatesEric Anholt2016-08-251-9/+11
|
* vc4: Add real validation for MUL rotation.Eric Anholt2016-08-252-10/+43
| | | | Caught problems in the upcoming DDX/DDY implementation.
* vc4: Add a QIR value for the QPU element register.Eric Anholt2016-08-254-0/+8
| | | | | This will be used in the ddx/ddy support for "Am I the top half?" or "Am I the left half?" checks.
* vc4: Fix GPU hangs with >16 varying values.Eric Anholt2016-08-242-19/+68
| | | | Fixes glsl-routing in piglit and hangs in glbenchmark 2.0.2.
* gallium: add a cap to expose whether driver supports mixed color/zs bitsIlia Mirkin2016-08-231-0/+1
| | | | | | | | | | Some hardware can't render to color/depth buffers of mixed bitness. When that happens a fallback has to happen, but this allows the driver to express that this isn't an optimal scenario. The purpose of this is to remove such fbconfigs from the GLX/EGL config list. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>