path: root/src/gallium/drivers/vc4/vc4_context.h
Commit message | Author | Age | Files | Lines
* vc4: Don't abort when a shader compile fails. | Eric Anholt | 2016-11-23 | 1 | -1/+7
  It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails.

  Cc: <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit 4d019bd703e7c20d56d5b858577607115b4926a3)
* vc4: Fix simulator when more than one vc4_screen is opened. | Eric Anholt | 2016-10-06 | 1 | -0/+1
  We would hit an assertion failure when setting up the simulator the second time around. This at least postpones the assertion failure until we've closed all of the first set of screens and started opening a new set.
* vc4: use the new parent/child pools for transfers | Nicolai Hähnle | 2016-10-05 | 1 | -1/+1
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* vc4: Implement job shuffling | Eric Anholt | 2016-09-14 | 1 | -6/+33
  Track rendering to each FBO independently and flush rendering only when necessary. This lets us avoid the overhead of storing and loading the frame when an application momentarily switches to rendering to some other texture in order to continue rendering the main scene.

  Improves glmark -b desktop:effect=shadow:windows=4 by 27%
  Improves glmark -b desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4 by 17%

  While I haven't tested other apps, this should help X rendering a lot, and I've heard GLBenchmark needed it too.
* vc4: Move the render job state into a separate structure. | Eric Anholt | 2016-09-14 | 1 | -18/+33
  This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Fix incorrect clearing of Z/stencil when cleared separately. | Eric Anholt | 2016-09-14 | 1 | -0/+1
  The clear of Z or stencil will end up clearing the other as well, instead of masking. There's no way around this that I know of, so if we are clearing just one then we need to draw a quad.

  Fixes a regression in the job-shuffling code, where the clear values move to the job and don't just have the last clear's value lying around when you do glClear(DEPTH) and then glClear(STENCIL) separately (ext_framebuffer_multisample-clear 4 depth).

  This causes regressions in ext_framebuffer_multisample/multisample-blit depth and ext_framebuffer_multisample/no-color depth, but these were formerly false positives due to the reference image also being black. Now the reference and test images are both being drawn, and it looks like there's an incorrect resolve of depth during blitting to an MSAA FBO.
* gallium: switch drivers to the slab allocator in src/util | Marek Olšák | 2016-09-06 | 1 | -2/+2
* vc4: Add register allocation support for MUL output rotation. | Eric Anholt | 2016-08-25 | 1 | -0/+1
  We need the source to be in r0-r3, so make a new register class for it. It will be up to the surrounding passes to make sure that the r0-r3 allocation of its source won't conflict with any other class requirements on that temp.
* vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far. | Eric Anholt | 2016-08-04 | 1 | -7/+14
  We don't want to bake the whole array into the FS key, because of the hashing overhead. But we can keep a set of the arrays seen, and use a pointer to the copy in the set as the array's proxy.

  Between this and the previous patch, gl-1.0-blend-func now passes on hardware, where previously it was filling the 256MB CMA area with shaders and OOMing.

  Drops 712 shaders from shader-db.
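  The pointer-as-proxy idea in this commit message can be sketched as follows. This is a minimal illustration, not the driver's code: every name here (`fs_inputs`, `fs_inputs_intern`, `shader_key`) is hypothetical, and a linear search stands in for whatever hash set the driver actually uses. Each distinct input array is stored once, so a shader key only needs to carry and compare a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical, chosen for illustration only. */
#define MAX_INPUTS   16
#define MAX_INTERNED 64

struct fs_inputs {
        uint8_t  slots[MAX_INPUTS];
        unsigned num_inputs;
};

struct fs_inputs_set {
        struct fs_inputs *entries[MAX_INTERNED];
        unsigned count;
};

/* Return the canonical stored copy of "in", adding one if this array has
 * not been seen before. Returns NULL if the set is full or malloc fails. */
static const struct fs_inputs *
fs_inputs_intern(struct fs_inputs_set *set, const struct fs_inputs *in)
{
        assert(in->num_inputs <= MAX_INPUTS);

        for (unsigned i = 0; i < set->count; i++) {
                const struct fs_inputs *e = set->entries[i];
                if (e->num_inputs == in->num_inputs &&
                    memcmp(e->slots, in->slots, in->num_inputs) == 0)
                        return e;
        }

        if (set->count == MAX_INTERNED)
                return NULL;

        struct fs_inputs *copy = calloc(1, sizeof(*copy));
        if (!copy)
                return NULL;
        copy->num_inputs = in->num_inputs;
        memcpy(copy->slots, in->slots, in->num_inputs);
        set->entries[set->count++] = copy;
        return copy;
}

/* The key holds only the interned pointer, so hashing and comparing the
 * key never touches the array contents. */
struct shader_key {
        const struct fs_inputs *inputs;
        bool is_coord;
};

static bool
shader_key_equal(const struct shader_key *a, const struct shader_key *b)
{
        return a->inputs == b->inputs && a->is_coord == b->is_coord;
}
```

  Two keys built from equal arrays compare equal because both point at the same stored copy; the cost of comparing contents is paid once, at interning time.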
* vc4: Disable early Z with computed depth. | Eric Anholt | 2016-07-26 | 1 | -0/+2
  We don't tell the hardware whether we're computing depth, so we need to manage early Z state manually.

  Fixes piglit early-z.
* vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel. | Eric Anholt | 2016-07-15 | 1 | -0/+1
  To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary miptree. However, if a single level is being selected, we can use the existing miptree and force all the sampling to be from that particular level. This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses base levels in the blit implementation in gallium.

  Improves "glmark2 -b terrain" from 2 fps to 3 (perhaps some more precision would be useful?), and cuts its CPU usage during the benchmarking from ~30% to ~10% (total CPU time from 8.8s to 7.6s).
* vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags. | Eric Anholt | 2016-07-15 | 1 | -1/+0
  The compiler uses the per-stage flags already, so it didn't need this. vc4_uniforms was using it, so just replace it with both of the stage flags for now.
* vc4: Remove dead dirty_samplers field. | Eric Anholt | 2016-07-15 | 1 | -1/+0
  We use a big VC4_DIRTY_FRAGTEX/VC4_DIRTY_VERTEX on the stage, instead.
* vc4: Drop the dead export_linkage array. | Eric Anholt | 2016-05-17 | 1 | -5/+0
  This came from deriving from freedreno.
* vc4: Don't flush on read-only access of buffers read by the CL. | Eric Anholt | 2016-04-18 | 1 | -1/+2
  Fixes piglit mixed-immediate-and-vbo, and may significantly improve performance of applications that store a 4-byte IB in the same VBO as vertex data.
* vc4: Add support for drawing in MSAA. | Eric Anholt | 2015-12-08 | 1 | -0/+11
* vc4: Convert blending to being done in 4x8 unorm normally. | Eric Anholt | 2015-10-23 | 1 | -1/+4
  We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though.

  total uniforms in shared programs: 32065 -> 32168 (0.32%)
  uniforms in affected programs: 327 -> 430 (31.50%)
  total instructions in shared programs: 92644 -> 89830 (-3.04%)
  instructions in affected programs: 15580 -> 12766 (-18.06%)

  Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
* vc4: Add a workaround for HW-2116 (state counter wrap fails). | Eric Anholt | 2015-10-23 | 1 | -3/+3
  I haven't proven that this happens (I've got other GPU hangs in the way), but the closed driver also does this and it's documented as an erratum.
* vc4: use nir two-sided-color lowering | Boyan Ding | 2015-10-06 | 1 | -1/+0
  Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering). No piglit regression.

  Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: convert from tgsi semantic/index to varying-slot | Eric Anholt | 2015-09-16 | 1 | -5/+4
  (originally part of previous patch, split out to separate patch by Rob)

  v2: squash in some fixes from Eric
  v3: Another fix from Eric for point coords.

  Signed-off-by: Rob Clark <robclark@freedesktop.org>
* gallium: add flags parameter to pipe_screen::context_create | Marek Olšák | 2015-08-26 | 1 | -1/+1
  This allows creating compute-only and debug contexts.

  Reviewed-by: Brian Paul <brianp@vmware.com>
  Acked-by: Christian König <christian.koenig@amd.com>
  Acked-by: Alex Deucher <alexander.deucher@amd.com>
* vc4: Actually allow math results to allocate into r4. | Eric Anholt | 2015-08-21 | 1 | -0/+1
  I switched us to tracking whether the results *could* go to r4, but then didn't make a separate register class for the class bits that included r4. Switch the "any" class to actually be "any", and name the "any but r4" class more appropriately.

  total instructions in shared programs: 96798 -> 94680 (-2.19%)
  instructions in affected programs: 62736 -> 60618 (-3.38%)
* vc4: Make r4-writes implicitly move to a temp, and allocate temps to r4. | Eric Anholt | 2015-08-04 | 1 | -0/+1
  Previously, SFU values always moved to a temporary, and TLB color reads and texture reads always lived in r4. Instead, we can have these results just be normal temporaries, and the register allocator can leave the values in r4 when they don't interfere with anything else using r4.

  shader-db results:
  total instructions in shared programs: 100809 -> 100040 (-0.76%)
  instructions in affected programs: 42383 -> 41614 (-1.81%)
* vc4: Skip re-emitting the shader_rec if it's unchanged. | Eric Anholt | 2015-07-28 | 1 | -1/+15
  It's a bunch of work for us to emit it (and its uniforms), more work for the kernel to validate it, and additional work for the CLE to read it.

  Improves es2gears framerate by about 50%.

  Signed-off-by: Eric Anholt <eric@anholt.net>
* vc4: Cache the texture p1 for the sampler. | Eric Anholt | 2015-07-14 | 1 | -0/+11
  Cuts another 12% of vc4_uniforms.o, in exchange for computing it at CSO creation time.
* vc4: Cache texture p0/p1 setup for the sampler view. | Eric Anholt | 2015-07-14 | 1 | -0/+12
  In exchange for a bit of space and computation in CSO setup, we cut vc4_uniforms.c (draw time) code size by 4.8%.
* vc4: Move tile state/alloc allocation into the kernel. | Eric Anholt | 2015-06-17 | 1 | -3/+0
  This avoids a security issue where userspace could have written the tile state/tile alloc behind the GPU's back, and will apparently be necessary for fixing stability bugs (tile state buffers are missing some top bits for the tile alloc's address).
* vc4: Move RCL generation into the kernel. | Eric Anholt | 2015-06-17 | 1 | -1/+14
  There weren't that many variations of RCL generation, and this lets us skip all the in-kernel validation for what we generated.
* vc4: Just stream out fallback IB contents. | Eric Anholt | 2015-05-27 | 1 | -0/+2
  The idea I had when I wrote the original shadow code was that you'd see a set_index_buffer to the IB, then a bunch of draws out of it. What's actually happening in openarena is that set_index_buffer occurs at every draw, so we end up making a new shadow BO every time, and converting more of the BO than is actually used in the draw. While I could maybe come up with a better caching scheme, for now just do the simple thing that doesn't result in a new shadow IB allocation per draw.

  Improves performance of isosurf in drawelements mode by 58.7967% +/- 3.86152% (n=8).
* vc4: Hook up VC4_DEBUG=perf to some useful printfs. | Eric Anholt | 2015-04-15 | 1 | -0/+5
* vc4: Move the blit code to a separate file. | Eric Anholt | 2015-04-13 | 1 | -0/+1
  There will be other blit code showing up, and it seems like the place you'd look.
* vc4: Separate out a bit of code for submitting jobs to the kernel. | Eric Anholt | 2015-04-13 | 1 | -0/+3
  I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).
* vc4: Make a new #define for making code conditional on the simulator. | Eric Anholt | 2015-03-24 | 1 | -0/+6
  I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.
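  The pattern described here, a constant macro tested with an ordinary `if` rather than `#ifdef` blocks around the code, might look like the sketch below. The macro names (`USE_SIMULATOR`, `using_simulator`) and the `submit_path` enum are assumptions made for illustration, not names taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical build flag: a simulator build would pass -DUSE_SIMULATOR. */
#ifdef USE_SIMULATOR
#define using_simulator true
#else
#define using_simulator false
#endif

enum submit_path { SUBMIT_KERNEL, SUBMIT_SIMULATOR };

static enum submit_path
choose_submit_path(void)
{
        /* Both branches are parsed and type-checked in every build, so the
         * device-specific path cannot silently rot. Because the condition
         * is a compile-time constant, the compiler is free to drop the
         * branch that can never run. */
        if (using_simulator)
                return SUBMIT_SIMULATOR;

        return SUBMIT_KERNEL;
}
```

  The preprocessor conditional is confined to one definition, and every caller stays plain C that compiles in both configurations.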
* vc4: Fix up statechange management for uncompiled/compiled FS/VS. | Eric Anholt | 2015-01-11 | 1 | -5/+4
  No need to recheck the FS compile when the VS source has changed, but there *is* a need to recheck the VS compile when the compiled VS has changed (since the live inputs may change).

  Fixes es3conform's blend test.
* vc4: Cook up the draw-time VPM setup info during shader compile. | Eric Anholt | 2015-01-10 | 1 | -0/+6
  This will give the compiler the chance to dead-code eliminate unused VPM reads. This is particularly a big deal in the CS where a bunch of vattrs are just not going to be used.
* vc4: Only render tiles where the scissor ever intersected them. | Eric Anholt | 2014-12-30 | 1 | -0/+10
  This gives a 2.7x improvement in x11perf -rect100, since we only end up load/storing the x11perf window, not the whole screen.
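  One way to picture this optimization: accumulate the union of every scissor rectangle drawn into the frame, then convert that pixel rectangle to a range of tiles and render only those. The sketch below is illustrative only; the struct and function names are hypothetical, and the 64x64 tile size is an assumption rather than a value stated in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tile dimensions, for illustration only. */
#define TILE_W 64
#define TILE_H 64

/* Pixel bounds of everything drawn so far; max_x/max_y are exclusive. */
struct draw_bounds {
        uint32_t min_x, min_y, max_x, max_y;
};

/* Inclusive range of tile coordinates that must be rendered. */
struct tile_range {
        uint32_t min_tx, min_ty, max_tx, max_ty;
};

static void
draw_bounds_reset(struct draw_bounds *b)
{
        b->min_x = b->min_y = UINT32_MAX;
        b->max_x = b->max_y = 0;
}

/* Grow the bounds to include the scissor rect [x0, x1) x [y0, y1). */
static void
draw_bounds_add_scissor(struct draw_bounds *b,
                        uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
{
        if (x0 >= x1 || y0 >= y1)
                return; /* empty scissor touches no pixels */

        if (x0 < b->min_x) b->min_x = x0;
        if (y0 < b->min_y) b->min_y = y0;
        if (x1 > b->max_x) b->max_x = x1;
        if (y1 > b->max_y) b->max_y = y1;
}

/* Convert pixel bounds to tiles. Returns false if nothing was drawn. */
static bool
draw_bounds_to_tiles(const struct draw_bounds *b, struct tile_range *t)
{
        if (b->min_x >= b->max_x || b->min_y >= b->max_y)
                return false;

        assert(b->max_x > 0 && b->max_y > 0);
        t->min_tx = b->min_x / TILE_W;
        t->min_ty = b->min_y / TILE_H;
        t->max_tx = (b->max_x - 1) / TILE_W;
        t->max_ty = (b->max_y - 1) / TILE_H;
        return true;
}
```

  A small window on a large screen then touches only a handful of tiles, which is consistent with the load/store saving the message reports for x11perf.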
* vc4: Fix leak of the compiled shader programs in the cache. | Eric Anholt | 2014-12-14 | 1 | -0/+1
* vc4: Switch to using the util/ hash table. | Eric Anholt | 2014-12-14 | 1 | -1/+1
  No performance difference on a microbenchmark with norast that should hit it enough to have mattered, n=220.
* vc4: Update for new kernel ABI with async execution and waits. | Eric Anholt | 2014-11-20 | 1 | -0/+3
  Our submits now return immediately and you have to manually wait for things to complete if you want to (like a normal driver).
* vc4: Add support for ARL and indirect register access on TGSI_FILE_CONSTANT. | Eric Anholt | 2014-10-28 | 1 | -0/+23
  Fixes 14 ARB_vp tests (which had no lowering done), and should improve performance of indirect uniform array access in GLSL.
* vc4: Refactor flushing before mapping a BO. | Eric Anholt | 2014-10-24 | 1 | -1/+1
  I'm going to want to make some other decisions here before flushing.
* vc4: Add debug output to match shaderdb info to program dumps. | Eric Anholt | 2014-10-24 | 1 | -0/+5
  I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but when debugging regressions, I want to match shaderdb output to shader disassembly.
* vc4: Add support for user clip plane and gl_ClipVertex. | Eric Anholt | 2014-10-15 | 1 | -0/+2
  Fixes about 15 piglit tests about interpolation and clipping.
* vc4: Match VS outputs to FS inputs. | Eric Anholt | 2014-10-13 | 1 | -0/+10
  If the VS doesn't output a value that the FS needs, we still need to read the right contents for the remaining FS inputs, by emitting padding. And if the VS outputs something the FS doesn't need, we shouldn't put it in the VPM at all (so the code producing it can get DCEed).

  Fixes 77 piglit tests.
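  The matching rule in this message can be illustrated with a small linking routine: walk the FS inputs in order, find the VS output that feeds each one, and mark a padding slot when there is none. VS outputs that no FS input references simply never appear in the result. The function name, the use of small integers as slot identifiers, and the `-1` padding marker are all assumptions made for this sketch.

```c
#include <stdint.h>

/* For each FS input, record the index of the matching VS output in
 * src_index[], or -1 where padding must be emitted because the VS does
 * not write that slot. Returns the number of padded inputs.
 *
 * Slot identifiers are plain integers here, purely for illustration. */
static unsigned
link_vs_outputs_to_fs_inputs(const uint8_t *vs_out, unsigned num_vs_out,
                             const uint8_t *fs_in, unsigned num_fs_in,
                             int *src_index)
{
        unsigned padded = 0;

        for (unsigned i = 0; i < num_fs_in; i++) {
                src_index[i] = -1;

                for (unsigned j = 0; j < num_vs_out; j++) {
                        if (vs_out[j] == fs_in[i]) {
                                src_index[i] = (int)j;
                                break;
                        }
                }

                if (src_index[i] < 0)
                        padded++;
        }

        return padded;
}
```

  Because the output is ordered by FS input, the remaining inputs keep their expected positions even when one of them had to be padded.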
* vc4: Don't look up the compiled shaders unless state has changed. | Eric Anholt | 2014-10-10 | 1 | -0/+3
  Improves simulated norast performance on a little benchmark by 38.0965% +/- 3.27534% (n=11).
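  A common way to get this effect is a dirty-bit mask: each state setter raises a flag, and the draw path repeats the shader lookup only when a flag that feeds the shader key is set. The sketch below shows that general pattern under assumed names (`DIRTY_*`, `struct context`, `update_compiled_fs`); it does not reproduce the driver's actual flags or lookup code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty bits; each state setter would OR one of these in. */
enum {
        DIRTY_FS_SOURCE = 1u << 0,
        DIRTY_BLEND     = 1u << 1,
        DIRTY_VIEWPORT  = 1u << 2,
};

/* Only these bits feed the (hypothetical) fragment shader key. */
#define FS_KEY_DIRTY_MASK (DIRTY_FS_SOURCE | DIRTY_BLEND)

struct context {
        uint32_t dirty;
        unsigned fs_lookups; /* counts how often the lookup actually ran */
};

/* Redo the shader lookup only if state feeding the key has changed.
 * Returns true if a lookup was performed. */
static bool
update_compiled_fs(struct context *ctx)
{
        if (!(ctx->dirty & FS_KEY_DIRTY_MASK))
                return false;

        /* Stand-in for building the key and searching the shader cache. */
        ctx->fs_lookups++;
        return true;
}

static void
draw(struct context *ctx)
{
        update_compiled_fs(ctx);
        /* ... emit the draw call here ... */
        ctx->dirty = 0; /* all state has been consumed by this draw */
}
```

  Back-to-back draws with unchanged state, or with changes only to unrelated state such as the viewport in this sketch, skip the lookup entirely.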
* vc4: Split the coordinate shader to its own vc4_compiled_shader. | Eric Anholt | 2014-10-09 | 1 | -6/+3
  Merging VS and CS into the same struct wasn't winning us anything except for not allocating a separate BO (but if we want to pack programs into BOs, we should pack not just those 2 programs together). What it was getting us was a bunch of code duplication about hash table lookups and propagating vc4_compile contents into a vc4_compiled_shader. I was about to make the situation worse with indirect uniform buffer access.
* vc4: Add support for two-sided color. | Eric Anholt | 2014-10-08 | 1 | -1/+6
  It's fairly easy, thanks to Rob Clark's lowering code. Fixes two-sided-lighting and 4 vertex-program-two-side testcases, while regressing 8 testcases that involve enabling two-sided color while only initializing one of the two colors in the VS. If you're enabling two sided color, it's of course expected that you really do set up both colors, so this is still an improvement (and when we set up a linker for TGSI, we'll hopefully fix those 8 fails).
* vc4: Add the necessary stubs for occlusion queries. | Eric Anholt | 2014-09-29 | 1 | -1/+2
  We have to expose them for GL 2.0, but we just always return a value of 0. We should be advertising 0 query bits instead of 64, but gallium doesn't have plumbing for that yet. At least this stops the segfaults.
* vc4: Actually add support for polygon offset. | Eric Anholt | 2014-09-24 | 1 | -0/+11
  Setting the bit without setting the offset values is kind of useless.

  Fixes piglit polygon-offset (but not polygon-mode-offset).
* vc4: Add support for flat shading. | Eric Anholt | 2014-09-23 | 1 | -0/+5
  This is just the GL 1.1 flat shading of colors -- we don't need to support TGSI constant interpolation bits, because we don't do GLSL 1.30.

  Fixes 7 piglit tests.