path: root/src/gallium/drivers/vc4/vc4_state.c
Commit message (Author, Age, Files, Lines)
* vc4: Implement job shuffling (Eric Anholt, 2016-09-14, 1 file, -35/+1)
| Track rendering to each FBO independently and flush rendering only when necessary. This lets us avoid the overhead of storing and loading the frame when an application momentarily switches to rendering to some other texture in order to continue rendering the main scene.
|
| Improves glmark -b desktop:effect=shadow:windows=4 by 27%
| Improves glmark -b desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4 by 17%
|
| While I haven't tested other apps, this should help X rendering a lot, and I've heard GLBenchmark needed it too.
* vc4: Move the render job state into a separate structure. (Eric Anholt, 2016-09-14, 1 file, -12/+13)
| This is a preparation step for having multiple jobs being queued up at the same time.
* gallium: Use enum pipe_shader_type in set_sampler_views() (Kai Wasserbäch, 2016-08-29, 1 file, -2/+3)
| Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
| Reviewed-by: Brian Paul <brianp@vmware.com>
* gallium: Use enum pipe_shader_type in bind_sampler_states() (v2) (Kai Wasserbäch, 2016-08-29, 1 file, -1/+1)
| v1 → v2:
| - Fixed indentation (noted by Brian Paul)
| - Removed second assert from nouveau's switch statements (suggested by Brian Paul)
|
| Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
| Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
| Reviewed-by: Brian Paul <brianp@vmware.com>
* vc4: Zero-initialize the hardware sampler view structure. (Eric Anholt, 2016-07-31, 1 file, -1/+1)
| Fixes failure to initialize the force_first_level flag, causing failures in piglit levelclamp.
* vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel. (Eric Anholt, 2016-07-15, 1 file, -2/+7)
| To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary miptree. However, if a single level is being selected, we can use the existing miptree and force all the sampling to be from that particular level. This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses base levels in the blit implementation in gallium.
|
| Improves "glmark2 -b terrain" from 2 fps to 3 (perhaps some more precision would be useful?), and cuts its CPU usage during the benchmarking from ~30% to ~10% (total CPU time from 8.8s to 7.6s).
* vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags. (Eric Anholt, 2016-07-15, 1 file, -4/+0)
| The compiler uses the per-stage flags already, so it didn't need this. vc4_uniforms was using it, so just replace it with both of the stage flags for now.
* vc4: Remove dead dirty_samplers field. (Eric Anholt, 2016-07-15, 1 file, -4/+0)
| We use a big VC4_DIRTY_FRAGTEX/VC4_DIRTY_VERTEX on the stage, instead.
* gallium: make constant_buffer const (Rob Clark, 2016-06-20, 1 file, -1/+1)
| Signed-off-by: Rob Clark <robclark@freedesktop.org>
| Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* vc4: Don't consider nr_samples==1 surfaces to be MSAA. (Eric Anholt, 2015-12-15, 1 file, -2/+2)
| This is apparently a weirdness of gallium -- nr_samples==1 is occasionally used and means the same thing as nr_samples==0.
|
| Fixes a bunch of ARB_framebuffer_srgb blit cases in piglit.
* vc4: Add support for drawing in MSAA. (Eric Anholt, 2015-12-08, 1 file, -0/+19)
|
* vc4: Add support for loading sample mask. (Eric Anholt, 2015-12-04, 1 file, -1/+1)
|
* vc4: Avoid loading undefined (newly-allocated) FBO contents. (Eric Anholt, 2015-11-09, 1 file, -0/+17)
| Since X has undefined contents in new pixmaps, it will allocate new textures for an FBO and draw to them without an explicit clear. For VC4, it's much faster to emit a clear than the load of the actual undefined memory contents, so just do that instead.
* vc4: Return NULL when we can't make our shadow for a sampler view. (Eric Anholt, 2015-11-09, 1 file, -0/+4)
| I'm not sure what the caller does is appropriate (just have a NULL sampler at this slot), but it fixes the immediate crash.
|
| Cc: "11.0" <mesa-stable@lists.freedesktop.org>
* vc4: Allow user index buffers, to avoid slow readback for shadow IBs. (Eric Anholt, 2015-10-29, 1 file, -1/+1)
| Improves low-settings openarena performance by 31.9975% +/- 0.659931% (n=7).
* vc4: Convert blending to being done in 4x8 unorm normally. (Eric Anholt, 2015-10-23, 1 file, -1/+3)
| We can't do this all the time, because you want blending to be done in linear space, and sRGB would lose too much precision being done in 4x8. The win on instructions is pretty huge when you can, though.
|
| total uniforms in shared programs: 32065 -> 32168 (0.32%)
| uniforms in affected programs: 327 -> 430 (31.50%)
| total instructions in shared programs: 92644 -> 89830 (-3.04%)
| instructions in affected programs: 15580 -> 12766 (-18.06%)
|
| Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
* vc4: Cache the texture p1 for the sampler. (Eric Anholt, 2015-07-14, 1 file, -1/+54)
| Cuts another 12% of vc4_uniforms.o, in exchange for computing it at CSO creation time.
* vc4: Cache texture p0/p1 setup for the sampler view. (Eric Anholt, 2015-07-14, 1 file, -11/+24)
| In exchange for a bit of space and computation in CSO setup, we cut vc4_uniform.c (draw time) code size by 4.8%.
* vc4: Fix some -Wdouble-promotion warnings. (Eric Anholt, 2015-07-14, 1 file, -1/+1)
| No code generation changes from this, but it'll be useful to have this next time I go checking -Wdouble-promotion.
* vc4: Just stream out fallback IB contents. (Eric Anholt, 2015-05-27, 1 file, -18/+2)
| The idea I had when I wrote the original shadow code was that you'd see a set_index_buffer to the IB, then a bunch of draws out of it. What's actually happening in openarena is that set_index_buffer occurs at every draw, so we end up making a new shadow BO every time, and converting more of the BO than is actually used in the draw.
|
| While I could maybe come up with a better caching scheme, for now just do the simple thing that doesn't result in a new shadow IB allocation per draw.
|
| Improves performance of isosurf in drawelements mode by 58.7967% +/- 3.86152% (n=8).
* vc4: Don't forget to make our raster shadow textures non-raster. (Eric Anholt, 2015-05-27, 1 file, -0/+3)
| Not sure what happened in my testing that made the previous shadow code fix glxgears swapbuffering, but this also fixes lots of CopyArea in X (like dragging xlogo around in metacity).
* vc4: Update the shadow texture for public textures on every draw. (Eric Anholt, 2015-04-15, 1 file, -6/+1)
| We don't know who else has written to it, so we'd better update it every time. This makes the gears spin in X again.
* vc4: When asked to sample from a raster texture, make a shadow tiled copy. (Eric Anholt, 2015-04-13, 1 file, -2/+9)
| So, it turns out my simulator doesn't *quite* match the hardware. And the errata about raster textures tells you most of what's wrong, but there's still stuff wrong after that.
|
| Instead, if we're asked to sample from raster, we'll just blit it to a tiled temporary. Raster textures should only be screen scanout, and word is that it's faster to copy to tiled using the tiling engine first than to texture from an entire raster texture, anyway.
* vc4: Add support for enabling early Z discards. (Eric Anholt, 2014-12-16, 1 file, -0/+18)
| This is the same basic logic from the original Broadcom driver.
* vc4: Don't throw out the index offset in the shadow index buffer path. (Eric Anholt, 2014-12-11, 1 file, -2/+1)
| When we upload shadow indices at draw time, we need the source offset. Fixes the piglit draw-elements test.
* vc4: Fix stencil writemask handling. (Eric Anholt, 2014-10-21, 1 file, -2/+2)
| If the writemask doesn't compress, then we want to put in the uncompressed writemask, not the compressed writemask failure value (all-on).
|
| Fixes glean's stencil2 and fbo-clear-formats on stencil.
* vc4: Don't look at back stencil state unless two-sided stencil is enabled. (Eric Anholt, 2014-10-21, 1 file, -2/+6)
| Fixes regressions in the next bugfix, because gallium util stuff leaves the back stencil state as 0 if !back->enabled.
* vc4: Translate 4-byte index buffers to 2 bytes. (Eric Anholt, 2014-10-19, 1 file, -4/+21)
| Fixes assertion failures in 14 piglit tests (half of which now pass).
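The index translation described in the commit above can be illustrated with a minimal sketch. This is not the driver's code: the function name `translate_indices_32_to_16` is hypothetical, and the sketch assumes every source index already fits in 16 bits, which the caller would have to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow `count` 4-byte indices from `src` into
 * 2-byte indices in `dst`. Assumes each index fits in 16 bits; the
 * assert documents that assumption rather than handling overflow. */
static void translate_indices_32_to_16(const uint32_t *src, uint16_t *dst,
                                       size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(src[i] <= UINT16_MAX);
        dst[i] = (uint16_t)src[i];
    }
}
```

A caller would allocate a destination buffer of `count * sizeof(uint16_t)` bytes and draw from that copy instead of the original 4-byte buffer.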
* vc4: Add support for rebasing texture levels so firstlevel == 0. (Eric Anholt, 2014-10-19, 1 file, -1/+25)
| GLES2 doesn't have GL_TEXTURE_BASE_LEVEL, so the hardware doesn't.
|
| Fixes piglit levelclamp, tex-miplevel-selection, and texture-storage/2D mipmap rendering.
* vc4: Add support for user clip plane and gl_ClipVertex. (Eric Anholt, 2014-10-15, 1 file, -1/+3)
| Fixes about 15 piglit tests about interpolation and clipping.
* vc4: Fix render target NPOT alignment at small miplevels. (Eric Anholt, 2014-10-14, 1 file, -3/+12)
| The texturing hardware takes the POT level 0 width/height and minifies those. This is different from what we were doing, for example, for 273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.
|
| Fixes piglit-depthstencil-render-miplevels 273.
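The arithmetic in the commit above can be checked with a small sketch. The helper names (`next_pot`, `minify`, and the two `level_size_*` functions) are hypothetical and are not taken from the driver; they only reproduce the two orderings the commit message compares.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two. Assumes 0 < v <= 2^31. */
static uint32_t next_pot(uint32_t v)
{
    assert(v != 0 && v <= 0x80000000u);
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Shrink a dimension by `level` mip steps, never going below 1. */
static uint32_t minify(uint32_t v, unsigned level)
{
    assert(level < 32);
    uint32_t m = v >> level;
    return m ? m : 1;
}

/* Ordering the commit describes as the old behavior:
 * minify first, then align to a power of two. */
static uint32_t level_size_minify_then_pot(uint32_t base, unsigned level)
{
    return next_pot(minify(base, level));
}

/* Ordering the commit attributes to the texturing hardware:
 * align level 0 to a power of two, then minify. */
static uint32_t level_size_pot_then_minify(uint32_t base, unsigned level)
{
    return minify(next_pot(base), level);
}
```

For a 273-wide texture at level 5, the first function gives 8 (273 >> 5 is 8, already a power of two) and the second gives 16 (273 rounds up to 512, and 512 >> 5 is 16), matching the numbers quoted in the commit message.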
* vc4: Mostly fix offset calculation for NPOT mipmap levels. (Eric Anholt, 2014-10-09, 1 file, -1/+12)
| The non-base NPOT levels are stored as POT-aligned images. We get that POT alignment by minifying the POT-aligned base level. This means that level strides are also POT aligned, so we have to tell the rendering mode config that our resource is larger than the actual requested area.
|
| Fixes the fbo-generatemipmap-formats NPOT cases.
|
| Regresses depthstencil-render-miplevels 273 * -- the texture presentation now works (where it was completely broken before), it looks like there's some overflow of image bounds happening at the lower miplevels.
* vc4: Add support for point size setting. (Eric Anholt, 2014-09-24, 1 file, -2/+4)
| This is the support for both the global and per-vertex modes.
* vc4: Actually add support for polygon offset. (Eric Anholt, 2014-09-24, 1 file, -1/+11)
| Setting the bit without setting the offset values is kind of useless. Fixes piglit polygon-offset (but not polygon-mode-offset).
* vc4: Add support for flat shading. (Eric Anholt, 2014-09-23, 1 file, -0/+7)
| This is just the GL 1.1 flat shading of colors -- we don't need to support TGSI constant interpolation bits, because we don't do GLSL 1.30.
|
| Fixes 7 piglit tests.
* vc4: Add support for stencil operations. (Eric Anholt, 2014-09-18, 1 file, -0/+71)
| While depth test state is passed through the fragment shader as sideband data, the stencil test state has to be set by the fragment shader itself.
|
| Many tests are still failing, but this gets most of hiz/ passing.
* vc4: Include stdio/stdlib in headers so I don't have to include it per file. (Eric Anholt, 2014-08-22, 1 file, -2/+0)
| There are a few tools I want to have always available, and fprintf() and abort() are among them.
* vc4: Flip which primitives are considered front-facing. (Eric Anholt, 2014-08-11, 1 file, -1/+1)
| This mostly fixes glxgears rendering.
* vc4: Add support for depth clears and tests within a tile. (Eric Anholt, 2014-08-11, 1 file, -3/+20)
| This doesn't load/store the Z contents across submits yet. It also disables early Z, since it's going to require tracking of Z functions across multiple state updates to track the early Z direction and whether it can be used.
|
| v2: Move the key setup to before the search for the key.
* vc4: Switch to actually generating vertex and fragment shader code from TGSI. (Eric Anholt, 2014-08-08, 1 file, -2/+1)
| This introduces an IR (QIR, for QPU IR) to do optimization on. It's a scalar, SSA IR in general. It looks like optimization is pretty easy this way, though I haven't figured out if it's going to be good for our weird register allocation or not (or if I want to reduce to basically QPU instructions first), and I've got some problems with it having some multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably want to break down.
|
| Of course, this commit mostly doesn't work, since many other things are still hardwired, like the VBO data.
|
| v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR instructions into temporary values, and make qir_inst4 take the 4 args separately instead of an array (all later callers wanted individual args).
* vc4: Initial skeleton driver import. (Eric Anholt, 2014-08-08, 1 file, -0/+444)
  This mostly just takes every draw call and turns it into a sequence of commands that clear the FBO and draw a single shaded triangle to it, regardless of the actual input vertices or shaders. I copied the initial driver skeleton mostly from freedreno, and I've preserved Rob Clark's copyright for those. I also based my initial hardcoded shaders and command lists on Scott Mansell (phire)'s "hackdriver" project, though the bit patterns of the shaders emitted end up being different.

  v2: Rebase on gallium megadrivers changes.
  v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
  v4: Rely on simpenrose actually being installed when building for simulation.
  v5: Add more header duplicate-include guards.
  v6: Apply Emil's review (protection against vc4 sim and ilo at the same time, and dropping the dricommon drm bits) and fix a copyright header (thanks, Roland)