path: root/src/gallium/drivers/vc4/vc4_program.c
Commit message | Author | Age | Files | Lines
* vc4: In a loop break/continue, jump if everyone has taken the path. | Eric Anholt | 2016-12-14 | 1 | -10/+17
| This should be a win for most loops, which tend to have uniform control flow.
|
| More importantly, it exposes important information to live variables: that the break/continue here means that our jump target may have access to values that were live on our input. Previously, we were just setting the exec mask and letting control flow fall through, so an intervening def between the break and the end of the loop would appear to live variables as if it screened off the variable, when it actually didn't.
|
| Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a perturbation of register allocation caused a live variable to get stomped.
|
| Cc: 13.0 <mesa-stable@lists.freedesktop.org>
| (cherry picked from commit 8e5ec33f1151dd82402bdfdaa4fff7c284e49a1c)
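As a rough illustration of the idea in this commit message, the following C sketch models a per-channel execution mask and decides whether a break can become a real jump. The mask width, the type names, and `emit_break` are hypothetical and are not taken from vc4_program.c; the sketch only shows the "jump if everyone has taken the path" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per SIMD channel; a set bit means the
 * channel is still executing the loop body. */
typedef uint16_t exec_mask_t;

struct break_result {
        exec_mask_t exec; /* channels still executing after the break */
        bool jump;        /* true if no channel is left, so we can branch */
};

/* Apply a break to the channels selected by break_cond.  Only channels
 * that are currently executing can take the break. */
static struct break_result
emit_break(exec_mask_t exec, exec_mask_t break_cond)
{
        struct break_result r;

        /* Channels taking the break stop executing the rest of the body. */
        r.exec = (exec_mask_t)(exec & ~break_cond);

        /* If every active channel took the break, jump to the loop exit
         * instead of falling through with an empty exec mask. */
        r.jump = (r.exec == 0);

        return r;
}
```

With a real jump in the control-flow graph, an analysis such as live variables can see that the loop exit is reachable from the break point, which is the benefit the message describes.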
* vc4: Clamp the shadow comparison value. | Eric Anholt | 2016-11-23 | 1 | -0/+9
| | | | | | | Fixes piglit glsl-fs-shadow2D-clamp-z. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 08d51487e3b8cfb14ca2ece9545b2e2ed344e3cc)
* vc4: Don't abort when a shader compile fails. | Eric Anholt | 2016-11-23 | 1 | -4/+14
| | | | | | | | | | It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 4d019bd703e7c20d56d5b858577607115b4926a3)
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain. | Eric Anholt | 2016-11-09 | 1 | -1/+1
| | | | | | | | The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 283d4d18e598793bbff7d9ba5a601bced9b36542)
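For readers unfamiliar with the technique, a Newton-Raphson step for a reciprocal refines an estimate x0 of 1/w as x1 = x0 * (2 - w * x0). The C sketch below shows that single step; `recip_refine` is a hypothetical name and this is not the driver's QIR code.

```c
#include <assert.h>

/* One Newton-Raphson step for f(x) = 1/x - w.
 * Given an approximation of 1/w, return a more accurate one. */
static float
recip_refine(float w, float approx)
{
        return approx * (2.0f - w * approx);
}
```

For example, refining the estimate 0.24 of 1/4 gives roughly 0.2496, so the error shrinks from 0.01 to about 0.0004 in one step.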
* vc4: Fix live intervals analysis for screening defs in if statements. | Eric Anholt | 2016-10-06 | 1 | -1/+4
| | | | | | | | | If a conditional assignment is only conditioned on the exec mask, that's still screening off the value in the executed channels (and, since we're not storing to the unexecuted channels, we don't care what's in there). Fixes a bunch of extra register pressure on Processing's Ribbons demo, which is failing to allocate.
* vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU. | Eric Anholt | 2016-10-06 | 1 | -0/+2
| | | | | Fixes 100 piglit tests since the assertions were added to nir.h. What's amazing is that these tests used to pass, even when casting garbage.
* nir: Make nir_foo_first/last_cf_node return a block instead | Jason Ekstrand | 2016-10-06 | 1 | -4/+2
| | | | | | | | | | One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs. | Eric Anholt | 2016-09-22 | 1 | -1/+1
| VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract.
|
| This pass seems like it was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks.
|
| For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does look like this may be useful to replace freedreno code.
|
| Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4.
|
| vc4 shader-db:
| total instructions in shared programs: 101282 -> 99543 (-1.72%)
| instructions in affected programs: 17365 -> 15626 (-10.01%)
| total uniforms in shared programs: 31295 -> 31172 (-0.39%)
| uniforms in affected programs: 3580 -> 3457 (-3.44%)
| total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
| estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
|
| v2: Update shader-db output.
|
| Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
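The transformation discussed in this message can be pictured at the source level as follows. This is only a sketch of what flattening a short IF means, written in C with made-up function names and arithmetic; it is not NIR and not the pass itself. Both sides of the branch are computed unconditionally and a select (the role bcsel plays in NIR) picks the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Branching form: only one side is evaluated, at the cost of a branch. */
static float
with_branch(bool cond, float x)
{
        if (cond)
                return x * 0.5f;
        else
                return x + 1.0f;
}

/* Flattened form: both sides are evaluated, then one is selected.
 * This is a win when the THEN/ELSE blocks contain only a few cheap
 * ALU operations and branching itself costs cycles. */
static float
flattened(bool cond, float x)
{
        float then_val = x * 0.5f;
        float else_val = x + 1.0f;

        return cond ? then_val : else_val; /* plays the role of a bcsel */
}
```

The two forms return the same value for every input; the trade-off is extra ALU work against the removed branch overhead.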
* nir: Report progress from nir_lower_phis_to_scalar. | Kenneth Graunke | 2016-09-14 | 1 | -2/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* nir: Report progress from nir_lower_alu_to_scalar. | Kenneth Graunke | 2016-09-14 | 1 | -1/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: Move the render job state into a separate structure. | Eric Anholt | 2016-09-14 | 1 | -1/+2
| | | | | This is a preparation step for having multiple jobs being queued up at the same time.
* vc4: Handle discards while in control flow. | Eric Anholt | 2016-08-29 | 1 | -6/+27
| | | | | I missed this while adding loop support because the discard test inside a loop was crashing before, anyway. Fixes piglit glsl-fs-discard-04.
* vc4: Add support for fddx/fddy | Eric Anholt | 2016-08-25 | 1 | -0/+52
| | | | Based vaguely on a patch by jonasarrow on github.
* vc4: Tell state_tracker that we would prefer NIR. | Eric Anholt | 2016-08-22 | 1 | -7/+25
| Before this series, the code generation path was:
|
| GLSL IR -> TGSI -> NIR -> NIR clone -> QIR -> QPU
|
| Now it's (generally)
|
| GLSL IR -> NIR -> NIR clone -> QIR -> QPU
* vc4: Use proper type sizes for uniforms. | Eric Anholt | 2016-08-22 | 1 | -4/+5
|
* vc4: Add VARYING_SLOT_PNTC support. | Eric Anholt | 2016-08-22 | 1 | -4/+5
| | | | We end up with this when doing GLSL-to-NIR.
* nir: Define system values for vc4's blending-lowering arguments. | Eric Anholt | 2016-08-22 | 1 | -25/+31
| | | | | | | | | | | | | In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I was calling the "state uniforms" that I was putting into the NIR fighting with its other lowering passes. Instead of using magic uniform base numbers in the backend, follow the lead of load_user_clip_plane and just define system values for them. v2: Fix unintended change to channel_num, drop unspecified const_index value on blend_const_color_r_float. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* vc4: Switch store_output to using nir_lower_io_to_scalar / component. | Eric Anholt | 2016-08-19 | 1 | -5/+13
|
* vc4: Use the intrinsic's first_component for vattr VPM index. | Eric Anholt | 2016-08-19 | 1 | -5/+1
| | | | Avoids another multiplication by 4 of the base in the NIR.
* vc4: Convert to using nir_lower_io_scalar for FS inputs. | Eric Anholt | 2016-08-19 | 1 | -3/+17
| | | | | The scalarizing of FS inputs can be done in a non-driver-dependent manner, so extract it out of the driver.
* vc4: Switch to using the intrinsic accessors. | Eric Anholt | 2016-08-19 | 1 | -9/+10
| | | | | The const_index[] values have always felt magic, and this documents them a bit better.
* ttn: Make FRAG_RESULT_DEPTH be a float variable to match gtn and ptn. | Eric Anholt | 2016-08-19 | 1 | -1/+1
| | | | | | | This lets TTN-using drivers handle FRAG_RESULT_DEPTH the same between all their source paths. Reviewed-by: Rob Clark <robdclark@gmail.com>
* vc4: Dump the TGSI before trying to convert it to NIR. | Eric Anholt | 2016-08-19 | 1 | -4/+3
| | | | In the case of debugging a crash in TTN, this is nice to have.
* vc4: Move scalarizing and some lowering to link time. | Eric Anholt | 2016-08-04 | 1 | -5/+12
| | | | | | | This works out to be a wash in terms of memory usage: We use more memory to store the separate ALU instructions, but we optimize out a lot of code as well. The main result, though, is that we do more of our work at link time rather than draw time.
* vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far. | Eric Anholt | 2016-08-04 | 1 | -12/+66
| | | | | | | | | | We don't want to bake the whole array into the FS key, because of the hashing overhead. But we can keep a set of the arrays seen, and use a pointer to the copy in the set as the array's proxy. Between this and the previous patch, gl-1.0-blend-func now passes on hardware, where previously it was filling the 256MB CMA area with shaders and OOMing. Drops 712 shaders from shader-db.
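The "pointer as proxy" idea can be sketched as interning: identical arrays map to one stored copy, so comparing or hashing the key only needs the pointer. The C sketch below is an assumption-laden illustration, not the driver's code. The struct names, the fixed capacity, and the linear search are all hypothetical simplifications; the commit message only says that a set is kept.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 16   /* hypothetical upper bound on inputs per shader */
#define MAX_ENTRIES 64 /* hypothetical capacity of the set */

/* Hypothetical list of fragment shader input slots. */
struct input_list {
        uint32_t count;
        uint8_t slots[MAX_SLOTS];
};

/* Set of every distinct input list seen so far. */
struct input_set {
        struct input_list *entries[MAX_ENTRIES];
        size_t n;
};

/* Return the canonical copy of key, adding it to the set if needed.
 * Equal lists always yield the same pointer, so the pointer can stand
 * in for the whole array inside a shader key. */
static const struct input_list *
input_set_intern(struct input_set *set, const struct input_list *key)
{
        assert(key->count <= MAX_SLOTS);

        /* Linear search keeps the sketch short; a hash set would scale
         * better. */
        for (size_t i = 0; i < set->n; i++) {
                const struct input_list *e = set->entries[i];

                if (e->count == key->count &&
                    memcmp(e->slots, key->slots, key->count) == 0)
                        return e;
        }

        assert(set->n < MAX_ENTRIES);
        struct input_list *copy = malloc(sizeof(*copy));
        assert(copy != NULL);
        *copy = *key;
        set->entries[set->n++] = copy;

        return copy;
}
```

Two draws whose fragment shaders consume the same inputs then produce pointer-equal keys, so the vertex shader variant can be reused instead of recompiled.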
* vc4: Don't recompile the CS when the FS changes. | Eric Anholt | 2016-08-04 | 1 | -0/+2
| | | | | | | The compiled_fs_id is a proxy for the vc4->prog.fs->input_slots[], but only the VS dereferences it. Drops 754 shaders from shader-db.
* vc4: Move FS inputs setup out to a helper function. | Eric Anholt | 2016-08-04 | 1 | -34/+41
| | | | It's a pretty big block, and I was about to make it bigger.
* vc4: Avoid generating a custom shader per level in glGenerateMipmaps(). | Eric Anholt | 2016-08-03 | 1 | -6/+4
| We were baking in the LOD of the source level to each shader. Instead, pass it in as a uniform -- this requires storing it to a temp register, but that's better than compiling a ton of separate shaders:
|
| total instructions in shared programs: 115032 -> 115036 (0.00%)
| instructions in affected programs: 96 -> 100 (4.17%)
| LOST: 572
* vc4: Dump NIR at shader state creation time as well. | Eric Anholt | 2016-08-03 | 1 | -0/+8
| | | | I keep wanting to see this version of the NIR.
* vc4: Disable early Z with computed depth. | Eric Anholt | 2016-07-26 | 1 | -0/+5
| | | | | We don't tell the hardware whether we're computing depth, so we need to manage early Z state manually. Fixes piglit early-z.
* vc4: Speed up glGenerateMipmaps by avoiding shadow baselevel. | Eric Anholt | 2016-07-15 | 1 | -0/+11
| | | | | | | | | | | | | To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary miptree. However, if a single level is being selected, we can use the existing miptree and force all the sampling to be from that particular level. This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses base levels in the blit implementation in gallium. Improves "glmark2 -b terrain" from 2 fps to 3 (perhaps some more precision would be useful?), and cuts its CPU usage during the benchmarking from ~30% to ~10% (total CPU time from 8.8s to 7.6s).
* vc4: Drop VC4_DIRTY_TEXSTATE in favor of the per-stage flags. | Eric Anholt | 2016-07-15 | 1 | -2/+0
| | | | | | The compiler uses the per-stage flags already, so it didn't need this. vc4_uniforms was using it, so just replace it with both of the stage flags for now.
* vc4: Emit resets of the uniform stream at the starts of blocks. | Eric Anholt | 2016-07-13 | 1 | -0/+1
| | | | | | | | If a block might be entered from multiple locations, then the uniform stream will (probably) be at different points, and we need to make sure that it's pointing where we expect it to be. The kernel also enforces that any block reading a uniform resets uniforms, to prevent reading outside of the uniform stream by using looping.
* vc4: Add support for NIR loops and break/continue. | Eric Anholt | 2016-07-12 | 1 | -3/+77
|
* vc4: Add support for emitting NIR IF nodes. | Eric Anholt | 2016-07-12 | 1 | -1/+91
|
* vc4: Add support for storing to NIR registers in a non-SSA fashion. | Eric Anholt | 2016-07-12 | 1 | -85/+132
| | | | | | | Previously, there were occasionally NIR registers in our programs, but they were always actually used SSA-only. Now that we're trying to support control flow, we need to actually conditionally move to registers based on whether channels are active or not.
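A conditional move based on active channels can be modelled in plain C as a masked store: only channels whose bit is set in the execution mask receive the new value, and the others keep what they had. The channel count and the name `masked_store` are hypothetical; this is a model of the behaviour, not QIR.

```c
#include <assert.h>
#include <stdint.h>

#define CHANNELS 16 /* hypothetical SIMD width for this model */

/* Write val into reg only for channels that are active in exec.
 * Inactive channels keep their previous register contents. */
static void
masked_store(float reg[CHANNELS], const float val[CHANNELS], uint16_t exec)
{
        for (int ch = 0; ch < CHANNELS; ch++) {
                if (exec & (1u << ch))
                        reg[ch] = val[ch];
        }
}
```

This is why such registers cannot be treated as SSA values once control flow exists: a store inside an IF or loop only partially overwrites the register.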
* vc4: Add a flag in the screen to track control flow support. | Eric Anholt | 2016-07-12 | 1 | -0/+1
| | | | | For now it's still always false, but I need it in place for kernel backwards compat support as I extend the backend for control flow.
* vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places. | Eric Anholt | 2016-07-12 | 1 | -1/+1
| | | | | | | | We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later)
* vc4: Also enable phi elimination. | Eric Anholt | 2016-07-12 | 1 | -0/+1
| | | | | | | | This avoids a bunch of code gen regressions when enabling loops in vc4. Prior to that, the GLSL that would have generated these optimizable phi nodes was being lowered to csels between either (undef, a) or (a, a), and those were being dealt with by nir_opt_undef and nir_opt_algebraic.
* vc4: Enable dead CF elimination. | Eric Anholt | 2016-07-04 | 1 | -0/+1
| | | | | | Now that we're about to start generating control flow in our NIR, we want this in place. It optimizes things frequently in the CS, when the GL VS has control flow that doesn't affect the vertex position.
* vc4: Add support for vertex color clamping in the rasterizer. | Eric Anholt | 2016-05-17 | 1 | -0/+4
| | | | | This gets us precompile of vertex shaders at the state tracker level as well.
* vc4: Move tgsi_to_nir to precompile time. | Eric Anholt | 2016-05-17 | 1 | -12/+15
| | | | | Now we have an immutable nir shader in our shader's CSO that we can clone and lower/optimize.
* vc4: Enable sharing shaders across contexts. | Eric Anholt | 2016-05-17 | 1 | -1/+2
| | | | | | This allows the same pipe_shader_state to be referenced from multiple contexts. Since our pipe_shader_state is treated as immutable (other than the variant number) within the driver, this is no problem.
* vc4: Switch to using nir_load_front_face. | Eric Anholt | 2016-05-17 | 1 | -4/+9
| | | | | | | | This will be generated by glsl_to_nir, and it turns out that this is a more code-efficient path than the floating point math, anyway. No change on shader-db, but drops an instruction in piglit's glsl-fs-frontfacing.
* vc4: fixup for new nir_foreach_block() | Connor Abbott | 2016-05-05 | 1 | -11/+4
| | | | Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: Use NIR lowering for sRGB decode. | Eric Anholt | 2016-05-02 | 1 | -35/+3
| | | | | This should get us the same decode code generated, but with a lot less custom code in the driver.
* vc4: Just use NIR lowering for texture projection. | Eric Anholt | 2016-05-02 | 1 | -15/+3
| | | | | This means doing Newton-Raphson on the RCP, but it's probably actually a good thing to be accurate on.
* vc4: Scalarize phi nodes as well. | Eric Anholt | 2016-05-02 | 1 | -0/+1
| | | | | This makes fewer programs with loops assertion fail, replacing them with the rendering failure warning.
* vc4: Add whitespace after each program stage dump. | Eric Anholt | 2016-05-02 | 1 | -0/+2
| | | | | In particular it's been hard to find the point where we switch from dumping pre-optimization QIR to post-optimization QIR.
* vc4: Use the NIR cubemap normalization instead of our own. | Eric Anholt | 2016-05-02 | 1 | -6/+1
| | | | | | | This is one of two uses of the current QIR CSE pass according to shader-db. The NIR pass means that we'll end up doing Newton-Raphson on our RCP, which we weren't doing before, but that's probably actually a good thing.