summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/vc4
Commit message (Collapse)AuthorAgeFilesLines
...
* vc4: Add kernel support for branching in shader validation.Eric Anholt2016-07-123-17/+280
| | | | | | | | | | | | | | | | | | | | | We're already checking that branch instructions are within the contents of the shader and the proper PROG_END sequence is present. The other thing we need in the presence of branching is to verify that the shader doesn't overflow past the end of the uniforms stream. To do that, we require that at the start of any basic block reading uniforms have the following instructions: load_imm temp, <offset within uniform stream> add unif_addr, temp, unif The instructions are generated by userspace, and the kernel verifies that the load_imm is of the expected offset, and that the add adds it to a uniform. We track which uniform in the stream that is, and at draw call time fix up the uniform stream to have the address of the start of the shader's uniforms for that draw call. Signed-off-by: Eric Anholt <eric@anholt.net>
* vc4: Add a bitmap of branch targets in kernel validation.Eric Anholt2016-07-123-2/+133
| | | | | | This isn't used yet, it's just a first step toward loop validation. During the main parsing of instructions, we need to know when we hit a new basic block so that we can reset validated state.
* vc4: Track the current instruction into the validation_state.Eric Anholt2016-07-121-24/+30
| | | | | This reduces how much we need to pass around as arguments, which was becoming more of a problem with looping validation.
* vc4: Add QPU support for generating BRANCH instructions.Eric Anholt2016-07-125-1/+85
|
* vc4: Print live variable start/ends during QIR dumping.Eric Anholt2016-07-121-0/+45
| | | | | This only happens when live variables are set up, which is not in the normal dump, but is set up when we've failed to register allocate.
* vc4: Implement live intervals using a CFG.Eric Anholt2016-07-126-39/+393
| | | | | Right now our CFG is always a trivial single basic block, but that will change when enable loops.
* vc4: Make vc4_qir_schedule handle each block in the program.Eric Anholt2016-07-121-14/+23
| | | | | | | | Basically we just treat each block independently. The only inter-block scheduling I can think of that would be be interesting would be to move texture result collection to after a short loop/if block that doesn't do texturing. However, the kernel disallows that as part of its security validation.
* vc4: Convert uniforms lowering to work with multiple blocks.Eric Anholt2016-07-121-29/+44
| | | | | | | | | We still decide which uniform to lower based on how many instructions-that-need-lowering use that uniform, but now we emit a new temporary uniform load in each of the basic blocks containing an instruction being lowered. This commit is best reviewed with diff -b.
* vc4: Convert vc4_opt_peephole_sf to work with control flow.Eric Anholt2016-07-121-4/+18
| | | | | | We need to apply the peephole pass to each of the blocks in the program. We don't do dataflow analysis for SF across blocks, but we also don't generate code that would need us to do so.
* vc4: Create a basic block structure and move the instructions into it.Eric Anholt2016-07-126-20/+122
| | | | | | | The optimization passes and scheduling aren't actually ready for multiple blocks with control flow yet (as seen by the "cur_block" references in them instead of iterating over blocks), but this creates the structures necessary for converting them.
* vc4: Add a "qir_for_each_inst_inorder" macro and use it in many places.Eric Anholt2016-07-1212-14/+17
| | | | | | | | We have the prior list_foreach() all over the code, but I need to move where instructions live as part of adding support for control flow. Start by just converting to a helper iterator macro. (The simpler "qir_for_each_inst()" will be used for the for-each-inst-in-a-block iterator macro later)
* vc4: Also enable phi elimination.Eric Anholt2016-07-121-0/+1
| | | | | | | | This avoids a bunch of code gen regressions when enabling loops in vc4. Prior to that, the GLSL that would have generated these optimizable phi nodes was being lowered to csels between either (undef, a) or (a, a), and those were being dealt with by nir_opt_undef and nir_opt_algebraic.
* vc4: fix memory leakEric Engestrom2016-07-121-1/+1
| | | | | | | | The allocation has succeeded by that point, so it needs to be freed. CovID: 1358929 Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: Close our screen's fd on screen close.Eric Anholt2016-07-121-0/+3
| | | | | We're passed in a freshly dup()ed fd on screen create, so we should close it on exit. Debugged by Hugh Cole-Baker.
* vc4: Regularize instruction emit macrosEric Anholt2016-07-042-39/+50
| | | | | | ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way ALU1 did. This should make the ALU[012] macros much clearer, by moving most of their contents to vc4_qir.c
* vc4: Enable dead CF elimination.Eric Anholt2016-07-041-0/+1
| | | | | | Now that we're about to start generating control flow in our NIR, we want this in place. It optimizes things frequently in the CS, when the GL VS has control flow that doesn't affect the vertex position.
* vc4: Optimize out redundant SF updates.Eric Anholt2016-07-042-6/+78
| | | | | | | | | | | Tiny change on shader-db currently, but it will be important when we start emitting a lot of SFs from the same variable as part of control flow support. total instructions in shared programs: 89463 -> 89430 (-0.04%) instructions in affected programs: 1522 -> 1489 (-2.17%) total estimated cycles in shared programs: 250060 -> 250015 (-0.02%) estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
* vc4: Move SF removal to a separate peephole pass.Eric Anholt2016-07-045-17/+85
| | | | | | | | | The DCE pass is going to change significantly to handle control flow, while we don't really need to change it for the SF handling. We also need to add some more SF peephole optimization for SF updates generated by control flow support. No change on shader-db.
* vc4: DCE instructions with a NULL destination.Eric Anholt2016-07-041-2/+3
| | | | | | | | I'm going to add an optimization for redundant SF update removal, which will just remove the SF and leave us (in many cases) with an instruction with a NULL destination and no side effects. Rather than teaching that pass whether the whole instruction can be removed, leave that responsibility to this pass.
* vc4: Mark texturing setup instructions as having side effects.Eric Anholt2016-07-041-5/+5
| | | | | | | We need to not DCE them even though they don't have a destination in QIR. We also shouldn't relocate them in vc4_opt_vpm. Neither of these things happen, but I'm about to make DCE consider instructions with a NULL destination.
* vc4: Fix a pasteo in scheduling condition flag usage.Eric Anholt2016-07-041-1/+1
| | | | | | | Noticed by code inspection. This hasn't been too big of a deal, because our cond usages all start out as adder ops, either MOVs or the FTOI for Z writes. MOVs *can* get converted to mul ops during scheduling, but apparently we hadn't hit this.
* vc4: Drop the dead QIR_PACK() macro.Eric Anholt2016-07-041-8/+0
| | | | | This isn't used since we switched to using the dst.pack field instead of custom instructions.
* gallium: Add a cap for offset_units_unscaledAxel Davy2016-06-251-0/+1
| | | | | | | | | | | | | | D3D9 has a different behaviour for depth bias. For OGL/D3D1X, the depth bias unit is the minimal resolvable value for the depth buffer, which depends on the format (and has different behaviour for float depth buffers). For D3D9, the depth bias unit is 1.0f. Signed-off-by: Axel Davy <axel.davy@ens.fr> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* Remove wrongly repeated words in commentsGiuseppe Bilotta2016-06-232-2/+2
| | | | | | | | | | | | | | | | | Clean up misrepetitions ('if if', 'the the' etc) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in fly-by v2: * proper commit message and non-joke title; * replace two 'as is' followed by 'is' to 'as-is'. v3: * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it) Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* gallium: make constant_buffer constRob Clark2016-06-201-1/+1
| | | | | Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* gallium: add PIPE_CAP_MAX_WINDOW_RECTANGLES to all driversIlia Mirkin2016-06-181-0/+1
| | | | | | | | This says how many window rectangles are supported by the implementation, although it may not exceed PIPE_MAX_WINDOW_RECTANGLES. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Brian Paul <brianp@vmware.com>
* vc4: fix vc4_resource_from_handle() stride calculationRob Herring2016-06-151-2/+2
| | | | | | | | | | | | | | The expected stride calculation is completely wrong. It should ultimately be multiplying cpp and width rather than dividing. The width also needs to be aligned to the tiling width first before converting to stride bytes. The whole stride check here is possibly pointless. Any buffers which were allocated outside of vc4 may have strides with larger alignment requirements. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* Android: move libdrm settings to top-level Android.common.mkRob Herring2016-06-131-1/+0
| | | | | | | | | | | | | | Fix warnings like these due to HAVE_LIBDRM being inconsistently defined: external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition] typedef struct drm_clip_rect drm_clip_rect_t; HAVE_LIBDRM needs to be set project wide to fix this. This change also harmlessly links libdrm with everything, but simplifies the makefiles a bit. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Emil Velikov <emil.velikov@collabora.com>
* gallium: add PIPE_CAP_TGSI_VOTE for when the VOTE ops are allowedIlia Mirkin2016-06-061-0/+1
| | | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com>
* vc4: Fix compiler warnings in fail_instr path of QIR validate passRhys Kidd2016-05-311-10/+10
| | | | | | | Introduced in 8e2d0843c02daf5280184f179ae8ed440ac90d7f. Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* vc4: Fix doxygen warningsRhys Kidd2016-05-302-6/+6
| | | | | | | | Now that vc4 automated code documentation can be generated with doxygen, fix the warnings issued by Doxygen 1.8.11. Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
* gallium: push offset down to driverStanimir Varbanov2016-05-301-0/+7
| | | | | | | | | | | | | Push offset down to drivers when importing dmabuf. This is needed to more fully support EGL_EXT_image_dma_buf_import when a non-zero offset is specified. Tesing has been done for freedreno, and compile tested following gallium drivers: nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
* gallium: Add a pipe cap for whether primitive restart works for patches.Kenneth Graunke2016-05-231-0/+1
| | | | | | | | | | | | | | | Some hardware supports primitive restart on patch primitives, and other hardware does not. Modern GL and ES include a query for this feature; adding a capability bit will allow us to answer it. As far as I know, AMD hardware does not support this feature, while NVIDIA and Intel hardware does. However, most Gallium drivers do not appear to support tessellation shaders yet. So, I've enabled it for nvc0 and disabled it everywhere else. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* vc4: Size transfer temporary mappings appropriately for full maps of 3D.Eric Anholt2016-05-181-2/+2
| | | | | | | | | We don't really support reading/writing of 3D textures since the hardware doesn't do 3D, but we do need to make sure that a pipe_transfer for them has enough space to store the image. This was previously not a problem because the state tracker only mapped a slice at a time until fb9fe352ea41c7e3633ba2c483c59b73c529845b. Fixes glean glsl1 tests, which all have setup of a 3D texture at the start.
* vc4: Add support for vertex color clamping in the rasterizer.Eric Anholt2016-05-173-1/+6
| | | | | This gets us precompile of vertex shaders at the state tracker level as well.
* vc4: Move tgsi_to_nir to precompile time.Eric Anholt2016-05-171-12/+15
| | | | | Now we have an immutable nir shader in our shader's CSO that we can clone and lower/optimize.
* vc4: Mark the driver as supporting fragment color clamping in rast.Eric Anholt2016-05-171-1/+1
| | | | | | | We always clamp fragment colors, since they're always 8-bit unorm, so there's no need to have us compile separate shaders based on GL_ARB_color_buffer_float. This gives us precompilation of fragment programs to the vc4_shader_state_create() level.
* vc4: Enable sharing shaders across contexts.Eric Anholt2016-05-172-2/+3
| | | | | | This allows the same pipe_shader_state to be referenced from multiple contexts. Since our pipe_shader_state is treated as immutable (other than the variant number) within the driver, this is no problem.
* vc4: Switch to using nir_load_front_face.Eric Anholt2016-05-172-9/+18
| | | | | | | | This will be generated by glsl_to_nir, and it turns out that this is a more code-efficient path than the floating point math, anyway. No change on shader-db, but drops an instruction in piglit's glsl-fs-frontfacing.
* vc4: Drop the dead export_linkage array.Eric Anholt2016-05-171-5/+0
| | | | This came from deriving from freedreno.
* vc4: Fix a -Wformat-security warning.Eric Anholt2016-05-171-1/+1
| | | | | This is apparently enabled as an error in Android builds, and the compiler can't tell that the return value is safe.
* gallium: Add a pipe cap for arb_cull_distanceTobias Klausmann2016-05-141-0/+1
| | | | | | | | | This lets us safely enable or disable the extension as needed Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* vc4: Add support for loading immediate values in QIR.Eric Anholt2016-05-064-0/+32
| | | | | | | This will be used for resetting the uniform stream in the presence of branching, but may also be useful as an optimization to reduce how many uniforms we have to copy out per draw call (in exchange for increasing icache pressure).
* vc4: Make vc4_qpu_validate() produce more verbose failures.Eric Anholt2016-05-061-35/+71
| | | | | | Seeing the expansion of a QPU_GET_FIELD in an assert isn't very informative, and it's hard find what's going wrong without getting a dump of the instruction that failed.
* vc4: Add a small QIR validate pass.Eric Anholt2016-05-064-0/+127
| | | | | This has caught a couple of bugs during loop development so far, and I should probably have written it long ago.
* vc4: Fix the src count on exp2/log2.Eric Anholt2016-05-061-2/+2
| | | | Found by the upcoming QIR validate pass.
* vc4: Reuse QPU disasm's cond flags in QIR.Eric Anholt2016-05-063-27/+46
| | | | In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
* vc4: When emitting an instruction to an existing temp, mark it non-SSA.Eric Anholt2016-05-061-0/+2
| | | | Prevents a bug in the later control-flow support series.
* vc4: Make sure that we don't overwrite the signal for PROG_END.Eric Anholt2016-05-061-0/+8
| | | | | | | | We should have already emitted a NOP due to the last instruction being a TLB or VPM write. However, if you disable dead code elimination then you might get dead code at the end, and that dead code might have the signal bits set to something non-default, at which point you die in assertion failure.
* vc4: fixup for new nir_foreach_block()Connor Abbott2016-05-054-48/+20
| | | | Reviewed-by: Eric Anholt <eric@anholt.net>