| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
No longer used, so drop the extra arg to ir3_instr_create()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
| |
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Been on my TODO list for a while. If nothing else this will make gdb
properly grok the opc_t enum.
This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t). Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL CONST[0..7]
DCL ADDR[0]
0: ARL ADDR[0].x, IN[1].xxxx
1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
2: DP4 OUT[0].x, CONST[4], IN[0]
3: DP4 OUT[0].y, CONST[5], IN[0]
4: DP4 OUT[0].z, CONST[6], IN[0]
5: DP4 OUT[0].w, CONST[7], IN[0]
6: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
| |
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
| |
For store instructions, the "dst" register is a read register, not a
written register. (I.e. it is the address to store to.) Let's not
confuse register allocation, scheduling, etc, with these details.
Instead just leave a dummy instr->regs[0], and take "dst" from
instr->regs[1] and srcs following.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
| |
Sync updated cat6 encoding from freedreno.git, needed to properly encode
store instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
| |
cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
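
A minimal sketch of the idea behind such a helper, using simplified stand-in types. The toy_* names, the fixed-size array, and the "track once" policy are assumptions for illustration, not the actual ir3 definitions. The point is that the address pointer and the indirects list are only ever updated together:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_INDIRECTS 64

/* Hypothetical, simplified stand-ins for the real structures. */
struct toy_instr {
    struct toy_instr *address;   /* instruction that writes the address reg */
};

struct toy_ir {
    struct toy_instr *indirects[TOY_MAX_INDIRECTS];
    unsigned indirects_count;
};

/* Set instr->address and keep ir->indirects in sync with it. */
static void toy_instr_set_address(struct toy_ir *ir, struct toy_instr *instr,
                                  struct toy_instr *addr)
{
    if (instr->address == addr)
        return;                  /* nothing changed, nothing to track */

    /* Start tracking the instruction the first time it gets an address. */
    if (instr->address == NULL && addr != NULL) {
        assert(ir->indirects_count < TOY_MAX_INDIRECTS);
        ir->indirects[ir->indirects_count++] = instr;
    }
    instr->address = addr;
}
```

Routing every write through one function means a pass such as cp cannot change the address without the list seeing it.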
|
|
|
|
|
|
|
|
|
| |
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions). Cache this information so we only
figure it out once.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
| |
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)
One slight complication is that variables (load_var/store_var, i.e.
arrays) are not in SSA form, so we have to figure out where to put the
phis ourselves. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
| |
Without this, negative branch/jump offsets look like very large positive
offsets.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
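
A small sketch of the underlying problem, assuming a 16-bit offset field (the field width is an assumption here, and sign_extend() is a hypothetical helper, not the function touched by this commit). A raw bitfield read as unsigned turns -1 into 65535 unless it is sign-extended first:

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `bits` bits of `val` as a two's-complement number. */
static int32_t sign_extend(uint32_t val, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    uint32_t sign = UINT32_C(1) << (bits - 1);
    uint32_t mask = (bits < 32) ? ((UINT32_C(1) << bits) - 1) : UINT32_MAX;

    val &= mask;
    /* Flip the sign bit, then subtract it back out in a wider type. */
    return (int32_t)((int64_t)(val ^ sign) - (int64_t)sign);
}
```

With this, a field value of 0xFFFF decodes as -1 rather than as 65535.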
|
|
|
|
|
|
|
|
|
| |
These belong in the shader, rather than the block. Mostly a lot of
churn and nothing too interesting. But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context. But we will need this also to
hold the reg-set for new register allocation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
Use a more standard priority-queue based scheduling algo. It is simpler
and will make things easier once we have multiple basic blocks and flow
control.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
| |
Use standard list_head doubly-linked list and related iterators,
helpers, etc, rather than a weird combo of instruction array and next
pointers depending on stage. Now block has an instrs_list. In
certain stages where we want to remove and re-add to the block's list
we just use list_replace() to copy the list to a new list_head.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
It probably *should* be an assert, but for now TGSI f/e isn't very good
about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning
instead of crashing with an assert.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though in the end, they map to the same bits, the backend will need
to be able to differentiate float abs/neg vs integer abs/neg. Rather
than making the backend figure it out based on instruction opcode (which
when combined with mov/absneg instructions, can be awkward), just split
out different flags for each so the frontend can signal its intentions
more clearly. Also, since (neg) for bitwise op's is actually a bitwise-
not, split it out into bnot flag.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
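
For illustration, a sketch of what an insert macro for a dynamically sized array can look like. The ARRAY_INSERT name, the int_array struct, and the growth policy (start at 16, then double) are assumptions for this example and not the actual ir3 macro:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct int_array {
    int *data;
    unsigned count;      /* elements in use */
    unsigned capacity;   /* elements allocated */
};

/* Append `val`, growing the backing storage when it is full. */
#define ARRAY_INSERT(arr, val) do {                                        \
        if ((arr)->count == (arr)->capacity) {                             \
            unsigned cap_ = (arr)->capacity ? (arr)->capacity * 2 : 16;    \
            void *p_ = realloc((arr)->data,                                \
                               cap_ * sizeof((arr)->data[0]));             \
            if (!p_)                                                       \
                abort();   /* out of memory: give up in this sketch */     \
            (arr)->data = p_;                                              \
            (arr)->capacity = cap_;                                        \
        }                                                                  \
        (arr)->data[(arr)->count++] = (val);                               \
    } while (0)

/* Bounds-checked read, mainly useful for debugging. */
static int int_array_get(const struct int_array *arr, unsigned i)
{
    assert(i < arr->count);
    return arr->data[i];
}
```

Callers then write ARRAY_INSERT(&arr, x) instead of repeating the grow-and-store logic at every insertion site.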
|
|
|
|
|
|
|
|
|
| |
For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage. Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
| |
We may not need this for later a4xx patchlevels, but we do at least need
this for patchlevel 0. Bypass bary.f for fetching varyings when flat
shading is needed (rather than configure via cmdstream). This requires
a special dummy bary.f w/ (ei) flag to signal to scheduler when all
varyings are consumed. And requires shader variants based on rasterizer
flatshade state to handle TGSI_INTERPOLATE_COLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
| |
I think there is at least one more sub-encoding, but these two should be
enough to cover the common load/store instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
| |
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
| |
It seems like the hardware is unhappy if we execute a kill instruction
prior to last input (ei). Probably the shader thread stops executing
and the end-input flag is never set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
| |
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 32f2fd1c5d6088692551c80352b7d6fa35b0cd09, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
    calloc(1, A * B);
with:
    calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <mattst88@gmail.com>
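
A short sketch of the difference, with hypothetical helper names. wrapped_size() shows the size that calloc(1, A * B) would actually request once the product wraps, and alloc_array() spells out the overflow check that calloc(A, B) is expected to perform internally:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* What calloc(1, a * b) asks for: the product wraps modulo SIZE_MAX + 1. */
static size_t wrapped_size(size_t a, size_t b)
{
    return a * b;
}

/* True if a * b does not fit in size_t. */
static int mul_overflows(size_t a, size_t b)
{
    return b != 0 && a > SIZE_MAX / b;
}

/* Explicit version of the check behind calloc(nmemb, size). */
static void *alloc_array(size_t nmemb, size_t size)
{
    if (mul_overflows(nmemb, size))
        return NULL;
    return calloc(nmemb, size);
}

/* Walk through the overflowing and the ordinary case. */
void calloc_overflow_demo(void)
{
    size_t half = SIZE_MAX / 2 + 1;

    assert(wrapped_size(half, 2) == 0);     /* silently becomes a tiny request */
    assert(alloc_array(half, 2) == NULL);   /* rejected instead */

    int *ok = alloc_array(4, sizeof(int));  /* normal sizes still work */
    assert(ok != NULL && ok[3] == 0);
    free(ok);
}
```

In the wrapped case the caller would receive a valid but undersized buffer and write past its end, which is why the failure (NULL) is the safer outcome.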
|
|
Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:
fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
                                  |- ...
                                  \- ir3_shader_variant -> ir3
So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles its own ir.
There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.
Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
|