| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Only and only if src1 is a power of 2 we can replace IMAD by SHLADD.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
| |
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
v2: enable only when compute is available
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
v2: fix a comment (Gustaw Smolarczyk)
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
To ensure that fences are properly initialized.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.
ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.
The new function provides summary information to be used by subsequent
patches.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
| |
Save compute shader state that will be used for the ARB_query_buffer_object
implementation.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
These functions extract the pipe state structure from the current
descriptors, for state saving.
v2: correctly dereference *buf (Bas)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
| |
For bottom-of-pipe fences inside the gfx command stream.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
There are driver-specific context flags for barriers that are not covered
by the Gallium barrier interfaces.
The R600 settings of these flags may not be optimal, but we're not going
to use them yet anyway.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
| |
|
|
|
|
|
|
|
| |
Fixes lots of piglit tests crashing due to using uninitialized memory.
Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
| |
Replace old string comparison with a mapping table.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
|
|
|
|
| |
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When passed to winsys->buffer_create, this flag will indicate that we require
a buffer that maps 1:1 with a kernel buffer handle.
This is currently set for all textures, since textures can potentially be
exported to other processes. This is not a huge loss, since the main purpose
of this patch series is to deal with applications that allocate many small
buffers.
A hypothetical application with tons of tiny textures might still benefit
from not setting this flag, but that's not a use case I'm worried about
just now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
This is really the behavior we want most of the time, but having a
SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that
OR'ing different flags together always results in stronger guarantees.
The parent BOs of sub-allocated buffers will be added unsynchronized.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
This adds a new envvar called NV50_PROG_CHIPSET which allows to
compile shaders with a different target, especially useful for
shader-db.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Based off of Ilia's original patch, but with output values replicated so
that it matches the TGSI semantics.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
| |
When we create a depth/stencil texture, also check if we can render to
it and set the PIPE_BIND_DEPTH_STENCIL flag. We were previously doing
this for color textures (PIPE_BIND_RENDER_TARGET).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
| |
This may have been needed years ago during development, but not now.
Prevents some regressions after introducing the next patch.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
|
| |
To handle z/layer fix-ups for blitting and copying. Note that we weren't
doing this properly in svga_blit() before.
Also, remove redundant stex, dtex assignments.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
| |
The util_is_format_compatible() function didn't quite do what we wanted
for vgpu10. This check is more flexible and allows copies between
formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
|
| |
With latest mesa and latest piglit tests srgb<->linear conversion
is not required as per GL4.4 rules
See commit b662c70aeab6a92751514f30719c13a6de253b40.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
|
|
|
|
|
|
| |
Get rid of unneeded local var.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
| |
Never used, AFAIK.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- no PIPE_CAP_INT64 yet
- emit DIV/MOD without the divide-by-zero workaround
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
|
|
|
|
|
| |
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
|
|
| |
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
|
|
|
|
| |
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
|
|
|
|
|
|
| |
Switch all RDTSC_START/STOP macros to use AR_BEGIN/END macros.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
|
|
|
|
|
|
|
| |
Same thing as nvc0_stage_set_sampler_views_range().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
This function was quite similar to nvc0_stage_set_sampler_views()
and I don't see any reasons to not remove it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helps shaders in UE4 demos, especially with Elemental
(+1% perf). This optimization reduces spilling usage in one
shader which explains the little gain.
GF100/GK104:
total instructions in shared programs :2838551 -> 2838045 (-0.02%)
total gprs used in shared programs :396706 -> 396684 (-0.01%)
total local used in shared programs :34432 -> 34416 (-0.05%)
local gpr inst bytes
helped 1 19 112 112
hurt 0 0 0 0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
This should emit src0 instead of src1.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
|
|
|
|
|
|
|
|
|
| |
A couple of forward-declarations were causing warnings in clang:
'value' defined as a class here but previously declared as a struct
[-Wmismatched-tags]
Signed-off-by: Martina Kollarova <martina.kollarova@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch relaxes the restriction of compressed formats for texture
upload buffer. For now, 3D texture with compressed format
is still not supported in the texture upload buffer path.
As Brian noted, ETQW does many texture updates with glCompressedTexSubImage.
This patch greatly improves the performance of the ETQW trace.
Tested with ETQW, MTT piglit, glretrace, conform, viewperf
v2: Per Brian's suggestion, removed the subregion boundary check.
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
|
|
| |
This reduces the number of times we flush in some situations (the
arbocclude demo is one trivial example).
Tested with Piglit, ETQW, Sauerbraten, arbocclude.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
|
|
|
|
|
|
|
|
|
| |
Since commit 99d8fe20abe1f we don't have to flush the command buffer when
we end a query.
Tested with Piglit, Sauerbraten, arbocclude, ETQW (noticably faster now).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, when running with vgpu10, instead of mapping directly to the
guest backed memory for texture update, we'll use the texture upload buffer
and use the transfer from buffer command to update the host side texture memory.
This optimization yields about 20% performance improvement with
Lightsmark2008 and about 40% with Tropics.
Tested with Lightsmark2008, Tropics, Heaven, MTT piglit, glretrace, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
|
|
| |
Split the functions into separate functions for dma and direct map to make
the code more readable.
Tested with MTT piglit, glretrace, viewperf, conform, various OpenGL apps
Reviewed-by: Brian Paul <brianp@vmware.com>
|