| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.
These variables are similar to OpenCL's local/__local variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
|
|
|
|
|
|
|
|
| |
v2:
* Split from patch to add ir_var_shader_shared (tarceri)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Move shared parsing under storage qualifiers (tarceri)
* Fail to compile if shared is used in non-compute shader (tarceri)
* Use separate shared_storage bit for shared variables (tarceri)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
|
|
|
|
|
|
|
|
|
| |
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
|
|
|
|
|
|
| |
Fixes crash when using HUD with Nobel Clinician Viewer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers. First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given. Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.
Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.
Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And put 8-bit/channel formats before 5/6/5 formats.
The ChoosePixelFormat() function seems to be finicky about format
selection. Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format. Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.
VMware bug 1455030
Reviewed-by: José Fonseca <jfonseca@vmware.com>
|
|
|
|
|
|
|
| |
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.
See also https://github.com/apitrace/apitrace/commit/e4a4f15f5b92e0abbd24d7d053da25f8278c9f64
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes GPUVM conflicts with non-4K page size.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738
v2: Replace sanitization of VM base address alignment with comment why
that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
(Marek)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow dec/enc/transcode without X
v2: use env override even with X,
use loader_open_device instead of open
v3: clean up
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
|
|
|
|
|
| |
v2: move the dup to vl_wys_drm for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
|
|
|
|
|
|
|
|
| |
This will allow the state trackers to use render nodes
with screen creation
v2: dup fd for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
|
|
|
|
|
| |
There is no dev in drv, and dev should be from vl_screen here
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
|
| |
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
| |
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
|
|
| |
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In
particular, the PLN instruction reads 2 logical registers in one of the
components. This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
|
|
| |
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
|
|
|
|
|
|
| |
There are a few non-stoney changes too.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
|
|
|
|
|
|
| |
v2: set emit_scratch_reloc, add a NULL check
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
|
|
|
|
|
|
| |
v2: don't call get_flush_flags twice per function
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
|
|
|
|
|
|
| |
This should improve performance for big copies that need to be split.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
|
|
|
|
|
|
|
| |
Nothing actually uses this yet (due to complications), but the emission
logic is right.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
This removes the hack used for merge, which only covers a fraction of
the cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
| |
Need to emulate rcp/rsq before providing full fp64 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
| |
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.
If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
| |
No instructions are able to load short immediates like nvc0 can.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:
* nir_intrinsic_memory_barrier_atomic_counter
* nir_intrinsic_memory_barrier_buffer
* nir_intrinsic_memory_barrier_image
We treat these nir intrinsics as no-ops:
* nir_intrinsic_group_memory_barrier
* nir_intrinsic_memory_barrier_shared
v3:
* Add comment for no-op cases (curro)
v4:
* Moving comment to a separate patch authored by curro
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
| |
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When these functions are called in GLSL code, we create an intrinsic
function call:
* groupMemoryBarrier => __intrinsic_group_memory_barrier
* memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
* memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
* memoryBarrierImage => __intrinsic_memory_barrier_image
* memoryBarrierShared => __intrinsic_memory_barrier_shared
v2:
* Consolidate with memoryBarrier function/intrinsic creation (curro)
v3:
* Instead of add_memory_barrier_function, add an intrinsic_name
parameter to _memory_barrier (curro)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
|
| |
We just needed to set the extra width/height fields to get this working.
v2 (chk): rebased, CC stable added, commit message added, fixed coding style
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Apply the start code fix only to advanced profile.
v2 (chk): add commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.
This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
|
|
|
|
|
|
|
|
|
| |
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
|
|
|
|
|
|
|
|
| |
Some lines were using 4 indentation spaces instead of 3.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
|