path: root/src/gallium
Commit message | Author | Age | Files | Lines
...
* st/va: Fix H.264 PicOrderCnt value | Mark Thompson | 2016-10-14 | 1 | -1/+1
  TopFieldPicOrderCnt is exactly the PicOrderCnt value for a frame - see H.264 section 8.2.1.
  Reviewed-by: Christian König <christian.koenig@amd.com>
* st/va: Baseline profile is not supported | Mark Thompson | 2016-10-14 | 1 | -2/+2
  Constrained baseline profile is supported, so use that instead. This matches what the encoder already does (constraint_set1_flag is always set in the output bitstream).
  Reviewed-by: Christian König <christian.koenig@amd.com>
* st/va: Return surface formats depending on config chroma format | Mark Thompson | 2016-10-14 | 1 | -2/+10
  This makes the supported format actually match the configuration, and allows the user to observe that NV12 is supported for video processing where previously they couldn't (though it did always work if they blindly tried to use it anyway).
  Reviewed-by: Christian König <christian.koenig@amd.com>
* st/va: Save surface chroma format in config | Mark Thompson | 2016-10-14 | 2 | -1/+20
  Both YUV420 and RGB32 configurations are supported, so we need to be able to distinguish which is being used.
  Reviewed-by: Christian König <christian.koenig@amd.com>
* st/va: Return more useful config attributes | Mark Thompson | 2016-10-14 | 1 | -9/+38
  The encoder attributes are needed for a user of the encoder to be able to configure it sensibly without internal knowledge.
  Reviewed-by: Christian König <christian.koenig@amd.com>
* swr: [rasterizer core] don't construct pArContext on non-ar builds | Tim Rowley | 2016-10-13 | 1 | -0/+6
  Stops the debug directory being created on non-ar builds.
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] remove WorkerWaitForThreadEvent bucket | Tim Rowley | 2016-10-13 | 3 | -6/+0
  Cause of bucket stop capture hang, as threads get stuck in level 1.
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] move binner functionality to separate file | Tim Rowley | 2016-10-13 | 3 | -1392/+1444
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer scripts] add DEBUG_OUTPUT_DIR knob | Tim Rowley | 2016-10-13 | 1 | -0/+7
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] fix comment typo | Tim Rowley | 2016-10-13 | 1 | -1/+1
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core/sim] 8x2 backend + 16-wide tile clear/load/store | Tim Rowley | 2016-10-13 | 13 | -100/+1895
  Work in progress (disabled). USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths (emulated on non-AVX512 HW).
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer archrast] fix event file issue with saving data | Tim Rowley | 2016-10-13 | 4 | -8/+22
  Also, tagging stats with draw id to correlate these events with draw/dispatch events.
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer common] fix assert index | Eric Engestrom | 2016-10-13 | 1 | -1/+1
  Fixes: b3bd8bb611bb465d2e5e ("swr: [rasterizer core] add support for "RAW" surface format")
  CovID: 1373647
  Signed-off-by: Eric Engestrom <eric@engestrom.ch>
  Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
* nv50: enable ARB_enhanced_layouts | Ilia Mirkin | 2016-10-13 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: be more careful about preserving modifiers in SHLADD creation | Ilia Mirkin | 2016-10-13 | 1 | -7/+5
  src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either.
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* tgsi: fix comment typo in tgsi_ureg.c | Brian Paul | 2016-10-13 | 1 | -1/+1
  Trivial.
* vc4: Avoid loading from the texture during non-utile-aligned glTexImage(). | Eric Anholt | 2016-10-13 | 1 | -12/+34
  Previously, the plan was "if the width/height we have to load/store isn't the size the user is planning on writing, then we need to load the old contents out beforehand to prevent writing back undefined".
  However, when we're doing glTexImage() we often end up aligning the width/height into the padding of the texture, and we don't actually need to read out that padding.
  Improves x11perf -aatrapezoid100 performance from ~460/sec to ~700/sec.
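The decision described in the vc4 commit above can be sketched roughly as follows. This is not the driver's code: `struct box2d`, `edge_ok()` and `needs_preload()` are hypothetical names, and the rule shown (an edge is safe if it is tile-aligned or coincides with the edge of the mip level, beyond which there is only padding) is an assumption drawn from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the region being written. */
struct box2d {
    uint32_t x, y, width, height;
};

/* An edge needs no read-back if it sits on a tile boundary, or if it is the
 * edge of the level itself (anything past it is padding nobody reads). */
static bool edge_ok(uint32_t edge, uint32_t tile, uint32_t level_size)
{
    return (edge % tile) == 0 || edge == level_size;
}

/* Returns true when the old tile contents must be loaded before the write,
 * so that texels outside the box are not replaced with undefined data. */
static bool needs_preload(const struct box2d *box,
                          uint32_t level_w, uint32_t level_h,
                          uint32_t utile_w, uint32_t utile_h)
{
    assert(utile_w > 0 && utile_h > 0);

    return !((box->x % utile_w) == 0 &&
             (box->y % utile_h) == 0 &&
             edge_ok(box->x + box->width, utile_w, level_w) &&
             edge_ok(box->y + box->height, utile_h, level_h));
}
```

With a 4x4 tile, a full upload of a 6x6 level needs no read-back under this rule even though 6 is not a multiple of 4, while a write that starts at x = 1 still does.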
* st/nine: Fix possible segfault in surface ctor | Axel Davy | 2016-10-13 | 1 | -2/+2
  Regression introduced by ba0274c7d6c3b77a36bbe1b444f427b0c873e2f3.
  Check that the resource exists before assigning it a flag (and use This->base.resource instead of pResource, since the former may have a newly allocated resource, while the latter would be NULL). This should reintroduce the behaviour of the previous code.
  Signed-off-by: Axel Davy <axel.davy@ens.fr>
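The pattern behind the st/nine fix above is a plain NULL check before touching the resource. A minimal sketch, with hypothetical types and a hypothetical flag value rather than the real nine structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's resource and surface objects. */
struct resource {
    unsigned flags;
};

struct surface {
    struct resource *resource; /* may be NULL if nothing was allocated */
};

#define EXAMPLE_FLAG 0x1u /* hypothetical flag bit */

/* Only tag the resource when the surface actually owns one. */
static void surface_tag_resource(struct surface *surf)
{
    if (surf->resource)
        surf->resource->flags |= EXAMPLE_FLAG;
}
```

Reading the pointer from the surface itself, rather than from a constructor argument, mirrors the commit's point that the object may hold a newly allocated resource while the argument is still NULL.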
* st/nine: Remove useless code in nine_shader | Axel Davy | 2016-10-13 | 1 | -5/+0
  Since 1604efa6fda9b780e8537a131ad77f3e83e5a67a, lconsti and lconstb don't need to be initialized. Remove some leftovers from the previous code (which now has an invalid use of ARRAY_SIZE on a pointer instead of an array).
  Reported by Coverity.
  Signed-off-by: Axel Davy <axel.davy@ens.fr>
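Why ARRAY_SIZE on a pointer is invalid can be shown in a few lines. The macro below has the usual shape but is defined locally for this sketch, and `struct const_table` is a hypothetical type, not the nine_shader one. Compilers may warn about the deliberately wrong function.

```c
#include <assert.h>
#include <stddef.h>

/* Usual element-count macro; only meaningful for true arrays. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical table with one real array and one pointer member. */
struct const_table {
    int fixed[16]; /* array: sizeof covers all 16 elements */
    int *dynamic;  /* pointer: sizeof is just the pointer size */
};

/* Correct: yields the number of elements in the array. */
static size_t fixed_count(const struct const_table *t)
{
    return ARRAY_SIZE(t->fixed);
}

/* Wrong on purpose: compiles, but yields sizeof(int *) / sizeof(int),
 * which has nothing to do with how many elements were allocated. */
static size_t dynamic_count_wrong(const struct const_table *t)
{
    return ARRAY_SIZE(t->dynamic);
}
```

When a member changes from an array to a pointer, every ARRAY_SIZE use on it silently changes meaning, which is the kind of leftover the commit removes.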
* gallium/os: Use unsigned integers for size computation | Axel Davy | 2016-10-13 | 1 | -2/+2
  Use uint64_t instead of int64_t in the calculation, as the result is uint64_t.
  Signed-off-by: Axel Davy <axel.davy@ens.fr>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
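A sketch of the size computation described above, doing the multiplication in the same unsigned 64-bit type as the result. `total_bytes()` and its parameters are hypothetical; the inputs are modelled as `long` on the assumption that they come from a sysconf-style query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: total size in bytes from a page count and page size.
 * Both operands are widened to uint64_t before multiplying, so the
 * arithmetic happens in the type of the result. */
static bool total_bytes(long pages, long page_size, uint64_t *size)
{
    if (pages <= 0 || page_size <= 0)
        return false; /* reject error values before converting to unsigned */

    *size = (uint64_t)pages * (uint64_t)page_size;
    return true;
}
```

Checking for non-positive inputs first matters because a negative error value converted to uint64_t would become a very large size.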
* nvc0: enable ARB_enhanced_layouts | Samuel Pitoiset | 2016-10-13 | 1 | -1/+1
  All ARB_enhanced_layouts piglit tests pass without any changes in our compiler.
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings | Marek Olšák | 2016-10-13 | 2 | -22/+32
  The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: disable ReZ | Marek Olšák | 2016-10-13 | 1 | -7/+4
  This is a serious performance fix. Discovered by luck.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354
  Cc: 12.0 <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC-compatible HTILE | Marek Olšák | 2016-10-13 | 9 | -24/+185
  so that decompress blits aren't needed and depth texturing needs less memory bandwidth.
  Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers.
  This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
  I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now)
  Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
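The "doubles memory footprint for Z16" remark above is simple arithmetic: 2 bytes per pixel become 4. A minimal sketch, assuming a hypothetical helper `depth_bytes()` that ignores tiling, alignment padding and the HTILE metadata itself:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raw depth-buffer size for a single layer and sample,
 * ignoring alignment, tiling and metadata. */
static uint64_t depth_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}
```

For a 1920x1080 buffer this gives 4147200 bytes at 2 bytes per pixel (Z16) and 8294400 bytes at 4 bytes per pixel (Z32_FLOAT).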
* gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY | Marek Olšák | 2016-10-13 | 1 | -0/+1
  For performance tuning in drivers. It filters out window system framebuffers and OpenGL renderbuffers.
  radeonsi will use this to guess whether a depth buffer will be read by a shader. There is no guarantee about what will actually happen.
  This is a departure from PIPE_BIND flags which are defined to be strict but they are useless in practice.
  Acked-by: Roland Scheidegger <sroland@vmware.com>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix regression in image atomics | Nicolai Hähnle | 2016-10-13 | 1 | -1/+1
  Caused by a bad rebase when pushing commit 76a940893.
* radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.* | Nicolai Hähnle | 2016-10-13 | 1 | -2/+7
  Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*
  Reviewed-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* swr: automake: add ar_eventhandlerfile_h.template to the tarball | Emil Velikov | 2016-10-12 | 1 | -1/+2
  Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
* nvc0/ir: fix textureGather with a single offset | Ilia Mirkin | 2016-10-12 | 1 | -2/+2
  Recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list.
  Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* nv50/ir: copy over value's register id when resolving merge of a phi | Ilia Mirkin | 2016-10-12 | 1 | -1/+3
  The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location.
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* st/mesa: enable ARB_enhanced_layouts and turn the cap on | Nicolai Hähnle | 2016-10-12 | 3 | -3/+3
  v2: mark llvmpipe & softpipe properly as well (Jason Wood)
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Dave Airlie <airlied@redhat.com>
* tgsi/ureg: add ureg_DECL_output_layout | Nicolai Hähnle | 2016-10-12 | 2 | -13/+38
  For specifying an exact location/component.
  v2: change the order of parameters (Dave)
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
  Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
* tgsi/ureg: add layout/component input declarations | Nicolai Hähnle | 2016-10-12 | 2 | -12/+76
  v2: change the order of parameters (Dave)
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
  Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
* tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays | Nicolai Hähnle | 2016-10-12 | 1 | -8/+2
  v2: remove a tautological left-over assert (Marek)
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
  Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS | Nicolai Hähnle | 2016-10-12 | 17 | -0/+24
  This is a screen cap because drivers are expected to support it either for all shader types or for none of them.
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Dave Airlie <airlied@redhat.com>
* radeonsi: Use the new image load/store intrinsic signatures | Tom Stellard | 2016-10-12 | 1 | -14/+45
  This patch requires LLVM r284024 or newer.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Add function for converting LLVM type to intrinsic string | Tom Stellard | 2016-10-12 | 1 | -10/+32
  The existing function only worked for integer types.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
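Overloaded LLVM intrinsic names carry type suffixes such as `i32`, `f32` or `v4f32`. The sketch below builds such a suffix for both integer and float types without using the LLVM API at all: `struct type_desc` and `type_to_suffix()` are hypothetical names for illustration, not the functions this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified description of a scalar or vector type. */
struct type_desc {
    bool is_float;    /* false: integer element, true: float element */
    unsigned bits;    /* element width in bits, e.g. 32 */
    unsigned vec_len; /* 1 for a scalar, otherwise the vector length */
};

/* Writes a suffix such as "i32", "f32" or "v4f32" into buf. */
static void type_to_suffix(const struct type_desc *t, char *buf, size_t bufsize)
{
    char kind = t->is_float ? 'f' : 'i';

    assert(t->vec_len >= 1);

    if (t->vec_len > 1)
        snprintf(buf, bufsize, "v%u%c%u", t->vec_len, kind, t->bits);
    else
        snprintf(buf, bufsize, "%c%u", kind, t->bits);
}
```

A helper that hard-codes the `i` prefix would be limited to integer types, which is the limitation the commit message describes.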
* radeonsi: Refactor image store/load intrinsic name creation | Tom Stellard | 2016-10-12 | 1 | -11/+18
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fences | Marek Olšák | 2016-10-12 | 1 | -2/+5
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix R600_DEBUG=precompile for shader-db | Marek Olšák | 2016-10-12 | 1 | -0/+6
  radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db.
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use TC write-back instead of full cache invalidation | Marek Olšák | 2016-10-12 | 3 | -13/+7
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC L2 write-back (flush) without cache invalidation | Marek Olšák | 2016-10-12 | 2 | -28/+74
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers | Marek Olšák | 2016-10-12 | 1 | -3/+4
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) | Samuel Pitoiset | 2016-10-12 | 1 | -0/+87
  total instructions in shared programs :2286901 -> 2284473 (-0.11%)
  total gprs used in shared programs    :335256 -> 335273 (0.01%)
  total local used in shared programs   :31968 -> 31968 (0.00%)

                  local        gpr       inst      bytes
      helped          0         41        852        852
        hurt          0         44         23         23

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
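The rewrite in the commit above relies on a fused shift-and-add producing the same value as a shift followed by an add. A small sketch of that equivalence in plain C, on 32-bit unsigned values; the function names are hypothetical and this models only the arithmetic, not the compiler pass or its modifier handling:

```c
#include <assert.h>
#include <stdint.h>

/* Two separate operations: t = a << b, then t + c. */
static uint32_t add_of_shl(uint32_t a, unsigned b, uint32_t c)
{
    uint32_t t;

    assert(b < 32); /* shifting by >= 32 is undefined in C */
    t = a << b;
    return t + c;
}

/* Model of a single fused operation computing (a << b) + c. */
static uint32_t shladd(uint32_t a, unsigned b, uint32_t c)
{
    assert(b < 32);
    return (a << b) + c;
}
```

A typical source of this pattern is address arithmetic such as `base + (index << 2)`, where the fused form saves one instruction and the temporary register holding the shifted value.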
* draw: initialize shader inputs | Roland Scheidegger | 2016-10-12 | 1 | -0/+7
  This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen).
  No piglit change.
  Reviewed-by: Brian Paul <brianp@vmware.com>
* trace: add invalidate_resource callback | Ilia Mirkin | 2016-10-11 | 1 | -0/+21
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it | Marek Olšák | 2016-10-11 | 1 | -1/+6
  Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* swr: [rasterizer archrast] update proto file | Tim Rowley | 2016-10-11 | 1 | -2/+56
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer archrast] add support for stats files | Tim Rowley | 2016-10-11 | 4 | -20/+57
  Only stat and counter events are saved to the event files.
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer jitter] remove architecture override | Tim Rowley | 2016-10-11 | 1 | -41/+1
  Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>