path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
...
* radeonsi: merge radeon_llvm_context and si_shader_context | Marek Olšák | 2016-10-18 | 4 | -317/+290
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: import all TGSI->LLVM code from gallium/radeon | Marek Olšák | 2016-10-18 | 11 | -462/+346
    Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* gallium/radeon: simplify initialization of 64-bit gallivm builders | Marek Olšák | 2016-10-18 | 1 | -18/+4
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* gallium/radeon: remove unused radeon_llvm_reg_index_soa | Marek Olšák | 2016-10-18 | 2 | -7/+0
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* radeonsi: move LLVM ALU codegen into radeonsi | Marek Olšák | 2016-10-18 | 6 | -992/+1056
    Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
* gm107/ir: fix bit offset of tex lod setting for indirect texturing | Ilia Mirkin | 2016-10-18 | 1 | -1/+1
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org
* gm107/ir: fix texturing with indirect samplers | Ilia Mirkin | 2016-10-18 | 1 | -0/+10
    The indirect handle has to come right after the coordinates, so if
    there was a sample/bias/depth compare/offset, everything would end up
    being shifted by one argument position.

    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org
* radeonsi: unify the constant load paths | Nicolai Hähnle | 2016-10-17 | 1 | -28/+11
    Remove the split between direct and indirect.

    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: fix indirect loads of 64 bit constants | Nicolai Hähnle | 2016-10-17 | 1 | -2/+2
    This fixes GL45-CTS.compute_shader.fp64-case3.

    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: shorten "shader->selector" to "sel" in si_shader_create | Marek Olšák | 2016-10-17 | 1 | -7/+8
    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: clear DB_RENDER_OVERRIDE | Marek Olšák | 2016-10-17 | 1 | -3/+1
    Vulkan doesn't set these fields even though it doesn't use HiS.
    HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0.

    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* vc4: Fix fast clear color packing for 565. | Eric Anholt | 2016-10-16 | 1 | -3/+16
    Piglit didn't manage to cover this because fbo-clear-formats uses
    scissors, so we don't get fast clearing.
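For readers unfamiliar with the format, packing a color into 565 can be sketched roughly as below. This is a generic illustration, not the vc4 code and not a description of what the fix changed: the helper names `quantize_unorm` and `pack_rgb565`, the round-to-nearest quantization, and the assumption that red occupies the most significant bits are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the vc4 driver: quantize a float
 * channel in [0, 1] to an unsigned integer of the given bit width,
 * rounding to nearest. Out-of-range inputs are clamped. */
static uint16_t quantize_unorm(float v, unsigned bits)
{
    assert(bits > 0 && bits <= 8);
    if (v < 0.0f)
        v = 0.0f;
    if (v > 1.0f)
        v = 1.0f;
    const float max = (float)((1u << bits) - 1);
    return (uint16_t)(v * max + 0.5f);
}

/* Pack an RGB color into 16 bits as R5 G6 B5.
 * Assumption: red sits in the top 5 bits, blue in the bottom 5. */
static uint16_t pack_rgb565(float r, float g, float b)
{
    return (uint16_t)((quantize_unorm(r, 5) << 11) |
                      (quantize_unorm(g, 6) << 5) |
                      quantize_unorm(b, 5));
}
```

Under these assumptions, `pack_rgb565(1.0f, 0.0f, 0.0f)` yields `0xF800` and pure white yields `0xFFFF`.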
* nv50/ir: constant fold OP_SPLIT | Tobias Klausmann | 2016-10-14 | 1 | -0/+18
    Split the source immediate value into new values and move them into
    the original defs set by the split. Since we can only have up to
    64-bit immediates, this is largely beneficial for F64 (and, in the
    future, U64) operations.

    Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
    [imirkin: always use U32, set newi for foldCount tracking]
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
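The arithmetic behind folding a split of a 64-bit immediate can be sketched as follows. This is only an illustration of the idea, not the nv50 codegen implementation: the function names `split_imm64` and `split_f64` are hypothetical, and the assumption that the low half goes to the first def is made for the example. `split_f64` also assumes an IEEE-754 binary64 `double`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: split a 64-bit immediate into two 32-bit values,
 * the way a constant-folded split could feed its two 32-bit defs.
 * Assumption: the low half is written first. */
static void split_imm64(uint64_t imm, uint32_t *lo, uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint32_t)(imm & 0xffffffffu);
    *hi = (uint32_t)(imm >> 32);
}

/* Same split for an F64 immediate: reinterpret the bits of the double
 * as an unsigned 64-bit integer, then split as above. */
static void split_f64(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof(bits));
    split_imm64(bits, lo, hi);
}
```

For example, with an IEEE-754 `double`, `split_f64(1.0, &lo, &hi)` leaves `lo == 0` and `hi == 0x3ff00000`; both halves are plain 32-bit values, which matches the "always use U32" note in the commit message.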
* swr: [rasterizer core] don't construct pArContext on non-ar builds | Tim Rowley | 2016-10-13 | 1 | -0/+6
    Stops debug directory being created on non-ar builds.

    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] remove WorkerWaitForThreadEvent bucket | Tim Rowley | 2016-10-13 | 3 | -6/+0
    Cause of bucket stop capture hang, as threads get stuck in level 1.

    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] move binner functionality to separate file | Tim Rowley | 2016-10-13 | 3 | -1392/+1444
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer scripts] add DEBUG_OUTPUT_DIR knob | Tim Rowley | 2016-10-13 | 1 | -0/+7
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core] fix comment typo | Tim Rowley | 2016-10-13 | 1 | -1/+1
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer core/sim] 8x2 backend + 16-wide tile clear/load/store | Tim Rowley | 2016-10-13 | 13 | -100/+1895
    Work in progress (disabled).

    USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths
    (emulated on non-AVX512 HW).

    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer archrast] fix event file issue with saving data | Tim Rowley | 2016-10-13 | 4 | -8/+22
    Also, tagging stats with draw id to correlate these events with
    draw/dispatch events.

    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer common] fix assert index | Eric Engestrom | 2016-10-13 | 1 | -1/+1
    Fixes: b3bd8bb611bb465d2e5e ("swr: [rasterizer core] add support for "RAW" surface format")
    CovID: 1373647
    Signed-off-by: Eric Engestrom <eric@engestrom.ch>
    Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
* nv50: enable ARB_enhanced_layouts | Ilia Mirkin | 2016-10-13 | 1 | -1/+1
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: be more careful about preserving modifiers in SHLADD creation | Ilia Mirkin | 2016-10-13 | 1 | -7/+5
    src2 was being given the wrong modifier, and we were not properly
    managing the modifier on the SHL source either.

    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* vc4: Avoid loading from the texture during non-utile-aligned glTexImage(). | Eric Anholt | 2016-10-13 | 1 | -12/+34
    Previously, the plan was "if the width/height we have to load/store
    isn't the size the user is planning on writing, then we need to load
    the old contents out beforehand to prevent writing back undefined".
    However, when we're doing glTexImage() we often end up aligning the
    width/height into the padding of the texture, and we don't actually
    need to read out that padding.

    Improves x11perf -aatrapezoid100 performance from ~460/sec to
    ~700/sec.
* nvc0: enable ARB_enhanced_layouts | Samuel Pitoiset | 2016-10-13 | 1 | -1/+1
    All ARB_enhanced_layouts piglit tests pass without any changes in our
    compiler.

    Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings | Marek Olšák | 2016-10-13 | 2 | -22/+32
    The table was copied from the Vulkan driver.
    The comment lines are as long as the table for cosmetic reasons.

    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: disable ReZ | Marek Olšák | 2016-10-13 | 1 | -7/+4
    This is a serious performance fix. Discovered by luck.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354
    Cc: 12.0 <mesa-stable@lists.freedesktop.org>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC-compatible HTILE | Marek Olšák | 2016-10-13 | 8 | -20/+132
    so that decompress blits aren't needed and depth texturing needs less
    memory bandwidth.

    Z16 and Z24 are promoted to Z32_FLOAT by the driver, because
    TC-compatible HTILE only supports Z32_FLOAT. This doubles memory
    footprint for Z16. The format promotion is not visible to state
    trackers.

    This is part of TC-compatible renderbuffer compression, which has
    3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is
    missing now.

    I don't see a measurable increase in performance though.

    (I tested Talos Principle and DiRT: Showdown, the latter is improved
    by 0.5%, which is almost noise, and it originally used layered Z16,
    so at least we know that Z16 promoted to Z32F isn't slower now)

    Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix regression in image atomics | Nicolai Hähnle | 2016-10-13 | 1 | -1/+1
    Caused by a bad rebase when pushing commit 76a940893.
* radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.* | Nicolai Hähnle | 2016-10-13 | 1 | -2/+7
    Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*

    Reviewed-by: Dave Airlie <airlied@redhat.com>
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* swr: automake: add ar_eventhandlerfile_h.template to the tarball | Emil Velikov | 2016-10-12 | 1 | -1/+2
    Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
* nvc0/ir: fix textureGather with a single offset | Ilia Mirkin | 2016-10-12 | 1 | -2/+2
    Recent fix for non-const offsets broke the case of a single offset
    (vs 4 offsets). The later code relies on the offs array to contain
    null values to tell whether they should be added onto the srcs list.

    Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org
* nv50/ir: copy over value's register id when resolving merge of a phi | Ilia Mirkin | 2016-10-12 | 1 | -1/+3
    The offset needs to be properly copied over to the phi value,
    otherwise it will get assigned to the base of the merge instead of
    the proper location.

    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Cc: mesa-stable@lists.freedesktop.org
* st/mesa: enable ARB_enhanced_layouts and turn the cap on | Nicolai Hähnle | 2016-10-12 | 3 | -3/+3
    v2: mark llvmpipe & softpipe properly as well (Jason Wood)

    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS | Nicolai Hähnle | 2016-10-12 | 15 | -0/+15
    This is a screen cap because drivers are expected to support it
    either for all shader types or for none of them.

    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
* radeonsi: Use the new image load/store intrinsic signatures | Tom Stellard | 2016-10-12 | 1 | -14/+45
    This patch requires LLVM r284024 or newer.

    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Add function for converting LLVM type to intrinsic string | Tom Stellard | 2016-10-12 | 1 | -10/+32
    The existing function only worked for integer types.

    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: Refactor image store/load intrinsic name creation | Tom Stellard | 2016-10-12 | 1 | -11/+18
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: fix R600_DEBUG=precompile for shader-db | Marek Olšák | 2016-10-12 | 1 | -0/+6
    radeonsi no longer supports pixel shaders without interpolation
    optimizations, which led to assertion failures in si_shader_ps when
    running shader-db.

    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: use TC write-back instead of full cache invalidation | Marek Olšák | 2016-10-12 | 3 | -13/+7
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: implement TC L2 write-back (flush) without cache invalidation | Marek Olšák | 2016-10-12 | 2 | -28/+74
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers | Marek Olšák | 2016-10-12 | 1 | -3/+4
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) | Samuel Pitoiset | 2016-10-12 | 1 | -0/+87
    total instructions in shared programs :2286901 -> 2284473 (-0.11%)
    total gprs used in shared programs    :335256 -> 335273 (0.01%)
    total local used in shared programs   :31968 -> 31968 (0.00%)

                 local    gpr   inst  bytes
        helped       0     41    852    852
          hurt       0     44     23     23

    Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
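The rewrite named in the subject line can be illustrated on a toy IR. This is a sketch only: the `toy_*` types and functions are hypothetical and far simpler than the nv50 codegen IR, and the real pass presumably has further conditions (for example on how the shift amount is encoded, or on modifiers, which a later commit in this log touches) that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IR, hypothetical and not the nv50 codegen data structures. */
enum toy_op { TOY_VALUE, TOY_SHL, TOY_ADD, TOY_SHLADD };

struct toy_insn {
    enum toy_op op;
    uint32_t value;           /* used when op == TOY_VALUE */
    struct toy_insn *src[3];  /* operands for the other opcodes */
};

/* Evaluate an instruction tree with wrapping 32-bit arithmetic. */
static uint32_t toy_eval(const struct toy_insn *i)
{
    switch (i->op) {
    case TOY_VALUE:
        return i->value;
    case TOY_SHL:
        return toy_eval(i->src[0]) << (toy_eval(i->src[1]) & 31);
    case TOY_ADD:
        return toy_eval(i->src[0]) + toy_eval(i->src[1]);
    case TOY_SHLADD:
        return (toy_eval(i->src[0]) << (toy_eval(i->src[1]) & 31)) +
               toy_eval(i->src[2]);
    }
    assert(!"unknown opcode");
    return 0;
}

/* Rewrite ADD(SHL(a, b), c) in place to SHLADD(a, b, c).
 * Returns 1 if the pattern matched, 0 otherwise. */
static int toy_fold_shladd(struct toy_insn *add)
{
    assert(add != NULL);
    if (add->op != TOY_ADD)
        return 0;

    /* ADD is commutative, so the SHL may be either source. */
    int s = (add->src[0]->op == TOY_SHL) ? 0 :
            (add->src[1]->op == TOY_SHL) ? 1 : -1;
    if (s < 0)
        return 0;

    struct toy_insn *shl = add->src[s];
    struct toy_insn *c = add->src[1 - s];

    add->op = TOY_SHLADD;
    add->src[0] = shl->src[0];  /* a */
    add->src[1] = shl->src[1];  /* b */
    add->src[2] = c;
    return 1;
}
```

In this model the fold replaces two instructions with one while `toy_eval` returns the same value before and after, which is the kind of saving the instruction-count statistics in the commit message reflect.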
* trace: add invalidate_resource callback | Ilia Mirkin | 2016-10-11 | 1 | -0/+21
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it | Marek Olšák | 2016-10-11 | 1 | -1/+6
    Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* swr: [rasterizer archrast] update proto file | Tim Rowley | 2016-10-11 | 1 | -2/+56
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer archrast] add support for stats files | Tim Rowley | 2016-10-11 | 4 | -20/+57
    Only stat and counter events are saved to the event files.

    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer jitter] remove architecture override | Tim Rowley | 2016-10-11 | 1 | -41/+1
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer jitter] adjust jitmanager assert | Tim Rowley | 2016-10-11 | 1 | -1/+4
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
* swr: [rasterizer] eliminate unused label warnings on gcc | Tim Rowley | 2016-10-11 | 2 | -0/+8
    Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>