path: root/src/gallium/drivers/nouveau
Commit message | Author | Age | Files | Lines
* nv30 locking fixes | Ilia Mirkin | 2016-11-01 | 2 | -2/+22
* nouveau: more locking - make sure that fence work is always done with the push mutex acquired | Ilia Mirkin | 2016-11-01 | 4 | -4/+17
* WIP nouveau: add locking | Ilia Mirkin | 2016-11-01 | 30 | -45/+372
* nvc0/ir: fix emission of IMAD with NEG modifiers | Samuel Pitoiset | 2016-11-01 | 2 | -2/+2
  The emitter tried to emit sub instead of subr when src0 actually has a NEG modifier.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: "11.0 12.0 13.0" <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit 84e946380b2d5ddc62a107b667be39abf1932704)
* nvc0/ir: fix emission of SHLADD with NEG modifiers | Samuel Pitoiset | 2016-10-27 | 2 | -2/+2
  This affects GF100:GK110 chipsets, but not GM107+ where the logic is a bit different. The emitters tried to emit sub instead of subr when src0 has a NEG modifier.

  This fixes the piglit tests glsl-fs-loop-nested and glsl-vs-loop-nested.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: "13.0" <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit 1ec7227d44dceae8de7b93f846bbd33d66007909)
* nvc0: use correct bufctx when invalidating CP textures | Samuel Pitoiset | 2016-10-27 | 1 | -1/+1
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit 7b2712c367891e96384226a1fa94679a814235d0)
* nvc0: do not break 3D state by pushing MS coordinates on Fermi | Samuel Pitoiset | 2016-10-24 | 1 | -43/+44
  Long story short, 3D and CP are aliased on Fermi, and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect the Fermi generation.

  A possible fix could be to use two different channels, one for 3D and one for CP.

  This fixes a bunch of regressions pinpointed by piglit.

  Fixes: "nvc0: fix up image support for allowing multiple samples"
  Cc: "13.0" <mesa-stable@lists.freedesktop.org>
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  (cherry picked from commit 42273edf79c2500957f51690499aa3405cc689db)
* nv50/ir: process texture offset sources as regular sources | Ilia Mirkin | 2016-10-24 | 1 | -53/+94
  With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc.

  This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Tested-by: Dave Airlie <airlied@redhat.com>
  Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit cd45d758ff87305ceecca899fe7325779bb6755b)
* nv50,nvc0: avoid reading out of bounds when getting bogus so info | Ilia Mirkin | 2016-10-24 | 2 | -2/+8
  The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
  (cherry picked from commit 313fba5ee1de9416930e45da8aff63a24763940b)
* gm107/ir: fix bit offset of tex lod setting for indirect texturing | Ilia Mirkin | 2016-10-18 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* gm107/ir: fix texturing with indirect samplers | Ilia Mirkin | 2016-10-18 | 1 | -0/+10
  The indirect handle has to come right after the coordinates, so if there was a sample/bias/depth compare/offset, everything would end up being shifted by one argument position.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* nv50/ir: constant fold OP_SPLIT | Tobias Klausmann | 2016-10-14 | 1 | -0/+18
  Split the source immediate value into new values and move them into the original defs set by the split. Since we can only have up to 64-bit immediates, this is largely beneficial for F64 (and, in the future, U64) operations.

  Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
  [imirkin: always use U32, set newi for foldCount tracking]
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
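The fold described in this commit can be illustrated with a small standalone C sketch. It is not the nv50 IR code: `split_imm64` and `f64_bits` are hypothetical helpers, and the low-word-first ordering of the two results is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a 64-bit immediate into two 32-bit words.
 * Assumption: out[0] receives the low word, out[1] the high word. */
static void split_imm64(uint64_t imm, uint32_t out[2])
{
    out[0] = (uint32_t)(imm & 0xffffffffu); /* low 32 bits  */
    out[1] = (uint32_t)(imm >> 32);         /* high 32 bits */
}

/* Hypothetical helper: view the raw bits of a double (an F64 immediate). */
static uint64_t f64_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For example, the F64 immediate 1.0 has the bit pattern 0x3ff0000000000000, so the split yields 0x00000000 and 0x3ff00000, and each half can then be moved into its own 32-bit def.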
* nv50: enable ARB_enhanced_layouts | Ilia Mirkin | 2016-10-13 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: be more careful about preserving modifiers in SHLADD creation | Ilia Mirkin | 2016-10-13 | 1 | -7/+5
  src2 was being given the wrong modifier, and we were not properly managing the modifier on the SHL source either.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nvc0: enable ARB_enhanced_layouts | Samuel Pitoiset | 2016-10-13 | 1 | -1/+1
  All ARB_enhanced_layouts piglit tests pass without any changes in our compiler.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: fix textureGather with a single offset | Ilia Mirkin | 2016-10-12 | 1 | -2/+2
  A recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array containing null values to tell whether they should be added onto the srcs list.

  Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* nv50/ir: copy over value's register id when resolving merge of a phi | Ilia Mirkin | 2016-10-12 | 1 | -1/+3
  The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS | Nicolai Hähnle | 2016-10-12 | 3 | -0/+3
  This is a screen cap because drivers are expected to support it either for all shader types or for none of them.

  Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
  Reviewed-by: Dave Airlie <airlied@redhat.com>
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) | Samuel Pitoiset | 2016-10-12 | 1 | -0/+87
  total instructions in shared programs : 2286901 -> 2284473 (-0.11%)
  total gprs used in shared programs    : 335256 -> 335273 (0.01%)
  total local used in shared programs   : 31968 -> 31968 (0.00%)

            local   gpr   inst   bytes
  helped        0    41    852     852
  hurt          0    44     23      23

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: fix valid range for shader buffers | Samuel Pitoiset | 2016-10-10 | 3 | -0/+3
  When offset != 0, the valid range was wrong because the second argument of util_range_add() is end, not size.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
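The end-versus-size mix-up is easy to reproduce in isolation. The following C sketch uses a hypothetical `struct range` and `range_add()` as stand-ins for the Mesa helper (they are not the Mesa definitions) and shows the corrected call shape, where the upper bound is `offset + size`.

```c
#include <assert.h>

/* Hypothetical stand-in for a valid-range tracker; not the Mesa struct. */
struct range {
    unsigned start; /* first valid byte */
    unsigned end;   /* one past the last valid byte */
};

/* Start with an empty range. */
static void range_init(struct range *r)
{
    r->start = ~0u;
    r->end = 0;
}

/* Grow the range to cover [start, end). The last parameter is an end
 * offset, not a size. */
static void range_add(struct range *r, unsigned start, unsigned end)
{
    assert(start <= end);
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}

/* Mark `size` bytes at `offset` as valid. Passing `size` directly as the
 * end would only be correct when offset == 0. */
static void mark_buffer_valid(struct range *r, unsigned offset, unsigned size)
{
    range_add(r, offset, offset + size);
}
```

With offset 256 and size 64, the tracked range becomes [256, 320); passing the size as the end would have requested the inverted interval [256, 64).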
* nvc0/ir: fix overwriting of value backing non-constant gather offset | Ilia Mirkin | 2016-10-10 | 1 | -2/+2
  Normally the value is an immediate, which is moved to some temporary, so there's no problem. In the case of a non-constant offset (as allowed by ARB_gpu_shader5), we have to take care to copy it first before using it to build up the bits.

  This fixes a compilation error observed in F1 2015.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* nv50/ir: only stick one preret per function | Ilia Mirkin | 2016-10-10 | 1 | -4/+7
  A function with multiple returns would have had multiple preret settings at the top of the function. While this is unlikely to have caused issues since we don't use functions in earnest, it could have in some cases overflowed the call stack, in case a function had a lot of early returns.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ir: fix wrong check when optimizing MAD to SHLADD | Samuel Pitoiset | 2016-10-07 | 1 | -1/+1
  Checking if MAD is supported is definitely wrong, and it's most likely a typo I introduced a few days ago which breaks NV50 because SHLADD is not supported there.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: dump program binary only when NV50_PROG_DEBUG is set | Samuel Pitoiset | 2016-10-07 | 1 | -1/+1
  When the chipset is forced with NV50_PROG_CHIPSET, we actually only want to output the binary if NV50_PROG_DEBUG is also enabled. Otherwise, this pollutes the shader-db output.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nvc0: expose ARB_compute_variable_group_size | Samuel Pitoiset | 2016-10-07 | 1 | -2/+6
  Only expose 512 threads/block on Fermi to not be limited by 32 GPRs/thread.

  v4:
  - use 512 threads on Fermi, 1024 on Kepler+

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ir: set number of threads/block for variable local size | Samuel Pitoiset | 2016-10-07 | 1 | -0/+2
  When a variable local size is defined as specified by ARB_compute_variable_group_size, the fixed local size is set to 0 and a SIGFPE occurs when we compute the maximum number of regs.

  This allows using 64 GPRs/thread.

  v4:
  - use 512 threads on Fermi, 1024 on Kepler+

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
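The SIGFPE mentioned here is an integer division by a thread count of zero. A minimal C sketch of the guard follows; `REGFILE_SIZE` and `max_gprs_per_thread()` are hypothetical, and the 32768-register figure is an assumption chosen only because it is consistent with the 32 and 64 GPRs/thread numbers quoted in these commits.

```c
#include <assert.h>

/* Assumption for illustration: registers shared by all threads of a block. */
#define REGFILE_SIZE 32768u

/* fixed_threads is the compile-time local size, or 0 when the shader
 * declares a variable local size. In that case fall back to the maximum
 * variable block size (512 on Fermi, 1024 on Kepler+ per the commit). */
static unsigned max_gprs_per_thread(unsigned fixed_threads,
                                    unsigned max_variable_threads)
{
    unsigned threads = fixed_threads ? fixed_threads : max_variable_threads;

    /* Dividing by zero here is what would raise SIGFPE. */
    assert(threads != 0);
    return REGFILE_SIZE / threads;
}
```

Under that assumption, 512 threads/block leaves 64 GPRs/thread, while 1024 threads/block would leave only 32. A real compiler would additionally clamp the result to the per-thread hardware limit, which this sketch omits.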
* gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK | Samuel Pitoiset | 2016-10-07 | 2 | -0/+4
  v3:
  - use a new case statement in r600_pipe_common.c
  - fix compilation of softpipe...

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* nv50/ir: optimize sub(a, 0) to a | Karol Herbst | 2016-10-06 | 1 | -0/+3
  helped some ue4 demos and divinity OS shaders

  total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
  total gprs used in shared programs    : 379273 -> 379273 (0.00%)
  total local used in shared programs   : 9505 -> 9505 (0.00%)
  total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

            local   gpr   inst   bytes
  helped        0     0     33      33
  hurt          0     0      0       0

  Signed-off-by: Karol Herbst <karolherbst@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
* nvc0: dump program binary when chipset has been forced | Samuel Pitoiset | 2016-10-05 | 1 | -0/+5
  Currently, program binaries are only dumped at upload time, but when the chipset has been forced via NV50_PROG_CHIPSET we might want to show the generated code, especially with shaderdb.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ra: let simplify return an error and handle that | Karol Herbst | 2016-10-05 | 1 | -5/+7
  fixes a crash in the case simplify reports an error

  Signed-off-by: Karol Herbst <karolherbst@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ir: teach insnCanLoad() about SHLADD | Samuel Pitoiset | 2016-09-29 | 1 | -0/+2
  Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate.

  This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior.

  GF100/GK104:
  total instructions in shared programs : 2838045 -> 2834712 (-0.12%)
  total gprs used in shared programs    : 396684 -> 396386 (-0.08%)
  total local used in shared programs   : 34416 -> 34416 (0.00%)

            local   gpr   inst   bytes
  helped        0   326   1105    1105
  hurt          0    55      3       3

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c) | Samuel Pitoiset | 2016-09-29 | 1 | -0/+3
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b) | Samuel Pitoiset | 2016-09-29 | 1 | -0/+8
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
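The two SHLADD folds (all-immediate operands folded to a single value, and a zero addend reduced to a plain shift) rest on simple identities. The C sketch below models them with hypothetical reference functions `shladd()` and `shl()`; it assumes 32-bit unsigned wrap-around semantics and is not the optimizer code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference semantics: SHLADD(a, b, c) = (a << b) + c, computed
 * modulo 2^32. */
static uint32_t shladd(uint32_t a, uint32_t b, uint32_t c)
{
    assert(b < 32); /* larger shift counts are undefined in C */
    return (a << b) + c;
}

/* Assumed reference semantics: SHL(a, b) = a << b. */
static uint32_t shl(uint32_t a, uint32_t b)
{
    assert(b < 32);
    return a << b;
}
```

When a, b and c are all immediates, the compiler can evaluate `shladd(a, b, c)` at compile time and emit a MOV of the result; when c is 0, `shladd(a, b, 0)` equals `shl(a, b)` for every a and b, so the cheaper SHL can be emitted instead.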
* nv50/ir: optimize IMAD to SHLADD in presence of power of 2 | Samuel Pitoiset | 2016-09-29 | 1 | -0/+7
  We can replace IMAD by SHLADD if and only if src1 is a power of 2.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
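The power-of-2 condition can be checked with a standalone C sketch. All names here (`is_pow2`, `log2_pow2`, `imad`, `imad_as_shladd`) are hypothetical, and the sketch assumes 32-bit unsigned arithmetic with wrap-around; it only demonstrates that a * 2^k + c and (a << k) + c agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set. */
static bool is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Shift amount k such that x == 1 << k; only valid for powers of 2. */
static unsigned log2_pow2(uint32_t x)
{
    unsigned shift = 0;

    assert(is_pow2(x));
    while ((x >>= 1) != 0)
        shift++;
    return shift;
}

/* Reference integer multiply-add: a * b + c modulo 2^32. */
static uint32_t imad(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}

/* The rewritten form; b must be a power of 2 for this to be legal. */
static uint32_t imad_as_shladd(uint32_t a, uint32_t b, uint32_t c)
{
    return (a << log2_pow2(b)) + c;
}
```

For a multiplier such as 12 the check fails and the IMAD has to stay; for 8 the multiply becomes a shift by 3.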
* nvc0/ir: add emission for SHLADD | Samuel Pitoiset | 2016-09-29 | 3 | -0/+127
  Unfortunately, we can't use the emit helpers for GF100/GK110 because src1 and src2 are swapped.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: add preliminary support for SHLADD | Samuel Pitoiset | 2016-09-29 | 5 | -7/+17
  This instruction has been available since SM20 (Fermi) and allows computing (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: update GM107 sched control codes format | Samuel Pitoiset | 2016-09-29 | 2 | -23/+23
  envyas now uses a much better representation for those control codes and it displays the different flags instead of an unreadable hex number.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: fix comments about instructions info | Samuel Pitoiset | 2016-09-26 | 1 | -2/+3
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: allow to force compiling programs in debug build | Samuel Pitoiset | 2016-09-26 | 1 | -9/+10
  This adds a new envvar called NV50_PROG_CHIPSET which allows compiling shaders for a different target, which is especially useful for shader-db.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: drop unused NVISA_XXX_CHIPSET constants | Samuel Pitoiset | 2016-09-26 | 1 | -2/+0
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: get rid of nvc0_stage_sampler_states_bind_range() | Samuel Pitoiset | 2016-09-19 | 1 | -74/+9
  Same thing as nvc0_stage_set_sampler_views_range().

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: get rid of nvc0_stage_set_sampler_views_range() | Samuel Pitoiset | 2016-09-19 | 1 | -89/+15
  This function was quite similar to nvc0_stage_set_sampler_views() and I don't see any reason not to remove it.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: optimize SUB(a, b) to MOV(a - b) | Samuel Pitoiset | 2016-09-18 | 1 | -0/+10
  This helps shaders in UE4 demos, especially with Elemental (+1% perf). This optimization reduces spilling usage in one shader which explains the little gain.

  GF100/GK104:
  total instructions in shared programs : 2838551 -> 2838045 (-0.02%)
  total gprs used in shared programs    : 396706 -> 396684 (-0.01%)
  total local used in shared programs   : 34432 -> 34416 (-0.05%)

            local   gpr   inst   bytes
  helped        1    19    112     112
  hurt          0     0      0       0

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gk110/ir: fix wrong emission of OP_NOT | Samuel Pitoiset | 2016-09-18 | 1 | -1/+1
  This should emit src0 instead of src1. Found by inspection.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: mesa-stable@lists.freedesktop.org
* nvc0/ir: fix subops for IMAD | Samuel Pitoiset | 2016-09-17 | 1 | -4/+6
  The offset was wrong: it's at bit 8, not 4. Also, use subr instead of sub when src2 has neg. Similar to GK110 now.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: mesa-stable@lists.freedesktop.org
* nvc0/ir: fix comments about instructions info | Samuel Pitoiset | 2016-09-17 | 1 | -2/+3
  The comment for the commutative flags was wrong because OP_MUL is before OP_MAD. While we are at it, add missing opcodes, and fix the comment about the short forms.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gm107/ir: allow indirect inputs to be loaded by frag shader | Ilia Mirkin | 2016-09-10 | 2 | -5/+21
  Looks like the GM107 IPA op does not allow a separate offset when using an indirect register. Instead we must use AL2P like we do for indirect vertex operations on Kepler+.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* gm107/ir: AL2P writes to a predicate register | Ilia Mirkin | 2016-09-10 | 1 | -0/+1
  We have to force it to write to predicate 7 (aka PT) in order for it not to mess up another predicate. Unclear what would be returned in the predicate, perhaps an error code for out-of-bounds requests. Blob doesn't seem to check it.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Cc: mesa-stable@lists.freedesktop.org
* gallium: remove PIPE_BIND_TRANSFER_READ/WRITE | Marek Olšák | 2016-09-08 | 3 | -12/+6
  not used in any useful way

  Reviewed-by: Eric Anholt <eric@anholt.net>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* gk110/ir: fix quadop dall emission | Ilia Mirkin | 2016-09-04 | 1 | -2/+2
  We recently started to always emit the NDV (== dall) bit for quadops. However, it was folded into the wrong code word.

  Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP)
  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Cc: <mesa-stable@lists.freedesktop.org>