summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: teach insnCanLoad() about SHLADDSamuel Pitoiset2016-09-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate. This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior. GF100/GK104: total instructions in shared programs :2838045 -> 2834712 (-0.12%) total gprs used in shared programs :396684 -> 396386 (-0.08%) total local used in shared programs :34416 -> 34416 (0.00%) local gpr inst bytes helped 0 326 1105 1105 hurt 0 55 3 3 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: add preliminary support for SHLADDSamuel Pitoiset2016-09-291-2/+9
| | | | | | | | | | This instruction is available since SM20 (Fermi) and allow to do (a << b) + c in one shot. In some situations, IMAD should be replaced by SHLADD when b is a power of 2, and ADD+SHL should be replaced by SHLADD as well. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: fix comments about instructions infoSamuel Pitoiset2016-09-171-2/+3
| | | | | | | | | The comment for the commutative flags was wrong because OP_MUL is before OP_MAD. While we are at it add missing opcodes, and fix the comment about the short forms. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: allow min/max instructions to be dual-issued in pairsKarol Herbst2016-09-031-2/+12
| | | | | | | | | | | | | | changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0 /benchmark_duration_ms=60000 /width=1024 /height=640: inst_executed: 1.03G inst_issued1: 614M -> 580M inst_issued2: 213M -> 230M score: 1021 -> 1030 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: don't dual-issue ops that depend or interfere with each otherKarol Herbst2016-09-031-0/+6
| | | | | | | Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: rewrite to split up the helpers and move more logic to target] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: Add support for SV_WORK_DIMHans de Goede2016-07-021-0/+1
| | | | | | | | Add support for SV_WORK_DIM for nvc0 and nve4. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nvc0/ir: limit max number of regs based on availability in SMIlia Mirkin2016-05-301-1/+3
| | | | | | | | | This effectively limits registers to 32 and 64 for fermi and kepler when 1024 threads are used, but allows the full amount to be used with smaller thread sizes. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ir: fix a comment in canDualIssue()Samuel Pitoiset2016-05-211-1/+1
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: codegen: Use FILE_MEMORY_BUFFER for buffersHans de Goede2016-04-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commits changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglet on a gf119 and a gk107: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / ./piglit run -o shader -t '.*arb_compute_shader.*' results/shader [20/20] skip: 4, pass: 16 | Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nvc0/ir: be careful about propagating very large offsets into const loadIlia Mirkin2016-01-141-0/+10
| | | | | | | | | | | | | Indirect constbuf indexing works by using very large offsets. However if an indirect constbuf index load is const-propagated, it becomes a very large const offset. Take that into account when legalizing the SSA by moving the high parts of that offset into the file index. Also disallow very large (or small) indices on most other instructions. This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests. Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: add ARB_shader_draw_parameters supportIlia Mirkin2015-12-301-0/+3
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: fix up mul+add -> mad algebraic opt, enable for integersIlia Mirkin2015-12-071-2/+0
| | | | | | | For some reason this has been disabled for integers ever since codegen was merged, despite there being emission code for IMAD. Seems to work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: avoid looking at uninitialized srcMods entriesIlia Mirkin2015-12-031-1/+1
| | | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
* nvc0/ir: Teach insnCanLoad about double immediatesHans de Goede2015-11-061-6/+19
| | | | | | | | | | | | | | | | Teach insnCanLoad about double immediates, together with the "Add support for merge-s to the ConstantFolding pass" This turns the following (nvc0) code: 1: mov u32 $r2 0x00000000 (8) 2: mov u32 $r3 0x3fe00000 (8) 3: add f64 $r0d $r0d $r2d (8) Into: 1: add f64 $r0d $r0d 0.500000 (8) Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: tess factors are now sysvals, adapt codegen to expect thatIlia Mirkin2015-07-231-1/+2
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: no instruction can load a double immediateIlia Mirkin2015-02-201-0/+2
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: avoid array overrun when checking for supported modsIlia Mirkin2014-09-081-1/+1
| | | | | | | Reported by Coverity Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
* nvc0/ir: use SM35 ISA with GK20AAlexandre Courbot2014-05-271-5/+10
| | | | | | | | GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use the GK110 path when this chip is detected. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: maxwell isa has no per-instruction join modifierBen Skeggs2014-05-151-1/+2
| | | | | Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: allow for easier modification of compiler library routinesBen Skeggs2014-05-151-12/+12
| | | | | Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: add support for new bitfield manipulation opcodesIlia Mirkin2014-04-281-1/+6
| | | | | | | | | | This adds support for: IBFE, UBFE, BFI, LSB, IMSB, UMSB, BREV, POPC Which are all required for ARB_gs5 support. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: add support for SAMPLEMASK sysvalIlia Mirkin2014-04-261-0/+1
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: add support for PIPE_CAP_SAMPLE_SHADINGIlia Mirkin2014-04-261-0/+2
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir/gk110: add 64/128-bit fetch/export supportIlia Mirkin2014-03-181-7/+0
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: fixup gk110 and up not being listed in various switch statementsBen Skeggs2013-12-061-4/+8
| | | | Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
* Move nv30, nv50 and nvc0 to nouveau.Johannes Obermayr2013-09-111-0/+604
It is planned to ship openSUSE 13.1 with -shared libs. nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau related targets. This change makes it possible to easily build one shared libnouveau.so which is then LIBADDed. Also dlopen will be faster for one library instead of three and build time on -jX will be reduced. Whitespace fixes were requested by 'git am'. Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de> Acked-by: Christoph Bumiller <christoph.bumiller@speed.at> Acked-by: Ian Romanick <ian.d.romanick@intel.com>