| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.
This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.
GF100/GK104:
total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs :396684 -> 396386 (-0.08%)
total local used in shared programs :34416 -> 34416 (0.00%)
local gpr inst bytes
helped 0 326 1105 1105
hurt 0 55 3 3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
| |
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=60000 /width=1024 /height=640:
inst_executed: 1.03G
inst_issued1: 614M -> 580M
inst_issued2: 213M -> 230M
score: 1021 -> 1030
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: rewrite to split up the helpers and move more logic to target]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
| |
Add support for SV_WORK_DIM for nvc0 and nve4.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.
This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.
Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.
Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.
This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.
Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
For some reason this has been disabled for integers ever since codegen
was merged, despite there being emission code for IMAD. Seems to work.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
| |
Reported by Coverity
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
|
| |
GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use
the GK110 path when this chip is detected.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
| |
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
This adds support for:
IBFE, UBFE, BFI, LSB, IMSB, UMSB, BREV, POPC
Which are all required for ARB_gs5 support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
It is planned to ship openSUSE 13.1 with -shared libs.
nouveau.la, nv30.la, nv50.la and nvc0.la are currently LIBADDs in all nouveau
related targets.
This change makes it possible to easily build one shared libnouveau.so which is
then LIBADDed.
Also dlopen will be faster for one library instead of three and build time on
-jX will be reduced.
Whitespace fixes were requested by 'git am'.
Signed-off-by: Johannes Obermayr <johannesobermayr@gmx.de>
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
|