author | Quentin Colombet <qcolombet@apple.com> | 2013-07-23 22:34:47 +0000
---|---|---
committer | Quentin Colombet <qcolombet@apple.com> | 2013-07-23 22:34:47 +0000
commit | 17f99a991f2e270a34c53854ce80acc30754537b (patch) |
tree | 5c2d54fa2e1e32fe01a1a8b85db83711b2799243 /test |
parent | 00d92eee327b7ac9d91bc804843f70dea5dfc068 (diff) |
[ARM][ISel] Improve the lowering of vector loads.
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general-purpose
unit to the vector unit.
When the value comes from a load, this can be simplified into a vector
load into the right lane.
This patch changes the lowering of insert_vector_elt to expose a
vector-friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186999 91177308-0d34-0410-b5e6-96231b3b80d8
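To make the pattern concrete, here is a minimal sketch (not part of the patch; the function name @single_lane and the register numbers in the comments are illustrative) of a vector built from a single loaded value, the case this lowering targets:

```llvm
; Lane 0 of a <2 x i32> comes straight from memory (2013-era typed-pointer IR).
define <2 x i32> @single_lane(i32* %p) {
entry:
  %val = load i32* %p, align 4
  %vec = insertelement <2 x i32> undef, i32 %val, i32 0
  ret <2 x i32> %vec
}
```

Previously the scalar_to_vector node became a GPR-to-NEON transfer, roughly an `ldr r0, [r0]` followed by `vmov.32 d0[0], r0`; with this change the load can be emitted directly into the lane as `vld1.32 {d0[0]}, [r0]`, which is what the CHECK line in the test below matches.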
Diffstat (limited to 'test')
-rw-r--r-- | test/CodeGen/ARM/vector-DAGCombine.ll | 14 |
1 file changed, 14 insertions, 0 deletions
diff --git a/test/CodeGen/ARM/vector-DAGCombine.ll b/test/CodeGen/ARM/vector-DAGCombine.ll
index 6d586f2..3e13819 100644
--- a/test/CodeGen/ARM/vector-DAGCombine.ll
+++ b/test/CodeGen/ARM/vector-DAGCombine.ll
@@ -184,3 +184,17 @@ entry:
 
 ; Function Attrs: nounwind readnone
 declare <8 x i16> @llvm.arm.neon.vmullu.v8i16(<8 x i8>, <8 x i8>)
+
+; Check that (insert_vector_elt (load)) => (vector_load).
+; Thus, check that scalar_to_vector does not interfere with that.
+define <8 x i16> @t4(i8* nocapture %sp0) {
+; CHECK: t4
+; CHECK: vld1.32 {{{d[0-9]+}}[0]}, [r0]
+entry:
+  %pix_sp0.0.cast = bitcast i8* %sp0 to i32*
+  %pix_sp0.0.copyload = load i32* %pix_sp0.0.cast, align 1
+  %vec = insertelement <2 x i32> undef, i32 %pix_sp0.0.copyload, i32 0
+  %0 = bitcast <2 x i32> %vec to <8 x i8>
+  %vmull.i = tail call <8 x i16> @llvm.arm.neon.vmullu.v8i16(<8 x i8> %0, <8 x i8> %0)
+  ret <8 x i16> %vmull.i
+}
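The hunk does not include the file's RUN line; like other tests under test/CodeGen/ARM, this one is driven by llc piped into FileCheck. A sketch of what such a RUN line typically looks like is below; the triple and attribute flags are assumptions, not copied from the file:

```llvm
; Hypothetical RUN line (the actual one is outside this hunk):
; RUN: llc < %s -mtriple=armv7-none-linux-gnueabi -mattr=+neon | FileCheck %s
```

The vld1.32 CHECK line then verifies that the i32 load feeding the insertelement is folded into a single lane load rather than an ldr/vmov pair.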