author | Chandler Carruth <chandlerc@gmail.com> | 2013-08-22 12:45:17 +0000 |
---|---|---|
committer | Chandler Carruth <chandlerc@gmail.com> | 2013-08-22 12:45:17 +0000 |
commit | 474be0d0f83eb6543bd4091946b40bb4967a3c11 (patch) | |
tree | c204219411036c5974a32ab64afecd02125a6d87 /lib/Transforms/Vectorize | |
parent | bad8d4ca599024de8fdc6255a4b73bb294f49239 (diff) | |

Teach the SLP vectorizer the correct way to check for consecutive access
using GEPs. Previously, it used a number of different heuristics for
analyzing the GEPs. Several of these were conservatively correct, but
failed to fall back to SCEV even when SCEV might have given a reasonable
answer. One was simply incorrect in how it was formulated.

There was good code already to recursively evaluate the constant offsets
in GEPs, look through pointer casts, etc. In a previous commit I gathered
this into a form that code like the SLP vectorizer can use, which allows
all of this code to become quite simple.
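
To illustrate the idea of stripping constant offsets before comparing pointers, here is a minimal standalone sketch. It does not use LLVM: `ToyPtr`, `stripAndAccumulate`, and `isConsecutiveToy` are hypothetical names invented for this illustration, and the toy version simply answers "no" where the real code would fall back to SCEV.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an LLVM pointer value: either a root pointer
// (Base == nullptr) or a GEP-like node at a constant byte offset from Base.
struct ToyPtr {
  const ToyPtr *Base;  // nullptr for a root (dynamic) pointer
  int64_t ConstOffset; // byte offset from Base; ignored for roots
};

// Walk the chain of constant offsets down to the root, adding each layer's
// offset to Offset. This loosely mirrors the role that
// stripAndAccumulateInBoundsConstantOffsets plays for real GEP chains.
const ToyPtr *stripAndAccumulate(const ToyPtr *P, int64_t &Offset) {
  while (P->Base) {
    Offset += P->ConstOffset;
    P = P->Base;
  }
  return P;
}

// Returns true when B is known to sit exactly Size bytes after A.
// Assumption: when the roots differ, this toy conservatively returns false
// instead of asking a symbolic analysis such as SCEV.
bool isConsecutiveToy(const ToyPtr *A, const ToyPtr *B, int64_t Size) {
  assert(Size > 0 && "element size must be positive");
  if (A == B)
    return false;
  int64_t OffsetA = 0, OffsetB = 0;
  const ToyPtr *RootA = stripAndAccumulate(A, OffsetA);
  const ToyPtr *RootB = stripAndAccumulate(B, OffsetB);
  if (RootA != RootB)
    return false;
  return OffsetB - OffsetA == Size;
}
```

Because both chains are reduced to a (root, offset) pair first, it no longer matters whether one pointer is a GEP of the other or both are GEPs of a shared base; the same comparison covers each case.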

There is some performance (compile time) concern here at first glance, as
we're directly attempting to walk both pointers' constant GEP chains.
However, a couple of thoughts:
1) In the very common case where there is a dynamic pointer and a second
pointer at a constant offset (usually a stride) from it, this code
will not actually do any unnecessary work.
2) InstCombine and other passes work very hard to collapse constant
GEPs, so it will be rare that we iterate here for a long time.

That said, if there remain performance problems here, there are some
obvious things that can improve the situation immensely. Adding
a vectorizer-pass-wide memoizer for each individual layer of pointer
values, their base values, and the constant offset is likely to
completely remove redundant work and strictly limit the scaling of
the work to scrape these GEPs. Since this optimization was not done on
the prior version (which would still benefit from it), I've not done it
here. But if folks have benchmarks that slow down, it should be
straightforward for them to add.
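
As a rough sketch of what such a memoizer might look like, the standalone example below caches the (base, constant offset) pair for every layer of a pointer chain. It is not part of this commit and does not use LLVM types: `ToyPtr`, `OffsetMemo`, and the `steps()` counter are hypothetical names chosen only for illustration.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a pointer value: a root (Base == nullptr) or a
// GEP-like layer at a constant byte offset from Base.
struct ToyPtr {
  const ToyPtr *Base;
  int64_t ConstOffset;
};

// Pass-wide cache mapping each pointer layer to its root and total offset.
class OffsetMemo {
public:
  using Entry = std::pair<const ToyPtr *, int64_t>;

  Entry lookup(const ToyPtr *P) {
    auto It = Cache.find(P);
    if (It != Cache.end())
      return It->second; // already resolved: no walking needed

    Entry Result{P, 0}; // a root is its own base at offset 0
    if (P->Base) {
      ++Steps; // count each layer that is walked without a cache hit
      Entry Inner = lookup(P->Base);
      Result = {Inner.first, Inner.second + P->ConstOffset};
    }
    Cache.emplace(P, Result);
    return Result;
  }

  // Number of GEP-like layers walked so far (cache misses only).
  unsigned steps() const { return Steps; }

private:
  std::unordered_map<const ToyPtr *, Entry> Cache;
  unsigned Steps = 0;
};
```

With a cache like this, each layer is walked at most once per pass, so repeated queries over overlapping chains cost a single map lookup after the first resolution.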

I've added a test case, but I'm not really confident of the amount of
testing done for different access patterns, strides, and pointer
manipulation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189007 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'lib/Transforms/Vectorize')
-rw-r--r-- | lib/Transforms/Vectorize/SLPVectorizer.cpp | 64 |
1 files changed, 15 insertions, 49 deletions
diff --git a/lib/Transforms/Vectorize/SLPVectorizer.cpp b/lib/Transforms/Vectorize/SLPVectorizer.cpp
index c9b8e7b..b1f097e 100644
--- a/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -992,63 +992,29 @@ bool BoUpSLP::isConsecutiveAccess(Value *A, Value *B) {
   if (PtrA == PtrB || PtrA->getType() != PtrB->getType())
     return false;
 
-  // Calculate a constant offset from the base pointer without using SCEV
-  // in the supported cases.
-  // TODO: Add support for the case where one of the pointers is a GEP that
-  // uses the other pointer.
-  GetElementPtrInst *GepA = dyn_cast<GetElementPtrInst>(PtrA);
-  GetElementPtrInst *GepB = dyn_cast<GetElementPtrInst>(PtrB);
-
-  unsigned BW = DL->getPointerSizeInBits(ASA);
+  unsigned PtrBitWidth = DL->getPointerSizeInBits(ASA);
   Type *Ty = cast<PointerType>(PtrA->getType())->getElementType();
-  int64_t Sz = DL->getTypeStoreSize(Ty);
+  APInt Size(PtrBitWidth, DL->getTypeStoreSize(Ty));
 
-  // Check if PtrA is the base and PtrB is a constant offset.
-  if (GepB && GepB->getPointerOperand() == PtrA) {
-    APInt Offset(BW, 0);
-    if (GepB->accumulateConstantOffset(*DL, Offset))
-      return Offset.getSExtValue() == Sz;
-    return false;
-  }
+  APInt OffsetA(PtrBitWidth, 0), OffsetB(PtrBitWidth, 0);
+  PtrA = PtrA->stripAndAccumulateInBoundsConstantOffsets(*DL, OffsetA);
+  PtrB = PtrB->stripAndAccumulateInBoundsConstantOffsets(*DL, OffsetB);
 
-  // Check if PtrB is the base and PtrA is a constant offset.
-  if (GepA && GepA->getPointerOperand() == PtrB) {
-    APInt Offset(BW, 0);
-    if (GepA->accumulateConstantOffset(*DL, Offset))
-      return Offset.getSExtValue() == -Sz;
-    return false;
-  }
+  APInt OffsetDelta = OffsetB - OffsetA;
 
-  // If both pointers are GEPs:
-  if (GepA && GepB) {
-    // Check that they have the same base pointer and number of indices.
-    if (GepA->getPointerOperand() != GepB->getPointerOperand() ||
-        GepA->getNumIndices() != GepB->getNumIndices())
-      return false;
+  // Check if they are based on the same pointer. That makes the offsets
+  // sufficient.
+  if (PtrA == PtrB)
+    return OffsetDelta == Size;
 
-    // Try to strip the geps. This makes SCEV faster.
-    // Make sure that all of the indices except for the last are identical.
-    int LastIdx = GepA->getNumIndices();
-    for (int i = 0; i < LastIdx - 1; i++) {
-      if (GepA->getOperand(i+1) != GepB->getOperand(i+1))
-        return false;
-    }
-
-    PtrA = GepA->getOperand(LastIdx);
-    PtrB = GepB->getOperand(LastIdx);
-    Sz = 1;
-  }
-
-  ConstantInt *CA = dyn_cast<ConstantInt>(PtrA);
-  ConstantInt *CB = dyn_cast<ConstantInt>(PtrB);
-  if (CA && CB) {
-    return (CA->getSExtValue() + Sz == CB->getSExtValue());
-  }
+  // Compute the necessary base pointer delta to have the necessary final delta
+  // equal to the size.
+  APInt BaseDelta = Size - OffsetDelta;
 
-  // Calculate the distance.
+  // Otherwise compute the distance with SCEV between the base pointers.
   const SCEV *PtrSCEVA = SE->getSCEV(PtrA);
   const SCEV *PtrSCEVB = SE->getSCEV(PtrB);
-  const SCEV *C = SE->getConstant(PtrSCEVA->getType(), Sz);
+  const SCEV *C = SE->getConstant(BaseDelta);
   const SCEV *X = SE->getAddExpr(PtrSCEVA, C);
   return X == PtrSCEVB;
 }
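
The `BaseDelta` step in the new code can be read as plain arithmetic: the accesses are consecutive when `BaseB + OffsetB == BaseA + OffsetA + Size`, which rearranges to `BaseB == BaseA + (Size - (OffsetB - OffsetA))`. The sketch below works through that with concrete numbers. It is only an illustration: it uses `int64_t` in place of pointer-width `APInt` values, concrete addresses in place of SCEV expressions, and the function names are hypothetical.

```cpp
#include <cstdint>

// Distance that the stripped base of B must lie beyond the stripped base of
// A for the final addresses to differ by exactly Size bytes.
int64_t requiredBaseDelta(int64_t Size, int64_t OffsetA, int64_t OffsetB) {
  int64_t OffsetDelta = OffsetB - OffsetA;
  return Size - OffsetDelta;
}

// Toy replacement for the SCEV comparison: with concrete base addresses,
// check whether BaseA + BaseDelta lands exactly on BaseB.
bool consecutiveWithKnownBases(int64_t BaseA, int64_t OffsetA, int64_t BaseB,
                               int64_t OffsetB, int64_t Size) {
  return BaseA + requiredBaseDelta(Size, OffsetA, OffsetB) == BaseB;
}
```

For example, with `Size = 4`, `OffsetA = 8`, and `OffsetB = 4`, the offset delta is `-4`, so the bases must be `8` bytes apart: base addresses `100` and `108` give final addresses `108` and `112`, which are consecutive 4-byte accesses.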