aboutsummaryrefslogtreecommitdiffstats
path: root/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* X86: Match pmin/pmax as a target specific dag combine. This occurs during ↵Benjamin Kramer2012-12-212-3/+2790
| | | | | | | | vectorization. Part of PR14667. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170908 91177308-0d34-0410-b5e6-96231b3b80d8
* R600: Expand vec4 INT <-> FP conversionsTom Stellard2012-12-211-0/+52
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170901 91177308-0d34-0410-b5e6-96231b3b80d8
* Add test case for r170674Reed Kotler2012-12-211-0/+29
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170823 91177308-0d34-0410-b5e6-96231b3b80d8
* Move these files over to the debug info directory.Eric Christopher2012-12-212-112/+0
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170810 91177308-0d34-0410-b5e6-96231b3b80d8
* Revert "Adding support for llvm.arm.neon.vaddl[su].* and"Bob Wilson2012-12-202-128/+0
| | | | | | | This reverts r170694. The operations can be represented in IR without adding any new intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170765 91177308-0d34-0410-b5e6-96231b3b80d8
* On some ARM cpus, flags setting movs with shifter operand, i.e. lsl, lsr, asr,Evan Cheng2012-12-201-13/+69
| | | | | | | | | | are more expensive than the non-flag setting variant. Teach thumb2 size reduction pass to avoid generating them unless we are optimizing for size. rdar://12892707 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170728 91177308-0d34-0410-b5e6-96231b3b80d8
* Simplify the testcase a bit.Rafael Espindola2012-12-201-15/+4
| | | | | | I checked that it would still crash llc before the corresponding fix. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170709 91177308-0d34-0410-b5e6-96231b3b80d8
* Adding support for llvm.arm.neon.vaddl[su].* andRenato Golin2012-12-202-0/+128
| | | | | | | | | | llvm.arm.neon.vsub[su].* intrinsics. Patch by Pete Couperus <pjcoup@gmail.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170694 91177308-0d34-0410-b5e6-96231b3b80d8
* fix most of remaining issues with large frames.Reed Kotler2012-12-201-2/+2
| | | | | | | | | | | | these patches are tested a lot by test-suite but make check tests are forthcoming once the next few patches that complete this are committed. with the next few patches the pass rate for mips16 is near 100% git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170656 91177308-0d34-0410-b5e6-96231b3b80d8
* [mips] Use "or $r0, $r1, $zero" instead of "addu $r0, $zero, $r1" to copyAkira Hatanaka2012-12-206-18/+18
| | | | | | | | | | | | physical register $r1 to $r0. GNU disassembler recognizes an "or" instruction as a "move", and this change makes the disassembled code easier to read. Original patch by Reed Kotler. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170655 91177308-0d34-0410-b5e6-96231b3b80d8
* Do not introduce vector operations in functions marked with noimplicitfloat.Bob Wilson2012-12-201-0/+17
| | | | | | <rdar://problem/12879313> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170630 91177308-0d34-0410-b5e6-96231b3b80d8
* LLVM sdisel normalize bit extraction of the form:Evan Cheng2012-12-191-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ((x & 0xff00) >> 8) << 2 to (x >> 6) & 0x3fc This is general goodness since it folds a left shift into the mask. However, the trailing zeros in the mask prevents the ARM backend from using the bit extraction instructions. And worse since the mask materialization may require an addition instruction. This comes up fairly frequently when the result of the bit twiddling is used as memory address. e.g. = ptr[(x & 0xFF0000) >> 16] We want to generate: ubfx r3, r1, #16, #8 ldr.w r3, [r0, r3, lsl #2] vs. mov.w r9, #1020 and.w r2, r9, r1, lsr #14 ldr r2, [r0, r2] Add a late ARM specific isel optimization to ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the 'base + offset' address computation; change the mask to one which doesn't have trailing zeros and enable the use of ubfx. Note the optimization has to be done late since it's target specific and we don't want to change the DAG normalization. It's also fairly restrictive as shifter operands are not always free. It's only done for lsh 1 / 2. It's known to be free on some cpus and they are most common for address computation. This is a slight win for blowfish, rijndael, etc. rdar://12870177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170581 91177308-0d34-0410-b5e6-96231b3b80d8
* PowerPC: Expand VSELECT nodes.Benjamin Kramer2012-12-191-0/+7
| | | | | | | | There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170551 91177308-0d34-0410-b5e6-96231b3b80d8
* Optimized load + SIGN_EXTEND patterns in the X86 backend.Elena Demikhovsky2012-12-193-3/+98
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170506 91177308-0d34-0410-b5e6-96231b3b80d8
* After reducing the size of an operation in the DAG we zero-extend the reducedNadav Rotem2012-12-191-0/+21
| | | | | | | | | | bitwidth op back to the original size. If we reduce ANDs then this can cause an endless loop. This patch changes the ZEXT to ANY_EXTEND if the demanded bits are equal or smaller than the size of the reduced operation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170505 91177308-0d34-0410-b5e6-96231b3b80d8
* Teach SimplifySetCC that comparing AssertZext i1 against a constant 1 can be ↵Craig Topper2012-12-191-0/+15
| | | | | | rewritten as a compare against a constant 0 with the opposite condition. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170495 91177308-0d34-0410-b5e6-96231b3b80d8
* Disable ARM partial flag dependency optimization at -OzQuentin Colombet2012-12-181-0/+34
| | | | | | | | | To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency). Disables this check when code size is the priority, i.e., code is compiled with -Oz. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170462 91177308-0d34-0410-b5e6-96231b3b80d8
* MISched: add dependence to ExitSU to model live-out latency.Andrew Trick2012-12-181-0/+48
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170454 91177308-0d34-0410-b5e6-96231b3b80d8
* Check multiple register classes for inline asm tied registersHal Finkel2012-12-181-0/+22
| | | | | | | | | | | | | | | | | | A register can be associated with several distinct register classes. For example, on PPC, the floating point registers are each associated with both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code would fail when provided with a floating point register and an f64 operand because it would happen to find the register in the F4RC class first and return that. From the F4RC class, SDAG would extract f32 as the register type and then assert because of the invalid implied conversion between the f64 value and the f32 register. Instead, search all register classes. If a register class containing the the requested register has the requested type, then return that register class. Otherwise, as before, return the first register class found that contains the requested register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170436 91177308-0d34-0410-b5e6-96231b3b80d8
* Add rest of BMI/BMI2 instructions to the folding tables as well as popcnt ↵Craig Topper2012-12-171-0/+76
| | | | | | and lzcnt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170304 91177308-0d34-0410-b5e6-96231b3b80d8
* This patch is needed to make c++ exceptions work for mips16.Reed Kotler2012-12-161-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mips16 is really a processor decoding mode (ala thumb 1) and in the same program, mips16 and mips32 functions can exist and can call each other. If a jal type instruction encounters an address with the lower bit set, then the processor switches to mips16 mode (if it is not already in it). If the lower bit is not set, then it switches to mips32 mode. The linker knows which functions are mips16 and which are mips32. When relocation is performed on code labels, this lower order bit is set if the code label is a mips16 code label. In general this works just fine, however when creating exception handling tables and dwarf, there are cases where you don't want this lower order bit added in. This has been traditionally distinguished in gas assembly source by using a different syntax for the label. lab1: ; this will cause the lower order bit to be added lab2=. ; this will not cause the lower order bit to be added In some cases, it does not matter because in dwarf and debug tables the difference of two labels is used and in that case the lower order bits subtract each other out. To fix this, I have added to mcstreamer the notion of a debuglabel. The default is for label and debug label to be the same. So calling EmitLabel and EmitDebugLabel produce the same result. For various reasons, there is only one set of labels that needs to be modified for the mips exceptions to work. These are the "$eh_func_beginXXX" labels. Mips overrides the debug label suffix from ":" to "=." . This initial patch fixes exceptions. More changes most likely will be needed to DwarfCFException to make all of this work for actual debugging. These changes will be to emit debug labels in some places where a simple label is emitted now. Some historical discussion on this from gcc can be found at: http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00623.html http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01273.html git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170279 91177308-0d34-0410-b5e6-96231b3b80d8
* X86: Add a couple of target-specific dag combines that turn VSELECTS into ↵Benjamin Kramer2012-12-151-0/+340
| | | | | | | | | | | psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170273 91177308-0d34-0410-b5e6-96231b3b80d8
* This code implements most of mips16 hardfloat as it is done by gcc.Reed Kotler2012-12-151-0/+381
| | | | | | | | | | | | | | | | | | In this case, essentially it is soft float with different library routines. The next step will be to make this fully interoperational with mips32 floating point and that requires creating stubs for functions with signatures that contain floating point types. I have a more sophisticated design for mips16 hardfloat which I hope to implement at a later time that directly does floating point without the need for function calls. The mips16 encoding has no floating point instructions so one needs to switch to mips32 mode to execute floating point instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170259 91177308-0d34-0410-b5e6-96231b3b80d8
* TypeLegalizer: Do not generate target specific nodes with illegal types, ↵Nadav Rotem2012-12-141-0/+22
| | | | | | because we cant type-legalize them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170245 91177308-0d34-0410-b5e6-96231b3b80d8
* This patch removes some nondeterminism from direct object file outputBill Schmidt2012-12-141-4/+0
| | | | | | | | | | | for TLS dynamic models on 64-bit PowerPC ELF. The default sort routine for relocations only sorts on the r_offset field; but with TLS, there can be two relocations with the same r_offset. For PowerPC, this patch sorts secondarily on descending r_type, which matches the behavior expected by the linker. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170237 91177308-0d34-0410-b5e6-96231b3b80d8
* This patch improves the 64-bit PowerPC InitialExec TLS support by providingBill Schmidt2012-12-142-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | for a wider range of GOT entries that can hold thread-relative offsets. This matches the behavior of GCC, which was not documented in the PPC64 TLS ABI. The ABI will be updated with the new code sequence. Former sequence: ld 9,x@got@tprel(2) add 9,9,x@tls New sequence: addis 9,2,x@got@tprel@ha ld 9,x@got@tprel@l(9) add 9,9,x@tls Note that a linker optimization exists to transform the new sequence into the shorter sequence when appropriate, by replacing the addis with a nop and modifying the base register and relocation type of the ld. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170209 91177308-0d34-0410-b5e6-96231b3b80d8
* [mips] Do not copy GOT address to register $gp if the function being called hasAkira Hatanaka2012-12-131-0/+27
| | | | | | | internal linkage. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170092 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix a bug in DAGCombiner::MatchBSwapHWord. Make sure the node has operands ↵Evan Cheng2012-12-131-0/+46
| | | | | | before referencing them. rdar://12868039 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170078 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix a logic bug in inline expansion of memcpy / memset with an overlappingEvan Cheng2012-12-121-0/+11
| | | | | | | | load / store pair. It's not legal to use a wider load than the size of the remaining bytes if it's the first pair of load / store. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170018 91177308-0d34-0410-b5e6-96231b3b80d8
* The ordering of two relocations on the same instruction is apparently notBill Schmidt2012-12-121-3/+8
| | | | | | | | | predictable when compiled on at least one non-PowerPC host. Source of nondeterminism not apparent. Restrict the test to build on PowerPC hosts for now while looking into the issue further. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170016 91177308-0d34-0410-b5e6-96231b3b80d8
* This patch implements local-dynamic TLS model support for the 64-bitBill Schmidt2012-12-122-0/+73
| | | | | | | | | | | | | | | | | | | | | | | PowerPC target. This is the last of the four models, so we now have full TLS support. This is mostly a straightforward extension of the general dynamic model. I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the register copy following ADDI_TLSLD_L; otherwise everything above the ADDIS_DTPREL_HA appeared dead and was removed. As before, there are new test cases to test the assembly generation, and the relocations output during integrated assembly. The expected code gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll. There are a couple of things I think can be done more efficiently in the overall TLS code, so there will likely be a clean-up patch forthcoming; but for now I want to be sure the functionality is in place. Bill git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170003 91177308-0d34-0410-b5e6-96231b3b80d8
* llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Fix possible typo(s) in ↵NAKAMURA Takumi2012-12-121-4/+4
| | | | | | | | CHECK-NOT lines. Found by Alexander Zinenko, thanks! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169978 91177308-0d34-0410-b5e6-96231b3b80d8
* llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Rename symbols, ↵NAKAMURA Takumi2012-12-121-20/+20
| | | | | | s/test_/Test/g, not to mismatch "CHECK(-NOT): test". git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169977 91177308-0d34-0410-b5e6-96231b3b80d8
* llvm/test/CodeGen/X86/store_op_load_fold.ll: Fix typo, s/CHECK_NEXT/CHECK-NEXT/NAKAMURA Takumi2012-12-121-1/+1
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169957 91177308-0d34-0410-b5e6-96231b3b80d8
* llvm/test/CodeGen/X86/store_op_load_fold.ll: Add explicit triple.NAKAMURA Takumi2012-12-121-1/+1
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169956 91177308-0d34-0410-b5e6-96231b3b80d8
* DAGCombine: clamp hi bit in APInt::getBitsSet to avoid assertionManman Ren2012-12-121-1/+18
| | | | | | | rdar://12838504 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169951 91177308-0d34-0410-b5e6-96231b3b80d8
* Avoid using lossy load / stores for memcpy / memset expansion. e.g.Evan Cheng2012-12-121-2/+2
| | | | | | | f64 load / store on non-SSE2 x86 targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169944 91177308-0d34-0410-b5e6-96231b3b80d8
* Add R600 backendTom Stellard2012-12-1139-0/+636
| | | | | | A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169915 91177308-0d34-0410-b5e6-96231b3b80d8
* This patch implements the general dynamic TLS model for 64-bit PowerPC.Bill Schmidt2012-12-112-0/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given a thread-local symbol x with global-dynamic access, the generated code to obtain x's address is: Instruction Relocation Symbol addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x R_PPC64_REL24 __tls_get_addr nop <use address in r3> The implementation borrows from the medium code model work for introducing special forms of ADDIS and ADDI into the DAG representation. This is made slightly more complicated by having to introduce a call to the external function __tls_get_addr. Using the full call machinery is overkill and, more importantly, makes it difficult to add a special relocation. So I've introduced another opcode GET_TLS_ADDR to represent the function call, and surrounded it with register copies to set up the parameter and return value. Most of the code is pretty straightforward. I ran into one peculiarity when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like BL8_NOP_ELF except that it takes another parameter to represent the symbol ("x" above) that requires a relocation on the call. Something in the TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated identically during the emit phase, so this second operand was never visited to generate relocations. This is the reason for the slightly messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding(). Two new tests are included to demonstrate correct external assembly and correct generation of relocations using the integrated assembler. Comments welcome! Thanks, Bill git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169910 91177308-0d34-0410-b5e6-96231b3b80d8
* Add a triple to this test.Chad Rosier2012-12-111-1/+1
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169803 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix a miscompile in the DAG combiner. Previously, we would incorrectlyChandler Carruth2012-12-111-2/+23
| | | | | | | | | | | | | | | | | | | | | | | try to reduce the width of this load, and would end up transforming: (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32) to (truncate (zextload i32 <ptr+4> as i64) to i32) We lost the sext attached to the load while building the narrower i32 load, and replaced it with a zext because lshr always zext's the results. Instead, bail out of this combine when there is a conflict between a sextload and a zext narrowing. The rest of the DAG combiner still optimize the code down to the proper single instruction: movswl 6(...),%eax Which is exactly what we wanted. Previously we read past the end *and* missed the sign extension: movl 6(...), %eax git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169802 91177308-0d34-0410-b5e6-96231b3b80d8
* move X86-specific testPaul Redmond2012-12-111-1/+1
| | | | | | | | | This test case uses -mcpu=corei7 so it belongs in CodeGen/X86 Reviewed by: Nadav git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169801 91177308-0d34-0410-b5e6-96231b3b80d8
* Fall back to the selection dag isel to select tail calls.Chad Rosier2012-12-111-3/+2
| | | | | | | | | | | | | | | | | | | This shouldn't affect codegen for -O0 compiles as tail call markers are not emitted in unoptimized compiles. Testing with the external/internal nightly test suite reveals no change in compile time performance. Testing with -O1, -O2 and -O3 with fast-isel enabled did not cause any compile-time or execution-time failures. All tests were performed on my x86 machine. I'll monitor our arm testers to ensure no regressions occur there. In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While it's theoretically true that this is just an optimization, it's an optimization that we very much want to happen even at -O0, or else ARC applications become substantially harder to debug. Part of rdar://12553082 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169796 91177308-0d34-0410-b5e6-96231b3b80d8
* Some enhancements for memcpy / memset inline expansion.Evan Cheng2012-12-106-39/+143
| | | | | | | | | | | | | | | | | | | | | | 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8
* Use GetUnderlyingObjects in mischedHal Finkel2012-12-101-0/+101
| | | | | | | | | | | | | | | | misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops. This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched. Andy reviewed, tested and simplified this patch; Thanks! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169744 91177308-0d34-0410-b5e6-96231b3b80d8
* Teach DAG combine to handle vector add/sub with vectors of all 0s.Craig Topper2012-12-102-5/+5
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169727 91177308-0d34-0410-b5e6-96231b3b80d8
* Teach DAG combine to handle vector logical operations with vectors of all 1s ↵Craig Topper2012-12-083-23/+21
| | | | | | or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169684 91177308-0d34-0410-b5e6-96231b3b80d8
* When we use the BLEND instruction that uses the MSB as a mask, we can removeNadav Rotem2012-12-072-2/+2
| | | | | | | | | | the VSRI instruction before it since it does not affect the MSB. Thanks Craig Topper for suggesting this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169638 91177308-0d34-0410-b5e6-96231b3b80d8
* In hexagon convertToHardwareLoop, don't deref end() iteratorMatthew Curtis2012-12-071-1/+1
| | | | | | | | | | | In particular, check if MachineBasicBlock::iterator is end() before using it to call getDebugLoc(); See also this thread on llvm-commits: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121112/155914.html git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169634 91177308-0d34-0410-b5e6-96231b3b80d8
* X86: Prefer using VPSHUFD over VPERMIL because it has better throughput.Nadav Rotem2012-12-073-5/+5
| | | | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169624 91177308-0d34-0410-b5e6-96231b3b80d8