teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-25 14:28:54 -04:00

Author	SHA1	Message	Date
Pablo Barrio	a17f855698	[AArch64] Add command-line option for SSBS Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SSBS, as it was previously only possible to enable by selecting -march=armv8.5-a. Similar patch upstream in GNU binutils: https://sourceware.org/ml/binutils/2018-09/msg00274.html Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D54629 llvm-svn: 348137	2018-12-03 14:00:47 +00:00
Ron Lieberman	16de4fd2eb	[AMDGPU] Add sdwa support for ADD\|SUB U64 decomposed Pseudos The introduction of S_{ADD\|SUB}_U64_PSEUDO instructions which are decomposed into VOP3 instruction pairs for S_ADD_U64_PSEUDO: V_ADD_I32_e64 V_ADDC_U32_e64 and for S_SUB_U64_PSEUDO V_SUB_I32_e64 V_SUBB_U32_e64 preclude the use of SDWA to encode a constant. SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We desire to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32). %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision: https://reviews.llvm.org/D54882 llvm-svn: 348132	2018-12-03 13:04:54 +00:00
Tim Northover	5745b6ac3b	ARM: use target-specific SUBS node when combining cmp with cmov. This has two positive effects. First, using a custom node prevents recombination leading to an infinite loop since the output DAG is notionally a little more complex than the input one. Using a flag-setting instruction also allows the subtraction to be folded with the related comparison more easily. https://reviews.llvm.org/D53190 llvm-svn: 348122	2018-12-03 11:16:21 +00:00
Diogo N. Sampaio	3c7d062b6b	[NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate complex numbers extension is targeting an v8.3 architecture, where after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 llvm-svn: 348121	2018-12-03 11:08:13 +00:00
Oliver Stannard	4cf35b4ab0	[ARM][MC] Move information about variadic register defs into tablegen Currently, variadic operands on an MCInst are assumed to be uses, because they come after the defs. However, this is not always the case, for example the Arm/Thumb LDM instructions write to a variable number of registers. This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstruction already tracks use/def per operand in each instance of the instruction, so can already represent this. This property can then be checked in MCInstrDesc, allowing us to remove some special cases in ARMAsmParser::isITBlockTerminator. Differential revision: https://reviews.llvm.org/D54853 llvm-svn: 348114	2018-12-03 10:32:42 +00:00
Oliver Stannard	c588110f13	[ARM][Asm] Debug trace for the processInstruction loop In the Arm assembly parser, we first match an instruction, then call processInstruction to possibly change it to a different encoding, to match rules in the architecture manual which can't be expressed by the table-generated matcher. This adds debug printing so that this process is visible when using the -debug option. To support this, I've added a new overload of MCInst::dump_pretty which takes the opcode name as a StringRef, since we don't have an InstPrinter instance in the assembly parser. Instead, we can get the same information directly from the MCInstrInfo. Differential revision: https://reviews.llvm.org/D54852 llvm-svn: 348113	2018-12-03 10:21:28 +00:00
Sjoerd Meijer	5afc957eba	[ARM] FP16: support vld1.16 for vector loads with post-increment Differential Revision: https://reviews.llvm.org/D55112 llvm-svn: 348110	2018-12-03 08:26:34 +00:00
Kang Zhang	51986417f9	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 348109	2018-12-03 03:32:57 +00:00
QingShan Zhang	8b7653db72	[NFC] [PowerPC] add an routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not. In theory, we should let the PPC target to determine how to lower the TOC Entry for globals. And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. Differential Revision: https://reviews.llvm.org/D54925 llvm-svn: 348108	2018-12-03 03:32:16 +00:00
Craig Topper	959b415e2f	[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar. llvm-svn: 348104	2018-12-02 19:47:14 +00:00
Craig Topper	6f54ff57fd	[X86] Fix bad comment. NFC llvm-svn: 348103	2018-12-02 19:47:13 +00:00
Craig Topper	204e4110e0	[X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast. Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. llvm-svn: 348087	2018-12-02 07:52:39 +00:00
Craig Topper	4bb077910a	[X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register. llvm-svn: 348086	2018-12-02 05:46:50 +00:00
Craig Topper	ec096a1dae	[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085	2018-12-02 05:46:48 +00:00
Jessica Paquette	9a7103b0f8	[MachineOutliner][AArch64] Improve checks for stack instructions If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions. Also update some tests, since we outline more. llvm-svn: 348081	2018-12-01 21:24:06 +00:00
Craig Topper	f4b13927e7	[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55138 llvm-svn: 348079	2018-12-01 19:26:31 +00:00
Graham Sellers	ba559ac058	[AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential: https://reviews.llvm.org/D55071 llvm-svn: 348075	2018-12-01 12:27:53 +00:00
Alex Bradbury	757d296222	[RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extended an i32. However, it can be produced by input such as: define i64 @tricky_ashr(i64 %a, i64 %b) { %1 = shl i64 %a, 32 %2 = ashr i64 %1, 32 %3 = ashr i64 %2, %b ret i64 %3 } It's important not to select sraw in the above case, because sraw only uses bits lower 5 bits from the shift, while a shift of 32-63 would be valid. Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be produced when zero-extending a value that was originally i32 in LLVM IR. This is obviously incorrect. This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and adds test cases that would demonstrate a miscompile if the incorrect patterns were re-added. llvm-svn: 348067	2018-12-01 05:00:00 +00:00
Artem Belevich	e5664b1559	[NVPTX] Add lowering of i128 numbers as struct fields Addition to D34555 - override VTs computation with ComputePTXValueVTs for struct fields. Author: Denys Zariaiev<denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55144 llvm-svn: 348057	2018-12-01 00:21:52 +00:00
Nicolai Haehnle	a7b00058e0	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050	2018-11-30 22:55:38 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Jessica Paquette	1cb18ec4ec	[MachineOutliner] Outline both register save calls + no LR save calls together Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case. This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though. This mostly offers some serious simplification. Also update the test to show the added functionality. llvm-svn: 348036	2018-11-30 21:14:58 +00:00
Peter Collingbourne	35fcc294ab	AArch64: Don't emit CFI for SCS register in nounwind functions. All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose. Differential Revision: https://reviews.llvm.org/D54988 llvm-svn: 348035	2018-11-30 21:04:25 +00:00
Craig Topper	4d80f199e8	[X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers. This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation. llvm-svn: 348019	2018-11-30 18:43:18 +00:00
Craig Topper	8191307d09	[X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled. This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates. llvm-svn: 348018	2018-11-30 18:43:15 +00:00
Ron Lieberman	f48e43bbf7	[AMDGPU] Disable SReg Global LD/ST, perf regression Differential Revision: https://reviews.llvm.org/D55093 llvm-svn: 348014	2018-11-30 18:29:17 +00:00
Valery Pykhtin	3d9afa273f	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 llvm-svn: 347993	2018-11-30 14:21:56 +00:00
Alex Bradbury	4830fdd21a	[RISCV] Add additional CSR instruction aliases (imm. operands) This patch adds CSR instructions aliases for the cases where the instruction takes an immediate operand but the alias doesn't have the i suffix. This is necessary for gas/gcc compatibility. gas doesn't do a similar conversion for fsflags or fsrm, so this should be complete. Differential Revision: https://reviews.llvm.org/D55008 Patch by Luís Marques. llvm-svn: 347991	2018-11-30 14:10:52 +00:00
Alex Bradbury	26403def69	[RISCV] Add UNIMP instruction (32- and 16-bit forms) This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, but still follows the 16-bit instruction form (i.e. bits 0-1 != 11). Until recently unimp was undocumented and supported just by binutils, which printed unimp for either the 16 or 32-bit form. Both forms are now documented <https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>. Differential Revision: https://reviews.llvm.org/D54316 Patch by Luís Marques. llvm-svn: 347988	2018-11-30 13:39:17 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Alex Bradbury	bc96a98ed0	[RISCV] Introduce codegen patterns for instructions introduced in RV64I As discussed in the RFC <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit RISC-V has i64 as the only legal integer type. This patch introduces patterns to support codegen of the new instructions introduced in RV64I: addiw, addiw, subw, sllw, slliw, srlw, srliw, sraw, sraiw, ld, sd. Custom selection code is needed for srliw as SimplifyDemandedBits will remove lower bits from the mask, meaning the obvious pattern won't work: def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32), (SRLIW GPR:$rs1, uimm5:$shamt)>; This is sufficient to compile and execute all of the GCC torture suite for RV64I other than those files using frameaddr or returnaddr intrinsics (LegalizeDAG doesn't know how to promote the operands - a future patch addresses this). When promoting i32 sltu/sltiu operands, it would be more efficient to use sign-extension rather than zero-extension for RV64. A future patch adds a hook to allow this. Differential Revision: https://reviews.llvm.org/D52977 llvm-svn: 347973	2018-11-30 09:38:44 +00:00
Craig Topper	a2133061c0	[X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle. llvm-svn: 347967	2018-11-30 08:32:05 +00:00
Craig Topper	6e4b266a0d	[X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up. Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the uppers bits and no shifts. So just emit that directly. llvm-svn: 347966	2018-11-30 08:32:01 +00:00
Sjoerd Meijer	ecc7dcb879	[ARM] Don't expand sdiv when optimising for minsize Don't expand SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example: sdiv %1, i32 4 gets expanded to a sequence of 3 instructions, but this is suboptimal for minimum code size so instead we just generate a MOV and a SDIV if integer division is supported. Differential Revision: https://reviews.llvm.org/D54546 llvm-svn: 347965	2018-11-30 08:14:28 +00:00
Jonas Paulsson	b1d014883c	[SystemZ::TTI] i8/i16 operands extension costs revisited Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load or an immediate. * The operands extension costs for divides removed, because we now use a high cost already for the divide (20). * The costs for lhsr/ashr extra costs removed as this did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961	2018-11-30 07:09:34 +00:00
Craig Topper	0850e8a6b6	[X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'. llvm-svn: 347959	2018-11-30 06:23:55 +00:00
Mircea Trofin	5e0b21fb45	Fix build warnings introduced in rL347938 Summary: Suppressed warnings in release builds due to variable used only in assert statement. Subscribers: llvm-commits, eraman, mgorny Differential Revision: https://reviews.llvm.org/D55100 llvm-svn: 347939	2018-11-30 01:53:17 +00:00
Mircea Trofin	f1a49e8525	Revert "Revert r347596 "Support for inserting profile-directed cache prefetches"" Summary: This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9. Fix: correct the use of DenseMap. Reviewers: davidxl, hans, wmi Reviewed By: wmi Subscribers: mgorny, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D55088 llvm-svn: 347938	2018-11-30 01:01:52 +00:00
Thomas Lively	66ea30c7bc	[WebAssembly] Expand unavailable integer operations for vectors Summary: Expands for vector types all of the integer operations that are expanded for scalars because they are not supported at all by WebAssembly. This CL has no tests because such tests would really be testing the target-independent expansion, but I'm happy to add tests if reviewers think it would be helpful. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55010 llvm-svn: 347923	2018-11-29 22:01:01 +00:00
Jonas Devlieghere	ccf7d4b4aa	Produce an error on non-encodable offsets for darwin ARM scattered relocations. Scattered ARM relocations for Mach-O's only have 24 bits available to encode the offset. This is not checked but just truncated and can result in corrupt binaries after linking because the relocations are applied to the wrong offset. This patch will check and error out in those situations instead of emitting a wrong relocation. Patch by: Sander Bogaert (dzn) Differential revision: https://reviews.llvm.org/D54776 llvm-svn: 347922	2018-11-29 21:58:23 +00:00
Alex Bradbury	66d9a752b9	[RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision: https://reviews.llvm.org/D48131 llvm-svn: 347914	2018-11-29 20:43:42 +00:00
Craig Topper	73c1d75d58	[X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead. This seems to produce the same results on the tests we have. llvm-svn: 347912	2018-11-29 20:18:58 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih	0b8dd4488e	[MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching. llvm-svn: 347906	2018-11-29 20:03:19 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Craig Topper	6cd0b17078	[X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing. I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1? Differential Revision: https://reviews.llvm.org/D54919 llvm-svn: 347898	2018-11-29 19:13:38 +00:00
Graham Sellers	04f7a4d2d2	[AMDGPU] Add and update scalar instructions This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877	2018-11-29 16:05:38 +00:00
David Stuttard	535c1af0bf	Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 llvm-svn: 347876	2018-11-29 15:56:36 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Hans Wennborg	6e3be9d12e	Revert r347596 "Support for inserting profile-directed cache prefetches" It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests" llvm-svn: 347864	2018-11-29 13:58:02 +00:00
Petr Pavlu	e6406d568c	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previouly only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861	2018-11-29 12:56:32 +00:00
Andrea Di Biagio	373a4ccf6c	[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857	2018-11-29 12:15:56 +00:00
Nicolai Haehnle	7bed696915	AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo Summary: MachineLoopInfo cannot be relied on for correctness, because it cannot properly recognize loops in irreducible control flow which can be introduced by late machine basic block optimization passes. See the new test case for the reduced form of an example that occurred in practice. Use a simple fixpoint iteration instead. In order to facilitate this change, refactor WaitcntBrackets so that it only tracks pending events and registers, rather than also maintaining state that is relevant for the high-level algorithm. Various accessor methods can be removed or made private as a consequence. Affects (in radv): - dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex} Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU") Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54231 llvm-svn: 347853	2018-11-29 11:06:26 +00:00
Nicolai Haehnle	ab43bf60fe	AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points Summary: There is one obsolete reference to using -1 as an indication of "unknown", but this isn't actually used anywhere. Using unsigned makes robust wrapping checks easier. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam Differential Revision: https://reviews.llvm.org/D54230 llvm-svn: 347852	2018-11-29 11:06:21 +00:00
Nicolai Haehnle	f96456c611	AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54229 llvm-svn: 347851	2018-11-29 11:06:18 +00:00
Nicolai Haehnle	d1f45dad84	AMDGPU/InsertWaitcnts: Simplify pending events tracking Summary: Instead of storing the "score" (last time point) of the various relevant events, only store whether an event is pending or not. This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type). This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK. I do not understand what this special handling ever attempted to achieve. It has existed ever since the original port from an internal code base, so my best guess is that it solved a problem related to EXEC handling in that internal code base. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54228 llvm-svn: 347850	2018-11-29 11:06:14 +00:00
Nicolai Haehnle	ae369d70c3	AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types Summary: It hides the type casting ugliness, and I happened to have to add a new such loop (in a later patch). Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54227 llvm-svn: 347849	2018-11-29 11:06:11 +00:00
Nicolai Haehnle	1a94cbb3f5	AMDGPU/InsertWaitcnts: Untangle some semi-global state Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848	2018-11-29 11:06:06 +00:00
Craig Topper	c2540995ed	[X86] Correct comment. NFC llvm-svn: 347835	2018-11-29 05:56:03 +00:00
Li Jia He	bcae407a3c	[PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54825 llvm-svn: 347831	2018-11-29 03:04:39 +00:00
Sanjay Patel	2de209313e	[x86] try select simplification for target-specific nodes This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely. llvm-svn: 347818	2018-11-28 22:51:04 +00:00
Craig Topper	81f1b4a361	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786	2018-11-28 18:11:42 +00:00
Craig Topper	d3bb036bc9	[X86] Add some cost model entries for sext/zext for avx512bw This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785	2018-11-28 18:11:39 +00:00
Craig Topper	f3b6f583e2	[X86] Add a combine for back to back VSRAI instructions Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784	2018-11-28 18:03:38 +00:00
Alex Bradbury	893e5bc774	[RISCV] Support .option push and .option pop This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once. Differential Revision: https://reviews.llvm.org/D46424 Patch by Lewis Revill. llvm-svn: 347774	2018-11-28 16:39:14 +00:00
Francis Visoiu Mistrih	879087ce5b	[MachineScheduler] Add support for clustering mem ops with FI base operands Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`: ``` void use(unsigned long long ); void merge_fail(unsigned key, unsigned index) { unsigned long long args[8]; args[0] = key; args[1] = index; use(args); } void merge_ok(unsigned long long dst, unsigned a, unsigned b) { dst[0] = a; dst[1] = b; } ``` The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision: https://reviews.llvm.org/D54847 llvm-svn: 347747	2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih	d7eebd6d83	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Simon Atanasyan	af860d44fe	[DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr for PCOffset. In that case the expression is unrelated to thread locals and so emitting a value of the DIEExpr does not have to always mean emit-debug-thread-local. llvm-svn: 347744	2018-11-28 11:48:07 +00:00
Jonas Paulsson	06acb3a236	[SystemZ::TTI] Improve cost for compare of i64 with extended i32 load CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735	2018-11-28 08:58:27 +00:00
Jonas Paulsson	d6b7aca911	[SystemZ::TTI] Improve costs for i16 add, sub and mul against memory. AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734	2018-11-28 08:31:50 +00:00
Jonas Paulsson	011a503f25	[SystemZ::TTI] Improved cost values for comparison against memory. Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733	2018-11-28 08:08:05 +00:00
Jonas Paulsson	5da8e432b9	[SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap. Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732	2018-11-28 07:52:34 +00:00
Mircea Trofin	35f0e5cd2d	Do not insert prefetches with unsupported memory operands. Summary: Ignore advices where the memory operand of the 'anchor' instruction uses unsupported register types. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54983 llvm-svn: 347724	2018-11-28 01:08:45 +00:00
Evandro Menezes	9ef79c884a	[TableGen] Refactor macro names (NFC) Make the names for the macros for `TargetInstrInfo` uniform. llvm-svn: 347706	2018-11-27 20:58:27 +00:00
Craig Topper	7ceef03dc9	[X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned' We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned. I might be adding another combine to this function and this will probably simplify the logic required for that. llvm-svn: 347684	2018-11-27 18:24:56 +00:00
Craig Topper	5fb34b5498	[X86] Add cascade lake arch in X86 target. This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D54785 llvm-svn: 347681	2018-11-27 18:05:00 +00:00
Stanislav Mekhanoshin	443a7f9788	[AMDGPU] Disable DAG combine at -O0 Differential Revision: https://reviews.llvm.org/D54358 llvm-svn: 347659	2018-11-27 15:13:37 +00:00
Craig Topper	196fd31e33	[X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes. llvm-svn: 347642	2018-11-27 06:24:56 +00:00
Craig Topper	4325505f05	[X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets. If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk. We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears. llvm-svn: 347632	2018-11-27 02:57:27 +00:00
Sterling Augustine	9cc1ffadc5	Notify the linker when a TU compiled with split-stack has a function without a prologue. More context here: https://go-review.googlesource.com/c/go/+/148819/ llvm-svn: 347614	2018-11-26 23:26:31 +00:00
David Blaikie	1fecbec5fa	AArch64ISelLowering: Remove a return-of-assignment to allow NRVO Patch by Arthur O'Dwyer! llvm-svn: 347609	2018-11-26 22:57:18 +00:00
Mircea Trofin	183df14520	Add new passes to X86 pipeline tests Summary: Fixes test failures introduced by rL347596. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54916 llvm-svn: 347607	2018-11-26 22:49:17 +00:00
Fangrui Song	82ddb8154e	[X86] Add dependency from X86 to ProfileData after rL347596 llvm-svn: 347606	2018-11-26 22:16:19 +00:00
Evandro Menezes	6a38a5effe	[AArch64] Refactor the scheduling predicates (3/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasExtendedReg()`. Differential revision: https://reviews.llvm.org/D54822 llvm-svn: 347599	2018-11-26 21:47:46 +00:00
Evandro Menezes	56368c6fa5	[AArch64] Refactor the scheduling predicates (2/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasShiftedReg()`. Differential revision: https://reviews.llvm.org/D54820 llvm-svn: 347598	2018-11-26 21:47:41 +00:00
Evandro Menezes	b02ac8bd21	[AArch64] Refactor the scheduling predicates (1/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::isScaledAddr()` Differential revision: https://reviews.llvm.org/D54777 llvm-svn: 347597	2018-11-26 21:47:28 +00:00
Mircea Trofin	cfbc1788d6	Support for inserting profile-directed cache prefetches Summary: Support for profile-driven cache prefetching (X86) This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp. The recommender makes prefetch recommendations in terms of: * the binary offset of an instruction with a memory operand; * a delta; * and a type (nta, t0, t1, t2) meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example: 0x400ab2,64,nta and assuming the instruction at 0x400ab2 is: movzbl (%rbx,%rdx,1),%edx means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such: prefetchnta 0x40(%rbx,%rdx,1) movzbl (%rbx, %rdx, 1), %edx The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well): 1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information). 2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept). 3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations. 4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3. Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4. The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism. Reviewers: davidxl, wmi, craig.topper Reviewed By: davidxl, wmi, craig.topper Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54052 llvm-svn: 347596	2018-11-26 21:36:18 +00:00
Matt Arsenault	88ce3dcbc8	AMDGPU: Record SGPR spills when restoring too It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <https://reviews.llvm.org/D54366>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issues appear worse than it is. llvm-svn: 347595	2018-11-26 21:28:40 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Than McIntosh	30c804bbb1	[CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584	2018-11-26 18:43:48 +00:00
Matt Arsenault	105fc1a5f3	AMDGPU: Don't optimize exec masks at -O0 llvm-svn: 347573	2018-11-26 17:02:02 +00:00
Matt Arsenault	6384d9ea31	AMDGPU: Only add implicit super-reg def for first subreg llvm-svn: 347572	2018-11-26 17:02:01 +00:00
Sanjay Patel	d31220e0de	[x86] promote all multiply i8 by constant to i32 We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of our existing LEA and other multiply expansion magic happens as it would for i32 ops. Some of the test diffs show that we could end up with an actual 32-bit mul instruction here because we choose not to expand to simpler ops. That instruction could be slower depending on the subtarget. On the plus side, this means we don't need a separate instruction to load the constant operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, we could add a later transform that tries to shrink it back to i8 based on subtarget timing. I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means LEA is generally cheap. Differential Revision: https://reviews.llvm.org/D54803 llvm-svn: 347557	2018-11-26 15:22:30 +00:00
Diana Picus	0528e2cfb3	[ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32. Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel). Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point. Instruction select: Nothing to do. llvm-svn: 347545	2018-11-26 11:07:02 +00:00
Sam Parker	5338f7aae4	[ARM] Prevent parallel macs for unsigned values Both zext and sext are currently allowed during the search for narrow sequences and sexts operands are later added to the mac candidates. But operands of muls are also added, without checking whether they're sext or zext, which means we can generate a signed smlad when we shouldn't. Differential Revision: https://reviews.llvm.org/D54790 llvm-svn: 347542	2018-11-26 10:22:55 +00:00
Kang Zhang	840e98f9f1	Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction" This reverts commits r347532. Forget add the option -mtriple powerpc64-unknown-linux-gnu. So other platform is error except for PowerPC. llvm-svn: 347534	2018-11-26 07:15:31 +00:00
Kang Zhang	e98d4f511c	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 347532	2018-11-26 06:03:25 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Fangrui Song	220f2a9cac	[ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494 This fixes -DBUILD_SHARED_LIBS=on llvm-svn: 347506	2018-11-23 23:43:46 +00:00
Luke Cheeseman	6db3a6a4a7	Revert r347490 as it breaks address sanitizer builds llvm-svn: 347499	2018-11-23 17:13:06 +00:00
Oliver Stannard	173bc2bb7f	[ARM][AsmParser] Improve debug printing of parsed asm operands In ARMOperand::print: - Print human-readable register names, instead of numbers. - Print the correct names for IT condition masks (these were in the wrong order before). - Print all parts of memory operands, not just the base register. This makes the output of llvm-mc -show-inst-operands more readable. Differential revision: https://reviews.llvm.org/D54850 llvm-svn: 347494	2018-11-23 14:27:21 +00:00
Luke Cheeseman	d6dbd64104	Revert r343341 - Cannot reproduce the build failure locally and the build logs have been deleted. llvm-svn: 347490	2018-11-23 11:01:47 +00:00
John Brawn	d6e0ebea10	[AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy. llvm-svn: 347456	2018-11-22 11:45:23 +00:00
Jonas Paulsson	96782c2c0b	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445	2018-11-22 07:17:29 +00:00
Stefan Pintilie	9d6445d34c	[PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file. This is further cleanup for PPCMCCodeEmitter. The class had been contained within the cpp file alone. Now it has been split up between a header file and a cpp file which allows other classes to make use of the functions in this class if required. llvm-svn: 347428	2018-11-21 21:23:50 +00:00
Stefan Pintilie	46e3cd76e2	[PowerPC][NFC] Minor Code Cleaup for PPCMCCodeEmitter. llvm-svn: 347422	2018-11-21 20:47:59 +00:00
Sanjay Patel	cadf62f360	[x86] fix predicate for avoiding vblendv It only makes sense to produce the logic ops when 1 of the constants is +0.0. Otherwise, go with vblendv to reduce code. llvm-svn: 347403	2018-11-21 18:02:50 +00:00
Vladimir Stefanovic	64ad1cf24b	[mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o. They are still not generated with JALR. Differential revision: https://reviews.llvm.org/D54721 llvm-svn: 347398	2018-11-21 16:38:34 +00:00
Michal Gorny	71e902101d	[nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter library. The missing dependency causes shared-lib build to fail with the following reason: lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const, unsigned int, unsigned int, char const, llvm::raw_ostream&)': Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)' lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const, unsigned int, unsigned int, char const, llvm::raw_ostream&)': Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)' collect2: error: ld returned 1 exit status Differential Revision: https://reviews.llvm.org/D47810 llvm-svn: 347387	2018-11-21 11:25:01 +00:00
Simon Pilgrim	66bae9aee8	[X86][AVX] Remove BROADCAST if we only need the 0'th element We don't catch this with target shuffle simplification if the src/dst types are different. llvm-svn: 347386	2018-11-21 11:00:09 +00:00
Craig Topper	e9b4001a82	[X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1. The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example. llvm-svn: 347380	2018-11-21 07:01:22 +00:00
Nemanja Ivanovic	5cf902ccd4	[PowerPC] Do not use vectors to codegen bswap with Altivec turned off We have efficient codegen on P9 for lowering bswap that involves moving the value into a vector reg and moving it back. However, the check under which we custom lowered it did not adequately reflect the actual requirements. It required only that the subtarget be an implementation of ISA 3.0 since all compliant implementations have to provide the vector instructions. However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9 (i.e. don't emit vector code, don't have to save vector regs for context switch). So we should require the correct features for this lowering. Fixes https://bugs.llvm.org/show_bug.cgi?id=39334 llvm-svn: 347376	2018-11-21 02:53:50 +00:00
Craig Topper	27a5896fe8	[X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly. These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel. Fixes PR39733 llvm-svn: 347375	2018-11-21 01:39:38 +00:00
Craig Topper	aa52ee2770	[X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8. We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering. Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets. llvm-svn: 347361	2018-11-20 22:57:48 +00:00
Craig Topper	24b346da42	[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll llvm-svn: 347348	2018-11-20 21:21:52 +00:00
Sam Clegg	4791a668f5	[WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0 Rather than assuming that `tempRet0` exists in linear memory only assume the getter/setter functions exist. This avoids conflicting with binaryen which declares a wasm global for this purpose and defines it's own getter and setter for that. The other advantage of doing things this way is that it leaving it up to the linker/finalizer to decide how to actually store this temporary. As it happens binaryen uses a wasm global which is more appropriate since it is thread safe. This also allows us to change the way this is stored in the future (memory, TLS memory, wasm global) without modifying LLVM. This is part of a 4 part change: LLVM: https://reviews.llvm.org/D53240 fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237 emscripten: https://github.com/kripken/emscripten/pull/7358 binaryen: https://github.com/WebAssembly/binaryen/pull/1709 Differential Revision: https://reviews.llvm.org/D53240 llvm-svn: 347340	2018-11-20 19:25:07 +00:00
Jinsong Ji	9a0ed20072	[PowerPC] Add Itineraries for STWU/STWUX etc When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8. Since there are already multiple IIC for store update, this patch also merge IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU IIC_LdStSTDUX to IIC_LdStSTUX and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference. Differential Revision: https://reviews.llvm.org/D54700 llvm-svn: 347311	2018-11-20 15:11:42 +00:00
Simon Pilgrim	c9cc6cca42	Fix MSVC 'truncation of constant value' warning. NFCI. llvm-svn: 347308	2018-11-20 14:29:40 +00:00
Simon Pilgrim	ee8b96f253	[X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions. Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits. llvm-svn: 347303	2018-11-20 13:23:37 +00:00
Simon Pilgrim	ed7e2fda18	[X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero Noticed while working on improving demanded elts target shuffle shuffle combining llvm-svn: 347302	2018-11-20 12:17:50 +00:00
Simon Pilgrim	a6fb85ffa7	[X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE. As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier. llvm-svn: 347300	2018-11-20 11:46:37 +00:00
Simon Pilgrim	7198506ba8	[X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions. As discussed on rL347240. llvm-svn: 347299	2018-11-20 11:09:46 +00:00
Craig Topper	17fa42a69b	[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types. llvm-svn: 347296	2018-11-20 09:04:01 +00:00
Craig Topper	b06d1aa3a1	[X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1 This helps with a future patch and makes us less reliant on DAG combine merging shuffles. llvm-svn: 347295	2018-11-20 09:03:58 +00:00
Craig Topper	c733c7bf94	[X86] Replace more calls to getZeroVector with regular getConstant. getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it. The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast. llvm-svn: 347290	2018-11-20 06:54:01 +00:00
Nemanja Ivanovic	9b393909e2	[PowerPC] Don't combine to bswap store on 1-byte truncating store Turns out that there was no check for a store that truncates down to a single byte when combining a (store (bswap...)) into a byte-swapping store. This patch just adds that check. Fixes https://bugs.llvm.org/show_bug.cgi?id=39478. llvm-svn: 347288	2018-11-20 04:42:31 +00:00
Craig Topper	808d0dd689	[X86] Rename combineVSZext->combineExtendVectorInreg. NFC Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use. llvm-svn: 347268	2018-11-19 22:18:47 +00:00
Konstantin Zhuravlyov	700b1ef54d	AMDGPU: Fix V_FMA_F16 selection on GFX9 GFX9 should select opsel version. Differential Revision: https://reviews.llvm.org/D54545 llvm-svn: 347265	2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin	8bafbae889	[AMDGPU] Restored selection of scalar_to_vector (v2x16) This works if DAG combiner is enabled, but without combining we cannot select scalar_to_vector of <2 x half> and <2 x i16>. Differential Revision: https://reviews.llvm.org/D54718 llvm-svn: 347259	2018-11-19 19:58:13 +00:00
Craig Topper	a5e0380c30	[X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them. This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see. Differential Revision: https://reviews.llvm.org/D54711 llvm-svn: 347248	2018-11-19 18:57:31 +00:00
Simon Pilgrim	c4861ab170	[X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction. Differential Revision: https://reviews.llvm.org/D54707 llvm-svn: 347242	2018-11-19 18:40:59 +00:00
Craig Topper	311bbcd535	[X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane. Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together. This patch instead instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb. Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split. Differential Revision: https://reviews.llvm.org/D54668 llvm-svn: 347240	2018-11-19 18:32:53 +00:00
Fangrui Song	d83a5526d5	[AMDGPU] Fix -Wunused-variable llvm-svn: 347234	2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin	054f8101f1	[AMDGPU] Convert insert_vector_elt into set of selects This allows to avoid scratch use or indirect VGPR addressing for small vectors. Differential Revision: https://reviews.llvm.org/D54606 llvm-svn: 347231	2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen	49482f824a	[WebAssembly] replaced .param/.result by .functype Summary: This makes it easier/cleaner to generate a single signature from this directive. Also: - Adds the symbol name, such that we don't depend on the location of this directive anymore. - Actually constructs the signature in the assembler, and make the assembler own it. - Refactor the use of MVT vs ValType in the streamer and assembler to require less conversions overall. - Changed 700 or so tests to use it. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D54652 llvm-svn: 347228	2018-11-19 17:10:36 +00:00
David Stuttard	be3d7ba9fb	[AMDGPU] Derive GCNSubtarget from MF to get overridden target features Summary: AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the TM. However, this means that overridden target features are not detected and can result in incorrect behaviour. Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere in the same function). Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54301 llvm-svn: 347221	2018-11-19 15:44:20 +00:00
Martin Elshuber	fef3036d37	Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads. This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVMs capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on Aarch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208	2018-11-19 14:26:10 +00:00
Nicolai Haehnle	c548d91419	AMDGPU/InsertWaitcnts: Some more const-correctness Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54225 llvm-svn: 347192	2018-11-19 12:03:11 +00:00
Sam Parker	e7c42dd7e2	[ARM] Remove trunc sinks in ARM CGP Truncs are treated as sources if their produce a value of the same type as the one we currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask. This leaves sinks as being: - points where the value in the register is being observed, such as an icmp, switch or store. - points where value types have to match, such as calls and returns. - zext are included to ease the transformation and are generally removed later on. During this change, it also became apart from truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information. Differential Revision: https://reviews.llvm.org/D54515 llvm-svn: 347191	2018-11-19 11:34:40 +00:00
Anton Korobeynikov	4df19b75c0	[MSP430] Optimize srl/sra in case of A >> (8 + N) There is no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Path by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54623 llvm-svn: 347187	2018-11-19 10:43:02 +00:00
Craig Topper	8b22bcd39f	[X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185	2018-11-19 07:22:26 +00:00
Craig Topper	3616891046	[X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181	2018-11-19 04:33:20 +00:00
Craig Topper	053f1eea96	[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180	2018-11-19 00:33:16 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Craig Topper	0468c860b7	[X86] Add custom type legalization for extending v4i8/v4i16->v4i64. Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq. llvm-svn: 347176	2018-11-18 21:28:50 +00:00
Simon Pilgrim	b31bdbd2e9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173	2018-11-18 20:21:52 +00:00
Craig Topper	11d50948e2	[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172	2018-11-18 18:11:25 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Simon Pilgrim	ec808cf541	Remove unused variable. NFCI. llvm-svn: 347169	2018-11-18 17:24:59 +00:00
Simon Pilgrim	50828c75d0	[X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector. llvm-svn: 347168	2018-11-18 17:15:06 +00:00
Simon Pilgrim	fec9f8657b	[X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162	2018-11-18 15:52:08 +00:00
Simon Pilgrim	cc1f5d2407	[X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158	2018-11-18 13:34:53 +00:00
Heejin Ahn	e0f8b9bfc6	[WebAssembly] Add null streamer support Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54660 llvm-svn: 347155	2018-11-18 11:58:47 +00:00
Craig Topper	cd94a7c227	[X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but its the right thing to do based on log file inspection. llvm-svn: 347151	2018-11-18 08:30:09 +00:00
Craig Topper	b03f80a21c	[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC llvm-svn: 347150	2018-11-18 07:35:08 +00:00
Craig Topper	f56a57518d	[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149	2018-11-18 05:53:21 +00:00
Craig Topper	0438d791fa	[X86] Add support for matching PACKUSWB from a v64i8 shuffle. llvm-svn: 347143	2018-11-17 18:54:43 +00:00
Craig Topper	dd61f11642	[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256. llvm-svn: 347131	2018-11-17 02:36:07 +00:00
Craig Topper	b05ea28f1f	[X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask. llvm-svn: 347127	2018-11-17 02:18:12 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Craig Topper	ee0333b4a9	[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105	2018-11-16 22:53:00 +00:00
Craig Topper	87bc07b3dd	[X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed. llvm-svn: 347100	2018-11-16 22:04:29 +00:00
Craig Topper	567aaeb40d	[X86] Remove a branch on SSE4.1 from LowerLoad We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG. llvm-svn: 347095	2018-11-16 21:05:00 +00:00
Craig Topper	7fff9a9aef	[X86] In LowerLoad, fix assert messages and rename a variable that use Zize instead of Size. NFC llvm-svn: 347093	2018-11-16 21:04:56 +00:00
Peter Collingbourne	527024469a	AArch64: Emit a call frame instruction for the shadow call stack register. When unwinding past a function that uses shadow call stack, we must subtract 8 from the value of the x18 register. This patch causes us to emit a call frame instruction that causes that to happen. Differential Revision: https://reviews.llvm.org/D54609 llvm-svn: 347089	2018-11-16 20:08:54 +00:00
Anton Korobeynikov	e5cb1c35b4	[MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54626 llvm-svn: 347080	2018-11-16 19:36:15 +00:00
Rong Xu	3a38175723	[X86] Disable Condbr_merge pass Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079	2018-11-16 19:35:00 +00:00
Stefan Pintilie	9004444d81	Revert "[PowerPC] Make no-PIC default to match GCC - LLVM" This reverts commit r347069 llvm-svn: 347076	2018-11-16 19:24:23 +00:00
Anton Korobeynikov	883c70959d	[MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup Linker fails to link example like this (simplified case from newlib sources): $ cat test.c extern const char _ctype_b[]; struct _t { char ptr; }; struct _t T = { ((char ) _ctype_b + 3) }; $ cat ctype.c char _ctype_b[4] = { 0, 0, 0, 0 }; LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error We also follow gnu toolchain here, where 2-byte relocation mapped to R_MSP430_16_BYTE, instead of R_MSP430_16. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54620 llvm-svn: 347074	2018-11-16 19:20:51 +00:00
Sam Clegg	74f5fd4e32	[WebAssembly] Default to static reloc model Differential Revision: https://reviews.llvm.org/D54637 llvm-svn: 347073	2018-11-16 18:59:51 +00:00
Stefan Pintilie	046eff502f	[PowerPC] Make no-PIC default to match GCC - LLVM Set -fno-PIC as the default option. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 347069	2018-11-16 18:36:21 +00:00
Simon Pilgrim	66f42ea6e1	[SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI. Prep work for PR39467. llvm-svn: 347067	2018-11-16 17:50:59 +00:00
Simon Pilgrim	bcd6631a2a	[X86][SSE] Move number of input limit out of resolveTargetShuffleInputs. Only combineX86ShufflesRecursively needs this limit. llvm-svn: 347054	2018-11-16 15:01:05 +00:00
Roman Lebedev	90c5b3f78e	[X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X` Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into 'control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask cover only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54095 llvm-svn: 347048	2018-11-16 13:04:54 +00:00
Alex Bradbury	b4a64cede8	[RISCV][NFC] Define and use the new CA instruction format The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a new compressed instruction format, RVC format CA (no actual instruction encodings were changed). This patch updates the RISC-V backend to define the new format, and to use it in the relevant instructions. Differential Revision: https://reviews.llvm.org/D54302 Patch by Luís Marques. llvm-svn: 347043	2018-11-16 10:33:23 +00:00
Alex Bradbury	2146e8fb1e	[RISCV] Constant materialisation for RV64I This commit introduces support for materialising 64-bit constants for RV64I, making use of the RISCVMatInt::generateInstSeq helper in order to share logic for immediate materialisation with the MC layer (where it's used for the li pseudoinstruction). test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit constant tests. It would be preferable if anyext constant returns were sign rather than zero extended (see PR39092). This patch simply adds an explicit signext to the returns in imm.ll. Further optimisations for constant materialisation are possible, most notably for mask-like values which can be generated my loading -1 and shifting right. A future patch will standardise on the C++ codepath for immediate selection on RV32 as well as RV64, and then add further such optimisations to RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for codegen and li expansion. Differential Revision: https://reviews.llvm.org/D52962 llvm-svn: 347042	2018-11-16 10:14:16 +00:00
Anton Korobeynikov	411773d227	[MSP430] Add support for .refsym directive Introduces support for '.refsym' assembler directive. From GCC docs (for MSP430): '.refsym' - This directive instructs assembler to add an undefined reference to the symbol following the directive. No relocation is created for this symbol; it will exist purely for pulling in object files from archives. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54618 llvm-svn: 347041	2018-11-16 09:50:24 +00:00
Craig Topper	079c37da58	[X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening. By early promoting the multiply to use an i16 element type we can avoid op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get. llvm-svn: 347032	2018-11-16 06:15:21 +00:00
Matt Arsenault	eabb8dd015	AMDGPU: Fix analyzeBranch failing with pseudoterminators If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list. llvm-svn: 347027	2018-11-16 05:03:02 +00:00
Craig Topper	5802b82b40	[X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would effect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate. llvm-svn: 347011	2018-11-16 01:16:59 +00:00
Craig Topper	1acafd863f	[X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC llvm-svn: 347010	2018-11-16 01:16:51 +00:00
Ron Lieberman	cac749ac88	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008	2018-11-16 01:13:34 +00:00
Heejin Ahn	095796a391	[WebAssembly] Split BBs after throw instructions Summary: `throw` instruction is a terminator in wasm, but BBs were not splitted after `throw` instructions, causing machine instruction verifier to fail. This patch - Splits BBs after `throw` instructions in WasmEHPrepare and adding an unreachable instruction after `throw`, which will be deleted in LateEHPrepare pass - Refactors WasmEHPrepare into two member functions - Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass to match that of WasmEHPrepare pass, which is newly added. Now `eraseBBsAndChildren` does not delete BBs with remaining predecessors. - Fixes style nits, making static function names conform to clang-tidy - Re-enables the test temporarily disabled by rL346840 && rL346845 Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54571 llvm-svn: 347003	2018-11-16 00:47:18 +00:00
Ron Lieberman	2f5683e6b0	[AMDGPU] NFC Test commit llvm-svn: 347002	2018-11-16 00:46:51 +00:00
Konstantin Zhuravlyov	af7b5d7092	AMDHSA: More code object v3 fixes: - Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA - Update assembler metadata tests to use v2 by default llvm-svn: 347001	2018-11-15 23:14:23 +00:00
Craig Topper	22bfa99448	[X86] Remove ANY_EXTEND special case from canReduceVMulWidth Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says its unused. I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation? Differential Revision: https://reviews.llvm.org/D54596 llvm-svn: 346995	2018-11-15 21:19:32 +00:00
Craig Topper	b144c7a6fb	[X86] Minor cleanup to getExtendInVec. NFCI Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue. llvm-svn: 346983	2018-11-15 19:20:22 +00:00
Craig Topper	73bb04ab6f	[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different and output VTs. Differential Revision: https://reviews.llvm.org/D54512 llvm-svn: 346980	2018-11-15 18:59:31 +00:00
Thomas Lively	fc3163b67a	[WebAssembly] Fix return type of nextByte Summary: The old return type did not allow for correct error reporting and was causing a compiler warning. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54586 llvm-svn: 346979	2018-11-15 18:56:49 +00:00
Simon Pilgrim	0db8cb0147	[X86] Fix MCNullStreamer support for modules with a CodeView flag This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision: https://reviews.llvm.org/D54008 llvm-svn: 346962	2018-11-15 15:17:15 +00:00
Alex Bradbury	f809d89980	[RISCV] Mark C.EBREAK instruction as having side effects C.EBREAK was defined with hasSideEffects = 0, which is incorrect and inconsistent with the non-compressed instruction form. This patch corrects this oversight. This wouldn't cause codegen issues, as compressed instructions are only ever generated by converting the non-compressed form as an MCInst. But having correct flags is still worthwhile. Differential Revision: https://reviews.llvm.org/D54256 Patch by Luís Marques. llvm-svn: 346959	2018-11-15 14:52:24 +00:00
Alex Bradbury	7727240438	[RISCV] Mark FREM as Expand Mark the FREM SelectionDAG node as Expand, which is necessary in order to support the frem IR instruction on RISC-V. This is expanded into a library call. Adds the corresponding test. Previously, this would have triggered an assertion at instruction selection time. Differential Revision: https://reviews.llvm.org/D54159 Patch by Luís Marques. llvm-svn: 346958	2018-11-15 14:46:11 +00:00
Anton Korobeynikov	f0001f4186	Add missed files from prev. commit llvm-svn: 346949	2018-11-15 12:35:04 +00:00
Anton Korobeynikov	49045c6a0d	[MSP430] Add MC layer Reapply r346374 with the fixes for modules build. Original summary: This change implements assembler parser, code emitter, ELF object writer and disassembler for the MSP430 ISA. Also, more instruction forms are added to the target description. Patch by Michael Skvortsov! llvm-svn: 346948	2018-11-15 12:29:43 +00:00
Alex Bradbury	22c091fc3c	[RISCV] Introduce the RISCVMatInt::generateInstSeq helper Logic to load 32-bit and 64-bit immediates is currently present in RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With the introduction of RV64 codegen, there is a greater benefit of sharing immediate materialisation logic between the MC layer and codegen. The generateInstSeq helper allows this by producing a vector of simple structs representing the chosen instructions. This can then be consumed in the MC layer to produce MCInsts or at instruction selection time to produce appropriate SelectionDAG node. Sharing this logic means that both the li pseudoinstruction and codegen can benefit from future optimisations, and that this logic can be used for materialising constants during RV64 codegen. This patch does contain a behaviour change: addi will now be produced on RV64 when no lui is necessary to materialise the constant. In that case addiw takes x0 as the source register, so is semantically identical to addi. Differential Revision: https://reviews.llvm.org/D52961 llvm-svn: 346937	2018-11-15 10:11:31 +00:00
Craig Topper	553ac560aa	[X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb. llvm-svn: 346936	2018-11-15 08:23:40 +00:00
Thomas Lively	77b33c86f5	[WebAssembly] Renumber SIMD bitwise instructions Summary: Changed to match https://github.com/WebAssembly/simd/pull/54. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54561 llvm-svn: 346931	2018-11-15 03:38:59 +00:00
Konstantin Zhuravlyov	a25e0524c0	AMDGPU: Enable code object v3 for AMDHSA only Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923	2018-11-15 02:32:43 +00:00
Craig Topper	ea6ced9d1a	[X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalaring and using a build_vector to do the widening. llvm-svn: 346916	2018-11-15 00:21:41 +00:00
Benjamin Kramer	6b7d6fe079	[X86] Remove unused variable llvm-svn: 346909	2018-11-14 23:13:27 +00:00
Craig Topper	0b2089da4b	[X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occuring in 64-bit mode so its not a new issue. llvm-svn: 346908	2018-11-14 23:02:09 +00:00
Jessica Paquette	27e1754fc9	[MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block, and also not live out. If this holds for all MBBs, then we can avoid checking for liveness on that candidate. Furthermore, if it holds for an individual candidate's MBB, then we can avoid checking for liveness on that candidate. llvm-svn: 346901	2018-11-14 22:23:38 +00:00
Nirav Dave	1241dcb3cf	Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also physical register assignments from immediate values. This causes a reduction in reduction in overall register pressure and minor reduction in spills and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894	2018-11-14 21:11:53 +00:00
Aakanksha Patil	1a60116b5c	AMDGPU: Additional pattern for i16 median3 matching min(max(a, b), max(min(a, b), c)) Differential Revision: https://reviews.llvm.org/D54494 llvm-svn: 346886	2018-11-14 20:10:41 +00:00
Craig Topper	6c94264b1f	[X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision: https://reviews.llvm.org/D54513 llvm-svn: 346879	2018-11-14 18:16:21 +00:00
Simon Pilgrim	cdb170794b	[CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854	2018-11-14 12:24:50 +00:00
Simon Pilgrim	7501780ec6	[X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although its annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.... Differential Revision: https://reviews.llvm.org/D54083 llvm-svn: 346850	2018-11-14 11:26:35 +00:00
Heejin Ahn	da419bdb5e	[WebAssembly] Add support for the event section Summary: This adds support for the 'event section' specified in the exception handling proposal. (This was named 'exception section' first, but later renamed to 'event section' to take possibilities of other kinds of events into consideration. But currently we only store exception info in this section.) The event section is added between the global section and the export section. This is for ease of validation per request of the V8 team. This patch: - Creates the event symbol type, which is a weak symbol - Makes 'throw' instruction take the event symbol '__cpp_exception' - Adds relocation support for events - Adds WasmObjectWriter / WasmObjectFile (Reader) support - Adds obj2yaml / yaml2obj support - Adds '.eventtype' printing support Reviewers: dschuff, sbc100, aardappel Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54096 llvm-svn: 346825	2018-11-14 02:46:21 +00:00
Zi Xuan Wu	6a3c279d1c	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding, which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel. Differential Revision: https://reviews.llvm.org/D49531 llvm-svn: 346824	2018-11-14 02:34:45 +00:00
Jessica Paquette	4e97ec94d9	[MachineOutliner][NFC] Use flags set in all candidates to check for calls If we keep track of if the ContainsCalls bit is set in the MBB flags for each candidate, then we have a better chance of not checking the candidate for calls at all. This saves quite a few checks in some CTMark tests (~200 in Bullet, for example.) llvm-svn: 346816	2018-11-13 23:41:31 +00:00
Jessica Paquette	cad864d49e	[MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo We already determine a bunch of information about an MBB in getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating things that must be false/true. The first thing we can easily check is if an outlined sequence could ever contain calls. There's no reason to walk over the outlined range, checking for calls, if we already know that there are no calls in the block containing the sequence. llvm-svn: 346809	2018-11-13 23:01:34 +00:00
Jessica Paquette	b2d53c5d7d	[MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates Since we never outline anything with fewer than 2 occurrences, there's no reason to compute cost model information if there's less than that. llvm-svn: 346803	2018-11-13 22:16:27 +00:00
Stanislav Mekhanoshin	bcb34ac2ea	[AMDGPU] combine extractelement into several selects An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 llvm-svn: 346800	2018-11-13 21:18:21 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Sam Clegg	f98ba05f3d	[WebAssembly] Fix broken assumption that all bitcasts are to functions types Specifically, we can bitcast to void. Fixes PR39591 Differential Revision: https://reviews.llvm.org/D54447 llvm-svn: 346778	2018-11-13 19:14:02 +00:00
Simon Pilgrim	e827fe09b3	[CostModel][X86] Fix constant vector XOP rights shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760	2018-11-13 16:40:10 +00:00
Simon Pilgrim	72a7fbc1a3	Fix comment for XOP rotates. NFCI. llvm-svn: 346753	2018-11-13 12:09:27 +00:00
Alexander Richardson	4eb93907f7	Fix modules build of AVRAsmParser.cpp Summary: Without this change I get the following error: lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant] Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53425 llvm-svn: 346750	2018-11-13 10:54:44 +00:00
Jonas Paulsson	f9b2b5e67e	[SystemZ] Increase the number of VLREPs If a loaded value is replicated it is best to combine these two operations into a VLREP (load and replicate), but isel will not produce this if the load has other users as well. This patch handles this by putting the other users of the load to use the REPLICATE 0-element instead of the load. This way the load has only the REPLICATE node as user, and we get a VLREP. Review: Ulrich Weigand https://reviews.llvm.org/D54264 llvm-svn: 346746	2018-11-13 08:37:09 +00:00
Jessica Paquette	106946329d	[MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner Turns out it's way simpler to do this check with one LRU. Instead of maintaining two, just keep one. Check if each of the registers is available, and then check if it's a live out from the block. If it's a live out, but available in the block, we know we're in an unsafe case. llvm-svn: 346721	2018-11-13 00:32:09 +00:00
Jessica Paquette	82d9c0a3fa	[MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom Instead of returning Flags, return true if the MBB is safe to outline from. This lets us check for unsafe situations, like say, in AArch64, X17 is live across a MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping. llvm-svn: 346718	2018-11-12 23:51:32 +00:00
Simon Pilgrim	e565e5a962	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706	2018-11-12 21:12:38 +00:00
Aakanksha Patil	a992c694c6	AMDGPU: Adding more median3 patterns min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704	2018-11-12 21:04:06 +00:00
Wouter van Oortmerssen	cc75e77df5	[WebAssembly] Added WasmAsmParser. Summary: This is to replace the ELFAsmParser that WebAssembly was using, which so far was a stub that didn't do anything, and couldn't work correctly with wasm. This new class is there to implement generic directives related to wasm as a binary format. Wasm target specific directives are still parsed in WebAssemblyAsmParser as before. The two classes now cooperate more correctly too. Also implemented .result which was missing. Any unknown directives will now result in errors. Reviewers: dschuff, sbc100 Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54360 llvm-svn: 346700	2018-11-12 20:15:01 +00:00
Craig Topper	c48712b341	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697	2018-11-12 19:37:29 +00:00
Stanislav Mekhanoshin	e86c8d33b1	[AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z Sometimes after basic block placement we end up with a code like: sreg = s_mov_b64 -1 vcc = s_and_b64 exec, sreg s_cbranch_vccz This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block. Differential Revision: https://reviews.llvm.org/D54164 llvm-svn: 346690	2018-11-12 18:48:17 +00:00
Simon Pilgrim	93c64e5c76	[CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688	2018-11-12 18:27:54 +00:00
Simon Pilgrim	49e93d2f0e	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683	2018-11-12 17:56:59 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Jonas Paulsson	5cea85dd59	[SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand https://reviews.llvm.org/D54423 llvm-svn: 346663	2018-11-12 15:32:27 +00:00
Alex Bradbury	9c03e4cacd	[RISCV] Support .option relax and .option norelax This extends the .option support from D45864 to enable/disable the relax feature flag from D44886 During parsing of the relax/norelax directives, the RISCV::FeatureRelax feature bits of the SubtargetInfo stored in the AsmParser are updated appropriately to reflect whether relaxation is currently enabled in the parser. When an instruction is parsed, the parser checks if relaxation is currently enabled and if so, gets a handle to the AsmBackend and sets the ForceRelocs flag. The AsmBackend uses a combination of the original RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the ForceRelocs flag to determine whether to emit relocations for symbol and branch diffs. Diff relocations should therefore only not be emitted if the relax flag was not set on the command line and no instruction was ever parsed in a section with relaxation enabled to ensure correct diffs are emitted. Differential Revision: https://reviews.llvm.org/D46423 Patch by Lewis Revill. llvm-svn: 346655	2018-11-12 14:25:07 +00:00
Jonas Paulsson	c0ee028dc3	[SystemZ] Replicate the load with most uses in buildVector() Iterate over all elements and count the number of uses among them for each used load. Then make sure to REPLICATE the load which has the most uses in order to minimize the number of needed element insertions. Review: Ulrich Weigand https://reviews.llvm.org/D54322 llvm-svn: 346637	2018-11-12 08:12:20 +00:00
Craig Topper	2eab39f77b	[X86] Use DAG.getConstant instead of getZeroVector. llvm-svn: 346605	2018-11-11 07:24:36 +00:00
Craig Topper	ef33a190bc	[X86] Replace calls to getOnesVector/getZeroVector with getConstant. getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization. llvm-svn: 346603	2018-11-11 01:40:04 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Benjamin Kramer	37c691e867	[X86] Remove unused variable llvm-svn: 346592	2018-11-10 18:11:11 +00:00
Craig Topper	7956a256e9	[X86] Remove apparently unneeded code from combineVSZext. No lit tests fail with this code removed. This is a pre-commit for D54346. llvm-svn: 346590	2018-11-10 17:44:28 +00:00
Simon Pilgrim	d3ca710ec9	[CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615) llvm-svn: 346589	2018-11-10 17:37:52 +00:00
Roman Lebedev	b428b8b214	[X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465) There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but should affect the PdEX scheduler.) Required rL346545. Fixes https://bugs.llvm.org/show_bug.cgi?id=39465 llvm-svn: 346587	2018-11-10 14:31:43 +00:00
Craig Topper	a1b6667c6a	[X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them. llvm-svn: 346581	2018-11-10 06:04:33 +00:00
Craig Topper	0364085281	[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems. While there also use DAG.getConstant to create a 0 vector instead of using the helper the forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own. llvm-svn: 346574	2018-11-10 00:26:42 +00:00
Thomas Lively	936734b777	[WebAssembly] Update bleeding-edge cpu features Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D54362 llvm-svn: 346570	2018-11-10 00:11:14 +00:00
Eli Friedman	ad1151cf6a	[ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision: https://reviews.llvm.org/D51524 llvm-svn: 346568	2018-11-09 23:33:30 +00:00
Eli Friedman	0bbb0d0720	[ARM] Add MemOperand to LDRcp to enable DCE. LDRcp should be deleted when the dest register is dead in register coalescing. Without MemOp, dead LDRcp will cause dead constant pool value which references to non-existing label. Patch by Yin Ma. Differential Revision: https://reviews.llvm.org/D54173 llvm-svn: 346563	2018-11-09 23:09:17 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Bryan Chan	123553921f	[AArch64] Support HiSilicon's TSV110 processor Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls Reviewed By: kristof.beyls Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D53908 llvm-svn: 346546	2018-11-09 19:32:08 +00:00
Fangrui Song	60b7fb46e1	[Hexagon] Fix some -Wunused-function with LLVM_DUMP_METHOD and -Wunused-variable llvm-svn: 346543	2018-11-09 19:24:48 +00:00
Craig Topper	731ea7dbc1	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539	2018-11-09 19:05:51 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Jordan Rupprecht	c1741a5a8a	[Hexagon] Fix unused variable warning in release builds llvm-svn: 346537	2018-11-09 18:54:27 +00:00
Fangrui Song	4955066366	[WebAssembly] Hotfix of WebAssemblyInstructionTableSize after rL346465 llvm-svn: 346535	2018-11-09 18:32:20 +00:00
Brendon Cahoon	ac8fed68d5	[Hexagon] Implement noreturn optimization Eliminate the stack frame in functions with the noreturn nounwind attributes, and when the noreturn-stack-elim target feature is enabled. This reduces the code and stack space needed for noreturn functions. Differential Revision: https://reviews.llvm.org/D54210 llvm-svn: 346532	2018-11-09 18:16:24 +00:00
Stanislav Mekhanoshin	13d3371e68	[AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx This only covers AMDGPU BE, hopefully all occurrences. Differential Revision: https://reviews.llvm.org/D54235 llvm-svn: 346528	2018-11-09 17:58:59 +00:00
Krzysztof Parzyszek	8567de0871	[Hexagon] Place globals with explicit .sdata section in small data Both -fPIC and -G0 disable placement of globals in small data section, but if a global has an explicit section assigmnent placing it in small data, it should go there anyway. llvm-svn: 346523	2018-11-09 17:31:22 +00:00
Zaara Syeda	5c179bf14b	[Power9] Allow gpr callee saved spills in prologue to vectors registers Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills. Differential Revision: https://reviews.llvm.org/D39386 llvm-svn: 346512	2018-11-09 16:36:24 +00:00
Alexey Bataev	93d018a916	Revert "[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested." This reverts commit r345972. Need to update the description + possibly to update the patch itself after discussion with Eric Christofer. llvm-svn: 346508	2018-11-09 16:22:35 +00:00
Jonas Paulsson	458b7c0b39	[SystemZ] Avoid inserting same value after replication A minor improvement of buildVector() that skips creating an INSERT_VECTOR_ELT for a Value which has already been used for the REPLICATE. Review: Ulrich Weigand https://reviews.llvm.org/D54315 llvm-svn: 346504	2018-11-09 15:44:28 +00:00
Sam Parker	2804f32ec4	[ARM] Don't promote i1 types in ARM CGP Now that we have mixed type sizes, i1 values need to be explicitly handled as we want to avoid promoting these values. Differential Revision: https://reviews.llvm.org/D54308 llvm-svn: 346499	2018-11-09 15:06:33 +00:00
Sanjay Patel	fa1c0fe478	[x86] try to form broadcast before widening shuffle elements I noticed that we weren't generating broadcasts as much I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision: https://reviews.llvm.org/D54280 llvm-svn: 346498	2018-11-09 14:54:58 +00:00
Alex Bradbury	1cc2d0b9fb	[RISCV] Avoid unnecessary XOR for seteq/setne 0 Differential Revision: https://reviews.llvm.org/D53492 Patch by James Clarke. llvm-svn: 346497	2018-11-09 14:47:36 +00:00
Petar Avramovic	2cefaa2747	[MIPS GlobalISel] narrowScalar G_CONSTANT Legalize s64 G_CONSTANT using narrowScalar on MIPS 32. Differential Revision: https://reviews.llvm.org/D54255 llvm-svn: 346495	2018-11-09 14:21:16 +00:00
Simon Pilgrim	ea51f98b9b	[X86] Add Subtarget to more lowerVectorShuffle functions. NFCI. This will be necessary for an update to D54267 llvm-svn: 346490	2018-11-09 13:19:03 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Clement Courbet	e6b727e552	[X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX. Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' \| /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' \| /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54107 llvm-svn: 346482	2018-11-09 09:49:06 +00:00
Sam Parker	08979cd125	[ARM] Enable mixed types in ARM CGP Previously, during the search, all values had to have the same 'TypeSize', which is equal to number of bits of the integer type of the icmp operand. All values in the tree had to match this size; meaning that, if we searched from i16, we wouldn't accept i8s. A change in type size requires zext and truncs to perform the casts so, to allow mixed narrow types, the handling of these instructions is now slightly different: - we allow casts if their result or operand is <= TypeSize. - zexts are sinks if their result > TypeSize. - truncs are still sinks if their operand == TypeSize. - truncs are still sources if their result == TypeSize. The transformation bails on finding an icmp that operates on data smaller than the current TypeSize. Differential Revision: https://reviews.llvm.org/D54108 llvm-svn: 346480	2018-11-09 09:28:27 +00:00
Sam Parker	453ba916a0	[ARM] Small reorganisation in ARMParallelDSP A few code movement things: - AreSymmetrical is now a method of BinOpChain. - Created a lambda in CreateParallelMACPairs to reduce loop nesting. - A Reduction object now gets pasted in a couple of places instead, including CreateParallelMACPairs so it doesn't need to return a value. I've also added RecordSequentialLoads, which is run before the transformation begins, and caches the interesting loads. This can then be queried later instead of cross checking many load values. Differential Revision: https://reviews.llvm.org/D54254 llvm-svn: 346479	2018-11-09 09:18:00 +00:00
Mandeep Singh Grang	397765bc51	[COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D54248 llvm-svn: 346469	2018-11-09 02:48:36 +00:00
Thomas Lively	2faf079494	[WebAssembly] Read prefixed opcodes as ULEB128s Summary: Depends on D54126. Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54138 llvm-svn: 346465	2018-11-09 01:57:00 +00:00
Thomas Lively	4ddd22581e	[WebAssembly][NFC] Reorder SIMD section Summary: Reorders the sections in the SIMD tablegen file to roughly match the new opcode ordering. Depends on D54126. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54134 llvm-svn: 346464	2018-11-09 01:49:19 +00:00
Thomas Lively	299d214aba	[WebAssembly] Renumber and LEB128-encode SIMD opcodes Reviewers: aheejin, dschuff, aardappel Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54126 llvm-svn: 346463	2018-11-09 01:45:56 +00:00
Thomas Lively	38c902bc2e	[WebAssembly] Lower select for vectors Summary: Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53675 llvm-svn: 346462	2018-11-09 01:38:44 +00:00
Heejin Ahn	0c68a875fa	[WebAssembly] Fix LowerEmscriptenEHSjLj when there's only longjmp Summary: The pass incorrectly assumed if there's a longjmp declaration in the module, there is also a setjmp function declaration. Fixed it, and now the pass only converts longjmp and does not do any other transformation when there's no setjmp declaration in the module. Fixes PR39562. Reviewers: jgravelle-google, sbc100 Subscribers: dschuff, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54273 llvm-svn: 346445	2018-11-08 22:56:26 +00:00
Sanjay Patel	b5535dc7b3	[x86] use shuffles for scalar insertion into high elements of a constant vector As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm: // If the vector is wider than 128 bits, extract the 128-bit subvector, insert // into that, and then insert the subvector back into the result. ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering. Differential Revision: https://reviews.llvm.org/D54271 llvm-svn: 346433	2018-11-08 19:16:27 +00:00
Davide Italiano	ac8279ab8b	Revert "[MSP430] Add MC layer" This commit broke the module buildbots. Error: lib/Target/MSP430/MSP430GenAsmMatcher.inc:1027:1: error: redundant namespace 'llvm' [-Wmodules-import-nested-redundant] ^ llvm-svn: 346410	2018-11-08 16:21:29 +00:00
Jonas Paulsson	1993894c03	[SystemZ] Bugfix in shouldCoalesce() It was discovered in randomized testing that the SystemZ implementation of shouldCoalesce() could be caused to crash when subreg liveness was enabled. This was because an undef use of the virtual register was copied outside current MBB at the point of shouldCoalesce() being called. For more details, see https://bugs.llvm.org/show_bug.cgi?id=39276. This patch changes the check for MBB locality from livein/liveout checks to do checks for all instructions of both intervals being inside MBB. This avoids the cases with dead defs / undef uses outside MBB, which are not affecting liveness in/out of MBB. The original test case included as a reduced .mir test case. Review: Ulrich Weigand https://reviews.llvm.org/D54197 llvm-svn: 346406	2018-11-08 15:29:48 +00:00
Petr Pavlu	7c84b2e3ab	[ARM] Enable spilling of the hGPR register class in Thumb2 Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and loadRegToStackSlot() to allow the GPR class or any of its sub-classes (including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12. Differential Revision: https://reviews.llvm.org/D51927 llvm-svn: 346401	2018-11-08 13:02:10 +00:00
Anton Korobeynikov	5eb3d339d3	[MSP430] Fix encodeInstruction() for big endian hosts Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54251 llvm-svn: 346391	2018-11-08 10:17:52 +00:00
Thomas Lively	897171902b	[WebAssembly] Add V128 to WebAssemblyInstrInfo::copyPhysReg Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53872 llvm-svn: 346384	2018-11-08 02:35:28 +00:00
Stanislav Mekhanoshin	6cc8b2fc65	[AMDGPU] Extend promote alloca vectorization Promote alloca can vectorize a small array by bitcasting it to a vector type. Extend vectorization for the case when alloca is already a vector type. We still want to replace GEPs with an insert/extract element instructions in this case. Differential Revision: https://reviews.llvm.org/D54219 llvm-svn: 346376	2018-11-08 00:16:23 +00:00
Anton Korobeynikov	09dff53840	[MSP430] Add MC layer Summary: This change implements assembler parser, code emitter, ELF object writer and disassembler for the MSP430 ISA. Also, more instruction forms are added to the target description. Reviewers: asl Reviewed By: asl Subscribers: pftbest, krisb, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D53661 llvm-svn: 346374	2018-11-08 00:03:45 +00:00
Eli Friedman	0917d0c80c	[AArch64] [Windows] Address post-commit review comment on r346358. In this context, usesWindowsCFI() is basically the same thing as isOSWindows(), but it makes the relevant property of the target more explicit. llvm-svn: 346366	2018-11-07 22:30:56 +00:00
Nicolai Haehnle	bc233f5523	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364	2018-11-07 21:53:43 +00:00
Nicolai Haehnle	61396ff67c	AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI) Summary: Remove redundant logic and simplify control flow. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54086 llvm-svn: 346363	2018-11-07 21:53:36 +00:00
Nicolai Haehnle	0ab31c9c44	AMDGPU/InsertWaitcnts: Remove kill-related logic Summary: This is not needed, because we don't actually insert relevant branches for KILLs that late in the compilation flow. Besides, this was always checking for the wrong kill opcode anyway... Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54085 llvm-svn: 346362	2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov	15e90e331c	AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses llvm-svn: 346361	2018-11-07 21:42:13 +00:00
Eli Friedman	d00fb2e0a8	[AArch64] [Windows] Trap after noreturn calls. Like the comment says, this isn't the most efficient fix in terms of codesize, but it works. Differential Revision: https://reviews.llvm.org/D54129 llvm-svn: 346358	2018-11-07 21:31:14 +00:00
Konstantin Zhuravlyov	7f1959ebb3	AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses llvm-svn: 346357	2018-11-07 21:21:32 +00:00
Eli Friedman	7d7d41debc	[ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering. The lowering was missing live-ins in certain cases, like a sequence of multiple tMOVCCr_pseudo instructions. This would lead to a verifier failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered. For reasons I don't completely understand, it's hard to get a sequence of multiple tMOVCCr_pseudo instructions; the issue only seems to show up with 64-bit comparisons where the result is zero-extended. I added some extra testcases in case that changes in the future. Probably some optimization opportunities here if anyone is interested. (@test_slt_not is the case that was getting miscompiled.) The code to check the liveness of CPSR was stolen from X86ISelLowering.cpp; maybe it could be refactored into common helper, but I have no idea where to put it. Differential Revision: https://reviews.llvm.org/D54192 llvm-svn: 346355	2018-11-07 21:08:13 +00:00
Matt Arsenault	8ba740a5a8	Allow subclassing ExternalAA This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353	2018-11-07 20:26:42 +00:00
Than McIntosh	5bcdea5118	[X86] improve split-stack machine BB placement Summary: The conditional branch created to support -fsplit-stack for X86 is left unbiased/unhinted, resulting in less than ideal block placement: the __morestack call block is kept on the main hot path. Bias the branch to insure that the stack allocation block is treated as a "cold" block during machine basic block placement. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54123 llvm-svn: 346336	2018-11-07 17:41:57 +00:00
Sanjay Patel	de58e93666	fix typos aggressively; NFC llvm-svn: 346316	2018-11-07 14:35:36 +00:00
Andrea Di Biagio	4ae974e745	[X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI llvm-svn: 346309	2018-11-07 12:26:00 +00:00
Petar Avramovic	2624c8db68	[MIPS GlobalISel] Set operand order for G_MERGE and G_UNMERGE Set operands order for G_MERGE_VALUES and G_UNMERGE_VALUES so that least significant bits always go first, regardless of endianness. Differential Revision: https://reviews.llvm.org/D54098 llvm-svn: 346305	2018-11-07 11:45:43 +00:00
Evandro Menezes	f1a0d93b1d	[PATCH] [AArch64] Refactor helper functions (NFC) Refactor helper functions in AArch64InstrInfo to be static methods. llvm-svn: 346273	2018-11-06 22:17:14 +00:00
Yaxun Liu	73bf0af32f	AMDGPU: Add an option -disable-promote-alloca-to-lds Add this option for debugging and providing workaround. By default it is off so no behavior change in backend. Differential Revision: https://reviews.llvm.org/D54158 llvm-svn: 346267	2018-11-06 21:28:17 +00:00
Craig Topper	6428a2cd9a	[X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization. llvm-svn: 346259	2018-11-06 19:24:21 +00:00
Matthias Braun	c6613879ce	LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysRegs. The later is only 16bits and saves us a bit of memory. llvm-svn: 346254	2018-11-06 19:00:11 +00:00
Simon Atanasyan	bb36aea1d5	[mips] Support sigrie instruction The `sigrie` instruction signals a Reserved Instruction Exception. This patch adds support for assembling / disassembling the instruction. Differential Revision: http://reviews.llvm.org/D53861 llvm-svn: 346230	2018-11-06 14:37:24 +00:00
Clement Courbet	54a1184fff	[X86][NFC] Fix comment. llvm-svn: 346226	2018-11-06 13:48:56 +00:00
Matthias Braun	96d12513a1	AArch64: Cleanup CCMP code; NFC Cleanup CCMP pattern matching code in preparation for review/bugfix: - Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()` (it won't accept arbitrary disjunctions and is really about whether we can transform the subtree into a conjunction that we can emit). - Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()` llvm-svn: 346203	2018-11-06 03:15:22 +00:00
Sam Clegg	5292d17ec8	Revert "[WebAssembly] Fixup `main` signature by default" This reverts rL345880. It caused some test failures on the webassembly waterfall. e.g. binaryen2.test_mainenv fails due the fact that `envp` ends up being undef rather than 0. Differential Revision: https://reviews.llvm.org/D54117 llvm-svn: 346187	2018-11-06 00:31:02 +00:00
Matthias Braun	7a75a91b5b	MachineFunction: Store more specific reference to LLVMTargetMachine; NFC MachineFunction can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. Do the same for references in ScheduleDAG and RegUsageInfoCollector. llvm-svn: 346183	2018-11-05 23:49:14 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Konstantin Zhuravlyov	108927b944	AMDGPU: Add sram-ecc feature Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177	2018-11-05 22:44:19 +00:00
Craig Topper	def82a81af	[X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead. SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet. llvm-svn: 346170	2018-11-05 22:08:17 +00:00
Zaara Syeda	7509880b54	[Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics On Power9, we don't have patterns to select the following intrinsics: llvm.ppc.vsx.stxvw4x.be llvm.ppc.vsx.stxvd2x.be This patch adds support for these. Differential Revision: https://reviews.llvm.org/D53581 llvm-svn: 346148	2018-11-05 17:31:26 +00:00
Stefan Maksimovic	8d7c351799	[Mips] Supplement long branch pseudo instructions Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo instructions by creating variants which support less operands/accept GPR64Opnds as their operand in order to appease the machine verifier pass. Differential Revision: https://reviews.llvm.org/D53977 llvm-svn: 346133	2018-11-05 14:37:41 +00:00
Neil Henning	233a02d0ed	[AMDGPU] Fix the new atomic optimizer in pixel shaders. The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128	2018-11-05 12:04:48 +00:00
Sam Parker	fec793c98f	[ARM] Turn assert into condition in ARMCGP Turn the assert in PrepareConstants into a conditon so that we can handle mul instructions with negative immediates. Differential Revision: https://reviews.llvm.org/D54094 llvm-svn: 346126	2018-11-05 11:26:04 +00:00
Sam Parker	fcd8adab30	[ARM][ARMCGP] Remove unecessary zexts and truncs r345840 slightly changed the way promotion happens which could result in zext and truncs having the same source and destination types. This fixes that issue. We can now also remove the zext and trunc in the following case: (zext (trunc (promoted op)), i32) This means that we can no longer treat a value, that is only used by a sink, to be safe to promote. I've also added in some extra asserts and replaced a cast for a dyn_cast. Differential Revision: https://reviews.llvm.org/D54032 llvm-svn: 346125	2018-11-05 10:58:37 +00:00
Dylan McKay	4c5a5c8db6	[AVR] Fix a backend bug that left extraneous operands after expansion This patch fixes a bug in the AVR FRMIDX expansion logic. The expansion would leave a leftover operand from the original FRMIDX, but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction did not expect this operand and so LLVM rejected the machine instruction. This would trigger an assertion: Assertion failed: ((isImpReg \|\| Op.isRegMask() \|\| MCID->isVariadic() \|\| OpNo < MCID->getNumOperands() \|\| isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"), function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp Tim fixed this so that now the FRMIDX is expanded correctly into a well-formed MOVWRdRr. Patch by Tim Neumann llvm-svn: 346117	2018-11-05 05:49:04 +00:00
Craig Topper	30b627e5c9	[X86] Custom type legalize v2i8/v2i16/v2i32 mul to use to pmuludq. v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about. llvm-svn: 346115	2018-11-05 05:02:12 +00:00
Dylan McKay	9a9ae99b30	[AVR] Disallow the LDDWRdPtrQ instruction with Z as the destination This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at https://github.com/avr-rust/rust/issues/37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll llvm-svn: 346114	2018-11-05 05:00:44 +00:00
Craig Topper	ed6a0a817f	[X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode. Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102	2018-11-04 17:31:27 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Craig Topper	7aed9e600b	[X86] Update comment I forgot to change in r346043. NFC llvm-svn: 346073	2018-11-03 19:49:13 +00:00
Reid Kleckner	2bcb288ade	[codeview] Let the X86 backend tell us the VFRAME offset adjustment Use MachineFrameInfo's OffsetAdjustment field to pass this information from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for any other purpose. This fixes PR38857 in the case where there is a non-aligned quantity of CSRs and a non-aligned quantity of locals. llvm-svn: 346062	2018-11-03 00:41:52 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Wouter van Oortmerssen	de28b5d17f	[WebAssembly] Parsing missing directives to produce valid .o Summary: The assembler was able to assemble and then dump back to .s, but was failing to parse certain directives necessary for valid .o output: - .type directives are now recognized to distinguish function symbols and others. - .size is now parsed to provide function size. - .globaltype (introduced in https://reviews.llvm.org/D54012) is now recognized to ensure symbols like __stack_pointer have a proper type set for both .s and .o output. Also added tests for the above. Reviewers: sbc100, dschuff Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish Differential Revision: https://reviews.llvm.org/D53842 llvm-svn: 346047	2018-11-02 22:04:33 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Alex Bradbury	52c27785ce	[RISCV] Add some missing expansions for floating-point intrinsics A number of intrinsics, such as llvm.sin.f32, would result in a failure to select. This patch adds expansions for the relevant selection DAG nodes, as well as exhaustive testing for all f32 and f64 intrinsics. The codegen for FMA remains a TODO item, pending support for the various RISC-V FMA instruction variants. The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending upstream support for target-independent expansion, as discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html. Differential Revision: https://reviews.llvm.org/D54034 Patch by Luís Marques. llvm-svn: 346034	2018-11-02 19:50:38 +00:00
Heejin Ahn	5b023e07ea	[WebAssembly] Fix bugs in rethrow depth counting and InstPrinter Summary: EH stack depth is incremented at `try` and decremented at `catch`. When there are more than two catch instructions for a try instruction, we shouldn't count non-first catches when calculating EH stack depths. This patch fixes two bugs: - CFGStackify: Exclude `catch_all` in the terminate catch pad when calculating EH pad stack, because when we have multiple catches for a try we should count only the first catch instruction when calculating EH pad stack. - InstPrinter: The initial intention was also to exclude non-first catches, but it didn't account nested try-catches, so it failed on this case: ``` try try catch end catch <-- (1) end ``` In the example, when we are at the catch (1), the last seen EH instruction is not `try` but `end_try`, violating the wrong assumption. We don't need these after we switch to the second proposal because there is gonna be only one `catch` instruction. But anyway before then these bugfixes are necessary for keep trunk in working state. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53819 llvm-svn: 346029	2018-11-02 18:38:52 +00:00
Matthias Braun	5f7cb79e94	ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a def llvm-svn: 346026	2018-11-02 18:22:15 +00:00
Jonas Paulsson	cced2a2775	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions. Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2). Review: Ulrich Weigand https://reviews.llvm.org/D54028 llvm-svn: 346009	2018-11-02 17:53:31 +00:00
Sylvestre Ledru	df92dabaef	Fixed inclusion of M_PI fow MinGW-w64 Patch by KOLANICH llvm-svn: 346000	2018-11-02 17:25:40 +00:00
Jonas Paulsson	79f2441eee	[SystemZ] Rework getInterleavedMemoryOpCost() Model this function more closely after the BasicTTIImpl version, with separate handling of loads and stores. For loads, the set of actually loaded vectors is checked. This makes it more readable and just slightly more accurate generally. Review: Ulrich Weigand https://reviews.llvm.org/D53071 llvm-svn: 345998	2018-11-02 17:15:36 +00:00
Krzysztof Parzyszek	f070544f8e	[Hexagon] Do not reduce load size for globals in small-data Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled offset. For a load of a value of type T, the small-data area is equivalent to an array "T sdata[65536]". This implies that objects of smaller sizes need to be closer to the beginning of sdata, while larger objects may be farther away, or otherwise the offset may be insufficient to reach it. Similarly, an object of a larger size should not be accessed via a load of a smaller size. llvm-svn: 345975	2018-11-02 14:17:47 +00:00
Alexey Bataev	8831ef7a16	[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested. Summary: If the output of debug directives only is requested, we should drop emission of ',debug' option from the target directive. Required for supporting of nvprof profiler. Reviewers: probinson, echristo, dblaikie Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl Differential Revision: https://reviews.llvm.org/D46061 llvm-svn: 345972	2018-11-02 13:47:47 +00:00
Neil Henning	7d1b77df57	[AMDGPU] UBSan bug fix for r345710 UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding in an explicit check to ensure we don't do the UBSan detected shl << 32. llvm-svn: 345962	2018-11-02 10:24:57 +00:00
Matt Arsenault	8e0269ba0b	AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 llvm-svn: 345922	2018-11-02 02:43:55 +00:00
Wouter van Oortmerssen	3231e518a3	[WebAssembly] Added a .globaltype directive to .s output. Summary: Assembly output can use globals like __stack_pointer implicitly, but has no way of indicating the type of such a global, which makes it hard for tools processing it (such as the MC Assembler) to reconstruct this information. The improved assembler directives parsing (in progress in https://reviews.llvm.org/D53842) will make use of this information. Also deleted code for the .import_global directive which was unused. New test case in userstack.ll Reviewers: dschuff, sbc100 Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54012 llvm-svn: 345917	2018-11-02 00:45:00 +00:00
Thomas Lively	b2382c8bf7	[WebAssembly] General vector shift lowering Summary: Adds support for lowering non-splat shifts. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53625 llvm-svn: 345916	2018-11-02 00:39:57 +00:00
Thomas Lively	fb84fd7c8e	[WebAssembly] Expand inserts and extracts with variable indices Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53964 llvm-svn: 345913	2018-11-02 00:06:56 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Farhana Aleen	5853762e5a	[AMDGPU] Handle the idot8 pattern generated by FE. Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in the compile time. Support the pattern that clang FE generates after reordering the additions in integer-dot8 source language pattern. Author: FarhanaAleen Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D53937 llvm-svn: 345902	2018-11-01 22:48:19 +00:00
Mandeep Singh Grang	df19e57a1c	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53962 llvm-svn: 345892	2018-11-01 21:23:47 +00:00
Heejin Ahn	2e398976ba	[WebAssembly] Fix signature parsing for 'try' in AsmParser Summary: Like `block` or `loop`, `try` can take an optional signature which can be omitted. This patch allows `try`'s signature to be omitted. Also added some tests for EH instructions. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53873 llvm-svn: 345888	2018-11-01 20:32:15 +00:00
Reid Kleckner	4af6025f09	[Hexagon] Remove unintended fallthrough from MC duplex code I added these annotations in r345878 because I wasn't sure if the fallthrough was intended. Krzysztof Parzyszek confirmed that they should be breaks, so that's what this patch does. Reviewers: kparzysz Differential Revision: https://reviews.llvm.org/D53991 llvm-svn: 345883	2018-11-01 19:59:27 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Sam Clegg	ddf049869a	[WebAssembly] Fixup `main` signature by default Differential Revision: https://reviews.llvm.org/D53396 llvm-svn: 345880	2018-11-01 19:38:44 +00:00
Reid Kleckner	bebc53f838	Annotate possibly unintended fallthroughs in Hexagon MC code, NFC Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC does not warn when a case body that ends in a switch falls through to a case label of an outer switch. It's not clear if these fall throughs are truly intended. The Hexagon tests pass regardless of whether these case blocks fall through or break. For now, I have applied the intended fallthrough annotation macro with a FIXME comment to unblock enabling the warning. I will send a follow-up patch that converts them to breaks to the Hexagon maintainers. llvm-svn: 345878	2018-11-01 19:32:04 +00:00
Volkan Keles	0a8dc9eb0f	[GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements Summary: This function was causing a crash when `MaxElements == 1` because it was trying to create a single element vector type. Reviewers: dsanders, aemerson, aditya_nandakumar Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53734 llvm-svn: 345875	2018-11-01 19:01:53 +00:00
Simon Pilgrim	b34a052852	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655) This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869	2018-11-01 18:22:11 +00:00
Reid Kleckner	ba982b5f8f	[Hexagon] Fix MO_JumpTable const extender conversion Previously this case fell through to unreachable, so it is clearly not covered by any test case in LLVM. It may be dynamically unreachable, in fact. However, if it were to run, this is what it would logically do. The assert suggests that the intended behavior was not to allow folding offsets from jump table indices, which makes sense. llvm-svn: 345868	2018-11-01 18:14:45 +00:00
Reid Kleckner	eb56894a4b	[AArch64] Fix unintended fallthrough and strengthen cast This was added in r330630. GCC's -Wimplicit-fallthrough seems to not fire when the previous case contains a switch itself. This fallthrough was bening because the helper function implementing the case used dyn_cast to re-check the type of the node in question. After fixing the fallthrough, we can strengthen the cast. llvm-svn: 345864	2018-11-01 18:02:27 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Sam Parker	48fbf752b0	[ARM] Attempt to fix ppc64be buildbot llvm-svn: 345850	2018-11-01 16:44:45 +00:00
Sam Parker	84a2f8b364	[ARM][CGP] Negative constant operand handling While mutating instructions, we sign extended negative constant operands for binary operators that can safely overflow. This was to allow instructions, such as add nuw i8 %a, -2, to still be able to perform a subtraction. However, the code to handle constants doesn't take into consideration that instructions, such as sub nuw i8 -2, %a, require the i8 -2 to be converted into i32 254. This is a relatively simple fix, but I've taken the time to reorganise the code a bit - mainly that instructions that can be promoted are cached and splitting up the Mutate function. Differential Revision: https://reviews.llvm.org/D53972 llvm-svn: 345840	2018-11-01 15:23:42 +00:00
Simon Pilgrim	d5d7224355	[X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI) The function isn't SLM specific (its driven by the FeatureSlowLEA flag). Minor tidyup prior to PR38225. llvm-svn: 345836	2018-11-01 14:57:07 +00:00
Aleksandar Beserminji	b9c840c9f0	[mips][micromips] Fix JmpLink to TargetExternalSymbol When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...', wrong JALR16_MM is selected. This patch adds missing pattern for JmpLink, so that JAL instruction is selected. Differential Revision: https://reviews.llvm.org/D53366 llvm-svn: 345830	2018-11-01 13:57:54 +00:00
Chad Rosier	1546efd4a7	[AArch64] Add support for ARMv8.4 in Saphira. llvm-svn: 345827	2018-11-01 13:45:16 +00:00
Simon Pilgrim	1f0a8421ad	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied) Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed. This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs. llvm-svn: 345824	2018-11-01 11:52:09 +00:00
Stefan Maksimovic	cd0c50e3d2	[Mips] Conditionally remove successor block In MipsBranchExpansion::splitMBB, upon splitting a block with two direct branches, remove the successor of the newly created block (which inherits successors from the original block) which is pointed to by the last branch in the original block only if the targets of two branches differ. This is to fix the failing test when ran with -verify-machineinstrs enabled. Differential Revision: https://reviews.llvm.org/D53756 llvm-svn: 345821	2018-11-01 10:10:42 +00:00
Jonas Paulsson	6749c24f40	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion Scalar i1 to fp conversions are done with a branch sequence, so it should have a higher cost. Review: Ulrich Weigand https://reviews.llvm.org/D53924 llvm-svn: 345818	2018-11-01 09:05:32 +00:00
Jonas Paulsson	f15a53bc81	[SystemZ::TTI] Accurate costs for i1->double vector conversions This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817	2018-11-01 09:01:51 +00:00
Li Jia He	03170a904f	[PowerPC] Support constraint 'wi' in asm From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D53265 llvm-svn: 345810	2018-11-01 02:35:17 +00:00
Matthias Braun	a9f900561e	X86: Consistently declare pass initializers in X86.h; NFC This avoids declaring them twice: in X86TargetMachine.cpp and the file implementing the pass. llvm-svn: 345801	2018-11-01 00:38:01 +00:00
Thomas Lively	d4891a1b7a	[WebAssembly] Lower vselect Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53630 llvm-svn: 345797	2018-11-01 00:01:02 +00:00
Thomas Lively	b61232eacd	[WebAssembly] Process p2align operands for SIMD loads and stores Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53886 llvm-svn: 345795	2018-10-31 23:58:20 +00:00
Thomas Lively	6ff31fe34d	[WebAssembly] Handle vector IMPLICIT_DEFs. Summary: Also reduce the test case for implicit defs and test it with all register classes. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53855 llvm-svn: 345794	2018-10-31 23:50:53 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Evandro Menezes	3a06c46470	[AArch64] Sort switch cases (NFC) llvm-svn: 345786	2018-10-31 21:56:49 +00:00
Craig Topper	6c3f1692c8	Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction" Google is reporting regressions on some benchmarks. llvm-svn: 345785	2018-10-31 21:53:24 +00:00
Eli Friedman	063fd98bcc	[ARM] Add missing pseudo-instruction for Thumb1 RSBS. Shows up rarely for 64-bit arithmetic, more frequently for the compare patterns added in r325323. Differential Revision: https://reviews.llvm.org/D53848 llvm-svn: 345782	2018-10-31 21:45:48 +00:00
Stanislav Mekhanoshin	222e9c11f7	Check shouldReduceLoadWidth from SimplifySetCC SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778	2018-10-31 21:24:30 +00:00
Scott Linder	c6c627253d	[AMDGPU] Remove FeatureVGPRSpilling This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763	2018-10-31 18:54:06 +00:00
Krzysztof Parzyszek	977a1fe507	[Hexagon] Make sure not to use GP-relative addressing with PIC Make sure that -relocation-model=pic prevents use of GP-relative addressing modes. llvm-svn: 345731	2018-10-31 15:54:31 +00:00
Nicolai Haehnle	814abb59df	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719	2018-10-31 13:27:08 +00:00
Nicolai Haehnle	28212cc689	AMDGPU: Remove PHI loop condition optimization Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718	2018-10-31 13:26:48 +00:00
Andrea Di Biagio	3d2b7176fc	[tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands. Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one: Foo(MI->getOperand(0).getImm()) == ExpectedVal; Here, the immediate operand value at index zero is passed in input to function Foo, and ExpectedVal is compared against the value returned by function Foo. While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream. With this patch, we allow users to specify if a register/immediate operand value needs to be passed in input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase. This patch also changes where TIIPredicate definitions are expanded by the instructon info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo). This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca. Differential Revision: https://reviews.llvm.org/D53880 llvm-svn: 345714	2018-10-31 12:28:05 +00:00
Neil Henning	63718b214a	[AMDGPU] support image load/store a16 Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710	2018-10-31 10:34:48 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Sanjin Sijaric	fadebc8aae	[ARM64] [Windows] Exception handling support in frame lowering Emit pseudo instructions indicating unwind codes corresponding to each instruction inside the prologue/epilogue. These are used by the MCLayer to populate the .xdata section. Differential Revision: https://reviews.llvm.org/D50288 llvm-svn: 345701	2018-10-31 09:27:01 +00:00
Martin Storsjo	315357faca	[AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk This is similar to SVN r311061 for ARM. Differential Revision: https://reviews.llvm.org/D53878 llvm-svn: 345698	2018-10-31 08:14:09 +00:00
Konstantin Zhuravlyov	2d22d24ac4	Revert r345542: AMDGPU: Enable code object v3 by default It breaks mesa. llvm-svn: 345662	2018-10-30 22:02:40 +00:00
Mandeep Singh Grang	71e0cc2a0b	[COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg Summary: Thunk functions in Windows are varag functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641	2018-10-30 20:46:10 +00:00
Eli Friedman	93d0129b78	[AArch64] [Windows] SEH opcodes should be scheduling boundaries. Prevents the post-RA scheduler from modifying the prologue sequences emitting by frame lowering. This is roughly similar to what we do for other targets: TargetInstrInfo::isSchedulingBoundary checks isPosition(), which checks for CFI_INSTRUCTION. isSEHInstruction is taken from D50288; it'll land with whatever patch lands first. Differential Revision: https://reviews.llvm.org/D53851 llvm-svn: 345634	2018-10-30 19:24:51 +00:00
David Greene	3e89fa8e08	[AArch64] Create proper memoperand for multi-vector stores Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631	2018-10-30 19:17:51 +00:00
Craig Topper	6958b5ffa9	[X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size. The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors. This patch switchs to using the concat_vectors input element count directly instead. Differential Revision: https://reviews.llvm.org/D53823 llvm-svn: 345626	2018-10-30 18:48:42 +00:00
Ulrich Weigand	c5854b0adb	[SystemZ] Simplify LRV/STRV ISD nodes The LRV and STRV nodes carry an extra operand to indicate the type of the memory access. This is redundant, since the nodes are actually of class MemIntrinsicNode and therefore hold that same information already as MemoryVT. NFC intended. llvm-svn: 345618	2018-10-30 18:20:59 +00:00
Jonas Paulsson	af8e036c29	[SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv. Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596	2018-10-30 13:41:03 +00:00
Francis Visoiu Mistrih	0e237d357e	[X86] Re-enable the machine verifier after fixing more tests Was disabled again in r345528. Hopefully this the bots. llvm-svn: 345593	2018-10-30 12:20:17 +00:00
Roman Lebedev	b3a14208ac	[X86][BMI1] X86DAGToDAGISel: select BEXTR from x & (-1 >> (32 - y)) pattern Summary: The final pattern. There is no test changes: * We are looking for the pattern with one-use of it's mask, * If the mask is one-use, D48768 will unfold it into pattern d. * Thus, the tests have extra-use on the mask. * Thus, only the BMI2 BZHI can be tested, and it already worked. * So there is no BMI1 test coverage, we just assume it works since it uses the same codepath. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53575 llvm-svn: 345584	2018-10-30 11:12:34 +00:00
Diogo N. Sampaio	a3783b2ac2	[AArch64] Add support for UDF instruction Summary: Add support for AArch64 UDF instruction. UDF - Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). Reviewers: DavidSpickett, javed.absar, t.p.northover Reviewed By: javed.absar Subscribers: nhaehnle, kristof.beyls Differential Revision: https://reviews.llvm.org/D53319 llvm-svn: 345581	2018-10-30 11:06:50 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Craig Topper	67c2878501	[X86] Cleanup the code in LowerFABSorFNEG and LowerFCOPYSIGN a little. NFC Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask. llvm-svn: 345565	2018-10-30 03:27:12 +00:00
Craig Topper	676d7a7a43	[X86] Stop changing f128 fand/for/fxor to v2i64. The additional patterns don't cost us much and it seems better than changing element widths. llvm-svn: 345564	2018-10-30 03:27:11 +00:00
Matt Arsenault	abc4f29f9c	AMDGPU: Remove custom BUILD_VECTOR combine This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560	2018-10-30 01:37:59 +00:00
Matt Arsenault	b0b741efb8	AMDGPU: Use scavengeRegisterBackwards llvm-svn: 345559	2018-10-30 01:33:14 +00:00
Reid Kleckner	23c9efc071	Remove unneeded friend declarations that clang-cl warns on llvm-svn: 345549	2018-10-29 22:38:13 +00:00
Konstantin Zhuravlyov	5cb950200c	AMDGPU: Enable code object v3 by default Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542	2018-10-29 21:07:27 +00:00
Simon Pilgrim	090a444cb7	[X86] Set isMachineVerifierClean() back to false (PR27481) Put back the isMachineVerifierClean() override removed at rL345513 to fix Windows ThinLTO tests llvm-svn: 345528	2018-10-29 19:51:52 +00:00
Thomas Lively	eb15d00193	[WebAssembly] Lower away condition truncations for scalar selects Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53676 llvm-svn: 345521	2018-10-29 18:38:12 +00:00
Simon Pilgrim	3a2f3c2c0a	[X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors Part of the issue discovered in PR39483, although its not fully exposed until I reapply rL345395 (by reverting rL345451) llvm-svn: 345520	2018-10-29 18:25:48 +00:00
Craig Topper	220fd33522	[X86] Add AES to KNL CPUs to match clang. I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB. llvm-svn: 345519	2018-10-29 18:17:01 +00:00
Stanislav Mekhanoshin	6b1c6548bd	[AMDGPU] Fixed return value causing warning and regression llvm-svn: 345518	2018-10-29 17:53:23 +00:00
Bryan Chan	bfd32d4377	[AArch64] Rename FP16FML instruction format (NFC) Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow usual naming convention, and add some comments in the .td files. llvm-svn: 345515	2018-10-29 17:27:34 +00:00
Stanislav Mekhanoshin	79080ecd82	[AMDGPU] Match v_swap_b32 Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514	2018-10-29 17:26:01 +00:00
Francis Visoiu Mistrih	61c9de7565	[X86] Enable the MachineVerifier by default The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enabling the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513	2018-10-29 16:57:43 +00:00

... 6 7 8 9 10 ...

50329 Commits