teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-22 04:55:50 -04:00

Author	SHA1	Message	Date
Simon Pilgrim	3c157d3fa3	[AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349912	2018-12-21 15:29:47 +00:00
Simon Pilgrim	ca8bca2ad3	[WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349911	2018-12-21 15:25:37 +00:00
Simon Pilgrim	d800ee4861	[ARM] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349909	2018-12-21 15:15:38 +00:00
Simon Pilgrim	148957f336	[AArch64] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349908	2018-12-21 15:05:10 +00:00
Simon Pilgrim	2482c51e99	[SystemZ] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349906	2018-12-21 14:50:54 +00:00
Simon Pilgrim	d43bdc715c	[Lanai] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349905	2018-12-21 14:48:35 +00:00
Simon Pilgrim	af1ab22a76	[PPC] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349903	2018-12-21 14:32:39 +00:00
Simon Pilgrim	57733507fe	[X86] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output paramater version. llvm-svn: 349902	2018-12-21 14:25:14 +00:00
Luke Cheeseman	41a9e53500	[Dwarf/AArch64] Return address signing B key dwarf support - When signing return addresses with -msign-return-address=<scope>{+<key>}, either the A key instructions or the B key instructions can be used. To correctly authenticate the return address, the unwinder/debugger must know which key was used to sign the return address. - When and exception is thrown or a break point reached, it may be necessary to unwind the stack. To accomplish this, the unwinder/debugger must be able to first authenticate an the return address if it has been signed. - To enable this, the augmentation string of CIEs has been extended to allow inclusion of a 'B' character. Functions that are signed using the B key variant of the instructions should have and FDE whose associated CIE has a 'B' in the augmentation string. - One must also be able to preserve these semantics when first stepping from a high level language into assembly and then, as a second step, into an object file. To achieve this, I have introduced a new assembly directive '.cfi_b_key_frame ', that tells the assembler the current frame uses return address signing with the B key. - This ensures that the FDE is associated with a CIE that has 'B' in the augmentation string. Differential Revision: https://reviews.llvm.org/D51798 llvm-svn: 349895	2018-12-21 10:45:08 +00:00
Simon Pilgrim	5d403f6bf8	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: https://reviews.llvm.org/D55890 Differential Revision: https://reviews.llvm.org/D55894 llvm-svn: 349892	2018-12-21 09:04:14 +00:00
Thomas Lively	b6dac89c87	[WebAssembly] Fix invalid machine instrs in -O0, verify in tests Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55956 llvm-svn: 349889	2018-12-21 06:58:15 +00:00
Matt Arsenault	3eae3c4590	AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote llvm-svn: 349882	2018-12-21 03:20:54 +00:00
Matt Arsenault	f4c21c575a	AMDGPU/GlobalISel: RegBankSelect for some fp ops llvm-svn: 349880	2018-12-21 03:14:45 +00:00
Matt Arsenault	bee2ad7185	AMDGPU/GlobalISel: Redo legality for build_vector It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking. llvm-svn: 349878	2018-12-21 03:03:11 +00:00
Craig Topper	54f1a7be13	[X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to tranlate opcode to condition code using the helpers in X86InstrInfo.cpp. This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes. This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses. llvm-svn: 349868	2018-12-21 01:14:25 +00:00
Craig Topper	e0cff10289	[X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses. Found while working on another patch llvm-svn: 349867	2018-12-21 01:14:23 +00:00
Eli Friedman	b1bbd5dca3	[ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857	2018-12-20 23:39:54 +00:00
Jessica Paquette	a6b9c68a85	[GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens. llvm-svn: 349822	2018-12-20 21:14:15 +00:00
Eli Friedman	48397102d0	[MC] [AArch64] Correctly resolve ":abs_g1:3" etc. We have to treat constructs like this as if they were "symbolic", to use the correct codepath to resolve them. This mostly only affects movz etc. because the other uses of classifySymbolRef conservatively treat everything that isn't a constant as if it were a symbol. Differential Revision: https://reviews.llvm.org/D55906 llvm-svn: 349800	2018-12-20 19:46:14 +00:00
Eli Friedman	4648209e16	[MC] [AArch64] Support resolving fixups for abs_g0 etc. This requires a bit more code than other fixups, to distingush between abs_g0/abs_g1/etc. Actually, I think some of the other fixups are missing some checks, but I won't try to address that here. I haven't seen any real-world code that uses a construct like this, but it clearly should work, and we're considering using it in the implementation of localescape/localrecover on Windows (see https://reviews.llvm.org/D53540). I've verified that binutils produces the same code as llvm-mc for the testcase. This currently doesn't include support for the *_s variants (that requires a bit more work to set the opcode). Differential Revision: https://reviews.llvm.org/D55896 llvm-svn: 349799	2018-12-20 19:38:07 +00:00
Simon Pilgrim	2a25360ae3	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: https://reviews.llvm.org/D55937 Differential Revision: https://reviews.llvm.org/D55938 llvm-svn: 349795	2018-12-20 19:01:07 +00:00
Yonghong Song	821c93d556	[BPF] Disable relocation for .BTF.ext section Build llvm with assertion on, and then build bcc against this llvm. Run any bcc tool with debug=8 (turning on -g for clang compilation), you will get the following assertion errors, /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888: void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed. The .BTF.ext ELF section uses Fixup's to get the instruction offsets. The data width of the Fixup is 4 bytes since we only need the insn offset within the section. This caused the above error though since R_BPF_64_32 expects 4-byte value and the Runtime Dyld tried to resolve the actual insn address which is 8 bytes. Actually the offset within the section is all what we need. Therefore, there is no need to perform any kind of relocation for .BTF.ext section and such relocation will actually cause incorrect result. This patch changed BPFELFObjectWriter::getRelocType() such that for Fixup Kind FK_Data_4, if the relocation Target is a temporary symbol, let us skip the relocation (ELF::R_BPF_NONE). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 349778	2018-12-20 17:40:23 +00:00
Krzysztof Parzyszek	30c42e2ab6	[Hexagon] Add patterns for funnel shifts llvm-svn: 349770	2018-12-20 16:39:20 +00:00
Alex Bradbury	eb3a64a4da	[RISCV] Properly evaluate fixup_riscv_pcrel_lo12 This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12. Notable changes: Rebased onto trunk Handle and test S-type Test case pcrel-hilo.s is merged into relocations.s D43157 description: VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup pointing to the real target. Differential Revision: https://reviews.llvm.org/D54029 Patch by Chih-Mao Chen and Michael Spencer. llvm-svn: 349764	2018-12-20 14:52:15 +00:00
Simon Pilgrim	09c081176a	[X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI. llvm-svn: 349763	2018-12-20 14:38:35 +00:00
Ulrich Weigand	380bece7af	[SystemZ] "Generic" vector assembler instructions shoud clobber CC There are several vector instructions which may or may not set the condition code register, depending on the value of an argument. For codegen, we use two versions of the instruction, one that sets CC and one that doesn't, which hard-code appropriate values of that argument. But we also have a "generic" version of the instruction that is used for the assembler/disassembler. These generic versions should always be considered to clobber CC just to be safe. llvm-svn: 349761	2018-12-20 14:24:17 +00:00
Ulrich Weigand	44d37ae38c	[SystemZ] Make better use of VLLEZ This patch fixes two deficiencies in current code that recognizes the VLLEZ idiom: - For the floating-point versions, we have ISel patterns that match on a bitconvert as the top node. In more complex cases, that bitconvert may already have been merged into something else. Fix the patterns to match the inner nodes instead. - For the 64-bit integer versions, depending on the surrounding code, we may get either a DAG tree based on JOIN_DWORDS or one based on INSERT_VECTOR_ELT. Use a PatFrags to simply match both variants. llvm-svn: 349749	2018-12-20 13:05:03 +00:00
Ulrich Weigand	8bb46b0f01	[SystemZ] Make better use of VGEF/VGEG Current code in SystemZDAGToDAGISel::tryGather refuses to perform any transformation if the Load SDNode has more than one use. This (erronously) counts uses of the chain result, which prevents the optimization in many cases unnecessarily. Fixed by this patch. llvm-svn: 349748	2018-12-20 13:01:20 +00:00
Clement Courbet	36a3480385	Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change. llvm-svn: 349747	2018-12-20 13:01:04 +00:00
Ulrich Weigand	f43b510015	[SystemZ] Make better use of VLDEB We already have special code (DAG combine support for FP_ROUND) to recognize cases where we an use a vector version of VLEDB to perform two floating-point truncates in parallel, but equivalent support for VLEDB (vector floating-point extends) has been missing so far. This patch adds corresponding DAG combine support for FP_EXTEND. llvm-svn: 349746	2018-12-20 12:59:05 +00:00
Clement Courbet	e22cf4d7cb	Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Forgot to update PowerPC tests for the GEP->bitcast change. llvm-svn: 349733	2018-12-20 09:58:33 +00:00
Clement Courbet	1bb6e1b0f2	[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp. Reviewers: spatel, gchatelet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55263 llvm-svn: 349731	2018-12-20 09:13:47 +00:00
Kang Zhang	ca8db48974	[PowerPC] Implement the isSelectSupported() target hook Summary: PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC does not have vector CR selects, PowerPC does not support scalar condition selects on vectors. In addition to implementing this hook, isSelectSupported() should return false when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects are converted into branch sequences. Reviewed By: steven.zhang, hfinkel Differential Revision: https://reviews.llvm.org/D55754 llvm-svn: 349727	2018-12-20 06:19:59 +00:00
Thomas Lively	feb18fe927	[WebAssembly] Emit a splat for v128 IMPLICIT_DEF Summary: This is a code size savings and is also important to get runnable code while engines do not support v128.const. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55910 llvm-svn: 349724	2018-12-20 04:20:32 +00:00
Amara Emerson	321bfb210a	Fix build errors introduced by r349712 on aarch64 bots. llvm-svn: 349723	2018-12-20 03:27:42 +00:00
Thomas Lively	8dbf29af95	[WebAssembly] Gate unimplemented SIMD ops on flag Summary: Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag, since these ops are not implemented yet in V8. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55904 llvm-svn: 349720	2018-12-20 02:10:22 +00:00
Matt Arsenault	4339883710	AMDGPU: Make i1/i64/v2i32 and/or/xor legal The 64-bit types do depend on the register bank, but that's another issue to deal with later. llvm-svn: 349716	2018-12-20 01:35:49 +00:00
Matt Arsenault	8cc98bee8a	AMDGPU/GlobalISel: Fix ValueMapping tables for i1 This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR. llvm-svn: 349715	2018-12-20 01:33:43 +00:00
Craig Topper	9ca2f5605e	[X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization. Generic legalization should take care of this. llvm-svn: 349714	2018-12-20 01:32:06 +00:00
Amara Emerson	8cb186ce17	[AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64. This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420 llvm-svn: 349712	2018-12-20 01:11:04 +00:00
Matt Arsenault	dff33c38e1	AMDGPU/GlobalISel: RegBankSelect for fp conversions llvm-svn: 349709	2018-12-20 00:37:02 +00:00
Matt Arsenault	36d4092173	AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg llvm-svn: 349708	2018-12-20 00:33:49 +00:00
Craig Topper	217b3b20d8	[X86] Remove TLI variable from ReplaceNodeResults. NFC We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly. llvm-svn: 349695	2018-12-19 23:13:03 +00:00
Rhys Perry	3931ad38b9	AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694	2018-12-19 22:53:33 +00:00
Evandro Menezes	374ccf6768	[AArch64] Improve Exynos predicates Expand the predicate `ExynosResetPred` to include all forms of immediate moves. llvm-svn: 349686	2018-12-19 22:24:36 +00:00
Evandro Menezes	ff827d737a	[AArch64] Use canonical copy idiom Use only the canonical form of the alias for register transfers in the `IsCopyIdiomPred` predicate. llvm-svn: 349685	2018-12-19 22:24:31 +00:00
Craig Topper	d16da2b479	[X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC This reduces indentation and makes it obvious this function always returns something. llvm-svn: 349671	2018-12-19 19:39:34 +00:00
Jessica Paquette	3560e93dc1	[GlobalISel][AArch64] Add support for @llvm.ceil This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases. llvm-svn: 349664	2018-12-19 19:01:36 +00:00
Craig Topper	84a00bd98a	[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI. This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect. Differential Revision: https://reviews.llvm.org/D55870 llvm-svn: 349661	2018-12-19 18:49:13 +00:00
Craig Topper	291470347a	[X86] Fix assert fails in pass X86AvoidSFBPass Fixes https://bugs.llvm.org/show_bug.cgi?id=38743 The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap. But it currently looks only at the previous one, which will miss some cases that result in assert. This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D55642 llvm-svn: 349660	2018-12-19 18:45:57 +00:00
Evandro Menezes	5d409b2278	[AArch64] Improve the Exynos M3 pipeline model llvm-svn: 349652	2018-12-19 17:37:51 +00:00
Yonghong Song	7b410ac352	[BPF] Generate BTF DebugInfo under BPF target This patch implements BTF (BPF Type Format). The BTF is the debug info format for BPF, introduced in the below linux patch: `69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)` and further extended several times, e.g., https://www.spinics.net/lists/netdev/msg534640.html https://www.spinics.net/lists/netdev/msg538464.html https://www.spinics.net/lists/netdev/msg540246.html The main advantage of implementing in LLVM is: . better integration/deployment as no extra tools are needed. . bpf JIT based compilation (like bcc, bpftrace, etc.) can get BTF without much extra effort. . BTF line_info needs selective source codes, which can be easily retrieved when inside the compiler. This patch implemented BTF generation by registering a BPF specific DebugHandler in BPFAsmPrinter. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D55752 llvm-svn: 349640	2018-12-19 16:40:25 +00:00
Nicolai Haehnle	8d5e974076	AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55369 llvm-svn: 349620	2018-12-19 11:55:03 +00:00
Carl Ritson	c521ac3a44	AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D55602 llvm-svn: 349611	2018-12-19 10:17:49 +00:00
Diana Picus	6c35a1e5af	[ARM GlobalISel] Support G_CONSTANT for Thumb2 All we have to do is mark it as legal. This allows us to select a lot of new patterns handled by TableGen. This patch adds tests for them and splits up the existing test file for binary operators into 2 files, one for arithmetic ops and one for logical ones. llvm-svn: 349610	2018-12-19 09:55:10 +00:00
Matt Arsenault	b110e2277c	AMDGPU/GlobalISel: Regbankselect for fsub llvm-svn: 349608	2018-12-19 09:07:58 +00:00
Kewen Lin	a6247e7cf4	[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns For type v4i32/v8ii16/v16i8, do following transforms: (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b) (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b) Differential Revision: https://reviews.llvm.org/D55812 llvm-svn: 349599	2018-12-19 03:04:07 +00:00
Evandro Menezes	f03c45d582	[AArch64] Simplify the Exynos M3 pipeline model llvm-svn: 349569	2018-12-18 23:19:57 +00:00
Evandro Menezes	4e39fa4474	[AArch64] Fix instructions order (NFC) llvm-svn: 349568	2018-12-18 23:19:55 +00:00
Craig Topper	18a9d545e1	[X86] Add BSR to isUseDefConvertible. We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090. llvm-svn: 349531	2018-12-18 20:03:54 +00:00
Farhana Aleen	59ee2c5362	[AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529	2018-12-18 19:58:39 +00:00
Craig Topper	8434ef7d1e	[X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it. Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us. llvm-svn: 349526	2018-12-18 19:29:08 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Simon Pilgrim	1411917431	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349510	2018-12-18 17:31:11 +00:00
Simon Pilgrim	e9effe9744	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349500	2018-12-18 16:02:23 +00:00
Petar Avramovic	0a5e4eb776	[MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM and use integer type of correct size when creating arguments for CLI.lowerCall. Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64 on MIPS32. Differential Revision: https://reviews.llvm.org/D55651 llvm-svn: 349499	2018-12-18 15:59:51 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Petar Avramovic	150fd430f6	[MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR Add narrowScalar for G_AND and G_XOR. Legalize G_AND G_OR and G_XOR for types other then s32 with clampScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D55362 llvm-svn: 349475	2018-12-18 11:36:14 +00:00
Luke Cheeseman	f57d7d8237	[AArch64] - Return address signing dwarf support - Reapply changes intially introduced in r343089 - The archtecture info is no longer loaded whenever a DWARFContext is created - The runtimes libraries (santiziers) make use of the dwarf context classes but do not intialise the target info - The architecture of the object can be obtained without loading the target info - Adding a method to the dwarf context to get this information and multiplex the string printing later on Differential Revision: https://reviews.llvm.org/D55774 llvm-svn: 349472	2018-12-18 10:37:42 +00:00
Matt Arsenault	c94e26c71d	AMDGPU: Legalize/regbankselect frame_index llvm-svn: 349468	2018-12-18 09:46:13 +00:00
Matt Arsenault	c0ea221068	AMDGPU: Legalize/regbankselect fma llvm-svn: 349467	2018-12-18 09:39:56 +00:00
Matt Arsenault	e01e7c81f2	AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub llvm-svn: 349463	2018-12-18 09:19:03 +00:00
Simon Pilgrim	8488a44c34	[X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 This works better as part of SimplifyDemandedBits than part of the general combine. llvm-svn: 349462	2018-12-18 09:11:34 +00:00
Simon Pilgrim	26c630f416	[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same). Test change is merely a rescheduling. llvm-svn: 349459	2018-12-18 08:55:47 +00:00
Kristof Beyls	e66bc1f756	Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnarabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask of data loaded in registers. - using intrinsics to mask of data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/). For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags. - The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. - no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision: https://reviews.llvm.org/D54896 llvm-svn: 349456	2018-12-18 08:50:02 +00:00
Martin Storsjo	8f0cb9c3a8	[AArch64] [MinGW] Allow enabling SEH exceptions The default still is dwarf, but SEH exceptions can now be enabled optionally for the MinGW target. Differential Revision: https://reviews.llvm.org/D55748 llvm-svn: 349451	2018-12-18 08:32:37 +00:00
Kewen Lin	44ace92596	[PowerPC] Exploit power9 new instruction setb Check the expected pattens feeding to SELECT_CC like: (select_cc lhs, rhs, 1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, 1, -1, cc2), seteq) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, -1, 1, cc2), seteq) Further transform the sequence to comparison + setb if hits. Differential Revision: https://reviews.llvm.org/D53275 llvm-svn: 349445	2018-12-18 07:53:26 +00:00
Craig Topper	1ff7356f96	[X86] Const correct some helper functions X86InstrInfo.cpp. NFC llvm-svn: 349440	2018-12-18 04:58:05 +00:00
Kewen Lin	3dac1252da	[PowerPC] Improve vec_abs on P9 Improve the current vec_abs support on P9, generate ISD::ABS node for vector types, combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns, do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity. Differential Revision: https://reviews.llvm.org/D54783 llvm-svn: 349437	2018-12-18 03:16:43 +00:00
Simon Pilgrim	7e2975a44c	[X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI is the sign bit is known zero and improve KnownBits output for all shift instruction. Fixes the poor codegen comments in D55768. llvm-svn: 349407	2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen	d3c544aa6e	[WebAssembly] Fix assembler parsing of br_table. Summary: We use `variable_ops` in the tablegen defs to denote the list of branch targets in `br_table`, but unlike other uses of `variable_ops` (e.g. call) the these branch targets need to actually be encoded in the instruction. The existing tables for `variable_ops` cause not operands to be accepted by the assembly matcher. Following the example of ARM: `2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)` we introduce a new operand type to capture this list, and we use the same {} syntax as ARM as well to differentiate them from regular integer operands. Also removed definition and use of TSFlags in tablegen defs, since `br_table` now has a non-variable_ops immediate operand, so the previous logic of only the variable_ops arguments being labels didn't make sense anymore. Reviewers: dschuff, aheejin, sunfish Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55401 llvm-svn: 349405	2018-12-17 22:04:44 +00:00
Craig Topper	8c9d772991	[X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. These seem to have been missed when the other TBM instructions were added. llvm-svn: 349404	2018-12-17 21:50:06 +00:00
Simon Pilgrim	6b5e0b7b2b	[X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. First step towards adding more capable combines to fix comments in D55768. llvm-svn: 349400	2018-12-17 21:36:17 +00:00
Craig Topper	728cbc0378	Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968. llvm-svn: 349385	2018-12-17 20:02:16 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Tim Northover	256a16d031	FastIsel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointint at dangling memory. llvm-svn: 349365	2018-12-17 17:25:53 +00:00
Petar Avramovic	f9c9bc09ab	[MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC) Temporarily remove switch statement without any case labels in function legalizeCustom in order to fix r349346 for MSVC. llvm-svn: 349356	2018-12-17 15:12:53 +00:00
Tim Northover	ae3b66b7b0	ARM: use acquire/release instruction variants when available. These features (fairly) recently got split out into their own feature, so we should make CodeGen use them when available. The main change here is that the check used to be based on the triple, but now it's based on CPU features. llvm-svn: 349355	2018-12-17 15:05:32 +00:00
Petar Avramovic	b8276f2280	[MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32. Differential Revision: https://reviews.llvm.org/D54580 llvm-svn: 349346	2018-12-17 12:31:07 +00:00
Alexandros Lamprineas	490ae11717	[AArch64] Re-run load/store optimizer after aggressive tail duplication The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail Duplication Threshold is set to 4 instructions and this can create new opportunities for the Load/Store Optimizer. It seems worthwhile to run it once again. llvm-svn: 349338	2018-12-17 10:45:43 +00:00
Craig Topper	fa4907d671	[X86] Fix bad operand lookup for cmov introduced in r349315 The CC is operand 2 not operand 3. llvm-svn: 349330	2018-12-17 06:40:35 +00:00
Simon Pilgrim	d0c9e43b1c	[X86] Pull out constant splat rotation detection. We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts. llvm-svn: 349319	2018-12-16 19:46:04 +00:00
Craig Topper	10f8892837	[X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll changie is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315	2018-12-16 18:35:55 +00:00
Sanjay Patel	13ac2f15b0	[x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304	2018-12-16 15:05:48 +00:00
Simon Pilgrim	52c982406e	[X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. In preparation for converting to funnel shifts. llvm-svn: 349286	2018-12-15 21:11:49 +00:00
Simon Pilgrim	ef7b5949e5	[X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard...... llvm-svn: 349285	2018-12-15 19:43:44 +00:00
Simon Pilgrim	9831d4058c	Fix -Wunused-variable warning. NFCI. llvm-svn: 349265	2018-12-15 12:25:22 +00:00
Florian Hahn	abe32c9125	[SILoadStoreOptimizer] Use std::abs to avoid truncation. Using regular abs() causes the following warning error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value] (uint32_t)abs(Dist) > MaxDist) { ^ lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead which causes a bot to fail: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio llvm-svn: 349224	2018-12-15 01:32:58 +00:00
Craig Topper	1fc257d97f	[X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instruction that only modify the O flag to the waiver list. The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF. Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change. llvm-svn: 349223	2018-12-15 01:07:19 +00:00
Craig Topper	5c304eac41	[X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction. After this patch we now do a result number check in both and rely on the caller to provide the result number. This shouldn't change behavior it was just an odd difference between the two functions that I noticed. llvm-svn: 349222	2018-12-15 01:07:16 +00:00
Artem Belevich	6d74bd638a	[NVPTX] Lower instructions that expand into libcalls. The change is an effort to split and refactor abandoned D34708 into smaller parts. Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls. Currently LLVM crashes with: > Assertion getInstruction() && "Not a call or invoke instruction!" failed. With this patch LLVM produces a more sensible error message: > Cannot select: ... i32 = ExternalSymbol'__foobar' Author: Denys Zariaiev <denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55145 llvm-svn: 349213	2018-12-14 23:53:06 +00:00
Krzysztof Parzyszek	26d994f56e	[Hexagon] Add patterns for shifts of v2i16 This fixes https://llvm.org/PR39983. llvm-svn: 349202	2018-12-14 22:33:48 +00:00
Krzysztof Parzyszek	c0fc0a9775	[Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits llvm-svn: 349199	2018-12-14 22:05:44 +00:00
Farhana Aleen	ce095c564a	[AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D55539 llvm-svn: 349196	2018-12-14 21:13:14 +00:00
Evandro Menezes	ea9d90083f	[AArch64] Simplify the scheduling predicates (NFC) The instruction encodings make it unnecessary to distinguish extended W-form from X-form instructions. llvm-svn: 349185	2018-12-14 20:04:58 +00:00
Ehsan Amiri	de1742c284	NFC. Adding an empty line to test the updated commit credentials. llvm-svn: 349158	2018-12-14 16:19:02 +00:00
Diana Picus	02c8343c75	[ARM GlobalISel] Thumb2: casts between int and ptr Mark as legal and add tests. Nothing special to do. llvm-svn: 349147	2018-12-14 13:45:38 +00:00
Diana Picus	813af0d283	[ARM GlobalISel] Minor refactoring. NFCI Refactor the ARMInstructionSelector to cache some opcodes in the constructor instead of checking all the time if we're in ARM or Thumb mode. llvm-svn: 349143	2018-12-14 12:37:24 +00:00
Diana Picus	14dc3b2959	[ARM GlobalISel] Allow simple binary ops in Thumb2 Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM and Thumb2. Extract the legalizer tests for these opcodes into another file. Add tests for the instruction selector. llvm-svn: 349142	2018-12-14 11:58:14 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Craig Topper	178abc59ac	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely. llvm-svn: 349094	2018-12-13 23:55:30 +00:00
Evandro Menezes	6fe51ac973	[AArch64] Fix Exynos predicates (NFC) Fix the logic in the definition of the `ExynosShiftExPred` as a more specific version of `ExynosShiftPred`. But, since `ExynosShiftExPred` is not used yet, this change has NFC. llvm-svn: 349091	2018-12-13 23:19:46 +00:00
Aakanksha Patil	bc568766b2	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084	2018-12-13 21:23:12 +00:00
Matt Arsenault	934e534c47	AMDGPU/GlobalISel: Legalize/regbankselect block_addr llvm-svn: 349081	2018-12-13 20:34:15 +00:00
Mircea Trofin	41c729e78e	[llvm] Address base discriminator overflow in X86DiscriminateMemOps Summary: Macros are expanded on a single line. In case of large expansions, with sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values - new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See https://bugs.llvm.org/show_bug.cgi?id=39890 Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55643 llvm-svn: 349075	2018-12-13 19:40:59 +00:00
Simon Pilgrim	b5aaa673c6	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 349057	2018-12-13 16:39:29 +00:00
Simon Pilgrim	b0b2f1503a	[X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243) There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches. llvm-svn: 349052	2018-12-13 15:50:31 +00:00
Daniel Cederman	77611426e1	[Sparc] Add membar assembler tags Summary: The Sparc V9 membar instruction can enforce different types of memory orderings depending on the value in its immediate field. In the architectural manual the type is selected by combining different assembler tags into a mask. This patch adds support for these tags. Reviewers: jyknight, venkatra, brad Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D53491 llvm-svn: 349048	2018-12-13 15:29:12 +00:00
Simon Pilgrim	ba91ff4a86	[X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243) llvm-svn: 349047	2018-12-13 15:23:09 +00:00
Daniel Cederman	b5d284408e	[Sparc] Use float register for integer constrained with "f" in inline asm Summary: Constraining an integer value to a floating point register using "f" causes an llvm_unreachable to trigger. This patch allows i32 integers to be placed in a single precision float register and i64 integers to be placed in a double precision float register. This matches the behavior of GCC. For other types the llvm_unreachable is removed to instead trigger an error message that points out the offending line. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D51614 llvm-svn: 349045	2018-12-13 15:13:29 +00:00
Jinsong Ji	c7b43b94ce	[PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion There are several Pseudo in PowerPC backend. eg: * ISel Pseudo-instructions , which has let usesCustomInserter=1 in td ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them. * Post-RA pseudo instruction, which has let isPseudo = 1 in td, or Standard pseudo (SUBREG_TO_REG,COPY etc.) ExpandPostRAPseudos -> expandPostRAPseudo will expand them * Multi-instruction pseudo operations will expand them PPCAsmPrinter::EmitInstruction * Pseudo instruction in CodeEmitter, which has encoding of 0. Currently, in td files, especially PPCInstrVSX.td, we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly. This patch is to * Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter * Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo * Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo Differential Revision: https://reviews.llvm.org/D55143 llvm-svn: 349044	2018-12-13 15:12:57 +00:00
Simon Pilgrim	7c84f7ae3a	[X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI. Merged the repeated code into a single if(). llvm-svn: 349040	2018-12-13 14:51:28 +00:00
Jonas Paulsson	e79b1b986d	[SystemZ] Pass copy-hinted regs first from getRegAllocationHints(). When computing register allocation hints for a GRX32Bit register, make sure that any of the hinted registers that are also copy hints are returned first in the list. Review: Ulrich Weigand. llvm-svn: 349037	2018-12-13 14:37:05 +00:00
Simon Pilgrim	320fd7383f	[X86][BWI] Don't custom lower vXi8 rotations. We always expand to shifts anyhow - test changes are just different scheduling only. llvm-svn: 349034	2018-12-13 13:44:33 +00:00
Chen Zheng	9c6fa536e0	[PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier. Differential Revision: https://reviews.llvm.org/D55499 llvm-svn: 349029	2018-12-13 12:25:20 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Diana Picus	99cd644b6c	[ARM GlobalISel] Support exts and truncs for Thumb2 Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for them in the instruction selector. This uses handwritten code again because the patterns that are generated with TableGen are tuned for what the DAG combiner would produce and not for simple sext/zext nodes. Luckily, we only need to update the opcodes to use the Thumb2 variants, everything else can be reused from ARM. llvm-svn: 349026	2018-12-13 12:06:54 +00:00
Simon Pilgrim	77fc551d1a	[TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025	2018-12-13 11:20:48 +00:00
Alex Bradbury	919f5fb8ca	[RISCV] Add support for the various RISC-V FMA instruction variants Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd). The criteria for choosing whether a fused add or subtract is used, as well as whether the product is negated or not, is whether some of the arguments to the llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd instructions were added to avoid the negation being performed using a xor trick, which prevented the proper FMA forms from being selected and thus tested. The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 - rs3), but they should be correct. The misleading names were inherited from MIPS, where the negation happens after computing the sum. The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions, as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd. Some comments in the test files about what type of instructions are there tested were updated, to better reflect the current content of those test files. Differential Revision: https://reviews.llvm.org/D54205 Patch by Luís Marques. llvm-svn: 349023	2018-12-13 10:49:05 +00:00
Arnaud A. de Grandmaison	dfe861087d	[AArch64] Catch some more CMN opportunities. Fixes https://bugs.llvm.org/show_bug.cgi?id=33486 llvm-svn: 349022	2018-12-13 10:31:32 +00:00
Matt Arsenault	577b9fc543	AMDGPU/GlobalISel: Legalize f64 fadd/fmul llvm-svn: 349014	2018-12-13 08:27:48 +00:00
Matt Arsenault	f38f483bef	AMDGPU/GlobalISel: RegBankSelect some simple operations llvm-svn: 349012	2018-12-13 08:23:51 +00:00
Craig Topper	a048d58de7	[X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC llvm-svn: 349007	2018-12-13 06:14:25 +00:00
Stanislav Mekhanoshin	d933c2ced7	[AMDGPU] Fix build failure, second attempt Some compilers complain that variable is captured and some complain when it is not. Switch to [&]. llvm-svn: 349006	2018-12-13 05:52:11 +00:00
Stanislav Mekhanoshin	5225746e03	[AMDGPU] Fix build failure Fixed error 'lambda capture 'CondReg' is not required to be captured for this use'. llvm-svn: 349005	2018-12-13 05:21:25 +00:00
Stanislav Mekhanoshin	6071e1aa58	[AMDGPU] Simplify negated condition Optimize sequence: %sel = V_CNDMASK_B32_e64 0, 1, %cc %cmp = V_CMP_NE_U32 1, %1 $vcc = S_AND_B64 $exec, %cmp S_CBRANCH_VCC[N]Z => $vcc = S_ANDN2_B64 $exec, %cc S_CBRANCH_VCC[N]Z It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC(). Differential Revision: https://reviews.llvm.org/D55402 llvm-svn: 349003	2018-12-13 03:17:40 +00:00
Craig Topper	d1c61861dd	[X86] Don't emit MULX by default with BMI2 MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975	2018-12-12 21:21:31 +00:00
Aakanksha Patil	729309cc89	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971	2018-12-12 20:49:17 +00:00
Scott Linder	f5b36e56fb	[AMDGPU] Emit MessagePack HSA Metadata for v3 code object Continue to present HSA metadata as YAML in ASM and when output by tools (e.g. llvm-readobj), but encode it in Messagepack in the code object. Differential Revision: https://reviews.llvm.org/D48179 llvm-svn: 348963	2018-12-12 19:39:27 +00:00
Craig Topper	4937adf75f	[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959	2018-12-12 19:20:21 +00:00
Simon Pilgrim	eb508f8ccb	[SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953	2018-12-12 18:32:29 +00:00
Artem Belevich	f802b9324a	[NVPTX] do not rely on cached subtarget info. If a module has function references, but no functions themselves, we may end up never calling runOnMachineFunction and therefore would never initialize nvptxSubtarget field which would eventually cause a crash. Instead of relying on nvptxSubtarget being initialized by one of the methods, retrieve subtarget info directly. Differential Revision: https://reviews.llvm.org/D55580 llvm-svn: 348952	2018-12-12 18:31:04 +00:00
Sanjay Patel	44eaa492b8	[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946	2018-12-12 17:58:27 +00:00
Neil Henning	76504a4c5e	[AMDGPU] Extend the SI Load/Store optimizer to combine more things. I've extended the load/store optimizer to be able to produce dwordx3 loads and stores, This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937	2018-12-12 16:15:21 +00:00
Simon Atanasyan	fa020082e4	[mips] Enable using of integrated assembler in all cases. llvm-svn: 348934	2018-12-12 15:32:03 +00:00
Piotr Sobczak	3732b4ce25	[AMDGPU] Set metadata access for explicit section Summary: This patch provides a means to set Metadata section kind for a global variable, if its explicit section name is prefixed with ".AMDGPU.metadata." This could be useful to make the global variable go to an ELF section without any section flags set. Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye Reviewed By: dstuttard, kzhuravl Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye Differential Revision: https://reviews.llvm.org/D55267 llvm-svn: 348922	2018-12-12 11:20:04 +00:00
Diana Picus	59720b422a	[ARM GlobalISel] Select load/store for Thumb2 Unfortunately we can't use TableGen for this because it doesn't yet support predicates on the source pattern root. Therefore, add a bit of handwritten code to the instruction selector to handle the most basic cases. Also mark them as legal and extract their legalizer test cases to a new test file. llvm-svn: 348920	2018-12-12 10:32:15 +00:00
Jonas Paulsson	896775c2d3	[SystemZ] Minor cleanup of SchedModels Some fixes of a few InstRWs for z13 and z14. Review: Ulrich Weigand llvm-svn: 348917	2018-12-12 08:26:24 +00:00
Craig Topper	1fe466689b	[X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910	2018-12-12 05:56:01 +00:00

1 2 3 4 5 ...

50329 Commits