teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-21 12:35:47 -04:00

Author	SHA1	Message	Date
Craig Topper	fad1589f39	Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics." The AutoUpgrade.cpp if/else cascade hit an MSVC limit again. llvm-svn: 350562	2019-01-07 19:39:05 +00:00
Craig Topper	826f44b550	[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560	2019-01-07 19:30:43 +00:00
Craig Topper	9c4f7e9147	[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics. Differential Revision: https://reviews.llvm.org/D56377 llvm-svn: 350554	2019-01-07 19:10:12 +00:00
Diogo N. Sampaio	f192cdb5c9	[ARM] ComputeKnownBits to handle extract vectors This patch adds the sign/zero extension done by vgetlane to ARM computeKnownBitsForTargetNode. Differential revision: https://reviews.llvm.org/D56098 llvm-svn: 350553	2019-01-07 19:01:47 +00:00
Rhys Perry	f77e2e8406	AMDGPU: test for uniformity of branch instruction, not its condition Summary: If a divergent branch instruction is marked as divergent by propagation rule 2 in DivergencePropagator::exploreSyncDependency() and its condition is uniform, that branch would incorrectly be assumed to be uniform. Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56331 llvm-svn: 350532	2019-01-07 15:52:28 +00:00
Matt Arsenault	7a27c1886f	AMDGPU: Remove v16i8 from register classes llvm-svn: 350518	2019-01-07 13:31:55 +00:00
Matt Arsenault	369acb8470	AMDGPU: Remove VS/SV mappings from select These would violate the constant bus restriction llvm-svn: 350517	2019-01-07 13:21:36 +00:00
Craig Topper	6ffeeb705f	[X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions. Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56361 llvm-svn: 350498	2019-01-06 18:10:18 +00:00
Sanjay Patel	8f12e8f3f6	[x86] explicitly set cost of integer add/sub There are no test changes here in the existing cost model regression tests because integer add/sub have a default legal cost of 1 already. This would break, however, if we custom lower those ops because the default cost model assumes that custom-lowered ops are more expensive. This is similar to the change in rL350403. See discussion in D56011 for more details. When we enhance that patch to handle integer ops, we need this cost model change to avoid unintended diffs here from the custom lowering. llvm-svn: 350496	2019-01-06 16:21:42 +00:00
Craig Topper	1187991bcf	[X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale. This was here because out and in instructions allow '(%dx)' even though its not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case. llvm-svn: 350483	2019-01-05 23:30:28 +00:00
Craig Topper	d0ba531a0c	[X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL. llvm-svn: 350481	2019-01-05 22:42:58 +00:00
Craig Topper	46f8b4a11e	[X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8. This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both. llvm-svn: 350480	2019-01-05 21:40:07 +00:00
Craig Topper	3f48dbf72e	[X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available. llvm-svn: 350473	2019-01-05 18:48:11 +00:00
Craig Topper	45ec002e25	[X86] Require second operand of X86vshiftuniform to be an integer. NFC We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer. llvm-svn: 350455	2019-01-05 01:40:29 +00:00
Nikita Popov	c35b4a37ba	[X86] Fix warning; NFC llvm-svn: 350437	2019-01-04 21:41:35 +00:00
Vyacheslav Zakharin	0a6f86c54b	Update the pr_datasz of .note.gnu.property section. Patch by Xiang Zhang. Differential Revision: https://reviews.llvm.org/D56080 llvm-svn: 350436	2019-01-04 21:25:01 +00:00
Evandro Menezes	9f53bea536	[AArch64] Adjust the cost model for Exynos M3 Improve the modeling of ASIMD loads and stores. llvm-svn: 350434	2019-01-04 21:02:25 +00:00
Sanjay Patel	6153565511	[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time. Original commit message: This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350421	2019-01-04 17:48:13 +00:00
Nirav Dave	1468d6e1c5	Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI." Add missing test case and update comments. llvm-svn: 350406	2019-01-04 17:11:15 +00:00
Simon Pilgrim	c2054144ee	[CostModel][X86] Fix SSE1 FADD/FSUB costs Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4 Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403	2019-01-04 16:55:57 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Richard Trieu	e1fef949ae	[WebAssembly] Split the checking from the sorting logic. Move the check for -1 and identical values outside the vector sorting code. Compare functions need to be able to compare identical elements to be conforming. llvm-svn: 350379	2019-01-04 06:49:24 +00:00
Craig Topper	6265a15f2e	[X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register. Differential Revision: https://reviews.llvm.org/D56246 llvm-svn: 350374	2019-01-04 00:10:58 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Heejin Ahn	777d01c756	[WebAssembly] Optimize Irreducible Control Flow Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output. Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other). Reviewers: aheejin, sunfish Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D55467 Patch by Alon Zakai (kripken) llvm-svn: 350367	2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen	820c6263d9	[WebAssembly] Fixed disassembler not knowing about new brlist operand Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56227 llvm-svn: 350366	2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen	9843295608	[WebAssembly] Made InstPrinter more robust Summary: Instead of asserting on certain kinds of malformed instructions, it now still print, but instead adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valueable than asserting. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56223 llvm-svn: 350365	2019-01-03 22:59:59 +00:00
Nirav Dave	8de916d1a4	[X86] Remove terrible DX Register parsing hack in parse operand. NFCI. Fold hack special casing of (%dx) operand parsing into the related hack for out/in instruction parsing. llvm-svn: 350355	2019-01-03 21:46:30 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Alexander Timofeev	993e2798fd	[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses calling the function that changes def-use iteratorson the way. As a result loop exits immediately when def-use iterator is changed. Hence, the operand is folded to the very first use instruction only. This makes VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses to separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350	2019-01-03 19:55:32 +00:00
Evandro Menezes	0f67746c92	[AArch64] Add new scheduling predicates Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode. llvm-svn: 350332	2019-01-03 17:28:09 +00:00
Alex Bradbury	2ba76be882	[RISCV][MC] Accept %lo and %pcrel_lo on operands to li This matches GNU assembler behaviour. llvm-svn: 350321	2019-01-03 14:41:41 +00:00
Diogo N. Sampaio	8786a946d8	[ARM] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also renames FeatureSpecRestrict to FeatureSB. Reviewed By: olista01, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55990 llvm-svn: 350299	2019-01-03 12:09:12 +00:00
Simon Pilgrim	d824f99a6c	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) Costs for real SSE2 instructions llvm-svn: 350295	2019-01-03 11:38:42 +00:00
Piotr Sobczak	3abef8f9ea	[AMDGPU] Change section name with metadata access Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292	2019-01-03 11:22:58 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer the op. So, only when all the available instructions issued, the pending instruction could be scheduled. That is not true for our P9 hw in fact. This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017. With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Robert Widmann	7882b283cd	[LLVM-C] Expand LLVMRelocMode Summary: Add read[only\|write] PIC relocation models to the C API and teach the TargetMachine API about it. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56187 llvm-svn: 350279	2019-01-03 00:33:44 +00:00
Craig Topper	df5304d8de	[X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL. The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX. llvm-svn: 350272	2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen	ad72f68501	[WebAssembly] made assembler parse block_type Summary: This was previously ignored and an incorrect value generated. Also fixed Disassembler's handling of block_type. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56092 llvm-svn: 350270	2019-01-02 23:23:51 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Craig Topper	f7cc7e3201	[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering. llvm-svn: 350206	2019-01-02 06:40:11 +00:00
Craig Topper	d4db122483	[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO. These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code. Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll llvm-svn: 350205	2019-01-02 05:46:03 +00:00
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Sanjay Patel	738a863648	[x86] move/rename helper for horizontal op codegen; NFC Preliminary commit as suggested in D56011. llvm-svn: 350193	2019-01-01 16:08:36 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Simon Pilgrim	f2b9d10477	Keep tablegen commands in alphabetical order. NFCI. Mentioned on D56167. llvm-svn: 350176	2018-12-31 14:51:53 +00:00

1 2 3 4 5 ...

50329 Commits