teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-20 03:55:48 -04:00

Author	SHA1	Message	Date
Andrea Di Biagio	7bec693433	[MCA] Store extra information about processor resources in the ResourceManager. Method ResourceManager::use() is responsible for updating the internal state of used processor resources, as well as notifying resource groups that contain used resources. Before this patch, method 'use()' didn't know how to quickly obtain the set of groups that contain a particular resource unit. It had to discover groups by performing a potentially slow search (done by iterating over the set of processor resource descriptors). With this patch, the relationship between resource units and groups is stored in the ResourceManager. That means method 'use()' no longer has to search for groups. This gives an average speedup of ~4-5% on a release build. This patch also adds extra code comments in ResourceManager.h to better describe the resource mask layout, and how resource indices are computed from resource masks. llvm-svn: 350387	2019-01-04 12:31:14 +00:00
Richard Trieu	e1fef949ae	[WebAssembly] Split the checking from the sorting logic. Move the check for -1 and identical values outside the vector sorting code. Compare functions need to be able to compare identical elements to be conforming. llvm-svn: 350379	2019-01-04 06:49:24 +00:00
Xin Tong	47beee2f3f	[memcpyopt] Remove a few unnecessary isVolatile() checks. NFC We already checked for isSimple() on the store. llvm-svn: 350378	2019-01-04 02:13:22 +00:00
Craig Topper	6265a15f2e	[X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used. Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register. Differential Revision: https://reviews.llvm.org/D56246 llvm-svn: 350374	2019-01-04 00:10:58 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Heejin Ahn	777d01c756	[WebAssembly] Optimize Irreducible Control Flow Summary: Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output. Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other). Reviewers: aheejin, sunfish Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits Differential Revision: https://reviews.llvm.org/D55467 Patch by Alon Zakai (kripken) llvm-svn: 350367	2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen	820c6263d9	[WebAssembly] Fixed disassembler not knowing about new brlist operand Summary: The previously introduced new operand type for br_table didn't have a disassembler implementation, causing an assert. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56227 llvm-svn: 350366	2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen	9843295608	[WebAssembly] Made InstPrinter more robust Summary: Instead of asserting on certain kinds of malformed instructions, it now still prints, but adds an annotation indicating the problem, and/or indicates invalid_type etc. We're using the InstPrinter from many contexts that can't always guarantee values are within range (e.g. the disassembler), where having output is more valuable than asserting. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56223 llvm-svn: 350365	2019-01-03 22:59:59 +00:00
Nirav Dave	8de916d1a4	[X86] Remove terrible DX Register parsing hack in parse operand. NFCI. Fold hack special casing of (%dx) operand parsing into the related hack for out/in instruction parsing. llvm-svn: 350355	2019-01-03 21:46:30 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
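The fold quoted in the commit above, extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index), can be illustrated outside of LLVM with a minimal sketch. The type `Vec4` and both function names are hypothetical and exist only for this illustration; this is not the DAGCombiner code, just the arithmetic identity it relies on, shown for integer addition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane vector type used only for this illustration.
using Vec4 = std::array<int, 4>;

// Vector form: add every lane, then extract a single element.
inline int extractAfterVectorAdd(const Vec4 &X, const Vec4 &Y,
                                 std::size_t Index) {
  assert(Index < X.size() && "index must be in range");
  Vec4 Sum{};
  for (std::size_t I = 0; I < Sum.size(); ++I)
    Sum[I] = X[I] + Y[I];
  return Sum[Index];
}

// Scalarized form: extract one element from each operand, then do a
// single scalar add. Only the requested lane is ever computed.
inline int scalarAddAfterExtract(const Vec4 &X, const Vec4 &Y,
                                 std::size_t Index) {
  assert(Index < X.size() && "index must be in range");
  return X[Index] + Y[Index];
}
```

Both functions return the same value for any in-range index; the scalarized form simply skips the work on the lanes that are never read.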
Alexander Timofeev	993e2798fd	[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses, calling a function that changes the def-use iterators on the way. As a result, the loop exits immediately when the def-use iterator is changed. Hence, the operand is folded into the very first use instruction only. This makes the VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in the SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses into a separate container for further processing in another loop. Testing: make check-llvm, SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350	2019-01-03 19:55:32 +00:00
Anna Thomas	a470aa6701	[UnrollRuntime] Move the DomTree verification under expensive checks Suggested by Hal as done in r349871. llvm-svn: 350349	2019-01-03 19:43:33 +00:00
Stefan Granitz	a9b7ca472d	Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"" This reverts commit r350290. llvm-svn: 350345	2019-01-03 19:09:24 +00:00
Kristina Brooks	e434280f3d	[MCStreamer] Use report_fatal_error in EmitRawTextImpl Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of using errs() and explain the rationale behind it not being llvm_unreachable() to save confusion for any future maintainers. Differential Revision: https://reviews.llvm.org/D56245 llvm-svn: 350342	2019-01-03 18:42:31 +00:00
Anna Thomas	0785e7307e	[UnrollRuntime] Add DomTree verification under debug mode NFC: This adds the dom tree verification under debug mode at a point just before we start unrolling the loop. This allows us to verify the dom tree at a state where it is much smaller and before the unrolling actually happens. This also implies we do not need to run -verify-dom-info every time to see if the DT is in a valid state when we transform the loop for runtime unrolling. llvm-svn: 350334	2019-01-03 17:44:44 +00:00
Evandro Menezes	0f67746c92	[AArch64] Add new scheduling predicates Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode. llvm-svn: 350332	2019-01-03 17:28:09 +00:00
Andrea Di Biagio	b284054b26	[MCA] Improve code comment and reuse a helper function in ResourceManager. NFCI llvm-svn: 350322	2019-01-03 14:47:46 +00:00
Alex Bradbury	2ba76be882	[RISCV][MC] Accept %lo and %pcrel_lo on operands to li This matches GNU assembler behaviour. llvm-svn: 350321	2019-01-03 14:41:41 +00:00
Philip Pfaffe	b39a97c8f6	[NewPM] Port Msan Summary: Keeping msan a function pass requires replacing the module level initialization: That means, don't define a ctor function which calls __msan_init, instead just declare the init function at the first access, and add that to the global ctors list. Changes: - Pull the actual sanitizer and the wrapper pass apart. - Add a newpm msan pass. The function pass inserts calls to runtime library functions, for which it inserts declarations as necessary. - Update tests. Caveats: - There is one test that I dropped, because it specifically tested the definition of the ctor. Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji Differential Revision: https://reviews.llvm.org/D55647 llvm-svn: 350305	2019-01-03 13:42:44 +00:00
Simon Pilgrim	c2aadfaaad	[SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123) Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics llvm-svn: 350300	2019-01-03 12:18:23 +00:00
Diogo N. Sampaio	8786a946d8	[ARM] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also renames FeatureSpecRestrict to FeatureSB. Reviewed By: olista01, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55990 llvm-svn: 350299	2019-01-03 12:09:12 +00:00
Simon Pilgrim	d824f99a6c	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123) Costs for real SSE2 instructions llvm-svn: 350295	2019-01-03 11:38:42 +00:00
Piotr Sobczak	3abef8f9ea	[AMDGPU] Change section name with metadata access Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292	2019-01-03 11:22:58 +00:00
Lama Saba	4d752a88e8	Resubmit rL345008 "Split MachinePipeliner code into header and cpp files" The commit caused unclear failures in http://green.lab.llvm.org/green//job/lldb-cmake/ will revert if the error reappears Differential Revision: https://reviews.llvm.org/D56084 llvm-svn: 350290	2019-01-03 10:03:54 +00:00
Markus Lavin	72b9deb21f	[CodeGen] Skip over dbg-instr in twoaddr pass A DBG_VALUE between a two-address instruction and a following COPY would prevent rescheduleMIBelowKill optimization inside TwoAddressInstructionPass. Differential Revision: https://reviews.llvm.org/D55987 llvm-svn: 350289	2019-01-03 08:36:06 +00:00
Martin Storsjo	74e7d26090	[llvm-readobj] [COFF] Print the symbol index for relocations There can be multiple local symbols with the same name (for e.g. comdat sections), and thus the symbol name itself isn't enough to disambiguate symbols. Differential Revision: https://reviews.llvm.org/D56140 llvm-svn: 350288	2019-01-03 08:08:23 +00:00
Kristina Brooks	bbbec9daa4	Don't go over 80 chars in MCStreamer.cpp. NFC. Fixing up style issues around the area to prepare for a larger differential. llvm-svn: 350286	2019-01-03 06:06:38 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model because the MicroOpBufferSize isn't set. The MI scheduler takes this to mean that the hw cannot buffer the op, so a pending instruction can be scheduled only after all the available instructions have been issued. That is not true for our P9 hw. This patch enables the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf tests indicate that this value won't hurt cpu2017. With this patch, 3 specs improve by over 3% and 1 spec degrades by over 3%. The details are as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ specs in spec2017 is improved by about 0.18%. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Pete Cooper	697281df42	Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have equivalent arguments (either identical or equivalent PHIs). It then assumes that ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead. Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue. This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence. rdar://problem/47005143 Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D56235 llvm-svn: 350284	2019-01-03 01:38:08 +00:00
Robert Widmann	7882b283cd	[LLVM-C] Expand LLVMRelocMode Summary: Add read[only\|write] PIC relocation models to the C API and teach the TargetMachine API about it. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56187 llvm-svn: 350279	2019-01-03 00:33:44 +00:00
Craig Topper	df5304d8de	[X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL. The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX. llvm-svn: 350272	2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen	ad72f68501	[WebAssembly] made assembler parse block_type Summary: This was previously ignored and an incorrect value generated. Also fixed Disassembler's handling of block_type. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56092 llvm-svn: 350270	2019-01-02 23:23:51 +00:00
Xin Tong	33e3b4b9b3	[ThinLTO] Scan all variants of vague symbol for reachability. Summary: Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable from somewhere else. Reviewers: tejohnson, grimar Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56117 llvm-svn: 350269	2019-01-02 23:18:20 +00:00
Pete Cooper	8d58048024	Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef. The caller to EraseInstruction had this conditional: // ARC calls with null are no-ops. Delete them. if (IsNullOrUndef(Arg)) but the assert inside EraseInstruction only allowed ConstantPointerNull and not undef or bitcasts. This adds support for both of these cases. rdar://problem/47003805 llvm-svn: 350261	2019-01-02 21:00:02 +00:00
Nikita Popov	cc6ef7f153	[BDCE] Remove instructions without demanded bits If an instruction has no demanded bits, remove it directly during BDCE, instead of leaving it for something else to clean up. Differential Revision: https://reviews.llvm.org/D56185 llvm-svn: 350257	2019-01-02 20:02:14 +00:00
Pawel Bylica	119aa8fa5f	Format AggressiveInstCombine.cpp. NFC llvm-svn: 350255	2019-01-02 19:51:46 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Zachary Turner	ba797b6dae	[MS Demangler] Add a flag for dumping types without tag specifier. Sometimes it's useful to be able to output demangled names without tag specifiers like "struct", "class", etc. This patch adds a flag enabling this. llvm-svn: 350241	2019-01-02 18:33:12 +00:00
Craig Topper	8dd7bd2cd7	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239	2019-01-02 18:19:07 +00:00
Craig Topper	3109f3a4ab	[LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined. By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend. This fixes the regression on X86 in D56156. Differential Revision: https://reviews.llvm.org/D56176 llvm-svn: 350236	2019-01-02 17:58:30 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Hal Finkel	4f2381440d	[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220	2019-01-02 16:28:09 +00:00
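The overflow described in the BasicAA commit above (a scaled constant offset that no longer fits in the pointer width) can be reproduced with plain integers. The sketch below is not BasicAA code: the function names and the sample values are hypothetical, and unsigned 32-bit arithmetic stands in for a 32-bit APInt so that the wraparound is well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Scale a constant offset using 32-bit intermediates. Unsigned arithmetic
// wraps modulo 2^32, which models what a 32-bit-wide calculation would do.
inline uint32_t scaledOffset32(uint32_t C2, uint32_t Scale) {
  return C2 * Scale;
}

// Scale the same offset using a 64-bit intermediate, so the product of two
// 32-bit values can never wrap.
inline uint64_t scaledOffset64(uint32_t C2, uint32_t Scale) {
  return static_cast<uint64_t>(C2) * Scale;
}

// Hypothetical check: true when the 32-bit product lost information.
inline bool narrowProductWrapped(uint32_t C2, uint32_t Scale) {
  return scaledOffset64(C2, Scale) != scaledOffset32(C2, Scale);
}

inline void demoOverflow() {
  // 0x40000000 * 4 == 2^32, which wraps to 0 in 32 bits.
  assert(narrowProductWrapped(0x40000000u, 4u));
  // A small offset is unaffected.
  assert(!narrowProductWrapped(16u, 4u));
}
```

With the sample values, the 32-bit product is 0 while the 64-bit product is 0x100000000, so any later reasoning based on the narrow result would treat a large offset as no offset at all.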
Philip Pfaffe	6bc98ad7e8	Extend Module::getOrInsertGlobal to control the construction of the GlobalVariable Summary: Extend Module::getOrInsertGlobal to accept a callback for creating a new GlobalVariable if necessary instead of calling the GV constructor directly using default arguments. Additionally overload getOrInsertGlobal for the previous default behavior. Reviewers: chandlerc Subscribers: hiraditya, llvm-commits, bollu Differential Revision: https://reviews.llvm.org/D56130 llvm-svn: 350219	2019-01-02 15:41:47 +00:00
Andrea Di Biagio	0682afbaee	[MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI Common code used by the default resource strategy to select pipeline resources has been moved to a helper function. The new selection logic has been slightly rewritten to get rid of a redundant zero check on the `ReadyMask` value. Before this patch, method select internally called function `PowerOf2Floor` to compute the next ready pipeline resource. However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input value. By construction, `ReadyMask` can never be zero. This patch replaces the call to `PowerOf2Floor` with an equivalent block of code which avoids the redundant zero check. This gives a minor 3-3.5% speedup on a release build. No functional change intended. llvm-svn: 350218	2019-01-02 15:40:52 +00:00
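As a rough illustration of the idea in the commit above, the sketch below computes the largest power of two not exceeding a mask while treating "mask is nonzero" as a precondition rather than a runtime branch. The function name is hypothetical and the loop is only one possible way to write it; the actual code in the patch is not shown in this log and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the most significant set bit of ReadyMask,
// i.e. the largest power of two that is <= ReadyMask.
// Precondition: ReadyMask != 0. The assert documents the invariant and
// compiles away in release builds, so no zero check remains at runtime.
inline uint64_t highestReadyBit(uint64_t ReadyMask) {
  assert(ReadyMask != 0 && "caller guarantees a nonzero mask");
  uint64_t Bit = 1;
  // Shift the mask right until it is exhausted, moving Bit left in step.
  while (ReadyMask >>= 1)
    Bit <<= 1;
  return Bit;
}
```

For a mask of 0xA (bits 1 and 3 set) the helper returns 0x8, and for a mask with a single set bit it returns that bit unchanged.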
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Craig Topper	f7cc7e3201	[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering. llvm-svn: 350206	2019-01-02 06:40:11 +00:00
Craig Topper	d4db122483	[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO. These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code. Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll llvm-svn: 350205	2019-01-02 05:46:03 +00:00
Sanjay Patel	654e6aabb9	[InstCombine] canonicalize raw IR rotate patterns to funnel shift The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal to or better than the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199	2019-01-01 21:51:39 +00:00
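To make the canonicalization in the commit above concrete, the sketch below writes a 32-bit rotate-left twice: once as the raw shift/or pattern, and once through a funnel shift whose two inputs are the same value. Both function names are hypothetical; `fshl32` is intended to mirror the documented behavior of the `llvm.fshl` intrinsic (concatenate the operands, shift left by the amount modulo the bit width, keep the high half), written here as ordinary C++ rather than IR.

```cpp
#include <cassert>
#include <cstdint>

// Raw rotate pattern built from two shifts and an or. The masking keeps
// both shift amounts in [0, 31], so a rotate by 0 is well defined.
inline uint32_t rotlShiftOr(uint32_t X, uint32_t N) {
  N &= 31;
  return (X << N) | (X >> ((32 - N) & 31));
}

// Funnel shift left: treat A:B as one 64-bit value, shift it left by
// N modulo 32, and return the upper 32 bits.
inline uint32_t fshl32(uint32_t A, uint32_t B, uint32_t N) {
  uint64_t Concat = (static_cast<uint64_t>(A) << 32) | B;
  return static_cast<uint32_t>((Concat << (N % 32)) >> 32);
}

// A rotate is the special case where both funnel inputs are the same.
inline uint32_t rotlViaFunnelShift(uint32_t X, uint32_t N) {
  uint32_t Result = fshl32(X, X, N);
  assert(Result == rotlShiftOr(X, N) && "both forms must agree");
  return Result;
}
```

Rotating 0x12345678 left by 8 gives 0x34567812 through either path.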
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Robert Widmann	db5b537f1e	[LLVM-C] bool -> LLVMBool llvm-svn: 350197	2019-01-01 19:03:37 +00:00
Robert Widmann	5d1dfa3eb6	[LLVM-C] Add Accessors for Discarding Value Names in the IR Summary: Add accessors so the performance improvement from this setting is accessible to third parties. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56179 llvm-svn: 350196	2019-01-01 18:56:51 +00:00
Sanjay Patel	738a863648	[x86] move/rename helper for horizontal op codegen; NFC Preliminary commit as suggested in D56011. llvm-svn: 350193	2019-01-01 16:08:36 +00:00
Nikita Popov	bc9986e9ad	Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses whose user is also dead. This case is checked separately instead. The previous attempt to land this led to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demonstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188	2019-01-01 10:05:26 +00:00
Ayonam Ray	e00606a1b2	Reversing the commit in revision 350186. Revision causes regression in 4 tests. llvm-svn: 350187	2019-01-01 07:28:55 +00:00
Ayonam Ray	c471bb2e67	Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186	2019-01-01 06:37:50 +00:00
Chen Zheng	4952e668f8	[InstCombine] canonicalize MUL with NEG operand -X * Y --> -(X * Y) X * -Y --> -(X * Y) Differential Revision: https://reviews.llvm.org/D55961 llvm-svn: 350185	2019-01-01 01:09:20 +00:00
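The two rewrites listed in the commit above, -X * Y --> -(X * Y) and X * -Y --> -(X * Y), rest on an identity that holds in wraparound (modulo 2^32) arithmetic. The sketch below checks it with unsigned 32-bit integers, where overflow is well defined in C++; the function names are hypothetical and this is not InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

// Negate the first operand, then multiply (the original pattern).
inline uint32_t negThenMul(uint32_t X, uint32_t Y) { return (0u - X) * Y; }

// Multiply first, then negate the product (the canonical form).
inline uint32_t mulThenNeg(uint32_t X, uint32_t Y) { return 0u - (X * Y); }

// Hypothetical checker: both forms agree, including when the negation
// is applied to the second operand instead of the first.
inline void checkNegMulEquivalence(uint32_t X, uint32_t Y) {
  assert(negThenMul(X, Y) == mulThenNeg(X, Y));
  assert(X * (0u - Y) == mulThenNeg(X, Y));
}
```

For X = 7 and Y = 3 both forms produce 0xFFFFFFEB, the 32-bit two's complement encoding of -21.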
Craig Topper	ed3ffae4a4	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits. Differential Revision: https://reviews.llvm.org/D56168 llvm-svn: 350179	2018-12-31 19:09:30 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Simon Pilgrim	f2b9d10477	Keep tablegen commands in alphabetical order. NFCI. Mentioned on D56167. llvm-svn: 350176	2018-12-31 14:51:53 +00:00
Martin Storsjo	74d93f9b24	[AArch64] Accept "sve" as arch feature in assembler Differential Revision: https://reviews.llvm.org/D56128 llvm-svn: 350174	2018-12-31 10:22:04 +00:00
Alexander Potapenko	cea4f83371	[MSan] Handle llvm.is.constant intrinsic MSan used to report false positives in the case the argument of llvm.is.constant intrinsic was uninitialized. In fact checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument. llvm-svn: 350173	2018-12-31 09:42:23 +00:00
Craig Topper	802c4979ae	[DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172	2018-12-31 05:40:46 +00:00
Martin Storsjo	2018777836	[AArch64] Implement the .arch_extension directive Differential Revision: https://reviews.llvm.org/D56131 llvm-svn: 350169	2018-12-30 21:06:32 +00:00
Kang Zhang	9d78c60bf4	[PowerPC] Fix machine verifier "bad machine code" error for PATCHPOINT pseudo instruction Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier knows X2 is a reserved register. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D56148 llvm-svn: 350165	2018-12-30 15:13:51 +00:00
David Bolvansky	90004149cc	[NFC] Fixed extra semicolon warning in lib/Support/Error.cpp llvm-svn: 350162	2018-12-30 13:18:17 +00:00
Kang Zhang	4aa6453767	[PowerPC] Fix "do not know how to promote operator" error for ADDE, SUBE Summary: This patch fixes Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 It adds support for promoting the integer result of the ADDE and SUBE instructions. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161	2018-12-30 07:48:09 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	f237ce159e	[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting. This allows us to sign extend to v4i32 first, and then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh. We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization. llvm-svn: 350158	2018-12-30 02:30:34 +00:00
Nemanja Ivanovic	0dad994a10	[PowerPC][NFC] Macro for register set defs for the Asm Parser We have some unfortunate code in the back end that defines a bunch of register sets for the Asm Parser. Every time another class is needed in the parser, we have to add another one of those definitions with explicit lists of registers. This NFC patch simply provides macros to use to condense that code a little bit. Differential revision: https://reviews.llvm.org/D54433 llvm-svn: 350156	2018-12-29 16:13:11 +00:00
Nemanja Ivanovic	0f7715afe1	[PowerPC] Complete the custom legalization of vector int to fp conversion A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155	2018-12-29 13:40:48 +00:00
Nemanja Ivanovic	3c7ac649ec	[PowerPC] Fix CR Bit spill pseudo expansion The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input. Thanks to Hal Finkel for the review and Uli Weigand for implementation input. Differential revision: https://reviews.llvm.org/D55996 llvm-svn: 350153	2018-12-29 11:43:54 +00:00
Simon Atanasyan	a6424e7c4e	[mips] Show an error on attempt to use 64-bit PC-relative relocation The following code requests 64-bit PC-relative relocations unsupported by MIPS ABI. Now it triggers an assertion. It's better to show an error message. ``` foo: .quad bar - foo ``` llvm-svn: 350152	2018-12-29 10:10:02 +00:00
Simon Atanasyan	b243d8d42a	[mips] Show a regular error message on attempt to use one byte relocation llvm-svn: 350151	2018-12-29 10:09:55 +00:00
Max Kazantsev	201534d753	Drop SE cache early because loop parent can change in LoopSimplifyCFG llvm-svn: 350145	2018-12-29 04:26:22 +00:00
Heejin Ahn	4d98dfb67d	[WebAssembly] Fix comments in ExplicitLocals (NFC) llvm-svn: 350144	2018-12-29 02:42:04 +00:00
Richard Trieu	a87b70d1db	Add vtable anchor to classes. llvm-svn: 350142	2018-12-29 02:02:13 +00:00
Craig Topper	0a6cec6f9f	[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better. llvm-svn: 350141	2018-12-29 01:17:11 +00:00
Craig Topper	f814d28eb3	[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly. Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll. Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization. llvm-svn: 350134	2018-12-28 19:19:39 +00:00
Anna Thomas	98743fa77a	[UnrollRuntime] NFC: Add comment and verify LCSSA Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog. llvm-svn: 350131	2018-12-28 18:52:16 +00:00
Diogo N. Sampaio	9123f82cc4	[AArch64] Add command-line option for SB SB (Speculative Barrier) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable by selecting -march=armv8.5-a. This patch also moves to FeatureSB the old FeatureSpecRestrict. Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman Differential Revision: https://reviews.llvm.org/D55921 llvm-svn: 350126	2018-12-28 17:14:58 +00:00
Hiroshi Inoue	1ea98f040e	[PowerPC] handle ISD::TRUNCATE in BitPermutationSelector This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD::TRUNCATE in BitPermutationSelector. For example, for this test case, struct s64b { int a:4; int b:16; int c:24; }; void bitfieldinsert64b(struct s64b *p, unsigned char v) { p->b = v; } the selection DAG looks like: t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64 t18: i32 = and t14, Constant:i32<-1048561> t4: i64,ch = CopyFromReg t0, Register:i64 %1 t22: i64 = AssertZext t4, ValueType:ch:i8 t23: i32 = truncate t22 t16: i32 = shl nuw nsw t23, Constant:i32<4> t19: i32 = or t18, t16 t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64 By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. So the generated sequences are, without this patch: rlwinm 5, 5, 0, 28, 11 # corresponding to t18 rlwimi 5, 4, 4, 20, 27 and with this patch: rlwimi 5, 4, 4, 12, 27 Differential Revision: https://reviews.llvm.org/D49076 llvm-svn: 350118	2018-12-28 08:00:39 +00:00
Max Kazantsev	530ff8f3cc	Temporarily disable term folding in LoopSimplifyCFG, add tests llvm-svn: 350117	2018-12-28 06:22:39 +00:00
Max Kazantsev	80e4b40f3e	[LoopSimplifyCFG] Delete dead blocks in RPO Deletion of dead blocks in arbitrary order may lead to failure of assertion in `DeleteDeadBlock` that requires that we have deleted all predecessors before we can delete the current block. We should instead delete them in RPO order. llvm-svn: 350116	2018-12-28 06:08:51 +00:00
QingShan Zhang	f2d9df61c7	[PowerPC] Remove the implicit use of the register if it is replaced by Imm If we are changing the MI operand from Reg to Imm, we also need to handle its implicit use, if any. Differential Revision: https://reviews.llvm.org/D56078 llvm-svn: 350115	2018-12-28 03:38:09 +00:00
Zi Xuan Wu	5187444345	[NFC] clang-format functions related to r350113 llvm-svn: 350114	2018-12-28 02:45:17 +00:00
Zi Xuan Wu	a02a3feecf	[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class An atomic value operand smaller than 4 bytes needs to be masked, and the related operation to calculate the new value can be done in a 32-bit gprc. So just use gprc for mask and value calculation. Differential Revision: https://reviews.llvm.org/D56077 llvm-svn: 350113	2018-12-28 02:12:55 +00:00
Chen Zheng	5ede950df9	[PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction Differential Revision: https://reviews.llvm.org/D55806 llvm-svn: 350111	2018-12-28 01:02:35 +00:00
Chandler Carruth	05b5bd8b85	[CallSite removal] Add and flesh out APIs on the new `CallBase` base class that previously were only available on the `CallSite` wrapper. Summary: This will make migrating code easier and generally seems like a good collection of API improvements. Some of these APIs seem like more consistent / better naming of existing ones. I've retained the old names for migration simplicity and am just adding the new ones in this commit. I'll try to garbage collect these once CallSite is gone. Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55638 llvm-svn: 350109	2018-12-27 23:40:17 +00:00
Craig Topper	787ad92bf6	[X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits. llvm-svn: 350084	2018-12-27 03:37:04 +00:00
Craig Topper	a8f07e51f9	[X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself. Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through. llvm-svn: 350082	2018-12-27 01:50:40 +00:00
Craig Topper	4f1ef9fc0f	[X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it. LowerAndToBT now returns both the BT node and the X86 specific condition code separately. llvm-svn: 350081	2018-12-27 01:50:38 +00:00
Wouter van Oortmerssen	f227621036	[WebAssembly] Added basic support for if/else/end_if in MC layer. Summary: These instructions are currently unused in our backend, but for completeness it is good to support them, so they can be used with the assembler in hand-written code. Tests are very basic, signature support missing much like other blocks. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55973 llvm-svn: 350079	2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen	29c6ce5879	[WebAssembly] Make assembler check for proper nesting of control flow. Summary: It does so using a simple nesting stack, and gives clear errors upon violation. This is unique to wasm, since most CPUs do not have any nested constructs. Had to add an end of file check to the general assembler for this. Note: if/else/end instructions are not currently supported in our tablegen defs, so these tests will be enabled in a follow-up. They already pass the nesting check. Reviewers: dschuff, aheejin Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D55797 llvm-svn: 350078	2018-12-26 22:46:18 +00:00
Heejin Ahn	ce1d50f9d7	[WebAssembly] Delete an unnecessary line in RegStackify `OneUseInst` is set outside of the loop before and `OneUse` does not change throughout the loop, so this line is not necessary. llvm-svn: 350076	2018-12-26 22:33:35 +00:00
Heejin Ahn	99d3946398	[WebAssembly] Fix typos in comments in RegStackify (NFC) llvm-svn: 350075	2018-12-26 22:27:46 +00:00
Craig Topper	c9a6000755	[LoopIdiomRecognize] Add CTTZ support Summary: The existing LIR recognizes CTLZ, where the input variable is shifted right until it is zero (the Shift-Until-Zero idiom). This commit: 1. Augments the Shift-Until-Zero idiom to recognize CTTZ, where the input variable is shifted left. 2. Prepares for BitScan idiom recognition. Patch by Yuanfang Chen (tabloid.adroit) Reviewers: craig.topper, evstupac Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55876 llvm-svn: 350074	2018-12-26 21:59:48 +00:00
Reid Kleckner	c168c6f86f	[codeview] Check if the 'this' type of a method is a pointer Fixes crash reported after r347354 for frontends that don't always emit 'this' pointers for methods. Now we will silently produce debug info that makes functions like this look like static methods, which seems reasonable. llvm-svn: 350073	2018-12-26 21:52:17 +00:00
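
Commit 29c6ce5879 above describes checking control-flow nesting in the WebAssembly assembler with "a simple nesting stack" plus an end-of-file check. The idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from that commit: the function `check_nesting`, the `CLOSERS` table, the error strings, and the `end_block`/`end_loop` mnemonics are hypothetical choices made for the example (only `if`/`else`/`end_if` are named in the log).

```python
# Hypothetical sketch of a nesting-stack check; not LLVM's actual implementation.
# Maps each closing mnemonic to the set of stack entries it may close (assumed names).
CLOSERS = {
    "end_block": {"block"},
    "end_loop": {"loop"},
    "end_if": {"if", "else"},
}


def check_nesting(mnemonics):
    """Return a list of error strings for improperly nested control flow."""
    stack = []   # currently open constructs, innermost last
    errors = []
    for index, name in enumerate(mnemonics):
        if name in ("block", "loop", "if"):
            stack.append(name)
        elif name == "else":
            if stack and stack[-1] == "if":
                stack[-1] = "else"  # the 'if' arm is done; now inside the 'else' arm
            else:
                errors.append(f"{index}: else without matching if")
        elif name in CLOSERS:
            if stack and stack[-1] in CLOSERS[name]:
                stack.pop()
            else:
                errors.append(f"{index}: {name} without matching opener")
        # Any other mnemonic is an ordinary instruction and does not affect nesting.
    # End-of-file check: everything still open is an error.
    for name in reversed(stack):
        errors.append(f"end of file: unmatched {name}")
    return errors
```

For example, `check_nesting(["block", "loop", "end_loop", "end_block"])` returns an empty list, while an unterminated `loop` is only caught by the end-of-file pass, which is why that extra check is needed.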


119417 Commits