teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-23 13:35:42 -04:00

Author	SHA1	Message	Date
Aditya Nandakumar	f75d4f329c	[GISel]: Provide standard interface to observe changes in GISel passes https://reviews.llvm.org/D54980 This provides a standard API across GISel passes to observe and notify passes about changes (insertions/deletions/mutations) to MachineInstrs. This patch also removes the recordInsertion method in MachineIRBuilder and instead provides method to setObserver. Reviewed by: vkeles. llvm-svn: 348406	2018-12-05 20:14:52 +00:00
Jessica Paquette	34b618bf7e	[MachineOutliner][NFC] Don't create outlined sequence from integer mapping Some gardening/refactoring. It's cleaner to copy the instructions into the MachineFunction using the first candidate instead of going to the mapper. Also, by doing this we can remove the Seq member from OutlinedFunction entirely. llvm-svn: 348390	2018-12-05 17:57:33 +00:00
Sanjay Patel	33a448f935	[DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383	2018-12-05 17:10:30 +00:00
Simon Pilgrim	8fdaf5c915	[TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases. llvm-svn: 348361	2018-12-05 12:20:05 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Simon Pilgrim	cd8a152b18	Remove superfluous comments. NFCI. As requested in D54698. llvm-svn: 348350	2018-12-05 10:45:44 +00:00
Simon Pilgrim	d24730cdda	[TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version..... Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches. llvm-svn: 348348	2018-12-05 10:37:45 +00:00
Craig Topper	6934202dc0	[MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber. This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef. The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before. Differential Revision: https://reviews.llvm.org/D55102 llvm-svn: 348330	2018-12-05 03:41:26 +00:00
Amara Emerson	814a6794ba	[SelectionDAG] Split very large token factors for loads into 64k chunks. There's a 64k limit on the number of SDNode operands, and some very large functions with 64k or more loads can cause crashes due to this limit being hit when a TokenFactor with this many operands is created. To fix this, create sub-tokenfactors if we've exceeded the limit. No test case as it requires a very large function. rdar://45196621 Differential Revision: https://reviews.llvm.org/D55073 llvm-svn: 348324	2018-12-05 00:41:30 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Matt Arsenault	43153024ab	MIR: Add method to stop after specific runs of passes Currently if you use -{start,stop}-{before,after}, it picks the first instance with the matching pass name. If you run the same pass multiple times, there's no way to distinguish them. Allow specifying a run index wih ,N to specify which you mean. llvm-svn: 348285	2018-12-04 17:45:12 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Simon Pilgrim	666261cdc8	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::_EXTEND and ISD::_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246	2018-12-04 10:41:06 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Sanjay Patel	b205606d3e	[SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090	2018-12-02 13:48:42 +00:00
Sanjay Patel	2daceedf92	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Simon Pilgrim	e017ed3245	[SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073	2018-12-01 12:08:55 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Sanjay Patel	1901a12e76	[SelectionDAG] fold FP binops with 2 undef operands to undef llvm-svn: 348016	2018-11-30 18:38:52 +00:00
Yonghong Song	f487334622	Revert "[BTF] Add BTF DebugInfo" This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf. llvm-svn: 348004	2018-11-30 16:54:43 +00:00
Yonghong Song	81b77e9159	[BTF] Add BTF DebugInfo This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF. What is BTF? ============ First, the BPF is a linux kernel virtual machine and widely used for tracing, networking and security. https://www.kernel.org/doc/Documentation/networking/filter.txt https://cilium.readthedocs.io/en/v1.2/bpf/ BTF is the debug info format for BPF, introduced in the below linux patch `69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)` in the patch set mentioned in the below lwn article. https://lwn.net/Articles/752047/ The BTF format is specified in the above github commit. In summary, its layout looks like struct btf_header type subsection (a list of types) string subsection (a list of strings) With such information, the kernel and the user space is able to pretty print a particular bpf map key/value. One possible example below: Withtout BTF: key: [ 0x01, 0x01, 0x00, 0x00 ] With BTF: key: struct t { a : 1; b : 1; c : 0} where struct is defined as struct t { char a; char b; short c; }; How BTF is generated? ===================== Currently, the BTF is generated through pahole. https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69 and available in pahole v1.12 https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4 Basically, the bpf program needs to be compiled with -g with dwarf sections generated. The pahole is enhanced such that a .BTF section can be generated based on dwarf. This format of the .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel. `8a138aed4a` The .BTF section layout is also specified in this patch: with file include/llvm/BinaryFormat/BTF.h. What use cases this patch tries to address? =========================================== Currently, only the bpf instruction stream is required to pass to the kernel. The kernel verifies it, jits it if configured to do so, attaches it to a particular kernel attachment point, and later executes when a particular event happens. This patch tries to expand BTF to support two more use cases below: (1). BPF supports subroutine calls. During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This would require to pass a unique identifier for each subroutine to the kernel, the subroutine name is a natual choice. (2). If a particular jitted instruction is hot, we want user to know which source line this jitted instruction belongs to. This would require the source information is available to various profiling tools. Note that in a single ELF file, . there may be multiple loadable bpf programs, . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections, the bpf loader needs to merge them together to a single consecutive insn stream before loading to the kernel. For example: section .text: subroutines funcFoo section _progA: calling funcFoo section _progB: calling funcFoo The bpf loader could construct two loadable bpf instruction streams and load them into the kernel: . _progA funcFoo . _progB funcFoo So per ELF section function offset and instruction offset will need to be adjusted before passing to the kernel, and the kernel essentially expect only one code section regardless of how many in the ELF file. What do we propose and Why? =========================== To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file which is the input of the bpf loader. A different section is preferred since loader may need to manipulate it before loading part of its data to the kernel. The .BTF.ext section has a similar header to the .BTF section and it contains two subsections for func_info and line_info. . the func_info maps the func insn byte offset to a func type in the .BTF type subsection. . the line_info maps the insn byte offset to a line info. . both func_info and line_info subsections are organized by ELF PROGBITS AX sections. pahole is not a good place to implement .BTF.ext as pahole is mostly for structure hole information and more importantly, we want to pass the actual code to the kernel. . bpf program typically is small so storage overhead should be small. . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then that application quits, so holding debug info by the user space application is not practical as you may not even know who loads this bpf program. . having source codes directly kept by kernel would ease deployment since the original source code does not need ship on every hosts and kernel-devel package does not need to be deployed even if kernel headers are used. LLVM is a good place to implement. . The only reliable time to get the source code is during compilation time. This will result in both more accurate information and easier deployment as stated in the above. . Another consideration is for JIT. The project like bcc (https://github.com/iovisor/bcc) use MCJIT to compile a C program into bpf insns and load them to the kernel. The llvm generated BTF sections will be readily available for such cases as well. Design and implementation of emiting .BTF/.BTF.ext sections =========================================================== The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified. Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for BPF target. This patch also contains tests to verify generated .BTF and .BTF.ext sections for all supported types, func_info and line_info subsections. The patch is also tested against linux kernel bpf sample tests and selftests. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D53736 llvm-svn: 347999	2018-11-30 16:22:59 +00:00
Than McIntosh	0e0a8a3fee	[CodeGen] Prefer static frame index for STATEPOINT liveness args Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map. Patch by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, niravd, reames Reviewed By: reames Subscribers: cherryyz, reames, anna, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53889 llvm-svn: 347998	2018-11-30 16:22:41 +00:00
Nicolai Haehnle	445b0b6260	TableGen/ISel: Allow PatFrag predicate code to access captured operands Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`. Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets. Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D51994 llvm-svn: 347992	2018-11-30 14:15:13 +00:00
Alex Bradbury	fca95cfee9	[SelectionDAG] Support result type promotion for FLT_ROUNDS_ For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_. Differential Revision: https://reviews.llvm.org/D53820 llvm-svn: 347986	2018-11-30 13:18:33 +00:00
Alex Bradbury	bd24c7b045	[SelectionDAG] Support promotion of PREFETCH operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH. Differential Revision: https://reviews.llvm.org/D53281 llvm-svn: 347980	2018-11-30 10:06:31 +00:00
Alex Bradbury	36e0fd1d39	[SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand. Differential Revision: https://reviews.llvm.org/D53279 llvm-svn: 347978	2018-11-30 10:02:06 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Hsiangkai Wang	957578ddf7	[CodeGen] Fix bugs in BranchFolderPass when debug labels are generated. Skip DBG_VALUE and DBG_LABEL in branch folding algorithms. The bug is reported in https://bugs.chromium.org/p/chromium/issues/detail?id=898160. Differential Revision: https://reviews.llvm.org/D54199 llvm-svn: 347964	2018-11-30 08:07:29 +00:00
Hsiangkai Wang	d72f6f133a	[NFC] Refine doxygen format. Differential Revision: https://reviews.llvm.org/D54568 llvm-svn: 347963	2018-11-30 08:07:24 +00:00
Sanjay Patel	8d27144251	[DAGCombiner] narrow truncated binops The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917	2018-11-29 20:58:26 +00:00
Alex Bradbury	66d9a752b9	[RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision: https://reviews.llvm.org/D48131 llvm-svn: 347914	2018-11-29 20:43:42 +00:00
Francis Visoiu Mistrih	0b8dd4488e	[MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching. llvm-svn: 347906	2018-11-29 20:03:19 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Petr Pavlu	6bb80512db	[GlobalISel] Fix insertion of stack-protector epilogue * Tell the StackProtector pass to generate the epilogue instrumentation when GlobalISel is enabled because GISel currently does not implement the same deferred epilogue insertion as SelectionDAG. * Update StackProtector::InsertStackProtectors() to find a stack guard slot by searching for the llvm.stackprotector intrinsic when the prologue was not created by StackProtector itself but the pass still needs to generate the epilogue instrumentation. This fixes a problem when the pass would abort because the stack guard AllocInst pointer was null when generating the epilogue -- test CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347862	2018-11-29 13:22:53 +00:00
Petr Pavlu	e6406d568c	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previouly only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861	2018-11-29 12:56:32 +00:00
Serguei Katkov	2673f1783e	[CGP] Improve compile time for complex addressing mode This is a fix for PR39625 with improvement the compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54932 llvm-svn: 347839	2018-11-29 06:45:18 +00:00
Francis Visoiu Mistrih	879087ce5b	[MachineScheduler] Add support for clustering mem ops with FI base operands Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`: ``` void use(unsigned long long ); void merge_fail(unsigned key, unsigned index) { unsigned long long args[8]; args[0] = key; args[1] = index; use(args); } void merge_ok(unsigned long long dst, unsigned a, unsigned b) { dst[0] = a; dst[1] = b; } ``` The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision: https://reviews.llvm.org/D54847 llvm-svn: 347747	2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih	d7eebd6d83	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Simon Atanasyan	af860d44fe	[DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr for PCOffset. In that case the expression is unrelated to thread locals and so emitting a value of the DIEExpr does not have to always mean emit-debug-thread-local. llvm-svn: 347744	2018-11-28 11:48:07 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Craig Topper	923f463ef2	[SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision: https://reviews.llvm.org/D54818 llvm-svn: 347591	2018-11-26 20:16:33 +00:00
Than McIntosh	30c804bbb1	[CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584	2018-11-26 18:43:48 +00:00
Erich Keane	e381120477	Delete dead code introduced in r347354. ParentTy is never used other than an assignment, and since it is a pointer, there is no side effect. Some versions of GCC notice and warn on this. Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f llvm-svn: 347581	2018-11-26 17:51:27 +00:00
Than McIntosh	b9e4852c92	[CodeGen] Take SPAdj into account for STATEPOINT liveness args Summary: STATEPOINT records its args' locations on stack relative to SP. If the SP is changed, take that into account. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D53603 llvm-svn: 347569	2018-11-26 16:16:09 +00:00
Aaron Ballman	0442a12a80	Remove an unnecessary file; NFC. This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221). llvm-svn: 347565	2018-11-26 15:54:36 +00:00
Diana Picus	0528e2cfb3	[ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32. Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel). Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point. Instruction select: Nothing to do. llvm-svn: 347545	2018-11-26 11:07:02 +00:00
Diana Picus	30887bf6c3	Fix typo in comment. NFC llvm-svn: 347544	2018-11-26 11:06:53 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Sanjay Patel	04435677d0	[SelectionDAG] move constant or splat functions to common location rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523	2018-11-25 16:09:32 +00:00
Sanjay Patel	7e119c0400	[DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502	2018-11-23 20:05:12 +00:00
Luke Cheeseman	6db3a6a4a7	Revert r347490 as it breaks address sanitizer builds llvm-svn: 347499	2018-11-23 17:13:06 +00:00
Luke Cheeseman	d6dbd64104	Revert r343341 - Cannot reproduce the build failure locally and the build logs have been deleted. llvm-svn: 347490	2018-11-23 11:01:47 +00:00
Craig Topper	0ec17884de	[LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type. This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together. But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code. The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but its not realistic anyway. llvm-svn: 347482	2018-11-23 02:32:13 +00:00
Craig Topper	b239763384	[LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type. SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that. The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked. llvm-svn: 347479	2018-11-22 22:56:52 +00:00
Sanjay Patel	3e80019275	[DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478	2018-11-22 19:24:10 +00:00
Reid Kleckner	86ada54e4c	[mingw] Use unmangled name after the $ in the section name GCC does it this way, and we have to be consistent. This includes stdcall and fastcall functions with suffixes. I confirmed that a fastcall function named "foo" ends up in ".text$foo", not ".text$@foo@8". Based on a patch by Andrew Yohn! Fixes PR39218. Differential Revision: https://reviews.llvm.org/D54762 llvm-svn: 347431	2018-11-21 22:01:10 +00:00
Sanjay Patel	20935e0ab5	[DAGCombiner] refactor select-of-FP-constants transform This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424	2018-11-21 20:54:47 +00:00
Sanjay Patel	1c74747478	[DAGCombiner] reduce code duplication; NFC llvm-svn: 347410	2018-11-21 20:00:32 +00:00
Simon Pilgrim	5448339889	[TargetLowering] SimplifyDemandedBits - only reduce known bits for integer constants Avoids fuzzing crash found by Mikael Holmén. llvm-svn: 347393	2018-11-21 14:26:19 +00:00
Nikita Popov	c5b152bdef	Test commit: Delete trailing space in comment llvm-svn: 347385	2018-11-21 10:57:22 +00:00
Sanjay Patel	357053f289	[DAGCombiner] look through bitcasts when trying to narrow vector binops This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356	2018-11-20 22:26:35 +00:00
Zachary Turner	c68f895702	[CodeView] Add support for ref-qualified member functions. When you have a member function with a ref-qualifier, for example: struct Foo { void Func() &; void Func2() &&; }; clang-cl was not emitting this information. Doing so is a bit awkward, because it's not a property of the LF_MFUNCTION type, which is what you'd expect. Instead, it's a property of the this pointer which is actually an LF_POINTER. This record has an attributes bitmask on it, and our handling of this bitmask was all wrong. We had some parts of the bitmask defined incorrectly, but importantly for this bug, we didn't know about these extra 2 bits that represent the ref qualifier at all. Differential Revision: https://reviews.llvm.org/D54667 llvm-svn: 347354	2018-11-20 22:13:43 +00:00
Zachary Turner	3826566c04	[CodeView] Mark this pointers as const. This is for compatibility with MSVC, which also marks this pointers as being const-qualified. Fixes llvm.org/pr36526 Differential Revision: https://reviews.llvm.org/D54736 llvm-svn: 347353	2018-11-20 22:13:23 +00:00
Simon Pilgrim	3735105961	[DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313	2018-11-20 15:23:50 +00:00
Simon Pilgrim	b356d0463e	[TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision: https://reviews.llvm.org/D54679 llvm-svn: 347301	2018-11-20 12:02:16 +00:00
Craig Topper	4954c66430	[SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks Summary: We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply. This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362 Reviewers: spatel, efriedma, RKSimon, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D54725 llvm-svn: 347287	2018-11-20 04:30:26 +00:00
Sanjay Patel	a36c444471	[DAGCombiner] reduce code duplication in visitXOR; NFC llvm-svn: 347278	2018-11-20 00:51:45 +00:00
Stanislav Mekhanoshin	54ebfe8aee	Implement computeKnownBits for scalar_to_vector Differential Revision: https://reviews.llvm.org/D54728 llvm-svn: 347274	2018-11-19 23:34:07 +00:00
Simon Pilgrim	740122fb8c	[DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207) Consistently use (!LegalOperations \|\| isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255	2018-11-19 19:37:59 +00:00
Simon Pilgrim	01f4c4be90	Fix Wdocumentation warning. NFCI. llvm-svn: 347253	2018-11-19 19:18:33 +00:00
Simon Pilgrim	493ac5320b	Fix unused function warning. llvm-svn: 347252	2018-11-19 19:18:00 +00:00
Simon Pilgrim	de3605f56b	[TargetLowering] expandFP_TO_UINT - improve fp16 support As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode. I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary. Differential Revision: https://reviews.llvm.org/D54703 llvm-svn: 347251	2018-11-19 19:16:13 +00:00
Simon Pilgrim	9ad5717fcc	Add missing stream operator for Polynomial class to fix debug builds. llvm-svn: 347249	2018-11-19 18:57:49 +00:00
Martin Elshuber	5a47dc607e	[InterleavedLoadCombine] Fix warnings * remove unused function * fix compare llvm-svn: 347241	2018-11-19 18:35:31 +00:00
Paul Robinson	cda5421016	[DebugInfo] DISubprogram flags get their own flags word. NFC. This will hold flags specific to subprograms. In the future we could potentially free up scarce bits in DIFlags by moving subprogram-specific flags from there to the new flags word. This patch does not change IR/bitcode formats, that will be done in a follow-up. Differential Revision: https://reviews.llvm.org/D54597 llvm-svn: 347239	2018-11-19 18:29:28 +00:00
Martin Elshuber	026a2a5152	[InterleavedLoadCombine] Fix warning unused variable Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347229	2018-11-19 17:11:48 +00:00
Sanjay Patel	b25adf5edb	[SelectionDAG] simplify vector select with undef operand(s) llvm-svn: 347227	2018-11-19 17:06:05 +00:00
Benjamin Kramer	22a04efcb0	[InterleavedLoadCombine] Remove unused include. NFC. llvm-svn: 347226	2018-11-19 17:01:19 +00:00
Sanjay Patel	a1dca3553e	[SelectionDAG] simplify select FP with undef condition llvm-svn: 347212	2018-11-19 14:42:28 +00:00
Sanjay Patel	c036d844be	[SelectionDAG] add simplifySelect() to reduce code duplication; NFC This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210	2018-11-19 14:35:22 +00:00
Martin Elshuber	fef3036d37	Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads. This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVMs capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on Aarch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208	2018-11-19 14:26:10 +00:00
Serge Guelton	12c7a96064	Fix disturbing warning - NFCI llvm-svn: 347186	2018-11-19 10:05:28 +00:00
Vedant Kumar	e7b789b529	[ProfileSummary] Standardize methods and fix comment Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182	2018-11-19 05:23:16 +00:00
Sanjay Patel	8c0cd77bff	[DAG] add undef simplifications for select nodes Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170	2018-11-18 17:36:23 +00:00
Sanjay Patel	42c22a1f87	[SelectionDAG] simplify code; NFC llvm-svn: 347160	2018-11-18 14:39:03 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Stanislav Mekhanoshin	0ff7c8309d	DAG combiner: fold (select, C, X, undef) -> X Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110	2018-11-16 23:13:38 +00:00
Craig Topper	9e97054211	[LegalizeVectorOps] After custom legalizing an extending load or a truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests. llvm-svn: 347094	2018-11-16 21:04:58 +00:00
Reid Kleckner	755577168a	[codeview] Expose -gcodeview-ghash for global type hashing Summary: Experience has shown that the functionality is useful. It makes linking optimized clang with debug info for me a lot faster, 20s to 13s. The type merging phase of PDB writing goes from 10s to 3s. This removes the LLVM cl::opt and replaces it with a metadata flag. After this change, users can do the following to use ghash: - add -gcodeview-ghash to compiler flags - replace /DEBUG with /DEBUG:GHASH in linker flags Reviewers: zturner, hans, thakis, takuto.ikuta Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54370 llvm-svn: 347072	2018-11-16 18:47:41 +00:00
Simon Pilgrim	3c8baf4f90	[TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they match. NFCI. Use the same variable names etc. llvm-svn: 347045	2018-11-16 12:26:26 +00:00
Sam Parker	ab99cfab21	[DAGCombine] Fix non-deterministic debug output PR37970 reported non-deterministic debug output, this was caused by iterating through a set and not a a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037	2018-11-16 08:35:19 +00:00
Craig Topper	bac7d9735a	[LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width. If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step. llvm-svn: 347034	2018-11-16 07:13:34 +00:00
Heejin Ahn	095796a391	[WebAssembly] Split BBs after throw instructions Summary: `throw` instruction is a terminator in wasm, but BBs were not splitted after `throw` instructions, causing machine instruction verifier to fail. This patch - Splits BBs after `throw` instructions in WasmEHPrepare and adding an unreachable instruction after `throw`, which will be deleted in LateEHPrepare pass - Refactors WasmEHPrepare into two member functions - Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass to match that of WasmEHPrepare pass, which is newly added. Now `eraseBBsAndChildren` does not delete BBs with remaining predecessors. - Fixes style nits, making static function names conform to clang-tidy - Re-enables the test temporarily disabled by rL346840 && rL346845 Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54571 llvm-svn: 347003	2018-11-16 00:47:18 +00:00
Jessica Paquette	ddb039a199	[MachineOutliner][NFC] Check if CandidatesForRepeatedSeq < 2 There's no reason to call getOutliningCandidateInfo with a single candidate. llvm-svn: 346913	2018-11-15 00:02:24 +00:00
Nirav Dave	1241dcb3cf	Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also physical register assignments from immediate values. This causes a reduction in reduction in overall register pressure and minor reduction in spills and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894	2018-11-14 21:11:53 +00:00
Heejin Ahn	da419bdb5e	[WebAssembly] Add support for the event section Summary: This adds support for the 'event section' specified in the exception handling proposal. (This was named 'exception section' first, but later renamed to 'event section' to take possibilities of other kinds of events into consideration. But currently we only store exception info in this section.) The event section is added between the global section and the export section. This is for ease of validation per request of the V8 team. This patch: - Creates the event symbol type, which is a weak symbol - Makes 'throw' instruction take the event symbol '__cpp_exception' - Adds relocation support for events - Adds WasmObjectWriter / WasmObjectFile (Reader) support - Adds obj2yaml / yaml2obj support - Adds '.eventtype' printing support Reviewers: dschuff, sbc100, aardappel Subscribers: jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54096 llvm-svn: 346825	2018-11-14 02:46:21 +00:00
Eli Friedman	6bdabcf368	[CodeGen] Fix forward scan in MachineBasicBlock::computeRegisterLiveness. The scan was incorrectly skipping the first instruction, so a register could appear to be dead when it was actually live. This eventually leads to a machine verifier failure and miscompile in arm-ldst-opt. Differential Revision: https://reviews.llvm.org/D54491 llvm-svn: 346821	2018-11-14 00:39:29 +00:00
Jessica Paquette	cad864d49e	[MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo We already determine a bunch of information about an MBB in getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating things that must be false/true. The first thing we can easily check is if an outlined sequence could ever contain calls. There's no reason to walk over the outlined range, checking for calls, if we already know that there are no calls in the block containing the sequence. llvm-svn: 346809	2018-11-13 23:01:34 +00:00
Jessica Paquette	b2d53c5d7d	[MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates Since we never outline anything with fewer than 2 occurrences, there's no reason to compute cost model information if there's less than that. llvm-svn: 346803	2018-11-13 22:16:27 +00:00
Stanislav Mekhanoshin	35de877e8c	Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795	2018-11-13 20:26:27 +00:00
Fangrui Song	d8fd0ec032	[AsmPrinter] Rename a comment of .debug_gnu_pubnames entry Summary: The comment refers to the field as "Kind:". However, in gdb, https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html names it "attributes", gdb/dwarf2read.c:dw2_symtab_iter_next refers to the whole value as "cu_index_and_attrs" Change it to `Attributes:` for consistency. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: aprantl, JDevlieghere, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54480 llvm-svn: 346790	2018-11-13 20:18:08 +00:00
David Blaikie	bb279116f2	DebugInfo: Add a CU metadata attribute for use of DWARF ranges base address specifiers Summary: Ranges base address specifiers can save a lot of object size in relocation records especially in optimized builds. For an optimized self-host build of Clang with split DWARF and debug info compression in object files, but uncompressed debug info in the executable, this change produces about 18% smaller object files and 6% larger executable. While it would've been nice to turn this on by default, gold's 32 bit gdb-index support crashes on this input & I don't think there's any perfect heuristic to implement solely in LLVM that would suffice - so we'll need a flag one way or another (also possible people might want to aggressively optimized for executable size that contains debug info (even with compression this would still come at some cost to executable size)) - so let's plumb it through. Differential Revision: https://reviews.llvm.org/D54242 llvm-svn: 346788	2018-11-13 20:08:10 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Cameron McInally	cbde0d9c7b	[IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774	2018-11-13 18:15:47 +00:00
Alexander Kornienko	3635c89070	Fix uninitialized variable. Flags variable was not initialized and later used (both isMBBSafeToOutlineFrom implementations assume it's initialized), which breaks test/CodeGen/AArch64/machine-outliner.mir. under memory sanitizer: MemorySanitizer: use-of-uninitialized-value #0 in llvm::AArch64InstrInfo::getOutliningType(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&, unsigned int) const llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5494:9 #1 in (anonymous namespace)::InstructionMapper::convertToUnsignedVec(llvm::MachineBasicBlock&, llvm::TargetInstrInfo const&) llvm/lib/CodeGen/MachineOutliner.cpp:772:19 #2 in (anonymous namespace)::MachineOutliner::populateMapper((anonymous namespace)::InstructionMapper&, llvm::Module&, llvm::MachineModuleInfo&) llvm/lib/CodeGen/MachineOutliner.cpp:1543:14 #3 in (anonymous namespace)::MachineOutliner::runOnModule(llvm::Module&) llvm/lib/CodeGen/MachineOutliner.cpp:1645:3 #4 in (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1744:27 #5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1857:44 #6 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:597:8 llvm-svn: 346761	2018-11-13 16:41:05 +00:00
Craig Topper	0b33b468a1	[DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728	2018-11-13 01:59:32 +00:00
Jessica Paquette	82d9c0a3fa	[MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom Instead of returning Flags, return true if the MBB is safe to outline from. This lets us check for unsafe situations, like say, in AArch64, X17 is live across a MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping. llvm-svn: 346718	2018-11-12 23:51:32 +00:00
Philip Reames	e44a55dc98	[GC][NFC] Simplify code now that we only have one safepoint kind This is the NFC follow up to exploit the semantic simplification from r346701 llvm-svn: 346712	2018-11-12 22:03:53 +00:00
Ali Tamur	d482b01a62	Use a data structure better suited for large sets in SimplificationTracker. Summary: D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for power pc, we have a case where there are more than 600.000 nodes, and it takes too long to compile. In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation, however I think it only applies to the nodes already matched in MatchPhiSet method, which I did not touch. Reviewers: bjope, skatkov Reviewed By: bjope, skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54007 llvm-svn: 346710	2018-11-12 21:43:43 +00:00
Philip Reames	c75a0c3f69	[GC] Remove so called PreCall safepoints Remove another bit of unused configuration potential from GCStrategy. It's not entirely clear what the intention here was, but from the docs, it sounds like this may have been subsumed by patchable call support. Note: This change is deliberately small to make it clear that while implemented, there's nothing using the option. A following NFC will do most of the simplifications. llvm-svn: 346701	2018-11-12 20:15:34 +00:00
Stanislav Mekhanoshin	5f9513147a	Fix MachineInstr::findRegisterUseOperandIdx subreg checks The function only checks that instruction reads a super-register containing requested physical register. In case if a sub-register if being read that is also a use of a super-reg, so added the check. In particular MI->readsRegister() is broken because of the missing check. The resulting check is essentially regsOverlap(). Differential Revision: https://reviews.llvm.org/D54128 llvm-svn: 346686	2018-11-12 18:12:28 +00:00
Jessica Paquette	9702144341	[MachineOutliner][NFC] Early exit pruning when candidates don't share an MBB There's no way they can overlap in this case. This can save a few iterations when the candidate is close to the beginning of a MachineBasicBlock. It's particularly useful when the average length of a MachineBasicBlock in the program is small. llvm-svn: 346682	2018-11-12 17:50:56 +00:00
Jessica Paquette	3954272ac1	[MachineOutliner][NFC] Put suffix tree in buildCandidateList It's only used there, so it doesn't make much sense to have it in runOnModule. llvm-svn: 346681	2018-11-12 17:50:55 +00:00
Paul Robinson	5b302bfc8e	[DWARFv5] Emit split type units in .debug_info.dwo. Differential Revision: https://reviews.llvm.org/D54350 llvm-svn: 346674	2018-11-12 16:55:11 +00:00
Nirav Dave	a395e2df56	[DAGCombiner] Fix load-store forwarding of indexed loads. Summary: Handle extra output from index loads in cases where we wish to forward a load value directly from a preceeding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654	2018-11-12 14:05:40 +00:00
Philip Reames	8b48ceac80	[GC] Remove unused configuration variable The custom root mechanism didn't actually do anything. ShadowStackGC, the only one which used it, just removed the gcroots before they reached the normal lowering in SelectionDAG. As a result, the state flag had no value. llvm-svn: 346632	2018-11-12 02:34:54 +00:00
Philip Reames	1559021751	[GC] Minor style modernization llvm-svn: 346631	2018-11-12 02:26:26 +00:00
Philip Reames	18945d6c99	[GCRoot] Remove some unneccessary complexity The GCStrategy provides three configuration options were are largely redundant. 1) Support for conditionally lowering gcread and gcwrite to loads and stores. This is redundant since any GC which wished to use these abstractions would lower them out of existance before the built in lowering anyways. As such, there's no need to have the lowering being conditional. 2) Conditional initialization for allocas marked via gcroot. Semantically, roots have to be initialized before first potential use. Arguably, the frontend really should have responsibility for that, but the old API allowed the frontend to ignore this detail. Only one builtin GC used the non-initializing mode. Since no one to my knowledge actually uses the ErlangGC strategy, I decide the slight pessimization was worth the simplicity. If that turns out to be problematic, we can always improve the insertion algorithm to detect more existing initializing stores. llvm-svn: 346621	2018-11-11 21:13:09 +00:00
Craig Topper	d23cdbbeb2	[DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600	2018-11-10 23:46:03 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Philip Reames	9b8c102675	[GC] Rename a header for consistency llvm-svn: 346588	2018-11-10 16:08:10 +00:00
Matthias Braun	fb93aecf8d	RegAllocFast: Further cleanups; NFC llvm-svn: 346576	2018-11-10 00:36:27 +00:00
Philip Reames	afa1742b4b	[GC] Simplify linking of GC builtin GC strategies llvm-svn: 346569	2018-11-09 23:56:21 +00:00
Craig Topper	f2e65f8636	[SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFC gcc wants parentheses around the logical OR since there is a logical AND for the string. llvm-svn: 346564	2018-11-09 23:11:30 +00:00
Paul Robinson	ddbde9a4ad	[DWARFv5] Emit normal type units in .debug_info comdats. Differential Revision: https://reviews.llvm.org/D54282 llvm-svn: 346540	2018-11-09 19:06:09 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Serge Guelton	86f8b70f1b	Type safe version of MachinePassRegistry Previous version used type erasure through a `void* (*)()` pointer, which triggered gcc warning and implied a lot of reinterpret_cast. This version should make it harder to hit ourselves in the foot. Differential revision: https://reviews.llvm.org/D54203 llvm-svn: 346522	2018-11-09 17:19:45 +00:00
Zaara Syeda	5c179bf14b	[Power9] Allow gpr callee saved spills in prologue to vectors registers Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills. Differential Revision: https://reviews.llvm.org/D39386 llvm-svn: 346512	2018-11-09 16:36:24 +00:00
Alexandros Lamprineas	e15c982f6d	[SelectionDAG] swap select_cc operands to enable folding The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484	2018-11-09 11:09:40 +00:00
Craig Topper	8cca8bd4aa	[SelectionDAG] Assert on the width of DemandedElts argument to computeKnownBits for all vector typed operations not just build_vector. Fix AArch64 unit test that fails with the assertion added. llvm-svn: 346437	2018-11-08 20:29:17 +00:00
Nirav Dave	6ce9f72f76	[DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simulateanously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelize the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432	2018-11-08 19:14:20 +00:00
David Blaikie	c8f7e6c1a9	NFC: DebugInfo: Track the origin CU rather than just the base address for range lists Turns out knowing more than just the base address might be useful - specifically a future change to respect a DICompileUnit flag for the use of base address specifiers in DWARF < 5. llvm-svn: 346380	2018-11-08 00:35:54 +00:00
Jessica Paquette	c4cf775ae0	[MachineOutliner][NFC] Only map blocks which have adjacent legal instructions If a block doesn't have any ranges of adjacent legal instructions, then it can't have outlining candidates. There's no point in mapping legal isntructions in situations like this. I noticed this reduces the size of the suffix tree in sqlite3 for AArch64 at -Oz by about 3%. llvm-svn: 346379	2018-11-08 00:33:38 +00:00
Jessica Paquette	267d266c29	[MachineOutliner][NFC] Don't map MBBs that don't contain legal instructions I noticed that there are lots of basic blocks that don't have enough legal instructions in them to warrant outlining. We can skip mapping these entirely. In sqlite3, compiled for AArch64 at -Oz, this results in a 10% reduction of the total nodes in the suffix tree. These nodes can never be part of a repeated substring, and so they don't impact the result at all. Before this, there were 62128 nodes in the tree for sqlite3. After this, there are 56457 nodes. llvm-svn: 346373	2018-11-08 00:02:11 +00:00
Jessica Paquette	df5b09b8ce	[MachineOutliner][NFC] Remove Parent field from SuffixTreeNode This is only used for calculating ConcatLen. This isn't necessary, since it's easily derived from the traversal setting suffix indices. Remove that. Rename CurrIdx to CurrNodeLen to better describe what's going on. llvm-svn: 346349	2018-11-07 19:56:13 +00:00
Jessica Paquette	a409cc959b	[MachineOutliner][NFC] Traverse suffix tree using a RepeatedSubstring iterator This takes the traversal methods introduced in r346269 and adapts them into an iterator. This allows the outliner to iterate over repeated substrings within the suffix tree directly without having to initially find all of the substrings and then iterate over them after you've found them. llvm-svn: 346345	2018-11-07 19:20:55 +00:00
Jessica Paquette	a3eb0fac3b	[MachineOutliner] Don't store outlined function numberings on OutlinedFunction NFC-ish. This doesn't change the behaviour of the outliner, but does make sure that you won't end up with say OUTLINED_FUNCTION_2: ... ret OUTLINED_FUNCTION_248: ... ret as the only outlined functions in your module. Those should really be OUTLINED_FUNCTION_0: ... ret OUTLINED_FUNCTION_1: ... ret If we produce outlined functions, they probably should have sequential numbers attached to them. This makes it a bit easier+stable to write outliner tests. The point of this is to move towards a bit more stability in outlined function names. By doing this, we at least don't rely on the traversal order of the suffix tree. Instead, we rely on the order of the candidate list, which is far more consistent. The candidate list is ordered by the end indices of candidates, so we're more likely to get a stable ordering. This is still susceptible to changes in the cost model though (like, if we suddenly find new candidates, for example). llvm-svn: 346340	2018-11-07 18:36:43 +00:00
Serge Guelton	a4d9e2293a	Fix ignorded type qualifier warning [NFC] llvm-svn: 346332	2018-11-07 16:17:30 +00:00
James Y Knight	72f76bf230	Add support for llvm.is.constant intrinsic (PR4898) This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322	2018-11-07 15:24:12 +00:00
Matthias Braun	5b7c90b4e2	RegAllocFast: Leave unassigned virtreg entries in map Set `LiveReg::PhysReg` to zero when freeing a register instead of removing it from the entry from `LiveRegMap`. This way no iterators get invalidated and we can avoid passing around and updating iterators all over the place. This does not change any allocator decisions. It is not completely NFC because the arbitrary iteration order through `LiveRegMap` in `spillAll()` changes so we may get a different order in those spill sequences (the amount of spills does not change). This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346298	2018-11-07 06:57:03 +00:00
Matthias Braun	b0ecbef428	RegAllocFast: Further cleanups; NFC This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346297	2018-11-07 06:57:02 +00:00
Matthias Braun	0804dca358	RegAllocFast: Refactor PhysRegState usage; NFC This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346296	2018-11-07 06:57:00 +00:00
Matthias Braun	b4c76ff77c	RegAllocFast: Factor spill/reload creation into their own functions; NFC This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346289	2018-11-07 02:04:12 +00:00
Matthias Braun	ebcf5437bc	RegAllocFast: Cleanups; NFC This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346288	2018-11-07 02:04:11 +00:00
Matthias Braun	14af82a608	RegAllocFast: Rename statistic from NumCopies to NumCoalesced The metric does not return the number of remaining (or inserted) copies but the number of copies that were coalesced. Pick a more descriptive name. llvm-svn: 346287	2018-11-07 02:04:07 +00:00
Jessica Paquette	935d373db9	[MachineOutliner][NFC] Remove OccurrenceCount from SuffixTreeNode After changing the way we find candidates in r346269, this is no longer used. llvm-svn: 346275	2018-11-06 22:23:13 +00:00
Jessica Paquette	979cf1e566	[MachineOutliner][NFC] Remove IsInTree from SuffixTreeNode After changing the way we find repeated substrings in r346269, this field is no longer used by anything, so it can be removed. llvm-svn: 346274	2018-11-06 22:21:11 +00:00
Jessica Paquette	4e54ef8883	[MachineOutliner][NFC] Add findRepeatedSubstrings to SuffixTree, kill LeafVector Instead of iterating over the leaves to find repeated substrings, and walking collecting leaf children when we don't necessarily need them, let's just calculate what we need and iterate over that. By doing this, we don't have to save every leaf. It's easier to read the code too and understand what's going on. The goal here, at the end of the day, is to set up to allow us to do something like for (RepeatedSubstring &RS : ST) { ... do stuff with RS ... } Which would let us perform the cost model stuff and the repeated substring query at the same time. llvm-svn: 346269	2018-11-06 21:46:41 +00:00
Matthias Braun	c6613879ce	LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysRegs. The later is only 16bits and saves us a bit of memory. llvm-svn: 346254	2018-11-06 19:00:11 +00:00
Matthias Braun	7a75a91b5b	MachineFunction: Store more specific reference to LLVMTargetMachine; NFC MachineFunction can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. Do the same for references in ScheduleDAG and RegUsageInfoCollector. llvm-svn: 346183	2018-11-05 23:49:14 +00:00
Matthias Braun	3d849f67cb	MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC MachineModuleInfo can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. llvm-svn: 346182	2018-11-05 23:49:13 +00:00
Cameron McInally	9757d5d6c1	[FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsics Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141	2018-11-05 15:59:49 +00:00
Simon Pilgrim	6bd468bd8b	[TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139	2018-11-05 15:49:09 +00:00
Craig Topper	8f2f2a76b9	[DAGCombiner] Use tryFoldToZero to simplify some code and make it work correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119	2018-11-05 05:53:06 +00:00
Craig Topper	8d64abddd1	[DAGCombiner] Remove an unused argument from tryFoldToZero. NFC llvm-svn: 346118	2018-11-05 05:53:03 +00:00
Craig Topper	3292ea03d3	[DAGCombiner] Remove 'else' after return. NFC This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090	2018-11-04 06:56:32 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Reid Kleckner	2bcb288ade	[codeview] Let the X86 backend tell us the VFRAME offset adjustment Use MachineFrameInfo's OffsetAdjustment field to pass this information from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for any other purpose. This fixes PR38857 in the case where there is a non-aligned quantity of CSRs and a non-aligned quantity of locals. llvm-svn: 346062	2018-11-03 00:41:52 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Jeremy Morse	d538352b3e	[MachineSink][DebugInfo] Correctly sink DBG_VALUEs As reported in PR38952, postra-machine-sink relies on DBG_VALUE insns being adjacent to the def of the register that they reference. This is not always true, leading to register copies being sunk but not the associated DBG_VALUEs, which gives the debugger a bad variable location. This patch collects DBG_VALUEs as we walk through a BB looking for copies to sink, then passes them down to performSink. Compile-time impact should be negligable. Differential Revision: https://reviews.llvm.org/D53992 llvm-svn: 345996	2018-11-02 16:52:48 +00:00
Simon Pilgrim	cdcbeb4997	[DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964	2018-11-02 11:06:18 +00:00
Matthias Braun	e8f717aea8	LLVMTargetMachine/TargetPassConfig: Simplify handling of start/stop options; NFC - Make some TargetPassConfig methods that just check whether options have been set static. - Shuffle code in LLVMTargetMachine around so addPassesToGenerateCode only deals with TargetPassConfig now (but not with MCContext or the creation of MachineModuleInfo) llvm-svn: 345918	2018-11-02 01:31:50 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Craig Topper	e2483020f2	[DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908	2018-11-01 23:21:45 +00:00
Jessica Paquette	c991cf3687	[MachineOutliner][NFC] Remember when you map something illegal across MBBs Instruction mapping in the outliner uses "illegal numbers" to signify that something can't ever be part of an outlining candidate. This means that the number is unique and can't be part of any repeated substring. Because each of these is unique, we can use a single unique number to represent a range of things we can't outline. The outliner tries to leverage this using a flag which is set in an MBB when the previous instruction we tried to map was "illegal". This patch improves that logic to work across MBBs. As a bonus, this also simplifies the mapping logic somewhat. This also updates the machine-outliner-remarks test, which was impacted by the order of Candidates on an OutlinedFunction changing. This order isn't guaranteed, so I added a FIXME to fix that in a follow-up. The order of Candidates on an OutlinedFunction isn't important, so this still is NFC. llvm-svn: 345906	2018-11-01 23:09:06 +00:00
Simon Pilgrim	b34a052852	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655) This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869	2018-11-01 18:22:11 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Sanjay Patel	c5fe3ce2ec	[DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842	2018-11-01 15:41:12 +00:00
Zachary Turner	56a5a0c3ce	[CodeView] Emit the correct TypeIndex for std::nullptr_t. The TypeIndex used by cl.exe is 0x103, which indicates a SimpleTypeMode of NearPointer (note the absence of the bitness, normally pointers use a mode of NearPointer32 or NearPointer64) and a SimpleTypeKind of void. So this is basically a void, but without a specified size, which makes sense given how std::nullptr_t is defined. clang-cl was actually not emitting anything* for this. Instead, when we encountered std::nullptr_t in a DIType, we would actually just emit a TypeIndex of 0, which is obviously wrong. std::nullptr_t in DWARF is represented as a DW_TAG_unspecified_type with a name of "decltype(nullptr)", so we add that logic along with a test, as well as an update to the dumping code so that we no longer print void* when dumping 0x103 (which would previously treat Void/NearPointer no differently than Void/NearPointer64). Differential Revision: https://reviews.llvm.org/D53957 llvm-svn: 345811	2018-11-01 04:02:41 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Stanislav Mekhanoshin	222e9c11f7	Check shouldReduceLoadWidth from SimplifySetCC SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778	2018-10-31 21:24:30 +00:00
Scott Linder	92bb783cfe	[SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt lowerRangeToAssertZExt currently relies on something like EarlyCSE having eliminated the constant range [0,1). At -O0 this leads to an assert. Differential Revision: https://reviews.llvm.org/D53888 llvm-svn: 345770	2018-10-31 19:57:36 +00:00
Craig Topper	eeac12af6d	[SelectionDAGISel] Suppress a -Wunused-but-set-variable warning in release builds. NFC llvm-svn: 345761	2018-10-31 18:46:15 +00:00
Simon Pilgrim	077a9adb00	Fix comment typo. NFCI. llvm-svn: 345758	2018-10-31 18:19:52 +00:00
Simon Pilgrim	805cdcfe73	[SelectionDAG] SelectionDAGLegalize::ExpandBITREVERSE - ensure we use ShiftTy We should be using the getShiftAmountTy value type for shift amounts. llvm-svn: 345756	2018-10-31 18:14:14 +00:00
Daniel Sanders	3b39040ad4	[globalisel][irtranslator] Verify that DILocations aren't lost in translation Summary: Also fix a couple bugs where DILocations are lost. EntryBuilder wasn't passing on debug locations for PHI's, constants, GLOBAL_VALUE, etc. Reviewers: aprantl, vsk, bogner, aditya_nandakumar, volkan, rtereshin, aemerson Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53740 llvm-svn: 345743	2018-10-31 17:31:23 +00:00
Matthias Braun	8763c0c5b7	MachineModuleInfo: Initialize DbgInfoAvailable depending on debug_cus existing Before this patch DbgInfoAvailable was set to true in DwarfDebug::beginModule() or CodeViewDebug::CodeViewDebug(). This made MIR testing weird since passes would suddenly stop dealing with debug info just because we stopped the pipeline before the debug printers. This patch changes the logic to initialize DbgInfoAvailable based on the fact that debug_compile_units exist in the llvm Module. The debug printers may then override it with false in case of debug printing being disabled. Differential Revision: https://reviews.llvm.org/D53885 llvm-svn: 345740	2018-10-31 17:18:41 +00:00
David Bolvansky	d0080c3a5f	[DAGCombiner] Fold 0 div/rem X to 0 Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721	2018-10-31 14:18:57 +00:00
Matthias Braun	9fd397b423	ADT/STLExtras: Introduce llvm::empty; NFC This is modeled after C++17 std::empty(). Differential Revision: https://reviews.llvm.org/D53909 llvm-svn: 345679	2018-10-31 00:23:23 +00:00
Matthias Braun	a83403892a	MachineOperand/MIParser: Do not print debug-use flag, infer it The debug-use flag must be set exactly for uses on DBG_VALUEs. This is so obvious that it can be trivially inferred while parsing. This will reduce noise when printing while omitting an information that has little value to the user. The parser will keep recognizing the flag for compatibility with old `.mir` files. Differential Revision: https://reviews.llvm.org/D53903 llvm-svn: 345671	2018-10-30 23:28:27 +00:00
Cameron McInally	2ad870e785	[FPEnv] [FPEnv] Add constrained intrinsics for MAXNUM and MINNUM Differential Revision: https://reviews.llvm.org/D53216 llvm-svn: 345650	2018-10-30 21:01:29 +00:00
Craig Topper	4104c00658	[ScalarizeMaskedMemIntrin] Limit the scope of some variables that are only used inside loops. llvm-svn: 345638	2018-10-30 20:33:58 +00:00
Bjorn Pettersson	fe09a20f09	[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad Summary: Normalize the offset for endianess before checking if the store cover the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianess we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636	2018-10-30 20:16:39 +00:00
Nirav Dave	eedd2ccd02	[DAG] Add const variants for BaseIndexOffset functions. llvm-svn: 345623	2018-10-30 18:26:43 +00:00
Jonas Paulsson	611b533f1d	[SchedModel] Fix for read advance cycles with implicit pseudo operands. The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions are needed at a later point than the others. RegAlloc may add pseudo operands that are not part of the instruction descriptor, and therefore cannot have any read advance entries. This meant that in some cases the desired read advance was nullified by such a pseudo operand, which still had the original latency. This patch fixes this by making sure that such pseudo operands get a zero latency during DAG construction. Review: Matthias Braun, Ulrich Weigand. https://reviews.llvm.org/D49671 llvm-svn: 345606	2018-10-30 15:04:40 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
Sanjay Patel	680c9227ca	[SelectionDAG] fix build warning for mismatched signs in compare; NFC llvm-svn: 345598	2018-10-30 13:47:19 +00:00
Francis Visoiu Mistrih	85d3f1ee8f	[llc] Error out when -print-machineinstrs is used with an unknown pass We used to assert instead of reporting an error. PR39494 llvm-svn: 345589	2018-10-30 12:07:18 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
David Bolvansky	dfdbb038e8	[DAGCombiner] Improve X div/rem Y fold if single bit element type Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575	2018-10-30 09:07:22 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Matt Arsenault	0ae1b52186	Remove dead declaration llvm-svn: 345555	2018-10-30 01:12:12 +00:00
Matt Arsenault	901a95f729	Pass TRI to printReg llvm-svn: 345553	2018-10-30 01:11:31 +00:00
Jessica Paquette	e3932eeea4	[MachineOutliner] Inherit target features from parent function If a function has target features, it may contain instructions that aren't represented in the default set of instructions. If the outliner pulls out one of these instructions, and the function doesn't have the right attributes attached, we'll run into an LLVM error explaining that the target doesn't support the necessary feature for the instruction. This makes outlined functions inherit target features from their parents. It also updates the machine-outliner.ll test to check that we're properly inheriting target features. llvm-svn: 345535	2018-10-29 20:27:07 +00:00
Leonard Chan	905abe5b5d	[Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics Add an intrinsic that takes 2 integers and perform saturation subtraction on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53783 llvm-svn: 345512	2018-10-29 16:54:37 +00:00
Craig Topper	7a18b4bc51	[SelectionDAG] Fix bad indentation. NFC llvm-svn: 345481	2018-10-28 21:24:20 +00:00
Simon Pilgrim	3497d536f7	[TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to TargetLowering::expandUINT_TO_FP. llvm-svn: 345478	2018-10-28 15:34:35 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Craig Topper	c4b785ae1e	[DAGCombiner] Better constant vector support for FCOPYSIGN. Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469	2018-10-28 01:32:49 +00:00

... 2 3 4 5 6 ...

25526 Commits