teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-24 14:05:49 -04:00

Author	SHA1	Message	Date
Francis Visoiu Mistrih	164560bd74	[AArch64] Emit CSR loads in the same order as stores Optionally allow the order of restoring the callee-saved registers in the epilogue to be reversed. The flag -reverse-csr-restore-seq generates the following code: ``` stp x26, x25, [sp, #-64]! stp x24, x23, [sp, #16] stp x22, x21, [sp, #32] stp x20, x19, [sp, #48] ; [..] ldp x24, x23, [sp, #16] ldp x22, x21, [sp, #32] ldp x20, x19, [sp, #48] ldp x26, x25, [sp], #64 ret ``` Note how the CSRs are restored in the same order as they are saved. One exception to this rule is the last `ldp`, which allows us to merge the stack adjustment and the ldp into a post-index ldp. This is done by first generating: ldp x26, x25, [sp] add sp, sp, #64 which gets merged by the arm64 load store optimizer into ldp x26, x25, [sp], #64 The flag is disabled by default. llvm-svn: 327569	2018-03-14 20:34:03 +00:00
Craig Topper	9c098ed819	[X86] Add back fast-isel code for handling i8 shifts. I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds. This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86: shift amounts are always masked to 5 bits (except for 64-bit shifts). So if the masked value is still out of bounds, the result will be 0. Fixes PR36731. llvm-svn: 327540	2018-03-14 17:57:19 +00:00
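The masking behaviour described in the commit above can be modelled in a few lines. This is a minimal sketch, not LLVM code: `x86_shl8` is a hypothetical helper that imitates an 8-bit x86 `SHL`, assuming the count is masked with 0x1f and the result is truncated to 8 bits.

```python
def x86_shl8(value: int, amount: int) -> int:
    """Model an 8-bit x86 SHL (hypothetical helper, for illustration only)."""
    count = amount & 0x1F            # the count is masked to 5 bits
    return (value << count) & 0xFF   # only the low 8 bits of the result survive


# In-range shift behaves as expected.
in_range = x86_shl8(0x01, 3)
# A count of 8 survives the 5-bit mask but is out of bounds for i8,
# so every bit is shifted out and the result is 0.
out_of_bounds = x86_shl8(0xFF, 8)
# A count of 33 is masked down to 1.
wrapped = x86_shl8(0x01, 33)
```

Counts from 8 to 31 pass through the mask unchanged and clear the whole 8-bit value, which matches the "result will be 0" remark in the message.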
Francis Visoiu Mistrih	084e7d8770	[AArch64] Keep track of MIFlags in the LoadStoreOptimizer Merging: * $x26, $x25 = frame-setup LDPXi $sp, 0 * $sp = frame-destroy ADDXri $sp, 64, 0 into an LDPXpost should preserve the flags from both instructions as following: * frame-setup frame-destroy LDPXpost Differential Revision: https://reviews.llvm.org/D44446 llvm-svn: 327533	2018-03-14 17:10:58 +00:00
Craig Topper	b36cb20ef9	[X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps create an AND mask that can be matched as a zero extend. I had to modify the bswap recognition to allow unshrunk masks to make this work. Fixes PR36689. Differential Revision: https://reviews.llvm.org/D44442 llvm-svn: 327530	2018-03-14 16:55:15 +00:00
Simon Pilgrim	d1c3c995c0	[X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327524	2018-03-14 15:47:08 +00:00
Alexander Ivchenko	86ef9ab28f	[GlobalIsel][X86] Support for G_SDIV instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44430 llvm-svn: 327520	2018-03-14 15:41:11 +00:00
Petar Jovanovic	3408caf686	[mips] Add support for CRC ASE This includes Instructions: crc32b, crc32h, crc32w, crc32d, crc32cb, crc32ch, crc32cw, crc32cd Assembler directives: .set crc, .set nocrc, .module crc, .module nocrc Attribute: crc .MIPS.abiflags: CRC (0x8000) Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D44176 llvm-svn: 327511	2018-03-14 14:13:31 +00:00
Simon Pilgrim	d594942928	[X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs Account for ymm double pumping and add proper pshufb/permutevar support llvm-svn: 327510	2018-03-14 14:05:19 +00:00
Simon Pilgrim	de995e6e37	[X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327505	2018-03-14 13:22:56 +00:00
Martin Storsjo	bde677289a	[AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503	2018-03-14 13:09:10 +00:00
Alexander Ivchenko	0bd4d8c901	[GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL Support G_LSHR/G_ASHR/G_SHL. We have three variants of shift instructions: shift gpr, shift imm, shift 1. Currently GlobalISel TableGen generates patterns for shift imm and shift 1, but with shiftCount i8. In G_LSHR/G_ASHR/G_SHL, as in LLVM-IR, both arguments have the same type, so for now only shift i8 can use auto generated TableGen patterns. The support of G_SHL/G_ASHR enables tryCombineSExt from LegalizationArtifactCombiner.h to hit, which results in different legalization for the following tests: LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll LLVM :: CodeGen/X86/GlobalISel/gep.ll LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir -; X64-NEXT: movsbl %dil, %eax +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: shll %cl, %edi +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: sarl %cl, %edi +; X64-NEXT: movl %edi, %eax ..which is not optimal and should be addressed later. Rework of the patch by igorb Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44395 llvm-svn: 327499	2018-03-14 11:23:57 +00:00
Alexander Ivchenko	327de80529	[GlobalIsel][X86] Support for G_ZEXT instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44378 llvm-svn: 327482	2018-03-14 09:11:23 +00:00
Matt Arsenault	41e5ac4fa4	TargetMachine: Add address space to getPointerSize llvm-svn: 327467	2018-03-14 00:36:23 +00:00
Craig Topper	ec4881ad53	[X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling. We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning. llvm-svn: 327457	2018-03-13 22:36:07 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This is better able to detect undef and zero pieces in the concat, or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Simon Dardis	e5f72dd5e1	Revert "[mips] Guard traps for microMIPS correctly" This appears to have broken the expensive checks bot in a strange fashion. Reverting until I can investigate. This reverts r327409. llvm-svn: 327427	2018-03-13 17:31:11 +00:00
Craig Topper	7e711a6822	[X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops. Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44403 llvm-svn: 327418	2018-03-13 16:23:27 +00:00
Zaara Syeda	df28fb6ac2	test commit: fix formatting of a comment This is a simple change to do the test commit. llvm-svn: 327412	2018-03-13 15:49:05 +00:00
Simon Dardis	d5ae61d49d	[mips] Guard traps for microMIPS correctly This is part of fixing the instruction predicates for MIPS. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D44212 llvm-svn: 327409	2018-03-13 15:46:58 +00:00
Simon Pilgrim	3d4c86d399	[X86][Btver2] Split i8/i16/i32/i64 div/idiv costs We were assuming a mixture of 32/64 division costs. llvm-svn: 327407	2018-03-13 15:22:24 +00:00
Simon Dardis	476ed8f26e	[mips] Fix the definitions of the EVA instructions Correct their availability to their respective ISAs. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D44209 llvm-svn: 327403	2018-03-13 14:39:44 +00:00
Simon Dardis	9d7e9032f1	[mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes. For the MIPS O32 ABI, the current call lowering logic naively lowers each call, creating the reserved argument area to hold the argument spill areas for $a0..$a3 and the outgoing parameter area if one is required at each call site. In the case of a sufficiently large byval argument, a call to memcpy is used to write the start+16..end of the argument into the outgoing parameter area. This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ nodes are responsible for performing the necessary stack adjustments. Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the stack pointer and reading the values back is unpredictable, the call to memcpy cannot be hoisted out of the callee's CALLSEQ nodes. However, the O32 ABI requires the reserved argument area for functions which have parameters. The naive lowering of calls will then create nested CALLSEQ sequences. For N32 and N64 these nodes are also created, but with zero stack adjustments as those ABIs do not have a reserved argument area. This patch addresses the correctness issue by recognizing the special case of lowering a byval argument that uses memcpy. By recognizing that the incoming chain already has a CALLSEQ_START node on it when calling memcpy, the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an issue, as no stack adjustment has to be performed. For the O32 ABI, the correctness reasoning is different. In the case of a sufficiently large byval argument, registers a0..a3 are going to be used for the callee's arguments, mandating the creation of the reserved argument area. The call to memcpy in the naive case will also create its own reserved argument area. However, since the reserved argument area consists of undefined values, both calls can use the same reserved argument area. Reviewers: abeserminji, atanasyan Differential Revision: https://reviews.llvm.org/D44296 llvm-svn: 327388	2018-03-13 12:50:03 +00:00
Simon Pilgrim	93bd7187f4	[X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them. llvm-svn: 327385	2018-03-13 12:22:58 +00:00
Yonghong Song	82bf8bcb4f	bpf: Enhance debug information for peephole optimization passes Add more debug information for peephole optimization passes. These would only be enabled for debug version binary and could help analyzing why some optimization opportunities were missed. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327371	2018-03-13 06:47:07 +00:00
Yonghong Song	e91802f336	bpf: New post-RA peephole optimization pass to eliminate bad RA codegen This new pass eliminates identical moves: MOV rA, rA This is particularly likely to happen when sub-register support is enabled. The special type cast insn MOV_32_64 involves different register classes on src (i32) and dst (i64), so RA could generate useless instructions due to this. This pass could also serve as the base for further post-RA optimization. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327370	2018-03-13 06:47:06 +00:00
Yonghong Song	80b882ecc5	bpf: Don't expand BSWAP on i32, promote it Currently, there is no ALU32 bswap support in eBPF ISA. BSWAP on i32 was set to EXPAND which would need about eight instructions for a single BSWAP. It would be more efficient to promote it to i64, then do BSWAP on i64. For eBPF programs, most of the promotions are zero extensions which are likely to be eliminated later by peephole optimizations. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327369	2018-03-13 06:47:05 +00:00
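To illustrate why promoting a 32-bit BSWAP to 64 bits gives the same answer, here is a small sketch. It is not taken from the BPF backend: `bswap` and `bswap32_via_i64` are hypothetical helpers, and the sketch assumes the usual promotion recipe of zero-extending, swapping the wide value, then shifting the result down by the width difference.

```python
def bswap(value: int, width_bits: int) -> int:
    """Reverse the byte order of an unsigned integer of the given width."""
    nbytes = width_bits // 8
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")


def bswap32_via_i64(x: int) -> int:
    """Byte-swap a 32-bit value using only a 64-bit swap (illustrative)."""
    wide = x & 0xFFFFFFFF          # zero-extend the i32 into an i64
    return bswap(wide, 64) >> 32   # swapped bytes land in the high half


promoted = bswap32_via_i64(0x11223344)
direct = bswap(0x11223344, 32)
```

Both paths produce the same 32-bit value; the promoted form costs one extension, one wide swap, and one shift.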
Yonghong Song	1d28a759d9	bpf: Support subregister definition check on PHI node This patch relaxes the subregister definition check on PHI nodes. Previously, we just cancelled the optimization when the definition was a PHI node, while actually we could further check the definitions of the incoming parameters of the PHI node. This helps catch more elimination opportunities. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327368	2018-03-13 06:47:04 +00:00
Yonghong Song	c88bcdec43	bpf: Extends zero extension elimination beyond comparison instructions The current zero extension elimination was restricted to operands of comparison. It actually could be extended to more cases. For example: int inc_p (int p, unsigned a) { return p + a; } 'a' will be promoted to i64 during addition, and the zero extension could be eliminated as well. For the elimination optimization, it should be much better to start recognizing the candidate sequence from the SRL instruction instead of J* instructions. This patch makes it a generic zero extension elimination pass instead of one restricted to comparisons. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327367	2018-03-13 06:47:03 +00:00
Yonghong Song	905d13c123	bpf: J_RR should check both operands There is a mistake in the current code: we "break" out of the optimization when the first operand of J_RR doesn't qualify for the elimination. This caused some elimination opportunities to be missed, for example the one in the testcase. The code should just fall through to handle the second operand. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327366	2018-03-13 06:47:02 +00:00
Yonghong Song	89e47ac671	bpf: Tighten subregister definition check The current subregister definition check stops after the MOV_32_64 instruction. This means we treat all of the following instruction sequences as safe to eliminate: MOV_32_64 rB, wA SLL_ri rB, rB, 32 SRL_ri rB, rB, 32 However, this is not true. The source subregister wA of MOV_32_64 could come from an implicit truncation of a 64-bit register, in which case the high bits of the 64-bit register are not zeroed, therefore we can't eliminate the above sequence. For example, for i32_val, we shouldn't do the elimination: long long bar (); int foo (int b, int c) { unsigned int i32_val = (unsigned int) bar(); if (i32_val < 10) return b; else return c; } Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 327365	2018-03-13 06:47:00 +00:00
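The safety condition in the commit above can be seen by modelling the shift pair on plain integers. This is an illustrative sketch only: `sll_srl_32` is a hypothetical helper that mimics `SLL_ri rB, rB, 32` followed by `SRL_ri rB, rB, 32` on a 64-bit register, and the register values are made up.

```python
MASK64 = (1 << 64) - 1


def sll_srl_32(reg: int) -> int:
    """Shift left then logically right by 32 on a 64-bit register value."""
    reg = (reg << 32) & MASK64   # SLL_ri: the high 32 bits are discarded
    return reg >> 32             # SRL_ri: the low 32 bits come back zero-extended


# High bits already zero: the pair changes nothing, so removing it is safe.
clean = 0x0000000000000005
# High bits left over from a truncated 64-bit value: the pair is doing
# real work, so removing it would change the result.
dirty = 0xDEADBEEF00000005

clean_is_noop = sll_srl_32(clean) == clean
dirty_is_noop = sll_srl_32(dirty) == dirty
```

The pair is removable only when the upper half of the register is already known to be zero, which is why the definition of the source subregister has to be checked.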
Simon Pilgrim	7f1b9196cb	[X86][Btver2] Clean up formatting/comments in scheduler model. NFCI. Moved 'special cases' to be closer to other system classes. llvm-svn: 327332	2018-03-12 21:35:12 +00:00
Lei Huang	cd4f385795	[PowerPC][NFC] Explicitly state types on FP SDAG patterns in anticipation of adding the f128 type llvm-svn: 327319	2018-03-12 19:26:18 +00:00
Martin Storsjo	7bc64bd889	[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str Differential Revision: https://reviews.llvm.org/D44355 llvm-svn: 327316	2018-03-12 18:47:43 +00:00
Krzysztof Parzyszek	2d08f2ebf8	[Hexagon] Counting leading/trailing bits is cheap llvm-svn: 327308	2018-03-12 18:18:23 +00:00
Simon Pilgrim	f0a9b25394	[X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU. I love you llvm-mca. llvm-svn: 327306	2018-03-12 18:12:46 +00:00
Krzysztof Parzyszek	5d41cc19bd	[Hexagon] Subtarget feature to emit one instruction per packet This adds two features: "packets", and "nvj". Enabling "packets" allows the compiler to generate instruction packets, while disabling it will prevent it and disable all optimizations that generate them. This feature is enabled by default on all subtargets. The feature "nvj" allows the compiler to generate new-value jumps and it implies "packets". It is enabled on all subtargets. The exception is made for packets with endloop instructions, since they require a certain minimum number of instructions in the packets to which they apply. Disabling "packets" will not prevent hardware loops from being generated. llvm-svn: 327302	2018-03-12 17:47:46 +00:00
Simon Pilgrim	a0536c17b9	[X86] Deleting README-MMX.txt now that all tasks have been completed. MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla llvm-svn: 327300	2018-03-12 17:29:54 +00:00
Dmitry Preobrazhensky	d98c97b4f9	[AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558 Differential Revision: https://reviews.llvm.org/D43950 Reviewers: artem.tamazov, arsenm llvm-svn: 327299	2018-03-12 17:29:24 +00:00
Simon Pilgrim	deface9c73	[X86][Btver2] Prefix all scheduler defs. NFCI. These are all global, so prefix with 'J' to help prevent accidental name clashes with other models. llvm-svn: 327296	2018-03-12 17:07:08 +00:00
Craig Topper	acaba3b402	[X86] Remove use of MVT class from the ShuffleDecode library. MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies. Differential Revision: https://reviews.llvm.org/D44353 llvm-svn: 327292	2018-03-12 16:43:11 +00:00
Yaxun Liu	a99e7d8e44	[AMDGPU] Fix lowering enqueue kernel when kernel has no name Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1. Differential Revision: https://reviews.llvm.org/D44322 llvm-svn: 327291	2018-03-12 16:34:06 +00:00
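The naming scheme in the commit above can be sketched as a simple counter. This is an assumption-laden illustration, not the AMDGPU lowering code: `make_namer` is a hypothetical helper, and it assumes the first unnamed kernel gets the bare base name while later ones get `.1`, `.2`, and so on.

```python
def make_namer(base: str = "__amdgpu_enqueued_kernel"):
    """Return a function that hands out unique kernel names (illustrative)."""
    count = 0

    def next_name() -> str:
        nonlocal count
        # First name is the bare base; later names get a ".n" suffix from 1.
        name = base if count == 0 else f"{base}.{count}"
        count += 1
        return name

    return next_name


namer = make_namer()
names = [namer() for _ in range(3)]
```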
Simon Pilgrim	6f01e654b4	[X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI. This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro. llvm-svn: 327289	2018-03-12 16:02:56 +00:00
Simon Pilgrim	bc216b440f	[X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI. These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately. llvm-svn: 327283	2018-03-12 15:29:00 +00:00
Dmitry Preobrazhensky	da4a7c01bf	[AMDGPU][MC] Corrected GATHER4 opcodes See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252 Differential Revision: https://reviews.llvm.org/D43874 Reviewers: artem.tamazov, arsenm llvm-svn: 327278	2018-03-12 15:03:34 +00:00
Sam McCall	bbfe434185	[Hexagon] fix 'must explicitly initialize the const member' error which clang 3.8 emits llvm-svn: 327273	2018-03-12 14:40:48 +00:00
Matt Arsenault	7b9ed89dcf	AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT\|EXTRACT}_VECTOR_ELT llvm-svn: 327269	2018-03-12 13:35:53 +00:00
Matt Arsenault	c0aefd561e	AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES llvm-svn: 327268	2018-03-12 13:35:49 +00:00
Matt Arsenault	503afda95f	AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal llvm-svn: 327267	2018-03-12 13:35:43 +00:00
Simon Dardis	1f0fe56460	[mips] Split out ASEPredicate from InsnPredicates (NFC) This simplifies tagging instructions with the correct ISA and ASE, albeit making instruction definitions a bit more verbose. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D44299 llvm-svn: 327265	2018-03-12 13:16:12 +00:00
Nico Weber	73a699e592	MC intel asm parser: Allow @ at the start of function names. Ports parts of r193000 to the intel parser. Fixes part of PR36676. https://reviews.llvm.org/D44359 llvm-svn: 327262	2018-03-12 12:47:27 +00:00

46487 Commits