Commit Graph

46487 Commits

Author SHA1 Message Date
Francis Visoiu Mistrih
164560bd74 [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

ldp x26, x27, [sp]
add sp, sp, #64

which gets merged by the arm64 load store optimizer into

ldp x26, x25, [sp], #64

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Craig Topper
9c098ed819 [X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0.

Fixes PR36731.

llvm-svn: 327540
2018-03-14 17:57:19 +00:00
Francis Visoiu Mistrih
084e7d8770 [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Craig Topper
b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Simon Pilgrim
d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Alexander Ivchenko
86ef9ab28f [GlobalIsel][X86] Support for G_SDIV instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44430

llvm-svn: 327520
2018-03-14 15:41:11 +00:00
Petar Jovanovic
3408caf686 [mips] Add support for CRC ASE
This includes

  Instructions: crc32b, crc32h, crc32w, crc32d,
                crc32cb, crc32ch, crc32cw, crc32cd

  Assembler directives: .set crc, .set nocrc, .module crc, .module nocrc

  Attribute: crc

  .MIPS.abiflags: CRC (0x8000)

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D44176

llvm-svn: 327511
2018-03-14 14:13:31 +00:00
Simon Pilgrim
d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim
de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Martin Storsjo
bde677289a [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Alexander Ivchenko
0bd4d8c901 [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL
Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for
shift instructions : shift gpr, shift imm, shift 1.
Currently GlobalIsel TableGen generate patterns for
shift imm and shift 1, but with shiftCount i8.
In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments
has the same type, so for now only shift i8 can use
auto generated TableGen patterns.

The support of G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to hit, which
results in different legalization for the following tests:
    LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
    LLVM :: CodeGen/X86/GlobalISel/gep.ll
    LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir

-; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    shll %cl, %edi
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    sarl %cl, %edi
+; X64-NEXT:    movl %edi, %eax

..which is not optimal and should be addressed later.

Rework of the patch by igorb

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44395

llvm-svn: 327499
2018-03-14 11:23:57 +00:00
Alexander Ivchenko
327de80529 [GlobalIsel][X86] Support for G_ZEXT instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44378

llvm-svn: 327482
2018-03-14 09:11:23 +00:00
Matt Arsenault
41e5ac4fa4 TargetMachine: Add address space to getPointerSize
llvm-svn: 327467
2018-03-14 00:36:23 +00:00
Craig Topper
ec4881ad53 [X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling.
We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning.

llvm-svn: 327457
2018-03-13 22:36:07 +00:00
Craig Topper
cc060e921b [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.

This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.

We probably want to merge this with the vXi1 concat code since they are very similar.

llvm-svn: 327454
2018-03-13 22:05:25 +00:00
Simon Dardis
e5f72dd5e1 Revert "[mips] Guard traps for microMIPS correctly"
This appears to have broken the expensive checks bot in
a strange fashion. Reverting until I can investigate.

This reverts r327409.

llvm-svn: 327427
2018-03-13 17:31:11 +00:00
Craig Topper
7e711a6822 [X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops.
Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44403

llvm-svn: 327418
2018-03-13 16:23:27 +00:00
Zaara Syeda
df28fb6ac2 test commit: fix formatting of a comment
This is  a simple change to do the test commit.

llvm-svn: 327412
2018-03-13 15:49:05 +00:00
Simon Dardis
d5ae61d49d [mips] Guard traps for microMIPS correctly
This is part of fixing the instruction predicates for MIPS.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44212

llvm-svn: 327409
2018-03-13 15:46:58 +00:00
Simon Pilgrim
3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Dardis
476ed8f26e [mips] Fix the definitions of the EVA instructions
Correct their availability to their respective ISAs.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D44209

llvm-svn: 327403
2018-03-13 14:39:44 +00:00
Simon Dardis
9d7e9032f1 [mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes.
For the MIPS O32 ABI, the current call lowering logic naively lowers each
call, creating the reserved argument area to hold the argument spill areas for
$a0..$a3 and the outgoing parameter area if one is required at each call site.

In the case of a sufficently large byval argument, a call to memcpy is used
to write the start+16..end of the argument into the outgoing parameter area.
This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ
nodes are responsible for performing the necessary stack adjustments.

Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the
stack pointer and reading the values back is unpredictable, the call to memcpy
cannot be hoisted out of the callee's CALLSEQ nodes.

However, for the O32 ABI requires the reserved argument area for functions
which have parameters. The naive lowering of calls will then create nested
CALLSEQ sequences. For N32 and N64 these nodes are also created, but with
zero stack adjustments as those ABIs do not have a reserved argument area.

This patch addresses the correctness issue by recognizing the special case
of lowering a byval argument that uses memcpy. By recognizing that the
incoming chain already has a CALLSEQ_START node on it when calling memcpy,
the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an
issue, as no stack adjustment has to be performed.

For the O32 ABI, the correctness reasoning is different. In the case of a
sufficently large byval argument, registers a0..a3 are going to be used for
the callee's arguments, mandating the creation of the reserved argument area.
The call to memcpy in the naive case will also create its own reserved
argument area. However, since the reserved argument area consists of undefined
values, both calls can use the same reserved argument area.

Reviewers: abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D44296

llvm-svn: 327388
2018-03-13 12:50:03 +00:00
Simon Pilgrim
93bd7187f4 [X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them.
llvm-svn: 327385
2018-03-13 12:22:58 +00:00
Yonghong Song
82bf8bcb4f bpf: Enhance debug information for peephole optimization passes
Add more debug information for peephole optimization passes.

These would only be enabled for debug version binary and could help
analyzing why some optimization opportunities were missed.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327371
2018-03-13 06:47:07 +00:00
Yonghong Song
e91802f336 bpf: New post-RA peephole optimization pass to eliminate bad RA codegen
This new pass eliminate identical move:

  MOV rA, rA

This is particularly possible to happen when sub-register support
enabled. The special type cast insn MOV_32_64 involves different
register class on src (i32) and dst (i64), RA could generate useless
instruction due to this.

This pass also could serve as the bast for further post-RA optimization.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327370
2018-03-13 06:47:06 +00:00
Yonghong Song
80b882ecc5 bpf: Don't expand BSWAP on i32, promote it
Currently, there is no ALU32 bswap support in eBPF ISA.

BSWAP on i32 was set to EXPAND which would need about eight instructions
for single BSWAP.

It would be more efficient to promote it to i64, then doing BSWAP on i64.
For eBPF programs, most of the promotion are zero extensions which are
likely be elimiated later by peephole optimizations.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327369
2018-03-13 06:47:05 +00:00
Yonghong Song
1d28a759d9 bpf: Support subregister definition check on PHI node
This patch relax the subregister definition check on Phi node.
Previously, we just cancel the optimizatoin when the definition is Phi
node while actually we could further check the definitions of incoming
parameters of PHI node.

This helps catch more elimination opportunities.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327368
2018-03-13 06:47:04 +00:00
Yonghong Song
c88bcdec43 bpf: Extends zero extension elimination beyond comparison instructions
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases.

For example:

  int *inc_p (int *p, unsigned a)
  {
    return p + a;
  }

'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.

For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.

This patch makes it an generic zero extension elimination pass instead of
one restricted with comparison.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
2018-03-13 06:47:03 +00:00
Yonghong Song
905d13c123 bpf: J*_RR should check both operands
There is a mistake in current code that we "break" out the optimization
when the first operand of J*_RR doesn't qualify the elimination. This
caused some elimination opportunities missed, for example the one in the
testcase.

The code should just fall through to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
2018-03-13 06:47:02 +00:00
Yonghong Song
89e47ac671 bpf: Tighten subregister definition check
The current subregister definition check stops after the MOV_32_64
instruction.

This means we are thinking all the following instruction sequences
are safe to be eliminated:

  MOV_32_64 rB, wA
  SLL_ri    rB, rB, 32
  SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64 could
come from a implicit truncation of 64-bit register in which case the high
bits of the 64-bit register is not zeroed, therefore we can't eliminate
above sequence.

For example, for i32_val, we shouldn't do the elimination:

  long long bar ();

  int foo (int b, int c)
  {
    unsigned int i32_val = (unsigned int) bar();

    if (i32_val < 10)
      return b;
    else
      return c;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
2018-03-13 06:47:00 +00:00
Simon Pilgrim
7f1b9196cb [X86][Btver2] Clean up formatting/comments in scheduler model. NFCI.
Moved 'special cases' to be closer to other system classes.

llvm-svn: 327332
2018-03-12 21:35:12 +00:00
Lei Huang
cd4f385795 [PowerPC][NFC] Explicitly state types on FP SDAG patterns in anticipation of adding the f128 type
llvm-svn: 327319
2018-03-12 19:26:18 +00:00
Martin Storsjo
7bc64bd889 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
Krzysztof Parzyszek
2d08f2ebf8 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Simon Pilgrim
f0a9b25394 [X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU.
I love you llvm-mca.

llvm-svn: 327306
2018-03-12 18:12:46 +00:00
Krzysztof Parzyszek
5d41cc19bd [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets", and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Simon Pilgrim
a0536c17b9 [X86] Deleting README-MMX.txt now that all tasks have been completed.
MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla

llvm-svn: 327300
2018-03-12 17:29:54 +00:00
Dmitry Preobrazhensky
d98c97b4f9 [AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction
See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558

Differential Revision: https://reviews.llvm.org/D43950

Reviewers: artem.tamazov, arsenm
llvm-svn: 327299
2018-03-12 17:29:24 +00:00
Simon Pilgrim
deface9c73 [X86][Btver2] Prefix all scheduler defs. NFCI.
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.

llvm-svn: 327296
2018-03-12 17:07:08 +00:00
Craig Topper
acaba3b402 [X86] Remove use of MVT class from the ShuffleDecode library.
MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies.

Differential Revision: https://reviews.llvm.org/D44353

llvm-svn: 327292
2018-03-12 16:43:11 +00:00
Yaxun Liu
a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
Simon Pilgrim
6f01e654b4 [X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI.
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.

llvm-svn: 327289
2018-03-12 16:02:56 +00:00
Simon Pilgrim
bc216b440f [X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI.
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.

llvm-svn: 327283
2018-03-12 15:29:00 +00:00
Dmitry Preobrazhensky
da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Sam McCall
bbfe434185 [Hexagon] fix 'must explicitly initialize the const member' error which clang 3.8 emits
llvm-svn: 327273
2018-03-12 14:40:48 +00:00
Matt Arsenault
7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault
c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault
503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Simon Dardis
1f0fe56460 [mips] Split out ASEPredicate from InsnPredicates (NFC)
This simplifies tagging instructions with the correct ISA and ASE, albeit making
instruction definitions a bit more verbose.

Reviewers: atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D44299

llvm-svn: 327265
2018-03-12 13:16:12 +00:00
Nico Weber
73a699e592 MC intel asm parser: Allow @ at the start of function names.
Ports parts of r193000 to the intel parser. Fixes part of PR36676.

https://reviews.llvm.org/D44359

llvm-svn: 327262
2018-03-12 12:47:27 +00:00