Commit Graph

50329 Commits

Author SHA1 Message Date
Simon Pilgrim
3c157d3fa3 [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349912
2018-12-21 15:29:47 +00:00
Simon Pilgrim
ca8bca2ad3 [WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349911
2018-12-21 15:25:37 +00:00
Simon Pilgrim
d800ee4861 [ARM] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349909
2018-12-21 15:15:38 +00:00
Simon Pilgrim
148957f336 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Simon Pilgrim
2482c51e99 [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349906
2018-12-21 14:50:54 +00:00
Simon Pilgrim
d43bdc715c [Lanai] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349905
2018-12-21 14:48:35 +00:00
Simon Pilgrim
af1ab22a76 [PPC] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349903
2018-12-21 14:32:39 +00:00
Simon Pilgrim
57733507fe [X86] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output paramater version.

llvm-svn: 349902
2018-12-21 14:25:14 +00:00
Luke Cheeseman
41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have and FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Simon Pilgrim
5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Thomas Lively
b6dac89c87 [WebAssembly] Fix invalid machine instrs in -O0, verify in tests
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55956

llvm-svn: 349889
2018-12-21 06:58:15 +00:00
Matt Arsenault
3eae3c4590 AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote
llvm-svn: 349882
2018-12-21 03:20:54 +00:00
Matt Arsenault
f4c21c575a AMDGPU/GlobalISel: RegBankSelect for some fp ops
llvm-svn: 349880
2018-12-21 03:14:45 +00:00
Matt Arsenault
bee2ad7185 AMDGPU/GlobalISel: Redo legality for build_vector
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.

Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.

llvm-svn: 349878
2018-12-21 03:03:11 +00:00
Craig Topper
54f1a7be13 [X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to tranlate opcode to condition code using the helpers in X86InstrInfo.cpp.
This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes.

This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses.

llvm-svn: 349868
2018-12-21 01:14:25 +00:00
Craig Topper
e0cff10289 [X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses.
Found while working on another patch

llvm-svn: 349867
2018-12-21 01:14:23 +00:00
Eli Friedman
b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Jessica Paquette
a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
Eli Friedman
48397102d0 [MC] [AArch64] Correctly resolve ":abs_g1:3" etc.
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them.  This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.

Differential Revision: https://reviews.llvm.org/D55906

llvm-svn: 349800
2018-12-20 19:46:14 +00:00
Eli Friedman
4648209e16 [MC] [AArch64] Support resolving fixups for abs_g0 etc.
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc.  Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.

I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.

This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).

Differential Revision: https://reviews.llvm.org/D55896

llvm-svn: 349799
2018-12-20 19:38:07 +00:00
Simon Pilgrim
2a25360ae3 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

Clang counterpart: https://reviews.llvm.org/D55937

Differential Revision: https://reviews.llvm.org/D55938

llvm-svn: 349795
2018-12-20 19:01:07 +00:00
Yonghong Song
821c93d556 [BPF] Disable relocation for .BTF.ext section
Build llvm with assertion on, and then build bcc against this llvm.
Run any bcc tool with debug=8 (turning on -g for clang compilation),
you will get the following assertion errors,
  /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888:
  void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t,
    uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed.

The .BTF.ext ELF section uses Fixup's to get the instruction
offsets. The data width of the Fixup is 4 bytes since we only need
the insn offset within the section.

This caused the above error though since R_BPF_64_32 expects
4-byte value and the Runtime Dyld tried to resolve the actual
insn address which is 8 bytes.

Actually the offset within the section is all what we need.
Therefore, there is no need to perform any kind of relocation
for .BTF.ext section and such relocation will actually cause
incorrect result.

This patch changed BPFELFObjectWriter::getRelocType() such that
for Fixup Kind FK_Data_4, if the relocation Target is a temporary
symbol, let us skip the relocation (ELF::R_BPF_NONE).

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 349778
2018-12-20 17:40:23 +00:00
Krzysztof Parzyszek
30c42e2ab6 [Hexagon] Add patterns for funnel shifts
llvm-svn: 349770
2018-12-20 16:39:20 +00:00
Alex Bradbury
eb3a64a4da [RISCV] Properly evaluate fixup_riscv_pcrel_lo12
This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12.

Notable changes:

Rebased onto trunk
Handle and test S-type
Test case pcrel-hilo.s is merged into relocations.s

D43157 description:
VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is
actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup
pointing to the real target.

Differential Revision: https://reviews.llvm.org/D54029
Patch by Chih-Mao Chen and Michael Spencer.

llvm-svn: 349764
2018-12-20 14:52:15 +00:00
Simon Pilgrim
09c081176a [X86][AVX512] Don't custom lower v16i8 rotations.
As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.

llvm-svn: 349763
2018-12-20 14:38:35 +00:00
Ulrich Weigand
380bece7af [SystemZ] "Generic" vector assembler instructions shoud clobber CC
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.

For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument.  But we also have a "generic" version of the instruction
that is used for the assembler/disassembler.  These generic versions
should always be considered to clobber CC just to be safe.

llvm-svn: 349761
2018-12-20 14:24:17 +00:00
Ulrich Weigand
44d37ae38c [SystemZ] Make better use of VLLEZ
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:

- For the floating-point versions, we have ISel patterns that match
  on a bitconvert as the top node.  In more complex cases, that
  bitconvert may already have been merged into something else.
  Fix the patterns to match the inner nodes instead.

- For the 64-bit integer versions, depending on the surrounding code,
  we may get either a DAG tree based on JOIN_DWORDS or one based on
  INSERT_VECTOR_ELT.  Use a PatFrags to simply match both variants.

llvm-svn: 349749
2018-12-20 13:05:03 +00:00
Ulrich Weigand
8bb46b0f01 [SystemZ] Make better use of VGEF/VGEG
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use.  This
(erronously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily.  Fixed by this patch.

llvm-svn: 349748
2018-12-20 13:01:20 +00:00
Clement Courbet
36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Ulrich Weigand
f43b510015 [SystemZ] Make better use of VLDEB
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we an use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLEDB (vector floating-point extends) has been
missing so far.  This patch adds corresponding DAG combine
support for FP_EXTEND.

llvm-svn: 349746
2018-12-20 12:59:05 +00:00
Clement Courbet
e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet
1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Kang Zhang
ca8db48974 [PowerPC] Implement the isSelectSupported() target hook
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition 
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.

Reviewed By: steven.zhang,  hfinkel

Differential Revision: https://reviews.llvm.org/D55754

llvm-svn: 349727
2018-12-20 06:19:59 +00:00
Thomas Lively
feb18fe927 [WebAssembly] Emit a splat for v128 IMPLICIT_DEF
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55910

llvm-svn: 349724
2018-12-20 04:20:32 +00:00
Amara Emerson
321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Thomas Lively
8dbf29af95 [WebAssembly] Gate unimplemented SIMD ops on flag
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55904

llvm-svn: 349720
2018-12-20 02:10:22 +00:00
Matt Arsenault
4339883710 AMDGPU: Make i1/i64/v2i32 and/or/xor legal
The 64-bit types do depend on the register bank,
but that's another issue to deal with later.

llvm-svn: 349716
2018-12-20 01:35:49 +00:00
Matt Arsenault
8cc98bee8a AMDGPU/GlobalISel: Fix ValueMapping tables for i1
This was incorrectly selecting SGPR for any i1 values,
e.g. G_TRUNC to i1 from a VGPR was still an SGPR.

llvm-svn: 349715
2018-12-20 01:33:43 +00:00
Craig Topper
9ca2f5605e [X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization.
Generic legalization should take care of this.

llvm-svn: 349714
2018-12-20 01:32:06 +00:00
Amara Emerson
8cb186ce17 [AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Matt Arsenault
dff33c38e1 AMDGPU/GlobalISel: RegBankSelect for fp conversions
llvm-svn: 349709
2018-12-20 00:37:02 +00:00
Matt Arsenault
36d4092173 AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
llvm-svn: 349708
2018-12-20 00:33:49 +00:00
Craig Topper
217b3b20d8 [X86] Remove TLI variable from ReplaceNodeResults. NFC
We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly.

llvm-svn: 349695
2018-12-19 23:13:03 +00:00
Rhys Perry
3931ad38b9 AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts
Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55058

llvm-svn: 349694
2018-12-19 22:53:33 +00:00
Evandro Menezes
374ccf6768 [AArch64] Improve Exynos predicates
Expand the predicate `ExynosResetPred` to include all forms of immediate
moves.

llvm-svn: 349686
2018-12-19 22:24:36 +00:00
Evandro Menezes
ff827d737a [AArch64] Use canonical copy idiom
Use only the canonical form of the alias for register transfers in the
`IsCopyIdiomPred` predicate.

llvm-svn: 349685
2018-12-19 22:24:31 +00:00
Craig Topper
d16da2b479 [X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC
This reduces indentation and makes it obvious this function always returns something.

llvm-svn: 349671
2018-12-19 19:39:34 +00:00
Jessica Paquette
3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Craig Topper
84a00bd98a [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.

This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.

Differential Revision: https://reviews.llvm.org/D55870

llvm-svn: 349661
2018-12-19 18:49:13 +00:00
Craig Topper
291470347a [X86] Fix assert fails in pass X86AvoidSFBPass
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743

The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.

This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D55642

llvm-svn: 349660
2018-12-19 18:45:57 +00:00
Evandro Menezes
5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Yonghong Song
7b410ac352 [BPF] Generate BTF DebugInfo under BPF target
This patch implements BTF (BPF Type Format).
The BTF is the debug info format for BPF, introduced
in the below linux patch:
  69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
and further extended several times, e.g.,
  https://www.spinics.net/lists/netdev/msg534640.html
  https://www.spinics.net/lists/netdev/msg538464.html
  https://www.spinics.net/lists/netdev/msg540246.html

The main advantage of implementing in LLVM is:
   . better integration/deployment as no extra tools are needed.
   . bpf JIT based compilation (like bcc, bpftrace, etc.) can get
     BTF without much extra effort.
   . BTF line_info needs selective source codes, which can be
     easily retrieved when inside the compiler.

This patch implemented BTF generation by registering a BPF
specific DebugHandler in BPFAsmPrinter.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D55752

llvm-svn: 349640
2018-12-19 16:40:25 +00:00
Nicolai Haehnle
8d5e974076 AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.

Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.

Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55369

llvm-svn: 349620
2018-12-19 11:55:03 +00:00
Carl Ritson
c521ac3a44 AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.

Irreducible loop test has been extended based on a CTS failure detected for GFX9.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D55602

llvm-svn: 349611
2018-12-19 10:17:49 +00:00
Diana Picus
6c35a1e5af [ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.

This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.

llvm-svn: 349610
2018-12-19 09:55:10 +00:00
Matt Arsenault
b110e2277c AMDGPU/GlobalISel: Regbankselect for fsub
llvm-svn: 349608
2018-12-19 09:07:58 +00:00
Kewen Lin
a6247e7cf4 [PowerPC]Exploit P9 vabsdu for unsigned vselect patterns
For type v4i32/v8ii16/v16i8, do following transforms:
  (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
  (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)

Differential Revision: https://reviews.llvm.org/D55812

llvm-svn: 349599
2018-12-19 03:04:07 +00:00
Evandro Menezes
f03c45d582 [AArch64] Simplify the Exynos M3 pipeline model
llvm-svn: 349569
2018-12-18 23:19:57 +00:00
Evandro Menezes
4e39fa4474 [AArch64] Fix instructions order (NFC)
llvm-svn: 349568
2018-12-18 23:19:55 +00:00
Craig Topper
18a9d545e1 [X86] Add BSR to isUseDefConvertible.
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.

This addresses one issue from PR40090.

llvm-svn: 349531
2018-12-18 20:03:54 +00:00
Farhana Aleen
59ee2c5362 [AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset().
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
         AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert.

Author: FarhanaAleen
llvm-svn: 349529
2018-12-18 19:58:39 +00:00
Craig Topper
8434ef7d1e [X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it.
Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us.

llvm-svn: 349526
2018-12-18 19:29:08 +00:00
Nikita Popov
f6058ff140 [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.

This is a followup to D55787 and part of PR40056.

Differential Revision: https://reviews.llvm.org/D55833

llvm-svn: 349520
2018-12-18 18:28:22 +00:00
Craig Topper
20a6db5a84 [X86] Create PSUBUS from (add (umax X, C), -C)
InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.

This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.

Fixes some of PR40053

Differential Revision: https://reviews.llvm.org/D55780

llvm-svn: 349519
2018-12-18 18:26:25 +00:00
Simon Pilgrim
1411917431 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349510
2018-12-18 17:31:11 +00:00
Simon Pilgrim
e9effe9744 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349500
2018-12-18 16:02:23 +00:00
Petar Avramovic
0a5e4eb776 [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.

Differential Revision: https://reviews.llvm.org/D55651

llvm-svn: 349499
2018-12-18 15:59:51 +00:00
Nikita Popov
665ab08178 [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.

This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.

Differential Revision: https://reviews.llvm.org/D55787

llvm-svn: 349481
2018-12-18 13:23:03 +00:00
Petar Avramovic
150fd430f6 [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND G_OR and G_XOR for types other then s32 
with clampScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D55362

llvm-svn: 349475
2018-12-18 11:36:14 +00:00
Luke Cheeseman
f57d7d8237 [AArch64] - Return address signing dwarf support
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
  do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472
2018-12-18 10:37:42 +00:00
Matt Arsenault
c94e26c71d AMDGPU: Legalize/regbankselect frame_index
llvm-svn: 349468
2018-12-18 09:46:13 +00:00
Matt Arsenault
c0ea221068 AMDGPU: Legalize/regbankselect fma
llvm-svn: 349467
2018-12-18 09:39:56 +00:00
Matt Arsenault
e01e7c81f2 AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub
llvm-svn: 349463
2018-12-18 09:19:03 +00:00
Simon Pilgrim
8488a44c34 [X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits
(VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1

This works better as part of SimplifyDemandedBits than part of the general combine.

llvm-svn: 349462
2018-12-18 09:11:34 +00:00
Simon Pilgrim
26c630f416 [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold.
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same).

Test change is merely a rescheduling.

llvm-svn: 349459
2018-12-18 08:55:47 +00:00
Kristof Beyls
e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
  in registers.
- using intrinsics to mask of data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
  selectors to not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Martin Storsjo
8f0cb9c3a8 [AArch64] [MinGW] Allow enabling SEH exceptions
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451
2018-12-18 08:32:37 +00:00
Kewen Lin
44ace92596 [PowerPC] Exploit power9 new instruction setb
Check the expected pattens feeding to SELECT_CC like:
   (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
   (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
Further transform the sequence to comparison + setb if hits.

Differential Revision: https://reviews.llvm.org/D53275

llvm-svn: 349445
2018-12-18 07:53:26 +00:00
Craig Topper
1ff7356f96 [X86] Const correct some helper functions X86InstrInfo.cpp. NFC
llvm-svn: 349440
2018-12-18 04:58:05 +00:00
Kewen Lin
3dac1252da [PowerPC] Improve vec_abs on P9
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.

Differential Revision: https://reviews.llvm.org/D54783

llvm-svn: 349437
2018-12-18 03:16:43 +00:00
Simon Pilgrim
7e2975a44c [X86][SSE] Improve immediate vector shift known bits handling.
Convert VSRAI to VSRLI is the sign bit is known zero and improve KnownBits output for all shift instruction.

Fixes the poor codegen comments in D55768.

llvm-svn: 349407
2018-12-17 22:09:47 +00:00
Wouter van Oortmerssen
d3c544aa6e [WebAssembly] Fix assembler parsing of br_table.
Summary:
We use `variable_ops` in the tablegen defs to denote the list of
branch targets in `br_table`, but unlike other uses of `variable_ops`
(e.g. call) the these branch targets need to actually be encoded in the
instruction. The existing tables for `variable_ops` cause not operands
to be accepted by the assembly matcher.

Following the example of ARM:
2cc0a7da87/lib/Target/ARM/ARMInstrInfo.td (L550-L555)
we introduce a new operand type to capture this list, and we use the
same {} syntax as ARM as well to differentiate them from regular
integer operands.

Also removed definition and use of TSFlags in tablegen defs, since
`br_table` now has a non-variable_ops immediate operand, so the
previous logic of only the variable_ops arguments being labels didn't
make sense anymore.

Reviewers: dschuff, aheejin, sunfish

Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55401

llvm-svn: 349405
2018-12-17 22:04:44 +00:00
Craig Topper
8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Simon Pilgrim
6b5e0b7b2b [X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling.
First step towards adding more capable combines to fix comments in D55768.

llvm-svn: 349400
2018-12-17 21:36:17 +00:00
Craig Topper
728cbc0378 Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used.
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.

This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.

This appears to fix the case in PR39968.

llvm-svn: 349385
2018-12-17 20:02:16 +00:00
Simon Pilgrim
9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Tim Northover
256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Petar Avramovic
f9c9bc09ab [MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC)
Temporarily remove switch statement without any case labels 
in function legalizeCustom in order to fix r349346 for MSVC.

llvm-svn: 349356
2018-12-17 15:12:53 +00:00
Tim Northover
ae3b66b7b0 ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

llvm-svn: 349355
2018-12-17 15:05:32 +00:00
Petar Avramovic
b8276f2280 [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD
Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32.

Differential Revision: https://reviews.llvm.org/D54580

llvm-svn: 349346
2018-12-17 12:31:07 +00:00
Alexandros Lamprineas
490ae11717 [AArch64] Re-run load/store optimizer after aggressive tail duplication
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.

llvm-svn: 349338
2018-12-17 10:45:43 +00:00
Craig Topper
fa4907d671 [X86] Fix bad operand lookup for cmov introduced in r349315
The CC is operand 2 not operand 3.

llvm-svn: 349330
2018-12-17 06:40:35 +00:00
Simon Pilgrim
d0c9e43b1c [X86] Pull out constant splat rotation detection.
We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts.

llvm-svn: 349319
2018-12-16 19:46:04 +00:00
Craig Topper
10f8892837 [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine.
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.

The test-shrink-bug.ll changie is an improvement because we are no longer interfering with test shrink handling in isel.

The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.

llvm-svn: 349315
2018-12-16 18:35:55 +00:00
Sanjay Patel
13ac2f15b0 [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859)
This is part of fixing PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859

We have a crippled vector ISA, so we have to invert a typical fold and create min/max here.

As discussed in the bug report, we can probably do better by using saturating subtract when 
it's available, but we should have this improvement for the min/max patterns regardless.

Alive proofs:
https://rise4fun.com/Alive/zsf
https://rise4fun.com/Alive/Qrl

Differential Revision: https://reviews.llvm.org/D55515

llvm-svn: 349304
2018-12-16 15:05:48 +00:00
Simon Pilgrim
52c982406e [X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI.
In preparation for converting to funnel shifts.

llvm-svn: 349286
2018-12-15 21:11:49 +00:00
Simon Pilgrim
ef7b5949e5 [X86] Lower to SHLD/SHRD on slow machines for optsize
Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard......

llvm-svn: 349285
2018-12-15 19:43:44 +00:00
Simon Pilgrim
9831d4058c Fix -Wunused-variable warning. NFCI.
llvm-svn: 349265
2018-12-15 12:25:22 +00:00
Florian Hahn
abe32c9125 [SILoadStoreOptimizer] Use std::abs to avoid truncation.
Using regular abs() causes the following warning

error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value]
        (uint32_t)abs(Dist) > MaxDist) {
                  ^
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead

which causes a bot to fail:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio

llvm-svn: 349224
2018-12-15 01:32:58 +00:00
Craig Topper
1fc257d97f [X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instruction that only modify the O flag to the waiver list.
The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF.

Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change.

llvm-svn: 349223
2018-12-15 01:07:19 +00:00
Craig Topper
5c304eac41 [X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI
hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction.

After this patch we now do a result number check in both and rely on the caller to provide the result number.

This shouldn't change behavior it was just an odd difference between the two functions that I noticed.

llvm-svn: 349222
2018-12-15 01:07:16 +00:00
Artem Belevich
6d74bd638a [NVPTX] Lower instructions that expand into libcalls.
The change is an effort to split and refactor abandoned
D34708 into smaller parts.

Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.

With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55145

llvm-svn: 349213
2018-12-14 23:53:06 +00:00
Krzysztof Parzyszek
26d994f56e [Hexagon] Add patterns for shifts of v2i16
This fixes https://llvm.org/PR39983.

llvm-svn: 349202
2018-12-14 22:33:48 +00:00
Krzysztof Parzyszek
c0fc0a9775 [Hexagon] Use IMPLICIT_DEF to any-extend 32-bit values to 64 bits
llvm-svn: 349199
2018-12-14 22:05:44 +00:00
Farhana Aleen
ce095c564a [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions.
Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions.
 E.g.
  s_movk_i32 s0, 0x1800
  v_add_co_u32_e32 v0, vcc, s0, v2
  v_addc_co_u32_e32 v1, vcc, 0, v6, vcc

  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[0:1], off
  =>
  s_movk_i32 s0, 0x1000
  v_add_co_u32_e32 v5, vcc, s0, v2
  v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
  global_load_dwordx2 v[5:6], v[5:6], off
  global_load_dwordx2 v[0:1], v[5:6], off offset:2048

Author: FarhanaAleen

Reviewed By: arsenm, rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D55539

llvm-svn: 349196
2018-12-14 21:13:14 +00:00
Evandro Menezes
ea9d90083f [AArch64] Simplify the scheduling predicates (NFC)
The instruction encodings make it unnecessary to distinguish extended W-form
from X-form instructions.

llvm-svn: 349185
2018-12-14 20:04:58 +00:00
Ehsan Amiri
de1742c284 NFC. Adding an empty line to test the updated commit credentials.
llvm-svn: 349158
2018-12-14 16:19:02 +00:00
Diana Picus
02c8343c75 [ARM GlobalISel] Thumb2: casts between int and ptr
Mark as legal and add tests. Nothing special to do.

llvm-svn: 349147
2018-12-14 13:45:38 +00:00
Diana Picus
813af0d283 [ARM GlobalISel] Minor refactoring. NFCI
Refactor the ARMInstructionSelector to cache some opcodes in the
constructor instead of checking all the time if we're in ARM or Thumb
mode.

llvm-svn: 349143
2018-12-14 12:37:24 +00:00
Diana Picus
14dc3b2959 [ARM GlobalISel] Allow simple binary ops in Thumb2
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.

Extract the legalizer tests for these opcodes into another file.

Add tests for the instruction selector.

llvm-svn: 349142
2018-12-14 11:58:14 +00:00
Craig Topper
257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Craig Topper
178abc59ac [X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp.
This requires the two callers to manifest a 0 to make EmitCmp call EmitTest.

I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely.

llvm-svn: 349094
2018-12-13 23:55:30 +00:00
Evandro Menezes
6fe51ac973 [AArch64] Fix Exynos predicates (NFC)
Fix the logic in the definition of the `ExynosShiftExPred` as a more
specific version of `ExynosShiftPred`.  But, since `ExynosShiftExPred` is
not used yet, this change has NFC.

llvm-svn: 349091
2018-12-13 23:19:46 +00:00
Aakanksha Patil
bc568766b2 Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute
This patch breaks RADV (and probably RadeonSI as well)

llvm-svn: 349084
2018-12-13 21:23:12 +00:00
Matt Arsenault
934e534c47 AMDGPU/GlobalISel: Legalize/regbankselect block_addr
llvm-svn: 349081
2018-12-13 20:34:15 +00:00
Mircea Trofin
41c729e78e [llvm] Address base discriminator overflow in X86DiscriminateMemOps
Summary:
Macros are expanded on a single line. In case of large expansions,
with sufficiently many instructions with memory operands (and when
-fdebug-info-for-profiling is requested), we may be unable to generate
new base discriminator values - new values overflow (base
discriminators may not be larger than 2^12).

This CL warns instead of asserting in such a case. A subsequent CL
will add APIs to check for overflow before creating new debug info.

See https://bugs.llvm.org/show_bug.cgi?id=39890

Reviewers: davidxl, wmi, gbedwell

Reviewed By: davidxl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55643

llvm-svn: 349075
2018-12-13 19:40:59 +00:00
Simon Pilgrim
b5aaa673c6 [X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 349057
2018-12-13 16:39:29 +00:00
Simon Pilgrim
b0b2f1503a [X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243)
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.

llvm-svn: 349052
2018-12-13 15:50:31 +00:00
Daniel Cederman
77611426e1 [Sparc] Add membar assembler tags
Summary: The Sparc V9 membar instruction can enforce different types of
memory orderings depending on the value in its immediate field.  In the
architectural manual the type is selected by combining different assembler
tags into a mask. This patch adds support for these tags.

Reviewers: jyknight, venkatra, brad

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D53491

llvm-svn: 349048
2018-12-13 15:29:12 +00:00
Simon Pilgrim
ba91ff4a86 [X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)
llvm-svn: 349047
2018-12-13 15:23:09 +00:00
Daniel Cederman
b5d284408e [Sparc] Use float register for integer constrained with "f" in inline asm
Summary:
Constraining an integer value to a floating point register using "f"
causes an llvm_unreachable to trigger. This patch allows i32 integers
to be placed in a single precision float register and i64 integers to
be placed in a double precision float register. This matches the behavior
of GCC.

For other types the llvm_unreachable is removed to instead trigger an
error message that points out the offending line.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D51614

llvm-svn: 349045
2018-12-13 15:13:29 +00:00
Jinsong Ji
c7b43b94ce [PowerPC][NFC] Sorting out Pseudo related classes to avoid confusion
There are several Pseudo in PowerPC backend. 
eg:

* ISel Pseudo-instructions , which has let usesCustomInserter=1 in td 
ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them.
* Post-RA pseudo instruction, which has let isPseudo = 1 in td, or Standard pseudo (SUBREG_TO_REG,COPY etc.) 
ExpandPostRAPseudos -> expandPostRAPseudo will expand them
* Multi-instruction pseudo operations will expand them PPCAsmPrinter::EmitInstruction
* Pseudo instruction in CodeEmitter, which has encoding of 0.

Currently, in td files, especially PPCInstrVSX.td, 
we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly.

This patch is to

* Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter
* Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo
* Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo

Differential Revision: https://reviews.llvm.org/D55143

llvm-svn: 349044
2018-12-13 15:12:57 +00:00
Simon Pilgrim
7c84f7ae3a [X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI.
Merged the repeated code into a single if().

llvm-svn: 349040
2018-12-13 14:51:28 +00:00
Jonas Paulsson
e79b1b986d [SystemZ] Pass copy-hinted regs first from getRegAllocationHints().
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.

Review: Ulrich Weigand.
llvm-svn: 349037
2018-12-13 14:37:05 +00:00
Simon Pilgrim
320fd7383f [X86][BWI] Don't custom lower vXi8 rotations.
We always expand to shifts anyhow - test changes are just different scheduling only.

llvm-svn: 349034
2018-12-13 13:44:33 +00:00
Chen Zheng
9c6fa536e0 [PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier.
Differential Revision: https://reviews.llvm.org/D55499

llvm-svn: 349029
2018-12-13 12:25:20 +00:00
Simon Pilgrim
ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Diana Picus
99cd644b6c [ARM GlobalISel] Support exts and truncs for Thumb2
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.

llvm-svn: 349026
2018-12-13 12:06:54 +00:00
Simon Pilgrim
77fc551d1a [TargetLowering] Add ISD::ROTL/ROTR vector expansion
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.

Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.

Begun removing x86 vector rotate custom lowering to use the expansion.

llvm-svn: 349025
2018-12-13 11:20:48 +00:00
Alex Bradbury
919f5fb8ca [RISCV] Add support for the various RISC-V FMA instruction variants
Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd).

The criteria for choosing whether a fused add or subtract is used, as well as
whether the product is negated or not, is whether some of the arguments to the
llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd
instructions were added to avoid the negation being performed using a xor
trick, which prevented the proper FMA forms from being selected and thus
tested.

The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 -
rs3), but they should be correct. The misleading names were inherited from
MIPS, where the negation happens after computing the sum.

The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions,
as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd.

Some comments in the test files about what type of instructions are there
tested were updated, to better reflect the current content of those test
files.

Differential Revision: https://reviews.llvm.org/D54205
Patch by Luís Marques.

llvm-svn: 349023
2018-12-13 10:49:05 +00:00
Arnaud A. de Grandmaison
dfe861087d [AArch64] Catch some more CMN opportunities.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

llvm-svn: 349022
2018-12-13 10:31:32 +00:00
Matt Arsenault
577b9fc543 AMDGPU/GlobalISel: Legalize f64 fadd/fmul
llvm-svn: 349014
2018-12-13 08:27:48 +00:00
Matt Arsenault
f38f483bef AMDGPU/GlobalISel: RegBankSelect some simple operations
llvm-svn: 349012
2018-12-13 08:23:51 +00:00
Craig Topper
a048d58de7 [X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC
llvm-svn: 349007
2018-12-13 06:14:25 +00:00
Stanislav Mekhanoshin
d933c2ced7 [AMDGPU] Fix build failure, second attempt
Some compilers complain that variable is captured and some
complain when it is not. Switch to [&].

llvm-svn: 349006
2018-12-13 05:52:11 +00:00
Stanislav Mekhanoshin
5225746e03 [AMDGPU] Fix build failure
Fixed error 'lambda capture 'CondReg' is not required to be captured
for this use'.

llvm-svn: 349005
2018-12-13 05:21:25 +00:00
Stanislav Mekhanoshin
6071e1aa58 [AMDGPU] Simplify negated condition
Optimize sequence:

  %sel = V_CNDMASK_B32_e64 0, 1, %cc
  %cmp = V_CMP_NE_U32 1, %1
  $vcc = S_AND_B64 $exec, %cmp
  S_CBRANCH_VCC[N]Z
=>
  $vcc = S_ANDN2_B64 $exec, %cc
  S_CBRANCH_VCC[N]Z

It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the
rebuildSetCC().

Differential Revision: https://reviews.llvm.org/D55402

llvm-svn: 349003
2018-12-13 03:17:40 +00:00
Craig Topper
d1c61861dd [X86] Don't emit MULX by default with BMI2
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.

Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.

Differential Revision: https://reviews.llvm.org/D55565

llvm-svn: 348975
2018-12-12 21:21:31 +00:00
Aakanksha Patil
729309cc89 [AMDGPU] Support for "uniform-work-group-size" attribute
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute. 

Differential Revision: https://reviews.llvm.org/D50200

llvm-svn: 348971
2018-12-12 20:49:17 +00:00
Scott Linder
f5b36e56fb [AMDGPU] Emit MessagePack HSA Metadata for v3 code object
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.

Differential Revision: https://reviews.llvm.org/D48179

llvm-svn: 348963
2018-12-12 19:39:27 +00:00
Craig Topper
4937adf75f [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input.
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.

I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.

Differential Revision: https://reviews.llvm.org/D55414

llvm-svn: 348959
2018-12-12 19:20:21 +00:00
Simon Pilgrim
eb508f8ccb [SelectionDAG] Add a generic isSplatValue function
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.

It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.

A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).

I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.

Differential Revision: https://reviews.llvm.org/D55426

llvm-svn: 348953
2018-12-12 18:32:29 +00:00
Artem Belevich
f802b9324a [NVPTX] do not rely on cached subtarget info.
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.

Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.

Differential Revision: https://reviews.llvm.org/D55580

llvm-svn: 348952
2018-12-12 18:31:04 +00:00
Sanjay Patel
44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Neil Henning
76504a4c5e [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

llvm-svn: 348937
2018-12-12 16:15:21 +00:00
Simon Atanasyan
fa020082e4 [mips] Enable using of integrated assembler in all cases.
llvm-svn: 348934
2018-12-12 15:32:03 +00:00
Piotr Sobczak
3732b4ce25 [AMDGPU] Set metadata access for explicit section
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.

Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye

Reviewed By: dstuttard, kzhuravl

Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye

Differential Revision: https://reviews.llvm.org/D55267

llvm-svn: 348922
2018-12-12 11:20:04 +00:00
Diana Picus
59720b422a [ARM GlobalISel] Select load/store for Thumb2
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.

Also mark them as legal and extract their legalizer test cases to a new
test file.

llvm-svn: 348920
2018-12-12 10:32:15 +00:00
Jonas Paulsson
896775c2d3 [SystemZ] Minor cleanup of SchedModels
Some fixes of a few InstRWs for z13 and z14.

Review: Ulrich Weigand
llvm-svn: 348917
2018-12-12 08:26:24 +00:00
Craig Topper
1fe466689b [X86] Combine vpmovdw+vpacksswb into vpmovdb.
This is similar to the combine we already have for vpmovdw+vpackuswb.

llvm-svn: 348910
2018-12-12 05:56:01 +00:00