Commit Graph

50329 Commits

Author SHA1 Message Date
Mandeep Singh Grang
802dc40f41 [COFF, ARM64] Emit COFF function header
Summary:
Emit COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.

This patch mimics X86 and ARM COFF behavior for function header emission.

Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric

Reviewed By: mstorsjo

Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55535

llvm-svn: 348875
2018-12-11 18:36:14 +00:00
Craig Topper
b51283bfd7 Fix not correct imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare
Summary:
When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand,
  %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
  JNE_1 %bb.41, implicit $eflags

so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this

Patch by Jianping Chen

Reviewers: smaslov, LuoYuanke, liutianle, Jianping

Reviewed By: Jianping

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D54250

llvm-svn: 348853
2018-12-11 15:32:14 +00:00
Sanjay Patel
05e36982dd [x86] clean up code for converting 16-bit ops to LEA; NFC
As discussed in D55494, we want to extend this to handle 8-bit
ops too, but that could be extended further to enable this on
32-bit systems too.

llvm-svn: 348851
2018-12-11 15:29:40 +00:00
Sanjay Patel
9765ba5f86 [x86] remove dead code for 16-bit LEA formation; NFC
As discussed in:
D55494
...this code has been disabled/dead for a long time (the code references
Athlon and Pentium 4), and there's almost no chance that it will be used 
given the last decade of uarch evolution. Also, in SDAG we promote 16-bit 
ops to 32-bit, so there's almost no way to test this code any more.

llvm-svn: 348845
2018-12-11 14:05:03 +00:00
Martell Malone
0b3ddec7ed [PPC][NFC] store operands are dst not src
Differential Revision: https://reviews.llvm.org/D55502

llvm-svn: 348826
2018-12-11 03:14:56 +00:00
Heejin Ahn
be5e5874f6 [WebAssembly] Add '.eventtype' directive support
Summary:
This patch supports `.eventtype` directive printing and parsing in the
same syntax with `.functype`.

Reviewers: aardappel, sbc100

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55353

llvm-svn: 348818
2018-12-11 01:11:04 +00:00
Heejin Ahn
21d45a2c98 [WebAssembly] TargetStreamer cleanup (NFC)
Summary:
- Unify mixed argument names (`Symbol` and `Sym`) to `Sym`
- Changed `MCSymbolWasm*` argument of `emit***` functions to `const
  MCSymbolWasm*`. It seems not very intuitive that emit function in the
  streamer modifies symbol contents.
- Moved empty function bodies to the header
- clang-format

Reviewers: aardappel, dschuff, sbc100

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55347

llvm-svn: 348816
2018-12-11 00:53:59 +00:00
Aditya Nandakumar
cef44a2342 [GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs
https://reviews.llvm.org/D55294

Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as

B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to

buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like

B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.

Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.

llvm-svn: 348815
2018-12-11 00:48:50 +00:00
Krzysztof Parzyszek
9f003f9262 [Hexagon] Couple of fixes in optimize addressing mode
- Check if an operand is an immediate before calling getImm. Some operands
  that take constant values can actually have global symbols or other
  constant expressions.
- When a load-constant instruction can be folded into users, make sure to
  only delete it when all users have been successfully converted.

llvm-svn: 348802
2018-12-10 21:56:04 +00:00
Krzysztof Parzyszek
c1b2d5905a Revert "[Hexagon] Check if operand is an immediate before getImm"
This reverts r348787. The patch wasn't quite correct.

llvm-svn: 348792
2018-12-10 19:30:08 +00:00
Amara Emerson
5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Krzysztof Parzyszek
c6e9380a56 [Hexagon] Check if operand is an immediate before getImm
llvm-svn: 348787
2018-12-10 18:39:47 +00:00
Krzysztof Parzyszek
914f2d1c46 [Hexagon] Add patterns for any_extend from i1 and short vectors of i1
llvm-svn: 348785
2018-12-10 18:36:06 +00:00
Sanjay Patel
134f56e702 [x86] fix formatting; NFC
This should really be generalized to allow increment and/or
we should replace it by using ISD::matchUnaryPredicate().
See D55515 for context.

llvm-svn: 348776
2018-12-10 17:23:44 +00:00
Evandro Menezes
53f0d41dc4 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Neil Henning
e448351b77 [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D.
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.

Differential Revision: https://reviews.llvm.org/D55367

llvm-svn: 348771
2018-12-10 16:35:53 +00:00
Evandro Menezes
1ec1a0d342 [AArch64] Refactor the scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  Augment the
number of helper predicates used by processor specific predicates.

Differential revision: https://reviews.llvm.org/D55375

llvm-svn: 348768
2018-12-10 16:24:30 +00:00
Tim Corringham
2faadb15f4 [AMDGPU] Add new Mode Register pass - minor fix
Trivial change to add parentheses to an expression to avoid a
sanitizer error in SIModeRegister.cpp, which was committed earlier.

llvm-svn: 348767
2018-12-10 16:23:30 +00:00
Cameron McInally
872ed41a1e [AVX512] Update typo in comment
Should be "Sae" for "Suppress All Exceptions".

NFC

llvm-svn: 348763
2018-12-10 15:21:35 +00:00
Vladimir Stefanovic
4433f93afe [mips][mc] Emit R_{MICRO}MIPS_JALR when expanding jal to jalr
When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR
for micromips). The linker might then be able to turn jalr into a direct
call.
Add '-mips-jalr-reloc' to enable/disable this feature (default is true).

Differential revision: https://reviews.llvm.org/D55292

llvm-svn: 348760
2018-12-10 15:07:36 +00:00
Tim Corringham
4c4d2fe280 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

llvm-svn: 348754
2018-12-10 12:06:10 +00:00
Nikita Popov
e79477895e [X86] Fix AvoidStoreForwardingBlocks pass for negative displacements
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.

The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.

Differential Revision: https://reviews.llvm.org/D55485

llvm-svn: 348750
2018-12-10 10:16:50 +00:00
Craig Topper
02b614abc8 [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic.
Both intrinsics do the exact same thing so we really only need one.

Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.

llvm-svn: 348737
2018-12-10 06:07:50 +00:00
Brian Gesiak
b963c5150d [AMDGPU] Fix discarded result of addAttribute
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.

However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.

Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.

Reviewers: rampitec, vpykhtin, rnk

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55435

llvm-svn: 348732
2018-12-09 21:56:50 +00:00
Craig Topper
2b09d17d93 [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB.
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know its 0 then we don't need to bother.

This should go a long way towards fixing PR24545.

llvm-svn: 348727
2018-12-09 18:02:37 +00:00
Nico Weber
b961661977 Remove unneeded dependency from lib/Target/X86/Utils/ to lib/IR (aka Core).
The dependency was added in r213995 in response to r213986 which did make
X86/Utils depend on IR, but r256680 later removed that dependency again.

llvm-svn: 348724
2018-12-09 15:15:13 +00:00
Sanjay Patel
19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalIsel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Simon Pilgrim
e9d8275e43 [X86] Extend pfm counter coverage for llvm-exegesis
Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any').

llvm-svn: 348721
2018-12-09 13:45:15 +00:00
Matt Arsenault
b5613ecf17 AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments
We were still using the rounded down offset and alignment even though
they aren't handled because you can't trivially bitcast the loaded
value.

llvm-svn: 348658
2018-12-07 22:12:17 +00:00
Krzysztof Parzyszek
b754f7a2e0 [Hexagon] Fix post-ra expansion of PS_wselect
llvm-svn: 348655
2018-12-07 22:00:53 +00:00
Simon Pilgrim
44dfd81d01 Fix unused variable warning. NFCI.
llvm-svn: 348649
2018-12-07 21:44:25 +00:00
Heejin Ahn
7ce5edf1ea [WebAssembly] clang-format/clang-tidy AsmParser (NFC)
Summary:
- LLVM clang-format style doesn't allow one-line ifs.
- LLVM clang-tidy style says method names should start with a lowercase
  letter. But currently WebAssemblyAsmParser's parent class
  MCTargetAsmParser is mixing lowercase and uppercase method names
  itself so overridden methods cannot be renamed now.
- Changed else ifs after returns to ifs.
- Added some newlines for readability.

Reviewers: aardappel, sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55350

llvm-svn: 348648
2018-12-07 21:35:37 +00:00
Heejin Ahn
d2fd70991d Delete registerScope function
`unregisterScope()` is not currently used, so removing it.

llvm-svn: 348647
2018-12-07 21:31:14 +00:00
Simon Pilgrim
9b8fdab26c [X86] Replace instregex with instrs list. NFCI.
llvm-svn: 348626
2018-12-07 18:47:05 +00:00
Matt Arsenault
ce2e053134 AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load
llvm-svn: 348625
2018-12-07 18:41:39 +00:00
Craig Topper
ba3ab78291 [X86] Initialize and Register X86CondBrFoldingPass
To make X86CondBrFoldingPass can be run with --run-pass option, this can test one wrong assertion on analyzeCompare function for SUB32ri when its operand is not imm

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D55412

llvm-svn: 348620
2018-12-07 18:10:34 +00:00
Matt Arsenault
ca8eb0b672 AMDGPU: Remove llvm.SI.tbuffer.store
llvm-svn: 348619
2018-12-07 18:03:47 +00:00
Simon Pilgrim
6155b32250 [X86] Improve pfm counter coverage for llvm-exegesis
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports.

Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code).

The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings.

These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs.

Differential Revision: https://reviews.llvm.org/D55432

llvm-svn: 348617
2018-12-07 17:48:40 +00:00
Matt Arsenault
3ff764a944 AMDGPU: Remove llvm.SI.buffer.load.dword
llvm-svn: 348616
2018-12-07 17:46:20 +00:00
Matt Arsenault
aa9bcd56b1 AMDGPU: Remove llvm.AMDGPU.kill
This is the last of the old AMDGPU intrinsics.

llvm-svn: 348615
2018-12-07 17:46:16 +00:00
Graham Sellers
b297379ef0 [AMDGPU] Shrink scalar AND, OR, XOR instructions
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.

It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x

In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).

llvm-svn: 348601
2018-12-07 15:33:21 +00:00
Tim Northover
4bf394be3a ARM: use correct offset from base pointer (r6) in call frame regions.
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.

llvm-svn: 348591
2018-12-07 13:43:55 +00:00
David Green
ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Simon Pilgrim
74c371da7b Fix gcc7.3 -Wparentheses warning. NFCI.
llvm-svn: 348581
2018-12-07 11:10:03 +00:00
Simon Pilgrim
9c7d85bc62 [X86] Add ivybridge to llvm-exegesis PFM counter mappings
llvm-svn: 348575
2018-12-07 09:27:35 +00:00
Zi Xuan Wu
cf4d477b0b [PowerPC] Fix assert from machine verify pass that missing undef register flag
Fix assert about using an undefined physical register in machine instruction verify pass. 
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.

```
Bad machine code: Using an undefined physical register 
- function:    func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0:   killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```

There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.

Differential Revision: https://reviews.llvm.org/D55408

llvm-svn: 348566
2018-12-07 05:25:16 +00:00
Craig Topper
2c7a9476e0 [X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1)
This addresses a FIXME and avoids depending on an isel pattern match I think. I've remove the isel patterns too since he have no lit tests left that cover them. Hopefully that really means they are unused.

I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.

Differential Revision: https://reviews.llvm.org/D55355

llvm-svn: 348536
2018-12-06 22:26:59 +00:00
Evandro Menezes
799b76eae2 [AArch64] Fix Exynos predicate
Fix predicate for arithmetic instructions with shift and/or extend.

llvm-svn: 348510
2018-12-06 18:25:37 +00:00
Simon Pilgrim
bb650daeaf [X86] Refactored IsSplatVector to use switch. NFCI.
Initial step towards making the function more generic (and probably move into SelectionDAG).

This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).

llvm-svn: 348498
2018-12-06 16:29:14 +00:00
Alexey Bataev
2e1a782189 [DEBUGINFO, NVPTX] Disable emission of ',debug' option if only debug directives are allowed.
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46061

llvm-svn: 348497
2018-12-06 16:25:35 +00:00
Alexey Bataev
64ad0ad5ed [DEBUGINFO, NVPTX]Emit last debugging directives.
Summary:
We may end up with not emitted debug directives at the end of the module
emission. Patch fixes this problem emitting those last directives the
end of the module emission.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D54320

llvm-svn: 348495
2018-12-06 16:02:09 +00:00
Diogo N. Sampaio
9c9067316b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html


Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

--

It was reverted in rL348249 due a	build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.

llvm-svn: 348493
2018-12-06 15:39:17 +00:00
Nicolai Haehnle
ca4a32945f AMDGPU: Generate VALU ThreeOp Integer instructions
Summary:
Original patch by: Fabian Wahlster <razor@singul4rity.com>

Change-Id: I148f692a88432541fad468963f58da9ddf79fac5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, b-sumner, llvm-commits

Differential Revision: https://reviews.llvm.org/D51995

llvm-svn: 348488
2018-12-06 14:33:40 +00:00
Valery Pykhtin
f479fbba5f [AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default
Turn the combiner back off as there're failures until the issue is fixed.

Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348487
2018-12-06 14:20:02 +00:00
Diana Picus
1027249ec9 [ARM GlobalISel] Nothing is legal for Thumb
...yet!

A lot of the current code should be shared for arm and thumb mode, but
until we add tests and work out some of the details (e.g. checking the
correct subtarget feature for G_SDIV) it's safer to bail out as early as
possible for thumb targets.

This should have arguably been part of r348347, which allowed Thumb
functions to be handled by the IR Translator.

llvm-svn: 348472
2018-12-06 09:26:14 +00:00
Craig Topper
6a6d77b851 [X86] Remove some leftover code for handling an i1 setcc type. NFC
We should only need to handle i8 now.

llvm-svn: 348460
2018-12-06 07:00:02 +00:00
Matthias Braun
d041212c07 AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.

Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.

This fixes http://llvm.org/PR39550

Differential Revision: https://reviews.llvm.org/D54137

llvm-svn: 348444
2018-12-06 01:40:23 +00:00
Krzysztof Parzyszek
8eb394d764 [Hexagon] Add intrinsics for Hexagon V66
llvm-svn: 348413
2018-12-05 21:14:51 +00:00
Krzysztof Parzyszek
545a68ca4b [Hexagon] Add instruction definitions for Hexagon V66
llvm-svn: 348411
2018-12-05 21:01:07 +00:00
Krzysztof Parzyszek
13a9cf28a1 [Hexagon] Foundation of support for Hexagon V66
llvm-svn: 348407
2018-12-05 20:18:09 +00:00
Aditya Nandakumar
f75d4f329c [GISel]: Provide standard interface to observe changes in GISel passes
https://reviews.llvm.org/D54980

This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.

Reviewed by: vkeles.

llvm-svn: 348406
2018-12-05 20:14:52 +00:00
Evandro Menezes
4b5707550c [AArch64] Reword description of feature (NFC)
Reword the description of the feature that enables custom handling of cheap
instructions.

llvm-svn: 348398
2018-12-05 18:42:57 +00:00
Chandler Carruth
71c14a36a2 [SLH] Fix a nasty bug in SLH.
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.

The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.

The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.

llvm-svn: 348374
2018-12-05 15:42:11 +00:00
Valery Pykhtin
5b4db77b13 [AMDGPU]: Turn on the DPP combiner by default
Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348371
2018-12-05 15:21:17 +00:00
Simon Pilgrim
32483668d7 [X86][SSE] Begun adding modulo rotate support to LowerRotate
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).

I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.

llvm-svn: 348366
2018-12-05 14:46:37 +00:00
Simon Pilgrim
180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Diana Picus
8a1b4f57c9 [ARM GlobalISel] Implement call lowering for Thumb2
The only things that are different from arm are:
* different opcodes for calls and returns
* Thumb calls take predicate operands

llvm-svn: 348347
2018-12-05 10:35:28 +00:00
Saleem Abdulrasool
efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Amara Emerson
8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Saleem Abdulrasool
a9248fdfab AArch64: clean up some whitespace in Windows CC (NFC)
Drive by clean up for Windows ARM64 variadic CC (NFC).

llvm-svn: 348310
2018-12-04 22:19:29 +00:00
Nirav Dave
9c593e8676 [AVR] Silence fallthrough warning. NFC.
llvm-svn: 348304
2018-12-04 21:41:52 +00:00
Stefan Pintilie
46f840f286 [PowerPC] Make no-PIC default to match GCC - LLVM
Change the default for PowerPC LE to -fno-PIC.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 348298
2018-12-04 20:14:57 +00:00
Nirav Dave
ce26c27b2a [SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI.
llvm-svn: 348288
2018-12-04 17:59:43 +00:00
Matt Arsenault
0f414b81cc AMDGPU: Add f32 vectors to SGPR register classes
llvm-svn: 348286
2018-12-04 17:51:36 +00:00
Simon Pilgrim
07843640d5 [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK
Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need.

llvm-svn: 348282
2018-12-04 16:52:32 +00:00
Krzysztof Parzyszek
9fc0a2fe30 [Hexagon] Remove unused checker functions from asm parser
llvm-svn: 348269
2018-12-04 14:58:14 +00:00
Simon Pilgrim
6a088b2ce5 Fix MSVC "unknown pragma" warning. NFCI.
llvm-svn: 348256
2018-12-04 12:31:52 +00:00
Simon Pilgrim
58d44235e5 Fix -Wparentheses warning. NFCI.
llvm-svn: 348254
2018-12-04 12:24:10 +00:00
Simon Pilgrim
b1d6db7693 [X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call.
The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it.

llvm-svn: 348253
2018-12-04 12:21:43 +00:00
Simon Pilgrim
0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.

The X87 cases don't need the strict flag but generates much nicer code with it....

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim
1a2e0200ac Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

........

This has been causing buildbots failures for the past 24 hours: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

llvm-svn: 348249
2018-12-04 10:55:48 +00:00
Craig Topper
35585aff34 [X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG.
We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now its part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the list tests we have.

llvm-svn: 348237
2018-12-04 04:51:07 +00:00
Sanjin Sijaric
dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette
bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette
2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Krzysztof Parzyszek
44c1f81b27 [Hexagon] Switch to auto-generated intrinsic definitions and patterns
llvm-svn: 348206
2018-12-03 22:40:36 +00:00
Krzysztof Parzyszek
9dafa8a2c6 [Hexagon] Extract operand decoders into a separate file, NFC
These decoders are automatically generated. Keeping them separated makes
updating architectures easier.

llvm-svn: 348196
2018-12-03 21:59:21 +00:00
Sanjay Patel
d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Simon Atanasyan
f76884b0d3 [mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts
The `DIEExpr` is used in debug information entries for either TLS variables
or call sites. For now the last case is unsupported for targets with delay
slots, for MIPS in particular.

The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal`
routine which, in case of MIPS, always emits either `.dtprelword` or
`.dtpreldword` directives. That is okay for "main" code, but in unit
tests `DIEExpr` instances can be created not for TLS variables only even
on MIPS hosts. That is a reason of the `TestDWARF32Version5Addr8AllForms`
failure because handling of the `R_MIPS_TLS_DTPREL` relocation writes
incorrect value into dwarf structures. And anyway unconditional emitting
of `.dtprelword` directives will be incorrect when/if debug information
entries for call sites become supported on MIPS.

The patch solves the problem by wrapping expression created in the
`MipsTargetObjectFile::getDebugThreadLocalSymbol` method in to the
`MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is
recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method and
`.dtprelword` directives created in this case only. In other cases the
expression saved as a regular data.

Differential Revision: http://reviews.llvm.org/D54937

llvm-svn: 348194
2018-12-03 21:54:43 +00:00
Krzysztof Parzyszek
a45a55fc67 [Hexagon] Remove unused encodings, NFC
llvm-svn: 348193
2018-12-03 21:49:12 +00:00
Wouter van Oortmerssen
c7b89f0f62 [WebAssembly] Enforce assembler emits to streamer in order.
Summary:
The assembler processes directives and instructions in whatever order
they are in the file, then directly emits them to the streamer. This
could cause badly written (or generated) .s files to produce
incorrect binaries.

It now has state that tracks what it has most recently seen, to
enforce they are emitted in a given order that always produces
correct wasm binaries.

Also added a new test that compares obj2yaml output from llc (the
backend) to that going via .s and the assembler to ensure both paths
generate the same binaries.

The features this test covers could be extended.

Passes all wasm Lit tests.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557

Reviewers: sbc100, dschuff, aheejin

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55149

llvm-svn: 348185
2018-12-03 20:30:28 +00:00
Krzysztof Parzyszek
6290a73f29 [Hexagon] Update timing classes
llvm-svn: 348183
2018-12-03 20:13:18 +00:00
Krzysztof Parzyszek
1cbc5cd364 [Hexagon] Change instruction type field in TSFlags to 7 bits
llvm-svn: 348171
2018-12-03 19:34:04 +00:00
Jessica Paquette
2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Krzysztof Parzyszek
71a7f447f6 [Hexagon] Add HasV5 predicate for compatibility with auto-generated files
llvm-svn: 348167
2018-12-03 19:05:42 +00:00
Krzysztof Parzyszek
a55515f9a6 [Hexagon] Remove unused operand definitions, NFC
llvm-svn: 348163
2018-12-03 18:54:24 +00:00
Krzysztof Parzyszek
7ecc277ef9 [Hexagon] Some formatting changes, NFC
llvm-svn: 348162
2018-12-03 18:40:15 +00:00
Craig Topper
5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper
e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Jonas Paulsson
8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Pablo Barrio
a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Ron Lieberman
16de4fd2eb [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
preclude the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
 %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec


Differential Revision: https://reviews.llvm.org/D54882

llvm-svn: 348132
2018-12-03 13:04:54 +00:00
Tim Northover
5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Diogo N. Sampaio
3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Oliver Stannard
4cf35b4ab0 [ARM][MC] Move information about variadic register defs into tablegen
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.

This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.

This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.

Differential revision: https://reviews.llvm.org/D54853

llvm-svn: 348114
2018-12-03 10:32:42 +00:00
Oliver Stannard
c588110f13 [ARM][Asm] Debug trace for the processInstruction loop
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.

This adds debug printing so that this process is visible when using the
-debug option.

To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.

Differential revision: https://reviews.llvm.org/D54852

llvm-svn: 348113
2018-12-03 10:21:28 +00:00
Sjoerd Meijer
5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Kang Zhang
51986417f9 [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 348109
2018-12-03 03:32:57 +00:00
QingShan Zhang
8b7653db72 [NFC] [PowerPC] add an routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not.
In theory, we should let the PPC target to determine how to lower the TOC Entry for globals. 
And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. 

Differential Revision: https://reviews.llvm.org/D54925

llvm-svn: 348108
2018-12-03 03:32:16 +00:00
Craig Topper
959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Craig Topper
6f54ff57fd [X86] Fix bad comment. NFC
llvm-svn: 348103
2018-12-02 19:47:13 +00:00
Craig Topper
204e4110e0 [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.
Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.

llvm-svn: 348087
2018-12-02 07:52:39 +00:00
Craig Topper
4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper
ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Jessica Paquette
9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Craig Topper
f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Graham Sellers
ba559ac058 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
2018-12-01 12:27:53 +00:00
Alex Bradbury
757d296222 [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, 
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. 
SRAW assumed that (sext_inreg foo, i32) could only be produced when 
sign-extended an i32. However, it can be produced by input such as:

define i64 @tricky_ashr(i64 %a, i64 %b) {
  %1 = shl i64 %a, 32
  %2 = ashr i64 %1, 32
  %3 = ashr i64 %2, %b
  ret i64 %3
}

It's important not to select sraw in the above case, because sraw only uses 
bits lower 5 bits from the shift, while a shift of 32-63 would be valid.

Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be 
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.

This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and 
adds test cases that would demonstrate a miscompile if the incorrect patterns 
were re-added.

llvm-svn: 348067
2018-12-01 05:00:00 +00:00
Artem Belevich
e5664b1559 [NVPTX] Add lowering of i128 numbers as struct fields
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.

Author: Denys Zariaiev<denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

llvm-svn: 348057
2018-12-01 00:21:52 +00:00
Nicolai Haehnle
a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
Nicolai Haehnle
a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Jessica Paquette
1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne
35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Craig Topper
4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper
8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Ron Lieberman
f48e43bbf7 [AMDGPU] Disable SReg Global LD/ST, perf regression
Differential Revision: https://reviews.llvm.org/D55093

llvm-svn: 348014
2018-11-30 18:29:17 +00:00
Valery Pykhtin
3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Alex Bradbury
4830fdd21a [RISCV] Add additional CSR instruction aliases (imm. operands)
This patch adds CSR instructions aliases for the cases where the instruction 
takes an immediate operand but the alias doesn't have the i suffix. This is 
necessary for gas/gcc compatibility.

gas doesn't do a similar conversion for fsflags or fsrm, so this should be 
complete.

Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.

llvm-svn: 347991
2018-11-30 14:10:52 +00:00
Alex Bradbury
26403def69 [RISCV] Add UNIMP instruction (32- and 16-bit forms)
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit 
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. 
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, 
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).

Until recently unimp was undocumented and supported just by binutils, which 
printed unimp for either the 16 or 32-bit form. Both forms are now documented 
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports 
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.

Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.

llvm-svn: 347988
2018-11-30 13:39:17 +00:00
Alex Bradbury
e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Alex Bradbury
bc96a98ed0 [RISCV] Introduce codegen patterns for instructions introduced in RV64I
As discussed in the RFC 
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit 
RISC-V has i64 as the only legal integer type.  This patch introduces patterns 
to support codegen of the new instructions 
introduced in RV64I: addiw, addiw, subw, sllw, slliw, srlw, srliw, sraw, 
sraiw, ld, sd.

Custom selection code is needed for srliw as SimplifyDemandedBits will remove 
lower bits from the mask, meaning the obvious pattern won't work:

def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
          (SRLIW GPR:$rs1, uimm5:$shamt)>;
This is sufficient to compile and execute all of the GCC torture suite for 
RV64I other than those files using frameaddr or returnaddr intrinsics 
(LegalizeDAG doesn't know how to promote the operands - a future patch 
addresses this).

When promoting i32 sltu/sltiu operands, it would be more efficient to use 
sign-extension rather than zero-extension for RV64. A future patch adds a hook 
to allow this.

Differential Revision: https://reviews.llvm.org/D52977

llvm-svn: 347973
2018-11-30 09:38:44 +00:00
Craig Topper
a2133061c0 [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
llvm-svn: 347967
2018-11-30 08:32:05 +00:00
Craig Topper
6e4b266a0d [X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up.
Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the uppers bits and no shifts. So just emit that directly.

llvm-svn: 347966
2018-11-30 08:32:01 +00:00
Sjoerd Meijer
ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Jonas Paulsson
b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load or an
  immediate.

* The operands extension costs for divides removed, because we now use a high
  cost already for the divide (20).

* The costs for lhsr/ashr extra costs removed as this did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper
0850e8a6b6 [X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI
We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'.

llvm-svn: 347959
2018-11-30 06:23:55 +00:00
Mircea Trofin
5e0b21fb45 Fix build warnings introduced in rL347938
Summary:
Suppressed warnings in release builds due to variable used
only in assert statement.

Subscribers: llvm-commits, eraman, mgorny

Differential Revision: https://reviews.llvm.org/D55100

llvm-svn: 347939
2018-11-30 01:53:17 +00:00
Mircea Trofin
f1a49e8525 Revert "Revert r347596 "Support for inserting profile-directed cache prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.

Fix: correct  the use of DenseMap.

Reviewers: davidxl, hans, wmi

Reviewed By: wmi

Subscribers: mgorny, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D55088

llvm-svn: 347938
2018-11-30 01:01:52 +00:00
Thomas Lively
66ea30c7bc [WebAssembly] Expand unavailable integer operations for vectors
Summary:
Expands for vector types all of the integer operations that are
expanded for scalars because they are not supported at all by
WebAssembly.

This CL has no tests because such tests would really be testing the
target-independent expansion, but I'm happy to add tests if reviewers
think it would be helpful.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55010

llvm-svn: 347923
2018-11-29 22:01:01 +00:00
Jonas Devlieghere
ccf7d4b4aa Produce an error on non-encodable offsets for darwin ARM scattered relocations.
Scattered ARM relocations for Mach-O's only have 24 bits available to
encode the offset. This is not checked but just truncated and can result
in corrupt binaries after linking because the relocations are applied to
the wrong offset. This patch will check and error out in those
situations instead of emitting a wrong relocation.

Patch by: Sander Bogaert (dzn)

Differential revision: https://reviews.llvm.org/D54776

llvm-svn: 347922
2018-11-29 21:58:23 +00:00
Alex Bradbury
66d9a752b9 [RISCV] Implement codegen for cmpxchg on RV32IA
Utilise a similar ('late') lowering strategy to D47882. The changes to 
AtomicExpandPass allow this strategy to be utilised by other targets which 
implement shouldExpandAtomicCmpXchgInIR.

All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. 
This is conservative but correct.

Differential Revision: https://reviews.llvm.org/D48131

llvm-svn: 347914
2018-11-29 20:43:42 +00:00
Craig Topper
73c1d75d58 [X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead.
This seems to produce the same results on the tests we have.

llvm-svn: 347912
2018-11-29 20:18:58 +00:00
David Stuttard
c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih
0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper
129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper
6cd0b17078 [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.

I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1?

Differential Revision: https://reviews.llvm.org/D54919

llvm-svn: 347898
2018-11-29 19:13:38 +00:00
Graham Sellers
04f7a4d2d2 [AMDGPU] Add and update scalar instructions
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.

A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).

Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
2018-11-29 16:05:38 +00:00
David Stuttard
535c1af0bf Fix: Add support for TFE/LWE in image intrinsic
My change svn-id: 347871 caused a buildbot failure due to an unused
variable def (used in an assert).

Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336
llvm-svn: 347876
2018-11-29 15:56:36 +00:00
David Stuttard
de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Hans Wennborg
6e3be9d12e Revert r347596 "Support for inserting profile-directed cache prefetches"
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.

This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"

llvm-svn: 347864
2018-11-29 13:58:02 +00:00
Petr Pavlu
e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Andrea Di Biagio
373a4ccf6c [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues.  Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`.  Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model.  If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.

With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".

About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage.  This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.

Differential Revision: https://reviews.llvm.org/D54957

llvm-svn: 347857
2018-11-29 12:15:56 +00:00
Nicolai Haehnle
7bed696915 AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.

Use a simple fixpoint iteration instead.

In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.

Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}

Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54231

llvm-svn: 347853
2018-11-29 11:06:26 +00:00
Nicolai Haehnle
ab43bf60fe AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.

Using unsigned makes robust wrapping checks easier.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam

Differential Revision: https://reviews.llvm.org/D54230

llvm-svn: 347852
2018-11-29 11:06:21 +00:00
Nicolai Haehnle
f96456c611 AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54229

llvm-svn: 347851
2018-11-29 11:06:18 +00:00
Nicolai Haehnle
d1f45dad84 AMDGPU/InsertWaitcnts: Simplify pending events tracking
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.

This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).

This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54228

llvm-svn: 347850
2018-11-29 11:06:14 +00:00
Nicolai Haehnle
ae369d70c3 AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54227

llvm-svn: 347849
2018-11-29 11:06:11 +00:00
Nicolai Haehnle
1a94cbb3f5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848
2018-11-29 11:06:06 +00:00
Craig Topper
c2540995ed [X86] Correct comment. NFC
llvm-svn: 347835
2018-11-29 05:56:03 +00:00
Li Jia He
bcae407a3c [PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection
Summary:
 A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code 
 includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the 
 most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
 so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54825

llvm-svn: 347831
2018-11-29 03:04:39 +00:00
Sanjay Patel
2de209313e [x86] try select simplification for target-specific nodes
This failed to select (which might be a separate bug) in
X86ISelDAGToDAG because we try to create a select node
that can be simplified away after rL347227.

This change avoids the problem by simplifying the SHRUNKBLEND
node sooner. In the test case, we manage to realize that the
true/false values of the select (SHRUNKBLEND) are the same thing,
so it simplifies away completely.

llvm-svn: 347818
2018-11-28 22:51:04 +00:00
Craig Topper
81f1b4a361 [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.

This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.

Differential Revision: https://reviews.llvm.org/D54984

llvm-svn: 347786
2018-11-28 18:11:42 +00:00
Craig Topper
d3bb036bc9 [X86] Add some cost model entries for sext/zext for avx512bw
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.

I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.

Differential Revision: https://reviews.llvm.org/D54979

llvm-svn: 347785
2018-11-28 18:11:39 +00:00
Craig Topper
f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Alex Bradbury
893e5bc774 [RISCV] Support .option push and .option pop
This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once.

Differential Revision: https://reviews.llvm.org/D46424
Patch by Lewis Revill.

llvm-svn: 347774
2018-11-28 16:39:14 +00:00
Francis Visoiu Mistrih
879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This adds support for this.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih
d7eebd6d83 [CodeGen][NFC] Make TII::getMemOpBaseImmOfs return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Simon Atanasyan
af860d44fe [DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC
This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr
for PCOffset. In that case the expression is unrelated to thread locals
and so emitting a value of the DIEExpr does not have to always mean
emit-debug-thread-local.

llvm-svn: 347744
2018-11-28 11:48:07 +00:00
Jonas Paulsson
06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson
d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson
011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson
5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Mircea Trofin
35f0e5cd2d Do not insert prefetches with unsupported memory operands.
Summary:
Ignore advices where the memory operand of the 'anchor' instruction
uses unsupported register types.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54983

llvm-svn: 347724
2018-11-28 01:08:45 +00:00
Evandro Menezes
9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
Craig Topper
7ceef03dc9 [X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned'
We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned.

I might be adding another combine to this function and this will probably simplify the logic required for that.

llvm-svn: 347684
2018-11-27 18:24:56 +00:00
Craig Topper
5fb34b5498 [X86] Add cascade lake arch in X86 target.
This is skylake-avx512 with the addition of avx512vnni ISA.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D54785

llvm-svn: 347681
2018-11-27 18:05:00 +00:00
Stanislav Mekhanoshin
443a7f9788 [AMDGPU] Disable DAG combine at -O0
Differential Revision: https://reviews.llvm.org/D54358

llvm-svn: 347659
2018-11-27 15:13:37 +00:00
Craig Topper
196fd31e33 [X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes.
llvm-svn: 347642
2018-11-27 06:24:56 +00:00
Craig Topper
4325505f05 [X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets.
If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.

We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.

llvm-svn: 347632
2018-11-27 02:57:27 +00:00
Sterling Augustine
9cc1ffadc5 Notify the linker when a TU compiled with split-stack has a function without a prologue.
More context here: https://go-review.googlesource.com/c/go/+/148819/

llvm-svn: 347614
2018-11-26 23:26:31 +00:00
David Blaikie
1fecbec5fa AArch64ISelLowering: Remove a return-of-assignment to allow NRVO
Patch by Arthur O'Dwyer!

llvm-svn: 347609
2018-11-26 22:57:18 +00:00
Mircea Trofin
183df14520 Add new passes to X86 pipeline tests
Summary: Fixes test failures introduced by rL347596.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54916

llvm-svn: 347607
2018-11-26 22:49:17 +00:00
Fangrui Song
82ddb8154e [X86] Add dependency from X86 to ProfileData after rL347596
llvm-svn: 347606
2018-11-26 22:16:19 +00:00
Evandro Menezes
6a38a5effe [AArch64] Refactor the scheduling predicates (3/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasExtendedReg()`.

Differential revision: https://reviews.llvm.org/D54822

llvm-svn: 347599
2018-11-26 21:47:46 +00:00
Evandro Menezes
56368c6fa5 [AArch64] Refactor the scheduling predicates (2/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasShiftedReg()`.

Differential revision: https://reviews.llvm.org/D54820

llvm-svn: 347598
2018-11-26 21:47:41 +00:00
Evandro Menezes
b02ac8bd21 [AArch64] Refactor the scheduling predicates (1/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::isScaledAddr()`

Differential revision: https://reviews.llvm.org/D54777

llvm-svn: 347597
2018-11-26 21:47:28 +00:00
Mircea Trofin
cfbc1788d6 Support for inserting profile-directed cache prefetches
Summary:
Support for profile-driven cache prefetching (X86)

This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.

A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.

The recommender makes prefetch recommendations in terms of:

* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)

meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.

For example:

0x400ab2,64,nta

and assuming the instruction at 0x400ab2 is:

movzbl (%rbx,%rdx,1),%edx

means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:

prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx

The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):

1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).

2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).

3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.

4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.

Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.

The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.

Reviewers: davidxl, wmi, craig.topper

Reviewed By: davidxl, wmi, craig.topper

Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D54052

llvm-svn: 347596
2018-11-26 21:36:18 +00:00
Matt Arsenault
88ce3dcbc8 AMDGPU: Record SGPR spills when restoring too
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.

llvm-svn: 347595
2018-11-26 21:28:40 +00:00
Craig Topper
b955bf382c [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.

This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.

Differential Revision: https://reviews.llvm.org/D54906

llvm-svn: 347593
2018-11-26 21:12:39 +00:00
Than McIntosh
30c804bbb1 [CodeGen] Support custom format of stack maps
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.

For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, apilipenko, reames

Reviewed By: reames

Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53892

llvm-svn: 347584
2018-11-26 18:43:48 +00:00
Matt Arsenault
105fc1a5f3 AMDGPU: Don't optimize exec masks at -O0
llvm-svn: 347573
2018-11-26 17:02:02 +00:00
Matt Arsenault
6384d9ea31 AMDGPU: Only add implicit super-reg def for first subreg
llvm-svn: 347572
2018-11-26 17:02:01 +00:00
Sanjay Patel
d31220e0de [x86] promote all multiply i8 by constant to i32
We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's 
independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of 
our existing LEA and other multiply expansion magic happens as it would for i32 ops.

Some of the test diffs show that we could end up with an actual 32-bit mul instruction here 
because we choose not to expand to simpler ops. That instruction could be slower depending on the 
subtarget. On the plus side, this means we don't need a separate instruction to load the constant 
operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, 
we could add a later transform that tries to shrink it back to i8 based on subtarget timing.

I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to 
test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means 
LEA is generally cheap.

Differential Revision: https://reviews.llvm.org/D54803

llvm-svn: 347557
2018-11-26 15:22:30 +00:00
Diana Picus
0528e2cfb3 [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.

Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.

In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).

We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).

Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.

Instruction select:
Nothing to do.

llvm-svn: 347545
2018-11-26 11:07:02 +00:00
Sam Parker
5338f7aae4 [ARM] Prevent parallel macs for unsigned values
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.

Differential Revision: https://reviews.llvm.org/D54790

llvm-svn: 347542
2018-11-26 10:22:55 +00:00
Kang Zhang
840e98f9f1 Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction"
This reverts commits r347532. Forget add the option 
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.

llvm-svn: 347534
2018-11-26 07:15:31 +00:00
Kang Zhang
e98d4f511c [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 347532
2018-11-26 06:03:25 +00:00
Sanjay Patel
7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Fangrui Song
220f2a9cac [ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494
This fixes -DBUILD_SHARED_LIBS=on

llvm-svn: 347506
2018-11-23 23:43:46 +00:00
Luke Cheeseman
6db3a6a4a7 Revert r347490 as it breaks address sanitizer builds
llvm-svn: 347499
2018-11-23 17:13:06 +00:00
Oliver Stannard
173bc2bb7f [ARM][AsmParser] Improve debug printing of parsed asm operands
In ARMOperand::print:
- Print human-readable register names, instead of numbers.
- Print the correct names for IT condition masks (these were in the wrong order
  before).
- Print all parts of memory operands, not just the base register.

This makes the output of llvm-mc -show-inst-operands more readable.

Differential revision: https://reviews.llvm.org/D54850

llvm-svn: 347494
2018-11-23 14:27:21 +00:00
Luke Cheeseman
d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
John Brawn
d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Jonas Paulsson
96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Stefan Pintilie
9d6445d34c [PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file.
This is further cleanup for PPCMCCodeEmitter. The class had been contained
within the cpp file alone. Now it has been split up between a header file and
a cpp file which allows other classes to make use of the functions in this class
if required.

llvm-svn: 347428
2018-11-21 21:23:50 +00:00
Stefan Pintilie
46e3cd76e2 [PowerPC][NFC] Minor Code Cleaup for PPCMCCodeEmitter.
llvm-svn: 347422
2018-11-21 20:47:59 +00:00
Sanjay Patel
cadf62f360 [x86] fix predicate for avoiding vblendv
It only makes sense to produce the logic ops when 1 of the
constants is +0.0. Otherwise, go with vblendv to reduce code.

llvm-svn: 347403
2018-11-21 18:02:50 +00:00
Vladimir Stefanovic
64ad1cf24b [mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR
R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o.
They are still not generated with JALR.

Differential revision: https://reviews.llvm.org/D54721

llvm-svn: 347398
2018-11-21 16:38:34 +00:00
Michal Gorny
71e902101d [nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage
Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter
library.  The missing dependency causes shared-lib build to fail with
the following reason:

  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  collect2: error: ld returned 1 exit status

Differential Revision: https://reviews.llvm.org/D47810

llvm-svn: 347387
2018-11-21 11:25:01 +00:00
Simon Pilgrim
66bae9aee8 [X86][AVX] Remove BROADCAST if we only need the 0'th element
We don't catch this with target shuffle simplification if the src/dst types are different.

llvm-svn: 347386
2018-11-21 11:00:09 +00:00
Craig Topper
e9b4001a82 [X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1.
The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example.

llvm-svn: 347380
2018-11-21 07:01:22 +00:00
Nemanja Ivanovic
5cf902ccd4 [PowerPC] Do not use vectors to codegen bswap with Altivec turned off
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334

llvm-svn: 347376
2018-11-21 02:53:50 +00:00
Craig Topper
27a5896fe8 [X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly.
These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.

Fixes PR39733

llvm-svn: 347375
2018-11-21 01:39:38 +00:00
Craig Topper
aa52ee2770 [X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8.
We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.

Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.

llvm-svn: 347361
2018-11-20 22:57:48 +00:00
Craig Topper
24b346da42 [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.
Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

llvm-svn: 347348
2018-11-20 21:21:52 +00:00
Sam Clegg
4791a668f5 [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0
Rather than assuming that `tempRet0` exists in linear memory only assume
the getter/setter functions exist.  This avoids conflicting with
binaryen which declares a wasm global for this purpose and defines it's
own getter and setter for that.

The other advantage of doing things this way is that it leaving
it up to the linker/finalizer to decide how to actually store this
temporary.  As it happens binaryen uses a wasm global which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Differential Revision: https://reviews.llvm.org/D53240

llvm-svn: 347340
2018-11-20 19:25:07 +00:00
Jinsong Ji
9a0ed20072 [PowerPC] Add Itineraries for STWU/STWUX etc
When doing some instruction scheduling work, we noticed some missing itineraries.

Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.

Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX

and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.

Differential Revision: https://reviews.llvm.org/D54700

llvm-svn: 347311
2018-11-20 15:11:42 +00:00
Simon Pilgrim
c9cc6cca42 Fix MSVC 'truncation of constant value' warning. NFCI.
llvm-svn: 347308
2018-11-20 14:29:40 +00:00
Simon Pilgrim
ee8b96f253 [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions.
Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.

llvm-svn: 347303
2018-11-20 13:23:37 +00:00
Simon Pilgrim
ed7e2fda18 [X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero
Noticed while working on improving demanded elts target shuffle shuffle combining

llvm-svn: 347302
2018-11-20 12:17:50 +00:00
Simon Pilgrim
a6fb85ffa7 [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE.
As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.

llvm-svn: 347300
2018-11-20 11:46:37 +00:00
Simon Pilgrim
7198506ba8 [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions.
As discussed on rL347240.

llvm-svn: 347299
2018-11-20 11:09:46 +00:00
Craig Topper
17fa42a69b [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.

As near as I can tell this makes v16i8 behavior consistent with every other VT now.

This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.

llvm-svn: 347296
2018-11-20 09:04:01 +00:00
Craig Topper
b06d1aa3a1 [X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1
This helps with a future patch and makes us less reliant on DAG combine merging shuffles.

llvm-svn: 347295
2018-11-20 09:03:58 +00:00
Craig Topper
c733c7bf94 [X86] Replace more calls to getZeroVector with regular getConstant.
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.

The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.

llvm-svn: 347290
2018-11-20 06:54:01 +00:00
Nemanja Ivanovic
9b393909e2 [PowerPC] Don't combine to bswap store on 1-byte truncating store
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.

llvm-svn: 347288
2018-11-20 04:42:31 +00:00
Craig Topper
808d0dd689 [X86] Rename combineVSZext->combineExtendVectorInreg. NFC
Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.

llvm-svn: 347268
2018-11-19 22:18:47 +00:00
Konstantin Zhuravlyov
700b1ef54d AMDGPU: Fix V_FMA_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D54545

llvm-svn: 347265
2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin
8bafbae889 [AMDGPU] Restored selection of scalar_to_vector (v2x16)
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.

Differential Revision: https://reviews.llvm.org/D54718

llvm-svn: 347259
2018-11-19 19:58:13 +00:00
Craig Topper
a5e0380c30 [X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them.

This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.

Differential Revision: https://reviews.llvm.org/D54711

llvm-svn: 347248
2018-11-19 18:57:31 +00:00
Simon Pilgrim
c4861ab170 [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703)
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.

Differential Revision: https://reviews.llvm.org/D54707

llvm-svn: 347242
2018-11-19 18:40:59 +00:00
Craig Topper
311bbcd535 [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together.

This patch instead instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb.

Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.

Differential Revision: https://reviews.llvm.org/D54668

llvm-svn: 347240
2018-11-19 18:32:53 +00:00
Fangrui Song
d83a5526d5 [AMDGPU] Fix -Wunused-variable
llvm-svn: 347234
2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin
054f8101f1 [AMDGPU] Convert insert_vector_elt into set of selects
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.

Differential Revision: https://reviews.llvm.org/D54606

llvm-svn: 347231
2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen
49482f824a [WebAssembly] replaced .param/.result by .functype
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
  of this directive anymore.
- Actually constructs the signature in the assembler, and make the
  assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
  to require less conversions overall.
- Changed 700 or so tests to use it.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54652

llvm-svn: 347228
2018-11-19 17:10:36 +00:00
David Stuttard
be3d7ba9fb [AMDGPU] Derive GCNSubtarget from MF to get overridden target features
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.

Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).

Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54301

llvm-svn: 347221
2018-11-19 15:44:20 +00:00
Martin Elshuber
fef3036d37 Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Nicolai Haehnle
c548d91419 AMDGPU/InsertWaitcnts: Some more const-correctness
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54225

llvm-svn: 347192
2018-11-19 12:03:11 +00:00
Sam Parker
e7c42dd7e2 [ARM] Remove trunc sinks in ARM CGP
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
    
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
    
This leaves sinks as being:
  - points where the value in the register is being observed, such as
    an icmp, switch or store.
  - points where value types have to match, such as calls and returns.
  - zext are included to ease the transformation and are generally
    removed later on.
    
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.

Differential Revision: https://reviews.llvm.org/D54515

llvm-svn: 347191
2018-11-19 11:34:40 +00:00
Anton Korobeynikov
4df19b75c0 [MSP430] Optimize srl/sra in case of A >> (8 + N)
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.

Path by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54623

llvm-svn: 347187
2018-11-19 10:43:02 +00:00
Craig Topper
8b22bcd39f [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.

llvm-svn: 347185
2018-11-19 07:22:26 +00:00
Craig Topper
3616891046 [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.

llvm-svn: 347181
2018-11-19 04:33:20 +00:00
Craig Topper
053f1eea96 [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180
2018-11-19 00:33:16 +00:00
Simon Pilgrim
7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Craig Topper
0468c860b7 [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.

llvm-svn: 347176
2018-11-18 21:28:50 +00:00
Simon Pilgrim
b31bdbd2e9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
2018-11-18 20:21:52 +00:00
Craig Topper
11d50948e2 [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.

llvm-svn: 347172
2018-11-18 18:11:25 +00:00
Craig Topper
bc8148f7b0 [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171
2018-11-18 17:59:28 +00:00
Simon Pilgrim
ec808cf541 Remove unused variable. NFCI.
llvm-svn: 347169
2018-11-18 17:24:59 +00:00
Simon Pilgrim
50828c75d0 [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector
Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.

llvm-svn: 347168
2018-11-18 17:15:06 +00:00
Simon Pilgrim
fec9f8657b [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.
Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.

llvm-svn: 347162
2018-11-18 15:52:08 +00:00
Simon Pilgrim
cc1f5d2407 [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549)
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.

llvm-svn: 347158
2018-11-18 13:34:53 +00:00
Heejin Ahn
e0f8b9bfc6 [WebAssembly] Add null streamer support
Summary: Now `llc -filetype=null` works.

Reviewers: eush

Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54660

llvm-svn: 347155
2018-11-18 11:58:47 +00:00
Craig Topper
cd94a7c227 [X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI.
I don't yet have any test cases for this, but its the right thing to do based on log file inspection.

llvm-svn: 347151
2018-11-18 08:30:09 +00:00
Craig Topper
b03f80a21c [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC
llvm-svn: 347150
2018-11-18 07:35:08 +00:00
Craig Topper
f56a57518d [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.
The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.

llvm-svn: 347149
2018-11-18 05:53:21 +00:00
Craig Topper
0438d791fa [X86] Add support for matching PACKUSWB from a v64i8 shuffle.
llvm-svn: 347143
2018-11-17 18:54:43 +00:00
Craig Topper
dd61f11642 [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.
llvm-svn: 347131
2018-11-17 02:36:07 +00:00
Craig Topper
b05ea28f1f [X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.
llvm-svn: 347127
2018-11-17 02:18:12 +00:00
Fangrui Song
7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Craig Topper
ee0333b4a9 [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.
This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.

There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.

llvm-svn: 347105
2018-11-16 22:53:00 +00:00
Craig Topper
87bc07b3dd [X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization.
If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed.

llvm-svn: 347100
2018-11-16 22:04:29 +00:00
Craig Topper
567aaeb40d [X86] Remove a branch on SSE4.1 from LowerLoad
We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG.

llvm-svn: 347095
2018-11-16 21:05:00 +00:00
Craig Topper
7fff9a9aef [X86] In LowerLoad, fix assert messages and rename a variable that use Zize instead of Size. NFC
llvm-svn: 347093
2018-11-16 21:04:56 +00:00
Peter Collingbourne
527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Anton Korobeynikov
e5cb1c35b4 [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54626

llvm-svn: 347080
2018-11-16 19:36:15 +00:00
Rong Xu
3a38175723 [X86] Disable Condbr_merge pass
Disable Condbr_merge pass for now due to PR39658.
Will reenable the pass once the bug is fixed.

llvm-svn: 347079
2018-11-16 19:35:00 +00:00
Stefan Pintilie
9004444d81 Revert "[PowerPC] Make no-PIC default to match GCC - LLVM"
This reverts commit r347069

llvm-svn: 347076
2018-11-16 19:24:23 +00:00
Anton Korobeynikov
883c70959d [MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup
Linker fails to link example like this (simplified case from newlib
sources):

$ cat test.c

extern const char _ctype_b[];
struct _t { char *ptr; };
struct _t T = { ((char *) _ctype_b + 3) };
$ cat ctype.c

char _ctype_b[4] = { 0, 0, 0, 0 };
LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error

We also follow gnu toolchain here, where 2-byte relocation mapped to
R_MSP430_16_BYTE, instead of R_MSP430_16.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54620

llvm-svn: 347074
2018-11-16 19:20:51 +00:00
Sam Clegg
74f5fd4e32 [WebAssembly] Default to static reloc model
Differential Revision: https://reviews.llvm.org/D54637

llvm-svn: 347073
2018-11-16 18:59:51 +00:00
Stefan Pintilie
046eff502f [PowerPC] Make no-PIC default to match GCC - LLVM
Set -fno-PIC as the default option.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 347069
2018-11-16 18:36:21 +00:00
Simon Pilgrim
66f42ea6e1 [SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI.
Prep work for PR39467.

llvm-svn: 347067
2018-11-16 17:50:59 +00:00
Simon Pilgrim
bcd6631a2a [X86][SSE] Move number of input limit out of resolveTargetShuffleInputs.
Only combineX86ShufflesRecursively needs this limit.

llvm-svn: 347054
2018-11-16 15:01:05 +00:00
Roman Lebedev
90c5b3f78e [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from X
Summary:
As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!),
we can fold the `Z` into 'control`, and let the `BEXTR` do this too.
We could just insert those 8 bits of shift amount into control,
but it is better to instead zero-extend them, and 'or' them in place.

We can only do this for `lshr`, not `ashr`, because we do not know that the mask cover only the bits of `Y`,
and not any of the sign-extended bits.

The obvious question is, is this actually legal to do?
I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
* `A START value exceeding the operand size will not extract any bits from the second source operand.`
* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
* `The destination register is cleared if no bits are extracted.`

FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases.

Reviewers: RKSimon, craig.topper, spatel, andreadb

Reviewed By: RKSimon, craig.topper, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54095

llvm-svn: 347048
2018-11-16 13:04:54 +00:00
Alex Bradbury
b4a64cede8 [RISCV][NFC] Define and use the new CA instruction format
The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a 
new compressed instruction format, RVC format CA (no actual instruction 
encodings were changed). This patch updates the RISC-V backend to define the 
new format, and to use it in the relevant instructions.

Differential Revision: https://reviews.llvm.org/D54302
Patch by Luís Marques.

llvm-svn: 347043
2018-11-16 10:33:23 +00:00
Alex Bradbury
2146e8fb1e [RISCV] Constant materialisation for RV64I
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).

test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.

Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated my loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.

Differential Revision: https://reviews.llvm.org/D52962

llvm-svn: 347042
2018-11-16 10:14:16 +00:00
Anton Korobeynikov
411773d227 [MSP430] Add support for .refsym directive
Introduces support for '.refsym' assembler directive.

From GCC docs (for MSP430):
'.refsym' - This directive instructs assembler to add an undefined reference
to the symbol following the directive. No relocation is created for this symbol;
it will exist purely for pulling in object files from archives.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54618

llvm-svn: 347041
2018-11-16 09:50:24 +00:00
Craig Topper
079c37da58 [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening.
By early promoting the multiply to use an i16 element type we can avoid op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.

llvm-svn: 347032
2018-11-16 06:15:21 +00:00
Matt Arsenault
eabb8dd015 AMDGPU: Fix analyzeBranch failing with pseudoterminators
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.

llvm-svn: 347027
2018-11-16 05:03:02 +00:00
Craig Topper
5802b82b40 [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply.
We aren't going to use the upper bits of the multiply result that the extend would effect. So we don't need a specific type of extend.

This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.

llvm-svn: 347011
2018-11-16 01:16:59 +00:00
Craig Topper
1acafd863f [X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC
llvm-svn: 347010
2018-11-16 01:16:51 +00:00
Ron Lieberman
cac749ac88 [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008
2018-11-16 01:13:34 +00:00
Heejin Ahn
095796a391 [WebAssembly] Split BBs after throw instructions
Summary:
`throw` instruction is a terminator in wasm, but BBs were not splitted
after `throw` instructions, causing machine instruction verifier to
fail.

This patch
- Splits BBs after `throw` instructions in WasmEHPrepare and adding an
  unreachable instruction after `throw`, which will be deleted in
  LateEHPrepare pass
- Refactors WasmEHPrepare into two member functions
- Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass
  to match that of WasmEHPrepare pass, which is newly added. Now
  `eraseBBsAndChildren` does not delete BBs with remaining predecessors.
- Fixes style nits, making static function names conform to clang-tidy
- Re-enables the test temporarily disabled by rL346840 && rL346845

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54571

llvm-svn: 347003
2018-11-16 00:47:18 +00:00
Ron Lieberman
2f5683e6b0 [AMDGPU] NFC Test commit
llvm-svn: 347002
2018-11-16 00:46:51 +00:00
Konstantin Zhuravlyov
af7b5d7092 AMDHSA: More code object v3 fixes:
- Make sure IsaInfo::hasCodeObjectV3 returns true only
    for AMDHSA
  - Update assembler metadata tests to use v2 by default

llvm-svn: 347001
2018-11-15 23:14:23 +00:00
Craig Topper
22bfa99448 [X86] Remove ANY_EXTEND special case from canReduceVMulWidth
Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says its unused.

I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation?

Differential Revision: https://reviews.llvm.org/D54596

llvm-svn: 346995
2018-11-15 21:19:32 +00:00
Craig Topper
b144c7a6fb [X86] Minor cleanup to getExtendInVec. NFCI
Use unsigned to calculate the subvector index to avoid a cast.

Remove an unnecessary condition and replace it with a stronger assert.

Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.

llvm-svn: 346983
2018-11-15 19:20:22 +00:00
Craig Topper
73bb04ab6f [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.

In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different and output VTs.

Differential Revision: https://reviews.llvm.org/D54512

llvm-svn: 346980
2018-11-15 18:59:31 +00:00
Thomas Lively
fc3163b67a [WebAssembly] Fix return type of nextByte
Summary:
The old return type did not allow for correct error reporting and was
causing a compiler warning.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54586

llvm-svn: 346979
2018-11-15 18:56:49 +00:00
Simon Pilgrim
0db8cb0147 [X86] Fix MCNullStreamer support for modules with a CodeView flag
This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag.

The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks.

Committed on behalf of @eush (Eugene Sharygin)

Differential Revision: https://reviews.llvm.org/D54008

llvm-svn: 346962
2018-11-15 15:17:15 +00:00
Alex Bradbury
f809d89980 [RISCV] Mark C.EBREAK instruction as having side effects
C.EBREAK was defined with hasSideEffects = 0, which is incorrect and 
inconsistent with the non-compressed instruction form. This patch corrects 
this oversight.

This wouldn't cause codegen issues, as compressed instructions are only ever 
generated by converting the non-compressed form as an MCInst. But having 
correct flags is still worthwhile.

Differential Revision: https://reviews.llvm.org/D54256
Patch by Luís Marques.

llvm-svn: 346959
2018-11-15 14:52:24 +00:00
Alex Bradbury
7727240438 [RISCV] Mark FREM as Expand
Mark the FREM SelectionDAG node as Expand, which is necessary in order to 
support the frem IR instruction on RISC-V. This is expanded into a library 
call. Adds the corresponding test. Previously, this would have triggered an 
assertion at instruction selection time.

Differential Revision: https://reviews.llvm.org/D54159
Patch by Luís Marques.

llvm-svn: 346958
2018-11-15 14:46:11 +00:00
Anton Korobeynikov
f0001f4186 Add missed files from prev. commit
llvm-svn: 346949
2018-11-15 12:35:04 +00:00
Anton Korobeynikov
49045c6a0d [MSP430] Add MC layer
Reapply r346374 with the fixes for modules build.

Original summary:

This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Patch by Michael Skvortsov!

llvm-svn: 346948
2018-11-15 12:29:43 +00:00
Alex Bradbury
22c091fc3c [RISCV] Introduce the RISCVMatInt::generateInstSeq helper
Logic to load 32-bit and 64-bit immediates is currently present in
RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With
the introduction of RV64 codegen, there is a greater benefit of sharing
immediate materialisation logic between the MC layer and codegen. The
generateInstSeq helper allows this by producing a vector of simple structs
representing the chosen instructions. This can then be consumed in the MC
layer to produce MCInsts or at instruction selection time to produce
appropriate SelectionDAG node. Sharing this logic means that both the li
pseudoinstruction and codegen can benefit from future optimisations, and
that this logic can be used for materialising constants during RV64 codegen.

This patch does contain a behaviour change: addi will now be produced on RV64
when no lui is necessary to materialise the constant. In that case addiw takes
x0 as the source register, so is semantically identical to addi.

Differential Revision: https://reviews.llvm.org/D52961

llvm-svn: 346937
2018-11-15 10:11:31 +00:00
Craig Topper
553ac560aa [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization.
This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.

llvm-svn: 346936
2018-11-15 08:23:40 +00:00
Thomas Lively
77b33c86f5 [WebAssembly] Renumber SIMD bitwise instructions
Summary: Changed to match https://github.com/WebAssembly/simd/pull/54.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54561

llvm-svn: 346931
2018-11-15 03:38:59 +00:00
Konstantin Zhuravlyov
a25e0524c0 AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186

llvm-svn: 346923
2018-11-15 02:32:43 +00:00
Craig Topper
ea6ced9d1a [X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization.
The narrow types end up requesting widening, but generic legalization will end up scalaring and using a build_vector to do the widening.

llvm-svn: 346916
2018-11-15 00:21:41 +00:00
Benjamin Kramer
6b7d6fe079 [X86] Remove unused variable
llvm-svn: 346909
2018-11-14 23:13:27 +00:00