Commit Graph

50329 Commits

Author SHA1 Message Date
Oliver Stannard
173bc2bb7f [ARM][AsmParser] Improve debug printing of parsed asm operands
In ARMOperand::print:
- Print human-readable register names, instead of numbers.
- Print the correct names for IT condition masks (these were in the wrong order
  before).
- Print all parts of memory operands, not just the base register.

This makes the output of llvm-mc -show-inst-operands more readable.

Differential revision: https://reviews.llvm.org/D54850

llvm-svn: 347494
2018-11-23 14:27:21 +00:00
Luke Cheeseman
d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
John Brawn
d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Jonas Paulsson
96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Stefan Pintilie
9d6445d34c [PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file.
This is further cleanup for PPCMCCodeEmitter. The class had been contained
within the cpp file alone. Now it has been split up between a header file and
a cpp file which allows other classes to make use of the functions in this class
if required.

llvm-svn: 347428
2018-11-21 21:23:50 +00:00
Stefan Pintilie
46e3cd76e2 [PowerPC][NFC] Minor Code Cleaup for PPCMCCodeEmitter.
llvm-svn: 347422
2018-11-21 20:47:59 +00:00
Sanjay Patel
cadf62f360 [x86] fix predicate for avoiding vblendv
It only makes sense to produce the logic ops when 1 of the
constants is +0.0. Otherwise, go with vblendv to reduce code.

llvm-svn: 347403
2018-11-21 18:02:50 +00:00
Vladimir Stefanovic
64ad1cf24b [mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR
R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o.
They are still not generated with JALR.

Differential revision: https://reviews.llvm.org/D54721

llvm-svn: 347398
2018-11-21 16:38:34 +00:00
Michal Gorny
71e902101d [nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage
Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter
library.  The missing dependency causes shared-lib build to fail with
the following reason:

  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  collect2: error: ld returned 1 exit status

Differential Revision: https://reviews.llvm.org/D47810

llvm-svn: 347387
2018-11-21 11:25:01 +00:00
Simon Pilgrim
66bae9aee8 [X86][AVX] Remove BROADCAST if we only need the 0'th element
We don't catch this with target shuffle simplification if the src/dst types are different.

llvm-svn: 347386
2018-11-21 11:00:09 +00:00
Craig Topper
e9b4001a82 [X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1.
The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example.

llvm-svn: 347380
2018-11-21 07:01:22 +00:00
Nemanja Ivanovic
5cf902ccd4 [PowerPC] Do not use vectors to codegen bswap with Altivec turned off
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334

llvm-svn: 347376
2018-11-21 02:53:50 +00:00
Craig Topper
27a5896fe8 [X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly.
These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.

Fixes PR39733

llvm-svn: 347375
2018-11-21 01:39:38 +00:00
Craig Topper
aa52ee2770 [X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8.
We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.

Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.

llvm-svn: 347361
2018-11-20 22:57:48 +00:00
Craig Topper
24b346da42 [X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.
Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

llvm-svn: 347348
2018-11-20 21:21:52 +00:00
Sam Clegg
4791a668f5 [WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0
Rather than assuming that `tempRet0` exists in linear memory only assume
the getter/setter functions exist.  This avoids conflicting with
binaryen which declares a wasm global for this purpose and defines it's
own getter and setter for that.

The other advantage of doing things this way is that it leaving
it up to the linker/finalizer to decide how to actually store this
temporary.  As it happens binaryen uses a wasm global which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Differential Revision: https://reviews.llvm.org/D53240

llvm-svn: 347340
2018-11-20 19:25:07 +00:00
Jinsong Ji
9a0ed20072 [PowerPC] Add Itineraries for STWU/STWUX etc
When doing some instruction scheduling work, we noticed some missing itineraries.

Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.

Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX

and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.

Differential Revision: https://reviews.llvm.org/D54700

llvm-svn: 347311
2018-11-20 15:11:42 +00:00
Simon Pilgrim
c9cc6cca42 Fix MSVC 'truncation of constant value' warning. NFCI.
llvm-svn: 347308
2018-11-20 14:29:40 +00:00
Simon Pilgrim
ee8b96f253 [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions.
Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.

llvm-svn: 347303
2018-11-20 13:23:37 +00:00
Simon Pilgrim
ed7e2fda18 [X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero
Noticed while working on improving demanded elts target shuffle shuffle combining

llvm-svn: 347302
2018-11-20 12:17:50 +00:00
Simon Pilgrim
a6fb85ffa7 [X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE.
As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.

llvm-svn: 347300
2018-11-20 11:46:37 +00:00
Simon Pilgrim
7198506ba8 [X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions.
As discussed on rL347240.

llvm-svn: 347299
2018-11-20 11:09:46 +00:00
Craig Topper
17fa42a69b [X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.

As near as I can tell this makes v16i8 behavior consistent with every other VT now.

This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.

llvm-svn: 347296
2018-11-20 09:04:01 +00:00
Craig Topper
b06d1aa3a1 [X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1
This helps with a future patch and makes us less reliant on DAG combine merging shuffles.

llvm-svn: 347295
2018-11-20 09:03:58 +00:00
Craig Topper
c733c7bf94 [X86] Replace more calls to getZeroVector with regular getConstant.
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.

The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.

llvm-svn: 347290
2018-11-20 06:54:01 +00:00
Nemanja Ivanovic
9b393909e2 [PowerPC] Don't combine to bswap store on 1-byte truncating store
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.

llvm-svn: 347288
2018-11-20 04:42:31 +00:00
Craig Topper
808d0dd689 [X86] Rename combineVSZext->combineExtendVectorInreg. NFC
Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.

llvm-svn: 347268
2018-11-19 22:18:47 +00:00
Konstantin Zhuravlyov
700b1ef54d AMDGPU: Fix V_FMA_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D54545

llvm-svn: 347265
2018-11-19 21:10:16 +00:00
Stanislav Mekhanoshin
8bafbae889 [AMDGPU] Restored selection of scalar_to_vector (v2x16)
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.

Differential Revision: https://reviews.llvm.org/D54718

llvm-svn: 347259
2018-11-19 19:58:13 +00:00
Craig Topper
a5e0380c30 [X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them.

This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.

Differential Revision: https://reviews.llvm.org/D54711

llvm-svn: 347248
2018-11-19 18:57:31 +00:00
Simon Pilgrim
c4861ab170 [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703)
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.

Differential Revision: https://reviews.llvm.org/D54707

llvm-svn: 347242
2018-11-19 18:40:59 +00:00
Craig Topper
311bbcd535 [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together.

This patch instead instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb.

Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.

Differential Revision: https://reviews.llvm.org/D54668

llvm-svn: 347240
2018-11-19 18:32:53 +00:00
Fangrui Song
d83a5526d5 [AMDGPU] Fix -Wunused-variable
llvm-svn: 347234
2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin
054f8101f1 [AMDGPU] Convert insert_vector_elt into set of selects
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.

Differential Revision: https://reviews.llvm.org/D54606

llvm-svn: 347231
2018-11-19 17:39:20 +00:00
Wouter van Oortmerssen
49482f824a [WebAssembly] replaced .param/.result by .functype
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
  of this directive anymore.
- Actually constructs the signature in the assembler, and make the
  assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
  to require less conversions overall.
- Changed 700 or so tests to use it.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54652

llvm-svn: 347228
2018-11-19 17:10:36 +00:00
David Stuttard
be3d7ba9fb [AMDGPU] Derive GCNSubtarget from MF to get overridden target features
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.

Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).

Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54301

llvm-svn: 347221
2018-11-19 15:44:20 +00:00
Martin Elshuber
fef3036d37 Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Nicolai Haehnle
c548d91419 AMDGPU/InsertWaitcnts: Some more const-correctness
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54225

llvm-svn: 347192
2018-11-19 12:03:11 +00:00
Sam Parker
e7c42dd7e2 [ARM] Remove trunc sinks in ARM CGP
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
    
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
    
This leaves sinks as being:
  - points where the value in the register is being observed, such as
    an icmp, switch or store.
  - points where value types have to match, such as calls and returns.
  - zext are included to ease the transformation and are generally
    removed later on.
    
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.

Differential Revision: https://reviews.llvm.org/D54515

llvm-svn: 347191
2018-11-19 11:34:40 +00:00
Anton Korobeynikov
4df19b75c0 [MSP430] Optimize srl/sra in case of A >> (8 + N)
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.

Path by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54623

llvm-svn: 347187
2018-11-19 10:43:02 +00:00
Craig Topper
8b22bcd39f [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.

llvm-svn: 347185
2018-11-19 07:22:26 +00:00
Craig Topper
3616891046 [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.

llvm-svn: 347181
2018-11-19 04:33:20 +00:00
Craig Topper
053f1eea96 [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180
2018-11-19 00:33:16 +00:00
Simon Pilgrim
7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Craig Topper
0468c860b7 [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.

llvm-svn: 347176
2018-11-18 21:28:50 +00:00
Simon Pilgrim
b31bdbd2e9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
2018-11-18 20:21:52 +00:00
Craig Topper
11d50948e2 [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.

llvm-svn: 347172
2018-11-18 18:11:25 +00:00
Craig Topper
bc8148f7b0 [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171
2018-11-18 17:59:28 +00:00
Simon Pilgrim
ec808cf541 Remove unused variable. NFCI.
llvm-svn: 347169
2018-11-18 17:24:59 +00:00
Simon Pilgrim
50828c75d0 [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector
Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.

llvm-svn: 347168
2018-11-18 17:15:06 +00:00
Simon Pilgrim
fec9f8657b [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.
Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts.

llvm-svn: 347162
2018-11-18 15:52:08 +00:00
Simon Pilgrim
cc1f5d2407 [X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549)
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.

llvm-svn: 347158
2018-11-18 13:34:53 +00:00
Heejin Ahn
e0f8b9bfc6 [WebAssembly] Add null streamer support
Summary: Now `llc -filetype=null` works.

Reviewers: eush

Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54660

llvm-svn: 347155
2018-11-18 11:58:47 +00:00
Craig Topper
cd94a7c227 [X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI.
I don't yet have any test cases for this, but its the right thing to do based on log file inspection.

llvm-svn: 347151
2018-11-18 08:30:09 +00:00
Craig Topper
b03f80a21c [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC
llvm-svn: 347150
2018-11-18 07:35:08 +00:00
Craig Topper
f56a57518d [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead.
The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.

llvm-svn: 347149
2018-11-18 05:53:21 +00:00
Craig Topper
0438d791fa [X86] Add support for matching PACKUSWB from a v64i8 shuffle.
llvm-svn: 347143
2018-11-17 18:54:43 +00:00
Craig Topper
dd61f11642 [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256.
llvm-svn: 347131
2018-11-17 02:36:07 +00:00
Craig Topper
b05ea28f1f [X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.
llvm-svn: 347127
2018-11-17 02:18:12 +00:00
Fangrui Song
7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Craig Topper
ee0333b4a9 [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization.
This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.

There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.

llvm-svn: 347105
2018-11-16 22:53:00 +00:00
Craig Topper
87bc07b3dd [X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization.
If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed.

llvm-svn: 347100
2018-11-16 22:04:29 +00:00
Craig Topper
567aaeb40d [X86] Remove a branch on SSE4.1 from LowerLoad
We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG.

llvm-svn: 347095
2018-11-16 21:05:00 +00:00
Craig Topper
7fff9a9aef [X86] In LowerLoad, fix assert messages and rename a variable that use Zize instead of Size. NFC
llvm-svn: 347093
2018-11-16 21:04:56 +00:00
Peter Collingbourne
527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Anton Korobeynikov
e5cb1c35b4 [MSP430] Add RTLIB::[SRL/SRA/SHL]_I32 lowering to EABI lib calls
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54626

llvm-svn: 347080
2018-11-16 19:36:15 +00:00
Rong Xu
3a38175723 [X86] Disable Condbr_merge pass
Disable Condbr_merge pass for now due to PR39658.
Will reenable the pass once the bug is fixed.

llvm-svn: 347079
2018-11-16 19:35:00 +00:00
Stefan Pintilie
9004444d81 Revert "[PowerPC] Make no-PIC default to match GCC - LLVM"
This reverts commit r347069

llvm-svn: 347076
2018-11-16 19:24:23 +00:00
Anton Korobeynikov
883c70959d [MSP430] Use R_MSP430_16_BYTE type for FK_Data_2 fixup
Linker fails to link example like this (simplified case from newlib
sources):

$ cat test.c

extern const char _ctype_b[];
struct _t { char *ptr; };
struct _t T = { ((char *) _ctype_b + 3) };
$ cat ctype.c

char _ctype_b[4] = { 0, 0, 0, 0 };
LD: test.o:(.data+0x0): warning: internal error: unsupported relocation error

We also follow gnu toolchain here, where 2-byte relocation mapped to
R_MSP430_16_BYTE, instead of R_MSP430_16.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54620

llvm-svn: 347074
2018-11-16 19:20:51 +00:00
Sam Clegg
74f5fd4e32 [WebAssembly] Default to static reloc model
Differential Revision: https://reviews.llvm.org/D54637

llvm-svn: 347073
2018-11-16 18:59:51 +00:00
Stefan Pintilie
046eff502f [PowerPC] Make no-PIC default to match GCC - LLVM
Set -fno-PIC as the default option.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 347069
2018-11-16 18:36:21 +00:00
Simon Pilgrim
66f42ea6e1 [SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI.
Prep work for PR39467.

llvm-svn: 347067
2018-11-16 17:50:59 +00:00
Simon Pilgrim
bcd6631a2a [X86][SSE] Move number of input limit out of resolveTargetShuffleInputs.
Only combineX86ShufflesRecursively needs this limit.

llvm-svn: 347054
2018-11-16 15:01:05 +00:00
Roman Lebedev
90c5b3f78e [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from X
Summary:
As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!),
we can fold the `Z` into 'control`, and let the `BEXTR` do this too.
We could just insert those 8 bits of shift amount into control,
but it is better to instead zero-extend them, and 'or' them in place.

We can only do this for `lshr`, not `ashr`, because we do not know that the mask cover only the bits of `Y`,
and not any of the sign-extended bits.

The obvious question is, is this actually legal to do?
I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
* `A START value exceeding the operand size will not extract any bits from the second source operand.`
* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
* `The destination register is cleared if no bits are extracted.`

FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases.

Reviewers: RKSimon, craig.topper, spatel, andreadb

Reviewed By: RKSimon, craig.topper, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54095

llvm-svn: 347048
2018-11-16 13:04:54 +00:00
Alex Bradbury
b4a64cede8 [RISCV][NFC] Define and use the new CA instruction format
The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a 
new compressed instruction format, RVC format CA (no actual instruction 
encodings were changed). This patch updates the RISC-V backend to define the 
new format, and to use it in the relevant instructions.

Differential Revision: https://reviews.llvm.org/D54302
Patch by Luís Marques.

llvm-svn: 347043
2018-11-16 10:33:23 +00:00
Alex Bradbury
2146e8fb1e [RISCV] Constant materialisation for RV64I
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).

test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.

Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated my loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.

Differential Revision: https://reviews.llvm.org/D52962

llvm-svn: 347042
2018-11-16 10:14:16 +00:00
Anton Korobeynikov
411773d227 [MSP430] Add support for .refsym directive
Introduces support for '.refsym' assembler directive.

From GCC docs (for MSP430):
'.refsym' - This directive instructs assembler to add an undefined reference
to the symbol following the directive. No relocation is created for this symbol;
it will exist purely for pulling in object files from archives.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54618

llvm-svn: 347041
2018-11-16 09:50:24 +00:00
Craig Topper
079c37da58 [X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening.
By early promoting the multiply to use an i16 element type we can avoid op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.

llvm-svn: 347032
2018-11-16 06:15:21 +00:00
Matt Arsenault
eabb8dd015 AMDGPU: Fix analyzeBranch failing with pseudoterminators
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.

llvm-svn: 347027
2018-11-16 05:03:02 +00:00
Craig Topper
5802b82b40 [X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply.
We aren't going to use the upper bits of the multiply result that the extend would effect. So we don't need a specific type of extend.

This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.

llvm-svn: 347011
2018-11-16 01:16:59 +00:00
Craig Topper
1acafd863f [X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC
llvm-svn: 347010
2018-11-16 01:16:51 +00:00
Ron Lieberman
cac749ac88 [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008
2018-11-16 01:13:34 +00:00
Heejin Ahn
095796a391 [WebAssembly] Split BBs after throw instructions
Summary:
`throw` instruction is a terminator in wasm, but BBs were not splitted
after `throw` instructions, causing machine instruction verifier to
fail.

This patch
- Splits BBs after `throw` instructions in WasmEHPrepare and adding an
  unreachable instruction after `throw`, which will be deleted in
  LateEHPrepare pass
- Refactors WasmEHPrepare into two member functions
- Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass
  to match that of WasmEHPrepare pass, which is newly added. Now
  `eraseBBsAndChildren` does not delete BBs with remaining predecessors.
- Fixes style nits, making static function names conform to clang-tidy
- Re-enables the test temporarily disabled by rL346840 && rL346845

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54571

llvm-svn: 347003
2018-11-16 00:47:18 +00:00
Ron Lieberman
2f5683e6b0 [AMDGPU] NFC Test commit
llvm-svn: 347002
2018-11-16 00:46:51 +00:00
Konstantin Zhuravlyov
af7b5d7092 AMDHSA: More code object v3 fixes:
- Make sure IsaInfo::hasCodeObjectV3 returns true only
    for AMDHSA
  - Update assembler metadata tests to use v2 by default

llvm-svn: 347001
2018-11-15 23:14:23 +00:00
Craig Topper
22bfa99448 [X86] Remove ANY_EXTEND special case from canReduceVMulWidth
Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says its unused.

I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation?

Differential Revision: https://reviews.llvm.org/D54596

llvm-svn: 346995
2018-11-15 21:19:32 +00:00
Craig Topper
b144c7a6fb [X86] Minor cleanup to getExtendInVec. NFCI
Use unsigned to calculate the subvector index to avoid a cast.

Remove an unnecessary condition and replace it with a stronger assert.

Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.

llvm-svn: 346983
2018-11-15 19:20:22 +00:00
Craig Topper
73bb04ab6f [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.

In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different and output VTs.

Differential Revision: https://reviews.llvm.org/D54512

llvm-svn: 346980
2018-11-15 18:59:31 +00:00
Thomas Lively
fc3163b67a [WebAssembly] Fix return type of nextByte
Summary:
The old return type did not allow for correct error reporting and was
causing a compiler warning.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54586

llvm-svn: 346979
2018-11-15 18:56:49 +00:00
Simon Pilgrim
0db8cb0147 [X86] Fix MCNullStreamer support for modules with a CodeView flag
This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag.

The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks.

Committed on behalf of @eush (Eugene Sharygin)

Differential Revision: https://reviews.llvm.org/D54008

llvm-svn: 346962
2018-11-15 15:17:15 +00:00
Alex Bradbury
f809d89980 [RISCV] Mark C.EBREAK instruction as having side effects
C.EBREAK was defined with hasSideEffects = 0, which is incorrect and 
inconsistent with the non-compressed instruction form. This patch corrects 
this oversight.

This wouldn't cause codegen issues, as compressed instructions are only ever 
generated by converting the non-compressed form as an MCInst. But having 
correct flags is still worthwhile.

Differential Revision: https://reviews.llvm.org/D54256
Patch by Luís Marques.

llvm-svn: 346959
2018-11-15 14:52:24 +00:00
Alex Bradbury
7727240438 [RISCV] Mark FREM as Expand
Mark the FREM SelectionDAG node as Expand, which is necessary in order to 
support the frem IR instruction on RISC-V. This is expanded into a library 
call. Adds the corresponding test. Previously, this would have triggered an 
assertion at instruction selection time.

Differential Revision: https://reviews.llvm.org/D54159
Patch by Luís Marques.

llvm-svn: 346958
2018-11-15 14:46:11 +00:00
Anton Korobeynikov
f0001f4186 Add missed files from prev. commit
llvm-svn: 346949
2018-11-15 12:35:04 +00:00
Anton Korobeynikov
49045c6a0d [MSP430] Add MC layer
Reapply r346374 with the fixes for modules build.

Original summary:

This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Patch by Michael Skvortsov!

llvm-svn: 346948
2018-11-15 12:29:43 +00:00
Alex Bradbury
22c091fc3c [RISCV] Introduce the RISCVMatInt::generateInstSeq helper
Logic to load 32-bit and 64-bit immediates is currently present in
RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With
the introduction of RV64 codegen, there is a greater benefit of sharing
immediate materialisation logic between the MC layer and codegen. The
generateInstSeq helper allows this by producing a vector of simple structs
representing the chosen instructions. This can then be consumed in the MC
layer to produce MCInsts or at instruction selection time to produce
appropriate SelectionDAG node. Sharing this logic means that both the li
pseudoinstruction and codegen can benefit from future optimisations, and
that this logic can be used for materialising constants during RV64 codegen.

This patch does contain a behaviour change: addi will now be produced on RV64
when no lui is necessary to materialise the constant. In that case addiw takes
x0 as the source register, so is semantically identical to addi.

Differential Revision: https://reviews.llvm.org/D52961

llvm-svn: 346937
2018-11-15 10:11:31 +00:00
Craig Topper
553ac560aa [X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization.
This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.

llvm-svn: 346936
2018-11-15 08:23:40 +00:00
Thomas Lively
77b33c86f5 [WebAssembly] Renumber SIMD bitwise instructions
Summary: Changed to match https://github.com/WebAssembly/simd/pull/54.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54561

llvm-svn: 346931
2018-11-15 03:38:59 +00:00
Konstantin Zhuravlyov
a25e0524c0 AMDGPU: Enable code object v3 for AMDHSA only
Differential Revision: https://reviews.llvm.org/D54186

llvm-svn: 346923
2018-11-15 02:32:43 +00:00
Craig Topper
ea6ced9d1a [X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization.
The narrow types end up requesting widening, but generic legalization will end up scalaring and using a build_vector to do the widening.

llvm-svn: 346916
2018-11-15 00:21:41 +00:00
Benjamin Kramer
6b7d6fe079 [X86] Remove unused variable
llvm-svn: 346909
2018-11-14 23:13:27 +00:00
Craig Topper
0b2089da4b [X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization.
On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead.

There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occuring in 64-bit mode so its not a new issue.

llvm-svn: 346908
2018-11-14 23:02:09 +00:00
Jessica Paquette
27e1754fc9 [MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.

If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.

llvm-svn: 346901
2018-11-14 22:23:38 +00:00
Nirav Dave
1241dcb3cf Bias physical register immediate assignments
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also physical
register assignments from immediate values.

This causes a reduction in reduction in overall register pressure and
minor reduction in spills and indirectly fixes an out-of-registers
assertion (PR39391).

Most test changes are from minor instruction reorderings and register
name selection changes and direct consequences of that.

Reviewers: MatzeB, qcolombet, myatsina, pcc

Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
  javed.absar, arphaman, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54218

llvm-svn: 346894
2018-11-14 21:11:53 +00:00
Aakanksha Patil
1a60116b5c AMDGPU: Additional pattern for i16 median3 matching
min(max(a, b), max(min(a, b), c))

Differential Revision: https://reviews.llvm.org/D54494

llvm-svn: 346886
2018-11-14 20:10:41 +00:00
Craig Topper
6c94264b1f [X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization
Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.

Differential Revision: https://reviews.llvm.org/D54513

llvm-svn: 346879
2018-11-14 18:16:21 +00:00
Simon Pilgrim
cdb170794b [CostModel] Add generic expansion funnel shift cost support
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.

This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.

llvm-svn: 346854
2018-11-14 12:24:50 +00:00
Simon Pilgrim
7501780ec6 [X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG
This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.

The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although its annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate....

Differential Revision: https://reviews.llvm.org/D54083

llvm-svn: 346850
2018-11-14 11:26:35 +00:00
Heejin Ahn
da419bdb5e [WebAssembly] Add support for the event section
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)

The event section is added between the global section and the export
section. This is for ease of validation per request of the V8 team.

This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support

Reviewers: dschuff, sbc100, aardappel

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54096

llvm-svn: 346825
2018-11-14 02:46:21 +00:00
Zi Xuan Wu
6a3c279d1c [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.


Differential Revision: https://reviews.llvm.org/D49531

llvm-svn: 346824
2018-11-14 02:34:45 +00:00
Jessica Paquette
4e97ec94d9 [MachineOutliner][NFC] Use flags set in all candidates to check for calls
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.

This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)

llvm-svn: 346816
2018-11-13 23:41:31 +00:00
Jessica Paquette
cad864d49e [MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.

The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.

llvm-svn: 346809
2018-11-13 23:01:34 +00:00
Jessica Paquette
b2d53c5d7d [MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there's less than that.

llvm-svn: 346803
2018-11-13 22:16:27 +00:00
Stanislav Mekhanoshin
bcb34ac2ea [AMDGPU] combine extractelement into several selects
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.

Differential Revision: https://reviews.llvm.org/D54351

llvm-svn: 346800
2018-11-13 21:18:21 +00:00
Craig Topper
aca8390216 [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.

This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.

X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.

I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements.

The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.

Differential Revision: https://reviews.llvm.org/D54346

llvm-svn: 346784
2018-11-13 19:45:21 +00:00
Sam Clegg
f98ba05f3d [WebAssembly] Fix broken assumption that all bitcasts are to functions types
Specifically, we can bitcast to void.

Fixes PR39591

Differential Revision: https://reviews.llvm.org/D54447

llvm-svn: 346778
2018-11-13 19:14:02 +00:00
Simon Pilgrim
e827fe09b3 [CostModel][X86] Fix constant vector XOP rights shifts
We'll constant fold these cases so they are as cheap as vector left shift cases.

Noticed while improving funnel shift costs.

llvm-svn: 346760
2018-11-13 16:40:10 +00:00
Simon Pilgrim
72a7fbc1a3 Fix comment for XOP rotates. NFCI.
llvm-svn: 346753
2018-11-13 12:09:27 +00:00
Alexander Richardson
4eb93907f7 Fix modules build of AVRAsmParser.cpp
Summary:
Without this change I get the following error:

lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant]

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53425

llvm-svn: 346750
2018-11-13 10:54:44 +00:00
Jonas Paulsson
f9b2b5e67e [SystemZ] Increase the number of VLREPs
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.

This patch handles this by putting the other users of the load to use the
REPLICATE 0-element instead of the load. This way the load has only the
REPLICATE node as user, and we get a VLREP.

Review: Ulrich Weigand
https://reviews.llvm.org/D54264

llvm-svn: 346746
2018-11-13 08:37:09 +00:00
Jessica Paquette
106946329d [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.

llvm-svn: 346721
2018-11-13 00:32:09 +00:00
Jessica Paquette
82d9c0a3fa [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom
Instead of returning Flags, return true if the MBB is safe to outline from.

This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.

llvm-svn: 346718
2018-11-12 23:51:32 +00:00
Simon Pilgrim
e565e5a962 [X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)
This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.

Differential Revision: https://reviews.llvm.org/D54267

llvm-svn: 346706
2018-11-12 21:12:38 +00:00
Aakanksha Patil
a992c694c6 AMDGPU: Adding more median3 patterns
min(max(a, b), max(min(a, b), c)) -> med3 a, b, c

Differential Revision: https://reviews.llvm.org/D54331

llvm-svn: 346704
2018-11-12 21:04:06 +00:00
Wouter van Oortmerssen
cc75e77df5 [WebAssembly] Added WasmAsmParser.
Summary:
This is to replace the ELFAsmParser that WebAssembly was using, which
so far was a stub that didn't do anything, and couldn't work correctly
with wasm.

This new class is there to implement generic directives related to
wasm as a binary format. Wasm target specific directives are still
parsed in WebAssemblyAsmParser as before. The two classes now
cooperate more correctly too.

Also implemented .result which was missing. Any unknown directives
will now result in errors.

Reviewers: dschuff, sbc100

Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54360

llvm-svn: 346700
2018-11-12 20:15:01 +00:00
Craig Topper
c48712b341 [X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS.
Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.

This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.

llvm-svn: 346697
2018-11-12 19:37:29 +00:00
Stanislav Mekhanoshin
e86c8d33b1 [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z
Sometimes after basic block placement we end up with a code like:

  sreg = s_mov_b64 -1
  vcc = s_and_b64 exec, sreg
  s_cbranch_vccz

This happens as a join of a block assigning -1 to a saved mask and
another block which consumes that saved mask with s_and_b64 and a
branch.

This is essentially a single s_cbranch_execz instruction when moved
into a single new basic block.

Differential Revision: https://reviews.llvm.org/D54164

llvm-svn: 346690
2018-11-12 18:48:17 +00:00
Simon Pilgrim
93c64e5c76 [CostModel][X86] Add funnel shift rotation special case costs
When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.

llvm-svn: 346688
2018-11-12 18:27:54 +00:00
Simon Pilgrim
49e93d2f0e [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs
The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level

llvm-svn: 346683
2018-11-12 17:56:59 +00:00
Simon Pilgrim
f4cd292ba2 [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
llvm-svn: 346664
2018-11-12 15:48:06 +00:00
Jonas Paulsson
5cea85dd59 [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.

This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).

Review: Ulrich Weigand
https://reviews.llvm.org/D54423

llvm-svn: 346663
2018-11-12 15:32:27 +00:00
Alex Bradbury
9c03e4cacd [RISCV] Support .option relax and .option norelax
This extends the .option support from D45864 to enable/disable the relax 
feature flag from D44886

During parsing of the relax/norelax directives, the RISCV::FeatureRelax 
feature bits of the SubtargetInfo stored in the AsmParser are updated 
appropriately to reflect whether relaxation is currently enabled in the 
parser. When an instruction is parsed, the parser checks if relaxation is 
currently enabled and if so, gets a handle to the AsmBackend and sets the 
ForceRelocs flag. The AsmBackend uses a combination of the original 
RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the 
ForceRelocs flag to determine whether to emit relocations for symbol and 
branch diffs. Diff relocations should therefore only not be emitted if the 
relax flag was not set on the command line and no instruction was ever parsed 
in a section with relaxation enabled to ensure correct diffs are emitted.

Differential Revision: https://reviews.llvm.org/D46423
Patch by Lewis Revill.

llvm-svn: 346655
2018-11-12 14:25:07 +00:00
Jonas Paulsson
c0ee028dc3 [SystemZ] Replicate the load with most uses in buildVector()
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.

Review: Ulrich Weigand
https://reviews.llvm.org/D54322

llvm-svn: 346637
2018-11-12 08:12:20 +00:00
Craig Topper
2eab39f77b [X86] Use DAG.getConstant instead of getZeroVector.
llvm-svn: 346605
2018-11-11 07:24:36 +00:00
Craig Topper
ef33a190bc [X86] Replace calls to getOnesVector/getZeroVector with getConstant.
getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.

llvm-svn: 346603
2018-11-11 01:40:04 +00:00
Sanjay Patel
0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Benjamin Kramer
37c691e867 [X86] Remove unused variable
llvm-svn: 346592
2018-11-10 18:11:11 +00:00
Craig Topper
7956a256e9 [X86] Remove apparently unneeded code from combineVSZext.
No lit tests fail with this code removed.

This is a pre-commit for D54346.

llvm-svn: 346590
2018-11-10 17:44:28 +00:00
Simon Pilgrim
d3ca710ec9 [CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615)
llvm-svn: 346589
2018-11-10 17:37:52 +00:00
Roman Lebedev
b428b8b214 [X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465)
There are two AGU units, and per 1cy, there can be either two loads,
or a load and a store; but not two stores, or two loads and a store.

Additionally, loads shouldn't affect the store scheduler and vice versa.
(but *should* affect the PdEX scheduler.)

Required rL346545.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39465

llvm-svn: 346587
2018-11-10 14:31:43 +00:00
Craig Topper
a1b6667c6a [X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem.
The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.

llvm-svn: 346581
2018-11-10 06:04:33 +00:00
Craig Topper
0364085281 [X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH.
This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems.

While there also use DAG.getConstant to create a 0 vector instead of using the helper the forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.

llvm-svn: 346574
2018-11-10 00:26:42 +00:00
Thomas Lively
936734b777 [WebAssembly] Update bleeding-edge cpu features
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D54362

llvm-svn: 346570
2018-11-10 00:11:14 +00:00
Eli Friedman
ad1151cf6a [ARM64] [Windows] Handle funclets
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.

Patch by Sanjin Sijaric, with some fixes by me.

Differential Revision: https://reviews.llvm.org/D51524

llvm-svn: 346568
2018-11-09 23:33:30 +00:00
Eli Friedman
0bbb0d0720 [ARM] Add MemOperand to LDRcp to enable DCE.
LDRcp should be deleted when the dest register is dead in register
coalescing. Without MemOp, dead LDRcp will cause dead constant pool
value which references to non-existing label.

Patch by Yin Ma.

Differential Revision: https://reviews.llvm.org/D54173

llvm-svn: 346563
2018-11-09 23:09:17 +00:00
Craig Topper
17d64c71c5 [X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx.
With avx512f but not avx512bw we need to extend to v16i32 then truncate that to to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou

This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs.

llvm-svn: 346552
2018-11-09 20:09:53 +00:00
Bryan Chan
123553921f [AArch64] Support HiSilicon's TSV110 processor
Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls

Reviewed By: kristof.beyls

Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D53908

llvm-svn: 346546
2018-11-09 19:32:08 +00:00
Fangrui Song
60b7fb46e1 [Hexagon] Fix some -Wunused-function with LLVM_DUMP_METHOD and -Wunused-variable
llvm-svn: 346543
2018-11-09 19:24:48 +00:00
Craig Topper
731ea7dbc1 [X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded.
This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND.

I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.

llvm-svn: 346539
2018-11-09 19:05:51 +00:00
Simon Pilgrim
fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Jordan Rupprecht
c1741a5a8a [Hexagon] Fix unused variable warning in release builds
llvm-svn: 346537
2018-11-09 18:54:27 +00:00
Fangrui Song
4955066366 [WebAssembly] Hotfix of WebAssemblyInstructionTableSize after rL346465
llvm-svn: 346535
2018-11-09 18:32:20 +00:00
Brendon Cahoon
ac8fed68d5 [Hexagon] Implement noreturn optimization
Eliminate the stack frame in functions with the noreturn nounwind
attributes, and when the noreturn-stack-elim target feature is
enabled. This reduces the code and stack space needed for noreturn
functions.

Differential Revision: https://reviews.llvm.org/D54210

llvm-svn: 346532
2018-11-09 18:16:24 +00:00
Stanislav Mekhanoshin
13d3371e68 [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx
This only covers AMDGPU BE, hopefully all occurrences.

Differential Revision: https://reviews.llvm.org/D54235

llvm-svn: 346528
2018-11-09 17:58:59 +00:00
Krzysztof Parzyszek
8567de0871 [Hexagon] Place globals with explicit .sdata section in small data
Both -fPIC and -G0 disable placement of globals in small data section,
but if a global has an explicit section assigmnent placing it in small
data, it should go there anyway.

llvm-svn: 346523
2018-11-09 17:31:22 +00:00
Zaara Syeda
5c179bf14b [Power9] Allow gpr callee saved spills in prologue to vectors registers
Currently in llvm, CalleeSavedInfo can only assign a callee saved register to
stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.

Differential Revision: https://reviews.llvm.org/D39386

llvm-svn: 346512
2018-11-09 16:36:24 +00:00
Alexey Bataev
93d018a916 Revert "[DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested."
This reverts commit r345972. Need to update the description + possibly
to update the patch itself after discussion with Eric Christofer.

llvm-svn: 346508
2018-11-09 16:22:35 +00:00
Jonas Paulsson
458b7c0b39 [SystemZ] Avoid inserting same value after replication
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.

Review: Ulrich Weigand
https://reviews.llvm.org/D54315

llvm-svn: 346504
2018-11-09 15:44:28 +00:00
Sam Parker
2804f32ec4 [ARM] Don't promote i1 types in ARM CGP
Now that we have mixed type sizes, i1 values need to be explicitly
handled as we want to avoid promoting these values.

Differential Revision: https://reviews.llvm.org/D54308

llvm-svn: 346499
2018-11-09 15:06:33 +00:00
Sanjay Patel
fa1c0fe478 [x86] try to form broadcast before widening shuffle elements
I noticed that we weren't generating broadcasts as much I thought we would with 
D54271, and this is part of the problem.

Widening the shuffle elements means adding bitcasts and hiding the relationship 
between a splatted scalar and the vector. If we can form a broadcast, do that 
before going through the rest of the shuffle lowering because broadcasts should 
be cheap and can often be load-folded.

Differential Revision: https://reviews.llvm.org/D54280

llvm-svn: 346498
2018-11-09 14:54:58 +00:00
Alex Bradbury
1cc2d0b9fb [RISCV] Avoid unnecessary XOR for seteq/setne 0
Differential Revision: https://reviews.llvm.org/D53492

Patch by James Clarke.

llvm-svn: 346497
2018-11-09 14:47:36 +00:00
Petar Avramovic
2cefaa2747 [MIPS GlobalISel] narrowScalar G_CONSTANT
Legalize s64 G_CONSTANT using narrowScalar on MIPS 32.

Differential Revision: https://reviews.llvm.org/D54255

llvm-svn: 346495
2018-11-09 14:21:16 +00:00
Simon Pilgrim
ea51f98b9b [X86] Add Subtarget to more lowerVectorShuffle functions. NFCI.
This will be necessary for an update to D54267

llvm-svn: 346490
2018-11-09 13:19:03 +00:00
Clement Courbet
eee2e06e2a [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Clement Courbet
e6b727e552 [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX.
Summary:
Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources.
After HSW, it also has zero latency.

This fixes PR35606.

To reproduce:
Uops:
  llvm-exegesis -mode=uops -opcode-name=VZEROUPPER
Latency:
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=-
  echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=-

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D54107

llvm-svn: 346482
2018-11-09 09:49:06 +00:00
Sam Parker
08979cd125 [ARM] Enable mixed types in ARM CGP
Previously, during the search, all values had to have the same
'TypeSize', which is equal to number of bits of the integer type of
the icmp operand. All values in the tree had to match this size;
meaning that, if we searched from i16, we wouldn't accept i8s. A
change in type size requires zext and truncs to perform the casts so,
to allow mixed narrow types, the handling of these instructions is
now slightly different:

- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.

The transformation bails on finding an icmp that operates on data
smaller than the current TypeSize.

Differential Revision: https://reviews.llvm.org/D54108

llvm-svn: 346480
2018-11-09 09:28:27 +00:00
Sam Parker
453ba916a0 [ARM] Small reorganisation in ARMParallelDSP
A few code movement things:

- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets pasted in a couple of places instead,
  including CreateParallelMACPairs so it doesn't need to return a
  value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.

Differential Revision: https://reviews.llvm.org/D54254

llvm-svn: 346479
2018-11-09 09:18:00 +00:00
Mandeep Singh Grang
397765bc51 [COFF, ARM64] Add support for MSVC buffer security check
Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D54248

llvm-svn: 346469
2018-11-09 02:48:36 +00:00
Thomas Lively
2faf079494 [WebAssembly] Read prefixed opcodes as ULEB128s
Summary: Depends on D54126.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54138

llvm-svn: 346465
2018-11-09 01:57:00 +00:00
Thomas Lively
4ddd22581e [WebAssembly][NFC] Reorder SIMD section
Summary:
Reorders the sections in the SIMD tablegen file to roughly match the
new opcode ordering. Depends on D54126.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54134

llvm-svn: 346464
2018-11-09 01:49:19 +00:00
Thomas Lively
299d214aba [WebAssembly] Renumber and LEB128-encode SIMD opcodes
Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54126

llvm-svn: 346463
2018-11-09 01:45:56 +00:00
Thomas Lively
38c902bc2e [WebAssembly] Lower select for vectors
Summary:

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53675

llvm-svn: 346462
2018-11-09 01:38:44 +00:00
Heejin Ahn
0c68a875fa [WebAssembly] Fix LowerEmscriptenEHSjLj when there's only longjmp
Summary:
The pass incorrectly assumed if there's a longjmp declaration in the
module, there is also a setjmp function declaration. Fixed it, and now
the pass only converts longjmp and does not do any other transformation
when there's no setjmp declaration in the module.

Fixes PR39562.

Reviewers: jgravelle-google, sbc100

Subscribers: dschuff, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54273

llvm-svn: 346445
2018-11-08 22:56:26 +00:00
Sanjay Patel
b5535dc7b3 [x86] use shuffles for scalar insertion into high elements of a constant vector
As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly.

Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:

// If the vector is wider than 128 bits, extract the 128-bit subvector, insert
// into that, and then insert the subvector back into the result.

...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.

Differential Revision: https://reviews.llvm.org/D54271

llvm-svn: 346433
2018-11-08 19:16:27 +00:00
Davide Italiano
ac8279ab8b Revert "[MSP430] Add MC layer"
This commit broke the module buildbots.
Error:

lib/Target/MSP430/MSP430GenAsmMatcher.inc:1027:1: error: redundant
namespace 'llvm' [-Wmodules-import-nested-redundant]
^

llvm-svn: 346410
2018-11-08 16:21:29 +00:00
Jonas Paulsson
1993894c03 [SystemZ] Bugfix in shouldCoalesce()
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.

This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.

The original test case included as a reduced .mir test case.

Review: Ulrich Weigand
https://reviews.llvm.org/D54197

llvm-svn: 346406
2018-11-08 15:29:48 +00:00
Petr Pavlu
7c84b2e3ab [ARM] Enable spilling of the hGPR register class in Thumb2
Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and
loadRegToStackSlot() to allow the GPR class or any of its sub-classes
(including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12.

Differential Revision: https://reviews.llvm.org/D51927

llvm-svn: 346401
2018-11-08 13:02:10 +00:00
Anton Korobeynikov
5eb3d339d3 [MSP430] Fix encodeInstruction() for big endian hosts
Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54251

llvm-svn: 346391
2018-11-08 10:17:52 +00:00
Thomas Lively
897171902b [WebAssembly] Add V128 to WebAssemblyInstrInfo::copyPhysReg
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53872

llvm-svn: 346384
2018-11-08 02:35:28 +00:00
Stanislav Mekhanoshin
6cc8b2fc65 [AMDGPU] Extend promote alloca vectorization
Promote alloca can vectorize a small array by bitcasting it to a
vector type. Extend vectorization for the case when alloca is
already a vector type. We still want to replace GEPs with an
insert/extract element instructions in this case.

Differential Revision: https://reviews.llvm.org/D54219

llvm-svn: 346376
2018-11-08 00:16:23 +00:00
Anton Korobeynikov
09dff53840 [MSP430] Add MC layer
Summary:
This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA.  Also, more instruction forms are added
to the target description.

Reviewers: asl

Reviewed By: asl

Subscribers: pftbest, krisb, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D53661

llvm-svn: 346374
2018-11-08 00:03:45 +00:00
Eli Friedman
0917d0c80c [AArch64] [Windows] Address post-commit review comment on r346358.
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.

llvm-svn: 346366
2018-11-07 22:30:56 +00:00
Nicolai Haehnle
bc233f5523 Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

llvm-svn: 346364
2018-11-07 21:53:43 +00:00
Nicolai Haehnle
61396ff67c AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI)
Summary: Remove redundant logic and simplify control flow.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54086

llvm-svn: 346363
2018-11-07 21:53:36 +00:00
Nicolai Haehnle
0ab31c9c44 AMDGPU/InsertWaitcnts: Remove kill-related logic
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.

Besides, this was always checking for the wrong kill opcode anyway...

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54085

llvm-svn: 346362
2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov
15e90e331c AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses
llvm-svn: 346361
2018-11-07 21:42:13 +00:00
Eli Friedman
d00fb2e0a8 [AArch64] [Windows] Trap after noreturn calls.
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.

Differential Revision: https://reviews.llvm.org/D54129

llvm-svn: 346358
2018-11-07 21:31:14 +00:00
Konstantin Zhuravlyov
7f1959ebb3 AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses
llvm-svn: 346357
2018-11-07 21:21:32 +00:00
Eli Friedman
7d7d41debc [ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering.
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions.  This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.

For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)

The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into common helper,
but I have no idea where to put it.

Differential Revision: https://reviews.llvm.org/D54192

llvm-svn: 346355
2018-11-07 21:08:13 +00:00
Matt Arsenault
8ba740a5a8 Allow subclassing ExternalAA
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test pointlessly running opt -O3 when it really
just wants to run the one analysis.

Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.

llvm-svn: 346353
2018-11-07 20:26:42 +00:00
Than McIntosh
5bcdea5118 [X86] improve split-stack machine BB placement
Summary:
The conditional branch created to support -fsplit-stack for X86 is
left unbiased/unhinted, resulting in less than ideal block placement:
the __morestack call block is kept on the main hot path. Bias the
branch to insure that the stack allocation block is treated as a
"cold" block during machine basic block placement.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54123

llvm-svn: 346336
2018-11-07 17:41:57 +00:00
Sanjay Patel
de58e93666 fix typos aggressively; NFC
llvm-svn: 346316
2018-11-07 14:35:36 +00:00
Andrea Di Biagio
4ae974e745 [X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI
llvm-svn: 346309
2018-11-07 12:26:00 +00:00
Petar Avramovic
2624c8db68 [MIPS GlobalISel] Set operand order for G_MERGE and G_UNMERGE
Set operands order for G_MERGE_VALUES and G_UNMERGE_VALUES so
that least significant bits always go first, regardless of endianness.

Differential Revision: https://reviews.llvm.org/D54098

llvm-svn: 346305
2018-11-07 11:45:43 +00:00
Evandro Menezes
f1a0d93b1d [PATCH] [AArch64] Refactor helper functions (NFC)
Refactor helper functions in AArch64InstrInfo to be static methods.

llvm-svn: 346273
2018-11-06 22:17:14 +00:00
Yaxun Liu
73bf0af32f AMDGPU: Add an option -disable-promote-alloca-to-lds
Add this option for debugging and providing workaround.

By default it is off so no behavior change in backend.

Differential Revision: https://reviews.llvm.org/D54158

llvm-svn: 346267
2018-11-06 21:28:17 +00:00
Craig Topper
6428a2cd9a [X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization.
llvm-svn: 346259
2018-11-06 19:24:21 +00:00
Matthias Braun
c6613879ce LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysRegs. The later is only 16bits and
saves us a bit of memory.

llvm-svn: 346254
2018-11-06 19:00:11 +00:00
Simon Atanasyan
bb36aea1d5 [mips] Support sigrie instruction
The `sigrie` instruction signals a Reserved Instruction Exception.
This patch adds support for assembling / disassembling the instruction.

Differential Revision: http://reviews.llvm.org/D53861

llvm-svn: 346230
2018-11-06 14:37:24 +00:00
Clement Courbet
54a1184fff [X86][NFC] Fix comment.
llvm-svn: 346226
2018-11-06 13:48:56 +00:00
Matthias Braun
96d12513a1 AArch64: Cleanup CCMP code; NFC
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
  (it won't accept arbitrary disjunctions and is really about whether we
   can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`

llvm-svn: 346203
2018-11-06 03:15:22 +00:00
Sam Clegg
5292d17ec8 Revert "[WebAssembly] Fixup main signature by default"
This reverts rL345880.  It caused some test failures on the
webassembly waterfall.  e.g. binaryen2.test_mainenv fails due
the fact that `envp` ends up being undef rather than 0.

Differential Revision: https://reviews.llvm.org/D54117

llvm-svn: 346187
2018-11-06 00:31:02 +00:00
Matthias Braun
7a75a91b5b MachineFunction: Store more specific reference to LLVMTargetMachine; NFC
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.

Do the same for references in ScheduleDAG and RegUsageInfoCollector.

llvm-svn: 346183
2018-11-05 23:49:14 +00:00
Craig Topper
0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Konstantin Zhuravlyov
108927b944 AMDGPU: Add sram-ecc feature
Differential Revision: https://reviews.llvm.org/D53222

llvm-svn: 346177
2018-11-05 22:44:19 +00:00
Craig Topper
def82a81af [X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead.
SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel.

I don't have a reduced test case for such an infinite loop yet.

llvm-svn: 346170
2018-11-05 22:08:17 +00:00
Zaara Syeda
7509880b54 [Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be

This patch adds support for these.

Differential Revision: https://reviews.llvm.org/D53581

llvm-svn: 346148
2018-11-05 17:31:26 +00:00
Stefan Maksimovic
8d7c351799 [Mips] Supplement long branch pseudo instructions
Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo
instructions by creating variants which support
less operands/accept GPR64Opnds as their operand in order
to appease the machine verifier pass.

Differential Revision: https://reviews.llvm.org/D53977

llvm-svn: 346133
2018-11-05 14:37:41 +00:00
Neil Henning
233a02d0ed [AMDGPU] Fix the new atomic optimizer in pixel shaders.
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.

To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.

I've added a test case that can cause derivatives, and exposes the
problem.

Differential Revision: https://reviews.llvm.org/D53930

llvm-svn: 346128
2018-11-05 12:04:48 +00:00
Sam Parker
fec793c98f [ARM] Turn assert into condition in ARMCGP
Turn the assert in PrepareConstants into a conditon so that we can
handle mul instructions with negative immediates.

Differential Revision: https://reviews.llvm.org/D54094

llvm-svn: 346126
2018-11-05 11:26:04 +00:00
Sam Parker
fcd8adab30 [ARM][ARMCGP] Remove unecessary zexts and truncs
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.

We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)

This means that we can no longer treat a value, that is only used by
a sink, to be safe to promote.

I've also added in some extra asserts and replaced a cast for a
dyn_cast.

Differential Revision: https://reviews.llvm.org/D54032

llvm-svn: 346125
2018-11-05 10:58:37 +00:00
Dylan McKay
4c5a5c8db6 [AVR] Fix a backend bug that left extraneous operands after expansion
This patch fixes a bug in the AVR FRMIDX expansion logic.

The expansion would leave a leftover operand from the original FRMIDX,
but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction
did not expect this operand and so LLVM rejected the machine
instruction.

This would trigger an assertion:

    Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() ||
                        OpNo < MCID->getNumOperands() || isMetaDataOp) &&
                        "Trying to add an operand to a machine instr that is already done!"),
    function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp

Tim fixed this so that now the FRMIDX is expanded correctly into
a well-formed MOVWRdRr.

Patch by Tim Neumann

llvm-svn: 346117
2018-11-05 05:49:04 +00:00
Craig Topper
30b627e5c9 [X86] Custom type legalize v2i8/v2i16/v2i32 mul to use to pmuludq.
v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about.

llvm-svn: 346115
2018-11-05 05:02:12 +00:00
Dylan McKay
9a9ae99b30 [AVR] Disallow the LDDWRdPtrQ instruction with Z as the destination
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.

The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.

When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:

  LLVM ERROR: ran out of registers during register allocation

This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.

Here is a description from the patch author Peter Nimmervoll

  As far as I understand the problem occurs when LDDWRdPtrQ uses
  the ptrdispregs register class as target register. This should work, but
  the allocator can't deal with this for some reason. So from my testing,
  it seams like (and I might be totally wrong on this) the allocator reserves
  the Z register for the ICALL instruction and then the register class
  ptrdispregs only has 1 register left and we can't use Y for source and
  destination. Removing the Z register from DREGS fixes the problem but
  removing Y register does not.

More information about the bug can be found on the avr-rust issue
tracker at https://github.com/avr-rust/rust/issues/37.

A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.

Patch by Peter Nimmervoll

llvm-svn: 346114
2018-11-05 05:00:44 +00:00
Craig Topper
ed6a0a817f [X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode.
Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54069

llvm-svn: 346102
2018-11-04 17:31:27 +00:00
Craig Topper
1ba86188cf [SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode.
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.

The rest of the patch is just changing all callers to use getNode directly.

llvm-svn: 346087
2018-11-04 02:10:18 +00:00
Craig Topper
7aed9e600b [X86] Update comment I forgot to change in r346043. NFC
llvm-svn: 346073
2018-11-03 19:49:13 +00:00
Reid Kleckner
2bcb288ade [codeview] Let the X86 backend tell us the VFRAME offset adjustment
Use MachineFrameInfo's OffsetAdjustment field to pass this information
from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for
any other purpose.

This fixes PR38857 in the case where there is a non-aligned quantity of
CSRs and a non-aligned quantity of locals.

llvm-svn: 346062
2018-11-03 00:41:52 +00:00
Craig Topper
f7108aef14 [X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.

llvm-svn: 346050
2018-11-02 22:48:02 +00:00
Wouter van Oortmerssen
de28b5d17f [WebAssembly] Parsing missing directives to produce valid .o
Summary:
The assembler was able to assemble and then dump back to .s, but
was failing to parse certain directives necessary for valid .o
output:
- .type directives are now recognized to distinguish function symbols
  and others.
- .size is now parsed to provide function size.
- .globaltype (introduced in https://reviews.llvm.org/D54012) is now
  recognized to ensure symbols like __stack_pointer have a proper type
  set for both .s and .o output.

Also added tests for the above.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish

Differential Revision: https://reviews.llvm.org/D53842

llvm-svn: 346047
2018-11-02 22:04:33 +00:00
Craig Topper
60c202a494 [X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.

I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.

I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.

For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.

Differential Revision: https://reviews.llvm.org/D54024

llvm-svn: 346043
2018-11-02 21:09:49 +00:00
Alex Bradbury
52c27785ce [RISCV] Add some missing expansions for floating-point intrinsics
A number of intrinsics, such as llvm.sin.f32, would result in a failure to 
select. This patch adds expansions for the relevant selection DAG nodes, as 
well as exhaustive testing for all f32 and f64 intrinsics.

The codegen for FMA remains a TODO item, pending support for the various 
RISC-V FMA instruction variants.

The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending 
upstream support for target-independent expansion, as discussed in 
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html.

Differential Revision: https://reviews.llvm.org/D54034
Patch by Luís Marques.

llvm-svn: 346034
2018-11-02 19:50:38 +00:00
Heejin Ahn
5b023e07ea [WebAssembly] Fix bugs in rethrow depth counting and InstPrinter
Summary:
EH stack depth is incremented at `try` and decremented at `catch`. When
there are more than two catch instructions for a try instruction, we
shouldn't count non-first catches when calculating EH stack depths.

This patch fixes two bugs:
- CFGStackify: Exclude `catch_all` in the terminate catch pad when
  calculating EH pad stack, because when we have multiple catches for a
  try we should count only the first catch instruction when calculating
  EH pad stack.
- InstPrinter: The initial intention was also to exclude non-first
  catches, but it didn't account nested try-catches, so it failed on
  this case:
```
try
  try
  catch
  end
catch    <-- (1)
end
```
In the example, when we are at the catch (1), the last seen EH
instruction is not `try` but `end_try`, violating the wrong assumption.

We don't need these after we switch to the second proposal because there
is gonna be only one `catch` instruction. But anyway before then these
bugfixes are necessary for keep trunk in working state.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53819

llvm-svn: 346029
2018-11-02 18:38:52 +00:00
Matthias Braun
5f7cb79e94 ARMExpandPseudoInsts: Fix CMP_SWAP expansion adding a kill flag to a def
llvm-svn: 346026
2018-11-02 18:22:15 +00:00
Jonas Paulsson
cced2a2775 [SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.

Since the load already does the extension, there is no extra cost (previously
returned 2).

Review: Ulrich Weigand
https://reviews.llvm.org/D54028

llvm-svn: 346009
2018-11-02 17:53:31 +00:00
Sylvestre Ledru
df92dabaef Fixed inclusion of M_PI fow MinGW-w64
Patch by KOLANICH

llvm-svn: 346000
2018-11-02 17:25:40 +00:00
Jonas Paulsson
79f2441eee [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

llvm-svn: 345998
2018-11-02 17:15:36 +00:00
Krzysztof Parzyszek
f070544f8e [Hexagon] Do not reduce load size for globals in small-data
Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled
offset. For a load of a value of type T, the small-data area is
equivalent to an array "T sdata[65536]". This implies that objects
of smaller sizes need to be closer to the beginning of sdata,
while larger objects may be farther away, or otherwise the offset
may be insufficient to reach it. Similarly, an object of a larger
size should not be accessed via a load of a smaller size.

llvm-svn: 345975
2018-11-02 14:17:47 +00:00
Alexey Bataev
8831ef7a16 [DEBUGINFO, NVPTX]DO not emit ',debug' option if no debug info or only debug directives are requested.
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.

Reviewers: probinson, echristo, dblaikie

Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl

Differential Revision: https://reviews.llvm.org/D46061

llvm-svn: 345972
2018-11-02 13:47:47 +00:00
Neil Henning
7d1b77df57 [AMDGPU] UBSan bug fix for r345710
UBSan detected an error in our ISelLowering that is exposed only when
you have a dmask == 0x1. Fix this by adding in an explicit check to
ensure we don't do the UBSan detected shl << 32.

llvm-svn: 345962
2018-11-02 10:24:57 +00:00
Matt Arsenault
8e0269ba0b AMDGPU: Fix assertion with bitcast from i64 constant to v4i16
llvm-svn: 345922
2018-11-02 02:43:55 +00:00
Wouter van Oortmerssen
3231e518a3 [WebAssembly] Added a .globaltype directive to .s output.
Summary:
Assembly output can use globals like __stack_pointer implicitly,
but has no way of indicating the type of such a global, which makes
it hard for tools processing it (such as the MC Assembler) to
reconstruct this information.

The improved assembler directives parsing (in progress in
https://reviews.llvm.org/D53842) will make use of this information.

Also deleted code for the .import_global directive which was unused.

New test case in userstack.ll

Reviewers: dschuff, sbc100

Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54012

llvm-svn: 345917
2018-11-02 00:45:00 +00:00
Thomas Lively
b2382c8bf7 [WebAssembly] General vector shift lowering
Summary: Adds support for lowering non-splat shifts.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53625

llvm-svn: 345916
2018-11-02 00:39:57 +00:00
Thomas Lively
fb84fd7c8e [WebAssembly] Expand inserts and extracts with variable indices
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53964

llvm-svn: 345913
2018-11-02 00:06:56 +00:00
Mandeep Singh Grang
547a0d765a [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
2018-11-01 23:22:25 +00:00
Farhana Aleen
5853762e5a [AMDGPU] Handle the idot8 pattern generated by FE.
Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge
         increase in the compile time. Support the pattern that clang FE generates after reordering the
         additions in integer-dot8 source language pattern.

Author: FarhanaAleen

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D53937

llvm-svn: 345902
2018-11-01 22:48:19 +00:00
Mandeep Singh Grang
df19e57a1c [COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic
Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53962

llvm-svn: 345892
2018-11-01 21:23:47 +00:00
Heejin Ahn
2e398976ba [WebAssembly] Fix signature parsing for 'try' in AsmParser
Summary:
Like `block` or `loop`, `try` can take an optional signature which can
be omitted. This patch allows `try`'s signature to be omitted. Also
added some tests for EH instructions.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53873

llvm-svn: 345888
2018-11-01 20:32:15 +00:00
Reid Kleckner
4af6025f09 [Hexagon] Remove unintended fallthrough from MC duplex code
I added these annotations in r345878 because I wasn't sure if the
fallthrough was intended. Krzysztof Parzyszek confirmed that they should
be breaks, so that's what this patch does.

Reviewers: kparzysz

Differential Revision: https://reviews.llvm.org/D53991

llvm-svn: 345883
2018-11-01 19:59:27 +00:00
Reid Kleckner
4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
Sam Clegg
ddf049869a [WebAssembly] Fixup main signature by default
Differential Revision: https://reviews.llvm.org/D53396

llvm-svn: 345880
2018-11-01 19:38:44 +00:00
Reid Kleckner
bebc53f838 Annotate possibly unintended fallthroughs in Hexagon MC code, NFC
Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC
does not warn when a case body that ends in a switch falls through to a
case label of an outer switch.

It's not clear if these fall throughs are truly intended.  The Hexagon
tests pass regardless of whether these case blocks fall through or
break.

For now, I have applied the intended fallthrough annotation macro with a
FIXME comment to unblock enabling the warning. I will send a follow-up
patch that converts them to breaks to the Hexagon maintainers.

llvm-svn: 345878
2018-11-01 19:32:04 +00:00
Volkan Keles
0a8dc9eb0f [GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.

Reviewers: dsanders, aemerson, aditya_nandakumar

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53734

llvm-svn: 345875
2018-11-01 19:01:53 +00:00
Simon Pilgrim
b34a052852 [LegalizeDAG] Add generic vector CTPOP expansion (PR32655)
This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.

Differential Revision: https://reviews.llvm.org/D53258

llvm-svn: 345869
2018-11-01 18:22:11 +00:00
Reid Kleckner
ba982b5f8f [Hexagon] Fix MO_JumpTable const extender conversion
Previously this case fell through to unreachable, so it is clearly not
covered by any test case in LLVM. It may be dynamically unreachable, in
fact. However, if it were to run, this is what it would logically do.
The assert suggests that the intended behavior was not to allow folding
offsets from jump table indices, which makes sense.

llvm-svn: 345868
2018-11-01 18:14:45 +00:00
Reid Kleckner
eb56894a4b [AArch64] Fix unintended fallthrough and strengthen cast
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.

This fallthrough was bening because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.

llvm-svn: 345864
2018-11-01 18:02:27 +00:00
Mandeep Singh Grang
b0cdf56dd7 Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"
This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.

llvm-svn: 345863
2018-11-01 17:53:57 +00:00
Sam Parker
48fbf752b0 [ARM] Attempt to fix ppc64be buildbot
llvm-svn: 345850
2018-11-01 16:44:45 +00:00
Sam Parker
84a2f8b364 [ARM][CGP] Negative constant operand handling
While mutating instructions, we sign extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.

This is a relatively simple fix, but I've taken the time to
reorganise the code a bit - mainly that instructions that can be
promoted are cached and splitting up the Mutate function.

Differential Revision: https://reviews.llvm.org/D53972

llvm-svn: 345840
2018-11-01 15:23:42 +00:00
Simon Pilgrim
d5d7224355 [X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI)
The function isn't SLM specific (its driven by the FeatureSlowLEA flag).

Minor tidyup prior to PR38225.

llvm-svn: 345836
2018-11-01 14:57:07 +00:00
Aleksandar Beserminji
b9c840c9f0 [mips][micromips] Fix JmpLink to TargetExternalSymbol
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
wrong JALR16_MM is selected. This patch adds missing pattern for
JmpLink, so that JAL instruction is selected.

Differential Revision: https://reviews.llvm.org/D53366

llvm-svn: 345830
2018-11-01 13:57:54 +00:00
Chad Rosier
1546efd4a7 [AArch64] Add support for ARMv8.4 in Saphira.
llvm-svn: 345827
2018-11-01 13:45:16 +00:00
Simon Pilgrim
1f0a8421ad [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied)
Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed. 

This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.

llvm-svn: 345824
2018-11-01 11:52:09 +00:00
Stefan Maksimovic
cd0c50e3d2 [Mips] Conditionally remove successor block
In MipsBranchExpansion::splitMBB, upon splitting
a block with two direct branches, remove the successor
of the newly created block (which inherits successors from
the original block) which is pointed to by the last
branch in the original block only if the targets of two
branches differ.

This is to fix the failing test when ran with
-verify-machineinstrs enabled.

Differential Revision: https://reviews.llvm.org/D53756

llvm-svn: 345821
2018-11-01 10:10:42 +00:00
Jonas Paulsson
6749c24f40 [SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D53924

llvm-svn: 345818
2018-11-01 09:05:32 +00:00
Jonas Paulsson
f15a53bc81 [SystemZ::TTI] Accurate costs for i1->double vector conversions
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.

Review: Ulrich Weigand
https://reviews.llvm.org/D53923

llvm-svn: 345817
2018-11-01 09:01:51 +00:00
Li Jia He
03170a904f [PowerPC] Support constraint 'wi' in asm
From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D53265

llvm-svn: 345810
2018-11-01 02:35:17 +00:00
Matthias Braun
a9f900561e X86: Consistently declare pass initializers in X86.h; NFC
This avoids declaring them twice: in X86TargetMachine.cpp and the file
implementing the pass.

llvm-svn: 345801
2018-11-01 00:38:01 +00:00
Thomas Lively
d4891a1b7a [WebAssembly] Lower vselect
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53630

llvm-svn: 345797
2018-11-01 00:01:02 +00:00
Thomas Lively
b61232eacd [WebAssembly] Process p2align operands for SIMD loads and stores
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53886

llvm-svn: 345795
2018-10-31 23:58:20 +00:00
Thomas Lively
6ff31fe34d [WebAssembly] Handle vector IMPLICIT_DEFs.
Summary:
Also reduce the test case for implicit defs and test it with all
register classes.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53855

llvm-svn: 345794
2018-10-31 23:50:53 +00:00
Mandeep Singh Grang
88ad9ac720 [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53673

llvm-svn: 345791
2018-10-31 23:16:20 +00:00
Evandro Menezes
3a06c46470 [AArch64] Sort switch cases (NFC)
llvm-svn: 345786
2018-10-31 21:56:49 +00:00
Craig Topper
6c3f1692c8 Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"
Google is reporting regressions on some benchmarks.

llvm-svn: 345785
2018-10-31 21:53:24 +00:00
Eli Friedman
063fd98bcc [ARM] Add missing pseudo-instruction for Thumb1 RSBS.
Shows up rarely for 64-bit arithmetic, more frequently for the compare
patterns added in r325323.

Differential Revision: https://reviews.llvm.org/D53848

llvm-svn: 345782
2018-10-31 21:45:48 +00:00
Stanislav Mekhanoshin
222e9c11f7 Check shouldReduceLoadWidth from SimplifySetCC
SimplifySetCC could shrink a load without checking for
profitability or legality of such shink with a target.

Added checks to prevent shrinking of aligned scalar loads
in AMDGPU below dword as scalar engine does not support it.

Differential Revision: https://reviews.llvm.org/D53846

llvm-svn: 345778
2018-10-31 21:24:30 +00:00
Scott Linder
c6c627253d [AMDGPU] Remove FeatureVGPRSpilling
This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.

Remove the feature and add a test for compilation of shaders at OptNone.

Differential Revision: https://reviews.llvm.org/D53829

llvm-svn: 345763
2018-10-31 18:54:06 +00:00
Krzysztof Parzyszek
977a1fe507 [Hexagon] Make sure not to use GP-relative addressing with PIC
Make sure that -relocation-model=pic prevents use of GP-relative
addressing modes.

llvm-svn: 345731
2018-10-31 15:54:31 +00:00
Nicolai Haehnle
814abb59df AMDGPU: Rewrite SILowerI1Copies to always stay on SALU
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.

Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.

This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".

There are still some relevant cases where code quality could be
improved, in particular:

- We often introduce redundant masks with EXEC. Ideally, we'd
  have a generic computeKnownBits-like analysis to determine
  whether masks are already masked by EXEC, so we can avoid this
  masking both here and when lowering uniform control flow.

- The criterion we use to determine whether a def is observed
  from outside a loop is conservative: it doesn't check whether
  (loop) branch conditions are uniform.

Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53496

llvm-svn: 345719
2018-10-31 13:27:08 +00:00
Nicolai Haehnle
28212cc689 AMDGPU: Remove PHI loop condition optimization
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.

But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.

(This does actually regress code quality in a few places because it
 ends up relying more heavily on phi's of i1, which we don't do a
 great job with. However, since it fixes real bugs in the wild, we
 should take this change. I have some prototype changes to improve
 i1 lowering in general -- not just for control flow -- which should
 help recover the code quality, I just need to make those changes
 fit for general consumption. -- Nicolai)

Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53359

llvm-svn: 345718
2018-10-31 13:26:48 +00:00
Andrea Di Biagio
3d2b7176fc [tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:

  Foo(MI->getOperand(0).getImm()) == ExpectedVal;

Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.

While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.

With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.

This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).

This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.

Differential Revision: https://reviews.llvm.org/D53880

llvm-svn: 345714
2018-10-31 12:28:05 +00:00
Neil Henning
63718b214a [AMDGPU] support image load/store a16
Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).

Fix our isel lowering and add test cases to prove it out.

Differential Revision: https://reviews.llvm.org/D53750

llvm-svn: 345710
2018-10-31 10:34:48 +00:00
Dorit Nuzman
34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Sanjin Sijaric
fadebc8aae [ARM64] [Windows] Exception handling support in frame lowering
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue.  These are used by the MCLayer to
populate the .xdata section.

Differential Revision: https://reviews.llvm.org/D50288

llvm-svn: 345701
2018-10-31 09:27:01 +00:00
Martin Storsjo
315357faca [AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk
This is similar to SVN r311061 for ARM.

Differential Revision: https://reviews.llvm.org/D53878

llvm-svn: 345698
2018-10-31 08:14:09 +00:00
Konstantin Zhuravlyov
2d22d24ac4 Revert r345542: AMDGPU: Enable code object v3 by default
It breaks mesa.

llvm-svn: 345662
2018-10-30 22:02:40 +00:00
Mandeep Singh Grang
71e0cc2a0b [COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg
Summary:
    Thunk functions in Windows are varag functions that call a musttail function
    to pass the arguments after the fixup is done.  We need to make sure that we
    forward the arguments from the caller vararg to the callee vararg function.
    This is the same mechanism that is used for Windows on X86.

Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53843

llvm-svn: 345641
2018-10-30 20:46:10 +00:00
Eli Friedman
93d0129b78 [AArch64] [Windows] SEH opcodes should be scheduling boundaries.
Prevents the post-RA scheduler from modifying the prologue sequences
emitting by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.

isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.

Differential Revision: https://reviews.llvm.org/D53851

llvm-svn: 345634
2018-10-30 19:24:51 +00:00
David Greene
3e89fa8e08 [AArch64] Create proper memoperand for multi-vector stores
Re-apply r345315 with testcase fixes.

Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345631
2018-10-30 19:17:51 +00:00
Craig Topper
6958b5ffa9 [X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size.
The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors.

This patch switchs to using the concat_vectors input element count directly instead.

Differential Revision: https://reviews.llvm.org/D53823

llvm-svn: 345626
2018-10-30 18:48:42 +00:00
Ulrich Weigand
c5854b0adb [SystemZ] Simplify LRV/STRV ISD nodes
The LRV and STRV nodes carry an extra operand to indicate the
type of the memory access.  This is redundant, since the nodes
are actually of class MemIntrinsicNode and therefore hold that
same information already as MemoryVT.

NFC intended.

llvm-svn: 345618
2018-10-30 18:20:59 +00:00
Jonas Paulsson
af8e036c29 [SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.

Review: Ulrich Weigand
https://reviews.llvm.org/D53791

llvm-svn: 345596
2018-10-30 13:41:03 +00:00
Francis Visoiu Mistrih
0e237d357e [X86] Re-enable the machine verifier after fixing more tests
Was disabled again in r345528. Hopefully this the bots.

llvm-svn: 345593
2018-10-30 12:20:17 +00:00
Roman Lebedev
b3a14208ac [X86][BMI1] X86DAGToDAGISel: select BEXTR from x & (-1 >> (32 - y)) pattern
Summary:
The final pattern.
There is no test changes:
* We are looking for the pattern with one-use of it's mask,
* If the mask is one-use, D48768 will unfold it into pattern d.
* Thus, the tests have extra-use on the mask.
* Thus, only the BMI2 BZHI can be tested, and it already worked.
* So there is no BMI1 test coverage, we just assume it works since it uses the same codepath.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53575

llvm-svn: 345584
2018-10-30 11:12:34 +00:00
Diogo N. Sampaio
a3783b2ac2 [AArch64] Add support for UDF instruction
Summary: Add support for AArch64 UDF instruction.
UDF - Permanently Undefined generates an Undefined
Instruction exception (ESR_ELx.EC = 0b000000).

Reviewers: DavidSpickett, javed.absar, t.p.northover 

Reviewed By: javed.absar

Subscribers: nhaehnle, kristof.beyls

Differential Revision: https://reviews.llvm.org/D53319

llvm-svn: 345581
2018-10-30 11:06:50 +00:00
Simon Pilgrim
858303b827 [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.

This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.	
	
Differential Revision: https://reviews.llvm.org/D53760

llvm-svn: 345578
2018-10-30 10:32:11 +00:00
Craig Topper
b293322cee [LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened
Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86.

Reviewers: efriedma, RKSimon

Reviewed By: efriedma

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53229

llvm-svn: 345567
2018-10-30 03:27:15 +00:00
Craig Topper
67c2878501 [X86] Cleanup the code in LowerFABSorFNEG and LowerFCOPYSIGN a little. NFC
Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask.

llvm-svn: 345565
2018-10-30 03:27:12 +00:00
Craig Topper
676d7a7a43 [X86] Stop changing f128 fand/for/fxor to v2i64.
The additional patterns don't cost us much and it seems better than changing element widths.

llvm-svn: 345564
2018-10-30 03:27:11 +00:00
Matt Arsenault
abc4f29f9c AMDGPU: Remove custom BUILD_VECTOR combine
This was looping in a testcase and removing it
now slightly improves a test.

llvm-svn: 345560
2018-10-30 01:37:59 +00:00
Matt Arsenault
b0b741efb8 AMDGPU: Use scavengeRegisterBackwards
llvm-svn: 345559
2018-10-30 01:33:14 +00:00
Reid Kleckner
23c9efc071 Remove unneeded friend declarations that clang-cl warns on
llvm-svn: 345549
2018-10-29 22:38:13 +00:00
Konstantin Zhuravlyov
5cb950200c AMDGPU: Enable code object v3 by default
Differential Revision: https://reviews.llvm.org/D53525

llvm-svn: 345542
2018-10-29 21:07:27 +00:00
Simon Pilgrim
090a444cb7 [X86] Set isMachineVerifierClean() back to false (PR27481)
Put back the isMachineVerifierClean() override removed at rL345513 to fix Windows ThinLTO tests

llvm-svn: 345528
2018-10-29 19:51:52 +00:00
Thomas Lively
eb15d00193 [WebAssembly] Lower away condition truncations for scalar selects
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53676

llvm-svn: 345521
2018-10-29 18:38:12 +00:00
Simon Pilgrim
3a2f3c2c0a [X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors
Part of the issue discovered in PR39483, although its not fully exposed until I reapply rL345395 (by reverting rL345451)

llvm-svn: 345520
2018-10-29 18:25:48 +00:00
Craig Topper
220fd33522 [X86] Add AES to KNL CPUs to match clang.
I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB.

llvm-svn: 345519
2018-10-29 18:17:01 +00:00
Stanislav Mekhanoshin
6b1c6548bd [AMDGPU] Fixed return value causing warning and regression
llvm-svn: 345518
2018-10-29 17:53:23 +00:00
Bryan Chan
bfd32d4377 [AArch64] Rename FP16FML instruction format (NFC)
Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow
usual naming convention, and add some comments in the .td files.

llvm-svn: 345515
2018-10-29 17:27:34 +00:00
Stanislav Mekhanoshin
79080ecd82 [AMDGPU] Match v_swap_b32
Differential Revision: https://reviews.llvm.org/D52677

llvm-svn: 345514
2018-10-29 17:26:01 +00:00
Francis Visoiu Mistrih
61c9de7565 [X86] Enable the MachineVerifier by default
The machine verifier was disabled for x86 by default. There are now only
9 tests failing, compared to what previously was between 20 and 30.

This is a good opportunity to file bugs for all the remaining issues,
then explicitly disable the failing tests and enabling the machine
verifier by default.

This allows us to avoid adding new tests that break the verifier.

PR27481

llvm-svn: 345513
2018-10-29 16:57:43 +00:00
Luke Cheeseman
71c989ae1f [AArch64] Return address signing B key support
- Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return
  address signing
- The key used to sign the function is controlled by the function attribute
  "sign-return-address-key"

Differential Revision: https://reviews.llvm.org/D51427

llvm-svn: 345511
2018-10-29 16:26:58 +00:00
Craig Topper
aa5eb2fbaa [X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers.
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.

Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

llvm-svn: 345488
2018-10-29 04:52:04 +00:00
Craig Topper
42aa87143d [X86] Recognize constant splats in LowerFCOPYSIGN.
llvm-svn: 345484
2018-10-28 23:51:35 +00:00
Simon Pilgrim
9b77f0c291 [VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.
Add vector support to TargetLowering::expandFP_TO_UINT.

This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.

llvm-svn: 345473
2018-10-28 13:07:25 +00:00
Roman Lebedev
a5baf86744 AMD BdVer2 (Piledriver) Initial Scheduler model
Summary:
# Overview
This is somewhat partial.
* Latencies are good {F7371125}
  * All of these remaining inconsistencies //appear// to be noise/noisy/flaky.
* NumMicroOps are somewhat good {F7371158}
  * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes
* Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there.
  They are basically verbatum copy from `btver2`
* Many `InstRW`. And there are still inconsistencies left...

To be noted:
I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis!

# Benchmark
I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed | RawSpeed raw image decoding library ]].
Diff (the exact clang from trunk without/with this patch):
```
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean                              -0.0607         -0.0604           234           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median                            -0.0630         -0.0626           233           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev                            +0.2581         +0.2587             1             2             1             2
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean                              -0.0770         -0.0767           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median                            -0.0767         -0.0763           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev                            -0.4170         -0.4156             1             0             1             0
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue                                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean                                           -0.0271         -0.0270           463           450           463           450
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median                                         -0.0093         -0.0093           453           449           453           449
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev                                         -0.7280         -0.7280            13             4            13             4
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean                                           -0.0065         -0.0065           569           565           569           565
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median                                         -0.0077         -0.0077           569           564           569           564
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev                                         +1.0077         +1.0068             2             5             2             5
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue                                          0.0220          0.0199      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean                                           +0.0006         +0.0007           312           312           312           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median                                         +0.0031         +0.0032           311           312           311           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev                                         -0.7069         -0.7072             4             1             4             1
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean                                           -0.0015         -0.0015           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median                                         -0.0010         -0.0011           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev                                         -0.1486         -0.1456             0             0             0             0
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue                                          0.6139          0.8766      U Test, Repetitions: 25 vs 25
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean                                           -0.0008         -0.0005            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median                                         -0.0006         -0.0002            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev                                         -0.1467         -0.1390             0             0             0             0
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue                                          0.0137          0.0137      U Test, Repetitions: 25 vs 25
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean                                           +0.0002         +0.0002           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median                                         -0.0015         -0.0014           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev                                         +3.3687         +3.3587             0             2             0             2
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue                                     0.4041          0.3933      U Test, Repetitions: 25 vs 25
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean                                      +0.0004         +0.0004            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median                                    -0.0000         -0.0000            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev                                    +0.1947         +0.1995             0             0             0             0
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue                              0.0074          0.0001      U Test, Repetitions: 25 vs 25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean                               -0.0092         +0.0074           547           542            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median                             -0.0054         +0.0115           544           541            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev                             -0.4086         -0.3486             8             5             0             0
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue                                        0.3320          0.0000      U Test, Repetitions: 25 vs 25
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean                                         +0.0015         +0.0204           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median                                       +0.0001         +0.0203           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev                                       +0.2259         +0.2023             1             1             0             0
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue                                      0.0000          0.0001      U Test, Repetitions: 25 vs 25
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean                                       -0.0209         -0.0179            96            94            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median                                     -0.0182         -0.0155            95            93            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev                                     -0.6164         -0.2703             2             1             2             1
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue                                     0.0000          0.0000      U Test, Repetitions: 25 vs 25
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean                                      -0.0098         -0.0098           176           175           176           175
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median                                    -0.0126         -0.0126           176           174           176           174
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev                                    +6.9789         +6.9157             0             2             0             2
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 25 vs 25
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean                  -0.0237         -0.0238           474           463           474           463
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median                -0.0267         -0.0267           473           461           473           461
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev                +0.7179         +0.7178             3             5             3             5
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue                   0.6837          0.6554      U Test, Repetitions: 25 vs 25
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean                    -0.0014         -0.0013          1375          1373          1375          1373
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median                  +0.0018         +0.0019          1371          1374          1371          1374
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev                  -0.7457         -0.7382            11             3            10             3
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue                                        0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean                                         -0.0080         -0.0289            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median                                       -0.0070         -0.0287            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev                                       +1.0977         +0.6614             0             0             0             0
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue                                       0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean                                        +0.0132         +0.0967            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median                                      +0.0132         +0.0956            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev                                      -0.0407         -0.1695             0             0             0             0
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue                                      0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean                                       +0.0331         +0.1307            13            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median                                     +0.0430         +0.1373            12            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev                                     -0.9006         -0.8847             1             0             0             0
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue                                            0.0016          0.0010      U Test, Repetitions: 25 vs 25
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean                                             -0.0023         -0.0024           395           394           395           394
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median                                           -0.0029         -0.0030           395           394           395           393
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev                                           -0.0275         -0.0375             1             1             1             1
Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue                                          0.0232          0.0000      U Test, Repetitions: 25 vs 25
Phase One/P65/CF027310.IIQ/threads:8/real_time_mean                                           -0.0047         +0.0039           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_median                                         -0.0050         +0.0037           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev                                         -0.0599         -0.2683             1             1             0             0
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean                           +0.0206         +0.0207           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median                         +0.0204         +0.0205           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev                         +0.2155         +0.2212             1             1             1             1
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean                          -0.0109         -0.0108           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median                        -0.0104         -0.0103           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev                        -0.4919         -0.4800             0             0             0             0
Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue                                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean                                          -0.0149         -0.0147           220           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_median                                        -0.0173         -0.0169           221           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev                                        +1.0337         +1.0341             1             3             1             3
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue                                         0.0001          0.0001      U Test, Repetitions: 25 vs 25
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean                                          -0.0019         -0.0019           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median                                        -0.0021         -0.0021           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev                                        -0.4441         -0.4282             0             0             0             0
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue                                0.0000          0.4263      U Test, Repetitions: 25 vs 25
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean                                 +0.0258         -0.0006            81            83            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median                               +0.0235         -0.0011            81            82            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev                               +0.1634         +0.1070             1             1             0             0
```
{F7443905}
If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`),
and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`);
Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%`
Looks good so far i'd say.

llvm-exegesis details:
{F7371117} {F7371125}
{F7371128} {F7371144} {F7371158}

Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh

Reviewed By: andreadb

Subscribers: javed.absar, gbedwell, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52779

llvm-svn: 345463
2018-10-27 20:46:30 +00:00
Simon Pilgrim
a365719a24 [X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI.
llvm-svn: 345458
2018-10-27 18:37:59 +00:00
Simon Pilgrim
88116e905e Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.
........
Its been reported that this is causing out of trunk failures.

llvm-svn: 345451
2018-10-27 07:10:48 +00:00
Sanjin Sijaric
96f2ea3dd4 [ARM64][Windows] MCLayer support for exception handling
Add ARM64 unwind codes to MCLayer, as well SEH directives that will be emitted
by the frame lowering patch to follow.  We only emit unwind codes into object
object files for now.

Differential Revision: https://reviews.llvm.org/D50166

llvm-svn: 345450
2018-10-27 06:13:06 +00:00
Craig Topper
4b89647b79 [X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available.
llvm-svn: 345448
2018-10-27 05:35:20 +00:00
Alina Sbirlea
bdb16f0519 Revert r345169 [along with its llvm counterpart r345170] as it makes Halide builds timeout.
llvm-svn: 345447
2018-10-27 04:51:12 +00:00
Brendon Cahoon
aa783dfd6e [Hexagon] Add missing assignment to Itinerary in Call_nr
The class definition for Call_nr has the itinerary as a
parameter, but the value is never assigned to the Itinerary
field for the instruction. This means the compiler is unable
to schedule and packetize the instruction correctly because
these instrution will not have any resource descritions.
I don't have a specific test case, but the ps_call_nr.ll
test failed with a proposed patch.

llvm-svn: 345442
2018-10-27 00:50:29 +00:00
Reid Kleckner
98d880fbd7 [Spectre] Fix MIR verifier errors in retpoline thunks
Summary:
The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't
understand the way we're using a CALL instruction as a branch, so we
can't list the CallTarget MBB as a successor of the entry block. If we
don't list it as a successor, then the AsmPrinter doesn't print a label
for the MBB.

Fix the issue by inserting our own label at the beginning of the call
target block. We can rely on the AsmPrinter to always emit it, even
though the block appears to be unreachable, but address-taken.

Fixes PR38391.

Reviewers: thegameg, chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D53653

llvm-svn: 345426
2018-10-26 20:26:36 +00:00
Eli Friedman
2ac1162917 [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1.
The "dead" markings allow existing target-independent optimizations,
like MachineSink, to trigger more frequently. The CPSR defs would have
eventually been marked dead by LiveVariables, so this only affects
optimizations before regalloc.

The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible
with this change: the transform adds a use to an otherwise dead def
of CPSR. This is covered by existing regression tests.

thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the
generated code; I'll fix it in D53452.

Differential Revision: https://reviews.llvm.org/D53453

llvm-svn: 345420
2018-10-26 19:32:24 +00:00
Lei Huang
de20843f6f [PowerPC] Improve BUILD_VECTOR of 4 i32s
Currently, for this node:
  vector int test(int a, int b, int c, int d) {
    return (vector int) { a, b, c, d };
  }

we get this on Power9:
  mtvsrdd 34, 5, 3
  mtvsrdd 35, 6, 4
  vmrgow 2, 3, 2

and this on Power8:
  mtvsrwz 0, 3
  mtvsrwz 1, 5
  mtvsrwz 2, 4
  mtvsrwz 3, 6
  xxmrghd 34, 1, 0
  xxmrghd 35, 3, 2
  vmrgow 2, 3, 2

This can be improved to this on LE Power9:
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrdd 34, 5, 3

and this on LE Power8
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrd 34, 3
  mtvsrd 35, 5
  xxpermdi 34, 35, 34, 0

This patch updates the TD pattern to generate the optimized sequence for both
Power8 and Power9 on LE and BE.

Differential Revision: https://reviews.llvm.org/D53494

llvm-svn: 345414
2018-10-26 18:09:36 +00:00
Craig Topper
8315d9990c [X86] Stop promoting vector and/or/xor/andn to vXi64.
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.

This patch removes the promotion and adds isel patterns instead.

The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.

Differential Revision: https://reviews.llvm.org/D53268

llvm-svn: 345408
2018-10-26 17:21:26 +00:00
Simon Pilgrim
5d1be4f8d4 [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.

llvm-svn: 345395
2018-10-26 15:19:02 +00:00
Sanjay Patel
6b40768f5a [x86] commute blendvb with constant condition op to allow load folding
This is a narrow fix for 1 of the problems mentioned in PR27780:
https://bugs.llvm.org/show_bug.cgi?id=27780

I looked at more general solutions, but it's a mess. We canonicalize shuffle masks
based on the number of elements accessed from each operand, and that's not optional.
If you remove that, we'll crash because we fail to match isel patterns. So I'm
waiting until we're sure that we have blendvb with constant condition and then
commuting based on the load potential. Other cases like blend-with-immediate are
already handled elsewhere, so this is probably not a common problem anyway.

I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've
screwed that up by creating a temporary PSHUFB using these operands that we're counting
on to be killed later. Undoing that didn't look like a simple task because it's
intertwined with determining if we actually use both operands of the shuffle or not.a

Differential Revision: https://reviews.llvm.org/D53737

llvm-svn: 345390
2018-10-26 14:58:13 +00:00
Simon Pilgrim
7575c6d01b [X86] Use existing pulled out VT variables. NFCI.
llvm-svn: 345388
2018-10-26 14:39:28 +00:00
Scott Linder
11ef7984b0 [AMDGPU] Add a pass to promote bitcast calls
AMDGPU currently only supports direct calls, but at lower optimisation levels it
fails to lower statically direct calls which appear indirect due to a bitcast.

Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize"
calls.

Differential Revision: https://reviews.llvm.org/D52741

llvm-svn: 345382
2018-10-26 13:18:36 +00:00
Fangrui Song
065c3610ad [SystemZ] Fix -Wcovered-switch-default as coding standard regulates
llvm-svn: 345369
2018-10-26 06:59:08 +00:00
Li Jia He
f6fb752fe8 [PowerPC] Fix some missed optimization opportunities in combineSetCC
For both operands are bool, short, int, long, long long, add the following optimization.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0

Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360

llvm-svn: 345366
2018-10-26 06:48:53 +00:00
Nemanja Ivanovic
6a74bfba20 [PowerPC] Keep vector int to fp conversions in vector domain
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D53346

llvm-svn: 345361
2018-10-26 03:19:13 +00:00
Fangrui Song
61ea8dae2e Add dependency from SystemZAsmParser to SystemZAsmPrinter after rL345349
This fixes -DBUILD_SHARED_LIBS=on build. The dependency is similar to that of X86's.

llvm-svn: 345358
2018-10-26 03:04:54 +00:00
Vlad Tsyrklevich
21beeb29ea Revert "[AArch64] Create proper memoperand for multi-vector stores"
This reverts commit r345315, it was causing test failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 345356
2018-10-26 02:00:14 +00:00
Jonas Paulsson
dda46307c2 [SystemZ] Implement SystemZOperand::print()
SystemZAsmParser can now handle -debug by printing the operands neatly to the
output stream. Before this patch this lead to an llvm_unreachable().

It seems that now '-mllvm -debug' does not cause any crashes anywhere (at
least not on SPEC).

Review: Ulrich Weigand
https://reviews.llvm.org/D53328

llvm-svn: 345349
2018-10-26 00:36:00 +00:00
Jonas Paulsson
e2c5cbc164 [SystemZ] Pass the DAG pointer from SystemZAddressingMode::dump().
In order to print the IR slot number for the memory operand, the DAG pointer
must be passed to SDNode::dump().

The isel-debug.ll test updated to also check for the IR Value reference being
printed correctly.

Review: Ulrich Weigand
https://reviews.llvm.org/D53333

llvm-svn: 345347
2018-10-26 00:02:33 +00:00
Heejin Ahn
24faf859e5 Reland "[WebAssembly] LSDA info generation"
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 345345
2018-10-25 23:55:10 +00:00
Heejin Ahn
3103d3dcd1 [WebAssembly] Support EH instructions in InstPrinter
Summary: This adds support for exception handling instructions to InstPrinter.

Reviewers: dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53634

llvm-svn: 345343
2018-10-25 23:45:48 +00:00
Bryan Chan
f0923f16f8 [AArch64] Implement FP16FML intrinsics
Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a
DAG pattern to define the indexed-form intrinsics in terms of the vector-form
ones, similarly to how the Dot Product intrinsics were implemented.

Based on a patch by Gao Yiling.

Differential Revision: https://reviews.llvm.org/D53632

llvm-svn: 345337
2018-10-25 23:36:41 +00:00
Heejin Ahn
1d13e6be37 Address comments
- Add llvm-mc test case (and delete the old one)
- Change report_fatal_error to assertions

llvm-svn: 345334
2018-10-25 23:35:14 +00:00
Heejin Ahn
1147d91402 [WebAssembly] Error out when block/loop markers mismatch
Summary:
Currently InstPrinter ignores if there are mismatches between block/loop
and end markers by skipping the case if ControlFlowStack is empty. I
guess it is better to explicitly error out in this case, because this
signals invalid input.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53620

llvm-svn: 345333
2018-10-25 23:35:13 +00:00
Jonas Paulsson
2b280ea604 [SystemZ] NFC reformatting in SystemZTargetTransformInfo.cpp
Some lines more than 80 characters long reformatted.

llvm-svn: 345331
2018-10-25 22:53:27 +00:00
Jonas Paulsson
b7caa809e1 [SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted.
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.

This patch improves the SystemZ TTI cost function to recognize this.

Review: Ulrich Weigand
https://reviews.llvm.org/D52692

llvm-svn: 345327
2018-10-25 22:28:25 +00:00
Jonas Paulsson
4645711a8d [SystemZ] Improve handling and cost estimates of vector integer div/rem
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.

Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).

Review: Ulrich Weigand
https://reviews.llvm.org/D53196

llvm-svn: 345321
2018-10-25 21:47:22 +00:00
Craig Topper
813064bf4d [X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal.
The required-vector-width attribute was only used for backend testing and has never been generated by clang.

I believe clang is now generating min-legal-vector-width for vector uses in user code.

With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code.

llvm-svn: 345317
2018-10-25 21:16:06 +00:00
David Greene
53e869da7d [AArch64] Create proper memoperand for multi-vector stores
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345315
2018-10-25 21:10:39 +00:00
Thomas Lively
0aad98fd07 [WebAssembly] Use target-independent saturating add
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53721

llvm-svn: 345299
2018-10-25 19:06:13 +00:00
Volkan Keles
f87473fe1c [GISel] LegalizerInfo: Rename MemDesc::Size to SizeInBits to make the value clearer
Requested in D53679.

llvm-svn: 345288
2018-10-25 17:37:07 +00:00
Craig Topper
c10de9a37a [X86] Remove ProcIntelKNL and replace with a SlowPMADDWD flag to use in the one place it was checked.
llvm-svn: 345286
2018-10-25 17:29:00 +00:00
Craig Topper
5d787ac4be [X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly
KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.

Differential Revision: https://reviews.llvm.org/D53671

llvm-svn: 345285
2018-10-25 17:28:57 +00:00
Volkan Keles
3a103b1d25 [AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.

This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.

Reviewers: dsanders, aemerson

Reviewed By: dsanders

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53679

llvm-svn: 345282
2018-10-25 17:23:25 +00:00
Evandro Menezes
b53cf99388 [AArch64] Refactor Exynos feature sets (NFC)
llvm-svn: 345279
2018-10-25 16:45:46 +00:00
John Brawn
958865202d [AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.

Differential Revision: https://reviews.llvm.org/D53582

llvm-svn: 345275
2018-10-25 15:31:51 +00:00
Sam Parker
a16667e79b [ARM] Use Cortex-A57 sched model for Cortex-A72
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.

Differential Revision: https://reviews.llvm.org/D53562

llvm-svn: 345272
2018-10-25 15:08:29 +00:00
John Brawn
b8e7887f33 [AArch64] Refactor definition of EXT patterns to use a multiclass
Using a multiclass reduces duplication, and makes it easier to add new patterns
later. This refactoring does add some new patterns, but as far as I can tell
there's no IR that will end up triggering them so this is effectively NFC.

Differential Revision: https://reviews.llvm.org/D53580

llvm-svn: 345271
2018-10-25 15:00:10 +00:00
John Brawn
49e61d90ca [AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.

Differential Revision: https://reviews.llvm.org/D53579

llvm-svn: 345270
2018-10-25 14:56:48 +00:00
Alexey Bataev
0f2fe4f135 [DEBUG_INFO][NVPTX]Fix processing of DBG_VALUES.
Summary:
If the instruction in the eliminateFrameIndex function is a DBG_VALUE
instruction, it requires special processing. The frame register is set
to VRFrame and the offset is based on the object offset.
The code is similar to the code used in
lib/CodeGen/PrologEpilogInserter.cpp.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D53657

llvm-svn: 345269
2018-10-25 14:27:27 +00:00
Amara Emerson
cbd86d8429 [GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index.
Allows for better imported pattern re-use.

llvm-svn: 345265
2018-10-25 14:04:54 +00:00
Simon Pilgrim
53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Alex Bradbury
74d4931da2 [RISCV] Use PatFrags for variable shift patterns
This follows SystemZ and I think is cleaner vs the multiclass.

llvm-svn: 345262
2018-10-25 12:45:20 +00:00
Simon Pilgrim
0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Simon Pilgrim
071e82218f [TTI] Add generic SK_Broadcast shuffle costs
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.

This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time.

Differential Revision: https://reviews.llvm.org/D53570

llvm-svn: 345253
2018-10-25 10:52:36 +00:00
Clement Courbet
41c8af3924 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Craig Topper
7ae43cad65 [X86] Don't use the OriginalDemandedBits to calculate the DemandedMask for PMULUDQ/PMULDQ inputs.
Multiply a is complex operation so just because some bit of the output isn't used doesn't mean that bit of the input isn't used.

We might able to bound it, but it will require some more thought.

llvm-svn: 345241
2018-10-25 07:00:09 +00:00
Craig Topper
eaa1cf5b57 [X86] Fix typo in comment. NFC
llvm-svn: 345236
2018-10-25 05:00:20 +00:00
Thomas Lively
325c9c5e84 [WebAssembly] Set LoadExt and TruncStore actions for SIMD types
Summary: Fixes part of the problem reported in bug 39275.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D53542

llvm-svn: 345230
2018-10-25 01:46:07 +00:00
Heejin Ahn
ac764aa88e [WebAssembly] Fix immediate of rethrow when throwing to caller
Summary:
Currently when assigning depths 'rethrow' does not take the whole
control flow stack into accounts but only considers EH pad stacks. When
assigning depth immmediates to rethrows, in normal cases it is done
correctly but when a rethrow instruction throws up to a caller, i.e., we
convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it
mistakenly compute the whole stack depth.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53619

llvm-svn: 345223
2018-10-24 23:31:24 +00:00
Thomas Lively
ed9513472c [WebAssembly] Retain shuffle types during custom lowering
Summary:
Changing the node type in lowering was violating assumptions made in
the DAG combiner, so don't change the node type any more. This fixes
one of the issues reported in bug 39275.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D53537

llvm-svn: 345221
2018-10-24 23:27:40 +00:00
Reid Kleckner
49a24278ba [ELF] Fix large code model MIR verifier errors
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.

Fixes some X86 verifier errors tracked in PR27481.

llvm-svn: 345219
2018-10-24 22:57:28 +00:00
Thomas Lively
30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Evandro Menezes
096e2497b5 [AArch64] Refactor Exynos machine model
Effectively, NFC.

llvm-svn: 345201
2018-10-24 21:40:43 +00:00
Reid Kleckner
9c5bda652c [X86] Add *SP to tailcall register class to fix verifier error
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.

Fixes the sibcall* and musttail* verifier failures tracked at PR27481.

llvm-svn: 345197
2018-10-24 21:09:34 +00:00
Reid Kleckner
953bdce68d [MC] Separate masm integer literal lexer support from inline asm
Summary:
This renames the IsParsingMSInlineAsm member variable of AsmLexer to
LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior
controlled by that variable. I added a public setter, so that it can be
set from outside or from the llvm-mc command line. We may need to
arrange things so that users can get this behavior from clang, but
that's future work.

I also put additional hex literal lexing functionality under this flag
to fix PR32973. It appears that this hex literal parsing wasn't intended
to be enabled in non-masm-style blocks.

Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.

However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.

Essentially, we only lex masm literals for inline asm blobs that use
intel syntax. If the .intel_syntax directive is used inside a gnu-style
inline asm statement, masm literals will not be lexed, which is
compatible with gas and llvm-mc standalone .s assembly.

This fixes PR36144 and PR32973.

Reviewers: Gerolf, avt77

Subscribers: eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D53535

llvm-svn: 345189
2018-10-24 20:23:57 +00:00
Tim Northover
1c353419ab AArch64: add a pass to compress jump-table entries when possible.
llvm-svn: 345188
2018-10-24 20:19:09 +00:00
Evandro Menezes
769d4cebad [AArch64] Refactor Exynos machine model (NFC)
llvm-svn: 345187
2018-10-24 20:03:24 +00:00
Evandro Menezes
80bc136732 [AArch64] Fix overlapping instructions
Fix overlapping instruction descriptions in the machine model for Exynos M3.
Effectively, NFC.

llvm-svn: 345186
2018-10-24 20:03:20 +00:00
Craig Topper
7bb8c2e6e5 [X86] Explicitly list all KNL features of inheriting from IVB. NFC
I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain.

llvm-svn: 345183
2018-10-24 19:24:44 +00:00
Simon Pilgrim
c5bb362b13 [X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.

This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.

Differential Revision: https://reviews.llvm.org/D53643

llvm-svn: 345182
2018-10-24 19:11:28 +00:00
Simon Pilgrim
ac84005841 [CostModel][X86] Add vXi8 vector division by constants costs.
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.

llvm-svn: 345175
2018-10-24 18:44:12 +00:00
Peter Collingbourne
4bb928c110 ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.

Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.

Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.

Differential Revision: https://reviews.llvm.org/D53614

llvm-svn: 345171
2018-10-24 18:10:38 +00:00
Krzysztof Parzyszek
57b5ac1431 [Hexagon] Flip hexagon-autohvx to be true by default
This will allow other generators of LLVM IR to use the auto-vectorizer
without having to change that flag.

Note: on its own, this patch will enable auto-vectorization on Hexagon
in all cases, regardless of the -fvectorize flag. There is a companion
clang patch that together with this one forms an NFC for clang users.

llvm-svn: 345169
2018-10-24 17:55:13 +00:00
Craig Topper
2417273255 [X86] Bring back the MOV64r0 pseudo instruction
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.

There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

Differential Revision: https://reviews.llvm.org/D52757

llvm-svn: 345165
2018-10-24 17:32:09 +00:00
Simon Pilgrim
2cce074e8c [CostModel][X86] Enable non-uniform vector division by constants costs.
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

llvm-svn: 345164
2018-10-24 17:30:29 +00:00
Alexey Bataev
c15c853c3a [DEBUGINFO, NVPTX] Try to pack bytes data into a single string.
Summary:
If the target does not support `.asciz` and `.ascii` directives, the
strings are represented as bytes and each byte is placed on the new line
as a separate byte directive `.b8 <data>`. NVPTX target allows to
represent the vector of the data of the same type as a vector, where
values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This
allows to reduce the size of the final PTX file. Ptxas tool includes ptx
files into the resulting binary object, so reducing the size of the PTX
file is important.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D45822

llvm-svn: 345142
2018-10-24 14:04:00 +00:00
Tim Renouf
2a1b1d94b6 [AMDGPU] Defined gfx909 Raven Ridge 2
Differential Revision: https://reviews.llvm.org/D53418

Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef
llvm-svn: 345120
2018-10-24 08:14:07 +00:00
Craig Topper
da54bbf52a [X86] Correct a bad isel predicate. Though I don't think it can be exposed.
This B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed.

llvm-svn: 345112
2018-10-24 06:13:36 +00:00
Saleem Abdulrasool
4005f9a860 ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

llvm-svn: 345107
2018-10-24 00:00:52 +00:00
Matthias Braun
4f82406c46 SelectionDAG: Reuse bigger sized constants in memset expansion.
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...

We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.

- Ideally this would be handled by the ConstantHoist pass but it runs
  too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
  however SelectionDAG constantfolding would constantfold the
  "trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
  stop DAGCombiner from constant folding in this situation. (Similar to
  how ConstantHoisting marks things as Opaque to avoid folding
  ADD/SUB/etc.)

Differential Revision: https://reviews.llvm.org/D53181

llvm-svn: 345102
2018-10-23 23:19:23 +00:00
Simon Pilgrim
b6c57075c0 [X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398)
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.

llvm-svn: 345070
2018-10-23 19:07:53 +00:00
Roman Lebedev
2fae985793 X86DAGToDAGISel::matchBitExtract(): lambdas can't have default arguments.
As reported by ctopper.
That is a gcc-only warning at the moment.

llvm-svn: 345065
2018-10-23 18:27:10 +00:00
Stefan Pintilie
927e8bf316 [Power9] Add __float128 support in the backend for bitcast to a i128
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.

Differential Revision: https://reviews.llvm.org/D49507

llvm-svn: 345053
2018-10-23 17:11:36 +00:00
Simon Pilgrim
f04a04c2b6 [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering.
llvm-svn: 345048
2018-10-23 16:45:26 +00:00
Sanjay Patel
47a52a0521 [WebAssembly] use 'match' to simplify code; NFC
Vector types are not possible here because this code explicitly
checks for a scalar type, but this is another step towards 
completely removing the fake binop queries for not/neg/fneg.

llvm-svn: 345043
2018-10-23 16:05:09 +00:00
Roman Lebedev
06e4db07af Experimental re-land of [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
This initially landed in rL345014, but was reverted in rL345017
due to sanitizer-x86_64-linux-fast buildbot failure in
check-lld (ELF/relocatable-versioned.s) test.

While i'm not yet quite sure what is the problem, one obvious
thing here is that extra truncation roundtrip.
Maybe that's it? If not, will re-revert.

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345027
2018-10-23 13:19:31 +00:00
Simon Pilgrim
f85ee9f8b4 [X86][SSE] Update raw mask shuffle decoders to handle UNDEF mask elts
Matches the approach taken in the constant pool shuffle decoders, and uses an UndefElts mask instead of uint64_t(-1) raw mask values, which doesn't work safely for i32/i64 shuffle mask sizes (as the -1 value is legal).

This allows us to remove the constant pool shuffle decoders from most of the getTargetShuffleMask variable shuffle cases (X86ISD::VPERMV3 will be handled in a future commit).

llvm-svn: 345018
2018-10-23 11:33:38 +00:00
Roman Lebedev
c29dbbdb10 Revert "[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern"
*Seems* to be breaking sanitizer-x86_64-linux-fast buildbot,
the ELF/relocatable-versioned.s test:

==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64*)addr)[0]))" (0x6a6cb03abcebc041, 0x0)
    #0 0x59716b in MsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393
    #1 0x586635 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79
    #2 0x57d5ff in __sanitizer::InternalFree(void*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191
    #3 0x7fc21b24193f  (/lib/x86_64-linux-gnu/libc.so.6+0x3593f)
    #4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999)
    #5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7)
    #6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039)

This reverts commit r345014.

llvm-svn: 345017
2018-10-23 10:34:57 +00:00
Simon Pilgrim
816e57be35 [TTI] Add generic cost handling of SK_Reverse shuffles
These can be treated as a general permute.

This required a fix for missing reverse patterns on ARM

llvm-svn: 345015
2018-10-23 09:42:10 +00:00
Roman Lebedev
1c95b2f779 [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
Summary:
Continuation of D52348.

We also get the `c) x &  (-1 >> (32 - y))` pattern here, because of the D48768.
I will add extra-uses into those tests and follow-up with a patch to handle those patterns too.

Reviewers: RKSimon, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345014
2018-10-23 09:08:44 +00:00
Heejin Ahn
a40303aa03 [WebAssembly] Fix assembly printing of br_table
Summary: In `br_table's stack version asm string, \t was missing.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53516

llvm-svn: 344981
2018-10-23 00:28:14 +00:00
Saleem Abdulrasool
96cd3cc312 X86: fix a comment copy-paste issue (NFC)
The comment was copy-pasted but not updated.  NFC.

llvm-svn: 344973
2018-10-22 23:34:24 +00:00
Craig Topper
96889b8b96 [X86] Remove unused entries from the X86ProcFamily enum. Add a note to discourage creation of new enum entries.
As we've learned multiple times, a coarse grained enum like this is not scalable and we should be migrating away from it.

llvm-svn: 344972
2018-10-22 23:14:55 +00:00
Matthias Braun
a0beeffeed X86: Do not optimize branches with undef eflags inputs
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR.  I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.

rdar://42122367

llvm-svn: 344970
2018-10-22 22:52:23 +00:00
Craig Topper
c8e183f9ee Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
2018-10-22 22:14:05 +00:00
Thomas Lively
c63b5fcb2a [WebAssembly][NFC] Remove WebAssemblyStackifier TableGen backend
Summary:
Replace its functionality with a TableGen InstrInfo relational
instruction mapping. Although arguably more complex than the TableGen
backend, the relational mapping is a smaller maintenance burden than a
TableGen backend.

Reviewers: aardappel, aheejin, dschuff

Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53307

llvm-svn: 344962
2018-10-22 21:55:26 +00:00
Tim Northover
a23c12a627 X86: add alias for pushfw/popfw in Intel mode
A while ago we changed pushf and popf in Intel mode to generate pushfq
and popfq. Unfortunately that left us with no way to get the 16-bit
encoding in Intel mode so this patch adds pushfw and popfw as aliases
there.

llvm-svn: 344949
2018-10-22 20:38:13 +00:00
Simon Pilgrim
3b91e9676b Revert rL344931 from llvm/trunk: [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all.
........
Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask

llvm-svn: 344937
2018-10-22 19:01:25 +00:00
Simon Pilgrim
794f85cd93 Revert rL344933 from llvm/trunk: [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits.
........
Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp

llvm-svn: 344936
2018-10-22 18:58:32 +00:00
Simon Pilgrim
476c9f42fc [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding
Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp

llvm-svn: 344933
2018-10-22 18:35:13 +00:00
Simon Pilgrim
3521367ff3 [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements
Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask

llvm-svn: 344931
2018-10-22 18:09:02 +00:00
Simon Pilgrim
5dff767c25 [X86] getTargetConstantBitsFromNode - handle extraction from larger constant pool entries
First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp

llvm-svn: 344924
2018-10-22 17:43:33 +00:00
Craig Topper
8d8dcfe690 Revert r344877 "[X86] Stop promoting integer loads to vXi64"
Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out.

llvm-svn: 344921
2018-10-22 16:59:24 +00:00
Matt Arsenault
687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Simon Pilgrim
6f5cd7c67f [X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element size. NFCI.
llvm-svn: 344910
2018-10-22 15:33:30 +00:00
Roman Lebedev
898808504d [X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR.
Summary:
As discussed in D52304 / IRC, we now have pattern matching for
'bit extract' in two places - tablegen and `X86DAGToDAGISel`.
There are 4 patterns.
And we will have a problem with `x &  (-1 >> (32 - y))` pattern.
* If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first.
  Thus, the existing test coverage is already broken.
* If it is not one-use, then it is not unfolded, and is matched as BZHI.
* If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already.
So we will either not handle that pattern for BEXTR, or not have test coverage for it.
This is bad.

As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`.
Then we will not have code duplication, and will have proper test coverage.

This indeed does not affect any tests, and this is great.
It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version.

Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: llvm-commits, craig.topper

Differential Revision: https://reviews.llvm.org/D53164

llvm-svn: 344904
2018-10-22 14:12:44 +00:00
Roman Lebedev
13c5ab2e27 [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern
Summary:
Trivial continuation of D52304.
While this pattern is not canonical, we do select it in the BZHI case,
so this should not be any different.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52348

llvm-svn: 344902
2018-10-22 13:54:17 +00:00
Petar Avramovic
e72a743740 Test commit: change comment.
llvm-svn: 344900
2018-10-22 13:27:50 +00:00
Nemanja Ivanovic
674581afbb [PowerPC][NFC] Fix bugs in r+r to r+i conversion
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.

There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.

This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.

Differential revision: https://reviews.llvm.org/D53323

llvm-svn: 344894
2018-10-22 11:22:59 +00:00
Craig Topper
290c081d91 [X86] Add patterns for vector and/or/xor/andn with other types than vXi64.
This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables.

This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now.

Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests.

This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files.

llvm-svn: 344884
2018-10-22 06:30:22 +00:00
Craig Topper
321df5b0d4 [X86] Stop promoting integer loads to vXi64
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344877
2018-10-21 21:30:26 +00:00
Craig Topper
8de07b4db1 Revert r344873 "foo"
Rebase gone wrong left this in my tree.

llvm-svn: 344875
2018-10-21 21:08:37 +00:00
Craig Topper
5eea94edd4 [X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG combine and target bits support. Use a post isel peephole instead.
Summary:
These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended.

To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion.

This patch instead removes all of that and adds a late peephole to detect the two extends.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53449

llvm-svn: 344874
2018-10-21 21:07:27 +00:00
Craig Topper
e367039fe5 foo
llvm-svn: 344873
2018-10-21 21:07:25 +00:00
Simon Pilgrim
eb806d5f30 [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 unary shuffle lowering
llvm-svn: 344868
2018-10-21 17:07:50 +00:00
Simon Pilgrim
abc24fdb94 [X86] Only extract constant pool shuffle mask data with zero offsets
D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom most part of the vector as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets.

llvm-svn: 344867
2018-10-21 11:55:56 +00:00
Thomas Lively
5ea17d450e [WebAssembly] Implement vector sext_inreg and tests with comparisons
Summary: Depends on D53251.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53252

llvm-svn: 344826
2018-10-20 01:35:23 +00:00
Thomas Lively
55735d522d [WebAssembly] Custom lower i64x2 constant shifts to avoid wrap
Summary: Depends on D53057.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53251

llvm-svn: 344825
2018-10-20 01:31:18 +00:00
Changpeng Fang
f95f763ea5 AMDGPU: Add support pattern for SUB of one bit
Summary:
  Add selection patterns to support one bit Sub.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D52946

llvm-svn: 344815
2018-10-19 21:09:21 +00:00
Craig Topper
5ed1099962 [X86] Remove some left over code from when MVT:i1 was a legal type for AVX512.
llvm-svn: 344813
2018-10-19 20:44:33 +00:00
Craig Topper
5c81c68385 [X86] In PostprocessISelDAG, start from allnodes_end, not the root.
There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG.

This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from.

I don't have a test case, but without this a future patch doesn't work which is how I found it.

llvm-svn: 344808
2018-10-19 19:24:42 +00:00
Thomas Lively
11a332d08d [WebAssembly] Handle undefined lane indices in SIMD patterns
Summary:
Undefined indices in shuffles can be used when not all lanes of the
output vector will be used. This happens for example in the expansion
of vector reduce operations. Regardless, undefs are legal as lane
indices in IR and should be supported.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53057

llvm-svn: 344803
2018-10-19 19:08:06 +00:00
Krzysztof Parzyszek
6bfc6577f2 [Hexagon] Remove support for V4
llvm-svn: 344791
2018-10-19 17:31:11 +00:00
Fangrui Song
2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Eli Friedman
b09c778715 Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")
Still causing failures on the polly-aosp buildbot; I'll follow up
with a reduced testcase.

llvm-svn: 344752
2018-10-18 19:34:30 +00:00
Kristina Brooks
312fcc116b [X86] Support for the mno-tls-direct-seg-refs flag
Allows to disable direct TLS segment access (%fs or %gs). GCC supports
a similar flag, it can be useful in some circumstances, e.g. when a thread
context block needs to be updated directly from user space. More info
and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145

There is another revision for clang as well.
Related: D53102

All X86 CodeGen tests appear to pass:
```
[46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
Testing Time: 23.17s
  Expected Passes    : 3801
  Expected Failures  : 15
  Unsupported Tests  : 8021
```

Reviewed by: Craig Topper.

Patch by nruslan (Ruslan Nikolaev).

Differential Revision: https://reviews.llvm.org/D53103

llvm-svn: 344723
2018-10-18 03:14:37 +00:00
Nicolai Haehnle
4821937d2e AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI
Summary:
To workaround a hardware issue in the (base + offset) calculation
when base is negative. The impact on code quality should be limited
since SILoadStoreOptimizer still runs afterwards and is able to
combine loads/stores based on known sign information.

This fixes visible corruption in Hitman on SI (easily reproducible
by running benchmark mode).

Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923

Reviewers: arsenm, mareko

Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53160

llvm-svn: 344698
2018-10-17 15:37:48 +00:00
Nicolai Haehnle
c4a2ff0950 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696
2018-10-17 15:37:30 +00:00
Sam Parker
2ef3c0dad6 [ARM] bottom-top mul support in ARMParallelDSP
Previously reverted in rL343082.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 344693
2018-10-17 13:02:48 +00:00
Nicolai Haehnle
e9b134aa31 AMDGPU: Remove dead TableGen code
Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0

Reviewers: tpr

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53318

Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb
llvm-svn: 344691
2018-10-17 12:14:26 +00:00
Petar Jovanovic
8a08412533 [MIPS GlobalISel] Legalize constants
Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D53077

llvm-svn: 344684
2018-10-17 10:30:03 +00:00
Sjoerd Meijer
64cfb74a61 [ARM] Do not fuse VADD and VMUL, continued (2/2)
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.

Differential revision: https://reviews.llvm.org/D53315

llvm-svn: 344683
2018-10-17 10:05:44 +00:00
Sjoerd Meijer
1a213a42d6 [ARM] Follow up of rL344671, attempt to pacify a buildbot
It was rightfully complaining about an unpretty logical expression.

llvm-svn: 344677
2018-10-17 07:51:24 +00:00
Sjoerd Meijer
ff3ab33ec8 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Craig Topper
e0a992918b [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST.
Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag.

This recovers a case lost by r344270.

Differential Revision: https://reviews.llvm.org/D53310

llvm-svn: 344649
2018-10-16 22:29:36 +00:00
Krasimir Georgiev
547d824da6 Revert "[WebAssembly] LSDA info generation"
This reverts commit r344575.
Newly introduced test eh-lsda.ll.test fails with use-after-free under
ASAN build.

llvm-svn: 344639
2018-10-16 18:50:09 +00:00
Evandro Menezes
c98decf864 [PATCH] [NFC][AArch64] Fix refactoring of macro fusion
Fix compiler error.

llvm-svn: 344632
2018-10-16 17:41:45 +00:00
Evandro Menezes
46eadcff9c [NFC][ARM] Refactor macro fusion
Simplify code for wildcards.

llvm-svn: 344625
2018-10-16 17:19:51 +00:00
Evandro Menezes
de655c6d3a [NFC][AArch64] Refactor macro fusion
Simplify API of checking functions.

llvm-svn: 344624
2018-10-16 17:19:28 +00:00
Simon Pilgrim
7d27cfdcb2 [X86] Fix Skylake ReadAfterLd for PADDrm etc.
Missed in rL343868 as due to their custom InstrRW.

llvm-svn: 344600
2018-10-16 09:50:16 +00:00
Aleksandar Beserminji
a5949439ca [mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program that is compiled
for micromips, it will point to an even address. Such an error will
cause a segmentation fault, as the instructions in micromips are
aligned on odd addresses. This patch sets the last bit of the offset
where a landing pad is, to 1, which will effectively be
an odd address and point to the instruction exactly.

Differential Revision: https://reviews.llvm.org/D52985

llvm-svn: 344591
2018-10-16 08:27:28 +00:00
Heejin Ahn
0981eaab47 [WebAssembly] LSDA info generation
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 344575
2018-10-16 00:09:12 +00:00
Craig Topper
e70c560b6d [X86] Remove some isel patterns that shouldn't be possible.
These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.

llvm-svn: 344573
2018-10-15 23:34:58 +00:00
Craig Topper
2909a3d9d0 [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions.
llvm-svn: 344563
2018-10-15 21:51:32 +00:00
Simon Pilgrim
095a7fe635 [AARCH64] Improve vector popcnt lowering with ADDLP
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

Differential Revision: https://reviews.llvm.org/D53259

llvm-svn: 344554
2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov
94dfcc2eb2 AMDGPU: Generate .amdgcn_target for object code v3
Differential Revision: https://reviews.llvm.org/D53221

llvm-svn: 344552
2018-10-15 20:37:47 +00:00
Aleksandar Beserminji
81eb440772 [mips][micromips] Fix overlaping FDEs error
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.

This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985

Differential Revision: https://reviews.llvm.org/D52987

llvm-svn: 344516
2018-10-15 14:39:12 +00:00
Aleksandar Beserminji
585f55bb8b [mips][micromips] Revert "Fix overlaping FDEs error"
This reverts r344511.

llvm-svn: 344515
2018-10-15 14:36:48 +00:00
Simon Pilgrim
5abb607ebe [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Aleksandar Beserminji
10ec5c8c28 [mips][micromips] Fix overlaping FDEs error
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.

This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985

Differential Revision: https://reviews.llvm.org/D52987

llvm-svn: 344511
2018-10-15 12:59:17 +00:00
Chandler Carruth
edb12a838a [TI removal] Make variables declared as TerminatorInst and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Craig Topper
06aea1720a [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53107

llvm-svn: 344487
2018-10-15 01:51:58 +00:00
Craig Topper
671779456a [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction.
We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

llvm-svn: 344486
2018-10-15 01:51:53 +00:00
Simon Pilgrim
861cd0ba44 [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering
Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles.

llvm-svn: 344481
2018-10-14 17:34:20 +00:00
Dorit Nuzman
38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman
5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman
8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Craig Topper
20fa085d74 [X86] Fix bad indentation. NFC
llvm-svn: 344471
2018-10-14 04:01:40 +00:00
Craig Topper
ec4b75f47a [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing.
Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53173

llvm-svn: 344470
2018-10-14 03:36:27 +00:00
Benjamin Kramer
c55e997556 Move some helpers from the global namespace into anonymous ones.
llvm-svn: 344468
2018-10-13 22:18:22 +00:00
Thomas Lively
ffde98de21 [WebAssembly][NFC] Fix signed/unsigned comparison warning
llvm-svn: 344459
2018-10-13 16:58:03 +00:00
Simon Pilgrim
c5d7c6e5f6 [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead.
There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.

I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.

llvm-svn: 344457
2018-10-13 16:11:15 +00:00
Simon Pilgrim
1c2051ead7 [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead.
Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. 

llvm-svn: 344453
2018-10-13 15:16:55 +00:00
Simon Pilgrim
1c6d320351 [X86][SSE] combineIncDecVector - use isConstantSplat
Use isConstantSplat instead of ISD::isConstantSplatVector to let us us peek through to illegal types (in this case for i686 targets to recognise i64 constants)

llvm-svn: 344452
2018-10-13 14:45:44 +00:00
Simon Pilgrim
a03379527a [X86] Pull out target constant splat helper function. NFCI.
The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector.

llvm-svn: 344451
2018-10-13 14:28:40 +00:00
Simon Pilgrim
10434cbae1 Pull out repeated getOperand(). NFCI.
llvm-svn: 344450
2018-10-13 13:33:32 +00:00
Simon Pilgrim
bc141724c0 Remove unused variable. NFCI.
llvm-svn: 344449
2018-10-13 13:30:10 +00:00
Simon Pilgrim
f64e654d62 [X86][SSE] Improve CTTZ lowering when CTLZ is legal
If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen.

Similar to rL344447, this is also closer to LegalizeDAG's approach

llvm-svn: 344448
2018-10-13 13:05:19 +00:00
Simon Pilgrim
afead139cf [X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1))
This patch changes the vector CTTZ lowering from:

cttz(x) = ctpop((x & -x) - 1)

to:

cttz(x) = ctpop(~x & (x - 1))

Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).

Differential Revision: https://reviews.llvm.org/D53214

llvm-svn: 344447
2018-10-13 12:12:06 +00:00
Simon Pilgrim
f3952413f7 [X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161)
Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute.

This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.

We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead.

Differential Revision: https://reviews.llvm.org/D53148

llvm-svn: 344446
2018-10-13 11:38:10 +00:00
Arnaud A. de Grandmaison
162435e7b5 [AArch64] Swap comparison operands if that enables some folding.
Summary:
AArch64 can fold some shift+extend operations on the RHS operand of
comparisons, so swap the operands if that makes sense.

This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

Reviewers: efriedma, t.p.northover, javed.absar

Subscribers: mcrosier, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53067

llvm-svn: 344439
2018-10-13 07:43:56 +00:00
Thomas Lively
3afc346dd0 [WebAssembly] SIMD min and max
Summary: Depends on D52324 and D52764.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52325

llvm-svn: 344438
2018-10-13 07:26:10 +00:00
Thomas Lively
0ff82ac154 [WebAssembly][NFC] Unify ARGUMENT classes
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53172

llvm-svn: 344436
2018-10-13 07:09:10 +00:00
Alex Bradbury
748d080e62 [RISCV] Eliminate unnecessary masking of promoted shift amounts
SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it
is promoted to the ShiftAmountTy. This results in zero-extension (masking)
which is unnecessary for RISC-V as the shift operations only read the lower 5
or 6 bits (RV32 or RV64).

I initially proposed adding a getExtendForShiftAmount hook so the shift amount
can be any-extended (D52975). @efriedma explained this was unsafe, so I have
instead eliminate the unnecessary and operations at instruction selection time
in a manner similar to X86InstrCompiler.td.

Differential Revision: https://reviews.llvm.org/D53224

llvm-svn: 344432
2018-10-12 23:18:52 +00:00
Craig Topper
3e76b2d736 [X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack stack temporary.
llvm-svn: 344425
2018-10-12 22:00:04 +00:00
Craig Topper
c693a23025 [X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 (bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR.
Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence.

llvm-svn: 344424
2018-10-12 22:00:00 +00:00
Craig Topper
a8a44f1bec [X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults if the dest type can be widened by generic legalization. NFCI
The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion.

llvm-svn: 344423
2018-10-12 21:59:58 +00:00
Sanjay Patel
e28c8ecd72 [x86] add and use fast horizontal vector math subtarget feature
This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen 
by default. AMD Jaguar (btver2) should have no difference with this patch because it has 
fast-hops. (If we want to set that bit for other CPUs, let me know.)

The code changes are small, but there are many test diffs. For files that are specifically 
testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences 
side-by-side. For files that are primarily concerned with codegen other than hops, I just 
updated the CHECK lines to reflect the new default codegen.

To recap the recent horizontal op story:

1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. 
   Hops were likely not optimal for all targets though.
2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so 
   we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195).
3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but 
   probably bad for other CPUs.
4. This patch allows us to distinguish when we want to produce hops, so everyone can be 
   happy. I'm not sure if we have the best predicate here, but the intent is to undo the 
   extra hop-iness that was enabled by r344141.

Differential Revision: https://reviews.llvm.org/D53095

llvm-svn: 344361
2018-10-12 16:41:02 +00:00
Eric Liu
55ab86b72b Fix unused variable warning after r344348
llvm-svn: 344350
2018-10-12 15:01:11 +00:00
Simon Pilgrim
78b5a3c3ef [X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage.
Pull out repeated byte sum stage for popcount of vector elements > 8bits.

This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw.

llvm-svn: 344348
2018-10-12 14:18:47 +00:00
Hiroshi Inoue
9552dd187a [PowerPC] avoid masking already-zero bits in BitPermutationSelector
The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable.
ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value.

This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node.
VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them.
VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits.

This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero).

Differential Revision: https://reviews.llvm.org/D48025

llvm-svn: 344347
2018-10-12 14:02:20 +00:00
Simon Pilgrim
29279f29c8 [X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine
Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes.

This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment.

llvm-svn: 344336
2018-10-12 12:10:34 +00:00
Andrea Di Biagio
6eebbe0a97 [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen.
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.

A move elimination candidate is an instruction that can be eliminated at
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).

For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:

```
def : IsOptimizableRegisterMove<[
  InstructionEquivalenceClass<[
    // GPR variants.
    MOV32rr, MOV64rr,

    // MMX variants.
    MMX_MOVQ64rr,

    // SSE variants.
    MOVAPSrr, MOVUPSrr,
    MOVAPDrr, MOVUPDrr,
    MOVDQArr, MOVDQUrr,

    // AVX variants.
    VMOVAPSrr, VMOVUPSrr,
    VMOVAPDrr, VMOVUPDrr,
    VMOVDQArr, VMOVDQUrr
  ], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```

Definitions of IsOptimizableRegisterMove from processor models of a same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:

```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI)
const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned
CPUID) const;
```

By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).

Tablegen class RegisterFile has been extended with the following information:
 - The set of register classes that allow move elimination.
 - Maxium number of moves that can be eliminated every cycle.
 - Whether move elimination is restricted to moves from registers that are
   known to be zero.

This patch is structured in three part:

A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.

A second part, uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.

A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.

llvm-mca tests for btver2 show differences before/after this patch.

Differential Revision: https://reviews.llvm.org/D53134

llvm-svn: 344334
2018-10-12 11:23:04 +00:00
Simon Pilgrim
c844bc84dd [X86] Ignore float/double non-temporal loads (PR39256)
Scalar non-temporal loads were asserting instead of just being ignored.

Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895

llvm-svn: 344331
2018-10-12 10:20:16 +00:00
Stefan Maksimovic
285c0f4fdc [mips] Mark fmaxl as a long double emulation routine
Failure was discovered upon running
projects/compiler-rt/test/builtins/Unit/divtc3_test.c
in a stage2 compiler build.

When compiling projects/compiler-rt/lib/builtins/divtc3.c,
a call to fmaxl within the divtc3 implementation had its
return values read from registers $2 and $3 instead of $f0 and $f2.
Include fmaxl in the list of long double emulation routines
to have its return value correctly interpreted as f128.

Almost exact issue here: https://reviews.llvm.org/D17760

Differential Revision: https://reviews.llvm.org/D52649

llvm-svn: 344326
2018-10-12 08:18:38 +00:00
Tom Stellard
a894043910 Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"
This reverts commit r344310.

The test case was failing on some bots.

llvm-svn: 344317
2018-10-11 23:36:46 +00:00
Matthias Braun
d6131c9633 X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free
DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately the ConstantHoisting pass runs too early at a
point where the pattern isn't expanded yet. However after
ConstantHoisting hoisted some immediate the result may not expand
anymore. Also the hoisting typically doesn't make sense because it
operates on immediates that will change completely during the expansion.

Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.

Differential Revision: https://reviews.llvm.org/D53174

llvm-svn: 344315
2018-10-11 23:14:35 +00:00
Tom Stellard
4733be6e7b AMDGPU/GlobalISel: Implement select for G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 344310
2018-10-11 22:49:54 +00:00
Ana Pazos
0a5fcefa31 [RISCV] Fix disassembling of fence instruction with invalid field
Summary:
Instruction with 0 in fence field being disassembled as fence , iorw.
Printing "unknown" to match GAS behavior.

This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.

Reviewers: asb

Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb

Differential Revision: https://reviews.llvm.org/D51828

llvm-svn: 344309
2018-10-11 22:49:13 +00:00
Richard Trieu
dfd1760b5f Inline variable into assert to avoid unused variable warning.
llvm-svn: 344308
2018-10-11 22:42:41 +00:00
Craig Topper
35d513c7e4 [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success.

This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain.

There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately.

Differential Revision: https://reviews.llvm.org/D52528

llvm-svn: 344291
2018-10-11 20:36:06 +00:00
Thomas Lively
f04bed8e79 [WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed)
llvm-svn: 344287
2018-10-11 20:21:22 +00:00
Sumanth Gundapaneni
a4a9155e4f [Hexagon] Restrict compound instructions with constant value.
Having a constant value operand in the compound instruction
is not always profitable. This patch improves coremark by ~4% on
Hexagon.

Differential Revision: https://reviews.llvm.org/D53152

llvm-svn: 344284
2018-10-11 19:48:15 +00:00
Thomas Lively
ab37189f7e [WebAssembly] Revert rL344180, which was breaking expensive checks
llvm-svn: 344280
2018-10-11 18:45:48 +00:00
Krzysztof Parzyszek
5d3a6f76a8 [Hexagon] Eliminate potential sources of non-determinism in HCE
Also, avoid comparing GUIDs when ordering global addresses, because
source file location can cause different GUID to be calculated. As a
result, a pair of symbols can compare "less" in one directory, but
"greater" in another.

llvm-svn: 344271
2018-10-11 18:26:02 +00:00
Craig Topper
fb2ac8969e [X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode.
This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.

We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.

I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.

I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.

Differential Revision: https://reviews.llvm.org/D53126

llvm-svn: 344270
2018-10-11 18:06:07 +00:00
Diogo N. Sampaio
352a2fa1e7 [AARCH64][FIX] Emit data symbol for constant pool data
The ARM64 elf emitter would omit printing data
symbol for zero filled constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D53132

llvm-svn: 344248
2018-10-11 14:10:32 +00:00
Roman Lebedev
4225f4adff [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern
Summary:
As discussed in D48491, we can't really do this in the TableGen,
since we need to produce *two* instructions. This only implements
one single pattern. The other 3 patterns will be in follow-ups.

I'm not sure yet if we want to also fuse shift into here
(i.e `(x >> start) & ...`)

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D52304

llvm-svn: 344224
2018-10-11 07:51:13 +00:00
Thomas Lively
7fa7e6a284 [WebAssembly][NFC] Use intrinsic dag nodes directly
Summary: Instead of custom lowering to WebAssemblyISD nodes first.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53119

llvm-svn: 344211
2018-10-11 00:49:24 +00:00
Thomas Lively
2ebacb107b [WebAssembly] Saturating float to int intrinsics
Summary:
Although the saturating float to int instructions are already
emitted from normal IR, the fpto{s,u}i instructions produce poison
values if the argument cannot fit in the result type. These intrinsics
are therefore necessary to get guaranteed defined saturating behavior.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53004

llvm-svn: 344204
2018-10-11 00:01:25 +00:00
Craig Topper
b5421c498d [X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate.
Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check.

This saves about 5K out of ~600K on the generated isel table.

llvm-svn: 344189
2018-10-10 21:48:34 +00:00
George Burgess IV
6ef8002c2c Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00