In ARMOperand::print:
- Print human-readable register names, instead of numbers.
- Print the correct names for IT condition masks (these were in the wrong order
before).
- Print all parts of memory operands, not just the base register.
This makes the output of llvm-mc -show-inst-operands more readable.
Differential revision: https://reviews.llvm.org/D54850
llvm-svn: 347494
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.
Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.
llvm-svn: 347456
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.
Review: Ulrich Weigand
https://reviews.llvm.org/D54789
llvm-svn: 347445
This is further cleanup for PPCMCCodeEmitter. The class had been contained
within the cpp file alone. Now it has been split up between a header file and
a cpp file which allows other classes to make use of the functions in this class
if required.
llvm-svn: 347428
R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o.
They are still not generated with JALR.
Differential revision: https://reviews.llvm.org/D54721
llvm-svn: 347398
Add missing linkage from Nios2CodeGen library to Nios2AsmPrinter
library. The missing dependency causes shared-lib build to fail with
the following reason:
lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
collect2: error: ld returned 1 exit status
Differential Revision: https://reviews.llvm.org/D47810
llvm-svn: 347387
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334
llvm-svn: 347376
These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.
Fixes PR39733
llvm-svn: 347375
We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.
Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.
llvm-svn: 347361
Previously we emitted to separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine.
Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll
llvm-svn: 347348
Rather than assuming that `tempRet0` exists in linear memory only assume
the getter/setter functions exist. This avoids conflicting with
binaryen which declares a wasm global for this purpose and defines it's
own getter and setter for that.
The other advantage of doing things this way is that it leaving
it up to the linker/finalizer to decide how to actually store this
temporary. As it happens binaryen uses a wasm global which is more
appropriate since it is thread safe.
This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.
This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709
Differential Revision: https://reviews.llvm.org/D53240
llvm-svn: 347340
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.
Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX
and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.
Differential Revision: https://reviews.llvm.org/D54700
llvm-svn: 347311
Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits.
llvm-svn: 347303
Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.
As near as I can tell this makes v16i8 behavior consistent with every other VT now.
This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.
llvm-svn: 347296
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.
The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.
llvm-svn: 347290
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
llvm-svn: 347288
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.
Differential Revision: https://reviews.llvm.org/D54718
llvm-svn: 347259
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them.
This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.
Differential Revision: https://reviews.llvm.org/D54711
llvm-svn: 347248
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.
Differential Revision: https://reviews.llvm.org/D54707
llvm-svn: 347242
Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together.
This patch instead instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb.
Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.
Differential Revision: https://reviews.llvm.org/D54668
llvm-svn: 347240
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-svn: 347231
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
of this directive anymore.
- Actually constructs the signature in the assembler, and make the
assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
to require less conversions overall.
- Changed 700 or so tests to use it.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D54652
llvm-svn: 347228
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.
Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).
Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54301
llvm-svn: 347221
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
This leaves sinks as being:
- points where the value in the register is being observed, such as
an icmp, switch or store.
- points where value types have to match, such as calls and returns.
- zext are included to ease the transformation and are generally
removed later on.
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.
Differential Revision: https://reviews.llvm.org/D54515
llvm-svn: 347191
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.
Path by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D54623
llvm-svn: 347187
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
llvm-svn: 347185
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
llvm-svn: 347181
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.
llvm-svn: 347180
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.
When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.
llvm-svn: 347176
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.
This patch disables combineToExtendVectorInReg when we are using widening.
I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.
I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.
llvm-svn: 347172
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54671
llvm-svn: 347171
Refactor towards making this recursive (necessary for PR38243 rotation splat detection).
IsSplatVector returns the original vector source of the splat and the splat index.
GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector.
llvm-svn: 347168
We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs.
llvm-svn: 347158
The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack.
llvm-svn: 347149
This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur.
There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement.
llvm-svn: 347105
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.
Differential Revision: https://reviews.llvm.org/D54609
llvm-svn: 347089
Summary:
As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!),
we can fold the `Z` into 'control`, and let the `BEXTR` do this too.
We could just insert those 8 bits of shift amount into control,
but it is better to instead zero-extend them, and 'or' them in place.
We can only do this for `lshr`, not `ashr`, because we do not know that the mask cover only the bits of `Y`,
and not any of the sign-extended bits.
The obvious question is, is this actually legal to do?
I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`:
* `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.`
* `A START value exceeding the operand size will not extract any bits from the second source operand.`
* `Only bit positions up to (OperandSize -1) of the first source operand are extracted.`
* `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.`
* `The destination register is cleared if no bits are extracted.`
FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases.
Reviewers: RKSimon, craig.topper, spatel, andreadb
Reviewed By: RKSimon, craig.topper, andreadb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54095
llvm-svn: 347048
The RISC-V ISA manual was updated on 2018-11-07 (commit 00557c3) to define a
new compressed instruction format, RVC format CA (no actual instruction
encodings were changed). This patch updates the RISC-V backend to define the
new format, and to use it in the relevant instructions.
Differential Revision: https://reviews.llvm.org/D54302
Patch by Luís Marques.
llvm-svn: 347043
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).
test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.
Further optimisations for constant materialisation are possible, most notably
for mask-like values which can be generated my loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.
Differential Revision: https://reviews.llvm.org/D52962
llvm-svn: 347042
Introduces support for '.refsym' assembler directive.
From GCC docs (for MSP430):
'.refsym' - This directive instructs assembler to add an undefined reference
to the symbol following the directive. No relocation is created for this symbol;
it will exist purely for pulling in object files from archives.
Patch by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D54618
llvm-svn: 347041
By early promoting the multiply to use an i16 element type we can avoid op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get.
llvm-svn: 347032
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.
llvm-svn: 347027
We aren't going to use the upper bits of the multiply result that the extend would effect. So we don't need a specific type of extend.
This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate.
llvm-svn: 347011
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.
llvm-svn: 347008
Summary:
`throw` instruction is a terminator in wasm, but BBs were not splitted
after `throw` instructions, causing machine instruction verifier to
fail.
This patch
- Splits BBs after `throw` instructions in WasmEHPrepare and adding an
unreachable instruction after `throw`, which will be deleted in
LateEHPrepare pass
- Refactors WasmEHPrepare into two member functions
- Changes the semantics of `eraseBBsAndChildren` in LateEHPrepare pass
to match that of WasmEHPrepare pass, which is newly added. Now
`eraseBBsAndChildren` does not delete BBs with remaining predecessors.
- Fixes style nits, making static function names conform to clang-tidy
- Re-enables the test temporarily disabled by rL346840 && rL346845
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54571
llvm-svn: 347003
Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says its unused.
I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation?
Differential Revision: https://reviews.llvm.org/D54596
llvm-svn: 346995
Use unsigned to calculate the subvector index to avoid a cast.
Remove an unnecessary condition and replace it with a stronger assert.
Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue.
llvm-svn: 346983
In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.
In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different and output VTs.
Differential Revision: https://reviews.llvm.org/D54512
llvm-svn: 346980
Summary:
The old return type did not allow for correct error reporting and was
causing a compiler warning.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54586
llvm-svn: 346979
This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag.
The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks.
Committed on behalf of @eush (Eugene Sharygin)
Differential Revision: https://reviews.llvm.org/D54008
llvm-svn: 346962
C.EBREAK was defined with hasSideEffects = 0, which is incorrect and
inconsistent with the non-compressed instruction form. This patch corrects
this oversight.
This wouldn't cause codegen issues, as compressed instructions are only ever
generated by converting the non-compressed form as an MCInst. But having
correct flags is still worthwhile.
Differential Revision: https://reviews.llvm.org/D54256
Patch by Luís Marques.
llvm-svn: 346959
Mark the FREM SelectionDAG node as Expand, which is necessary in order to
support the frem IR instruction on RISC-V. This is expanded into a library
call. Adds the corresponding test. Previously, this would have triggered an
assertion at instruction selection time.
Differential Revision: https://reviews.llvm.org/D54159
Patch by Luís Marques.
llvm-svn: 346958
Reapply r346374 with the fixes for modules build.
Original summary:
This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA. Also, more instruction forms are added
to the target description.
Patch by Michael Skvortsov!
llvm-svn: 346948
Logic to load 32-bit and 64-bit immediates is currently present in
RISCVAsmParser::emitLoadImm in order to support the li pseudoinstruction. With
the introduction of RV64 codegen, there is a greater benefit of sharing
immediate materialisation logic between the MC layer and codegen. The
generateInstSeq helper allows this by producing a vector of simple structs
representing the chosen instructions. This can then be consumed in the MC
layer to produce MCInsts or at instruction selection time to produce
appropriate SelectionDAG node. Sharing this logic means that both the li
pseudoinstruction and codegen can benefit from future optimisations, and
that this logic can be used for materialising constants during RV64 codegen.
This patch does contain a behaviour change: addi will now be produced on RV64
when no lui is necessary to materialise the constant. In that case addiw takes
x0 as the source register, so is semantically identical to addi.
Differential Revision: https://reviews.llvm.org/D52961
llvm-svn: 346937
This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb.
llvm-svn: 346936
The narrow types end up requesting widening, but generic legalization will end up scalaring and using a build_vector to do the widening.
llvm-svn: 346916
On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead.
There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occuring in 64-bit mode so its not a new issue.
llvm-svn: 346908
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.
If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.
llvm-svn: 346901
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also physical
register assignments from immediate values.
This causes a reduction in reduction in overall register pressure and
minor reduction in spills and indirectly fixes an out-of-registers
assertion (PR39391).
Most test changes are from minor instruction reorderings and register
name selection changes and direct consequences of that.
Reviewers: MatzeB, qcolombet, myatsina, pcc
Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
javed.absar, arphaman, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54218
llvm-svn: 346894
Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.
Differential Revision: https://reviews.llvm.org/D54513
llvm-svn: 346879
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.
This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
llvm-svn: 346854
This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.
The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although its annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate....
Differential Revision: https://reviews.llvm.org/D54083
llvm-svn: 346850
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)
The event section is added between the global section and the export
section. This is for ease of validation per request of the V8 team.
This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support
Reviewers: dschuff, sbc100, aardappel
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54096
llvm-svn: 346825
To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.
Differential Revision: https://reviews.llvm.org/D49531
llvm-svn: 346824
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.
This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)
llvm-svn: 346816
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.
The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.
llvm-svn: 346809
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there's less than that.
llvm-svn: 346803
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.
Differential Revision: https://reviews.llvm.org/D54351
llvm-svn: 346800
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
Differential Revision: https://reviews.llvm.org/D54346
llvm-svn: 346784
Summary:
Without this change I get the following error:
lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant]
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53425
llvm-svn: 346750
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.
This patch handles this by putting the other users of the load to use the
REPLICATE 0-element instead of the load. This way the load has only the
REPLICATE node as user, and we get a VLREP.
Review: Ulrich Weigand
https://reviews.llvm.org/D54264
llvm-svn: 346746
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.
llvm-svn: 346721
Instead of returning Flags, return true if the MBB is safe to outline from.
This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.
llvm-svn: 346718
This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.
Differential Revision: https://reviews.llvm.org/D54267
llvm-svn: 346706
Summary:
This is to replace the ELFAsmParser that WebAssembly was using, which
so far was a stub that didn't do anything, and couldn't work correctly
with wasm.
This new class is there to implement generic directives related to
wasm as a binary format. Wasm target specific directives are still
parsed in WebAssemblyAsmParser as before. The two classes now
cooperate more correctly too.
Also implemented .result which was missing. Any unknown directives
will now result in errors.
Reviewers: dschuff, sbc100
Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54360
llvm-svn: 346700
Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.
This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.
llvm-svn: 346697
Sometimes after basic block placement we end up with a code like:
sreg = s_mov_b64 -1
vcc = s_and_b64 exec, sreg
s_cbranch_vccz
This happens as a join of a block assigning -1 to a saved mask and
another block which consumes that saved mask with s_and_b64 and a
branch.
This is essentially a single s_cbranch_execz instruction when moved
into a single new basic block.
Differential Revision: https://reviews.llvm.org/D54164
llvm-svn: 346690
When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.
llvm-svn: 346688
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.
This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).
Review: Ulrich Weigand
https://reviews.llvm.org/D54423
llvm-svn: 346663
This extends the .option support from D45864 to enable/disable the relax
feature flag from D44886
During parsing of the relax/norelax directives, the RISCV::FeatureRelax
feature bits of the SubtargetInfo stored in the AsmParser are updated
appropriately to reflect whether relaxation is currently enabled in the
parser. When an instruction is parsed, the parser checks if relaxation is
currently enabled and if so, gets a handle to the AsmBackend and sets the
ForceRelocs flag. The AsmBackend uses a combination of the original
RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the
ForceRelocs flag to determine whether to emit relocations for symbol and
branch diffs. Diff relocations should therefore only not be emitted if the
relax flag was not set on the command line and no instruction was ever parsed
in a section with relaxation enabled to ensure correct diffs are emitted.
Differential Revision: https://reviews.llvm.org/D46423
Patch by Lewis Revill.
llvm-svn: 346655
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.
Review: Ulrich Weigand
https://reviews.llvm.org/D54322
llvm-svn: 346637
getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.
llvm-svn: 346603
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
Differential Revision: https://reviews.llvm.org/D54073
llvm-svn: 346595
There are two AGU units, and per 1cy, there can be either two loads,
or a load and a store; but not two stores, or two loads and a store.
Additionally, loads shouldn't affect the store scheduler and vice versa.
(but *should* affect the PdEX scheduler.)
Required rL346545.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39465
llvm-svn: 346587
The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.
llvm-svn: 346581
This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems.
While there also use DAG.getConstant to create a 0 vector instead of using the helper the forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.
llvm-svn: 346574
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.
Patch by Sanjin Sijaric, with some fixes by me.
Differential Revision: https://reviews.llvm.org/D51524
llvm-svn: 346568
LDRcp should be deleted when the dest register is dead in register
coalescing. Without MemOp, dead LDRcp will cause dead constant pool
value which references to non-existing label.
Patch by Yin Ma.
Differential Revision: https://reviews.llvm.org/D54173
llvm-svn: 346563
With avx512f but not avx512bw we need to extend to v16i32 then truncate that to to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou
This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs.
llvm-svn: 346552
This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND.
I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.
llvm-svn: 346539
Eliminate the stack frame in functions with the noreturn nounwind
attributes, and when the noreturn-stack-elim target feature is
enabled. This reduces the code and stack space needed for noreturn
functions.
Differential Revision: https://reviews.llvm.org/D54210
llvm-svn: 346532
Both -fPIC and -G0 disable placement of globals in small data section,
but if a global has an explicit section assigmnent placing it in small
data, it should go there anyway.
llvm-svn: 346523
Currently in llvm, CalleeSavedInfo can only assign a callee saved register to
stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.
Differential Revision: https://reviews.llvm.org/D39386
llvm-svn: 346512
This reverts commit r345972. Need to update the description + possibly
to update the patch itself after discussion with Eric Christofer.
llvm-svn: 346508
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.
Review: Ulrich Weigand
https://reviews.llvm.org/D54315
llvm-svn: 346504
Now that we have mixed type sizes, i1 values need to be explicitly
handled as we want to avoid promoting these values.
Differential Revision: https://reviews.llvm.org/D54308
llvm-svn: 346499
I noticed that we weren't generating broadcasts as much I thought we would with
D54271, and this is part of the problem.
Widening the shuffle elements means adding bitcasts and hiding the relationship
between a splatted scalar and the vector. If we can form a broadcast, do that
before going through the rest of the shuffle lowering because broadcasts should
be cheap and can often be load-folded.
Differential Revision: https://reviews.llvm.org/D54280
llvm-svn: 346498
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
Previously, during the search, all values had to have the same
'TypeSize', which is equal to number of bits of the integer type of
the icmp operand. All values in the tree had to match this size;
meaning that, if we searched from i16, we wouldn't accept i8s. A
change in type size requires zext and truncs to perform the casts so,
to allow mixed narrow types, the handling of these instructions is
now slightly different:
- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.
The transformation bails on finding an icmp that operates on data
smaller than the current TypeSize.
Differential Revision: https://reviews.llvm.org/D54108
llvm-svn: 346480
A few code movement things:
- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets pasted in a couple of places instead,
including CreateParallelMACPairs so it doesn't need to return a
value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.
Differential Revision: https://reviews.llvm.org/D54254
llvm-svn: 346479
Summary:
Reorders the sections in the SIMD tablegen file to roughly match the
new opcode ordering. Depends on D54126.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54134
llvm-svn: 346464
Summary:
The pass incorrectly assumed if there's a longjmp declaration in the
module, there is also a setjmp function declaration. Fixed it, and now
the pass only converts longjmp and does not do any other transformation
when there's no setjmp declaration in the module.
Fixes PR39562.
Reviewers: jgravelle-google, sbc100
Subscribers: dschuff, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54273
llvm-svn: 346445
As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly.
Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:
// If the vector is wider than 128 bits, extract the 128-bit subvector, insert
// into that, and then insert the subvector back into the result.
...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.
Differential Revision: https://reviews.llvm.org/D54271
llvm-svn: 346433
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.
This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.
The original test case included as a reduced .mir test case.
Review: Ulrich Weigand
https://reviews.llvm.org/D54197
llvm-svn: 346406
Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and
loadRegToStackSlot() to allow the GPR class or any of its sub-classes
(including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12.
Differential Revision: https://reviews.llvm.org/D51927
llvm-svn: 346401
Promote alloca can vectorize a small array by bitcasting it to a
vector type. Extend vectorization for the case when alloca is
already a vector type. We still want to replace GEPs with an
insert/extract element instructions in this case.
Differential Revision: https://reviews.llvm.org/D54219
llvm-svn: 346376
Summary:
This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA. Also, more instruction forms are added
to the target description.
Reviewers: asl
Reviewed By: asl
Subscribers: pftbest, krisb, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D53661
llvm-svn: 346374
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.
llvm-svn: 346366
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.
Besides, this was always checking for the wrong kill opcode anyway...
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54085
llvm-svn: 346362
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.
Differential Revision: https://reviews.llvm.org/D54129
llvm-svn: 346358
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions. This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.
For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)
The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into common helper,
but I have no idea where to put it.
Differential Revision: https://reviews.llvm.org/D54192
llvm-svn: 346355
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test pointlessly running opt -O3 when it really
just wants to run the one analysis.
Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.
llvm-svn: 346353
Summary:
The conditional branch created to support -fsplit-stack for X86 is
left unbiased/unhinted, resulting in less than ideal block placement:
the __morestack call block is kept on the main hot path. Bias the
branch to insure that the stack allocation block is treated as a
"cold" block during machine basic block placement.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54123
llvm-svn: 346336
Set operands order for G_MERGE_VALUES and G_UNMERGE_VALUES so
that least significant bits always go first, regardless of endianness.
Differential Revision: https://reviews.llvm.org/D54098
llvm-svn: 346305
Add this option for debugging and providing workaround.
By default it is off so no behavior change in backend.
Differential Revision: https://reviews.llvm.org/D54158
llvm-svn: 346267
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysRegs. The later is only 16bits and
saves us a bit of memory.
llvm-svn: 346254
The `sigrie` instruction signals a Reserved Instruction Exception.
This patch adds support for assembling / disassembling the instruction.
Differential Revision: http://reviews.llvm.org/D53861
llvm-svn: 346230
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
(it won't accept arbitrary disjunctions and is really about whether we
can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`
llvm-svn: 346203
This reverts rL345880. It caused some test failures on the
webassembly waterfall. e.g. binaryen2.test_mainenv fails due
the fact that `envp` ends up being undef rather than 0.
Differential Revision: https://reviews.llvm.org/D54117
llvm-svn: 346187
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
llvm-svn: 346183
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
llvm-svn: 346180
SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel.
I don't have a reduced test case for such an infinite loop yet.
llvm-svn: 346170
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be
This patch adds support for these.
Differential Revision: https://reviews.llvm.org/D53581
llvm-svn: 346148
Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo
instructions by creating variants which support
less operands/accept GPR64Opnds as their operand in order
to appease the machine verifier pass.
Differential Revision: https://reviews.llvm.org/D53977
llvm-svn: 346133
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.
To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.
I've added a test case that can cause derivatives, and exposes the
problem.
Differential Revision: https://reviews.llvm.org/D53930
llvm-svn: 346128
Turn the assert in PrepareConstants into a conditon so that we can
handle mul instructions with negative immediates.
Differential Revision: https://reviews.llvm.org/D54094
llvm-svn: 346126
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.
We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)
This means that we can no longer treat a value, that is only used by
a sink, to be safe to promote.
I've also added in some extra asserts and replaced a cast for a
dyn_cast.
Differential Revision: https://reviews.llvm.org/D54032
llvm-svn: 346125
This patch fixes a bug in the AVR FRMIDX expansion logic.
The expansion would leave a leftover operand from the original FRMIDX,
but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction
did not expect this operand and so LLVM rejected the machine
instruction.
This would trigger an assertion:
Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() ||
OpNo < MCID->getNumOperands() || isMetaDataOp) &&
"Trying to add an operand to a machine instr that is already done!"),
function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp
Tim fixed this so that now the FRMIDX is expanded correctly into
a well-formed MOVWRdRr.
Patch by Tim Neumann
llvm-svn: 346117
v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about.
llvm-svn: 346115
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.
The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.
When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:
LLVM ERROR: ran out of registers during register allocation
This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.
Here is a description from the patch author Peter Nimmervoll
As far as I understand the problem occurs when LDDWRdPtrQ uses
the ptrdispregs register class as target register. This should work, but
the allocator can't deal with this for some reason. So from my testing,
it seams like (and I might be totally wrong on this) the allocator reserves
the Z register for the ICALL instruction and then the register class
ptrdispregs only has 1 register left and we can't use Y for source and
destination. Removing the Z register from DREGS fixes the problem but
removing Y register does not.
More information about the bug can be found on the avr-rust issue
tracker at https://github.com/avr-rust/rust/issues/37.
A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.
Patch by Peter Nimmervoll
llvm-svn: 346114
Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54069
llvm-svn: 346102
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.
The rest of the patch is just changing all callers to use getNode directly.
llvm-svn: 346087
Use MachineFrameInfo's OffsetAdjustment field to pass this information
from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for
any other purpose.
This fixes PR38857 in the case where there is a non-aligned quantity of
CSRs and a non-aligned quantity of locals.
llvm-svn: 346062
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.
llvm-svn: 346050
Summary:
The assembler was able to assemble and then dump back to .s, but
was failing to parse certain directives necessary for valid .o
output:
- .type directives are now recognized to distinguish function symbols
and others.
- .size is now parsed to provide function size.
- .globaltype (introduced in https://reviews.llvm.org/D54012) is now
recognized to ensure symbols like __stack_pointer have a proper type
set for both .s and .o output.
Also added tests for the above.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish
Differential Revision: https://reviews.llvm.org/D53842
llvm-svn: 346047
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.
I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.
I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.
For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.
Differential Revision: https://reviews.llvm.org/D54024
llvm-svn: 346043
A number of intrinsics, such as llvm.sin.f32, would result in a failure to
select. This patch adds expansions for the relevant selection DAG nodes, as
well as exhaustive testing for all f32 and f64 intrinsics.
The codegen for FMA remains a TODO item, pending support for the various
RISC-V FMA instruction variants.
The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending
upstream support for target-independent expansion, as discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html.
Differential Revision: https://reviews.llvm.org/D54034
Patch by Luís Marques.
llvm-svn: 346034
Summary:
EH stack depth is incremented at `try` and decremented at `catch`. When
there are more than two catch instructions for a try instruction, we
shouldn't count non-first catches when calculating EH stack depths.
This patch fixes two bugs:
- CFGStackify: Exclude `catch_all` in the terminate catch pad when
calculating EH pad stack, because when we have multiple catches for a
try we should count only the first catch instruction when calculating
EH pad stack.
- InstPrinter: The initial intention was also to exclude non-first
catches, but it didn't account nested try-catches, so it failed on
this case:
```
try
try
catch
end
catch <-- (1)
end
```
In the example, when we are at the catch (1), the last seen EH
instruction is not `try` but `end_try`, violating the wrong assumption.
We don't need these after we switch to the second proposal because there
is gonna be only one `catch` instruction. But anyway before then these
bugfixes are necessary for keep trunk in working state.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53819
llvm-svn: 346029
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.
Since the load already does the extension, there is no extra cost (previously
returned 2).
Review: Ulrich Weigand
https://reviews.llvm.org/D54028
llvm-svn: 346009
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.
This makes it more readable and just slightly more accurate generally.
Review: Ulrich Weigand
https://reviews.llvm.org/D53071
llvm-svn: 345998
Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled
offset. For a load of a value of type T, the small-data area is
equivalent to an array "T sdata[65536]". This implies that objects
of smaller sizes need to be closer to the beginning of sdata,
while larger objects may be farther away, or otherwise the offset
may be insufficient to reach it. Similarly, an object of a larger
size should not be accessed via a load of a smaller size.
llvm-svn: 345975
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.
Reviewers: probinson, echristo, dblaikie
Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 345972
UBSan detected an error in our ISelLowering that is exposed only when
you have a dmask == 0x1. Fix this by adding in an explicit check to
ensure we don't do the UBSan detected shl << 32.
llvm-svn: 345962
Summary:
Assembly output can use globals like __stack_pointer implicitly,
but has no way of indicating the type of such a global, which makes
it hard for tools processing it (such as the MC Assembler) to
reconstruct this information.
The improved assembler directives parsing (in progress in
https://reviews.llvm.org/D53842) will make use of this information.
Also deleted code for the .import_global directive which was unused.
New test case in userstack.ll
Reviewers: dschuff, sbc100
Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54012
llvm-svn: 345917
Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge
increase in the compile time. Support the pattern that clang FE generates after reordering the
additions in integer-dot8 source language pattern.
Author: FarhanaAleen
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D53937
llvm-svn: 345902
Summary:
Like `block` or `loop`, `try` can take an optional signature which can
be omitted. This patch allows `try`'s signature to be omitted. Also
added some tests for EH instructions.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53873
llvm-svn: 345888
I added these annotations in r345878 because I wasn't sure if the
fallthrough was intended. Krzysztof Parzyszek confirmed that they should
be breaks, so that's what this patch does.
Reviewers: kparzysz
Differential Revision: https://reviews.llvm.org/D53991
llvm-svn: 345883
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
of only 'break'.
We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
the outer case.
I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.
Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
Differential Revision: https://reviews.llvm.org/D53950
llvm-svn: 345882
Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC
does not warn when a case body that ends in a switch falls through to a
case label of an outer switch.
It's not clear if these fall throughs are truly intended. The Hexagon
tests pass regardless of whether these case blocks fall through or
break.
For now, I have applied the intended fallthrough annotation macro with a
FIXME comment to unblock enabling the warning. I will send a follow-up
patch that converts them to breaks to the Hexagon maintainers.
llvm-svn: 345878
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.
Reviewers: dsanders, aemerson, aditya_nandakumar
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53734
llvm-svn: 345875
This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.
Differential Revision: https://reviews.llvm.org/D53258
llvm-svn: 345869
Previously this case fell through to unreachable, so it is clearly not
covered by any test case in LLVM. It may be dynamically unreachable, in
fact. However, if it were to run, this is what it would logically do.
The assert suggests that the intended behavior was not to allow folding
offsets from jump table indices, which makes sense.
llvm-svn: 345868
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.
This fallthrough was bening because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.
llvm-svn: 345864
While mutating instructions, we sign extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.
This is a relatively simple fix, but I've taken the time to
reorganise the code a bit - mainly that instructions that can be
promoted are cached and splitting up the Mutate function.
Differential Revision: https://reviews.llvm.org/D53972
llvm-svn: 345840
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
wrong JALR16_MM is selected. This patch adds missing pattern for
JmpLink, so that JAL instruction is selected.
Differential Revision: https://reviews.llvm.org/D53366
llvm-svn: 345830
Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed.
This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.
llvm-svn: 345824
In MipsBranchExpansion::splitMBB, upon splitting
a block with two direct branches, remove the successor
of the newly created block (which inherits successors from
the original block) which is pointed to by the last
branch in the original block only if the targets of two
branches differ.
This is to fix the failing test when ran with
-verify-machineinstrs enabled.
Differential Revision: https://reviews.llvm.org/D53756
llvm-svn: 345821
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.
Review: Ulrich Weigand
https://reviews.llvm.org/D53924
llvm-svn: 345818
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.
Review: Ulrich Weigand
https://reviews.llvm.org/D53923
llvm-svn: 345817
Summary:
Also reduce the test case for implicit defs and test it with all
register classes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53855
llvm-svn: 345794
Shows up rarely for 64-bit arithmetic, more frequently for the compare
patterns added in r325323.
Differential Revision: https://reviews.llvm.org/D53848
llvm-svn: 345782
SimplifySetCC could shrink a load without checking for
profitability or legality of such shink with a target.
Added checks to prevent shrinking of aligned scalar loads
in AMDGPU below dword as scalar engine does not support it.
Differential Revision: https://reviews.llvm.org/D53846
llvm-svn: 345778
This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.
Remove the feature and add a test for compilation of shaders at OptNone.
Differential Revision: https://reviews.llvm.org/D53829
llvm-svn: 345763
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.
Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.
This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".
There are still some relevant cases where code quality could be
improved, in particular:
- We often introduce redundant masks with EXEC. Ideally, we'd
have a generic computeKnownBits-like analysis to determine
whether masks are already masked by EXEC, so we can avoid this
masking both here and when lowering uniform control flow.
- The criterion we use to determine whether a def is observed
from outside a loop is conservative: it doesn't check whether
(loop) branch conditions are uniform.
Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53496
llvm-svn: 345719
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.
But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.
(This does actually regress code quality in a few places because it
ends up relying more heavily on phi's of i1, which we don't do a
great job with. However, since it fixes real bugs in the wild, we
should take this change. I have some prototype changes to improve
i1 lowering in general -- not just for control flow -- which should
help recover the code quality, I just need to make those changes
fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:
Foo(MI->getOperand(0).getImm()) == ExpectedVal;
Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.
While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.
With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.
This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).
This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.
Differential Revision: https://reviews.llvm.org/D53880
llvm-svn: 345714
Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).
Fix our isel lowering and add test cases to prove it out.
Differential Revision: https://reviews.llvm.org/D53750
llvm-svn: 345710
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue. These are used by the MCLayer to
populate the .xdata section.
Differential Revision: https://reviews.llvm.org/D50288
llvm-svn: 345701
Summary:
Thunk functions in Windows are varag functions that call a musttail function
to pass the arguments after the fixup is done. We need to make sure that we
forward the arguments from the caller vararg to the callee vararg function.
This is the same mechanism that is used for Windows on X86.
Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53843
llvm-svn: 345641
Prevents the post-RA scheduler from modifying the prologue sequences
emitting by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.
isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.
Differential Revision: https://reviews.llvm.org/D53851
llvm-svn: 345634
Re-apply r345315 with testcase fixes.
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.
Differential Revision: https://reviews.llvm.org/D52816
llvm-svn: 345631
The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors.
This patch switchs to using the concat_vectors input element count directly instead.
Differential Revision: https://reviews.llvm.org/D53823
llvm-svn: 345626
The LRV and STRV nodes carry an extra operand to indicate the
type of the memory access. This is redundant, since the nodes
are actually of class MemIntrinsicNode and therefore hold that
same information already as MemoryVT.
NFC intended.
llvm-svn: 345618
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.
Review: Ulrich Weigand
https://reviews.llvm.org/D53791
llvm-svn: 345596
Summary:
The final pattern.
There is no test changes:
* We are looking for the pattern with one-use of it's mask,
* If the mask is one-use, D48768 will unfold it into pattern d.
* Thus, the tests have extra-use on the mask.
* Thus, only the BMI2 BZHI can be tested, and it already worked.
* So there is no BMI1 test coverage, we just assume it works since it uses the same codepath.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53575
llvm-svn: 345584
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.
This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.
Differential Revision: https://reviews.llvm.org/D53760
llvm-svn: 345578
Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86.
Reviewers: efriedma, RKSimon
Reviewed By: efriedma
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53229
llvm-svn: 345567
Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask.
llvm-svn: 345565
Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow
usual naming convention, and add some comments in the .td files.
llvm-svn: 345515
The machine verifier was disabled for x86 by default. There are now only
9 tests failing, compared to what previously was between 20 and 30.
This is a good opportunity to file bugs for all the remaining issues,
then explicitly disable the failing tests and enabling the machine
verifier by default.
This allows us to avoid adding new tests that break the verifier.
PR27481
llvm-svn: 345513
- Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return
address signing
- The key used to sign the function is controlled by the function attribute
"sign-return-address-key"
Differential Revision: https://reviews.llvm.org/D51427
llvm-svn: 345511
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.
Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.
llvm-svn: 345488
Add vector support to TargetLowering::expandFP_TO_UINT.
This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.
llvm-svn: 345473
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.
........
Its been reported that this is causing out of trunk failures.
llvm-svn: 345451
Add ARM64 unwind codes to MCLayer, as well SEH directives that will be emitted
by the frame lowering patch to follow. We only emit unwind codes into object
object files for now.
Differential Revision: https://reviews.llvm.org/D50166
llvm-svn: 345450
The class definition for Call_nr has the itinerary as a
parameter, but the value is never assigned to the Itinerary
field for the instruction. This means the compiler is unable
to schedule and packetize the instruction correctly because
these instrution will not have any resource descritions.
I don't have a specific test case, but the ps_call_nr.ll
test failed with a proposed patch.
llvm-svn: 345442
Summary:
The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't
understand the way we're using a CALL instruction as a branch, so we
can't list the CallTarget MBB as a successor of the entry block. If we
don't list it as a successor, then the AsmPrinter doesn't print a label
for the MBB.
Fix the issue by inserting our own label at the beginning of the call
target block. We can rely on the AsmPrinter to always emit it, even
though the block appears to be unreachable, but address-taken.
Fixes PR38391.
Reviewers: thegameg, chandlerc, echristo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53653
llvm-svn: 345426
The "dead" markings allow existing target-independent optimizations,
like MachineSink, to trigger more frequently. The CPSR defs would have
eventually been marked dead by LiveVariables, so this only affects
optimizations before regalloc.
The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible
with this change: the transform adds a use to an otherwise dead def
of CPSR. This is covered by existing regression tests.
thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the
generated code; I'll fix it in D53452.
Differential Revision: https://reviews.llvm.org/D53453
llvm-svn: 345420
Currently, for this node:
vector int test(int a, int b, int c, int d) {
return (vector int) { a, b, c, d };
}
we get this on Power9:
mtvsrdd 34, 5, 3
mtvsrdd 35, 6, 4
vmrgow 2, 3, 2
and this on Power8:
mtvsrwz 0, 3
mtvsrwz 1, 5
mtvsrwz 2, 4
mtvsrwz 3, 6
xxmrghd 34, 1, 0
xxmrghd 35, 3, 2
vmrgow 2, 3, 2
This can be improved to this on LE Power9:
rldimi 3, 4, 32, 0
rldimi 5, 6, 32, 0
mtvsrdd 34, 5, 3
and this on LE Power8
rldimi 3, 4, 32, 0
rldimi 5, 6, 32, 0
mtvsrd 34, 3
mtvsrd 35, 5
xxpermdi 34, 35, 34, 0
This patch updates the TD pattern to generate the optimized sequence for both
Power8 and Power9 on LE and BE.
Differential Revision: https://reviews.llvm.org/D53494
llvm-svn: 345414
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.
This patch removes the promotion and adds isel patterns instead.
The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.
Differential Revision: https://reviews.llvm.org/D53268
llvm-svn: 345408
This is a narrow fix for 1 of the problems mentioned in PR27780:
https://bugs.llvm.org/show_bug.cgi?id=27780
I looked at more general solutions, but it's a mess. We canonicalize shuffle masks
based on the number of elements accessed from each operand, and that's not optional.
If you remove that, we'll crash because we fail to match isel patterns. So I'm
waiting until we're sure that we have blendvb with constant condition and then
commuting based on the load potential. Other cases like blend-with-immediate are
already handled elsewhere, so this is probably not a common problem anyway.
I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've
screwed that up by creating a temporary PSHUFB using these operands that we're counting
on to be killed later. Undoing that didn't look like a simple task because it's
intertwined with determining if we actually use both operands of the shuffle or not.a
Differential Revision: https://reviews.llvm.org/D53737
llvm-svn: 345390
AMDGPU currently only supports direct calls, but at lower optimisation levels it
fails to lower statically direct calls which appear indirect due to a bitcast.
Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize"
calls.
Differential Revision: https://reviews.llvm.org/D52741
llvm-svn: 345382
For both operands are bool, short, int, long, long long, add the following optimization.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0
Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360
llvm-svn: 345366
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D53346
llvm-svn: 345361
SystemZAsmParser can now handle -debug by printing the operands neatly to the
output stream. Before this patch this lead to an llvm_unreachable().
It seems that now '-mllvm -debug' does not cause any crashes anywhere (at
least not on SPEC).
Review: Ulrich Weigand
https://reviews.llvm.org/D53328
llvm-svn: 345349
In order to print the IR slot number for the memory operand, the DAG pointer
must be passed to SDNode::dump().
The isel-debug.ll test updated to also check for the IR Value reference being
printed correctly.
Review: Ulrich Weigand
https://reviews.llvm.org/D53333
llvm-svn: 345347
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.
In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)
This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer
Reviewers: dschuff, sbc100, rnk
Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52748
llvm-svn: 345345
Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a
DAG pattern to define the indexed-form intrinsics in terms of the vector-form
ones, similarly to how the Dot Product intrinsics were implemented.
Based on a patch by Gao Yiling.
Differential Revision: https://reviews.llvm.org/D53632
llvm-svn: 345337
Summary:
Currently InstPrinter ignores if there are mismatches between block/loop
and end markers by skipping the case if ControlFlowStack is empty. I
guess it is better to explicitly error out in this case, because this
signals invalid input.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53620
llvm-svn: 345333
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.
This patch improves the SystemZ TTI cost function to recognize this.
Review: Ulrich Weigand
https://reviews.llvm.org/D52692
llvm-svn: 345327
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.
Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).
Review: Ulrich Weigand
https://reviews.llvm.org/D53196
llvm-svn: 345321
The required-vector-width attribute was only used for backend testing and has never been generated by clang.
I believe clang is now generating min-legal-vector-width for vector uses in user code.
With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code.
llvm-svn: 345317
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.
Differential Revision: https://reviews.llvm.org/D52816
llvm-svn: 345315
KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.
Differential Revision: https://reviews.llvm.org/D53671
llvm-svn: 345285
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.
This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.
Reviewers: dsanders, aemerson
Reviewed By: dsanders
Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53679
llvm-svn: 345282
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.
Differential Revision: https://reviews.llvm.org/D53582
llvm-svn: 345275
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.
Differential Revision: https://reviews.llvm.org/D53562
llvm-svn: 345272
Using a multiclass reduces duplication, and makes it easier to add new patterns
later. This refactoring does add some new patterns, but as far as I can tell
there's no IR that will end up triggering them so this is effectively NFC.
Differential Revision: https://reviews.llvm.org/D53580
llvm-svn: 345271
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.
Differential Revision: https://reviews.llvm.org/D53579
llvm-svn: 345270
Summary:
If the instruction in the eliminateFrameIndex function is a DBG_VALUE
instruction, it requires special processing. The frame register is set
to VRFrame and the offset is based on the object offset.
The code is similar to the code used in
lib/CodeGen/PrologEpilogInserter.cpp.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D53657
llvm-svn: 345269
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.
This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time.
Differential Revision: https://reviews.llvm.org/D53570
llvm-svn: 345253
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).
This also compresses the pfm counter tables (PR37068).
Reviewers: RKSimon, gchatelet
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D52932
llvm-svn: 345243
Multiply a is complex operation so just because some bit of the output isn't used doesn't mean that bit of the input isn't used.
We might able to bound it, but it will require some more thought.
llvm-svn: 345241
Summary: Fixes part of the problem reported in bug 39275.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D53542
llvm-svn: 345230
Summary:
Currently when assigning depths 'rethrow' does not take the whole
control flow stack into accounts but only considers EH pad stacks. When
assigning depth immmediates to rethrows, in normal cases it is done
correctly but when a rethrow instruction throws up to a caller, i.e., we
convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it
mistakenly compute the whole stack depth.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53619
llvm-svn: 345223
Summary:
Changing the node type in lowering was violating assumptions made in
the DAG combiner, so don't change the node type any more. This fixes
one of the issues reported in bug 39275.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D53537
llvm-svn: 345221
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.
Fixes some X86 verifier errors tracked in PR27481.
llvm-svn: 345219
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.
Fixes the sibcall* and musttail* verifier failures tracked at PR27481.
llvm-svn: 345197
Summary:
This renames the IsParsingMSInlineAsm member variable of AsmLexer to
LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior
controlled by that variable. I added a public setter, so that it can be
set from outside or from the llvm-mc command line. We may need to
arrange things so that users can get this behavior from clang, but
that's future work.
I also put additional hex literal lexing functionality under this flag
to fix PR32973. It appears that this hex literal parsing wasn't intended
to be enabled in non-masm-style blocks.
Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.
However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.
Essentially, we only lex masm literals for inline asm blobs that use
intel syntax. If the .intel_syntax directive is used inside a gnu-style
inline asm statement, masm literals will not be lexed, which is
compatible with gas and llvm-mc standalone .s assembly.
This fixes PR36144 and PR32973.
Reviewers: Gerolf, avt77
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53535
llvm-svn: 345189
I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain.
llvm-svn: 345183
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.
This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.
Differential Revision: https://reviews.llvm.org/D53643
llvm-svn: 345182
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.
Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.
Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.
Differential Revision: https://reviews.llvm.org/D53614
llvm-svn: 345171
This will allow other generators of LLVM IR to use the auto-vectorizer
without having to change that flag.
Note: on its own, this patch will enable auto-vectorization on Hexagon
in all cases, regardless of the -fvectorize flag. There is a companion
clang patch that together with this one forms an NFC for clang users.
llvm-svn: 345169
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.
My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.
There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.
Differential Revision: https://reviews.llvm.org/D52757
llvm-svn: 345165
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.
llvm-svn: 345164
Summary:
If the target does not support `.asciz` and `.ascii` directives, the
strings are represented as bytes and each byte is placed on the new line
as a separate byte directive `.b8 <data>`. NVPTX target allows to
represent the vector of the data of the same type as a vector, where
values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This
allows to reduce the size of the final PTX file. Ptxas tool includes ptx
files into the resulting binary object, so reducing the size of the PTX
file is important.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45822
llvm-svn: 345142
This B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed.
llvm-svn: 345112
A global alias may use indices which are not considered in bounds. In
such a case, accessing the base object will fail as it only peers
through inbounds accesses. This pattern is used by the swift compiler
to create references to preceeding members in the type metadata. This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format. Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.
llvm-svn: 345107
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...
We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.
- Ideally this would be handled by the ConstantHoist pass but it runs
too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
however SelectionDAG constantfolding would constantfold the
"trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
stop DAGCombiner from constant folding in this situation. (Similar to
how ConstantHoisting marks things as Opaque to avoid folding
ADD/SUB/etc.)
Differential Revision: https://reviews.llvm.org/D53181
llvm-svn: 345102
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.
llvm-svn: 345070
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.
Differential Revision: https://reviews.llvm.org/D49507
llvm-svn: 345053
Vector types are not possible here because this code explicitly
checks for a scalar type, but this is another step towards
completely removing the fake binop queries for not/neg/fneg.
llvm-svn: 345043
This initially landed in rL345014, but was reverted in rL345017
due to sanitizer-x86_64-linux-fast buildbot failure in
check-lld (ELF/relocatable-versioned.s) test.
While i'm not yet quite sure what is the problem, one obvious
thing here is that extra truncation roundtrip.
Maybe that's it? If not, will re-revert.
Differential Revision: https://reviews.llvm.org/D53521
llvm-svn: 345027
Matches the approach taken in the constant pool shuffle decoders, and uses an UndefElts mask instead of uint64_t(-1) raw mask values, which doesn't work safely for i32/i64 shuffle mask sizes (as the -1 value is legal).
This allows us to remove the constant pool shuffle decoders from most of the getTargetShuffleMask variable shuffle cases (X86ISD::VPERMV3 will be handled in a future commit).
llvm-svn: 345018
Summary:
Continuation of D52348.
We also get the `c) x & (-1 >> (32 - y))` pattern here, because of the D48768.
I will add extra-uses into those tests and follow-up with a patch to handle those patterns too.
Reviewers: RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53521
llvm-svn: 345014
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR. I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.
rdar://42122367
llvm-svn: 344970
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.
Original commit message:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem
I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.
I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo
I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D53306
llvm-svn: 344965
Summary:
Replace its functionality with a TableGen InstrInfo relational
instruction mapping. Although arguably more complex than the TableGen
backend, the relational mapping is a smaller maintenance burden than a
TableGen backend.
Reviewers: aardappel, aheejin, dschuff
Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53307
llvm-svn: 344962
A while ago we changed pushf and popf in Intel mode to generate pushfq
and popfq. Unfortunately that left us with no way to get the 16-bit
encoding in Intel mode so this patch adds pushfw and popfw as aliases
there.
llvm-svn: 344949
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all.
........
Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask
llvm-svn: 344937
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits.
........
Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp
llvm-svn: 344936
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.
There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.
llvm-svn: 344914
Summary:
As discussed in D52304 / IRC, we now have pattern matching for
'bit extract' in two places - tablegen and `X86DAGToDAGISel`.
There are 4 patterns.
And we will have a problem with `x & (-1 >> (32 - y))` pattern.
* If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first.
Thus, the existing test coverage is already broken.
* If it is not one-use, then it is not unfolded, and is matched as BZHI.
* If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already.
So we will either not handle that pattern for BEXTR, or not have test coverage for it.
This is bad.
As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`.
Then we will not have code duplication, and will have proper test coverage.
This indeed does not affect any tests, and this is great.
It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version.
Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: llvm-commits, craig.topper
Differential Revision: https://reviews.llvm.org/D53164
llvm-svn: 344904
Summary:
Trivial continuation of D52304.
While this pattern is not canonical, we do select it in the BZHI case,
so this should not be any different.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52348
llvm-svn: 344902
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.
There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.
This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.
Differential revision: https://reviews.llvm.org/D53323
llvm-svn: 344894
This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables.
This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now.
Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests.
This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files.
llvm-svn: 344884
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.
I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.
I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.
I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D53306
llvm-svn: 344877
Summary:
These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended.
To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion.
This patch instead removes all of that and adds a late peephole to detect the two extends.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53449
llvm-svn: 344874
D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom most part of the vector as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets.
llvm-svn: 344867
Summary:
Add selection patterns to support one bit Sub.
Reviewers:
rampitec, arsenm
Differential Revision:
https://reviews.llvm.org/D52946
llvm-svn: 344815
There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG.
This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from.
I don't have a test case, but without this a future patch doesn't work which is how I found it.
llvm-svn: 344808
Summary:
Undefined indices in shuffles can be used when not all lanes of the
output vector will be used. This happens for example in the expansion
of vector reduce operations. Regardless, undefs are legal as lane
indices in IR and should be supported.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53057
llvm-svn: 344803
Allows to disable direct TLS segment access (%fs or %gs). GCC supports
a similar flag, it can be useful in some circumstances, e.g. when a thread
context block needs to be updated directly from user space. More info
and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145
There is another revision for clang as well.
Related: D53102
All X86 CodeGen tests appear to pass:
```
[46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
Testing Time: 23.17s
Expected Passes : 3801
Expected Failures : 15
Unsupported Tests : 8021
```
Reviewed by: Craig Topper.
Patch by nruslan (Ruslan Nikolaev).
Differential Revision: https://reviews.llvm.org/D53103
llvm-svn: 344723
Summary:
To workaround a hardware issue in the (base + offset) calculation
when base is negative. The impact on code quality should be limited
since SILoadStoreOptimizer still runs afterwards and is able to
combine loads/stores based on known sign information.
This fixes visible corruption in Hitman on SI (easily reproducible
by running benchmark mode).
Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923
Reviewers: arsenm, mareko
Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53160
llvm-svn: 344698
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.
If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.
There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 344696
Previously reverted in rL343082.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 344693
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.
Differential revision: https://reviews.llvm.org/D53315
llvm-svn: 344683
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2
patches, that is trying to achieve the same for VFMA. The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:
--------------------------------------------------------
| Opt | < rL342874 | >= rL342874 | |
|------------------------------------------------------|
|-O3 | vmla | vmul | vmul |
| | | vadd | vadd |
|------------------------------------------------------|
|-Ofast | vfma | vfma | vmul |
| | | | vadd |
|------------------------------------------------------|
|-Oz | vmla | vmla | vmla |
--------------------------------------------------------
This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2. This also fixes a typo in the regression test added in rL342874.
Differential revision: https://reviews.llvm.org/D53314
llvm-svn: 344671
Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag.
This recovers a case lost by r344270.
Differential Revision: https://reviews.llvm.org/D53310
llvm-svn: 344649
When a landing pad is calculated in a program that is compiled
for micromips, it will point to an even address. Such an error will
cause a segmentation fault, as the instructions in micromips are
aligned on odd addresses. This patch sets the last bit of the offset
where a landing pad is, to 1, which will effectively be
an odd address and point to the instruction exactly.
Differential Revision: https://reviews.llvm.org/D52985
llvm-svn: 344591
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.
In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)
This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer
Reviewers: dschuff, sbc100, rnk
Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52748
llvm-svn: 344575
These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.
llvm-svn: 344573
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.
This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.
Differential Revision: https://reviews.llvm.org/D53259
llvm-svn: 344554
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.
This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985
Differential Revision: https://reviews.llvm.org/D52987
llvm-svn: 344516
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.
This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......
Differential Revision: https://reviews.llvm.org/D53257
llvm-svn: 344512
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.
This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985
Differential Revision: https://reviews.llvm.org/D52987
llvm-svn: 344511
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that.
This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.
In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53107
llvm-svn: 344487
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53173
llvm-svn: 344470
There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.
I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.
llvm-svn: 344457
Use isConstantSplat instead of ISD::isConstantSplatVector to let us us peek through to illegal types (in this case for i686 targets to recognise i64 constants)
llvm-svn: 344452
If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen.
Similar to rL344447, this is also closer to LegalizeDAG's approach
llvm-svn: 344448
This patch changes the vector CTTZ lowering from:
cttz(x) = ctpop((x & -x) - 1)
to:
cttz(x) = ctpop(~x & (x - 1))
Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).
Differential Revision: https://reviews.llvm.org/D53214
llvm-svn: 344447
Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute.
This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.
We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead.
Differential Revision: https://reviews.llvm.org/D53148
llvm-svn: 344446
Summary:
AArch64 can fold some shift+extend operations on the RHS operand of
comparisons, so swap the operands if that makes sense.
This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751
Reviewers: efriedma, t.p.northover, javed.absar
Subscribers: mcrosier, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53067
llvm-svn: 344439
SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it
is promoted to the ShiftAmountTy. This results in zero-extension (masking)
which is unnecessary for RISC-V as the shift operations only read the lower 5
or 6 bits (RV32 or RV64).
I initially proposed adding a getExtendForShiftAmount hook so the shift amount
can be any-extended (D52975). @efriedma explained this was unsafe, so I have
instead eliminate the unnecessary and operations at instruction selection time
in a manner similar to X86InstrCompiler.td.
Differential Revision: https://reviews.llvm.org/D53224
llvm-svn: 344432
Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence.
llvm-svn: 344424
The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion.
llvm-svn: 344423
This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen
by default. AMD Jaguar (btver2) should have no difference with this patch because it has
fast-hops. (If we want to set that bit for other CPUs, let me know.)
The code changes are small, but there are many test diffs. For files that are specifically
testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences
side-by-side. For files that are primarily concerned with codegen other than hops, I just
updated the CHECK lines to reflect the new default codegen.
To recap the recent horizontal op story:
1. Before rL343727, we were producing hops for all subtargets for a variety of patterns.
Hops were likely not optimal for all targets though.
2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so
we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195).
3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but
probably bad for other CPUs.
4. This patch allows us to distinguish when we want to produce hops, so everyone can be
happy. I'm not sure if we have the best predicate here, but the intent is to undo the
extra hop-iness that was enabled by r344141.
Differential Revision: https://reviews.llvm.org/D53095
llvm-svn: 344361
Pull out repeated byte sum stage for popcount of vector elements > 8bits.
This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw.
llvm-svn: 344348
The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable.
ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value.
This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node.
VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them.
VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits.
This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero).
Differential Revision: https://reviews.llvm.org/D48025
llvm-svn: 344347
Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes.
This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment.
llvm-svn: 344336
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.
A move elimination candidate is an instruction that can be eliminated at
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).
For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:
```
def : IsOptimizableRegisterMove<[
InstructionEquivalenceClass<[
// GPR variants.
MOV32rr, MOV64rr,
// MMX variants.
MMX_MOVQ64rr,
// SSE variants.
MOVAPSrr, MOVUPSrr,
MOVAPDrr, MOVUPDrr,
MOVDQArr, MOVDQUrr,
// AVX variants.
VMOVAPSrr, VMOVUPSrr,
VMOVAPDrr, VMOVUPDrr,
VMOVDQArr, VMOVDQUrr
], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```
Definitions of IsOptimizableRegisterMove from processor models of a same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:
```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI)
const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned
CPUID) const;
```
By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).
Tablegen class RegisterFile has been extended with the following information:
- The set of register classes that allow move elimination.
- Maxium number of moves that can be eliminated every cycle.
- Whether move elimination is restricted to moves from registers that are
known to be zero.
This patch is structured in three part:
A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.
A second part, uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.
A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.
llvm-mca tests for btver2 show differences before/after this patch.
Differential Revision: https://reviews.llvm.org/D53134
llvm-svn: 344334
Failure was discovered upon running
projects/compiler-rt/test/builtins/Unit/divtc3_test.c
in a stage2 compiler build.
When compiling projects/compiler-rt/lib/builtins/divtc3.c,
a call to fmaxl within the divtc3 implementation had its
return values read from registers $2 and $3 instead of $f0 and $f2.
Include fmaxl in the list of long double emulation routines
to have its return value correctly interpreted as f128.
Almost exact issue here: https://reviews.llvm.org/D17760
Differential Revision: https://reviews.llvm.org/D52649
llvm-svn: 344326
DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately the ConstantHoisting pass runs too early at a
point where the pattern isn't expanded yet. However after
ConstantHoisting hoisted some immediate the result may not expand
anymore. Also the hoisting typically doesn't make sense because it
operates on immediates that will change completely during the expansion.
Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.
Differential Revision: https://reviews.llvm.org/D53174
llvm-svn: 344315
Summary:
Instruction with 0 in fence field being disassembled as fence , iorw.
Printing "unknown" to match GAS behavior.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D51828
llvm-svn: 344309
On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success.
This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain.
There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately.
Differential Revision: https://reviews.llvm.org/D52528
llvm-svn: 344291
Having a constant value operand in the compound instruction
is not always profitable. This patch improves coremark by ~4% on
Hexagon.
Differential Revision: https://reviews.llvm.org/D53152
llvm-svn: 344284
Also, avoid comparing GUIDs when ordering global addresses, because
source file location can cause different GUID to be calculated. As a
result, a pair of symbols can compare "less" in one directory, but
"greater" in another.
llvm-svn: 344271
This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.
We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.
I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.
I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.
Differential Revision: https://reviews.llvm.org/D53126
llvm-svn: 344270
The ARM64 elf emitter would omit printing data
symbol for zero filled constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.
Differential revision: https://reviews.llvm.org/D53132
llvm-svn: 344248
Summary:
As discussed in D48491, we can't really do this in the TableGen,
since we need to produce *two* instructions. This only implements
one single pattern. The other 3 patterns will be in follow-ups.
I'm not sure yet if we want to also fuse shift into here
(i.e `(x >> start) & ...`)
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D52304
llvm-svn: 344224
Summary:
Although the saturating float to int instructions are already
emitted from normal IR, the fpto{s,u}i instructions produce poison
values if the argument cannot fit in the result type. These intrinsics
are therefore necessary to get guaranteed defined saturating behavior.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53004
llvm-svn: 344204
Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check.
This saves about 5K out of ~600K on the generated isel table.
llvm-svn: 344189
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
llvm-svn: 344186