hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction.
After this patch we now do a result number check in both and rely on the caller to provide the result number.
This shouldn't change behavior it was just an odd difference between the two functions that I noticed.
llvm-svn: 349222
The change is an effort to split and refactor abandoned
D34708 into smaller parts.
Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.
With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'
Author: Denys Zariaiev <denys.zariaiev@gmail.com>
Differential Revision: https://reviews.llvm.org/D55145
llvm-svn: 349213
Refactor the ARMInstructionSelector to cache some opcodes in the
constructor instead of checking all the time if we're in ARM or Thumb
mode.
llvm-svn: 349143
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.
Extract the legalizer tests for these opcodes into another file.
Add tests for the instruction selector.
llvm-svn: 349142
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.
To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55459
llvm-svn: 349137
This requires the two callers to manifest a 0 to make EmitCmp call EmitTest.
I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely.
llvm-svn: 349094
Fix the logic in the definition of the `ExynosShiftExPred` as a more
specific version of `ExynosShiftPred`. But, since `ExynosShiftExPred` is
not used yet, this change has NFC.
llvm-svn: 349091
Summary:
Macros are expanded on a single line. In case of large expansions,
with sufficiently many instructions with memory operands (and when
-fdebug-info-for-profiling is requested), we may be unable to generate
new base discriminator values - new values overflow (base
discriminators may not be larger than 2^12).
This CL warns instead of asserting in such a case. A subsequent CL
will add APIs to check for overflow before creating new debug info.
See https://bugs.llvm.org/show_bug.cgi?id=39890
Reviewers: davidxl, wmi, gbedwell
Reviewed By: davidxl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55643
llvm-svn: 349075
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.
llvm-svn: 349052
Summary: The Sparc V9 membar instruction can enforce different types of
memory orderings depending on the value in its immediate field. In the
architectural manual the type is selected by combining different assembler
tags into a mask. This patch adds support for these tags.
Reviewers: jyknight, venkatra, brad
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D53491
llvm-svn: 349048
Summary:
Constraining an integer value to a floating point register using "f"
causes an llvm_unreachable to trigger. This patch allows i32 integers
to be placed in a single precision float register and i64 integers to
be placed in a double precision float register. This matches the behavior
of GCC.
For other types the llvm_unreachable is removed to instead trigger an
error message that points out the offending line.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51614
llvm-svn: 349045
There are several Pseudo in PowerPC backend.
eg:
* ISel Pseudo-instructions , which has let usesCustomInserter=1 in td
ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them.
* Post-RA pseudo instruction, which has let isPseudo = 1 in td, or Standard pseudo (SUBREG_TO_REG,COPY etc.)
ExpandPostRAPseudos -> expandPostRAPseudo will expand them
* Multi-instruction pseudo operations will expand them PPCAsmPrinter::EmitInstruction
* Pseudo instruction in CodeEmitter, which has encoding of 0.
Currently, in td files, especially PPCInstrVSX.td,
we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly.
This patch is to
* Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter
* Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo
* Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo
Differential Revision: https://reviews.llvm.org/D55143
llvm-svn: 349044
When computing register allocation hints for a GRX32Bit register, make sure
that any of the hinted registers that are also copy hints are returned first
in the list.
Review: Ulrich Weigand.
llvm-svn: 349037
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.
llvm-svn: 349026
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begun removing x86 vector rotate custom lowering to use the expansion.
llvm-svn: 349025
Adds support for the various RISC-V FMA instructions (fmadd, fmsub, fnmsub, fnmadd).
The criteria for choosing whether a fused add or subtract is used, as well as
whether the product is negated or not, is whether some of the arguments to the
llvm.fma.* intrinsic are negated or not. In the tests, extraneous fadd
instructions were added to avoid the negation being performed using a xor
trick, which prevented the proper FMA forms from being selected and thus
tested.
The FMA instruction patterns might seem incorrect (e.g., fnmadd: -rs1 * rs2 -
rs3), but they should be correct. The misleading names were inherited from
MIPS, where the negation happens after computing the sum.
The llvm.fmuladd.* intrinsics still do not generate RISC-V FMA instructions,
as that depends on TargetLowering::isFMAFasterthanFMulAndFAdd.
Some comments in the test files about what type of instructions are there
tested were updated, to better reflect the current content of those test
files.
Differential Revision: https://reviews.llvm.org/D54205
Patch by Luís Marques.
llvm-svn: 349023
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.
Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.
Differential Revision: https://reviews.llvm.org/D55565
llvm-svn: 348975
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute.
Differential Revision: https://reviews.llvm.org/D50200
llvm-svn: 348971
Continue to present HSA metadata as YAML in ASM and when output by tools
(e.g. llvm-readobj), but encode it in Messagepack in the code object.
Differential Revision: https://reviews.llvm.org/D48179
llvm-svn: 348963
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.
Differential Revision: https://reviews.llvm.org/D55414
llvm-svn: 348959
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.
Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.
Differential Revision: https://reviews.llvm.org/D55580
llvm-svn: 348952
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds.
That allows us to combine add ops with register moves and save some instructions. This is
another step towards allowing add truncation in generic DAGCombiner (see D54640).
Differential Revision: https://reviews.llvm.org/D55494
llvm-svn: 348946
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.
Differential Revision: https://reviews.llvm.org/D54042
llvm-svn: 348937
Summary:
This patch provides a means to set Metadata section kind
for a global variable, if its explicit section name is
prefixed with ".AMDGPU.metadata."
This could be useful to make the global variable go to
an ELF section without any section flags set.
Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye
Reviewed By: dstuttard, kzhuravl
Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye
Differential Revision: https://reviews.llvm.org/D55267
llvm-svn: 348922
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.
Also mark them as legal and extract their legalizer test cases to a new
test file.
llvm-svn: 348920
Summary:
Emit COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.
This patch mimics X86 and ARM COFF behavior for function header emission.
Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric
Reviewed By: mstorsjo
Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55535
llvm-svn: 348875
Summary:
When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand,
%733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
JNE_1 %bb.41, implicit $eflags
so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this
Patch by Jianping Chen
Reviewers: smaslov, LuoYuanke, liutianle, Jianping
Reviewed By: Jianping
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D54250
llvm-svn: 348853
As discussed in D55494, we want to extend this to handle 8-bit
ops too, but that could be extended further to enable this on
32-bit systems too.
llvm-svn: 348851
As discussed in:
D55494
...this code has been disabled/dead for a long time (the code references
Athlon and Pentium 4), and there's almost no chance that it will be used
given the last decade of uarch evolution. Also, in SDAG we promote 16-bit
ops to 32-bit, so there's almost no way to test this code any more.
llvm-svn: 348845
Summary:
This patch supports `.eventtype` directive printing and parsing in the
same syntax with `.functype`.
Reviewers: aardappel, sbc100
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55353
llvm-svn: 348818
Summary:
- Unify mixed argument names (`Symbol` and `Sym`) to `Sym`
- Changed `MCSymbolWasm*` argument of `emit***` functions to `const
MCSymbolWasm*`. It seems not very intuitive that emit function in the
streamer modifies symbol contents.
- Moved empty function bodies to the header
- clang-format
Reviewers: aardappel, dschuff, sbc100
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55347
llvm-svn: 348816
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
- Check if an operand is an immediate before calling getImm. Some operands
that take constant values can actually have global symbols or other
constant expressions.
- When a load-constant instruction can be folded into users, make sure to
only delete it when all users have been successfully converted.
llvm-svn: 348802
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
This should really be generalized to allow increment and/or
we should replace it by using ISD::matchUnaryPredicate().
See D55515 for context.
llvm-svn: 348776
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
This commit changes which l1 flush instruction is used for AMDPAL and
MESA3d workloads to flush the entire l1 cache instead of just the
volatile lines.
Differential Revision: https://reviews.llvm.org/D55367
llvm-svn: 348771
Refactor the scheduling predicates based on `MCInstPredicate`. Augment the
number of helper predicates used by processor specific predicates.
Differential revision: https://reviews.llvm.org/D55375
llvm-svn: 348768
When replacing jal with jalr, also emit '.reloc R_MIPS_JALR' (R_MICROMIPS_JALR
for micromips). The linker might then be able to turn jalr into a direct
call.
Add '-mips-jalr-reloc' to enable/disable this feature (default is true).
Differential revision: https://reviews.llvm.org/D55292
llvm-svn: 348760
A new pass to manage the Mode register.
Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.
The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.
llvm-svn: 348754
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.
The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.
Differential Revision: https://reviews.llvm.org/D55485
llvm-svn: 348750
Both intrinsics do the exact same thing so we really only need one.
Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.
llvm-svn: 348737
Summary:
`llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods
defined on these classes, such as `addAttribute`, return a new immutable
object with the attribute added. In https://reviews.llvm.org/D55217 I attempted
to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since
calling these methods has no side-effects, and so ignoring the result
that is returned is almost certainly a programmer error.
However, committing the change resulted in new warnings in the AMDGPU target.
The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436
attempts to add the readonly and nounwind attributes to simplified
library functions, but instead calls the `addAttribute` methods and
ignores the result.
Modify the simplify libcalls pass to actually add the nounwind and
readonly attributes. Also update the simplify libcalls test to assert
that these attributes are actually being set.
Reviewers: rampitec, vpykhtin, rnk
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D55435
llvm-svn: 348732
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know its 0 then we don't need to bother.
This should go a long way towards fixing PR24545.
llvm-svn: 348727
The dependency was added in r213995 in response to r213986 which did make
X86/Utils depend on IR, but r256680 later removed that dependency again.
llvm-svn: 348724
The existing code tries to handle an undef operand while transforming an add to an LEA,
but it's incomplete because we will crash on the i16 test with the debug output shown below.
It's better to just give up instead. Really, GlobalIsel should have folded these before we
could get into trouble.
# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected
bb.0 (%ir-block.0):
liveins: $edi
%1:gr32 = COPY killed $edi
%0:gr16 = COPY %1.sub_16bit:gr32
%5:gr64_nosp = IMPLICIT_DEF
%5.sub_16bit:gr64_nosp = COPY %0:gr16
%6:gr64_nosp = IMPLICIT_DEF
%6.sub_16bit:gr64_nosp = COPY %2:gr16
%4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
%3:gr16 = COPY killed %4.sub_16bit:gr32
$ax = COPY killed %3:gr16
RET 0, implicit killed $ax
# End machine code for function add_undef_i16.
*** Bad machine code: Reading virtual register without a def ***
- function: add_undef_i16
- basic block: %bb.0 (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1: %2:gr16
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D54710
llvm-svn: 348722
Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any').
llvm-svn: 348721
We were still using the rounded down offset and alignment even though
they aren't handled because you can't trivially bitcast the loaded
value.
llvm-svn: 348658
Summary:
- LLVM clang-format style doesn't allow one-line ifs.
- LLVM clang-tidy style says method names should start with a lowercase
letter. But currently WebAssemblyAsmParser's parent class
MCTargetAsmParser is mixing lowercase and uppercase method names
itself so overridden methods cannot be renamed now.
- Changed else ifs after returns to ifs.
- Added some newlines for readability.
Reviewers: aardappel, sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55350
llvm-svn: 348648
To make X86CondBrFoldingPass can be run with --run-pass option, this can test one wrong assertion on analyzeCompare function for SUB32ri when its operand is not imm
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D55412
llvm-svn: 348620
This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports.
Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code).
The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings.
These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs.
Differential Revision: https://reviews.llvm.org/D55432
llvm-svn: 348617
This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.
It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x
In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).
llvm-svn: 348601
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.
llvm-svn: 348591
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Fix assert about using an undefined physical register in machine instruction verify pass.
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.
```
Bad machine code: Using an undefined physical register
- function: func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0: killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```
There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.
Differential Revision: https://reviews.llvm.org/D55408
llvm-svn: 348566
This addresses a FIXME and avoids depending on an isel pattern match I think. I've remove the isel patterns too since he have no lit tests left that cover them. Hopefully that really means they are unused.
I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.
Differential Revision: https://reviews.llvm.org/D55355
llvm-svn: 348536
Initial step towards making the function more generic (and probably move into SelectionDAG).
This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).
llvm-svn: 348498
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 348497
Summary:
We may end up with not emitted debug directives at the end of the module
emission. Patch fixes this problem emitting those last directives the
end of the module emission.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D54320
llvm-svn: 348495
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
--
It was reverted in rL348249 due a build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386
The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.
llvm-svn: 348493
...yet!
A lot of the current code should be shared for arm and thumb mode, but
until we add tests and work out some of the details (e.g. checking the
correct subtarget feature for G_SDIV) it's safer to bail out as early as
possible for thumb targets.
This should have arguably been part of r348347, which allowed Thumb
functions to be handled by the IR Translator.
llvm-svn: 348472
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.
Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Cleanup emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.
This fixes http://llvm.org/PR39550
Differential Revision: https://reviews.llvm.org/D54137
llvm-svn: 348444
https://reviews.llvm.org/D54980
This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.
Reviewed by: vkeles.
llvm-svn: 348406
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.
The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.
The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.
llvm-svn: 348374
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).
I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
llvm-svn: 348366
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64. This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function. This was detected
while trying to self-host clang on Windows ARM64.
llvm-svn: 348337
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.
Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.
The old test case should still work as expected.
llvm-svn: 348320
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.
The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
The X87 cases don't need the strict flag but generates much nicer code with it....
Differential Revision: https://reviews.llvm.org/D53794
llvm-svn: 348251
We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now its part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the list tests we have.
llvm-svn: 348237
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue. This results in incorrect offset computations within
funclets that are contained in vararg functions.
Differential Revision: https://reviews.llvm.org/D55096
llvm-svn: 348222
This moves the stack check logic into a lambda within getOutliningCandidateInfo.
This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.
So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.
This gives some code size savings as a result.
llvm-svn: 348220
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.
llvm-svn: 348219
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.
x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.
Differential Revision: https://reviews.llvm.org/D55126
llvm-svn: 348195
The `DIEExpr` is used in debug information entries for either TLS variables
or call sites. For now the last case is unsupported for targets with delay
slots, for MIPS in particular.
The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal`
routine which, in case of MIPS, always emits either `.dtprelword` or
`.dtpreldword` directives. That is okay for "main" code, but in unit
tests `DIEExpr` instances can be created not for TLS variables only even
on MIPS hosts. That is a reason of the `TestDWARF32Version5Addr8AllForms`
failure because handling of the `R_MIPS_TLS_DTPREL` relocation writes
incorrect value into dwarf structures. And anyway unconditional emitting
of `.dtprelword` directives will be incorrect when/if debug information
entries for call sites become supported on MIPS.
The patch solves the problem by wrapping expression created in the
`MipsTargetObjectFile::getDebugThreadLocalSymbol` method in to the
`MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is
recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method and
`.dtprelword` directives created in this case only. In other cases the
expression saved as a regular data.
Differential Revision: http://reviews.llvm.org/D54937
llvm-svn: 348194
Summary:
The assembler processes directives and instructions in whatever order
they are in the file, then directly emits them to the streamer. This
could cause badly written (or generated) .s files to produce
incorrect binaries.
It now has state that tracks what it has most recently seen, to
enforce they are emitted in a given order that always produces
correct wasm binaries.
Also added a new test that compares obj2yaml output from llc (the
backend) to that going via .s and the assembler to ensure both paths
generate the same binaries.
The features this test covers could be extended.
Passes all wasm Lit tests.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557
Reviewers: sbc100, dschuff, aheejin
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55149
llvm-svn: 348185
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.
This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.
Also add tests for each possible new behaviour.
- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.
- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case
- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.
llvm-svn: 348168
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.
After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.
This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55165
llvm-svn: 348159
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.
The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54836
llvm-svn: 348158
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.
This patch returns 0 cost for the icmp in these cases.
Review: Ulrich Weigand
https://reviews.llvm.org/D55111
llvm-svn: 348141
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.
Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html
Reviewers: olista01, samparker, aemerson
Reviewed By: samparker
Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54629
llvm-svn: 348137
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
V_ADD_I32_e64
V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
V_SUB_I32_e64
V_SUBB_U32_e64
preclude the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.
We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
%19:vgpr_32 = V_AND_B32_e32 255,
killed %16:vgpr_32, implicit $exec
%47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
%26.sub0:vreg_64, %19:vgpr_32, implicit $exec
%48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
%26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec
which then allows the SDWA encoding and becomes
%47:vgpr_32 = V_ADD_I32_sdwa
0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
implicit-def $vcc, implicit $exec
%48:vgpr_32 = V_ADDC_U32_e32
0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec
Differential Revision: https://reviews.llvm.org/D54882
llvm-svn: 348132
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.
https://reviews.llvm.org/D53190
llvm-svn: 348122
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
llvm-svn: 348121
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.
This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.
This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.
Differential revision: https://reviews.llvm.org/D54853
llvm-svn: 348114
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.
This adds debug printing so that this process is visible when using the
-debug option.
To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.
Differential revision: https://reviews.llvm.org/D54852
llvm-svn: 348113
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
In theory, we should let the PPC target to determine how to lower the TOC Entry for globals.
And the PPCTargetLowering requires this query to do some optimization for TOC_Entry.
Differential Revision: https://reviews.llvm.org/D54925
llvm-svn: 348108
Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.
llvm-svn: 348087
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.
By custom legalizing it we can avoid this churn and maybe produce better code.
llvm-svn: 348085
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.
This makes it so that we check for that condition before mapping instructions.
This allows us to outline more, since we don't pessimise as many instructions.
Also update some tests, since we outline more.
llvm-svn: 348081
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55138
llvm-svn: 348079
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.
Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.
Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>,
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions.
SRAW assumed that (sext_inreg foo, i32) could only be produced when
sign-extended an i32. However, it can be produced by input such as:
define i64 @tricky_ashr(i64 %a, i64 %b) {
%1 = shl i64 %a, 32
%2 = ashr i64 %1, 32
%3 = ashr i64 %2, %b
ret i64 %3
}
It's important not to select sraw in the above case, because sraw only uses
bits lower 5 bits from the shift, while a shift of 32-63 would be valid.
Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.
This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and
adds test cases that would demonstrate a miscompile if the incorrect patterns
were re-added.
llvm-svn: 348067
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.
If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.
There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 348050
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:
1. VirtReg2Value is generated lazily; there were some cases where
a lookup was performed before all relevant virtual registers were
created, leading to an out-of-sync mapping. Those cases were:
- Complex code to lower formal arguments that generated CopyFromReg
nodes from live-in registers (fixed by never querying the mapping
for live-in registers).
- Code that generates CopyToReg for formal arguments that are used
outside the entry basic block (fixed by never querying the
mapping for Register nodes, which don't need the divergence info
anyway).
2. For complex values that are lowered to a sequence of registers,
all registers must be reflected in the VirtReg2Value mapping.
I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.
There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.
Reviewers: alex-t, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54340
llvm-svn: 348049
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.
This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.
This mostly offers some serious simplification.
Also update the test to show the added functionality.
llvm-svn: 348036
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.
Differential Revision: https://reviews.llvm.org/D54988
llvm-svn: 348035
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.
llvm-svn: 348019
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.
llvm-svn: 348018
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.
Differential revision: https://reviews.llvm.org/D53762
llvm-svn: 347993
This patch adds CSR instructions aliases for the cases where the instruction
takes an immediate operand but the alias doesn't have the i suffix. This is
necessary for gas/gcc compatibility.
gas doesn't do a similar conversion for fsflags or fsrm, so this should be
complete.
Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.
llvm-svn: 347991
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions.
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction,
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).
Until recently unimp was undocumented and supported just by binutils, which
printed unimp for either the 16 or 32-bit form. Both forms are now documented
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.
Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.
llvm-svn: 347988
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend
operands when it is able to do so. For some targets this is more expensive
than a sign-extension, which is also a valid choice. Introduce the
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy ==
MVT::i64, as it can be performed using a single instruction.
Differential Revision: https://reviews.llvm.org/D52978
llvm-svn: 347977
As discussed in the RFC
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit
RISC-V has i64 as the only legal integer type. This patch introduces patterns
to support codegen of the new instructions
introduced in RV64I: addiw, addiw, subw, sllw, slliw, srlw, srliw, sraw,
sraiw, ld, sd.
Custom selection code is needed for srliw as SimplifyDemandedBits will remove
lower bits from the mask, meaning the obvious pattern won't work:
def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
(SRLIW GPR:$rs1, uimm5:$shamt)>;
This is sufficient to compile and execute all of the GCC torture suite for
RV64I other than those files using frameaddr or returnaddr intrinsics
(LegalizeDAG doesn't know how to promote the operands - a future patch
addresses this).
When promoting i32 sltu/sltiu operands, it would be more efficient to use
sign-extension rather than zero-extension for RV64. A future patch adds a hook
to allow this.
Differential Revision: https://reviews.llvm.org/D52977
llvm-svn: 347973
Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the uppers bits and no shifts. So just emit that directly.
llvm-svn: 347966
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:
sdiv %1, i32 4
gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.
Differential Revision: https://reviews.llvm.org/D54546
llvm-svn: 347965
Three minor changes to these extra costs:
* For ICmp instructions, instead of adding 2 all the time for extending each
operand, this is only done if that operand is neither a load or an
immediate.
* The operands extension costs for divides removed, because we now use a high
cost already for the divide (20).
* The costs for lhsr/ashr extra costs removed as this did not seem useful.
Review: Ulrich Weigand
https://reviews.llvm.org/D55053
llvm-svn: 347961
We had a EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place using EVT that could have been MVT. And an 'int' that should be 'unsigned'.
llvm-svn: 347959
Summary:
Suppressed warnings in release builds due to variable used
only in assert statement.
Subscribers: llvm-commits, eraman, mgorny
Differential Revision: https://reviews.llvm.org/D55100
llvm-svn: 347939
Summary:
Expands for vector types all of the integer operations that are
expanded for scalars because they are not supported at all by
WebAssembly.
This CL has no tests because such tests would really be testing the
target-independent expansion, but I'm happy to add tests if reviewers
think it would be helpful.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D55010
llvm-svn: 347923
Scattered ARM relocations for Mach-O's only have 24 bits available to
encode the offset. This is not checked but just truncated and can result
in corrupt binaries after linking because the relocations are applied to
the wrong offset. This patch will check and error out in those
situations instead of emitting a wrong relocation.
Patch by: Sander Bogaert (dzn)
Differential revision: https://reviews.llvm.org/D54776
llvm-svn: 347922
Utilise a similar ('late') lowering strategy to D47882. The changes to
AtomicExpandPass allow this strategy to be utilised by other targets which
implement shouldExpandAtomicCmpXchgInIR.
All cmpxchg are lowered as 'strong' currently and failure ordering is ignored.
This is conservative but correct.
Differential Revision: https://reviews.llvm.org/D48131
llvm-svn: 347914
Also revert fix r347876
One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.
llvm-svn: 347911
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.
llvm-svn: 347906
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.
I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.
Differential Revision: https://reviews.llvm.org/D54276
llvm-svn: 347902
This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.
I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1?
Differential Revision: https://reviews.llvm.org/D54919
llvm-svn: 347898
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.
A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).
Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
My change svn-id: 347871 caused a buildbot failure due to an unused
variable def (used in an assert).
Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336
llvm-svn: 347876
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.
This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"
llvm-svn: 347864
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.
The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.
Differential Revision: https://reviews.llvm.org/D54518
llvm-svn: 347861
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.
A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues. Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`. Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).
At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.
With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".
About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage. This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.
Differential Revision: https://reviews.llvm.org/D54957
llvm-svn: 347857
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.
Use a simple fixpoint iteration instead.
In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.
Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}
Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54231
llvm-svn: 347853
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.
Using unsigned makes robust wrapping checks easier.
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam
Differential Revision: https://reviews.llvm.org/D54230
llvm-svn: 347852
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.
This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).
This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54228
llvm-svn: 347850
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54227
llvm-svn: 347849
Summary:
Reduce the statefulness of the algorithm in two ways:
1. More clearly split generateWaitcntInstBefore into two phases: the
first one which determines the required wait, if any, without changing
the ScoreBrackets, and the second one which actually inserts the wait
and updates the brackets.
2. Communicate pre-existing s_waitcnt instructions using an argument to
generateWaitcntInstBefore instead of through the ScoreBrackets.
To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
There are some functional changes:
1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
SMEM instructions as required instead of waiting on all counters.
2. We now properly track pre-existing waitcnt's in all cases, which leads
to less conservative waitcnts being emitted in some cases.
s_load_dword ...
s_waitcnt lgkmcnt(0) <-- pre-existing wait count
ds_read_b32 v0, ...
ds_read_b32 v1, ...
s_waitcnt lgkmcnt(0) <-- this is too conservative
use(v0)
more code
use(v1)
This increases code size a bit, but the reduced latency should still be a
win in basically all cases. The worst code size regressions in my shader-db
are:
WORST REGRESSIONS - Code Size
Before After Delta Percentage
1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0]
2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0]
4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0]
2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0]
3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0]
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54226
llvm-svn: 347848
Summary:
A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code
includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the
most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54825
llvm-svn: 347831
This failed to select (which might be a separate bug) in
X86ISelDAGToDAG because we try to create a select node
that can be simplified away after rL347227.
This change avoids the problem by simplifying the SHRUNKBLEND
node sooner. In the test case, we manage to realize that the
true/false values of the select (SHRUNKBLEND) are the same thing,
so it simplifies away completely.
llvm-svn: 347818
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.
This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.
Differential Revision: https://reviews.llvm.org/D54984
llvm-svn: 347786
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.
I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.
Differential Revision: https://reviews.llvm.org/D54979
llvm-svn: 347785
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI
Differential Revision: https://reviews.llvm.org/D54959
llvm-svn: 347784
This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once.
Differential Revision: https://reviews.llvm.org/D46424
Patch by Lewis Revill.
llvm-svn: 347774
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:
```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
unsigned long long args[8];
args[0] = key;
args[1] = index;
use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
dst[0] = a;
dst[1] = b;
}
```
The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.
This adds support for this.
Differential Revision: https://reviews.llvm.org/D54847
llvm-svn: 347747
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
Differential Revision: https://reviews.llvm.org/D54846
llvm-svn: 347746
This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr
for PCOffset. In that case the expression is unrelated to thread locals
and so emitting a value of the DIEExpr does not have to always mean
emit-debug-thread-local.
llvm-svn: 347744
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.
This patch makes such a load considered foldable and so gets a 0 cost.
Review: Ulrich Weigand
https://reviews.llvm.org/D54944
llvm-svn: 347735
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.
As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.
Review: Ulrich Weigand
https://reviews.llvm.org/D54940
llvm-svn: 347734
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.
This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.
Review: Ulrich Weigand
https://reviews.llvm.org/D54897
llvm-svn: 347733
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.
Review: Ulrich Weigand
https://reviews.llvm.org/D54870
llvm-svn: 347732
We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned.
I might be adding another combine to this function and this will probably simplify the logic required for that.
llvm-svn: 347684
This is skylake-avx512 with the addition of avx512vnni ISA.
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D54785
llvm-svn: 347681
If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.
We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.
llvm-svn: 347632
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasExtendedReg()`.
Differential revision: https://reviews.llvm.org/D54822
llvm-svn: 347599
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
Summary:
Support for profile-driven cache prefetching (X86)
This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.
A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.
The recommender makes prefetch recommendations in terms of:
* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)
meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.
For example:
0x400ab2,64,nta
and assuming the instruction at 0x400ab2 is:
movzbl (%rbx,%rdx,1),%edx
means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:
prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx
The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):
1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).
2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).
3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.
4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.
Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.
The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.
Reviewers: davidxl, wmi, craig.topper
Reviewed By: davidxl, wmi, craig.topper
Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54052
llvm-svn: 347596
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.
llvm-svn: 347595
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's
independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of
our existing LEA and other multiply expansion magic happens as it would for i32 ops.
Some of the test diffs show that we could end up with an actual 32-bit mul instruction here
because we choose not to expand to simpler ops. That instruction could be slower depending on the
subtarget. On the plus side, this means we don't need a separate instruction to load the constant
operand and possibly an extra instruction to move the result. If we need to tune mul i32 further,
we could add a later transform that tries to shrink it back to i8 based on subtarget timing.
I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to
test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means
LEA is generally cheap.
Differential Revision: https://reviews.llvm.org/D54803
llvm-svn: 347557
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.
Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.
In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).
We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).
Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.
Instruction select:
Nothing to do.
llvm-svn: 347545
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.
Differential Revision: https://reviews.llvm.org/D54790
llvm-svn: 347542
This reverts commits r347532. Forget add the option
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.
llvm-svn: 347534
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 347532
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.
If we have blendv/conditional move, then we should assume
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use
the values in the blend/mov.
llvm-svn: 347526