Commit Graph

9448 Commits

Author SHA1 Message Date
Craig Topper
826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Craig Topper
57fc891c1b [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled.
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.

llvm-svn: 350490
2019-01-06 07:06:35 +00:00
Stanislav Mekhanoshin
35a3a3bd11 Added single use check to ShrinkDemandedConstant
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.

Differential Revision: https://reviews.llvm.org/D56289

llvm-svn: 350475
2019-01-05 19:20:00 +00:00
Craig Topper
cfeb1cf9af [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.

My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.

Differential Revision: https://reviews.llvm.org/D56283

llvm-svn: 350432
2019-01-04 20:50:59 +00:00
Sanjay Patel
9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Craig Topper
8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper
3109f3a4ab [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.

By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.

This fixes the regression on X86 in D56156.

Differential Revision: https://reviews.llvm.org/D56176

llvm-svn: 350236
2019-01-02 17:58:30 +00:00
Craig Topper
c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Ayonam Ray
e00606a1b2 Reversing the commit in revision 350186. Revision causes regression in 4
tests.

llvm-svn: 350187
2019-01-01 07:28:55 +00:00
Ayonam Ray
c471bb2e67 Omit range checks from jump tables when lowering switches with unreachable
default

During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.

Review Reference: D52002

llvm-svn: 350186
2019-01-01 06:37:50 +00:00
Craig Topper
ed3ffae4a4 [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.
Differential Revision: https://reviews.llvm.org/D56168

llvm-svn: 350179
2018-12-31 19:09:30 +00:00
Craig Topper
802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Kang Zhang
4aa6453767 [PowerPC] Fix ADDE, SUBE do not know how to promote operator
Summary:
This patch is created to fix the Bugzilla bug 39815:
https://bugs.llvm.org/show_bug.cgi?id=39815 

This patch is to support promotion integer result for the instruction ADDE, SUBE.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56119

llvm-svn: 350161
2018-12-30 07:48:09 +00:00
Richard Trieu
a87b70d1db Add vtable anchor to classes.
llvm-svn: 350142
2018-12-29 02:02:13 +00:00
Justin Lebar
49fac56ea3 [NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.

An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.

Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.

Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.

Patch by Denys Zariaiev!

Differential Revision: https://reviews.llvm.org/D34708

llvm-svn: 350069
2018-12-26 19:12:31 +00:00
Craig Topper
0229da8f07 [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.

GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.

I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

Based on a patch that Simon Pilgrim sent me in email.

Fixes PR40142.

llvm-svn: 350059
2018-12-24 19:40:20 +00:00
George Burgess IV
610c76534f [SelectionDAGBuilder] Use ::precise LocationSizes; NFC
More migration so we can disable the implicit int -> LocationSize
conversion.

All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.

Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.

llvm-svn: 350042
2018-12-24 05:34:21 +00:00
Sanjay Patel
93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel
9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Sanjay Patel
4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
Sanjay Patel
47a6129e26 [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
llvm-svn: 349958
2018-12-21 21:26:30 +00:00
Simon Pilgrim
911dce2f30 [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349907
2018-12-21 14:56:18 +00:00
Eli Friedman
b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Simon Pilgrim
b208255fe0 [SelectionDAGBuilder] Enable funnel shift building to custom rotates
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.

AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.

Differential Revision: https://reviews.llvm.org/D55747

llvm-svn: 349765
2018-12-20 14:56:44 +00:00
Craig Topper
bd788ce5db [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
llvm-svn: 349726
2018-12-20 05:28:06 +00:00
Simon Pilgrim
2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim
47ff0431e9 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 1 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349628
2018-12-19 14:09:09 +00:00
Simon Pilgrim
6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Simon Pilgrim
2072b5afbe [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.

This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.

I've updated SelectionDAG::simplifyShift to demonstrate its use.

Differential Revision: https://reviews.llvm.org/D55819

llvm-svn: 349616
2018-12-19 10:41:06 +00:00
Pete Cooper
f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's.  Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Nikita Popov
a7d2a235bb [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests
Integer result promotion needs to use the scalar size, and we need
support for result widening.

This is in preparation for D55787.

llvm-svn: 349480
2018-12-18 13:22:53 +00:00
Simon Pilgrim
af6fbbf18b [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.

llvm-svn: 349466
2018-12-18 09:33:25 +00:00
Krzysztof Parzyszek
5852aa44ae [SDAG] Clarify the origin of chain in REG_SEQUENCE in comment, NFC
llvm-svn: 349391
2018-12-17 20:30:20 +00:00
Craig Topper
15b7246935 [SelectionDAG] Fix noop detection for vectors in AssertZext/AssertSext in getNode
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.

Similarly for the assert above it.

I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868

llvm-svn: 349390
2018-12-17 20:29:13 +00:00
JF Bastien
1811217e4d NFC: remove unused variable
D55768 removed its use.

llvm-svn: 349377
2018-12-17 19:03:24 +00:00
Simon Pilgrim
9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Tim Northover
256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Sanjay Patel
f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim
0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Simon Pilgrim
1e1fd9c761 [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts
Differential Revision: https://reviews.llvm.org/D55600

llvm-svn: 349264
2018-12-15 11:36:36 +00:00
Krzysztof Parzyszek
6b01d35497 [SDAG] Ignore chain operand in REG_SEQUENCE when emitting instructions
llvm-svn: 349186
2018-12-14 20:14:12 +00:00
Craig Topper
257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Sanjay Patel
093ab45d4c [DAGCombiner] clean up visitEXTRACT_VECTOR_ELT
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather the usual type of the input node.

There's also a large block at the end that is dedicated 
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see. 

It's still not clear to me when we make certain transforms 
because the legality and constant conditions are 
intertwined in a way that might be improved.

llvm-svn: 349095
2018-12-14 00:09:08 +00:00
Sanjay Patel
791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Sanjay Patel
c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Sanjay Patel
a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim
ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Simon Pilgrim
77fc551d1a [TargetLowering] Add ISD::ROTL/ROTR vector expansion
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.

Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.

Begun removing x86 vector rotate custom lowering to use the expansion.

llvm-svn: 349025
2018-12-13 11:20:48 +00:00
Clement Courbet
76f4ae1092 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 349016
2018-12-13 09:56:19 +00:00
Simon Pilgrim
eb508f8ccb [SelectionDAG] Add a generic isSplatValue function
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.

It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.

A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).

I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.

Differential Revision: https://reviews.llvm.org/D55426

llvm-svn: 348953
2018-12-12 18:32:29 +00:00