Commit Graph

2384 Commits

Author SHA1 Message Date
Sanjay Patel
9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Craig Topper
8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper
c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Craig Topper
802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Sanjay Patel
93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel
9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Sanjay Patel
4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
Sanjay Patel
47a6129e26 [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
llvm-svn: 349958
2018-12-21 21:26:30 +00:00
Simon Pilgrim
911dce2f30 [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349907
2018-12-21 14:56:18 +00:00
Eli Friedman
b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Craig Topper
bd788ce5db [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
llvm-svn: 349726
2018-12-20 05:28:06 +00:00
Simon Pilgrim
2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim
6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Sanjay Patel
f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim
0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Craig Topper
257ce3871e [DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes those causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.

To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55459

llvm-svn: 349137
2018-12-14 08:28:24 +00:00
Sanjay Patel
093ab45d4c [DAGCombiner] clean up visitEXTRACT_VECTOR_ELT
This isn't quite NFC, but I don't know how to expose
any outward diffs from these changes. Mostly, this
was confusing because it used 'VT' to refer to the
operand type rather the usual type of the input node.

There's also a large block at the end that is dedicated 
solely to matching loads, but that wasn't obvious. This
could probably be split up into separate functions to
make it easier to see. 

It's still not clear to me when we make certain transforms 
because the legality and constant conditions are 
intertwined in a way that might be improved.

llvm-svn: 349095
2018-12-14 00:09:08 +00:00
Sanjay Patel
791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Sanjay Patel
c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Sanjay Patel
a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim
ab973a45b9 [DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner
Remove common code from custom lowering (code is still safe if somehow a zero value gets used).

llvm-svn: 349028
2018-12-13 12:23:32 +00:00
Simon Pilgrim
c73a955370 [DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call.
As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now.

llvm-svn: 348781
2018-12-10 18:18:50 +00:00
Francis Visoiu Mistrih
753efe3584 [DAGCombiner] Use the result value type in visitCONCAT_VECTORS
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.

With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
  0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
    0x7ff19d000b50: f64 = Register %1
In function: d

Differential Revision: https://reviews.llvm.org/D55507

llvm-svn: 348759
2018-12-10 14:31:34 +00:00
Sanjay Patel
e767bf4468 [DAGCombiner] re-enable truncation of binops
This is effectively re-committing the changes from:
rL347917 (D54640)
rL348195 (D55126)
...which were effectively reverted here:
rL348604
...because the code had a bug that could induce infinite looping
or eventual out-of-memory compilation.

The bug was that this code did not guard against transforming
opaque constants. More details are in the post-commit mailing
list thread for r347917. A reduced test for that is included
in the x86 bool-math.ll file. (I wasn't able to reduce a PPC
backend test for this, but it was almost the same pattern.)

Original commit message for r347917:

The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86,
we mostly seem to be missing LEA opportunities, but there are likely vector folds
missing too). I think those should be considered existing bugs because this is the
same transform that we do as an IR canonicalization in instcombine. We just need
more tests to make those visible independent of this patch.

llvm-svn: 348706
2018-12-08 16:07:38 +00:00
Sanjay Patel
bc47ff86fe [DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC
This duplicates several shared checks, but we need to split
this up to fix underlying bugs in smaller steps.

llvm-svn: 348627
2018-12-07 18:51:08 +00:00
Sanjay Patel
3af4ae9735 [DAGCombiner] disable truncation of binops by default
As discussed in the post-commit thread of r347917, this
transform is fighting with an existing transform causing
an infinite loop or out-of-memory, so this is effectively 
reverting r347917 and its follow-up r348195 while we
investigate the bug.

llvm-svn: 348604
2018-12-07 15:47:52 +00:00
Sanjay Patel
bb796cd61c [DAGCombiner] remove explicit calls to AddToWorkList; NFCI
As noted in the post-commit thread for rL347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html
...we don't need to repeat these calls because the combiner does it automatically.

llvm-svn: 348597
2018-12-07 15:00:56 +00:00
Sanjay Patel
c6441c8547 [DAGCombiner] use root SDLoc for all nodes created by logic fold
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.

I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.

llvm-svn: 348552
2018-12-07 00:01:57 +00:00
Sanjay Patel
86cb679851 [DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI
We shouldn't care about the debug location for a node that
we're creating, but attaching the root of the pattern should
be the best effort. (If this is not true, then we are doing
it wrong all over the SDAG).

This is no-functional-change-intended, and there are no
regression test diffs...and that's what I expected. But
there's a similar line above this diff, where those
assumptions apparently do not hold.

llvm-svn: 348550
2018-12-06 23:53:58 +00:00
Sanjay Patel
276cef343c [DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC
This code can still misbehave.

llvm-svn: 348547
2018-12-06 23:39:28 +00:00
Sanjay Patel
70af85b0ac [DAGCombiner] don't group bswap with casts in logic hoisting fold
This was probably organized as it was because bswap is a unary op.
But that's where the similarity to the other opcodes ends. We should
not limit this transform to scalars, and we should not try it if
either input has other uses. This is another step towards trying to
clean this whole function up to prevent it from causing infinite loops
and memory explosions. 

Earlier commits in this series:
rL348501
rL348508
rL348518

llvm-svn: 348534
2018-12-06 22:10:44 +00:00
Sanjay Patel
03a3ef2a0c [DAGCombiner] reduce indent; NFC
Unlike some of the folds in hoistLogicOpWithSameOpcodeHands()
above this shuffle transform, this has the expected hasOneUse()
checks in place.

llvm-svn: 348523
2018-12-06 20:02:47 +00:00
Andrea Di Biagio
52a2bac583 [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF)
    --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix
one of the two problematic cases mentioned in PR39257. However, it is not enough
to fix the original test case posted by Craig; that particular case would
probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257
(-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of
%xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on
%-mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion
%pass is disabled.

Differential Revision: https://reviews.llvm.org/D55274

llvm-svn: 348522
2018-12-06 19:55:38 +00:00
Sanjay Patel
bfc7ffa40f [DAGCombiner] don't hoist logic op if operands have other uses, part 2
The PPC test with 2 extra uses seems clearly better by avoiding this transform. 
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.

llvm-svn: 348518
2018-12-06 19:18:56 +00:00
Sanjay Patel
c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel
e9bf78fa23 [DAGCombiner] refactor function that hoists bitwise logic; NFCI
Added FIXME and TODO comments for lack of safety checks.
This function is a suspect in out-of-memory errors as discussed in
the follow-up thread to r347917:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

llvm-svn: 348501
2018-12-06 17:08:03 +00:00
Simon Pilgrim
105a366254 DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.
llvm-svn: 348494
2018-12-06 15:39:25 +00:00
Sanjay Patel
33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Simon Pilgrim
180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Simon Pilgrim
666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjay Patel
d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Craig Topper
e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Sanjay Patel
2daceedf92 [DAGCombiner] guard against an oversized shift crash
This change prevents the crash noted in the post-commit comments 
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html

We can't guarantee that an oversized shift amount is folded away, 
so we have to check for it.

Note that I committed an incomplete fix for that crash with:
rL347502

But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.

So I'm not sure how to expose the bug now (and apparently no fuzzers have found 
a way yet either).

On the plus side, we have discovered that we're missing real optimizations by 
not simplifying nodes sooner, so the earlier fix still has value, and there's 
likely more value in extending that so we can simplify more opcodes and simplify 
when doing RAUW and/or putting nodes on the combiner worklist.

Differential Revision: https://reviews.llvm.org/D54954

llvm-svn: 348089
2018-12-02 13:33:56 +00:00
Sanjay Patel
8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Sanjay Patel
7336e7c67a [x86] limit transform for select-of-fp-constants
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.

If we have blendv/conditional move, then we should assume 
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use 
the values in the blend/mov.

llvm-svn: 347526
2018-11-25 17:27:02 +00:00
Sanjay Patel
04435677d0 [SelectionDAG] move constant or splat functions to common location
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.

llvm-svn: 347523
2018-11-25 16:09:32 +00:00
Sanjay Patel
7e119c0400 [DAG] consolidate shift simplifications
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.

The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.

llvm-svn: 347502
2018-11-23 20:05:12 +00:00
Sanjay Patel
3e80019275 [DAGCombiner] form 'not' ops ahead of shifts (PR39657)
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was 
reversing that transform. I think we need this transform in the backend 
regardless of what happens in IR to catch cases where the shift-xor 
is formed late from GEP or other ops.

https://rise4fun.com/Alive/NC1

  Name: shl
  Pre: (-1 << C2) == C1
  %shl = shl i8 %x, C2
  %r = xor i8 %shl, C1
  =>
  %not = xor i8 %x, -1
  %r = shl i8 %not, C2
  
  Name: shr
  Pre: (-1 u>> C2) == C1
  %sh = lshr i8 %x, C2
  %r = xor i8 %sh, C1
  =>
  %not = xor i8 %x, -1
  %r = lshr i8 %not, C2

https://bugs.llvm.org/show_bug.cgi?id=39657

llvm-svn: 347478
2018-11-22 19:24:10 +00:00
Sanjay Patel
20935e0ab5 [DAGCombiner] refactor select-of-FP-constants transform
This transform needs to be limited. 

We are converting to a constant pool load very early, and we 
are turning loads that are independent of the select condition 
(and therefore speculatable) into a dependent non-speculatable 
load.

We may also be transferring a condition code from an FP register
to integer to create that dependent load.

llvm-svn: 347424
2018-11-21 20:54:47 +00:00
Sanjay Patel
1c74747478 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 347410
2018-11-21 20:00:32 +00:00