Commit Graph

1821 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov
081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Michael Kuperstein
7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Sanjay Patel
8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel
8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel
38a42e4bfa [DAG] simplify logic; NFC
llvm-svn: 283885
2016-10-11 14:14:30 +00:00
Sanjay Patel
907ae69125 [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 283884
2016-10-11 14:04:24 +00:00
Sanjay Patel
9609f3d6c7 [DAG] fix formatting; NFC
llvm-svn: 283878
2016-10-11 13:47:43 +00:00
Sanjay Patel
14c02052d6 [DAG] clean up foldSelectOfConstants(); NFCI
Rename variables, simplify logic. 
Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too.

llvm-svn: 283613
2016-10-07 21:55:42 +00:00
Sanjay Patel
ecaf343fe7 [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC
We're missing at least 3 other similar folds based on what we have in InstCombine. 

llvm-svn: 283596
2016-10-07 20:47:51 +00:00
Vedant Kumar
7beb423765 Delete some dead code in SelectionDAG (NFC)
Differential Revision: https://reviews.llvm.org/D24435

llvm-svn: 283505
2016-10-06 22:53:43 +00:00
Michael Kuperstein
7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Sanjay Patel
f7df85af87 fix formatting; NFC
llvm-svn: 283115
2016-10-03 15:18:36 +00:00
Nirav Dave
e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave
e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein
3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Wei Mi
ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Sanjay Patel
b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel
5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel
bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Michael Kuperstein
59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Simon Pilgrim
4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Simon Pilgrim
bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Ayman Musa
0c2da88f82 Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Michael Kuperstein
efc0667583 [DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI.
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.

llvm-svn: 281283
2016-09-13 00:57:43 +00:00
Justin Lebar
adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Simon Pilgrim
153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
Simon Pilgrim
a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Hal Finkel
8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Wei Mi
c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Andrea Di Biagio
fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Michael Kuperstein
65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Simon Pilgrim
02b13d4d3c Use SDValue::getOpcode() helper instead of via SDValue::getNode()
llvm-svn: 279381
2016-08-20 20:04:18 +00:00
Justin Bogner
cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
David Majnemer
0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
David Majnemer
0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Simon Pilgrim
85c7ea86ae [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.

i.e. 
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )

Differential Revision: https://reviews.llvm.org/D23330

llvm-svn: 278211
2016-08-10 10:50:53 +00:00
Simon Pilgrim
76964e3140 [DAGCombiner] Better support for shifting large value type by constants
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.

This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.

Differential Revision: https://reviews.llvm.org/D23007

llvm-svn: 278141
2016-08-09 17:39:11 +00:00
Nikolai Bozhenov
f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Diana Picus
ddddbc2440 Typo fix in comment. NFC
llvm-svn: 277704
2016-08-04 08:25:08 +00:00
Michael Kuperstein
c97da7f3a4 [DAGCombine] Make sext(setcc) combine respect getBooleanContents
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.

This fixes PR28504.

llvm-svn: 277371
2016-08-01 19:39:49 +00:00
Matthias Braun
941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Simon Pilgrim
10bf0ff879 [DAGCombiner] Use APInt directly to detect out of range shift constants
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match

As mentioned on D22726

llvm-svn: 276855
2016-07-27 10:30:55 +00:00
Justin Lebar
9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Sanjay Patel
fedc01ad76 [DAG] make isConstantSplatVector() available to the rest of lowering
llvm-svn: 275025
2016-07-10 21:27:06 +00:00
Sanjay Patel
303326541b reformat, fix comments/names; NFCI
llvm-svn: 275015
2016-07-10 13:05:57 +00:00
Matt Arsenault
3fb8f9eabf Reapply r274829 with fix for FP vectors
llvm-svn: 274937
2016-07-08 21:25:33 +00:00
Nico Weber
28410c6846 Revert r274829, it caused PR28472.
llvm-svn: 274916
2016-07-08 19:52:19 +00:00
Sjoerd Meijer
1ee119f897 Do not expand SDIV when compiling for minimum code size
Differential Revision: http://reviews.llvm.org/D22139

llvm-svn: 274855
2016-07-08 15:32:01 +00:00
Sjoerd Meijer
46c4c3d31c Addressing post-commit comments regarding not expanding UDIV;
we don't expand only when compiling for minimum code size.

llvm-svn: 274847
2016-07-08 14:17:09 +00:00
Sjoerd Meijer
a625af3feb Code size optimisation: don't expand a div to a mul and and a shift sequence.
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.

Differential Revision: http://reviews.llvm.org/D22131

llvm-svn: 274843
2016-07-08 12:54:43 +00:00
Matt Arsenault
c3a6fe6ecd Bug 28444: Fix assertion when extract_vector_elt has mismatched type
For some reason extract_vector_elt is sometimes allowed to have
a different result type than the vector element type.

llvm-svn: 274829
2016-07-08 07:05:00 +00:00
Tim Shen
1c3c0afc53 [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it.
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.

This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.

Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D21692

llvm-svn: 274644
2016-07-06 17:44:03 +00:00
Balaram Makam
d4acd7ed10 Revert r259387: "AArch64: Implement missed conditional compare sequences."
This reverts commit r259387 because it inserts illegal code after legalization
    in some backends where i64 OR type is illegal for example.

llvm-svn: 274573
2016-07-05 20:24:05 +00:00
Matt Arsenault
2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Craig Topper
d1eca0f32c [CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.
Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef.

llvm-svn: 274472
2016-07-03 19:37:12 +00:00
Craig Topper
2bd8b4b180 [CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended.
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.

llvm-svn: 274337
2016-07-01 06:54:47 +00:00
Craig Topper
3a011de10c [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.
llvm-svn: 274097
2016-06-29 03:29:12 +00:00
Matt Arsenault
f0f721a682 DAGCombiner: Don't narrow volatile vector loads + extract
llvm-svn: 273909
2016-06-27 19:31:04 +00:00
Craig Topper
9c809bee1c [SelectionDAG] Use DAG.getCommutedVectorShuffle instead of reimplementing it.
llvm-svn: 273802
2016-06-26 05:10:49 +00:00
Nirav Dave
bfdb483755 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273585
2016-06-23 17:52:57 +00:00
Peter Collingbourne
6717803485 Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as it caused pr28270.
llvm-svn: 273518
2016-06-23 00:06:17 +00:00
Nirav Dave
96beb7dee5 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after fixing over-aggressive assertion

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273456
2016-06-22 19:03:26 +00:00
David Majnemer
e61e4bfd87 Replace silly uses of 'signed' with 'int'
llvm-svn: 273244
2016-06-21 05:10:24 +00:00
Sanjay Patel
f664f3a578 [DAG] Remove redundant FMUL in Newton-Raphson SQRT code
When calculating a square root using Newton-Raphson with two constants,
a naive implementation is to use five multiplications (four muls to calculate
reciprocal square root and another one to calculate the square root itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there's no reliable way to do
such a reassociation in the back-end. So, the patch modifies NR code itself
so that it directly builds optimal code for SQRT and doesn't rely on any
further reassociation.

Patch by Nikolai Bozhenov!

Differential Revision: http://reviews.llvm.org/D21127

llvm-svn: 272920
2016-06-16 16:58:54 +00:00
Nirav Dave
194cb55f37 Revert "Preserve DebugInfo when replacing values in DAGCombiner"
Reverting due to assertion failure in
lib/CodeGen/SelectionDAG/InstrEmitter.cpp

This reverts commit r272792.

llvm-svn: 272799
2016-06-15 16:08:50 +00:00
Nirav Dave
a72e308403 Preserve DebugInfo when replacing values in DAGCombiner
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 272792
2016-06-15 14:50:08 +00:00
Benjamin Kramer
bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Benjamin Kramer
46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Sanjay Patel
dba8b4c04d transform obscured FP sign bit ops into a fabs/fneg using TLI hook
This is effectively a revert of:
http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
and:
http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions.

This is intended to resolve the objections raised on the dev list:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html
and:
https://llvm.org/bugs/show_bug.cgi?id=24886#c4

In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later.

Differential Revision: http://reviews.llvm.org/D19391

llvm-svn: 271573
2016-06-02 20:01:37 +00:00
Sanjay Patel
f509d85a6d [DAG] use getBitcast() to reduce code
Although this was intended to be NFC, the test case wiggle shows a change in
code scheduling/RA caused by a difference in the SDLoc() generation.

Depending on how you look at it, this is the (dis)advantage of exact checking
in regression tests.

llvm-svn: 271526
2016-06-02 16:01:15 +00:00
Matt Arsenault
5d06439c54 DAGCombiner: Fix broken size check in isAlias
This should have been converting the size to bytes, but wasn't really.
These should probably all be using getStoreSize instead.

I haven't been able to come up with a meaningful testcase for this.
I can trigger it using combinations of struct loads and stores,
but can't observe a difference in non-broken testcases.

isAlias is only really used during store merging, so I'm not sure how
to get into the vector splitting situation the comment describes
since store merging is only done before type legalization.

llvm-svn: 271356
2016-06-01 01:00:36 +00:00
Michael Kuperstein
a75c77b127 [X86] Detect SAD patterns and emit psadbw instructions.
This recommits r267649 with a fix for PR27539.

Differential Revision: http://reviews.llvm.org/D20598

llvm-svn: 271033
2016-05-27 18:53:22 +00:00
Simon Pilgrim
0a6b95a60a Simplify std::all_of predicate (to one line) by using llvm::all_of. NFCI.
llvm-svn: 270747
2016-05-25 20:13:39 +00:00
Sanjay Patel
b2bcd95aab reduce indentation; NFCI
llvm-svn: 270007
2016-05-19 00:33:07 +00:00
Simon Pilgrim
ed39d150f5 Fix unused variable warning.
llvm-svn: 268867
2016-05-07 20:19:59 +00:00
Simon Pilgrim
b6f82c449a [SelectionDAG] Added bitreverse(bitreverse(v)) --> v
Added bitreverse creation testing

llvm-svn: 268865
2016-05-07 20:12:36 +00:00
Matt Arsenault
ab2232cf73 DAGCombiner: Reduce truncated shl width
llvm-svn: 268094
2016-04-29 19:53:16 +00:00
Gerolf Hoflehner
50426191d7 [DAGCombiner] Follow coding convention for function name (NFC)
llvm-svn: 267745
2016-04-27 17:27:16 +00:00
Nico Weber
e69b9548b8 Revert r267649, it caused PR27539.
llvm-svn: 267723
2016-04-27 15:16:54 +00:00
Cong Hou
6f879d9eb1 Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched.
Differential revision: http://reviews.llvm.org/D14840

llvm-svn: 267649
2016-04-27 01:29:18 +00:00
Ahmed Bougacha
128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Marcin Koscielnicki
1c1af6ef77 [PR27390] [CodeGen] Reject indexed loads in CombinerDAG.
visitAND, when folding and (load) forgets to check which output of
an indexed load is involved, happily folding the updated address
output on the following testcase:

target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

%typ = type { i32, i32 }

define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
  %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
  %1 = load i32, i32* %b, align 4
  %2 = ptrtoint i32* %b to i64
  %3 = and i64 %2, -35184372088833
  %4 = inttoptr i64 %3 to i32*
  %_msld = load i32, i32* %4, align 4
  %zzz = add i32 %1,  %_msld
  ret i32 %zzz
}

Fix this by checking ResNo.

I've found a few more places that currently neglect to check for
indexed load, and tightened them up as well, but I don't have test
cases for them.  In fact, they might not be triggerable at all,
at least with current targets.  Still, better safe than sorry.

Differential Revision: http://reviews.llvm.org/D19202

llvm-svn: 267420
2016-04-25 15:43:44 +00:00
Gerolf Hoflehner
01b3a6184a [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.

Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:

- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math

llvm-svn: 267328
2016-04-24 05:14:01 +00:00
Craig Topper
36c133159a [CodeGen] Teach DAG combine to fold select_cc seteq X, 0, sizeof(X), ctlz_zero_undef(X) -> ctlz(X). InstCombine already does this for IR and X86 pattern matches this during isel.
A follow up commit will remove the X86 patterns to allow this to be tested.

llvm-svn: 267325
2016-04-24 04:38:32 +00:00
Matt Arsenault
3b748d76f6 DAGCombiner: Relax alignment restriction when changing store type
If the target allows the alignment, this should be OK.

llvm-svn: 267217
2016-04-22 21:01:41 +00:00
Matt Arsenault
629d12de70 DAGCombiner: Relax alignment restriction when changing load type
If the target allows the alignment, this should still be OK.

llvm-svn: 267209
2016-04-22 20:21:36 +00:00
Daniel Sanders
591c379563 Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64
It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others.

llvm-svn: 267127
2016-04-22 09:37:26 +00:00
Gerolf Hoflehner
b32f11fc62 [MachineCombiner] Support for floating-point FMA on ARM64
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:

- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math

llvm-svn: 267098
2016-04-22 02:15:19 +00:00
Matt Arsenault
8d1052f55c DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component
If the extracted bits are restricted to the upper half or lower half,
this can be truncated.

llvm-svn: 267024
2016-04-21 18:03:06 +00:00
Nirav Dave
2477491a92 Cleanup Store Merging in UseAA case
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.

Reviewers: jyknight

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18909

llvm-svn: 266217
2016-04-13 17:27:26 +00:00
Simon Pilgrim
82e54871d0 [DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOprs
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well.

The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles.

I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.

Differential Revision: http://reviews.llvm.org/D18944

llvm-svn: 265998
2016-04-11 21:10:33 +00:00
Nirav Dave
66f485f4e2 Fix Load Control Dependence in MemCpy Generation
In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This causes us to potentially
construct an in initial DAG with a memory dependence not fully
represented in the chain sub-DAG but rather require looking at the
entire DAG breaking alias analysis by allowing incorrect repositioning
of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any possible issues to happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's
common for incoming argument lowering to add a non-aliasing load hanging
off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would
find that other (unvisited) use of the EntryNode chain, and just give up
entirely. Furthermore, the check was incomplete: it would not actually
detect all such potentially problematic DAG constructions, because
GatherAllAliases did not guarantee to visit all chain nodes going up to
the root EntryNode. This is in general fine -- giving up early will just
miss a potential optimization, not generate incorrect results. But, for
this non-chain dependency detection code, it's possible that you could
have a load attached to a higher-up chain node than any which were
visited. If that load aliases your store, but the only dependency is
through the value operand of a non-aliasing store, it would've been
missed by this code, and potentially reordered.

With the dependence added, this check can be removed and Alias Analysis
can be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Differential Revision: http://reviews.llvm.org/D18062

llvm-svn: 265836
2016-04-08 19:44:40 +00:00
Sanjay Patel
769b5fd546 fix typos; NFC
llvm-svn: 265356
2016-04-04 22:45:56 +00:00
Sanjay Patel
4d71160d5d fix typo; NFC
llvm-svn: 265054
2016-03-31 21:00:48 +00:00
Nirav Dave
83ce54aac2 Prevent X86ISelLowering from merging volatile loads
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18546

llvm-svn: 265013
2016-03-31 13:40:55 +00:00
Nirav Dave
fa250cad37 Prevent construction of cycle in DAG store merge
When merging stores in DAGCombiner, add check to ensure that no
dependenices exist that would cause the construction of a cycle in our
DAG.  This may happen if one store has a data dependence on another
instruction (e.g. a load) which itself has a (chain) dependence on
another store being merged. These stores cannot be merged safely and
doing so results in a cycle that is discovered in LegalizeDAG.

This test is only done in cases where Antialias analysis is used (UseAA)
as non-AA store merge candidates will be merged logically after all
loads which have been checked to not alias.

Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight

Subscribers: llvm-commits, tberghammer, danalbert, srhines

Differential Revision: http://reviews.llvm.org/D18336

llvm-svn: 264461
2016-03-25 21:06:30 +00:00
Justin Bogner
c35c10593b SelectionDAG: Remove a tautological dyn_cast. NFC
Index is already a StoreSDNode, so this dyn_cast doesn't do anything.

llvm-svn: 264177
2016-03-23 18:15:33 +00:00
Silviu Baranga
46030585b3 [DAGCombine] Catch the case where extract_vector_elt can cause an any_ext while processing AND SDNodes
Summary:
extract_vector_elt can cause an implicit any_ext if the types don't
match. When processing the following pattern:

  (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c)

DAGCombine was ignoring the possible extend, and sometimes removing
the AND even though it was required to maintain some of the bits
in the result to 0, resulting in a miscompile.

This change fixes the issue by limiting the transformation only to
cases where the extract_vector_elt doesn't perform the implicit
extend.

Reviewers: t.p.northover, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18247

llvm-svn: 263935
2016-03-21 11:43:46 +00:00
Sanjay Patel
7506852709 [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel
5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Simon Pilgrim
61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Hans Wennborg
e00b6e7249 Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.

llvm-svn: 262935
2016-03-08 16:21:41 +00:00
Matt Arsenault
ceb2c06cbd DAGCombiner: Check legality before creating extract_vector_elt
Problem not hit by any in tree target.

llvm-svn: 262852
2016-03-07 21:10:09 +00:00
Michael Kuperstein
b89f0fa2a2 [DAGCombine] Fix divrem combine not to assume div/rem type is simple.
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization 
combine.

This fixes PR26835.

Differential Revision: http://reviews.llvm.org/D17878

llvm-svn: 262746
2016-03-04 21:23:29 +00:00
Renato Golin
175c6d6d95 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262738
2016-03-04 19:19:36 +00:00
Simon Pilgrim
91dd0a796c [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 262599
2016-03-03 09:43:28 +00:00
Renato Golin
3d78271eac Revert "[ARM] Merging 64-bit divmod lib calls into one"
This reverts commit r262507, which broke some ARM buildbots.

llvm-svn: 262594
2016-03-03 08:57:44 +00:00
Renato Golin
93e42d9934 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262507
2016-03-02 19:35:45 +00:00
Matt Arsenault
7d0a77b979 DAGCombiner: Make sure an integer is being truncated
llvm-svn: 262446
2016-03-02 01:36:51 +00:00
Matt Arsenault
b36d462fac DAGCombiner: Turn truncate of a bitcasted vector to an extract
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.

This fixes some test failures in a future commit when i64 loads
are changed to promote.

llvm-svn: 262397
2016-03-01 21:31:53 +00:00
Vasileios Kalintiris
36901dd1c3 Revert "[mips] Promote the result of SETCC nodes to GPR width."
This reverts commit r262316.

It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.

llvm-svn: 262387
2016-03-01 20:25:43 +00:00
Matt Arsenault
03dac8d8e4 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Vasileios Kalintiris
3a8f7f9e31 [mips] Promote the result of SETCC nodes to GPR width.
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.

The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D10970

llvm-svn: 262316
2016-03-01 10:08:01 +00:00
Matt Arsenault
982224cfb8 DAGCombiner: Don't unnecessarily swap operands in ReassociateOps
In the case where op = add, y = base_ptr, and x = offset, this
transform:

(op y, (op x, c1)) -> (op (op x, y), c1)

breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.

This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.

llvm-svn: 262148
2016-02-27 19:57:45 +00:00
Matt Arsenault
360d244d5b DAGCombiner: Relax sqrt NaN folding check
This is OK for +0 since compares to +/-0 give the same result.

llvm-svn: 262125
2016-02-27 09:38:05 +00:00
Simon Pilgrim
c5199aae82 [DAGCombiner] Use getBitcast helper when possible. NFCI.
llvm-svn: 261437
2016-02-20 15:05:29 +00:00
Ahmed Bougacha
93cff7fb82 [CodeGen] Document and use getConstant's splat-building feature. NFC.
Differential Revision: http://reviews.llvm.org/D17229

llvm-svn: 260901
2016-02-15 18:07:29 +00:00
Pirama Arumuga Nainar
7476bc89e9 Don't combine fp_round (fp_round x) if f80 to f16 is generated
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.

fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2.  This prevents selection of
native f16 conversion instructions from f32 or f64.  Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.

Reviewers: ab

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D17221

llvm-svn: 260769
2016-02-13 00:08:05 +00:00
Ahmed Bougacha
f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Tim Shen
f99f0d5a7e [SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.

The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.

The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.

The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.

llvm-svn: 259691
2016-02-03 20:58:55 +00:00
Balaram Makam
92431703d7 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Matthias Braun
b30f2f5141 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Junmo Park
b3327b7007 [DAGCombiner] Don't add volatile or indexed stores to ChainedStores
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D16463

llvm-svn: 259024
2016-01-28 06:23:33 +00:00
Simon Pilgrim
b9b8fcd831 Tidied up TRUNC combine code. NFC.
Make use of DAG.getBitcast and use clang-format to reduce number of lines (and make it more readable).

llvm-svn: 258644
2016-01-23 21:50:40 +00:00
Dan Gohman
0bf3ae84ca [SelectionDAG] Fold more offsets into GlobalAddresses
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the
offset in a GlobalAddressSDNode, which is uncovered by those patches.

llvm-svn: 258482
2016-01-22 03:57:34 +00:00
Reid Kleckner
b7ecfa5b09 Revert "[SelectionDAG] Fold more offsets into GlobalAddresses"
This reverts r258296 and the follow up r258366. With this change, we
miscompiled the following program on Windows:
  #include <string>
  #include <iostream>
  static const char kData[] = "asdf jkl;";
  int main() {
    std::string s(kData + 3, sizeof(kData) - 3);
    std::cout << s << '\n';
  }

llvm-svn: 258465
2016-01-22 01:09:29 +00:00
Dan Gohman
edf98c5682 [SelectionDAG] Fold more offsets into GlobalAddresses
SelectionDAG previously missed opportunities to fold constants into
GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it
would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to
create `(add GA+c, y)`. This isn't often visible on targets such as X86 which
effectively reassociate adds in their complex address-mode folding logic,
however it is currently visible on WebAssembly since it currently has very
simple address mode folding code that doesn't reassociate anything.

This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses
at the same times that it folds constants together, so that it doesn't miss any
opportunities to perform such folding.

Differential Revision: http://reviews.llvm.org/D16090

llvm-svn: 258296
2016-01-20 07:03:08 +00:00
Sanjay Patel
1dc7dfb9d9 [DAGCombiner] don't dereference an operand that doesn't exist (PR26070)
The bug was introduced with changes for x86-64 fp128:
http://reviews.llvm.org/rL254653

I don't know why an x86 change is here, so I'll follow up in:
http://reviews.llvm.org/D15134

Should fix:
https://llvm.org/bugs/show_bug.cgi?id=26070

llvm-svn: 257200
2016-01-08 19:53:24 +00:00
Tim Shen
9b68bd48ca Test commit access - add a blank line in comment.
llvm-svn: 257192
2016-01-08 19:20:23 +00:00
Dan Gohman
797f639e79 [SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets.
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.

This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.

Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.

Differential Revision: http://reviews.llvm.org/D15544

llvm-svn: 256890
2016-01-06 00:43:06 +00:00
Eric Christopher
325e8d06dc Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,
x)) combines for ppc_fp128, since signbit computation is more
complicated.

Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html

Patch by Tim Shen!

llvm-svn: 255305
2015-12-10 22:09:06 +00:00
Sanjay Patel
dc627ad63f fix return values to match bool return type; NFC
llvm-svn: 254968
2015-12-07 23:34:30 +00:00
Chih-Hung Hsieh
ed7d81e5d4 [X86] Part 1 to fix x86-64 fp128 calling convention.
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.

* Relax type legalization checks to accept new f128 type configuration,
  whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
  TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
  which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
  to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
  to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
  to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
  to guarantee that returned common sub class will contain the specified
  simple value type.
  This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.

Differential Revision: http://reviews.llvm.org/D15134

llvm-svn: 254653
2015-12-03 22:02:40 +00:00
Craig Topper
6066164454 Use a lambda instead of std::bind and std::mem_fn I introduced in r254242. NFC
llvm-svn: 254260
2015-11-29 18:05:22 +00:00
Craig Topper
d0573179dc [SelectionDAG] Use std::any_of instead of a manually coded loop. NFC
llvm-svn: 254242
2015-11-29 04:37:11 +00:00
Artyom Skrobov
314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Simon Pilgrim
1dfe53e180 Remove duplicate getValueType() calls. NFCI.
llvm-svn: 253823
2015-11-22 16:49:38 +00:00
Jonas Paulsson
8f0d2b7f1f [DAGCombiner] Bugfix for lost chain depenedency.
When MergeConsecutiveStores() combines two loads and two stores into
wider loads and stores, the chain users of both of the original loads
must be transfered to the new load, because it may be that a chain
user only depends on one of the loads.

New test case: test/CodeGen/SystemZ/dag-combine-01.ll

Reviewed by James Y Knight.

Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=25310#c6
llvm-svn: 253779
2015-11-21 13:25:07 +00:00
Hans Wennborg
dcc2500452 X86: More efficient legalization of wide integer compares
In particular, this makes the code for 64-bit compares on 32-bit targets
much more efficient.

Example:

  define i32 @test_slt(i64 %a, i64 %b) {
  entry:
    %cmp = icmp slt i64 %a, %b
    br i1 %cmp, label %bb1, label %bb2
  bb1:
    ret i32 1
  bb2:
    ret i32 2
  }

Before this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          setae   %al
          cmpl    16(%esp), %ecx
          setge   %cl
          je      .LBB2_2
          movb    %cl, %al
  .LBB2_2:
          testb   %al, %al
          jne     .LBB2_4
          movl    $1, %eax
          retl
  .LBB2_4:
          movl    $2, %eax
          retl

After this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          sbbl    16(%esp), %ecx
          jge     .LBB1_2
          movl    $1, %eax
          retl
  .LBB1_2:
          movl    $2, %eax
          retl

Differential Revision: http://reviews.llvm.org/D14496

llvm-svn: 253572
2015-11-19 16:35:08 +00:00
Geoff Berry
2ddfc5e60f [DAGCombiner] Improve zextload optimization.
Summary:
Don't fold
  (zext (and (load x), cst)) -> (and (zextload x), (zext cst))
if
  (and (load x) cst)
will match as a zextload already and has additional users.

For example, the following IR:

  %load = load i32, i32* %ptr, align 8
  %load16 = and i32 %load, 65535
  %load64 = zext i32 %load16 to i64
  store i32 %load16, i32* %dst1, align 4
  store i64 %load64, i64* %dst2, align 8

used to produce the following aarch64 code:

	ldr		w8, [x0]
	and	w9, w8, #0xffff
	and	x8, x8, #0xffff
	str		w9, [x1]
	str		x8, [x2]

but with this change produces the following aarch64 code:

	ldrh		w8, [x0]
	str		w8, [x1]
	str		x8, [x2]

Reviewers: resistor, mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14340

llvm-svn: 252789
2015-11-11 19:42:52 +00:00
Matt Arsenault
d8fed1b793 Add target preference for GatherAllAliases max depth
llvm-svn: 252775
2015-11-11 18:44:33 +00:00
Sanjay Patel
533c10c651 add a SelectionDAG method to check if no common bits are set in two nodes; NFCI
This was suggested in:
http://reviews.llvm.org/D13956

and is a follow-on to:
http://reviews.llvm.org/rL252515
http://reviews.llvm.org/rL252519

This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG.

A corresponding function for IR instructions already exists in ValueTracking.

llvm-svn: 252539
2015-11-09 23:31:38 +00:00
Tom Stellard
05691a678e DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload
Reviewers: resistor, arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13805

llvm-svn: 252349
2015-11-06 21:58:37 +00:00
James Y Knight
646c4032e7 Fix two issues in MergeConsecutiveStores:
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.

2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.

A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.

llvm-svn: 251816
2015-11-02 18:48:08 +00:00
Sanjay Patel
bbd4c79c8f Use the 'arcp' fast-math-flag when combining repeated FP divisors
This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. 
This was originally part of D8900.

Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and 
possibly other changes.

Differential Revision: http://reviews.llvm.org/D9708

llvm-svn: 251450
2015-10-27 20:27:25 +00:00
Steve King
fee370be72 Fix llc crash processing S/UREM for -Oz builds caused by rL250825.
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().

This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035

llvm-svn: 251373
2015-10-27 00:14:06 +00:00
Simon Pilgrim
7430804fe1 [DAGCombiner] Generalize masking of constant rotates.
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.

Followup to D13851.

llvm-svn: 251197
2015-10-24 18:44:52 +00:00
Simon Pilgrim
d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Zia Ansari
8f509a7044 [X86] - Catch extra combine opportunities for redundant imuls.
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.

I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).

Differential Revision: http://reviews.llvm.org/D13740

llvm-svn: 251028
2015-10-22 16:14:45 +00:00
Artyom Skrobov
b844fa7fc0 Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into DAGCombiner.
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.

The motivation is to "prepare terrain" for D13862: when we set DIV and REM
to be legalized to libcalls, instead of the DIVREM, we otherwise lose the
ability to combine them together. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.

Reviewers: RKSimon, eli.friedman, rengolin

Subscribers: john.brawn, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13733

llvm-svn: 250825
2015-10-20 13:06:02 +00:00
Artyom Skrobov
4bca0bb010 A doccomment for CombineTo, and some NFC refactorings
Summary:
Caching SDLoc(N), instead of recreating it in every single
function call, keeps the code denser, and allows to unwrap long lines.

Reviewers: sunfish, atrick, sdmitrouk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13726

llvm-svn: 250305
2015-10-14 17:18:35 +00:00
Artyom Skrobov
a5b9ad22b3 Merge DAGCombiner::visitSREM and DAGCombiner::visitUREM (NFC)
Summary: The two implementations had more code in common than not.

Reviewers: sunfish, MatzeB, sdmitrouk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13724

llvm-svn: 250302
2015-10-14 16:54:14 +00:00
Matt Arsenault
e5d9515fb7 DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
	  t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138
2015-10-13 00:49:00 +00:00
Matt Arsenault
61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Simon Pilgrim
c8832fc233 [SelectionDAG] Add common vector constant folding helper function
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).

This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.

After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().

Differential Revision: http://reviews.llvm.org/D13665

llvm-svn: 250118
2015-10-12 23:00:11 +00:00
Simon Pilgrim
d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Simon Pilgrim
5eac2607b9 [DAGCombiner] Tidyup FMINNUM/FMAXNUM constant folding
Enable constant folding for vector splats as well as scalars.

Enable constant canonicalization for all scalar and vector constants.

llvm-svn: 249978
2015-10-11 16:02:28 +00:00
Simon Pilgrim
dde63374c5 [DAGCombiner] Generalize FADD constant combines to work with vectors
Updated the FADD combines to work with vectors as well as scalars.

Differential Revision: http://reviews.llvm.org/D13416

llvm-svn: 249251
2015-10-03 22:06:06 +00:00
Simon Pilgrim
a38d76a087 [DAGCombiner] Merge SIGN_EXTEND_INREG vector constant folding methods. NCI.
visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them.

This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation.

No codegen intended and all existing tests remain the same.

llvm-svn: 249236
2015-10-03 16:26:52 +00:00
Hal Finkel
bd582581b8 [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking
When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

llvm-svn: 248698
2015-09-28 08:02:14 +00:00
Matt Arsenault
3c07e963b8 DAGCombiner: Check if store is volatile first
This is the simpler check. NFC.

llvm-svn: 248625
2015-09-25 22:06:19 +00:00
Sanjay Patel
bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Matt Arsenault
c721df0478 Use new TokenFactor chain when merging stores
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.

The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.

This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.

llvm-svn: 248468
2015-09-24 07:22:38 +00:00
Simon Pilgrim
4003ed2da3 [DAGCombiner] Improve FMA support for interpolation patterns
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.

This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))

Differential Revision: http://reviews.llvm.org/D13003

llvm-svn: 248210
2015-09-21 20:32:48 +00:00
Simon Pilgrim
e8e5a17a12 [DAGCombiner] Tidy up FMA combine helpers. NFCI.
Based on feedback for D13003.

llvm-svn: 248206
2015-09-21 20:15:03 +00:00
Matt Arsenault
8fb9b94f7f Fix accidentally committed debug printing
llvm-svn: 248190
2015-09-21 18:21:10 +00:00
Matt Arsenault
b774834429 DAGCombiner: Replace store of FP constant after attemping store merges
If storing multiple FP constants, some subset of the stores
would be replaced with integers due to visit order, so
MergeConsecutiveStores would only partially merge
these.

llvm-svn: 248169
2015-09-21 15:59:46 +00:00
Matt Arsenault
a30ddb6524 Factor replacement of stores of FP constants into new function
llvm-svn: 248168
2015-09-21 15:59:43 +00:00
Craig Topper
0013be16ff Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC
llvm-svn: 248140
2015-09-21 05:32:41 +00:00
Sanjay Patel
a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Silviu Baranga
df9ce8408a [DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors
Summary:
The BUILD_VECTOR node will truncate its operators to match the
type. We need to take this into account when constant folding -
we need to perform a truncation before constant folding the elements.
This is because the upper bits can change the result, depending on
the operation type (for example this is the case for min/max).

This change also adds a regression test.

Reviewers: jmolloy

Subscribers: jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12697

llvm-svn: 247265
2015-09-10 10:34:34 +00:00
Sanjay Patel
ce74db9d8d check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned 
16-byte (SSE) memops for x86 targets where those accesses are slow.

This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905

and will help solve PR21711.

Differential Revision: http://reviews.llvm.org/D12573

llvm-svn: 246771
2015-09-03 15:03:19 +00:00
Hal Finkel
1baec5323b [DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 246507
2015-08-31 23:15:04 +00:00
Sanjay Patel
719b3e6a3e don't set a legal vector type if we know we can't use that type (NFCI)
Added benefit: the 'if' logic now matches the text of the comment that describes it.

llvm-svn: 246506
2015-08-31 22:59:03 +00:00
Sanjay Patel
218cbd5a48 generalize helper function of MergeConsecutiveStores to handle vector types (NFCI)
This was part of D7208 (r227242), but that commit was reverted because it exposed
a bug in AArch64 lowering. I should have that fixed and the rest of the commit
reinstated soon.

llvm-svn: 246493
2015-08-31 21:50:16 +00:00
Hal Finkel
2483f2060a [DAGCombine] Use getSetCCResultType utility function
DAGCombine has a utility wrapper around TLI's getSetCCResultType; use it in the
one place in DAGCombine still directly calling the TLI function. NFC.

llvm-svn: 246482
2015-08-31 20:42:38 +00:00
Hal Finkel
a894266d28 [DAGCombine] Remove some old dead code for forming SETCC nodes
This code was dead when it was committed in r23665 (Oct 7, 2005), and before it
reaches its 10th anniversary, it really should go. We can always bring it back
if we'd like, but it forms more SETCC nodes, and the way we do legality
checking on SETCC nodes is wrong in a number of places, and removing this means
fewer places to fix. NFC.

llvm-svn: 246466
2015-08-31 18:38:55 +00:00
Matt Arsenault
d9c830154f Make MergeConsecutiveStores look at other stores on same chain
When combiner AA is enabled, look at stores on the same chain.
Non-aliasing stores are moved to the same chain so the existing
code fails because it expects to find an adajcent store on a consecutive
chain.

Because of how DAGCombiner tries these store combines,
MergeConsecutiveStores doesn't see the correct set of stores on the chain
when it visits the other stores. Each store individually has its chain
fixed before trying to merge consecutive stores, and then tries to merge
stores from that point before the other stores have been processed to
have their chains fixed. To fix this, attempt to use FindBetterChain
on any possibly neighboring stores in visitSTORE.

Suppose you have 4 32-bit stores that should be merged into 1 vector
store. One store would be visited first, fixing the chain. What happens is
because not all of the store chains have yet been fixed, 2 of the stores
are merged. The other 2 stores later have their chains fixed,
but because the other stores were already merged, they have different
memory types and merging the two different sized stores is not
supported and would be more difficult to handle.

llvm-svn: 246307
2015-08-28 17:31:28 +00:00
Ahmed Bougacha
87166905c8 [CodeGen] Check FoldConstantArithmetic result before using it.
Fixes PR24602: r245689 introduced an unguarded use of
SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails
because of opaque (hoisted) constants.

llvm-svn: 246217
2015-08-27 21:46:04 +00:00
Steve King
5cdbd20cc3 Pass function attributes instead of boolean in isIntDivCheap().
llvm-svn: 245921
2015-08-25 02:31:21 +00:00
Oliver Stannard
284f2bffc9 Add DAG optimisation for FP16_TO_FP
The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
following pattern can be optimised by removing the AND:

  (FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)

This is a common pattern for ARM targets when functions have __fp16
arguments, as they are passed as floats (so that they get passed in the
correct registers), but then bitcast and truncated to ignore the top 16
bits.

llvm-svn: 245832
2015-08-24 09:47:45 +00:00
Simon Pilgrim
2a7049abe0 [DAGCombiner] Fold CONCAT_VECTORS of bitcasted EXTRACT_SUBVECTOR
Minor generalization of D12125 - peek through any bitcast to the original vector that we're extracting from.

llvm-svn: 245814
2015-08-23 15:22:14 +00:00
Mehdi Amini
5aa7bd7d62 Do not use dyn_cast<> after isa<>
Reported by coverity.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245799
2015-08-23 00:27:57 +00:00
John Brawn
eab960c46f [DAGCombiner] Fold together mul and shl when both are by a constant
This is intended to improve code generation for GEPs, as the index value is
shifted by the element size and in GEPs of multi-dimensional arrays the index
of higher dimensions is multiplied by the lower dimension size.

Differential Revision: http://reviews.llvm.org/D12197

llvm-svn: 245689
2015-08-21 10:48:17 +00:00
Simon Pilgrim
35f528262f [DAGCombiner] Added SMAX/SMIN/UMAX/UMIN constant folding
We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes

I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding

Differential Revision: http://reviews.llvm.org/D12118

llvm-svn: 245503
2015-08-19 21:11:58 +00:00
Simon Pilgrim
989cbbd2f5 [DAGCombiner] Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE.
Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle.

Differential Revision: http://reviews.llvm.org/D12125

llvm-svn: 245490
2015-08-19 20:09:50 +00:00
Michael Kuperstein
dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree target set it to false), but the interface allows querying
based on the type optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Steve King
d4c8f70ce1 Fix backward operands in call to isTruncateFree() and improve comments.
llvm-svn: 245385
2015-08-18 23:02:41 +00:00
Matthias Braun
fa3b248a66 DAGCombiner: Improve DAGCombiner select normalization
The current code normalizes select(C0, x, select(C1, x, y)) towards
select(C0|C1, x, y) if the targets prefers that form. This patch adds an
additional rule that if the select(C1, x, y) part already exists in the
function then we want to normalize into the other direction because the
effects of reusing the existing value are bigger than transforming into
the target preferred form.

This addresses regressions following r238793, see also:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html

Differential Revision: http://reviews.llvm.org/D11616

llvm-svn: 245350
2015-08-18 20:48:36 +00:00
Matthias Braun
2e920bd04f DAGCombiner: Optimize SELECTs first before turning them into SELECT_CC
This is part of http://reviews.llvm.org/D11616 - I just decided to split
this up into a separate commit.

llvm-svn: 245349
2015-08-18 20:48:29 +00:00
Sanjay Patel
3ab4a73bac use SDValue bool operator; NFCI
llvm-svn: 245181
2015-08-16 17:54:28 +00:00
Simon Pilgrim
0750c84623 [DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.

(zext (truncate x)) -> (zext (and(x, m))

Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.

This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.

Differential Revision: http://reviews.llvm.org/D11764

llvm-svn: 245160
2015-08-15 13:27:30 +00:00
Alex Lorenz
e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Jingyue Wu
99eb4685ef SelectionDAG: Prefer to combine multiplication with less uses for fma
Summary:
For example:

  s6 = s0*s5;
  s2 = s6*s6 + s6;
  ...
  s4 = s6*s3;

We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.

Test Plan: test/CodeGen/NVPTX/fma-assoc.ll

Patch by Xuetian Weng

Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm

Subscribers: arsenm, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D11855

llvm-svn: 244649
2015-08-11 19:21:46 +00:00
Sanjay Patel
924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Simon Pilgrim
f328fd4441 Remove trailing whitespace. NFCI.
llvm-svn: 243838
2015-08-01 17:06:47 +00:00
Simon Pilgrim
b447dc5aaa Use SDValue bool check. NFCI.
llvm-svn: 243837
2015-08-01 17:05:50 +00:00
Simon Pilgrim
503a2594c3 [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level
The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.

This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.

There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.)

Differential Revision: http://reviews.llvm.org/D11518

llvm-svn: 243831
2015-08-01 10:01:46 +00:00
Sanjay Patel
0f9dcf8b90 move DAGCombiner's allowableAlignment() helper function into the TLI
Making allowableAlignment() more accessible was suggested as a predecessor patch
for D10662, so I've pulled it into TargetLowering. This let's us remove 4 instances
of duplicate logic in LegalizeDAG.

There's a subtle functional change in the implementation: the existing 
allowableAlignment() code was using getPrefTypeAlignment() when checking 
alignment with the DataLayout and assumed that was fast. In this implementation,
we use getABITypeAlignment() and assume that is fast. See the TODO comment or the
discussion in the Phab review for future improvements in this implementation
(don't use the data layout at all).

There are no regression test changes from this difference, and I'm not sure how to
expose it via a test. I think we actually do want to provide the 'Fast' param when
checking this from DAGCombiner::MergeConsecutiveStores(). Ie, we shouldn't merge 
stores if the new stores are not going to be fast. But that change will require 
fixing allowsMisalignedMemoryAccess() overrides as noted in D10662.

Differential Revision: http://reviews.llvm.org/D10905

llvm-svn: 243549
2015-07-29 18:24:18 +00:00
Sanjay Patel
133e68b45c ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141)
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.

After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.

The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.

Differential Revision: http://reviews.llvm.org/D11345

llvm-svn: 243500
2015-07-28 23:28:22 +00:00
Sanjay Patel
1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

There is no NFC-intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Sanjay Patel
c1c2b87001 move combineRepeatedFPDivisors logic into a helper function; NFCI
llvm-svn: 243293
2015-07-27 17:58:49 +00:00
Simon Pilgrim
4ef0576c40 [DAGCombiner] Fixed minor typo that was missed in D9097.
We don't bitcast the UNDEFs - that is done in visitVECTOR_SHUFFLE, and the getValueType should come from the operand's SDValue not the SDNode.

llvm-svn: 242640
2015-07-19 11:31:40 +00:00
Simon Pilgrim
3aca32ea4a Use SDValue bool check. NFCI.
llvm-svn: 242636
2015-07-19 09:56:36 +00:00
Matt Arsenault
cabe02e141 Only do fmul (fadd x, x), c combine if the fadd only has one use
This was increasing the instruction count if the fadd has multiple uses.

llvm-svn: 242498
2015-07-17 01:14:35 +00:00
Pete Cooper
7e64ef06e6 Use more foreach loops in SelectionDAG. NFC
llvm-svn: 242249
2015-07-14 23:43:29 +00:00
Matt Arsenault
f54dc2384d DAGCombiner: Assume invariant load cannot alias a store
The motivation is to allow GatherAllAliases / FindBetterChain
to not give up on dependent loads of a pointer from constant memory.

This is important for AMDGPU, because most loads are pointers
derived from a load of a kernel argument from constant memory.

llvm-svn: 241948
2015-07-10 22:17:40 +00:00
Sanjay Patel
e2361d4a18 fix an invisible bug when combining repeated FP divisors
This patch fixes bugs that were exposed by the addition of fast-math-flags in the DAG:
r237046 ( http://reviews.llvm.org/rL237046 ):

1. When replacing a division node, it's not enough to RAUW.
   We should call CombineTo() to delete dead nodes and combine again.
2. Because we are changing the DAG, we can't return an empty SDValue
   after the transform. As the code comments say:

    Visitation implementation - Implement dag node combining for different node types.
    The semantics are as follows: Return Value:
      SDValue.getNode() == 0 - No change was made
      SDValue.getNode() == N - N was replaced, is dead and has been handled.
      otherwise - N should be replaced by the returned Operand.

The new test case shows no difference with or without this patch, but it will crash if
we re-apply r237046 or enable FMF via the current -enable-fmf-dag cl::opt.

Differential Revision: http://reviews.llvm.org/D9893

llvm-svn: 241826
2015-07-09 17:28:37 +00:00
Mehdi Amini
eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Mehdi Amini
0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini
9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini
44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Sanjay Patel
c1afa95a51 early exits -> less indenting; NFCI
llvm-svn: 241716
2015-07-08 19:32:39 +00:00
Mehdi Amini
ffc1402fad Remove IsLittleEndian from TargetLowering and redirect to DataLayout
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11017

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241655
2015-07-08 01:00:38 +00:00
Mehdi Amini
8ac7a9d57a Redirect DataLayout from TargetMachine to Module in SelectionDAG
Summary:
SelectionDAG itself is not invoking directly the DataLayout in the
TargetMachine, but the "TargetLowering" class is still using it. I'll
address it in a following commit.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11000

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241618
2015-07-07 19:07:19 +00:00
Pawel Bylica
c52eabb285 Reapply r240291: Fix shl folding in DAG combiner.
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.

llvm-svn: 241254
2015-07-02 11:44:54 +00:00
Pawel Bylica
143ceb6d46 [DAGCombiner] Fix & simplify constant folding of sext/zext.
Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$

Test Plan: New regression test included.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10607

llvm-svn: 240991
2015-06-29 20:28:47 +00:00
Benjamin Kramer
5b455f0b62 [SDAG] Now that we have a way to communicate the exact bit on sdiv use it to simplify sdiv by a constant.
We had a hack in SDAGBuilder in place to work around this but now we
can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can
perform this trick automatically.

The added check in DAGCombiner is necessary to prevent exact sdiv by pow2
from regressing as the target-specific pow2 lowering is not aware of
exact bits yet.

This is mostly covered by existing tests. One side effect is that we
get the better lowering for exact vector sdivs now too :)

llvm-svn: 240891
2015-06-27 20:33:26 +00:00
Pete Cooper
8c0a710995 Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240809
2015-06-26 18:41:54 +00:00
Benjamin Kramer
07e70b4fa4 [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)
Instcombine also does this but many opportunities only become visible
after GEPs are lowered.

llvm-svn: 240787
2015-06-26 14:51:36 +00:00
Matt Arsenault
f735cab986 DAGCombiner: Use pop_back_val()
llvm-svn: 240709
2015-06-25 22:15:05 +00:00
Matt Arsenault
c244dcb804 DAGCombiner: Remove redundant check
MemIntrinsicSDNode is already a subclass of MemSDNode,
so the MemSDNode check is sufficient.

llvm-svn: 240672
2015-06-25 18:47:02 +00:00
Alexander Kornienko
f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Pawel Bylica
e6fd8c4232 Revert r240291: causes problems in self-hosted builds.
llvm-svn: 240343
2015-06-22 21:54:07 +00:00
Pawel Bylica
06407c0320 Fix shl folding in DAG combiner.
Summary: The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

Test Plan: A regression test included.

Reviewers: andreadb

Reviewed By: andreadb

Subscribers: andreadb, test, llvm-commits

Differential Revision: http://reviews.llvm.org/D10602

llvm-svn: 240291
2015-06-22 15:58:11 +00:00
Chandler Carruth
c3f49eb451 [PM/AA] Hoist the AliasResult enum out of the AliasAnalysis class.
This will allow classes to implement the AA interface without deriving
from the class or referencing an internal enum of some other class as
their return types.

Also, to a pretty fundamental extent, concepts such as 'NoAlias',
'MayAlias', and 'MustAlias' are first class concepts in LLVM and we
aren't saving anything by scoping them heavily.

My mild preference would have been to use a scoped enum, but that
feature is essentially completely broken AFAICT. I'm extremely
disappointed. For example, we cannot through any reasonable[1] means
construct an enum class (or analog) which has scoped names but converts
to a boolean in order to test for the possibility of aliasing.

[1]: Richard Smith came up with a "solution", but it requires class
templates, and lots of boilerplate setting up the enumeration multiple
times. Something like Boost.PP could potentially bundle this up, but
even that would be quite painful and it doesn't seem realistically worth
it. The enum class solution would probably work without the need for
a bool conversion.

Differential Revision: http://reviews.llvm.org/D10495

llvm-svn: 240255
2015-06-22 02:16:51 +00:00
Alexander Kornienko
70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Chandler Carruth
ac80dc7532 [PM/AA] Remove the Location typedef from the AliasAnalysis class now
that it is its own entity in the form of MemoryLocation, and update all
the callers.

This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.

llvm-svn: 239885
2015-06-17 07:18:54 +00:00
Sanjay Patel
0fcc53f6d6 rename variables; NFC
...because I see 'StoreBW' and read it as 'store bandwidth'

llvm-svn: 239850
2015-06-16 20:47:19 +00:00
Sanjay Patel
bb385ed454 extract some code into a helper function for MergeConsecutiveStores(); NFCI
llvm-svn: 239847
2015-06-16 20:05:00 +00:00
Sanjay Patel
f134048b1d propagate IR-level fast-math-flags to DAG nodes, disabled by default
This is an updated version of the patch that was checked in at:
http://reviews.llvm.org/rL237046

but subsequently reverted because it exposed a bug in the DAG Combiner:
http://reviews.llvm.org/D9893

This time, there's an enablement flag ("EnableFMFInDAG") around the code in
SelectionDAGBuilder where we copy the set of FP optimization flags from IR
instructions to DAG nodes. So, in theory, there should be no functional change
from this patch as-is, but it will allow testing with the added functionality
to proceed via "-enable-fmf-dag" passed to llc.

This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

Differential Revision: http://reviews.llvm.org/D10403

llvm-svn: 239828
2015-06-16 16:25:43 +00:00
Matt Arsenault
ed891b5561 Revert "Revert "Fix merges of non-zero vector stores""
Reapply r239539. Don't assume the collected number of
stores is the same vector size. Just take the first N
stores to fill the vector.

llvm-svn: 239825
2015-06-16 15:51:48 +00:00
Simon Pilgrim
d3f6427446 [DAGCombiner] Added BSWAP(BSWAP(x)) -> x combine pattern.
llvm-svn: 239682
2015-06-13 16:25:12 +00:00
Simon Pilgrim
011381d48b [DAGCombiner] Added BSWAP vector constant folding support.
llvm-svn: 239675
2015-06-13 14:08:15 +00:00
Simon Pilgrim
096cccd01a Stripped trailing whitespace. NFC.
llvm-svn: 239674
2015-06-13 12:57:36 +00:00
Reid Kleckner
2691c59e97 Revert "Fix merges of non-zero vector stores"
This reverts commit r239539.

It was causing SDAG assertions while building freetype.

llvm-svn: 239543
2015-06-11 17:25:24 +00:00
Matt Arsenault
e23a063dc3 Fix merges of non-zero vector stores
Now actually stores the non-zero constant instead of 0.
I somehow forgot to include this part of r238108.

The test change was just an independent instruction order swap,
so just add another check line to satisfy CHECK-NEXT.

llvm-svn: 239539
2015-06-11 16:03:52 +00:00
Simon Pilgrim
4791f6d89b [DAGCombiner] Added CTLZ vector constant folding support.
llvm-svn: 239305
2015-06-08 16:19:00 +00:00
Simon Pilgrim
c789e1d57b [DAGCombiner] Added CTTZ vector constant folding support.
llvm-svn: 239293
2015-06-08 09:57:09 +00:00
Simon Pilgrim
68cd237f57 [DAGCombiner] Added CTPOP vector constant folding support.
Added tests to the existing SSE/AVX test files.

llvm-svn: 239252
2015-06-07 15:37:14 +00:00
Fiona Glaser
666e352440 DAGCombiner: don't duplicate (fmul x, c) in visitFNEG if fneg is free
For targets with a free fneg, this fold is always a net loss if it
ends up duplicating the multiply, so definitely avoid it.

This might be true for some targets without a free fneg too, but
I'll leave that for future investigation.

llvm-svn: 239167
2015-06-05 17:52:34 +00:00
Andrea Di Biagio
eb33134ce7 Simplify code; NFC.
Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into
CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using
update_llc_test_checks.py.

llvm-svn: 239142
2015-06-05 10:29:55 +00:00
Andrea Di Biagio
9ac8a6b13d [DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.

However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.

Example:

define <2 x double> @example(<2 x double> %A) {
entry:
  %0 = extractelement <2 x double> %A, i32 0
  %1 = insertelement <2 x double> undef, double %0, i32 0
  %2 = insertelement <2 x double> %1, double -0.0, i32 1
  ret <2 x double> %2
}

Before this patch, llc (with -mattr=+sse4.1) wrongly generated
  movq   %xmm0, %xmm0  # xmm0 = xmm0[0],zero

So, the sign bit of the negative zero was effectively lost.

This patch fixes the problem by adding explicit checks for positive zero.

With this patch, llc produces the following code for the example above:
  movhpd .LCPI0_0(%rip), %xmm0

where .LCPI0_0 referes to a 'double -0'.

llvm-svn: 239070
2015-06-04 19:15:01 +00:00
Matt Arsenault
ca519dc28b Pass address space to isLegalAddressingMode in DAGCombiner
No test because I don't know of a target that makes use
of address spaces and indexed load / store.

llvm-svn: 239051
2015-06-04 16:17:34 +00:00
Matt Arsenault
65ad1602b0 Add target hook to allow merging stores of nonzero constants
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.

Most of the new testcases aren't optimally merged, and are for
later improvements.

llvm-svn: 238108
2015-05-24 00:51:27 +00:00
Simon Pilgrim
e054199354 [X86][SSE] Improve support for 128-bit vector sign extension
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).

It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.

Differential Revision: http://reviews.llvm.org/D9848

llvm-svn: 237885
2015-05-21 10:05:03 +00:00
Matthias Braun
56a781495a DAGCombiner: Continue combining if FoldConstantArithmetic() fails.
DAG.FoldConstantArithmetic() can fail even though both operands are
Constants if OpaqueConstants are involved. Continue trying other combine
possibilities in tis case.

Differential Revision: http://reviews.llvm.org/D6946

Somewhat related to PR21801 / rdar://19211454

llvm-svn: 237822
2015-05-20 18:54:02 +00:00
Sanjay Patel
03abbb48a4 use 'auto *' for pointers; clearer usage, no deep copying
llvm-svn: 237719
2015-05-19 20:10:16 +00:00
Sanjay Patel
ad11415962 tidy up
1. remove duplicate local variable
2. add local variable with name to match comment
3. remove useless comment

llvm-svn: 237715
2015-05-19 19:10:57 +00:00
Sanjay Patel
64a6da947a use range-based for-loop
llvm-svn: 237711
2015-05-19 18:24:33 +00:00