teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-25 22:38:56 -04:00

Author	SHA1	Message	Date
Konstantin Zhuravlyov	081385a74e	[DAGCombiner] Do not remove the load of stored values when optimizations are disabled This combiner breaks debug experience and should not be run when optimizations are disabled. For example: int main() { int j = 0; j += 2; if (j == 2) return 0; return 5; } When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function. Differential Revision: https://reviews.llvm.org/D19268 llvm-svn: 284014	2016-10-12 13:44:24 +00:00
Michael Kuperstein	7adbf6b042	[DAG] Fix crash in build_vector -> vector_shuffle combine Fixes a crash in the build_vector -> vector_shuffle combine when the first vector input is twice as wide as the output, and the second input vector is even wider. llvm-svn: 283953	2016-10-11 22:44:31 +00:00
Sanjay Patel	8253e15ef3	[DAG] add fold for masked negated sign-extended bool This enhances the fold added with: https://reviews.llvm.org/rL283900 llvm-svn: 283905	2016-10-11 17:05:52 +00:00
Sanjay Patel	8384703d9b	[DAG] add fold for masked negated extended bool The non-obvious motivation for adding this fold (which already happens in InstCombine) is that we want to canonicalize IR towards select instructions and canonicalize DAG nodes towards boolean math. So we need to recreate some folds in the DAG to handle that change in direction. An interesting implementation difference for cases like this is that InstCombine generally works top-down while the DAG goes bottom-up. That means we need to detect different patterns. In this case, the SimplifyDemandedBits fold prevents us from performing a zext to sext fold that would then be recognized as a negation of a sext. llvm-svn: 283900	2016-10-11 16:26:36 +00:00
Sanjay Patel	38a42e4bfa	[DAG] simplify logic; NFC llvm-svn: 283885	2016-10-11 14:14:30 +00:00
Sanjay Patel	907ae69125	[DAG] hoist DL(N) and fix formatting; NFC llvm-svn: 283884	2016-10-11 14:04:24 +00:00
Sanjay Patel	9609f3d6c7	[DAG] fix formatting; NFC llvm-svn: 283878	2016-10-11 13:47:43 +00:00
Sanjay Patel	14c02052d6	[DAG] clean up foldSelectOfConstants(); NFCI Rename variables, simplify logic. Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too. llvm-svn: 283613	2016-10-07 21:55:42 +00:00
Sanjay Patel	ecaf343fe7	[DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC We're missing at least 3 other similar folds based on what we have in InstCombine. llvm-svn: 283596	2016-10-07 20:47:51 +00:00
Vedant Kumar	7beb423765	Delete some dead code in SelectionDAG (NFC) Differential Revision: https://reviews.llvm.org/D24435 llvm-svn: 283505	2016-10-06 22:53:43 +00:00
Michael Kuperstein	7cc2123847	[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together. This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one. Differential Revision: https://reviews.llvm.org/D24683 llvm-svn: 283480	2016-10-06 18:58:24 +00:00
Sanjay Patel	f7df85af87	fix formatting; NFC llvm-svn: 283115	2016-10-03 15:18:36 +00:00
Nirav Dave	e524f50882	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r282600 due to test failues with MCJIT llvm-svn: 282604	2016-09-28 16:37:50 +00:00
Nirav Dave	e17e055b75	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 282600	2016-09-28 15:50:43 +00:00
Michael Kuperstein	3e06eafc20	[DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine This check currently doesn't seem to do anything useful on any in-tree target: On non-x86, it always evaluates to false, so we never hit the code path that creates the shuffle with zero. On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to query in general, but doesn't make sense if only restricted to zero blends. Differential Revision: https://reviews.llvm.org/D24625 llvm-svn: 282567	2016-09-28 06:13:58 +00:00
Wei Mi	ab24cd189f	Change the order of the splitted store from high - low to low - high. It is a trivial change which could make the testcase easier to be reused for the store splitting in CodeGenPrepare. llvm-svn: 281846	2016-09-18 06:10:32 +00:00
Sanjay Patel	b1f0a0f4a8	getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI llvm-svn: 281493	2016-09-14 16:05:51 +00:00
Sanjay Patel	5f6bb6cd24	getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI llvm-svn: 281490	2016-09-14 15:43:44 +00:00
Sanjay Patel	bd6fca1419	getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI llvm-svn: 281489	2016-09-14 15:21:00 +00:00
Michael Kuperstein	59f8305305	[DAG] Allow build-to-shuffle combine to combine builds from two wide vectors. This allows us to, in some cases, create a vector_shuffle out of a build_vector, when the inputs to the build are extract_elements from two different vectors, at least one of which is wider than the output. (E.g. a <8 x i16> being constructed out of elements from a <16 x i16> and a <8 x i16>). Differential Revision: https://reviews.llvm.org/D24491 llvm-svn: 281402	2016-09-13 21:53:32 +00:00
Simon Pilgrim	4a8eba3e96	[DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range. Followup to D23007 llvm-svn: 281362	2016-09-13 18:33:29 +00:00
Simon Pilgrim	bd28a85d14	[DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue() Followup to D23007 llvm-svn: 281354	2016-09-13 17:15:28 +00:00
Ayman Musa	0c2da88f82	Remove MVT:i1 xor instruction before SELECT. (Performance improvement). Differential Revision: https://reviews.llvm.org/D23764 llvm-svn: 281308	2016-09-13 09:12:45 +00:00
Michael Kuperstein	efc0667583	[DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI. This should make it easier to add cases that we currently don't cover, like supporting more kinds of type mismatches and more than 2 input vectors. llvm-svn: 281283	2016-09-13 00:57:43 +00:00
Justin Lebar	adbf09e8cf	[CodeGen] Split out the notions of MI invariance and MI dereferenceability. Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151	2016-09-11 01:38:58 +00:00
Simon Pilgrim	153b408433	[SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type Fixes issue with rL280927 identified by Mikael Holmén llvm-svn: 281042	2016-09-09 13:31:52 +00:00
Simon Pilgrim	a01ee07a19	[DAGCombiner] Enable AND combines of splatted constant vectors Allow AND combines to use a vector splatted constant as well as a constant scalar. Preliminary part of D24253. llvm-svn: 280926	2016-09-08 12:36:39 +00:00
Hal Finkel	8ca2ed22b2	[DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike) I might have called this "r246507, the sequel". It fixes the same issue, as the issue has cropped up in a few more places. The underlying problem is that isSetCCEquivalent can pick up select_cc nodes with a result type that is not legal for a setcc node to have, and if we use that type to create new setcc nodes, nothing fixes that (and so we've violated the contract that the infrastructure has with the backend regarding setcc node types). Fixes PR30276. For convenience, here's the commit message from r246507, which explains the problem is greater detail: [DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and\|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and\|or node are actual setcc nodes, then this is not an issue (because the and\|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 280767	2016-09-06 23:02:23 +00:00
Wei Mi	c54d1298f5	Split the store of a wide value merged from an int-fp pair into multiple stores. For the store of a wide value merged from a pair of values, especially int-fp pair, sometimes it is more efficent to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Now the feature is only enabled on x86 target, and only store of int-fp pair is splitted. It is possible that the application scope gets extended with perf evidence support in the future. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505	2016-09-02 17:17:04 +00:00
Andrea Di Biagio	fd503e5af3	[DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift. This fixes a regression introduced by revision 268094. Revision 268094 added the following dag combine rule: // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2 That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2). The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this: // trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits() Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined. This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior. Differential Revision: https://reviews.llvm.org/D24154 llvm-svn: 280482	2016-09-02 11:29:09 +00:00
Michael Kuperstein	65bc3c89ff	[DAGCombine] Don't fold a trunc if it feeds an anyext Legalization tends to create anyext(trunc) patterns. This should always be combined - into either a single trunc, a single ext, or nothing if the types match exactly. But if we happen to combine the trunc first, we may pull the trunc away from the anyext or make it implicit (e.g. the truncate(extract) -> extract(bitcast) fold). To prevent this, we can avoid doing the fold, similarly to how we already handle fpround(fpextend). Differential Revision: https://reviews.llvm.org/D23893 llvm-svn: 280386	2016-09-01 17:59:24 +00:00
Simon Pilgrim	02b13d4d3c	Use SDValue::getOpcode() helper instead of via SDValue::getNode() llvm-svn: 279381	2016-08-20 20:04:18 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
David Majnemer	0a16c22846	Use range algorithms instead of unpacking begin/end No functionality change is intended. llvm-svn: 278417	2016-08-11 21:15:00 +00:00
Simon Pilgrim	85c7ea86ae	[DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678) If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector. i.e. INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx ) Differential Revision: https://reviews.llvm.org/D23330 llvm-svn: 278211	2016-08-10 10:50:53 +00:00
Simon Pilgrim	76964e3140	[DAGCombiner] Better support for shifting large value type by constants As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths. This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required. Differential Revision: https://reviews.llvm.org/D23007 llvm-svn: 278141	2016-08-09 17:39:11 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Diana Picus	ddddbc2440	Typo fix in comment. NFC llvm-svn: 277704	2016-08-04 08:25:08 +00:00
Michael Kuperstein	c97da7f3a4	[DAGCombine] Make sext(setcc) combine respect getBooleanContents We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)" Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value of T is 1 or -1, depending on the type of the setcc, and getBooleanContents() for the type if it is not i1. This fixes PR28504. llvm-svn: 277371	2016-08-01 19:39:49 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Simon Pilgrim	10bf0ff879	[DAGCombiner] Use APInt directly to detect out of range shift constants Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match As mentioned on D22726 llvm-svn: 276855	2016-07-27 10:30:55 +00:00
Justin Lebar	9c375817ac	[SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592	2016-07-15 18:27:10 +00:00
Sanjay Patel	fedc01ad76	[DAG] make isConstantSplatVector() available to the rest of lowering llvm-svn: 275025	2016-07-10 21:27:06 +00:00
Sanjay Patel	303326541b	reformat, fix comments/names; NFCI llvm-svn: 275015	2016-07-10 13:05:57 +00:00
Matt Arsenault	3fb8f9eabf	Reapply r274829 with fix for FP vectors llvm-svn: 274937	2016-07-08 21:25:33 +00:00
Nico Weber	28410c6846	Revert r274829, it caused PR28472. llvm-svn: 274916	2016-07-08 19:52:19 +00:00
Sjoerd Meijer	1ee119f897	Do not expand SDIV when compiling for minimum code size Differential Revision: http://reviews.llvm.org/D22139 llvm-svn: 274855	2016-07-08 15:32:01 +00:00
Sjoerd Meijer	46c4c3d31c	Addressing post-commit comments regarding not expanding UDIV; we don't expand only when compiling for minimum code size. llvm-svn: 274847	2016-07-08 14:17:09 +00:00
Sjoerd Meijer	a625af3feb	Code size optimisation: don't expand a div to a mul and and a shift sequence. As a result, the urem instruction will not be expanded to a sequence of umull, lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod. Differential Revision: http://reviews.llvm.org/D22131 llvm-svn: 274843	2016-07-08 12:54:43 +00:00
Matt Arsenault	c3a6fe6ecd	Bug 28444: Fix assertion when extract_vector_elt has mismatched type For some reason extract_vector_elt is sometimes allowed to have a different result type than the vector element type. llvm-svn: 274829	2016-07-08 07:05:00 +00:00
Tim Shen	1c3c0afc53	[DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it. Summary: findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed. This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase. Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D21692 llvm-svn: 274644	2016-07-06 17:44:03 +00:00
Balaram Makam	d4acd7ed10	Revert r259387: "AArch64: Implement missed conditional compare sequences." This reverts commit r259387 because it inserts illegal code after legalization in some backends where i64 OR type is illegal for example. llvm-svn: 274573	2016-07-05 20:24:05 +00:00
Matt Arsenault	2d79389508	DAGCombiner: Fold away vector extract of insert with the same index This only really matters when the index is non-constant since the constant case already gets taken care of by other combines. llvm-svn: 274569	2016-07-05 18:25:02 +00:00
Craig Topper	d1eca0f32c	[CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices. Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef. llvm-svn: 274472	2016-07-03 19:37:12 +00:00
Craig Topper	2bd8b4b180	[CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended. For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array. llvm-svn: 274337	2016-07-01 06:54:47 +00:00
Craig Topper	3a011de10c	[DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second. llvm-svn: 274097	2016-06-29 03:29:12 +00:00
Matt Arsenault	f0f721a682	DAGCombiner: Don't narrow volatile vector loads + extract llvm-svn: 273909	2016-06-27 19:31:04 +00:00
Craig Topper	9c809bee1c	[SelectionDAG] Use DAG.getCommutedVectorShuffle instead of reimplementing it. llvm-svn: 273802	2016-06-26 05:10:49 +00:00
Nirav Dave	bfdb483755	Preserve DebugInfo when replacing values in DAGCombiner Recommiting after correcting over-eager Debug Value transfer fixing PR28270. [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 273585	2016-06-23 17:52:57 +00:00
Peter Collingbourne	6717803485	Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as it caused pr28270. llvm-svn: 273518	2016-06-23 00:06:17 +00:00
Nirav Dave	96beb7dee5	Preserve DebugInfo when replacing values in DAGCombiner Recommiting after fixing over-aggressive assertion [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 273456	2016-06-22 19:03:26 +00:00
David Majnemer	e61e4bfd87	Replace silly uses of 'signed' with 'int' llvm-svn: 273244	2016-06-21 05:10:24 +00:00
Sanjay Patel	f664f3a578	[DAG] Remove redundant FMUL in Newton-Raphson SQRT code When calculating a square root using Newton-Raphson with two constants, a naive implementation is to use five multiplications (four muls to calculate reciprocal square root and another one to calculate the square root itself). However, after some reassociation and CSE the same result can be obtained with only four multiplications. Unfortunately, there's no reliable way to do such a reassociation in the back-end. So, the patch modifies NR code itself so that it directly builds optimal code for SQRT and doesn't rely on any further reassociation. Patch by Nikolai Bozhenov! Differential Revision: http://reviews.llvm.org/D21127 llvm-svn: 272920	2016-06-16 16:58:54 +00:00
Nirav Dave	194cb55f37	Revert "Preserve DebugInfo when replacing values in DAGCombiner" Reverting due to assertion failure in lib/CodeGen/SelectionDAG/InstrEmitter.cpp This reverts commit r272792. llvm-svn: 272799	2016-06-15 16:08:50 +00:00
Nirav Dave	a72e308403	Preserve DebugInfo when replacing values in DAGCombiner [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 272792	2016-06-15 14:50:08 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Benjamin Kramer	46e38f3678	Avoid copies of std::strings and APInt/APFloats where we only read from it As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126	2016-06-08 10:01:20 +00:00
Sanjay Patel	dba8b4c04d	transform obscured FP sign bit ops into a fabs/fneg using TLI hook This is effectively a revert of: http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) and: http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions. This is intended to resolve the objections raised on the dev list: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html and: https://llvm.org/bugs/show_bug.cgi?id=24886#c4 In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later. Differential Revision: http://reviews.llvm.org/D19391 llvm-svn: 271573	2016-06-02 20:01:37 +00:00
Sanjay Patel	f509d85a6d	[DAG] use getBitcast() to reduce code Although this was intended to be NFC, the test case wiggle shows a change in code scheduling/RA caused by a difference in the SDLoc() generation. Depending on how you look at it, this is the (dis)advantage of exact checking in regression tests. llvm-svn: 271526	2016-06-02 16:01:15 +00:00
Matt Arsenault	5d06439c54	DAGCombiner: Fix broken size check in isAlias This should have been converting the size to bytes, but wasn't really. These should probably all be using getStoreSize instead. I haven't been able to come up with a meaningful testcase for this. I can trigger it using combinations of struct loads and stores, but can't observe a difference in non-broken testcases. isAlias is only really used during store merging, so I'm not sure how to get into the vector splitting situation the comment describes since store merging is only done before type legalization. llvm-svn: 271356	2016-06-01 01:00:36 +00:00
Michael Kuperstein	a75c77b127	[X86] Detect SAD patterns and emit psadbw instructions. This recommits r267649 with a fix for PR27539. Differential Revision: http://reviews.llvm.org/D20598 llvm-svn: 271033	2016-05-27 18:53:22 +00:00
Simon Pilgrim	0a6b95a60a	Simplify std::all_of predicate (to one line) by using llvm::all_of. NFCI. llvm-svn: 270747	2016-05-25 20:13:39 +00:00
Sanjay Patel	b2bcd95aab	reduce indentation; NFCI llvm-svn: 270007	2016-05-19 00:33:07 +00:00
Simon Pilgrim	ed39d150f5	Fix unused variable warning. llvm-svn: 268867	2016-05-07 20:19:59 +00:00
Simon Pilgrim	b6f82c449a	[SelectionDAG] Added bitreverse(bitreverse(v)) --> v Added bitreverse creation testing llvm-svn: 268865	2016-05-07 20:12:36 +00:00
Matt Arsenault	ab2232cf73	DAGCombiner: Reduce truncated shl width llvm-svn: 268094	2016-04-29 19:53:16 +00:00
Gerolf Hoflehner	50426191d7	[DAGCombiner] Follow coding convention for function name (NFC) llvm-svn: 267745	2016-04-27 17:27:16 +00:00
Nico Weber	e69b9548b8	Revert r267649, it caused PR27539. llvm-svn: 267723	2016-04-27 15:16:54 +00:00
Cong Hou	6f879d9eb1	Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649	2016-04-27 01:29:18 +00:00
Ahmed Bougacha	128f8732a5	[CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606	2016-04-26 21:15:30 +00:00
Marcin Koscielnicki	1c1af6ef77	[PR27390] [CodeGen] Reject indexed loads in CombinerDAG. visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420	2016-04-25 15:43:44 +00:00
Gerolf Hoflehner	01b3a6184a	[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328	2016-04-24 05:14:01 +00:00
Craig Topper	36c133159a	[CodeGen] Teach DAG combine to fold select_cc seteq X, 0, sizeof(X), ctlz_zero_undef(X) -> ctlz(X). InstCombine already does this for IR and X86 pattern matches this during isel. A follow up commit will remove the X86 patterns to allow this to be tested. llvm-svn: 267325	2016-04-24 04:38:32 +00:00
Matt Arsenault	3b748d76f6	DAGCombiner: Relax alignment restriction when changing store type If the target allows the alignment, this should be OK. llvm-svn: 267217	2016-04-22 21:01:41 +00:00
Matt Arsenault	629d12de70	DAGCombiner: Relax alignment restriction when changing load type If the target allows the alignment, this should still be OK. llvm-svn: 267209	2016-04-22 20:21:36 +00:00
Daniel Sanders	591c379563	Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127	2016-04-22 09:37:26 +00:00
Gerolf Hoflehner	b32f11fc62	[MachineCombiner] Support for floating-point FMA on ARM64 Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098	2016-04-22 02:15:19 +00:00
Matt Arsenault	8d1052f55c	DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024	2016-04-21 18:03:06 +00:00
Nirav Dave	2477491a92	Cleanup Store Merging in UseAA case This patch fixes a bug (PR26827) when using anti-aliasing in store merging. This sets the chain users of the component stores to point to the new store instead of the component stores chain parent. Reviewers: jyknight Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18909 llvm-svn: 266217	2016-04-13 17:27:26 +00:00
Simon Pilgrim	82e54871d0	[DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOprs xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well. The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles. I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result. Differential Revision: http://reviews.llvm.org/D18944 llvm-svn: 265998	2016-04-11 21:10:33 +00:00
Nirav Dave	66f485f4e2	Fix Load Control Dependence in MemCpy Generation In Memcpy lowering we had missed a dependence from the load of the operation to successor operations. This causes us to potentially construct an in initial DAG with a memory dependence not fully represented in the chain sub-DAG but rather require looking at the entire DAG breaking alias analysis by allowing incorrect repositioning of memory operations. To work around this, r200033 changed DAGCombiner::GatherAllAliases to be conservative if any possible issues to happen. Unfortunately this check forbade many non-problematic situations as well. For example, it's common for incoming argument lowering to add a non-aliasing load hanging off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that other (unvisited) use of the EntryNode chain, and just give up entirely. Furthermore, the check was incomplete: it would not actually detect all such potentially problematic DAG constructions, because GatherAllAliases did not guarantee to visit all chain nodes going up to the root EntryNode. This is in general fine -- giving up early will just miss a potential optimization, not generate incorrect results. But, for this non-chain dependency detection code, it's possible that you could have a load attached to a higher-up chain node than any which were visited. If that load aliases your store, but the only dependency is through the value operand of a non-aliasing store, it would've been missed by this code, and potentially reordered. With the dependence added, this check can be removed and Alias Analysis can be much more aggressive. This fixes code quality regression in the Consecutive Store Merge cleanup (D14834). Test Change: ppc64-align-long-double.ll now may see multiple serializations of its stores Differential Revision: http://reviews.llvm.org/D18062 llvm-svn: 265836	2016-04-08 19:44:40 +00:00
Sanjay Patel	769b5fd546	fix typos; NFC llvm-svn: 265356	2016-04-04 22:45:56 +00:00
Sanjay Patel	4d71160d5d	fix typo; NFC llvm-svn: 265054	2016-03-31 21:00:48 +00:00
Nirav Dave	83ce54aac2	Prevent X86ISelLowering from merging volatile loads Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013	2016-03-31 13:40:55 +00:00
Nirav Dave	fa250cad37	Prevent construction of cycle in DAG store merge When merging stores in DAGCombiner, add check to ensure that no dependenices exist that would cause the construction of a cycle in our DAG. This may happen if one store has a data dependence on another instruction (e.g. a load) which itself has a (chain) dependence on another store being merged. These stores cannot be merged safely and doing so results in a cycle that is discovered in LegalizeDAG. This test is only done in cases where Antialias analysis is used (UseAA) as non-AA store merge candidates will be merged logically after all loads which have been checked to not alias. Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight Subscribers: llvm-commits, tberghammer, danalbert, srhines Differential Revision: http://reviews.llvm.org/D18336 llvm-svn: 264461	2016-03-25 21:06:30 +00:00
Justin Bogner	c35c10593b	SelectionDAG: Remove a tautological dyn_cast. NFC Index is already a StoreSDNode, so this dyn_cast doesn't do anything. llvm-svn: 264177	2016-03-23 18:15:33 +00:00
Silviu Baranga	46030585b3	[DAGCombine] Catch the case where extract_vector_elt can cause an any_ext while processing AND SDNodes Summary: extract_vector_elt can cause an implicit any_ext if the types don't match. When processing the following pattern: (and (extract_vector_elt (load ([non_ext\|any_ext\|zero_ext] V))), c) DAGCombine was ignoring the possible extend, and sometimes removing the AND even though it was required to maintain some of the bits in the result to 0, resulting in a miscompile. This change fixes the issue by limiting the transformation only to cases where the extract_vector_elt doesn't perform the implicit extend. Reviewers: t.p.northover, jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18247 llvm-svn: 263935	2016-03-21 11:43:46 +00:00
Sanjay Patel	7506852709	[DAG] use !isUndef() ; NFCI llvm-svn: 263453	2016-03-14 18:09:43 +00:00
Sanjay Patel	5719584129	[DAG] use isUndef() ; NFCI llvm-svn: 263448	2016-03-14 17:28:46 +00:00
Simon Pilgrim	61eb49e437	[X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159	2016-03-10 20:40:26 +00:00
Hans Wennborg	e00b6e7249	Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935	2016-03-08 16:21:41 +00:00
Matt Arsenault	ceb2c06cbd	DAGCombiner: Check legality before creating extract_vector_elt Problem not hit by any in tree target. llvm-svn: 262852	2016-03-07 21:10:09 +00:00
Michael Kuperstein	b89f0fa2a2	[DAGCombine] Fix divrem combine not to assume div/rem type is simple. The divrem combine assumed the type of the div/rem is simple, which isn't necessarily true. This probably worked fine until r250825, since it only saw legal types, but now breaks when it runs as a pre-type-legalization combine. This fixes PR26835. Differential Revision: http://reviews.llvm.org/D17878 llvm-svn: 262746	2016-03-04 21:23:29 +00:00
Renato Golin	175c6d6d95	[ARM] Merging 64-bit divmod lib calls into one When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make sure we only emit valid types or the ones that were explicitly marked as custom. Now, passing check-all and test-suite on x86, ARM and AArch64. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262738	2016-03-04 19:19:36 +00:00
Simon Pilgrim	91dd0a796c	[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 262599	2016-03-03 09:43:28 +00:00
Renato Golin	3d78271eac	Revert "[ARM] Merging 64-bit divmod lib calls into one" This reverts commit r262507, which broke some ARM buildbots. llvm-svn: 262594	2016-03-03 08:57:44 +00:00
Renato Golin	93e42d9934	[ARM] Merging 64-bit divmod lib calls into one When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262507	2016-03-02 19:35:45 +00:00
Matt Arsenault	7d0a77b979	DAGCombiner: Make sure an integer is being truncated llvm-svn: 262446	2016-03-02 01:36:51 +00:00
Matt Arsenault	b36d462fac	DAGCombiner: Turn truncate of a bitcasted vector to an extract On AMDGPU where operations i64 operations are often bitcasted to v2i32 and back, this pattern shows up regularly where it breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. llvm-svn: 262397	2016-03-01 21:31:53 +00:00
Vasileios Kalintiris	36901dd1c3	Revert "[mips] Promote the result of SETCC nodes to GPR width." This reverts commit r262316. It seems that my change breaks an out-of-tree chromium buildbot, so I'm reverting this in order to investigate the situation further. llvm-svn: 262387	2016-03-01 20:25:43 +00:00
Matt Arsenault	03dac8d8e4	DAGCombiner: Turn extract of bitcasted integer into truncate This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358	2016-03-01 18:01:37 +00:00
Vasileios Kalintiris	3a8f7f9e31	[mips] Promote the result of SETCC nodes to GPR width. Summary: This patch modifies the existing comparison, branch, conditional-move and select patterns, and adds new ones where needed. Also, the updated SLT{u,i,iu} set of instructions generate a GPR width result. The majority of the code changes in the Mips back-end fix the wrong assumption that the result of SETCC nodes always produce an i32 value. The changes in the common code path account for the fact that in 64-bit MIPS targets, i1 is promoted to i32 instead of i64. Reviewers: dsanders Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D10970 llvm-svn: 262316	2016-03-01 10:08:01 +00:00
Matt Arsenault	982224cfb8	DAGCombiner: Don't unnecessarily swap operands in ReassociateOps In the case where op = add, y = base_ptr, and x = offset, this transform: (op y, (op x, c1)) -> (op (op x, y), c1) breaks the canonical form of add by putting the base pointer in the second operand and the offset in the first. This fix is important for the R600 target, because for some address spaces the base pointer and the offset are stored in separate register classes. The old pattern caused the ISel code for matching addressing modes to put the base pointer and offset in the wrong register classes, which required no-trivial code transformations to fix. llvm-svn: 262148	2016-02-27 19:57:45 +00:00
Matt Arsenault	360d244d5b	DAGCombiner: Relax sqrt NaN folding check This is OK for +0 since compares to +/-0 give the same result. llvm-svn: 262125	2016-02-27 09:38:05 +00:00
Simon Pilgrim	c5199aae82	[DAGCombiner] Use getBitcast helper when possible. NFCI. llvm-svn: 261437	2016-02-20 15:05:29 +00:00
Ahmed Bougacha	93cff7fb82	[CodeGen] Document and use getConstant's splat-building feature. NFC. Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901	2016-02-15 18:07:29 +00:00
Pirama Arumuga Nainar	7476bc89e9	Don't combine fp_round (fp_round x) if f80 to f16 is generated Summary: This patch skips DAG combine of fp_round (fp_round x) if it results in an fp_round from f80 to f16. fp_round from f80 to f16 always generates an expensive (and as yet, unimplemented) libcall to __truncxfhf2. This prevents selection of native f16 conversion instructions from f32 or f64. Moreover, the first (value-preserving) fp_round from f80 to either f32 or f64 may become a NOP in platforms like x86. Reviewers: ab Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D17221 llvm-svn: 260769	2016-02-13 00:08:05 +00:00
Ahmed Bougacha	f8dfb47c02	[CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI. llvm-svn: 260316	2016-02-09 22:54:12 +00:00
Tim Shen	f99f0d5a7e	[SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior This patch consists of two parts: a performance fix in DAGCombiner.cpp and a correctness fix in SelectionDAG.cpp. The test case tests the bug that's uncovered by the performance fix, and fixed by the correctness fix. The performance fix keeps the containers required by the hasPredecessorHelper (which is a lazy DFS) and reuse them. Since hasPredecessorHelper is called in a loop, the overall efficiency reduced from O(n^2) to O(n), where n is the number of SDNodes. The correctness fix keeps iterating the neighbor list even if it's time to early return. It will return after finishing adding all neighbors to Worklist, so that no neighbors are discarded due to the original early return. llvm-svn: 259691	2016-02-03 20:58:55 +00:00
Balaram Makam	92431703d7	AArch64: Implement missed conditional compare sequences. Summary: This is an extension to the existing implementation of r242436 which restricts to only select inputs. This version fixes missed opportunities in pr26084 by attempting to lower conditional compare sequences of and/or trees with setcc leafs. This will additionaly handle the case when a tree with select input is not a conjunction-disjunction tree but some of the sub trees are conjunction-disjunction trees. Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry Differential Revision: http://reviews.llvm.org/D16291 llvm-svn: 259387	2016-02-01 19:13:07 +00:00
Matthias Braun	b30f2f5141	Avoid overly large SmallPtrSet/SmallSet These sets perform linear searching in small mode so it is never a good idea to use SmallSize/N bigger than 32. llvm-svn: 259283	2016-01-30 01:24:31 +00:00
Junmo Park	b3327b7007	[DAGCombiner] Don't add volatile or indexed stores to ChainedStores Summary: findBetterNeighborChains does not handle volatile or indexed stores. However, it did not check when adding stores to ChainedStores. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D16463 llvm-svn: 259024	2016-01-28 06:23:33 +00:00
Simon Pilgrim	b9b8fcd831	Tidied up TRUNC combine code. NFC. Make use of DAG.getBitcast and use clang-format to reduce number of lines (and make it more readable). llvm-svn: 258644	2016-01-23 21:50:40 +00:00
Dan Gohman	0bf3ae84ca	[SelectionDAG] Fold more offsets into GlobalAddresses This reapplies r258296 and r258366, and also fixes an existing bug in SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the offset in a GlobalAddressSDNode, which is uncovered by those patches. llvm-svn: 258482	2016-01-22 03:57:34 +00:00
Reid Kleckner	b7ecfa5b09	Revert "[SelectionDAG] Fold more offsets into GlobalAddresses" This reverts r258296 and the follow up r258366. With this change, we miscompiled the following program on Windows: #include <string> #include <iostream> static const char kData[] = "asdf jkl;"; int main() { std::string s(kData + 3, sizeof(kData) - 3); std::cout << s << '\n'; } llvm-svn: 258465	2016-01-22 01:09:29 +00:00
Dan Gohman	edf98c5682	[SelectionDAG] Fold more offsets into GlobalAddresses SelectionDAG previously missed opportunities to fold constants into GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to create `(add GA+c, y)`. This isn't often visible on targets such as X86 which effectively reassociate adds in their complex address-mode folding logic, however it is currently visible on WebAssembly since it currently has very simple address mode folding code that doesn't reassociate anything. This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses at the same times that it folds constants together, so that it doesn't miss any opportunities to perform such folding. Differential Revision: http://reviews.llvm.org/D16090 llvm-svn: 258296	2016-01-20 07:03:08 +00:00
Sanjay Patel	1dc7dfb9d9	[DAGCombiner] don't dereference an operand that doesn't exist (PR26070) The bug was introduced with changes for x86-64 fp128: http://reviews.llvm.org/rL254653 I don't know why an x86 change is here, so I'll follow up in: http://reviews.llvm.org/D15134 Should fix: https://llvm.org/bugs/show_bug.cgi?id=26070 llvm-svn: 257200	2016-01-08 19:53:24 +00:00
Tim Shen	9b68bd48ca	Test commit access - add a blank line in comment. llvm-svn: 257192	2016-01-08 19:20:23 +00:00
Dan Gohman	797f639e79	[SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets. In an inbounds getelementptr, when an index produces a constant non-negative offset to add to the base, the add can be assumed to not have unsigned overflow. This relies on the assumption that addresses can't occupy more than half the address space, which isn't possible in C because it wouldn't be possible to represent the difference between the start of the object and one-past-the-end in a ptrdiff_t. Setting the NoUnsignedWrap flag is theoretically useful in general, and is specifically useful to the WebAssembly backend, since it permits stronger constant offset folding. Differential Revision: http://reviews.llvm.org/D15544 llvm-svn: 256890	2016-01-06 00:43:06 +00:00
Eric Christopher	325e8d06dc	Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst, x)) combines for ppc_fp128, since signbit computation is more complicated. Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html Patch by Tim Shen! llvm-svn: 255305	2015-12-10 22:09:06 +00:00
Sanjay Patel	dc627ad63f	fix return values to match bool return type; NFC llvm-svn: 254968	2015-12-07 23:34:30 +00:00
Chih-Hung Hsieh	ed7d81e5d4	[X86] Part 1 to fix x86-64 fp128 calling convention. Almost all these changes are conditioned and only apply to the new x86-64 f128 type configuration, which will be enabled in a follow up patch. They are required together to make new f128 work. If there is any error, we should fix or revert them as a whole. These changes should have no impact to current configurations. * Relax type legalization checks to accept new f128 type configuration, whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has TLI.isTypeLegal true. * Relax GetSoftenedFloat to return in some cases f128 type SDValue, which is TLI.isTypeLegal but not "softened" to i128 node. * Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration, to generate optimized bitwise operators for libm functions. * Enhance related Lower* functions to handle f128 type. * Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions to keep new f128 type in register, and convert f128 operators to library calls. * Fix Combiner, Emitter, Legalizer routines that did not handle f128 type. * Add ExpandConstant to handle i128 constants, ExpandNode to handle ISD::Constant node. * Add one more parameter to getCommonSubClass and firstCommonClass, to guarantee that returned common sub class will contain the specified simple value type. This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp. * Fix infinite loop in getTypeLegalizationCost when f128 is the value type. * Fix printOperand to handle null operand. * Enhance ISD::BITCAST node to handle f128 constant. * Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes. * Enhance X86AsmPrinter to emit f128 values in comments. Differential Revision: http://reviews.llvm.org/D15134 llvm-svn: 254653	2015-12-03 22:02:40 +00:00
Craig Topper	6066164454	Use a lambda instead of std::bind and std::mem_fn I introduced in r254242. NFC llvm-svn: 254260	2015-11-29 18:05:22 +00:00
Craig Topper	d0573179dc	[SelectionDAG] Use std::any_of instead of a manually coded loop. NFC llvm-svn: 254242	2015-11-29 04:37:11 +00:00
Artyom Skrobov	314ee04268	Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC) Summary: Many target lowerings copy-paste the code to test SDValues for known constants. This code can instead be shared in SelectionDAG.cpp, and reused in the targets. Reviewers: MatzeB, andreadb, tstellarAMD Subscribers: arsenm, jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D14945 llvm-svn: 254085	2015-11-25 19:41:11 +00:00
Simon Pilgrim	1dfe53e180	Remove duplicate getValueType() calls. NFCI. llvm-svn: 253823	2015-11-22 16:49:38 +00:00
Jonas Paulsson	8f0d2b7f1f	[DAGCombiner] Bugfix for lost chain depenedency. When MergeConsecutiveStores() combines two loads and two stores into wider loads and stores, the chain users of both of the original loads must be transfered to the new load, because it may be that a chain user only depends on one of the loads. New test case: test/CodeGen/SystemZ/dag-combine-01.ll Reviewed by James Y Knight. Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=25310#c6 llvm-svn: 253779	2015-11-21 13:25:07 +00:00
Hans Wennborg	dcc2500452	X86: More efficient legalization of wide integer compares In particular, this makes the code for 64-bit compares on 32-bit targets much more efficient. Example: define i32 @test_slt(i64 %a, i64 %b) { entry: %cmp = icmp slt i64 %a, %b br i1 %cmp, label %bb1, label %bb2 bb1: ret i32 1 bb2: ret i32 2 } Before this patch: test_slt: movl 4(%esp), %eax movl 8(%esp), %ecx cmpl 12(%esp), %eax setae %al cmpl 16(%esp), %ecx setge %cl je .LBB2_2 movb %cl, %al .LBB2_2: testb %al, %al jne .LBB2_4 movl $1, %eax retl .LBB2_4: movl $2, %eax retl After this patch: test_slt: movl 4(%esp), %eax movl 8(%esp), %ecx cmpl 12(%esp), %eax sbbl 16(%esp), %ecx jge .LBB1_2 movl $1, %eax retl .LBB1_2: movl $2, %eax retl Differential Revision: http://reviews.llvm.org/D14496 llvm-svn: 253572	2015-11-19 16:35:08 +00:00
Geoff Berry	2ddfc5e60f	[DAGCombiner] Improve zextload optimization. Summary: Don't fold (zext (and (load x), cst)) -> (and (zextload x), (zext cst)) if (and (load x) cst) will match as a zextload already and has additional users. For example, the following IR: %load = load i32, i32* %ptr, align 8 %load16 = and i32 %load, 65535 %load64 = zext i32 %load16 to i64 store i32 %load16, i32* %dst1, align 4 store i64 %load64, i64* %dst2, align 8 used to produce the following aarch64 code: ldr w8, [x0] and w9, w8, #0xffff and x8, x8, #0xffff str w9, [x1] str x8, [x2] but with this change produces the following aarch64 code: ldrh w8, [x0] str w8, [x1] str x8, [x2] Reviewers: resistor, mcrosier Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14340 llvm-svn: 252789	2015-11-11 19:42:52 +00:00
Matt Arsenault	d8fed1b793	Add target preference for GatherAllAliases max depth llvm-svn: 252775	2015-11-11 18:44:33 +00:00
Sanjay Patel	533c10c651	add a SelectionDAG method to check if no common bits are set in two nodes; NFCI This was suggested in: http://reviews.llvm.org/D13956 and is a follow-on to: http://reviews.llvm.org/rL252515 http://reviews.llvm.org/rL252519 This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG. A corresponding function for IR instructions already exists in ValueTracking. llvm-svn: 252539	2015-11-09 23:31:38 +00:00
Tom Stellard	05691a678e	DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349	2015-11-06 21:58:37 +00:00
James Y Knight	646c4032e7	Fix two issues in MergeConsecutiveStores: 1) PR25154. This is basically a repeat of PR18102, which was fixed in r200201, and broken again by r234430. The latter changed which of the store nodes was merged into from the first to the last. Thus, we now also need to prefer merging a later store at a given address into the target node, instead of an earlier one. 2) While investigating that, I also realized I'd introduced a bug in r236850. There, I removed a check for alignment -- not realizing that nothing except the alignment check was ensuring that none of the stores were overlapping! This is a really bogus way to ensure there's no aliased stores. A better solution to both of these issues is likely to always use the code added in the 'if (UseAA)' branches which rearrange the chain based on a more principled analysis. I'll look into whether that can be used always, but in the interest of getting things back to working, I think a minimal change makes sense. llvm-svn: 251816	2015-11-02 18:48:08 +00:00
Sanjay Patel	bbd4c79c8f	Use the 'arcp' fast-math-flag when combining repeated FP divisors This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. This was originally part of D8900. Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and possibly other changes. Differential Revision: http://reviews.llvm.org/D9708 llvm-svn: 251450	2015-10-27 20:27:25 +00:00
Steve King	fee370be72	Fix llc crash processing S/UREM for -Oz builds caused by rL250825. When taking the remainder of a value divided by a constant, visitREM() attempts to convert the REM to a longer but faster sequence of instructions. This conversion calls combine() on a speculative DIV instruction. Commit rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes. Flow eventually hits unreachable(). This patch adds a test case and a check to prevent visitREM() from trying to convert the REM instruction in cases where a DIVREM is possible. See http://reviews.llvm.org/D14035 llvm-svn: 251373	2015-10-27 00:14:06 +00:00
Simon Pilgrim	7430804fe1	[DAGCombiner] Generalize masking of constant rotates. We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded. Followup to D13851. llvm-svn: 251197	2015-10-24 18:44:52 +00:00
Simon Pilgrim	d5ef318b5b	[X86][XOP] Add support for lowering vector rotations This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions. This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future. Differential Revision: http://reviews.llvm.org/D13851 llvm-svn: 251188	2015-10-24 13:17:26 +00:00
Zia Ansari	8f509a7044	[X86] - Catch extra combine opportunities for redundant imuls. When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple users which would result in an extra add instruction. In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add. I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works). Differential Revision: http://reviews.llvm.org/D13740 llvm-svn: 251028	2015-10-22 16:14:45 +00:00
Artyom Skrobov	b844fa7fc0	Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into DAGCombiner. Summary: In addition to moving the code over, this patch amends the DIV,REM -> DIVREM combining to run on all affected nodes at once: if the nodes are converted to DIVREM one at a time, then the resulting DIVREM may get legalized by the backend into something target-specific that we won't be able to recognize and correlate with the remaining nodes. The motivation is to "prepare terrain" for D13862: when we set DIV and REM to be legalized to libcalls, instead of the DIVREM, we otherwise lose the ability to combine them together. To prevent this, we need to take the DIV,REM -> DIVREM combining out of the lowering stage. Reviewers: RKSimon, eli.friedman, rengolin Subscribers: john.brawn, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13733 llvm-svn: 250825	2015-10-20 13:06:02 +00:00
Artyom Skrobov	4bca0bb010	A doccomment for CombineTo, and some NFC refactorings Summary: Caching SDLoc(N), instead of recreating it in every single function call, keeps the code denser, and allows to unwrap long lines. Reviewers: sunfish, atrick, sdmitrouk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13726 llvm-svn: 250305	2015-10-14 17:18:35 +00:00
Artyom Skrobov	a5b9ad22b3	Merge DAGCombiner::visitSREM and DAGCombiner::visitUREM (NFC) Summary: The two implementations had more code in common than not. Reviewers: sunfish, MatzeB, sdmitrouk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13724 llvm-svn: 250302	2015-10-14 16:54:14 +00:00
Matt Arsenault	e5d9515fb7	DAGCombiner: Don't stop finding better chain on 2 aliases The comment says this was stopped because it was unlikely to be profitable. This is not true if you want to combine vector loads with multiple components. For a simple case that looks like t0 = load t0 ... t1 = load t0 ... t2 = load t0 ... t3 = load t0 ... t4 = store t0:1, t0:1 t5 = store t4, t1:0 t6 = store t5, t2:0 t7 = store t6, t3:0 We want to get all of these stores onto a chain that is a TokenFactor of these N loads. This mostly solves the AMDGPU merge-stores.ll regressions with -combiner-alias-analysis for merging vector stores of vector loads. llvm-svn: 250138	2015-10-13 00:49:00 +00:00
Matt Arsenault	61dc235f20	DAGCombiner: Combine extract_vector_elt from build_vector This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it. llvm-svn: 250129	2015-10-12 23:59:50 +00:00
Simon Pilgrim	c8832fc233	[SelectionDAG] Add common vector constant folding helper function We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical). This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases. After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode(). Differential Revision: http://reviews.llvm.org/D13665 llvm-svn: 250118	2015-10-12 23:00:11 +00:00
Simon Pilgrim	d45c88bbb5	[DAGCombiner] Improved FMA combine support for vectors Enabled constant canonicalization for all constants. Improved combining of constant vectors. llvm-svn: 249993	2015-10-11 19:48:12 +00:00
Simon Pilgrim	5eac2607b9	[DAGCombiner] Tidyup FMINNUM/FMAXNUM constant folding Enable constant folding for vector splats as well as scalars. Enable constant canonicalization for all scalar and vector constants. llvm-svn: 249978	2015-10-11 16:02:28 +00:00
Simon Pilgrim	dde63374c5	[DAGCombiner] Generalize FADD constant combines to work with vectors Updated the FADD combines to work with vectors as well as scalars. Differential Revision: http://reviews.llvm.org/D13416 llvm-svn: 249251	2015-10-03 22:06:06 +00:00
Simon Pilgrim	a38d76a087	[DAGCombiner] Merge SIGN_EXTEND_INREG vector constant folding methods. NCI. visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them. This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation. No codegen intended and all existing tests remain the same. llvm-svn: 249236	2015-10-03 16:26:52 +00:00
Hal Finkel	bd582581b8	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only as users of a store's chain operand. However, user iteration is not result-number specific, we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698	2015-09-28 08:02:14 +00:00
Matt Arsenault	3c07e963b8	DAGCombiner: Check if store is volatile first This is the simpler check. NFC. llvm-svn: 248625	2015-09-25 22:06:19 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Simon Pilgrim	e8e5a17a12	[DAGCombiner] Tidy up FMA combine helpers. NFCI. Based on feedback for D13003. llvm-svn: 248206	2015-09-21 20:15:03 +00:00
Matt Arsenault	8fb9b94f7f	Fix accidentally committed debug printing llvm-svn: 248190	2015-09-21 18:21:10 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Matt Arsenault	a30ddb6524	Factor replacement of stores of FP constants into new function llvm-svn: 248168	2015-09-21 15:59:43 +00:00
Craig Topper	0013be16ff	Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC llvm-svn: 248140	2015-09-21 05:32:41 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Silviu Baranga	df9ce8408a	[DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors Summary: The BUILD_VECTOR node will truncate its operators to match the type. We need to take this into account when constant folding - we need to perform a truncation before constant folding the elements. This is because the upper bits can change the result, depending on the operation type (for example this is the case for min/max). This change also adds a regression test. Reviewers: jmolloy Subscribers: jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D12697 llvm-svn: 247265	2015-09-10 10:34:34 +00:00
Sanjay Patel	ce74db9d8d	check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771	2015-09-03 15:03:19 +00:00
Hal Finkel	1baec5323b	[DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and\|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and\|or node are actual setcc nodes, then this is not an issue (because the and\|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 246507	2015-08-31 23:15:04 +00:00
Sanjay Patel	719b3e6a3e	don't set a legal vector type if we know we can't use that type (NFCI) Added benefit: the 'if' logic now matches the text of the comment that describes it. llvm-svn: 246506	2015-08-31 22:59:03 +00:00
Sanjay Patel	218cbd5a48	generalize helper function of MergeConsecutiveStores to handle vector types (NFCI) This was part of D7208 (r227242), but that commit was reverted because it exposed a bug in AArch64 lowering. I should have that fixed and the rest of the commit reinstated soon. llvm-svn: 246493	2015-08-31 21:50:16 +00:00
Hal Finkel	2483f2060a	[DAGCombine] Use getSetCCResultType utility function DAGCombine has a utility wrapper around TLI's getSetCCResultType; use it in the one place in DAGCombine still directly calling the TLI function. NFC. llvm-svn: 246482	2015-08-31 20:42:38 +00:00
Hal Finkel	a894266d28	[DAGCombine] Remove some old dead code for forming SETCC nodes This code was dead when it was committed in r23665 (Oct 7, 2005), and before it reaches its 10th anniversary, it really should go. We can always bring it back if we'd like, but it forms more SETCC nodes, and the way we do legality checking on SETCC nodes is wrong in a number of places, and removing this means fewer places to fix. NFC. llvm-svn: 246466	2015-08-31 18:38:55 +00:00
Matt Arsenault	d9c830154f	Make MergeConsecutiveStores look at other stores on same chain When combiner AA is enabled, look at stores on the same chain. Non-aliasing stores are moved to the same chain so the existing code fails because it expects to find an adajcent store on a consecutive chain. Because of how DAGCombiner tries these store combines, MergeConsecutiveStores doesn't see the correct set of stores on the chain when it visits the other stores. Each store individually has its chain fixed before trying to merge consecutive stores, and then tries to merge stores from that point before the other stores have been processed to have their chains fixed. To fix this, attempt to use FindBetterChain on any possibly neighboring stores in visitSTORE. Suppose you have 4 32-bit stores that should be merged into 1 vector store. One store would be visited first, fixing the chain. What happens is because not all of the store chains have yet been fixed, 2 of the stores are merged. The other 2 stores later have their chains fixed, but because the other stores were already merged, they have different memory types and merging the two different sized stores is not supported and would be more difficult to handle. llvm-svn: 246307	2015-08-28 17:31:28 +00:00
Ahmed Bougacha	87166905c8	[CodeGen] Check FoldConstantArithmetic result before using it. Fixes PR24602: r245689 introduced an unguarded use of SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails because of opaque (hoisted) constants. llvm-svn: 246217	2015-08-27 21:46:04 +00:00
Steve King	5cdbd20cc3	Pass function attributes instead of boolean in isIntDivCheap(). llvm-svn: 245921	2015-08-25 02:31:21 +00:00
Oliver Stannard	284f2bffc9	Add DAG optimisation for FP16_TO_FP The FP16_TO_FP node only uses the bottom 16 bits of its input, so the following pattern can be optimised by removing the AND: (FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op) This is a common pattern for ARM targets when functions have __fp16 arguments, as they are passed as floats (so that they get passed in the correct registers), but then bitcast and truncated to ignore the top 16 bits. llvm-svn: 245832	2015-08-24 09:47:45 +00:00
Simon Pilgrim	2a7049abe0	[DAGCombiner] Fold CONCAT_VECTORS of bitcasted EXTRACT_SUBVECTOR Minor generalization of D12125 - peek through any bitcast to the original vector that we're extracting from. llvm-svn: 245814	2015-08-23 15:22:14 +00:00
Mehdi Amini	5aa7bd7d62	Do not use dyn_cast<> after isa<> Reported by coverity. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 245799	2015-08-23 00:27:57 +00:00
John Brawn	eab960c46f	[DAGCombiner] Fold together mul and shl when both are by a constant This is intended to improve code generation for GEPs, as the index value is shifted by the element size and in GEPs of multi-dimensional arrays the index of higher dimensions is multiplied by the lower dimension size. Differential Revision: http://reviews.llvm.org/D12197 llvm-svn: 245689	2015-08-21 10:48:17 +00:00
Simon Pilgrim	35f528262f	[DAGCombiner] Added SMAX/SMIN/UMAX/UMIN constant folding We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding Differential Revision: http://reviews.llvm.org/D12118 llvm-svn: 245503	2015-08-19 21:11:58 +00:00
Simon Pilgrim	989cbbd2f5	[DAGCombiner] Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE. Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle. Differential Revision: http://reviews.llvm.org/D12125 llvm-svn: 245490	2015-08-19 20:09:50 +00:00
Michael Kuperstein	dcdab4cd3a	[TLI] Refactor "is integer division cheap" queries. This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree target set it to false), but the interface allows querying based on the type optimization level. NFC. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245430	2015-08-19 11:17:59 +00:00
Steve King	d4c8f70ce1	Fix backward operands in call to isTruncateFree() and improve comments. llvm-svn: 245385	2015-08-18 23:02:41 +00:00
Matthias Braun	fa3b248a66	DAGCombiner: Improve DAGCombiner select normalization The current code normalizes select(C0, x, select(C1, x, y)) towards select(C0\|C1, x, y) if the targets prefers that form. This patch adds an additional rule that if the select(C1, x, y) part already exists in the function then we want to normalize into the other direction because the effects of reusing the existing value are bigger than transforming into the target preferred form. This addresses regressions following r238793, see also: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html Differential Revision: http://reviews.llvm.org/D11616 llvm-svn: 245350	2015-08-18 20:48:36 +00:00
Matthias Braun	2e920bd04f	DAGCombiner: Optimize SELECTs first before turning them into SELECT_CC This is part of http://reviews.llvm.org/D11616 - I just decided to split this up into a separate commit. llvm-svn: 245349	2015-08-18 20:48:29 +00:00
Sanjay Patel	3ab4a73bac	use SDValue bool operator; NFCI llvm-svn: 245181	2015-08-16 17:54:28 +00:00
Simon Pilgrim	0750c84623	[DAGCombiner] Attempt to mask vectors before zero extension instead of after. For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors. (zext (truncate x)) -> (zext (and(x, m)) Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code. This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles. Differential Revision: http://reviews.llvm.org/D11764 llvm-svn: 245160	2015-08-15 13:27:30 +00:00
Alex Lorenz	e40c8a2b26	PseudoSourceValue: Replace global manager with a manager in a machine function. This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka llvm-svn: 244693	2015-08-11 23:09:45 +00:00
Jingyue Wu	99eb4685ef	SelectionDAG: Prefer to combine multiplication with less uses for fma Summary: For example: s6 = s0s5; s2 = s6s6 + s6; ... s4 = s6*s3; We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)). This only happens when Aggressive is true, otherwise hasOneUse() check already prevents from folding the multiplication with more uses. Test Plan: test/CodeGen/NVPTX/fma-assoc.ll Patch by Xuetian Weng Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm Subscribers: arsenm, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11855 llvm-svn: 244649	2015-08-11 19:21:46 +00:00
Sanjay Patel	924879ad2c	wrap OptSize and MinSize attributes for easier and consistent access (NFCI) Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os). Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests. This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call. Differential Revision: http://reviews.llvm.org/D11734 llvm-svn: 243994	2015-08-04 15:49:57 +00:00
Simon Pilgrim	f328fd4441	Remove trailing whitespace. NFCI. llvm-svn: 243838	2015-08-01 17:06:47 +00:00
Simon Pilgrim	b447dc5aaa	Use SDValue bool check. NFCI. llvm-svn: 243837	2015-08-01 17:05:50 +00:00
Simon Pilgrim	503a2594c3	[DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks. This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading. There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.) Differential Revision: http://reviews.llvm.org/D11518 llvm-svn: 243831	2015-08-01 10:01:46 +00:00
Sanjay Patel	0f9dcf8b90	move DAGCombiner's allowableAlignment() helper function into the TLI Making allowableAlignment() more accessible was suggested as a predecessor patch for D10662, so I've pulled it into TargetLowering. This let's us remove 4 instances of duplicate logic in LegalizeDAG. There's a subtle functional change in the implementation: the existing allowableAlignment() code was using getPrefTypeAlignment() when checking alignment with the DataLayout and assumed that was fast. In this implementation, we use getABITypeAlignment() and assume that is fast. See the TODO comment or the discussion in the Phab review for future improvements in this implementation (don't use the data layout at all). There are no regression test changes from this difference, and I'm not sure how to expose it via a test. I think we actually do want to provide the 'Fast' param when checking this from DAGCombiner::MergeConsecutiveStores(). Ie, we shouldn't merge stores if the new stores are not going to be fast. But that change will require fixing allowsMisalignedMemoryAccess() overrides as noted in D10662. Differential Revision: http://reviews.llvm.org/D10905 llvm-svn: 243549	2015-07-29 18:24:18 +00:00
Sanjay Patel	133e68b45c	ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141) PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141 contains a test case where we have duplicate entries in a node's uses() list. After r241826, we use CombineTo() to delete dead nodes when combining the uses into reciprocal multiplies, but this fails if we encounter the just-deleted node again in the list. The solution in this patch is to not add duplicate entries to the list of users that we will subsequently iterate over. For the test case, this avoids triggering the combine divisors logic entirely because there really is only one user of the divisor. Differential Revision: http://reviews.llvm.org/D11345 llvm-svn: 243500	2015-07-28 23:28:22 +00:00
Sanjay Patel	1dd15598cf	fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold This fix was suggested as part of D11345 and is part of fixing PR24141. With this change, we can avoid walking the uses of a divisor node if the target doesn't want the combineRepeatedFPDivisors transform in the first place. There is no NFC-intended other than that. Differential Revision: http://reviews.llvm.org/D11531 llvm-svn: 243498	2015-07-28 23:05:48 +00:00
Sanjay Patel	c1c2b87001	move combineRepeatedFPDivisors logic into a helper function; NFCI llvm-svn: 243293	2015-07-27 17:58:49 +00:00
Simon Pilgrim	4ef0576c40	[DAGCombiner] Fixed minor typo that was missed in D9097. We don't bitcast the UNDEFs - that is done in visitVECTOR_SHUFFLE, and the getValueType should come from the operand's SDValue not the SDNode. llvm-svn: 242640	2015-07-19 11:31:40 +00:00
Simon Pilgrim	3aca32ea4a	Use SDValue bool check. NFCI. llvm-svn: 242636	2015-07-19 09:56:36 +00:00
Matt Arsenault	cabe02e141	Only do fmul (fadd x, x), c combine if the fadd only has one use This was increasing the instruction count if the fadd has multiple uses. llvm-svn: 242498	2015-07-17 01:14:35 +00:00
Pete Cooper	7e64ef06e6	Use more foreach loops in SelectionDAG. NFC llvm-svn: 242249	2015-07-14 23:43:29 +00:00
Matt Arsenault	f54dc2384d	DAGCombiner: Assume invariant load cannot alias a store The motivation is to allow GatherAllAliases / FindBetterChain to not give up on dependent loads of a pointer from constant memory. This is important for AMDGPU, because most loads are pointers derived from a load of a kernel argument from constant memory. llvm-svn: 241948	2015-07-10 22:17:40 +00:00
Sanjay Patel	e2361d4a18	fix an invisible bug when combining repeated FP divisors This patch fixes bugs that were exposed by the addition of fast-math-flags in the DAG: r237046 ( http://reviews.llvm.org/rL237046 ): 1. When replacing a division node, it's not enough to RAUW. We should call CombineTo() to delete dead nodes and combine again. 2. Because we are changing the DAG, we can't return an empty SDValue after the transform. As the code comments say: Visitation implementation - Implement dag node combining for different node types. The semantics are as follows: Return Value: SDValue.getNode() == 0 - No change was made SDValue.getNode() == N - N was replaced, is dead and has been handled. otherwise - N should be replaced by the returned Operand. The new test case shows no difference with or without this patch, but it will crash if we re-apply r237046 or enable FMF via the current -enable-fmf-dag cl::opt. Differential Revision: http://reviews.llvm.org/D9893 llvm-svn: 241826	2015-07-09 17:28:37 +00:00
Mehdi Amini	eaabc51e78	Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user A documentation for this function would be nice by the way. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241807	2015-07-09 15:12:23 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	9639d650bb	Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776	2015-07-09 02:09:20 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Sanjay Patel	c1afa95a51	early exits -> less indenting; NFCI llvm-svn: 241716	2015-07-08 19:32:39 +00:00
Mehdi Amini	ffc1402fad	Remove IsLittleEndian from TargetLowering and redirect to DataLayout Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11017 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241655	2015-07-08 01:00:38 +00:00
Mehdi Amini	8ac7a9d57a	Redirect DataLayout from TargetMachine to Module in SelectionDAG Summary: SelectionDAG itself is not invoking directly the DataLayout in the TargetMachine, but the "TargetLowering" class is still using it. I'll address it in a following commit. This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11000 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241618	2015-07-07 19:07:19 +00:00
Pawel Bylica	c52eabb285	Reapply r240291: Fix shl folding in DAG combiner. The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared. It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204. llvm-svn: 241254	2015-07-02 11:44:54 +00:00
Pawel Bylica	143ceb6d46	[DAGCombiner] Fix & simplify constant folding of sext/zext. Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$ Test Plan: New regression test included. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: RKSimon, llvm-commits Differential Revision: http://reviews.llvm.org/D10607 llvm-svn: 240991	2015-06-29 20:28:47 +00:00
Benjamin Kramer	5b455f0b62	[SDAG] Now that we have a way to communicate the exact bit on sdiv use it to simplify sdiv by a constant. We had a hack in SDAGBuilder in place to work around this but now we can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can perform this trick automatically. The added check in DAGCombiner is necessary to prevent exact sdiv by pow2 from regressing as the target-specific pow2 lowering is not aware of exact bits yet. This is mostly covered by existing tests. One side effect is that we get the better lowering for exact vector sdivs now too :) llvm-svn: 240891	2015-06-27 20:33:26 +00:00
Pete Cooper	8c0a710995	Convert a bunch of loops to foreach. NFC. This uses the new SDNode::op_values() iterator range committed in r240805. llvm-svn: 240809	2015-06-26 18:41:54 +00:00
Benjamin Kramer	07e70b4fa4	[DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1) Instcombine also does this but many opportunities only become visible after GEPs are lowered. llvm-svn: 240787	2015-06-26 14:51:36 +00:00
Matt Arsenault	f735cab986	DAGCombiner: Use pop_back_val() llvm-svn: 240709	2015-06-25 22:15:05 +00:00
Matt Arsenault	c244dcb804	DAGCombiner: Remove redundant check MemIntrinsicSDNode is already a subclass of MemSDNode, so the MemSDNode check is sufficient. llvm-svn: 240672	2015-06-25 18:47:02 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Pawel Bylica	e6fd8c4232	Revert r240291: causes problems in self-hosted builds. llvm-svn: 240343	2015-06-22 21:54:07 +00:00
Pawel Bylica	06407c0320	Fix shl folding in DAG combiner. Summary: The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared. Test Plan: A regression test included. Reviewers: andreadb Reviewed By: andreadb Subscribers: andreadb, test, llvm-commits Differential Revision: http://reviews.llvm.org/D10602 llvm-svn: 240291	2015-06-22 15:58:11 +00:00
Chandler Carruth	c3f49eb451	[PM/AA] Hoist the AliasResult enum out of the AliasAnalysis class. This will allow classes to implement the AA interface without deriving from the class or referencing an internal enum of some other class as their return types. Also, to a pretty fundamental extent, concepts such as 'NoAlias', 'MayAlias', and 'MustAlias' are first class concepts in LLVM and we aren't saving anything by scoping them heavily. My mild preference would have been to use a scoped enum, but that feature is essentially completely broken AFAICT. I'm extremely disappointed. For example, we cannot through any reasonable[1] means construct an enum class (or analog) which has scoped names but converts to a boolean in order to test for the possibility of aliasing. [1]: Richard Smith came up with a "solution", but it requires class templates, and lots of boilerplate setting up the enumeration multiple times. Something like Boost.PP could potentially bundle this up, but even that would be quite painful and it doesn't seem realistically worth it. The enum class solution would probably work without the need for a bool conversion. Differential Revision: http://reviews.llvm.org/D10495 llvm-svn: 240255	2015-06-22 02:16:51 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Chandler Carruth	ac80dc7532	[PM/AA] Remove the Location typedef from the AliasAnalysis class now that it is its own entity in the form of MemoryLocation, and update all the callers. This is an entirely mechanical change. References to "Location" within AA subclases become "MemoryLocation", and elsewhere "AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps out-of-tree folks update. llvm-svn: 239885	2015-06-17 07:18:54 +00:00
Sanjay Patel	0fcc53f6d6	rename variables; NFC ...because I see 'StoreBW' and read it as 'store bandwidth' llvm-svn: 239850	2015-06-16 20:47:19 +00:00
Sanjay Patel	bb385ed454	extract some code into a helper function for MergeConsecutiveStores(); NFCI llvm-svn: 239847	2015-06-16 20:05:00 +00:00
Sanjay Patel	f134048b1d	propagate IR-level fast-math-flags to DAG nodes, disabled by default This is an updated version of the patch that was checked in at: http://reviews.llvm.org/rL237046 but subsequently reverted because it exposed a bug in the DAG Combiner: http://reviews.llvm.org/D9893 This time, there's an enablement flag ("EnableFMFInDAG") around the code in SelectionDAGBuilder where we copy the set of FP optimization flags from IR instructions to DAG nodes. So, in theory, there should be no functional change from this patch as-is, but it will allow testing with the added functionality to proceed via "-enable-fmf-dag" passed to llc. This patch adds the minimum plumbing necessary to use IR-level fast-math-flags (FMF) in the backend without actually using them for anything yet. This is a follow-on to: http://reviews.llvm.org/rL235997 Differential Revision: http://reviews.llvm.org/D10403 llvm-svn: 239828	2015-06-16 16:25:43 +00:00
Matt Arsenault	ed891b5561	Revert "Revert "Fix merges of non-zero vector stores"" Reapply r239539. Don't assume the collected number of stores is the same vector size. Just take the first N stores to fill the vector. llvm-svn: 239825	2015-06-16 15:51:48 +00:00
Simon Pilgrim	d3f6427446	[DAGCombiner] Added BSWAP(BSWAP(x)) -> x combine pattern. llvm-svn: 239682	2015-06-13 16:25:12 +00:00
Simon Pilgrim	011381d48b	[DAGCombiner] Added BSWAP vector constant folding support. llvm-svn: 239675	2015-06-13 14:08:15 +00:00
Simon Pilgrim	096cccd01a	Stripped trailing whitespace. NFC. llvm-svn: 239674	2015-06-13 12:57:36 +00:00
Reid Kleckner	2691c59e97	Revert "Fix merges of non-zero vector stores" This reverts commit r239539. It was causing SDAG assertions while building freetype. llvm-svn: 239543	2015-06-11 17:25:24 +00:00
Matt Arsenault	e23a063dc3	Fix merges of non-zero vector stores Now actually stores the non-zero constant instead of 0. I somehow forgot to include this part of r238108. The test change was just an independent instruction order swap, so just add another check line to satisfy CHECK-NEXT. llvm-svn: 239539	2015-06-11 16:03:52 +00:00
Simon Pilgrim	4791f6d89b	[DAGCombiner] Added CTLZ vector constant folding support. llvm-svn: 239305	2015-06-08 16:19:00 +00:00
Simon Pilgrim	c789e1d57b	[DAGCombiner] Added CTTZ vector constant folding support. llvm-svn: 239293	2015-06-08 09:57:09 +00:00
Simon Pilgrim	68cd237f57	[DAGCombiner] Added CTPOP vector constant folding support. Added tests to the existing SSE/AVX test files. llvm-svn: 239252	2015-06-07 15:37:14 +00:00
Fiona Glaser	666e352440	DAGCombiner: don't duplicate (fmul x, c) in visitFNEG if fneg is free For targets with a free fneg, this fold is always a net loss if it ends up duplicating the multiply, so definitely avoid it. This might be true for some targets without a free fneg too, but I'll leave that for future investigation. llvm-svn: 239167	2015-06-05 17:52:34 +00:00
Andrea Di Biagio	eb33134ce7	Simplify code; NFC. Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using update_llc_test_checks.py. llvm-svn: 239142	2015-06-05 10:29:55 +00:00
Andrea Di Biagio	9ac8a6b13d	[DAGCombiner] Fix wrong folding of a build_vector into a blend with zero. Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a build_vector of a bunch of extract_vector_elt nodes and constant zero nodes into a shuffle blend with a zero vector. However, method 'visitBUILD_VECTOR' forgot that a floating point build_vector may contain negative zero as well as positive zero. Example: define <2 x double> @example(<2 x double> %A) { entry: %0 = extractelement <2 x double> %A, i32 0 %1 = insertelement <2 x double> undef, double %0, i32 0 %2 = insertelement <2 x double> %1, double -0.0, i32 1 ret <2 x double> %2 } Before this patch, llc (with -mattr=+sse4.1) wrongly generated movq %xmm0, %xmm0 # xmm0 = xmm0[0],zero So, the sign bit of the negative zero was effectively lost. This patch fixes the problem by adding explicit checks for positive zero. With this patch, llc produces the following code for the example above: movhpd .LCPI0_0(%rip), %xmm0 where .LCPI0_0 referes to a 'double -0'. llvm-svn: 239070	2015-06-04 19:15:01 +00:00
Matt Arsenault	ca519dc28b	Pass address space to isLegalAddressingMode in DAGCombiner No test because I don't know of a target that makes use of address spaces and indexed load / store. llvm-svn: 239051	2015-06-04 16:17:34 +00:00
Matt Arsenault	65ad1602b0	Add target hook to allow merging stores of nonzero constants On GPU targets, materializing constants is cheap and stores are expensive, so only doing this for zero vectors was silly. Most of the new testcases aren't optimally merged, and are for later improvements. llvm-svn: 238108	2015-05-24 00:51:27 +00:00
Simon Pilgrim	e054199354	[X86][SSE] Improve support for 128-bit vector sign extension This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization). It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner. Differential Revision: http://reviews.llvm.org/D9848 llvm-svn: 237885	2015-05-21 10:05:03 +00:00
Matthias Braun	56a781495a	DAGCombiner: Continue combining if FoldConstantArithmetic() fails. DAG.FoldConstantArithmetic() can fail even though both operands are Constants if OpaqueConstants are involved. Continue trying other combine possibilities in tis case. Differential Revision: http://reviews.llvm.org/D6946 Somewhat related to PR21801 / rdar://19211454 llvm-svn: 237822	2015-05-20 18:54:02 +00:00
Sanjay Patel	03abbb48a4	use 'auto *' for pointers; clearer usage, no deep copying llvm-svn: 237719	2015-05-19 20:10:16 +00:00
Sanjay Patel	ad11415962	tidy up 1. remove duplicate local variable 2. add local variable with name to match comment 3. remove useless comment llvm-svn: 237715	2015-05-19 19:10:57 +00:00
Sanjay Patel	64a6da947a	use range-based for-loop llvm-svn: 237711	2015-05-19 18:24:33 +00:00

... 3 4 5 6 7 ...

1821 Commits