teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-28 15:58:57 -04:00

Author	SHA1	Message	Date
Michael Zolotukhin	a425c9d0e3	[Unroll] Don't analyze blocks outside the loop. llvm-svn: 243466	2015-07-28 19:21:21 +00:00
Michael Zolotukhin	57776b8159	Handle resolvable branches in complete loop unroll heuristic. Summary: Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate. This patch is intended to be applied after D10205 (though it could be applied independently). Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10206 llvm-svn: 243084	2015-07-24 01:53:04 +00:00
Michael Zolotukhin	31b3eaaf28	[LoopUnrolling] Handle cast instructions. During estimation of unrolling effect we should be able to propagate constants through casts. Differential Revision: http://reviews.llvm.org/D10207 llvm-svn: 242257	2015-07-15 00:19:51 +00:00
Mark Heffernan	d7ebc24112	Enable runtime unrolling with unroll pragma metadata Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N") and a runtime trip count. Also, do not unroll loops with unroll full metadata if the loop has a runtime loop count. Previously, such loops would be unrolled with a very large threshold (pragma-unroll-threshold) if runtime unrolling happened to be enabled, resulting in a very large (and likely unwise) unroll factor. llvm-svn: 242047	2015-07-13 18:26:27 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Michael Zolotukhin	c4e4f33e29	Update stale comment before analyzeLoopUnrollCost. NFC. llvm-svn: 239565	2015-06-11 22:17:39 +00:00
Michael Zolotukhin	a60bdb5639	Remove SCEVCache and FindConstantPointers from complete loop unrolling heuristic. Summary: Using some SCEV functionality helped to entirely remove SCEVCache class and FindConstantPointers SCEV visitor. Also, this makes the code more universal - I'll take advantage of it in next patches where I start handling additional types of instructions. Test Plan: Tests would be submitted in subsequent patches. Reviewers: atrick, chandlerc Reviewed By: atrick, chandlerc Subscribers: atrick, llvm-commits Differential Revision: http://reviews.llvm.org/D10205 llvm-svn: 239282	2015-06-08 03:28:06 +00:00
Sanjoy Das	ad714b1af3	[LoopUnroll] Fix truncation bug in canUnrollCompletely. Summary: canUnrollCompletely takes `unsigned` values for `UnrolledCost` and `RolledDynamicCost` but is passed in `uint64_t`s that are silently truncated. Because of this, when `UnrolledSize` is a large integer that has a small remainder with UINT32_MAX, LLVM tries to completely unroll loops with high trip counts. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10293 llvm-svn: 239218	2015-06-06 05:24:10 +00:00
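The truncation described in ad714b1af3 can be reproduced outside LLVM. The following is a minimal sketch, not the code from the patch: the function names, the threshold value of 150, and the sample cost are all hypothetical, and a fixed 32-bit type stands in for `unsigned`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a cost check whose parameters are only 32 bits wide.
bool fitsThresholdNarrow(std::uint32_t UnrolledCost, std::uint32_t Threshold) {
  return UnrolledCost <= Threshold;
}

// The same check with 64-bit parameters, so nothing is lost at the call site.
bool fitsThresholdWide(std::uint64_t UnrolledCost, std::uint64_t Threshold) {
  return UnrolledCost <= Threshold;
}

// A made-up cost slightly above 2^32: its low 32 bits hold only the value 10.
std::uint64_t hugeCost() {
  return (std::uint64_t{1} << 32) + 10;
}
```

Narrowing `hugeCost()` to 32 bits leaves 10, which passes a threshold of 150 even though the real cost is over four billion; the 64-bit check rejects it.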
Chandler Carruth	9dabd14d59	[Unroll] Rework the naming and structure of the new unroll heuristics. The new naming is (to me) much easier to understand. Here is a summary of the new state of the world: - 'Threshold' is the threshold for full unrolling. It is measured against the estimated unrolled cost as computed by getUserCost in TTI (or CodeMetrics, etc). We will exceed this threshold when unrolling loops where unrolling exposes a significant degree of simplification of the logic within the loop. - 'PercentDynamicCostSavedThreshold' is the percentage of the loop's estimated dynamic execution cost which needs to be saved by unrolling to apply a discount to the estimated unrolled cost. - 'DynamicCostSavingsDiscount' is the discount applied to the estimated unrolling cost when the dynamic savings are expected to be high. When actually analyzing the loop, we now produce both an estimated unrolled cost, and an estimated rolled cost. The rolled cost is notably a dynamic estimate based on our analysis of the expected execution of each iteration. While we're still working to build up the infrastructure for making these estimates, to me it is much more clear how to make them better when they have reasonably descriptive names. For example, we may want to apply estimated (from heuristics or profiles) dynamic execution weights to the dynamic cost estimates. If we start doing that, we would also need to track the static unrolled cost and the dynamic unrolled cost, as only the latter could reasonably be weighted by profile information. This patch is sadly not without functionality change for the new unroll analysis logic. Buried in the heuristic management were several things that surprised me. For example, we never subtracted the optimized instruction count off when comparing against the unroll heuristics! 
I don't know if this just got lost somewhere along the way or what, but with the new accounting of things, this is much easier to keep track of and we use the post-simplification cost estimate to compare to the thresholds, and use the dynamic cost reduction ratio to select whether we can exceed the baseline threshold. The old values of these flags also don't necessarily make sense. My impression is that none of these thresholds or discounts have been tuned yet, and so they're just arbitrary placeholder numbers. As such, I've not bothered to adjust for the fact that this is now a discount and not a two-tier threshold model. We need to tune all these values once the logic is ready to be enabled. Differential Revision: http://reviews.llvm.org/D9966 llvm-svn: 239164	2015-06-05 17:01:43 +00:00
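One way to read the three knobs named in 9dabd14d59 is sketched below. This is an illustration of the described model, not the code from the patch: the struct, the function name, and the guard conditions are assumptions, and any numbers passed in are arbitrary.

```cpp
#include <cstdint>

// Hypothetical bundle of the three knobs named in the commit message.
struct UnrollLimits {
  std::uint64_t Threshold;                        // limit for the estimated unrolled cost
  std::uint64_t PercentDynamicCostSavedThreshold; // savings needed to earn the discount
  std::uint64_t DynamicCostSavingsDiscount;       // amount subtracted from the unrolled cost
};

// Returns true when full unrolling looks acceptable under this reading.
bool shouldFullyUnroll(std::uint64_t UnrolledCost,
                       std::uint64_t RolledDynamicCost,
                       const UnrollLimits &L) {
  // Cheap enough without any discount.
  if (UnrolledCost <= L.Threshold)
    return true;
  // No meaningful savings ratio can be computed (assumed guard).
  if (RolledDynamicCost == 0 || UnrolledCost > RolledDynamicCost)
    return false;
  // Percentage of the dynamic cost that unrolling is estimated to save.
  // Note: the multiplication can overflow for extremely large costs.
  std::uint64_t PercentSaved =
      (RolledDynamicCost - UnrolledCost) * 100 / RolledDynamicCost;
  if (PercentSaved < L.PercentDynamicCostSavedThreshold)
    return false;
  // Apply the discount, clamping at zero, and compare again.
  std::uint64_t Discounted = UnrolledCost > L.DynamicCostSavingsDiscount
                                 ? UnrolledCost - L.DynamicCostSavingsDiscount
                                 : 0;
  return Discounted <= L.Threshold;
}
```

With limits of `{150, 20, 2000}`, an unrolled cost of 1000 against a rolled dynamic cost of 4000 saves 75% and passes after the discount; against a rolled cost of 1100 it saves only 9% and is rejected.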
Chandler Carruth	04cc665cef	[Unroll] Switch from an eagerly populated SCEV cache to one that is lazily built. Also, make it a much more generic SCEV cache, which today exposes only a reduced GEP model description but could be extended in the future to do other profitable caching of SCEV information. llvm-svn: 238124	2015-05-25 01:00:46 +00:00
Chandler Carruth	0215608bda	[Unroll] Separate the logic for testing each iteration of the loop, accumulating estimated cost, and other loop-centric logic from the logic used to analyze instructions in a particular iteration. This makes the visitor very narrow in scope -- all it does is visit instructions, update a map of simplified values, and return whether it is able to optimize away a particular instruction. The two cost metrics are now returned as an optional struct. When the optional is left unengaged, there is no information about the unrolled cost of the loop, when it is engaged the cost metrics are available to run against the thresholds. No functionality changed. llvm-svn: 238033	2015-05-22 17:41:35 +00:00
Chandler Carruth	5189559905	[Unroll] Replace a hand-wavy FIXME with a FIXME that explains the actual problem instead of suggesting doing something that is trivial to do but incorrect given the current design of the libraries. llvm-svn: 237994	2015-05-22 03:07:28 +00:00
Chandler Carruth	e1a0462dcc	[Unroll] Extract the logic for caching SCEV-modeled GEPs with their simplified model for use simulating each iteration into a separate helper function that just returns the cache. Building this cache had nothing to do with the rest of the unroll analysis and so this removes an unnecessary coupling, etc. It should also make it easier to think about the concept of providing fast cached access to basic SCEV models as an orthogonal concept to the overall unroll simulation. I'd really like to see this kind of caching logic folded into SCEV itself, it seems weird for us to provide it at this layer rather than making repeated queries into SCEV fast all on their own. No functionality changed. llvm-svn: 237993	2015-05-22 03:02:22 +00:00
Chandler Carruth	f174a156c3	[Unroll] Refactor the accumulation of optimized instruction costs into a single location. This reduces code duplication a bit and will also pave the way for a better separation between the visitation algorithm and the unroll analysis. No functionality changed. llvm-svn: 237990	2015-05-22 02:47:29 +00:00
Chandler Carruth	a6ae877aec	[Unrolling] Refactor the start and step offsets to simplify overflow checking and make the cache faster and smaller. I had thought that using an APInt here would be useful, but I think I was just wrong. Notably, we don't have to do any fancy overflow checking, we can just bound the values as quite small and do the math in a higher precision integer. I've switched to a signed integer so that UBSan will even point out if we ever have integer overflow. I've added various asserts to try to catch things as well and hoisted the overflow checks so that we just leave the too-large offsets out of the SCEV-GEP cache. This makes the value in the cache quite a bit smaller which is probably worthwhile. No functionality changed here (for trip counts under 1 billion). llvm-svn: 237209	2015-05-12 23:32:56 +00:00
Michael Zolotukhin	8c68171fef	Reimplement heuristic for estimating complete-unroll optimization effects. Summary: This patch reimplements the heuristic that tries to estimate optimization benefits from complete loop unrolling. In this patch I kept the minimal changes - e.g. I removed code handling branches and folding compares. That's a promising area, but now there are too many questions to discuss before we can enable it. Test Plan: Tests are included in the patch. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8816 llvm-svn: 237156	2015-05-12 17:20:03 +00:00
Sanjoy Das	e178f46965	[LoopUnrollRuntime] Avoid high-cost trip count computation. Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive. Depends on D8993. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8994 llvm-svn: 234846	2015-04-14 03:20:38 +00:00
Benjamin Kramer	799003bf8c	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. llvm-svn: 232998	2015-03-23 19:32:43 +00:00
Benjamin Kramer	51f6096cf8	Move private classes into anonymous namespaces NFC. llvm-svn: 232944	2015-03-23 12:30:58 +00:00
Mehdi Amini	a28d91d81b	DataLayout is mandatory, update the API to reflect it with references. Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointers to DataLayout into references as I could; this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independent commits, but some later commits were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740	2015-03-10 02:37:25 +00:00
Kevin Qin	715b01e979	Introduce runtime unrolling disable metadata and use it to mark the scalar loop from vectorization. Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and the iteration count is relatively large. For some loops, we know they are not worth runtime unrolling. The scalar loop from vectorization is one of those cases. llvm-svn: 231631	2015-03-09 06:14:18 +00:00
Duncan P. N. Exon Smith	2c79ad974c	Transforms: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202	2015-02-14 01:11:29 +00:00
Chandler Carruth	1fbc316534	[unroll] Concede defeat and disable the unroll analyzer for now. The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064	2015-02-13 05:31:46 +00:00
Chandler Carruth	6c03dff7cc	[unroll] Merge the simplification and DCE estimation methods on the UnrollAnalyzer. Now they share a single worklist and have less implicit state between them. There was no real benefit to separating these two things out. I'm going to subsequently refactor things to share even more code. llvm-svn: 229062	2015-02-13 04:39:05 +00:00
Chandler Carruth	d9591d8922	[unroll] Remove pointless dyn_cast<>s to Instruction - the users of an instruction must by definition be instructions. llvm-svn: 229061	2015-02-13 04:33:21 +00:00
Chandler Carruth	5457e20d27	[unroll] Don't check the loop set for whether an instruction is contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most once per instruction with the cost of the worklist set we would need to pay anyways. llvm-svn: 229060	2015-02-13 04:30:44 +00:00
Chandler Carruth	e5c30e4e10	[unroll] Change the other worklist in the unroll analyzer to be a set vector. In addition to dramatically reducing the work required for contrived example loops, this also has to correct some serious latent bugs in the cost computation. Previously, we might add an instruction onto the worklist once for every load which it used and was simplified. Then we would visit it many times and accumulate "savings" each time. I mean, fortunately this couldn't matter for things like calls with 100s of operands, but even for binary operators this code seems like it must be double counting the savings. I just noticed this by inspection and due to the runtime problems it can introduce, I don't have any test cases for cases where the cost produced by this routine is unacceptable. llvm-svn: 229059	2015-02-13 04:27:50 +00:00
Chandler Carruth	7824bc9241	[unroll] Replace a boolean, for loop, condition, and break with std::all_of and a lambda. Much cleaner, no functionality changed. llvm-svn: 229058	2015-02-13 04:18:14 +00:00
Chandler Carruth	06d537cdd6	[unroll] Directly query for dead instructions. In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simplified values map, and then if it was also missing from the dead instructions set. We add everything from the simplified values map to the dead instructions set, so the first step is completely subsumed by the second. Moreover, the first step requires inserting something into the simplified value map which isn't what we want at all. This also replaces a dyn_cast with a cast as an instruction cannot be used by a non-instruction. llvm-svn: 229057	2015-02-13 04:14:05 +00:00
Chandler Carruth	82cb30f10c	[unroll] Replace a linear time check for no uses with a constant time check. Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filter these out much like we filter out constants and such. llvm-svn: 229056	2015-02-13 04:06:08 +00:00
Chandler Carruth	3b057b3216	[unroll] Rather than an operand set, use a setvector for the worklist. We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instructions. I should have gone straight to this, but I had convinced myself that it wasn't going to be necessary briefly. I've come to my senses after chatting more with Nick, and am now happier here. llvm-svn: 229054	2015-02-13 03:57:40 +00:00
Chandler Carruth	17a0496b5a	[unroll] Extract the code to enqueue operands for the worklist in the unroll analysis into a lambda and call it. That's much simpler than duplicating all the code. llvm-svn: 229053	2015-02-13 03:49:41 +00:00
Chandler Carruth	8c86375a10	[unroll] Use a small set to de-duplicate operands prior to putting them into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands. llvm-svn: 229052	2015-02-13 03:48:38 +00:00
Chandler Carruth	93063e6191	[unroll] Make the unroll cost analysis terminate deterministically and reasonably quickly. I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing at all (after over 15 minutes of running, it hadn't terminated for me, no idea if it was a true infloop or just exponential work). The key thing here is to check the DeadInstructions set when pulling things off the worklist. Without this, we would re-walk the user list of already dead instructions again and again and again. Consider phi nodes with many, many operands and other patterns. The other important aspect of this is that because we would keep re-visiting instructions that were already known dead, we kept adding their cost savings to this! This would cause our cost savings to be insanely inflated from this. While I was here, I also rotated the operand walk out of the worklist loop to make the code easier to read. There is still work to be done to minimize worklist traffic because we don't de-duplicate operands. This means we may add the same instruction onto the worklist 1000s of times if it shows up in 1000s of operands to a PHI node for example. Still, with this patch, the ffmpeg testcase I have finishes quickly and I can't measure the runtime impact of the unroll analysis any more. I'll probably try to do a few more cleanups to this code, but not sure how much cleanup I can justify right now. llvm-svn: 229038	2015-02-13 03:40:58 +00:00
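The fix in 93063e6191 hinges on checking a "done" set when an item is pulled off the worklist. A minimal sketch of that pattern on a toy use graph follows; the graph representation and function name are hypothetical and standard containers stand in for LLVM's own.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Users[i] lists the nodes that use node i (a hypothetical toy graph).
using UseGraph = std::vector<std::vector<int>>;

// Walks users transitively from Roots and returns how many distinct nodes
// were processed. Each node is processed at most once, however many times
// it was pushed onto the worklist.
std::size_t countProcessed(const UseGraph &Users, const std::vector<int> &Roots) {
  std::vector<int> Worklist(Roots.begin(), Roots.end());
  std::unordered_set<int> Done;
  std::size_t Processed = 0;
  while (!Worklist.empty()) {
    int Node = Worklist.back();
    Worklist.pop_back();
    // Check the set when pulling the node off the worklist: a node that was
    // already handled is skipped, so its users are not re-walked and any
    // per-node accounting is not repeated.
    if (!Done.insert(Node).second)
      continue;
    ++Processed;
    for (int User : Users[Node])
      Worklist.push_back(User);
  }
  return Processed;
}
```

In a graph where node 0 lists node 2 as a user three times and node 1 lists it once, node 2 is still processed exactly once. Without the set check, each duplicate would be walked again and any savings counted per visit would be inflated.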
Chandler Carruth	dd6029fc6e	[unroll] Make range based for loops a bit more explicit and more readable. The biggest thing that was causing me problems is recognizing the references vs. pointers here. I also found that for maps naming the loop variable as KeyValue helps make it obvious why you don't actually use it directly. Finally, using 'auto' instead of 'User *' doesn't seem like a good tradeoff. Much like with the other cases, I like to know it's a pointer, and 'User' is just as long and tells the reader a lot more. llvm-svn: 229033	2015-02-13 02:45:17 +00:00
Chandler Carruth	415f41258f	[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where they are used widely) I prefer just spelling it out as Instruction. I've changed two of the short-lived variables to use "Inst" to match the base class. llvm-svn: 229028	2015-02-13 02:17:39 +00:00
Chandler Carruth	302a133b1e	[unroll] Tidy up the integer we use to accumulate the number of instructions optimized. NFC, just separating this out from the functionality changing commit. llvm-svn: 229026	2015-02-13 02:10:56 +00:00
Chandler Carruth	10a9926ab5	[unroll] Don't use a map from pointer to bool. Use a set. This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the set. This is just a cleanup along the way to fixing the underlying algorithm problems here. llvm-svn: 228994	2015-02-13 00:29:39 +00:00
Michael Zolotukhin	1b48019751	Prevent division by 0. When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should bail out early if N is 0. llvm-svn: 228988	2015-02-13 00:17:03 +00:00
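The scaling step described in 1b48019751 can be written as a small helper. This is a sketch under stated assumptions: the function name and the bool-plus-out-parameter shape are hypothetical, and multiplying before dividing is one possible reading of "scale by TripCount/N".

```cpp
#include <cstdint>

// Scales a count measured over the first N iterations up to the full trip
// count. Returns false (leaving Result untouched) when N is 0, since no
// estimate can be formed and dividing would be undefined.
bool scaleToTripCount(std::uint64_t OptimizedInFirstN, std::uint64_t N,
                      std::uint64_t TripCount, std::uint64_t &Result) {
  if (N == 0)
    return false; // bail out early instead of dividing by zero
  Result = OptimizedInFirstN * TripCount / N;
  return true;
}
```

For example, 6 instructions optimized in the first 3 iterations of a 10-iteration loop scale to an estimate of 20.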
Chandler Carruth	186ad60815	[unroll] Update the new analysis logic from r228265 to use modern coding conventions for function names consistently. Some were already using this but not all. llvm-svn: 228987	2015-02-13 00:00:24 +00:00
Michael Zolotukhin	7af83c1f39	Use estimated number of optimized insns in unroll-threshold computation. If complete-unroll could help us to optimize away N% of instructions, we might want to do this even if the final size would exceed loop-unroll threshold. However, we don't want to unroll huge loops, so we add AbsoluteThreshold to avoid that - this threshold will never be crossed, even if we expect to optimize 99% of instructions after that. llvm-svn: 228434	2015-02-06 20:20:40 +00:00
Michael Zolotukhin	4e8598eee3	[InstSimplify] Add SimplifyFPBinOp function. It is a variation of SimplifyBinOp, but it takes into account FastMathFlags. It is needed in inliner and loop-unroller to accurately predict the transformation's outcome (previously we dropped the flags and were too conservative in some cases). Example: float foo(float *a, float b) { float r; if (a[1] * b) r = /* a lot of expensive computations */; else r = 1; return r; } float boo(float *a) { return foo(a, 0.0); } Without this patch, we don't inline 'foo' into 'boo'. llvm-svn: 228432	2015-02-06 20:02:51 +00:00
Michael Zolotukhin	a9aadd2903	Implement new heuristic for complete loop unrolling. Complete loop unrolling can make some loads constant, thus enabling a lot of other optimizations. To catch such cases, we look for loads that might become constants and estimate number of instructions that would be simplified or become dead after substitution. Example: Suppose we have: int a[] = {0, 1, 0}; v = 0; for (i = 0; i < 3; i ++) v += b[i] * a[i]; If we completely unroll the loop, we would get: v = b[0] * a[0] + b[1] * a[1] + b[2] * a[2] Which then will be simplified to: v = b[0] * 0 + b[1] * 1 + b[2] * 0 And finally: v = b[1] llvm-svn: 228265	2015-02-05 02:34:00 +00:00
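The example in a9aadd2903 can be written out as three equivalent functions, one per stage of the simplification. This is only an illustration of the commit message's example; the function names are made up.

```cpp
// Constant table from the commit message's example.
static const int a[] = {0, 1, 0};

// The original rolled loop.
int dotRolled(const int *b) {
  int v = 0;
  for (int i = 0; i < 3; i++)
    v += b[i] * a[i];
  return v;
}

// After complete unrolling: every load from `a` has a constant index.
int dotUnrolled(const int *b) {
  return b[0] * a[0] + b[1] * a[1] + b[2] * a[2];
}

// After substituting the constants and folding: only b[1] survives.
int dotSimplified(const int *b) {
  return b[1];
}
```

For any input array of three ints, all three functions return the same value, which is why the heuristic treats the multiplications and additions as instructions that would be optimized away.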
Jingyue Wu	49a766e468	Resurrect the assertion removed by r227717 Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*. Test Plan: no regression Reviewers: mkuper Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7327 llvm-svn: 227853	2015-02-02 20:41:11 +00:00
Chandler Carruth	21fc195c13	[multiversion] Kill FunctionTargetTransformInfo, TTI itself is now per-function and supports the exact desired interface. llvm-svn: 227743	2015-02-01 14:37:03 +00:00
Chandler Carruth	fdb9c573f7	[multiversion] Thread a function argument through all the callers of the getTTI method used to get an actual TTI object. No functionality changed. This just threads the argument and ensures code like the inliner can correctly look up the callee's TTI rather than using a fixed one. The next change will use this to implement per-function subtarget usage by TTI. The changes after that should eliminate the need for FTTI as that will have become the default. llvm-svn: 227730	2015-02-01 12:01:35 +00:00
Jingyue Wu	0220df0dfd	[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent the CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703	2015-02-01 02:27:45 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. 
=/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. 
I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Chandler Carruth	4f8f307c77	[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373	2015-01-17 14:16:18 +00:00
Hal Finkel	38dd590861	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565	2015-01-10 00:30:55 +00:00
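The two size formulas quoted in 38dd590861 can be compared directly. A minimal sketch, with hypothetical function names and an assumed fallback for loops smaller than two instructions:

```cpp
// Old estimate: every instruction is assumed to be replicated.
unsigned naiveUnrolledSize(unsigned LoopSize, unsigned UnrollingFactor) {
  return LoopSize * UnrollingFactor;
}

// Corrected estimate: the backedge branch and the compare feeding it
// (2 instructions) are counted once instead of once per copy.
unsigned adjustedUnrolledSize(unsigned LoopSize, unsigned UnrollingFactor) {
  if (LoopSize < 2) // assumed guard so the subtraction cannot wrap around
    return LoopSize * UnrollingFactor;
  return (LoopSize - 2) * UnrollingFactor + 2;
}
```

For a loop of 6 instructions unrolled 9 times, the old formula gives 54 while the corrected one gives 38, so a loop that seemed to exceed a threshold of, say, 40 actually fits.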
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is twofold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating a separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boilerplate to the various passes, but this kind of boilerplate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata *` (instead of `Value *`). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. 
Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. 
`MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Chandler Carruth	6666c27e99	[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers. The core problem seems to stem from an assumption that the latch is the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a generic fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually wants to select the latch when it has to choose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case. I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples. Thanks to Mark for the help debugging and tracking down the right fix here! llvm-svn: 219550	2014-10-11 00:12:11 +00:00
Eric Christopher	d85ffb1fc0	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
Hal Finkel	74c2f355d2	Add an Assumption-Tracking Pass This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls its validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334	2014-09-07 12:44:26 +00:00
Benjamin Kramer	89854ebe8e	Make some helpers static or move into the llvm namespace. llvm-svn: 217077	2014-09-03 21:04:12 +00:00
Mark Heffernan	8ec1474f7f	After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor hint) the loop unroller replaces the llvm.loop.unroll.count metadata with llvm.loop.unroll.disable metadata to prevent any subsequent unrolling passes from unrolling more than the hint indicates. This patch fixes an issue where loop unrolling could be disabled for other loops as well which share the same llvm.loop metadata. llvm-svn: 213900	2014-07-24 22:36:40 +00:00
Mark Heffernan	9e112443b6	Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full). llvm-svn: 213789	2014-07-23 20:05:44 +00:00
Mark Heffernan	e6b4ba1c41	In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". llvm-svn: 213772	2014-07-23 17:31:37 +00:00
Mark Heffernan	f3764da8ec	Fix build breakage introduced with r213412. llvm-svn: 213414	2014-07-18 21:29:41 +00:00
Mark Heffernan	053a68688a	Remove unroll pragma metadata after it is used. llvm-svn: 213412	2014-07-18 21:04:33 +00:00
Eli Bendersky	5d5e18da3e	Rename loop unrolling and loop vectorizer metadata to have a common prefix. [LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710	2014-06-25 15:41:00 +00:00
Eli Bendersky	ff90324599	Teach LoopUnrollPass to respect loop unrolling hints in metadata. [This is resubmitting r210721, which was reverted due to suspected breakage which turned out to be unrelated]. Some extra review comments were addressed. See D4090 and D4147 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 211076	2014-06-16 23:53:02 +00:00
Eli Bendersky	dc6de2ce29	Revert r210721 as it causes breakage in internal builds (and possibly GDB). llvm-svn: 210807	2014-06-12 18:05:39 +00:00
Eli Bendersky	899bef099f	Teach LoopUnrollPass to respect loop unrolling hints in metadata. See http://reviews.llvm.org/D4090 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 210721	2014-06-11 23:15:35 +00:00
Benjamin Kramer	9130cb8547	LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling. Otherwise we use the same threshold as for complete unrolling, which is way too high. This made us unroll any loop smaller than 150 instructions by 8 times, but only if someone specified -march=core2 or better, which happens to be the default on darwin. llvm-svn: 207940	2014-05-04 19:12:38 +00:00
Chandler Carruth	964daaaf19	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Transforms/... edition. This one is tricky for two reasons. We again have a couple of passes that define something else before the includes as well. I've sunk their name macros with the DEBUG_TYPE. Also, InstCombine contains headers that need DEBUG_TYPE, so now those headers #define and #undef DEBUG_TYPE around their code, leaving them well formed modular headers. Fixing these headers was a large motivation for all of these changes, as "leaky" macros of this form are hard on the modules implementation. llvm-svn: 206844	2014-04-22 02:55:47 +00:00
Hal Finkel	6386cb8d4d	Add some additional fields to TTI::UnrollingPreferences In preparation for an upcoming commit implementing unrolling preferences for x86, this adds additional fields to the UnrollingPreferences structure: - PartialThreshold and PartialOptSizeThreshold - Like Threshold and OptSizeThreshold, but used when not fully unrolling. These are necessary because we need different thresholds for full unrolling from those used when partially unrolling (the full unrolling thresholds are generally going to be larger). - MaxCount - A cap on the unrolling factor when partially unrolling. This can be used by a target to prevent the unrolled loop from exceeding some resource limit independent of the loop size (such as number of branches). There should be no functionality change for any in-tree targets. llvm-svn: 205347	2014-04-01 18:50:30 +00:00
Hal Finkel	86b3064f2b	Move partial/runtime unrolling late in the pipeline The generic (concatenation) loop unroller is currently placed early in the standard optimization pipeline. This is a good place to perform full unrolling, but not the right place to perform partial/runtime unrolling. However, most targets don't enable partial/runtime unrolling, so this never mattered. However, even some x86 cores benefit from partial/runtime unrolling of very small loops, and follow-up commits will enable this. First, we need to move partial/runtime unrolling late in the optimization pipeline (importantly, this is after SLP and loop vectorization, as vectorization can drastically change the size of a loop), while keeping the full unrolling where it is now. This change does just that. llvm-svn: 205264	2014-03-31 23:23:51 +00:00
Craig Topper	3e4c697ca1	[C++11] Add 'override' keyword to virtual methods that override their base class. llvm-svn: 202953	2014-03-05 09:10:37 +00:00
Paul Robinson	af4e64d095	Disable most IR-level transform passes on functions marked 'optnone'. Ideally only those transform passes that run at -O0 remain enabled, in reality we get as close as we reasonably can. Passes are responsible for disabling themselves, it's not the job of the pass manager to do it for them. llvm-svn: 200892	2014-02-06 00:07:05 +00:00
Chandler Carruth	aa7fa5e4b2	[LPM] Make LoopSimplify no longer a LoopPass and instead both a utility function and a FunctionPass. This has many benefits. The motivating use case was to be able to compute function analysis passes after running LoopSimplify (to avoid invalidating them) and then to run other passes which require LoopSimplify. Specifically passes like unrolling and vectorization are critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so that they can be profile aware. For the LoopVectorize pass the only things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify and LCSSA is next on my list. There are also a bunch of other benefits of doing this: - It is now very feasible to make more passes preserve LoopSimplify because they can simply run it after changing a loop. Because subsequence passes can assume LoopSimplify is preserved we can reduce the runs of this pass to the times when we actually mutate a loop structure. - The new pass manager should be able to more easily support loop passes factored in this way. - We can at long, long last observe that LoopSimplify is preserved across SCEV. This halves the number of times we run LoopSimplify!!! Now, getting here wasn't trivial. First off, the interfaces used by LoopSimplify are all over the map regarding how analysis are updated. We end up with weird "pass" parameters as a consequence. I'll try to clean at least some of this up later -- I'll have to have it all clean for the new pass manager. Next up I discovered a really frustrating bug. LoopUnroll claims to preserve LoopSimplify. That's actually a lie. But the way the LoopPassManager ends up running the passes, it always ran LoopSimplify on the unrolled-into loop, rectifying this oversight before any verification could kick in and point out that in fact nothing was preserved. So I've added code to the unroller to actually simplify the surrounding loop when it succeeds at unrolling. 
The only functional change in the test suite is that we now catch a case that was previously missed because SCEV and other loop transforms see their containing loops as simplified and thus don't miss some opportunities. One test case has been converted to check that we catch this case rather than checking that we miss it but at least don't get the wrong answer. Note that I have #if-ed out all of the verification logic in LoopSimplify! This is a temporary workaround while extracting these bits from the LoopPassManager. Currently, there is no way to have a pass in the LoopPassManager which preserves LoopSimplify along with one which does not. The LPM will try to verify on each loop in the nest that LoopSimplify holds but the now-Function-pass cannot distinguish what loop is being verified and so must try to verify all of them. The inner most loop is clearly no longer simplified as there is a pass which didn't even attempt to preserve it. =/ Once I get LCSSA out (and maybe LoopVectorize and some other fixes) I'll be able to re-enable this check and catch any places where we are still failing to preserve LoopSimplify. If this causes problems I can back this out and try to commit all of this at once, but so far this seems to work and allow much more incremental progress. llvm-svn: 199884	2014-01-23 11:23:19 +00:00
Chandler Carruth	73523021d0	[PM] Split DominatorTree into a concrete analysis result object which can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. llvm-svn: 199104	2014-01-13 13:07:17 +00:00
Chandler Carruth	5ad5f15cff	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that even Clang's CFGBlocks can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082	2014-01-13 09:26:24 +00:00
Jakub Staszak	3ab283c157	Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces overall time of LLVM compilation by ~1%. llvm-svn: 196667	2013-12-07 21:20:17 +00:00
Alp Toker	f907b891da	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Hal Finkel	081eaef6fa	Add a runtime unrolling parameter to the LoopUnroll pass constructor As with the other loop unrolling parameters (the unrolling threshold, partial unrolling, etc.) runtime unrolling can now also be controlled via the constructor. This will be necessary for moving non-trivial unrolling late in the pass manager (after loop vectorization). No functionality change intended. llvm-svn: 194027	2013-11-05 00:08:03 +00:00
Hal Finkel	8f2e700522	Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 190542	2013-09-11 19:25:43 +00:00
Hal Finkel	8e83820a04	Revert: r189565 - Add getUnrollingPreferences to TTI Revert unintentional commit (of an unreviewed change). Original commit message: Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 189566	2013-08-29 03:33:15 +00:00
Hal Finkel	63e6c0e9fb	Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 189565	2013-08-29 03:29:57 +00:00
Chandler Carruth	bb9caa9241	Switch CodeMetrics itself over to use TTI to determine if an instruction is free. The whole CodeMetrics API should probably be reworked more, but this is enough to allow deleting the duplicate code there for computing whether an instruction is free. All of the passes using this have been updated to pull in TTI and hand it to the CodeMetrics stuff. Further, a dead CodeMetrics API (analyzeFunction) is nuked for lack of users. llvm-svn: 173036	2013-01-21 13:04:33 +00:00
Chandler Carruth	9fb823bbd4	Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366	2013-01-02 11:36:10 +00:00
Bill Wendling	698e84fc4f	Remove the Function::getFnAttributes method in favor of using the AttributeSet directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253	2012-12-30 10:32:01 +00:00
James Molloy	4f6fb953a7	Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call. Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage). llvm-svn: 170704	2012-12-20 16:04:27 +00:00
Bill Wendling	3d7b0b8ac7	Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future. llvm-svn: 170502	2012-12-19 07:18:57 +00:00
Chandler Carruth	ed0881b2a6	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Bill Wendling	c9b22d735a	Create enums for the different attributes. We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488	2012-10-09 07:45:08 +00:00
Micah Villmow	cdfe20b97f	Move TargetData to DataLayout. llvm-svn: 165402	2012-10-08 16:38:25 +00:00
Bill Wendling	863bab689a	Remove the `hasFnAttr' method from Function. The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725	2012-09-26 21:48:26 +00:00
Hongbin Zheng	b21b865fe8	LoopUnrollPass: Use variable "Threshold" instead of "CurrentThreshold" when reducing unroll count, otherwise the reduced unroll count is not taking the "OptimizeForSize" attribute into account. llvm-svn: 154007	2012-04-04 11:44:08 +00:00
Andrew Trick	d04d152998	Add -unroll-runtime for unrolling loops with run-time trip counts. Patch by Brendon Cahoon! This extends the existing LoopUnroll and LoopUnrollPass. Brendon measured no regressions in the llvm test suite with -unroll-runtime enabled. This implementation works by using the existing loop unrolling code to unroll the loop by a power-of-two (default 8). It generates an if-then-else sequence of code prior to the loop to execute the extra iterations before entering the unrolled loop. llvm-svn: 146245	2011-12-09 06:19:40 +00:00
Andrew Trick	a8bdb7cbf1	Remove the temporary flag -disable-unroll-scev and dead code. SCEV should now be used for trip count analysis, not LoopInfo. llvm-svn: 145262	2011-11-28 19:22:09 +00:00
Devang Patel	88b4fa21c8	Initialize ScalarEvolution dependency. Patch by Pranav Bhandarkar! llvm-svn: 142556	2011-10-19 23:56:07 +00:00
Andrew Trick	f7656015fc	Inlining and unrolling heuristics should be aware of free truncs. We want heuristics to be based on accurate data, but more importantly we don't want llvm to behave randomly. A benign trunc inserted by an upstream pass should not cause wild swings in optimization level. See PR11034. It's a general problem with threshold-based heuristics, but we can make it less bad. llvm-svn: 140919	2011-10-01 01:39:05 +00:00
Andrew Trick	31b941a60d	Enable SCEV-based unrolling by default. This changes loop unrolling to use the same mechanism for trip count computation as indvars. This is a stronger check that tends to unroll more loops. A very common side-effect is that many single iteration loops will be removed sooner. The real goal was simply to remove dependence on canonical IVs. x86 is break even. ARM performance changes to expect (+ is good): External/SPEC/CFP2000/183.equake/183.equake +13% SingleSource/Benchmarks/Dhrystone/fldry +21% MultiSource/Applications/spiff/spiff +3% SingleSource/Benchmarks/Stanford/Puzzle -14% The Puzzle regression is actually an improvement in loop optimization that defeats GVN: rdar://problem/10065079. llvm-svn: 139009	2011-09-02 17:26:28 +00:00
Andrew Trick	2b6860f0a1	Allow loop unrolling to get known trip counts from ScalarEvolution. SCEV unrolling can unroll loops with arbitrary induction variables. It is a prerequisite for -disable-iv-rewrite performance. It also easily handles loops of arbitrary structure including multiple exits and is generally more robust. This is under a temporary option to avoid affecting default behavior for the next couple of weeks. It is needed so that I can check in unit tests for updateUnloop. llvm-svn: 137384	2011-08-11 23:36:16 +00:00
Andrew Trick	4d0040baf8	Invoke SimplifyIndVar when we partially unroll a loop. Fixes PR10534. llvm-svn: 137203	2011-08-10 04:29:49 +00:00
Andrew Trick	1cabe54fab	Move trip count discovery outside of the generic LoopUnroll helper. This removes its dependence on canonical induction variables. llvm-svn: 135829	2011-07-23 00:33:05 +00:00
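
The -unroll-runtime commit above (r146245) describes unrolling a loop whose trip count is only known at run time: the body is unrolled by a power of two (default 8), and the leftover iterations run before the unrolled loop is entered. A minimal source-level sketch of that idea follows. It is a hand-written analogue, not the code the pass emits: the function names are hypothetical, and a small prologue loop stands in for the if-then-else sequence the commit message mentions.

```cpp
#include <cstddef>
#include <vector>

// Plain loop with a trip count known only at run time.
long sumReference(const std::vector<int> &v) {
  long sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}

// Hand-unrolled analogue of runtime unrolling by a power-of-two factor.
// Assumption: a prologue loop replaces the if-then-else chain described in
// the commit message; the effect (extra iterations first) is the same.
long sumRuntimeUnrolled(const std::vector<int> &v) {
  constexpr std::size_t Factor = 8;           // power of two, as in the commit
  const std::size_t n = v.size();
  const std::size_t extra = n & (Factor - 1); // n % Factor, cheap for powers of two

  long sum = 0;
  std::size_t i = 0;

  // Prologue: run the leftover iterations before the unrolled loop.
  for (; i < extra; ++i)
    sum += v[i];

  // Main loop: the remaining trip count is now a multiple of Factor,
  // so each pass can execute eight copies of the body without a check.
  for (; i < n; i += Factor) {
    sum += v[i];
    sum += v[i + 1];
    sum += v[i + 2];
    sum += v[i + 3];
    sum += v[i + 4];
    sum += v[i + 5];
    sum += v[i + 6];
    sum += v[i + 7];
  }
  return sum;
}
```

Both functions return the same value for any input; the power-of-two factor is what lets the remainder be computed with a mask instead of a division.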


184 Commits