Commit Graph

371 Commits

Author SHA1 Message Date
Nadav Rotem
861bef7dd0 Set the default insert point to the first instruction, and not to end()
llvm-svn: 185953
2013-07-09 17:55:36 +00:00
Nadav Rotem
c9c57518ab This patch changes the saved IRBuilder insert point from BasicBlock::iterator to AssertingVH.
Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems.

Thanks Ben and Alexey!

llvm-svn: 185886
2013-07-08 23:31:13 +00:00
Nadav Rotem
2ee35771a8 Clear the builder insert point between tree-vectorization phases.
llvm-svn: 185777
2013-07-07 14:57:18 +00:00
Nadav Rotem
2041b742d4 SLPVectorizer: Implement DCE as part of vectorization.
This is a complete re-write if the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.

llvm-svn: 185774
2013-07-07 06:57:07 +00:00
Craig Topper
af0dea1347 Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185606
2013-07-04 01:31:24 +00:00
Arnold Schwaighofer
ef51cf202b LoopVectorize: Math functions only read rounding mode
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.

llvm-svn: 185299
2013-07-01 00:54:44 +00:00
Benjamin Kramer
4ab72f9b9a LoopVectorizer: Pack MemAccessInfo pairs.
llvm-svn: 185263
2013-06-29 17:52:08 +00:00
Benjamin Kramer
53545693d7 Move helper classes into anonymous namespaces.
llvm-svn: 185262
2013-06-29 17:02:06 +00:00
Nadav Rotem
0a25727f31 We preserve the CFG and some of the analysis passes.
llvm-svn: 185251
2013-06-29 05:38:15 +00:00
Nadav Rotem
e00343446c Update docs.
llvm-svn: 185250
2013-06-29 05:37:19 +00:00
Nadav Rotem
060be733a5 SLP Vectorizer: Add support for trees with external users.
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.

llvm-svn: 185230
2013-06-28 22:07:09 +00:00
Nadav Rotem
9ce3fedcdd LoopVectorizer: Refactor the code that checks if it is safe to predicate blocks.
In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks.
We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault.

llvm-svn: 185214
2013-06-28 20:46:27 +00:00
Arnold Schwaighofer
ce2c766f61 LoopVectorize: Pull dyn_cast into setDebugLocFromInst
llvm-svn: 185168
2013-06-28 17:14:48 +00:00
Arnold Schwaighofer
3b27b992ca LoopVectorize: Use static function instead of DebugLocSetter class
I used the class to safely reset the state of the builder's debug location.  I
think I have caught all places where we need to set the debug location to a new
one. Therefore, we can replace the class by a function that just sets the debug
location.

llvm-svn: 185165
2013-06-28 16:26:54 +00:00
Arnold Schwaighofer
12ecb331af LoopVectorize: Preserve debug location info
radar://14169017

llvm-svn: 185122
2013-06-28 00:38:54 +00:00
Arnold Schwaighofer
38de7cd464 LoopVectorize: Cache edge masks created during if-conversion
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.

llvm-svn: 185097
2013-06-27 20:31:06 +00:00
Arnold Schwaighofer
a2dd195fb3 LoopVectorize: Use vectorized loop invariant gep index anchored in loop
Use vectorized instruction instead of original instruction anchored in the
original loop.

Fixes PR16452 and t2075.c of PR16455.

llvm-svn: 185081
2013-06-27 15:11:55 +00:00
Arnold Schwaighofer
ccd6c9929b LoopVectorize: Don't store a reversed value in the vectorized value map
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.

This fixes 3 test cases of PR16455.

llvm-svn: 185051
2013-06-27 00:45:41 +00:00
Nadav Rotem
8edefb3665 No need to use a Set when a vector would do.
llvm-svn: 185047
2013-06-27 00:14:13 +00:00
Nadav Rotem
93f880fb77 SLP: When searching for vectorization opportunities scan the blocks in post-order because we grow chains upwards.
llvm-svn: 185041
2013-06-26 23:44:45 +00:00
Nadav Rotem
7f0d6d7975 SLP: Dont erase instructions during vectorization because it prevents the outerloops from iterating over the instructions.
llvm-svn: 185040
2013-06-26 23:43:23 +00:00
Nadav Rotem
4c5b2d1de6 Erase all of the instructions that we RAUWed
llvm-svn: 184969
2013-06-26 17:16:09 +00:00
Nadav Rotem
f4ca3994b8 Do not add cse-ed instructions into the visited map because we dont want to consider them as a candidate for replacement of instructions to be visited.
llvm-svn: 184966
2013-06-26 16:54:53 +00:00
Nadav Rotem
0794acc1da SLPVectorizer: support slp-vectorization of PHINodes between basic blocks
llvm-svn: 184888
2013-06-25 23:04:09 +00:00
Nadav Rotem
3de032a3b6 Fix a typo in the code that collected the costs recursively.
llvm-svn: 184827
2013-06-25 05:30:56 +00:00
Nadav Rotem
9c7c997a7e Rename the variable to fix a warning. Thanks Andy Gibbs.
llvm-svn: 184749
2013-06-24 15:59:47 +00:00
Arnold Schwaighofer
b252c11ccc Reapply 184685 after the SetVector iteration order fix.
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.

"LoopVectorize: Use the dependence test utility class

We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598"

llvm-svn: 184724
2013-06-24 12:09:15 +00:00
Arnold Schwaighofer
91472fa4fc LoopVectorize: Use SetVector for the access set
We are creating the runtime checks using this set so we need a deterministic
iteration order.

llvm-svn: 184723
2013-06-24 12:09:12 +00:00
Arnold Schwaighofer
58ca945f38 Revert "LoopVectorize: Use the dependence test utility class"
This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

llvm-svn: 184690
2013-06-24 06:10:41 +00:00
Arnold Schwaighofer
b914a7e2ef LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598

llvm-svn: 184685
2013-06-24 03:55:48 +00:00
Arnold Schwaighofer
d517976758 LoopVectorize: Add utility class for checking dependency among accesses
This class checks dependences by subtracting two Scalar Evolution access
functions allowing us to catch very simple linear dependences.

The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependencies need to be a multiple of VF otherwise we impede
store-load forwarding.

llvm-svn: 184684
2013-06-24 03:55:45 +00:00
Arnold Schwaighofer
d57419696d LoopVectorize: Add utility class for building sets of dependent accesses
Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.

llvm-svn: 184683
2013-06-24 03:55:44 +00:00
Nadav Rotem
210e86d7c4 SLP Vectorizer: Add support for vectorizing parts of the tree.
Untill now we detected the vectorizable tree and evaluated the cost of the
entire tree.  With this patch we can decide to trim-out branches of the tree
that are not profitable to vectorizer.

Also, increase the max depth from 6 to 12. In the worse possible case where all
of the code is made of diamond-shaped graph this can bring the cost to 2**10,
but diamonds are not very common.

llvm-svn: 184681
2013-06-24 02:52:43 +00:00
Nadav Rotem
0323925d51 SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences.
Make sure that we don't replace and RAUW two sequences if one does not dominate the other.

llvm-svn: 184674
2013-06-23 21:57:27 +00:00
Nadav Rotem
78428401e9 SLP Vectorizer: Erase instructions outside the vectorizeTree method.
The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.

llvm-svn: 184671
2013-06-23 19:38:56 +00:00
Nadav Rotem
eb65e67eea SLP Vectorizer: Implement a simple CSE optimization for the gather sequences.
llvm-svn: 184660
2013-06-23 06:15:46 +00:00
Nadav Rotem
80de0a28f1 SLP Vectorizer: Implement multi-block slp-vectorization.
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).

llvm-svn: 184647
2013-06-22 21:34:10 +00:00
Nadav Rotem
e1713e5fcf SLP Vectorizer: do not search for store-chains that are wider than the vector-register size.
llvm-svn: 184527
2013-06-21 04:18:13 +00:00
Nadav Rotem
b488beefeb Clang-format the SLP vectorizer. No functionality change.
llvm-svn: 184446
2013-06-20 17:54:36 +00:00
Nadav Rotem
14a89c5428 SLPVectorization: Add a basic support for cross-basic block slp vectorization.
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.

llvm-svn: 184444
2013-06-20 17:41:45 +00:00
Nadav Rotem
c41028a013 Change the debug type to match the debug type that is used by vecutils.cpp.
This change makes it easier to filter debug messages.

llvm-svn: 184440
2013-06-20 16:38:05 +00:00
Nadav Rotem
1e9668ea81 SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst).
llvm-svn: 184325
2013-06-19 17:33:16 +00:00
Nadav Rotem
86e848c849 SLPVectorizer: start constructing chains at stores that are not power of two.
The type <3 x i8> is a common in graphics and we want to be able to vectorize it.

This changes accelerates bullet by 12% and 471_omnetpp by 5%.

llvm-svn: 184317
2013-06-19 15:57:29 +00:00
Nadav Rotem
e98da7f548 SLPVectorizer: vectorize compares and selects.
llvm-svn: 184282
2013-06-19 05:49:52 +00:00
Nadav Rotem
4f3224f3ed Document the return value and fix a typo.
llvm-svn: 184281
2013-06-19 05:47:33 +00:00
Nadav Rotem
1f96427da0 Scan the successor blocks and use the PHI nodes as a hint for possible chain roots.
llvm-svn: 184201
2013-06-18 15:58:05 +00:00
Nadav Rotem
3349feac4e Add a return value to make this function more useful.
llvm-svn: 184200
2013-06-18 15:57:12 +00:00
Pekka Jaaskelainen
eb90fd1c3b Fix for a regression caused by the LoopVectorizer when
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.

llvm-svn: 184103
2013-06-17 18:49:06 +00:00
Arnold Schwaighofer
7b1b4db35e LoopVectorize: Change API call to get the backedge taken count
Use ScalarEvolution's getBackedgeTakenCount API instead of getExitCount since
that is really what we want to know. Using the more specific getExitCount was
safe because we made sure that there is only one exiting block.

No functionality change.

llvm-svn: 183047
2013-05-31 21:48:56 +00:00
Arnold Schwaighofer
70a9be5297 LoopVectorize: PHIs with only outside users should prevent vectorization
We check that instructions in the loop don't have outside users (except if
they are reduction values). Unfortunately, we skipped this check for
if-convertable PHIs.

Fixes PR16184.

llvm-svn: 183035
2013-05-31 19:53:50 +00:00