Commit Graph

700 Commits

Author SHA1 Message Date
Michael Kuperstein
3d23d4a234 [LV] Don't vectorize when we have a small static bound on trip count
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.

Differential Revision: https://reviews.llvm.org/D27690

llvm-svn: 289583
2016-12-13 20:38:18 +00:00
Matthew Simpson
364da7e527 [LV] Scalarize operands of predicated instructions
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.

Differential Revision: https://reviews.llvm.org/D26083

llvm-svn: 288909
2016-12-07 15:03:32 +00:00
Mandeep Singh Grang
000ce9a686 [LoopVectorize] Fix for non-determinism in codegen
Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718

Reviewers: mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26727

llvm-svn: 287135
2016-11-16 18:53:17 +00:00
Robert Lougher
b0905209dd [LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

llvm-svn: 286969
2016-11-15 14:27:33 +00:00
Florian Hahn
4b4dc172e7 Test commit, remove trailing space.
This commit is used to test commit access.

llvm-svn: 286957
2016-11-15 13:28:42 +00:00
Adam Nemet
9bfbf8bbdf [LV] Stop saying "use -Rpass-analysis=loop-vectorize"
This is PR28376.

Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).

So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option.  I don't think that getting this right is worth the
complexity.

Differential Revision: https://reviews.llvm.org/D26563

llvm-svn: 286662
2016-11-11 22:51:46 +00:00
Dehao Chen
d74e1e161d Reset debug loc to OldInduction in InnerLoopVectorizer::createInductionVariable. (NFC)
This is to prevent SetInsertionPoint from setting debug loc to Latch->getTerminator().

llvm-svn: 286159
2016-11-07 21:59:40 +00:00
Dorit Nuzman
bf2c15b5dc Second attempt at r285517.
llvm-svn: 285568
2016-10-31 13:17:31 +00:00
Dorit Nuzman
06903d16af Revert r285517 due to build failures.
llvm-svn: 285518
2016-10-30 14:34:57 +00:00
Dorit Nuzman
3c1c658f24 [LoopVectorize] Make interleaved-accesses analysis less conservative about
possible pointer-wrap-around concerns, in some cases.

Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.

Differential Revision: https://reviews.llvm.org/D25276

llvm-svn: 285517
2016-10-30 12:23:26 +00:00
Matthew Simpson
c62266d680 [LV] Sink scalar operands of predicated instructions
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.

Differential Revision: https://reviews.llvm.org/D25632

llvm-svn: 285097
2016-10-25 18:59:45 +00:00
Matthew Simpson
41fa838f07 [LV] Avoid emitting trivially dead instructions
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.

Differential Revision: https://reviews.llvm.org/D25631

llvm-svn: 284631
2016-10-19 19:22:02 +00:00
Matthew Simpson
1d4b163fc0 [LV] Account for predicated stores in instruction costs
This patch ensures that we scale the estimated cost of predicated stores by
block probability. This is a follow-on patch for r284123.

llvm-svn: 284126
2016-10-13 14:54:31 +00:00
Matthew Simpson
6cdb5a6f96 [LV] Avoid rounding errors for predicated instruction costs
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.

Differential Revision: https://reviews.llvm.org/D25333

llvm-svn: 284123
2016-10-13 14:19:48 +00:00
Matthew Simpson
a371c14ffe [LV] Don't mark multi-use branch conditions uniform
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
2016-10-07 15:20:13 +00:00
Matthew Simpson
a58c50dff0 [LV] Pass profitability analysis in vectorizer constructor (NFC)
The vectorizer already holds a pointer to one cost model artifact in a member
variable (i.e., MinBWs). As we add more, it will be easier to communicate these
artifacts to the vectorizer if we simply pass a pointer to the cost model
instead.

llvm-svn: 283373
2016-10-05 20:23:46 +00:00
Matthew Simpson
386546124f [LV] Pass legality analysis in vectorizer constructor (NFC)
The vectorizer already holds a pointer to the legality analysis in a member
variable, so it makes sense that we would pass it in the constructor.

llvm-svn: 283368
2016-10-05 19:53:20 +00:00
Matthew Simpson
6a8e0bcf3d [LV] Remove obsolete comment (NFC)
llvm-svn: 283365
2016-10-05 19:19:49 +00:00
Matthew Simpson
ee3fdc7e26 [LV] Use getScalarizationOverhead in memory instruction costs (NFC)
This patch refactors the cost estimation of scalarized loads and stores to
reuse getScalarizationOverhead for the cost of the extractelement and
insertelement instructions we might create. The existing code accounted for
this cost, but it was functionally equivalent to the helper function.

llvm-svn: 283364
2016-10-05 19:11:54 +00:00
Matthew Simpson
1755d81b29 [LV] Add helper function for predicated block probability (NFC)
The cost model has to estimate the probability of executing predicated blocks.
However, we currently always assume predicated blocks have a 50% chance of
executing (this value is hardcoded in several places throughout the code).
Since we always use the same value, this patch adds a helper function for
getting this uniform probability. The function simplifies some comments and
makes our assumptions more clear. In the future, we may want to extend this
with actual block probability information if it's available.

llvm-svn: 283354
2016-10-05 18:30:36 +00:00
Matthew Simpson
c631167609 [LV] Add isScalarWithPredication helper function (NFC)
This patch adds a single helper function for checking if an instruction will be
scalarized with predication. Such instructions include conditional stores and
instructions that may divide by zero. Existing checks have been updated to use
the new function.

llvm-svn: 283350
2016-10-05 17:52:34 +00:00
Matthew Simpson
7808833e28 [LV] Build all scalar steps for non-uniform induction variables
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
2016-09-30 15:13:52 +00:00
Adam Nemet
951c6b1955 [LV] Port the remarks in processLoop to the new streaming API
This completes LV.

llvm-svn: 282821
2016-09-30 00:29:30 +00:00
Adam Nemet
4fd9c42279 [LV] Port the last opt remark in Hints to the new streaming interface
llvm-svn: 282820
2016-09-30 00:29:25 +00:00
Adam Nemet
877ccee8cc [LAA, LV] Port to new streaming interface for opt remarks. Update LV
(Recommit after making sure IsVerbose gets properly initialized in
DiagnosticInfoOptimizationBase.  See previous commit that takes care of
this.)

OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282813
2016-09-30 00:01:30 +00:00
Adam Nemet
556a06b1ee Revert "[LAA, LV] Port to new streaming interface for opt remarks. Update LV"
This reverts commit r282758.

There are some clang failures I haven't seen.

llvm-svn: 282759
2016-09-29 20:17:37 +00:00
Adam Nemet
c1d21817d1 [LAA, LV] Port to new streaming interface for opt remarks. Update LV
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282758
2016-09-29 20:12:18 +00:00
Adam Nemet
3628282a77 [LV] Port OptimizationRemarkAnalysisFPCommute and
OptimizationRemarkAnalysisAliasing to new streaming API for opt remarks

llvm-svn: 282742
2016-09-29 18:04:47 +00:00
Adam Nemet
6e1edd5d1f [LV] Convert processLoop to new streaming API for opt remarks
llvm-svn: 282740
2016-09-29 17:55:13 +00:00
Adam Nemet
eb0ba8d50f [LV] Move static createMissedAnalysis from anonymous to global namespace
This is an attempt to fix a windows bot.

llvm-svn: 282730
2016-09-29 17:25:00 +00:00
Adam Nemet
0bfa441701 [LV] Convert CostModel to use the new streaming opt remark API
Here we can already remove the member function emitAnalysis.

llvm-svn: 282729
2016-09-29 17:15:48 +00:00
Adam Nemet
70757dd95a [LV] Split most of createMissedAnalysis into a static function. NFC
This will be shared between Legality and CostModel.

llvm-svn: 282728
2016-09-29 17:05:35 +00:00
Adam Nemet
9988ca3db3 [LV] Convert all but one opt remark in Legality to new streaming interface
The last one remaining after which emitAnalysis can be removed is when
we convert the LAA's report to a vectorization report.  This requires
converting LAA to the new interface first.

llvm-svn: 282726
2016-09-29 16:49:42 +00:00
Adam Nemet
9a1a5ef212 [LV] Convert emitRemark to new opt remark streaming interface
Also renamed the function to emitRemarkWithHints to better reflect what
the function actually does.

llvm-svn: 282723
2016-09-29 16:23:12 +00:00
Adam Nemet
04758ba385 Shorten DiagnosticInfoOptimizationRemark* to OptimizationRemark*. NFC
With the new streaming interface, these class names need to be typed a
lot and it's way too looong.

llvm-svn: 282544
2016-09-27 22:19:23 +00:00
Matthew Simpson
b764aba2ab [LV] Scalarize instructions marked scalar after vectorization
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418
2016-09-26 17:08:37 +00:00
Matthew Simpson
15869f86d8 [LV] Don't emit unused scalars for uniform instructions
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.

Differential Revision: https://reviews.llvm.org/D24275

llvm-svn: 282087
2016-09-21 16:50:24 +00:00
Matthew Simpson
a95e8bb7ed [LV] Rename "Width" to "Lane" (NFC)
llvm-svn: 282083
2016-09-21 16:09:23 +00:00
Elena Demikhovsky
5f8cc0c346 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Elena Demikhovsky
a1a0e7ddbe [Loop vectorizer] Simplified GEP cloning. NFC.
Simplified GEP cloning in vectorizeMemoryInstruction().
Added an assertion that checks consecutive GEP, which should have only one loop-variant operand.

Differential Revision: https://reviews.llvm.org/D24557

llvm-svn: 281851
2016-09-18 09:22:54 +00:00
Matthew Simpson
b25e87fca5 [LV] Process pointer IVs with PHINodes in collectLoopUniforms
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.

Differential Revision: https://reviews.llvm.org/D24511

llvm-svn: 281485
2016-09-14 14:47:40 +00:00
Matthew Simpson
81335bec96 [LV] Clean up uniform induction variable analysis (NFC)
llvm-svn: 281368
2016-09-13 19:01:45 +00:00
Matthew Simpson
bfe5e1817b [LV] Ensure proper handling of multi-use case when collecting uniforms
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.

llvm-svn: 280992
2016-09-08 21:38:26 +00:00
Matthew Simpson
408a3abcfe [LV] Don't mark pointers used by scalarized memory accesses uniform
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.

This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.

Differential Revision: https://reviews.llvm.org/D24271

llvm-svn: 280979
2016-09-08 19:11:07 +00:00
Matthew Simpson
b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Matthew Simpson
9e7851ed31 [LV] Use ScalarParts for ad-hoc pointer IV scalarization (NFCI)
We can now maintain scalar values in VectorLoopValueMap. Thus, we no longer
have to create temporary vectors with insertelement instructions when handling
pointer induction variables. This case was mistakenly missed from r279649 when
refactoring the other scalarization code.

llvm-svn: 280405
2016-09-01 19:40:19 +00:00
Matthew Simpson
922af076c7 [LV] Move VectorParts allocation and mapping into PHI widening (NFC)
This patch moves the allocation of VectorParts for PHI nodes into the actual
PHI widening code. Previously, we allocated these VectorParts in
vectorizeBlockInLoop, and passed them by reference to widenPHIInstruction. Upon
returning, we would then map the VectorParts in VectorLoopValueMap. This
behavior is problematic for the cases where we only want to generate a scalar
version of a PHI node. For example, if in the future we only generate a scalar
version of an induction variable, we would end up inserting an empty vector
entry into the map once we return to vectorizeBlockInLoop. We now no longer
need to pass VectorParts to the various PHI widening functions, and we can keep
VectorParts allocation as close as possible to the point at which they are
actually mapped in VectorLoopValueMap.

llvm-svn: 280390
2016-09-01 18:14:27 +00:00
Michael Kuperstein
2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Matthew Simpson
df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
Matthew Simpson
abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00