Commit Graph

334 Commits

Author SHA1 Message Date
Evgeniy Stepanov
95294127d0 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Dehao Chen
a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Geoff Berry
ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Dehao Chen
23025f8483 revert r289669 which breaks bots
llvm-svn: 289676
2016-12-14 17:23:16 +00:00
Dehao Chen
cb61c94d87 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289669
2016-12-14 16:49:28 +00:00
Peter Collingbourne
f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Dehao Chen
5492f8646c Add comments about why we put LoopSink pass at the very late stage.
llvm-svn: 286480
2016-11-10 17:42:18 +00:00
Dehao Chen
947dbe1254 Enable Loop Sink pass for functions that has profile.
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.

Reviewers: davidxl, hfinkel

Subscribers: mehdi_amini, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26155

llvm-svn: 286325
2016-11-09 00:58:19 +00:00
Rong Xu
1c0e9b97d2 Conditionally eliminate library calls where the result value is not used
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
   sqrt(val);
 is transformed to
   if (val < 0)
     sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.

These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.

Reviewers: hfinkel, mehdi_amini, davidxl

Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz

Differential Revision: https://reviews.llvm.org/D24414

llvm-svn: 284542
2016-10-18 21:36:27 +00:00
Mehdi Amini
732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Teresa Johnson
fbb431b292 [ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.

Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.

Fixes PR30419.

Reviewers: mehdi_amini

Subscribers: probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24701

llvm-svn: 281840
2016-09-17 20:40:16 +00:00
Mehdi Amini
27d2379b4e Rename NameAnonFunctions to NameAnonGlobals to match what it is doing (NFC)
llvm-svn: 281745
2016-09-16 16:56:30 +00:00
Teresa Johnson
8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Easwaran Raman
61edc107bb Add a new method to create SimpleInliner instance and make pre-inliner use this.
This adds a createFunctionInliningPass pass that takes an InlineParams object and use this to create the pre-inliner pass. This prevents the regular inliner's threshold flag from influencing the preinliner.

Differential revision: https://reviews.llvm.org/D23377

llvm-svn: 278377
2016-08-11 18:24:08 +00:00
Sebastian Pop
bfb96c5bfd GVN-hoist: enable by default
llvm-svn: 278010
2016-08-08 14:46:15 +00:00
Matthias Braun
9a0035d8d2 Revert "(refs/bisect/bad) GVN-hoist: enable by default"
GVN-Hoist appears to miscompile llvm-testsuite
SingleSource/Benchmarks/Misc/fbench.c at the moment.

I filed http://llvm.org/PR28880

This reverts commit r277786.

llvm-svn: 277909
2016-08-06 02:23:15 +00:00
Sebastian Pop
c33f0e25c9 GVN-hoist: enable by default
llvm-svn: 277786
2016-08-04 23:49:07 +00:00
Bruno Cardoso Lopes
bd887581fc Revert "GVN-hoist: enable by default" & "Make GVN Hoisting obey optnone/bisect."
This reverts commits r277685 & r277688. r277685 broke compiler-rt
compilation http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/23335
and r277685 is a followup from it.

llvm-svn: 277690
2016-08-04 04:16:24 +00:00
Sebastian Pop
70ffe6523f GVN-hoist: enable by default
As we addressed all compilation time problems with GVN-hoist
https://llvm.org/bugs/show_bug.cgi?id=28670
this patch turns GVN-hoist back by default.

Differential Revision: https://reviews.llvm.org/D23136

llvm-svn: 277685
2016-08-04 01:59:42 +00:00
David Majnemer
6e9b47bc8a Add EP_CGSCCOptimizerLate extension point to PassManagerBuilder
The EP_CGSCCOptimizerLate extension point allows adding CallGraphSCC
passes at the end of the main CallGraphSCC passes and before any
function simplification passes run by CGPassManager.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22897

llvm-svn: 276953
2016-07-28 03:28:43 +00:00
Xinliang David Li
9239245401 [Profile] Use explicit flag to enable IR PGO
Patch by Jake VanAdrighem

Differential Revision: http://reviews.llvm.org/D22607

llvm-svn: 276516
2016-07-23 04:28:52 +00:00
Alina Sbirlea
ba21ffebff Add flag to PassManagerBuilder to disable GVN Hoist Pass.
Summary:
Adding a flag to diable GVN Hoisting by default.
Note: The GVN Hoist Pass causes some Halide tests to hang. Halide will disable the pass while investigating.

Reviewers: llvm-commits, chandlerc, spop, dberlin

Subscribers: mehdi_amini

Differential Revision: https://reviews.llvm.org/D22639

llvm-svn: 276479
2016-07-22 22:02:19 +00:00
George Burgess IV
f84bb6fa67 Make help text more consistent. NFC.
llvm-svn: 276205
2016-07-20 23:14:29 +00:00
Teresa Johnson
cd21a646f6 [ThinLTO] Perform profile-guided indirect call promotion
Summary:
To enable profile-guided indirect call promotion in ThinLTO mode, we
simply add call graph edges for each profitable target from the profile
to the summaries, then the summary-guided importing will consider the
callee for importing as usual.

Also we need to enable the indirect call promotion pass creation in the
PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO
backend), so that the newly imported functions are considered for
promotion in the backends.

The IC promotion profiles refer to callees by GUID, which required
adding GUIDs to the per-module VST in bitcode (and assigning them
valueIds similar to how they are assigned valueIds in the combined
index).

Reviewers: mehdi_amini, xur

Subscribers: mehdi_amini, davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21932

llvm-svn: 275707
2016-07-17 14:47:01 +00:00
Rong Xu
96a19d35ae [PGO] IRPGO pre-cleanup pass changes
This patch adds a selected set of cleanup passes including a pre-inline pass
before LLVM IR PGO instrumentation. The inline is only intended to apply those
obvious/trivial ones before instrumentation so that much less instrumentation
is needed to get better profiling information. This will drastically improve
the instrumented code performance for large C++ applications. Another benefit
is the context sensitive counts that can potentially improve the PGO
optimization.

Differential Revision: http://reviews.llvm.org/D21405

llvm-svn: 275588
2016-07-15 18:10:49 +00:00
Sebastian Pop
4177480aad code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 275561
2016-07-15 13:45:20 +00:00
Nico Weber
755cd760cd Revert r275401, it caused PR28551.
llvm-svn: 275420
2016-07-14 14:41:25 +00:00
Sebastian Pop
63847d04e7 code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 275401
2016-07-14 12:18:53 +00:00
George Burgess IV
bfa401e5ad [CFLAA] Split into Anders+Steens analysis.
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.

So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.

Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.

This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21910

llvm-svn: 274589
2016-07-06 00:26:41 +00:00
Duncan P. N. Exon Smith
9d1f156418 Revert "code hoisting pass based on GVN"
This reverts commit r274305, since it breaks self-hosting:
  http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/22349/
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17232

Note that the blamelist on lab.llvm.org:8011 is incorrect.  The previous
build was r274299, but somehow r274305 wasn't included in the blamelist:
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules

llvm-svn: 274320
2016-07-01 01:51:40 +00:00
Sebastian Pop
5c5798c57c code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 274305
2016-07-01 00:24:31 +00:00
Peter Collingbourne
7efd750607 IR: New representation for CFI and virtual call optimization pass metadata.
The bitset metadata currently used in LLVM has a few problems:

1. It has the wrong name. The name "bitset" refers to an implementation
   detail of one use of the metadata (i.e. its original use case, CFI).
   This makes it harder to understand, as the name makes no sense in the
   context of virtual call optimization.

2. It is represented using a global named metadata node, rather than
   being directly associated with a global. This makes it harder to
   manipulate the metadata when rebuilding global variables, summarise it
   as part of ThinLTO and drop unused metadata when associated globals are
   dropped. For this reason, CFI does not currently work correctly when
   both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
   globals, and fails to associate metadata with the rebuilt globals. As I
   understand it, the same problem could also affect ASan, which rebuilds
   globals with a red zone.

This patch solves both of those problems in the following way:

1. Rename the metadata to "type metadata". This new name reflects how
   the metadata is currently being used (i.e. to represent type information
   for CFI and vtable opt). The new name is reflected in the name for the
   associated intrinsic (llvm.type.test) and pass (LowerTypeTests).

2. Attach metadata directly to the globals that it pertains to, rather
   than using the "llvm.bitsets" global metadata node as we are doing now.
   This is done using the newly introduced capability to attach
   metadata to global variables (r271348 and r271358).

See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html

Differential Revision: http://reviews.llvm.org/D21053

llvm-svn: 273729
2016-06-24 21:21:32 +00:00
David Majnemer
cbf614a93b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Manuel Jacob
a485984c0c [PM] Schedule InstSimplify after late LICM run, to clean up LCSSA nodes.
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling.  LCSSA is implicitly run as a pass dependency of LICM.  However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.

Reviewers: hfinkel, mehdi_amini

Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D20606

llvm-svn: 271602
2016-06-02 22:14:26 +00:00
Peter Collingbourne
fad596aa81 Move whole-program virtual call optimization pass after function attribute inference in LTO pipeline.
As a result of D18634 we no longer infer certain attributes on linkonce_odr
functions at compile time, and may only infer them at LTO time. The readnone
attribute in particular is required for virtual constant propagation (part
of whole-program virtual call optimization) to work correctly.

This change moves the whole-program virtual call optimization pass after
the function attribute inference passes, and enables the attribute inference
passes at opt level 1, so that virtual constant propagation has a chance to
work correctly for linkonce_odr functions.

Differential Revision: http://reviews.llvm.org/D20643

llvm-svn: 270765
2016-05-25 21:26:14 +00:00
Xinliang David Li
72616180df Rename pass name to prepare to new PM porting /NFC
llvm-svn: 269586
2016-05-15 01:04:24 +00:00
Xinliang David Li
d55827f7b2 [PM] code refactoring -- preparation for new PM porting /NFC
llvm-svn: 268851
2016-05-07 05:39:12 +00:00
Mehdi Amini
31407ba009 Tweak the ThinLTO pass pipeline
Summary:
The original ThinLTO pipeline was derived from some
work I did tuning FullLTO on the test suite and SPEC. This
patch reduces the amount of work done in the "linker phase" of
the build, and extend the function simplifications passes
performed during the "compile phase". This helps the build time
by reducing the IR as much as possible during the compile phase
and limiting the work to be performed during the "link phase",
while keeping the performance "on par" with the existing pipeline.

Reviewers: tejohnson

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D19773

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268769
2016-05-06 18:17:03 +00:00
Xinliang David Li
8aebf44c97 [PM] port IR based PGO prof-gen pass to new pass manager
llvm-svn: 268710
2016-05-06 05:49:19 +00:00
Mehdi Amini
7f7d8be518 Move "Eliminate Available Externally" immediately after the inliner
This pass is supposed to reduce the size of the IR for compile time
purpose. We should run it ASAP, except when we prepare for LTO or
ThinLTO, and we want to keep them available for link-time inline.

Differential Revision: http://reviews.llvm.org/D19813

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268394
2016-05-03 15:46:00 +00:00
Mehdi Amini
45c7b3ecb5 Move createReversePostOrderFunctionAttrsPass right after the inliner is done
This is where it was originally, until LoopVersioningLICM was
inserted before in r259986, I don't believe it was on purpose.

Differential Revision: http://reviews.llvm.org/D19809

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268252
2016-05-02 16:53:16 +00:00
Rong Xu
6e34c490ff [PGO] Promote indirect calls to conditional direct calls with value-profile
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.

Differential Revision: http://reviews.llvm.org/D17864

llvm-svn: 267815
2016-04-27 23:20:27 +00:00
Adam Nemet
d2fa414718 [LoopDist] Add llvm.loop.distribute.enable loop metadata
Summary:
D19403 adds a new pragma for loop distribution.  This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.

As part of this I had to rethink the flag -enable-loop-distribute.  My
goal was to be backward compatible with the existing behavior:

  A1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute is specified

  A2. pass is on when invoked directly from opt (e.g. for unit-testing)

The new pragma/metadata overrides these defaults so the new behavior is:

  B1. A1 + enable distribution for individual loop with the pragma/metadata

  B2. A2 + disable distribution for individual loop with the pragma/metadata

The default value whether the pass is on or off comes from the initiator
of the pass.  From the PassManagerBuilder the default is off, from opt
it's on.

I moved -enable-loop-distribute under the pass.  If the flag is
specified it overrides the default from above.

Then the pragma/metadata can further modifies this per loop.

As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline.  So to be precise
this is the new behavior:

  C1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute or the pragma/metadata enables it

  C2. pass is on when invoked directly from opt
  unless -enable-loop-distribute=0 or the pragma/metadata disables it

Reviewers: hfinkel

Subscribers: joker.eph, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19431

llvm-svn: 267672
2016-04-27 05:28:18 +00:00
Mehdi Amini
bf4513b9aa Run GlobalOpt before emitting the bitcode for ThinLTO
This is motivated by reducing the size of the IR and thus reduce
compile time.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267385
2016-04-25 08:47:49 +00:00
Mehdi Amini
f72ca86b71 ThinLTO: Move createNameAnonFunctionPass insertion in PassManagerBuilder (NFC)
It is just code motion, but makes more sense this way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267384
2016-04-25 08:47:37 +00:00
Xinliang David Li
e6b892940f Port InstrProfiling pass to the new pass manager
Differential Revision: http://reviews.llvm.org/D18126

llvm-svn: 266637
2016-04-18 17:47:38 +00:00
Justin Lebar
cf63b64fc6 [PM] Add a SpeculativeExecution pass for targets with divergent branches.
Summary:
This IR pass is helpful for GPUs, and other targets with divergent
branches.  It's a nop on targets without divergent branches.

Reviewers: chandlerc

Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra

Differential Revision: http://reviews.llvm.org/D18626

llvm-svn: 266399
2016-04-15 00:32:12 +00:00
Mehdi Amini
d5faa267c4 Add a pass to name anonymous/nameless function
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18883

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
2016-04-12 21:35:28 +00:00
Justin Lebar
2fe1323112 [PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer.
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style.  (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)

No functional change, and the new API is backwards-compatible.

Reviewers: chandlerc

Subscribers: joker.eph, cfe-commits

Differential Revision: http://reviews.llvm.org/D18613

llvm-svn: 264918
2016-03-30 20:39:29 +00:00
Adam Nemet
c979c6e123 Turn LoopLoadElimination on again
The latent bug that LLE exposed in the LoopVectorizer was resolved
(PR26952).

The pass can be disabled with -mllvm -enable-loop-load-elim=0

llvm-svn: 263595
2016-03-15 22:26:12 +00:00
Teresa Johnson
26ab5772b0 [ThinLTO] Renaming of function index to module summary index (NFC)
(Resubmitting after fixing missing file issue)

With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.

A companion clang patch will immediately succeed this patch to reflect
this renaming.

llvm-svn: 263513
2016-03-15 00:04:37 +00:00
Teresa Johnson
cec0cae313 Revert "[ThinLTO] Renaming of function index to module summary index (NFC)"
This reverts commit r263490. Missed a file.

llvm-svn: 263493
2016-03-14 21:18:10 +00:00
Teresa Johnson
892920b358 [ThinLTO] Renaming of function index to module summary index (NFC)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.

A companion clang patch will immediately succeed this patch to reflect
this renaming.

llvm-svn: 263490
2016-03-14 21:05:56 +00:00
Adam Nemet
bb45810e4f Revert "Turn LoopLoadElimination on again"
This reverts commit r263472.

There is an LNT failure on clang-ppc64be-linux-lnt.  Turn this off,
while I am investigating.

llvm-svn: 263485
2016-03-14 20:38:55 +00:00
Adam Nemet
5a19ae917b Turn LoopLoadElimination on again
The two issues that were discovered got fixed (r263058, r263173).

The pass can be disabled with -mllvm -enable-loop-load-elim=0

llvm-svn: 263472
2016-03-14 19:40:25 +00:00
Chandler Carruth
89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Matthias Braun
c31032d607 InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.

This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.

Differential Revision: http://reviews.llvm.org/D16835

llvm-svn: 263047
2016-03-09 18:47:11 +00:00
Adam Nemet
81113ef68c Revert "Enable LoopLoadElimination by default"
This reverts commit r262250.

It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when
running with *ref* data set.  The error happens with
"-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".

llvm-svn: 262839
2016-03-07 17:38:02 +00:00
Adam Nemet
dd9e637aca Enable LoopLoadElimination by default
Summary:
I re-benchmarked this and results are similar to original results in
D13259:

On ARM64:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
  SingleSource/Benchmarks/Polybench/stencils/adi                   -19.78%

On x86:
  SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog  -27.14%

And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.

In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc.  These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.

The reason that time spent in SCEV increases has to do with the design
of the old pass manager.  If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.

This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.

(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result.  Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)

Reviewers: hfinkel, dberlin

Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16300

llvm-svn: 262250
2016-02-29 20:35:11 +00:00
Chandler Carruth
9c4ed175c2 [PM] Port the PostOrderFunctionAttrs pass to the new pass manager and
convert one test to use this.

This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.

For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.

The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.

llvm-svn: 261203
2016-02-18 11:03:11 +00:00
Mehdi Amini
1db10ac6ce Define the ThinLTO Pipeline (experimental)
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).

This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.

The ThinLTO pipeline integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes. It is not clear if this part
   of the pipeline will stay as is, as the split model of ThinLTO
   does not allow the same benefit as FullLTO without added tricks.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.

Reviewers: tejohnson

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
2016-02-16 23:02:29 +00:00
Mehdi Amini
ec8bee1879 Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC)
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
2016-02-16 22:54:27 +00:00
Mehdi Amini
9c1c3ac627 Revert "Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()""
This reverts commit r260603.
I didn't intend to push it :(

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260607
2016-02-11 22:09:11 +00:00
Mehdi Amini
c5bf5ecc1b Revert "Define the ThinLTO Pipeline"
This reverts commit r260604.
I didn't intend to push this now.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260606
2016-02-11 22:09:07 +00:00
Mehdi Amini
484470d605 Define the ThinLTO Pipeline
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel.
This pipeline is based on the proposal in D13443 for full LTO. We ]
didn't move forward on this proposal because the link was far too long
after that.

This patch refactor the "function simplification" passes that are part
of the inliner loop in a helper function (this part is NFC and can be
commited separately to simplify the diff). The ThinLTO pipeline
integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits, dexonsmith

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260604
2016-02-11 22:00:31 +00:00
Mehdi Amini
f9a3718c5a Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()"
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260603
2016-02-11 22:00:25 +00:00
Ashutosh Nema
2260a3a046 Fixed typo in comment & coding style for LoopVersioningLICM.
llvm-svn: 260504
2016-02-11 09:23:53 +00:00
Peter Collingbourne
df49d1bbb2 WholeProgramDevirt: introduce.
This pass implements whole program optimization of virtual calls in cases
where we know (via bitset information) that the list of callees is fixed. This
includes the following:

- Single implementation devirtualization: if a virtual call has a single
  possible callee, replace all calls with a direct call to that callee.

- Virtual constant propagation: if the virtual function's return type is an
  integer <=64 bits and all possible callees are readnone, for each class and
  each list of constant arguments: evaluate the function, store the return
  value alongside the virtual table, and rewrite each virtual call as a load
  from the virtual table.

- Uniform return value optimization: if the conditions for virtual constant
  propagation hold and each function returns the same constant value, replace
  each virtual call with that constant.

- Unique return value optimization for i1 return values: if the conditions
  for virtual constant propagation hold and a single vtable's function
  returns 0, or a single vtable's function returns 1, replace each virtual
  call with a comparison of the vptr against that vtable's address.

Differential Revision: http://reviews.llvm.org/D16795

llvm-svn: 260312
2016-02-09 22:50:34 +00:00
Ashutosh Nema
df6763abe8 New Loop Versioning LICM Pass
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.

Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.

The pass is off by default and can be enabled with command line option 
-enable-loop-versioning-licm.

Reviewers: hfinkel, anemet, chatur01, reames

Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
             llvm-commits

Differential Revision: http://reviews.llvm.org/D9151

llvm-svn: 259986
2016-02-06 07:47:48 +00:00
Rong Xu
34abbfb78e [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation
This patch includes the passmanagerbuilder change that enables IR level PGO instrumentation. It adds two passmanagerbuilder options: -profile-generate=<profile_filename> and -profile-use=<profile_filename>. The new options are primarily for debug purpose.

Reviewers: davidxl, silvas

Differential Revision: http://reviews.llvm.org/D15828

llvm-svn: 258420
2016-01-21 18:28:59 +00:00
James Molloy
31f3ddd589 [LTO] Add a run of LoopUnroll
Loop trip counts can often be resolved during LTO. We should obviously be unrolling small loops once those trip counts have been resolved, but we weren't.

llvm-svn: 257767
2016-01-14 15:00:09 +00:00
Chandler Carruth
1926b70e37 [attrs] Split the late-revisit pattern for deducing norecurse in
a top-down manner into a true top-down or RPO pass over the call graph.

There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.

Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.

This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.

In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.

Differential Revision: http://reviews.llvm.org/D15785

llvm-svn: 257163
2016-01-08 10:55:52 +00:00
Chandler Carruth
3a040e6d47 [attrs] Extract the pure inference of function attributes into
a standalone pass.

There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.

In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.

The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.

Differential Revision: http://reviews.llvm.org/D15676

llvm-svn: 256466
2015-12-27 08:41:34 +00:00
Chandler Carruth
f49f1a87ef [attrs] Split off the forced attributes utility into its own pass that
is (by default) run much earlier than FuncitonAttrs proper.

This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.

I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.

Differential Revision: http://reviews.llvm.org/D15668

llvm-svn: 256465
2015-12-27 08:13:45 +00:00
Evgeniy Stepanov
67849d56c3 Cross-DSO control flow integrity (LLVM part).
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.

llvm-svn: 255693
2015-12-15 23:00:08 +00:00
James Molloy
6045cc89bd [PassManagerBuilder] Add a few more scalar optimization passes
This patch does two things:
  1. mem2reg is now run immediately after globalopt. Now that globalopt
     can localize variables more aggressively, it makes sense to lower
     them to SSA form earlier rather than later so they can benefit from
     the full set of optimization passes.

  2. More scalar optimizations are run after the loop optimizations in
     LTO mode. The loop optimizations (especially indvars) can clean up
     scalar code sufficiently to make it worthwhile running more scalar
     passes. I've particularly added SCCP here as it isn't run anywhere
     else in the LTO pass pipeline.

Mem2reg is super cheap and shouldn't affect compilation time at all. The
rest of the added passes are in the LTO pipeline only so doesn't affect
the vast majority of compilations, just the link step.

llvm-svn: 255634
2015-12-15 09:24:01 +00:00
Teresa Johnson
5fcbdb717c [ThinLTO] Support for specifying function index from pass manager
Summary:
Add a field on the PassManagerBuilder that clang or gold can use to pass
down a pointer to the function index in memory to use for importing when
the ThinLTO backend is triggered. Add support to supply this to the
function import pass.

Reviewers: joker.eph, dexonsmith

Subscribers: davidxl, llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D15024

llvm-svn: 254926
2015-12-07 19:21:11 +00:00
James Molloy
9ad4f22538 [LTO] Add an early run of functionattrs
Because we internalize early, we can potentially mark a bunch of functions as norecurse. Do this before globalopt.

llvm-svn: 253451
2015-11-18 11:24:42 +00:00
Adam Nemet
e54a4fa95d LLE 6/6: Add LoopLoadElimination pass
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop.  E.g.:

  for (i)
     A[i + 1] = A[i] + B[i]

  =>

  T = A[0]
  for (i)
     T = T + B[i]
     A[i + 1] = T

The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load.  Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.

This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning.  As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here.  Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.

In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads.  I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.

The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop.  Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%).  This gain also transfers over to x86: it's
around 8-10%.

Right now the pass is off by default and can be enabled
with -enable-loop-load-elim.  On the LNT testsuite, there are two
performance changes (negative number -> improvement):

  1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
     critical paths is reduced
  2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
     outside of LNT

The pass is scheduled after the loop vectorizer (which is after loop
distribution).  The rational is to try to reuse LAA state, rather than
recomputing it.  The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.

LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences).  LAA is known to omit loop-independent
dependences in certain situations.  The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.

Reviewers: dberlin, hfinkel

Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13259

llvm-svn: 252017
2015-11-03 23:50:08 +00:00
James Molloy
4015925606 [GlobalsAA] Turn GlobalsAA on again by default
Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again.

Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off.

llvm-svn: 250157
2015-10-13 10:43:57 +00:00
Michael Zolotukhin
74621cced7 Add CFG Simplification pass after Loop Unswitching.
Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D13064

llvm-svn: 248460
2015-09-24 03:50:17 +00:00
James Molloy
d5b161a221 [GlobalsAA] Disable globals-aa by default
Several issues have been found with it - disabling in the meantime.

llvm-svn: 247674
2015-09-15 10:44:06 +00:00
James Molloy
d47634d781 Enable GlobalsAA by default
This can give significant improvements to alias analysis in some situations, and improves its testing coverage in all situations.

llvm-svn: 247264
2015-09-10 10:22:20 +00:00
Chandler Carruth
7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Yaron Keren
611c7cff53 Move createEliminateAvailableExternallyPass earlier in the pass pipeline
to save running many ModulePasses on available external functions that
are thrown away anyhow.

llvm-svn: 246619
2015-09-02 06:34:11 +00:00
Mehdi Amini
d134a67ce9 Require Dominator Tree For SROA, improve compile-time
TL-DR: SROA is followed by EarlyCSE which requires the DominatorTree.
There is no reason not to require it up-front for SROA.

Some history is necessary to understand why we ended-up here.

r123437 switched the second (Legacy)SROA in the optimizer pipeline to
use SSAUpdater in order to avoid recomputing the costly
DominanceFrontier. The purpose was to speed-up the compile-time.

Later r123609 removed the need for the DominanceFrontier in
(Legacy)SROA.

Right after, some cleanup was made in r123724 to remove any reference
to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and
SROA_DT (the latter replacing SROA_DF).
The second argument of `createScalarReplAggregatesPass` was renamed
from `UseDomFrontier` to `UseDomTree`.
I believe this is were a mistake was made. The pipeline was not
updated and the call site was still:
    PM->add(createScalarReplAggregatesPass(-1, false));

At that time, SROA was immediately followed in the pipeline by
EarlyCSE which required alread the DominatorTree. Not requiring
the DominatorTree in SROA didn't save anything, but unfortunately
it was lost at this point.

When the new SROA Pass was introduced in r163965, I believe the goal
was to have an exact replacement of the existing SROA, this bug
slipped through.

You can see currently:

$ echo "" | clang -x c++  -O3 -c - -mllvm -debug-pass=Structure
...
...
      FunctionPass Manager
        SROA
        Dominator Tree Construction
        Early CSE

After this patch:

$ echo "" | clang -x c++  -O3 -c - -mllvm -debug-pass=Structure
...
...
      FunctionPass Manager
        Dominator Tree Construction
        SROA
        Early CSE

This improves the compile time from 88s to 23s for PR17855.
https://llvm.org/bugs/show_bug.cgi?id=17855

And from 113s to 12s for PR16756
https://llvm.org/bugs/show_bug.cgi?id=16756

Reviewers: chandlerc

Differential Revision: http://reviews.llvm.org/D12267

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245820
2015-08-23 22:15:49 +00:00
Chandler Carruth
21dcff799a [PM/AA] Extract the interface for GlobalsModRef into a header along with
its creation function.

This required shifting a bunch of method definitions to be out-of-line
so that we could leave most of the implementation guts in the .cpp file.

llvm-svn: 245021
2015-08-14 03:48:20 +00:00
Chandler Carruth
1db22822b4 [PM/AA] Hoist the interface to TBAA into a dedicated header along with
its creation function. Update the relevant includes accordingly.

llvm-svn: 245019
2015-08-14 03:33:48 +00:00
Chandler Carruth
42ff448fe4 [PM/AA] Hoist ScopedNoAliasAA's interface into a header and move the
creation function there.

Same basic refactoring as the other alias analyses. Nothing special
required this time around.

llvm-svn: 245012
2015-08-14 02:55:50 +00:00
Chandler Carruth
8b046a42f4 [PM/AA] Extract a minimal interface for CFLAA to its own header file.
I've used forward declarations and reorderd the source code some to make
this reasonably clean and keep as much of the code as possible in the
source file, including all the stratified set details. Just the basic AA
interface and the create function are in the header file, and the header
file is now included into the relevant locations.

llvm-svn: 245009
2015-08-14 02:42:20 +00:00
Teresa Johnson
c4279a7fb2 Enable EliminateAvailableExternally pass in the LTO pipeline.
Summary:
For LTO we need to enable this pass in the LTO pipeline,
as it is skipped during the "-flto -c" compile step (when PrepareForLTO is
set).

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11919

llvm-svn: 244622
2015-08-11 16:26:41 +00:00
Chandler Carruth
17e0bc37fd [PM/AA] Hoist the interface for BasicAA into a header file.
This is the first mechanical step in preparation for making this and all
the other alias analysis passes available to the new pass manager. I'm
factoring out all the totally boring changes I can so I'm moving code
around here with no other changes. I've even minimized the formatting
churn.

I'll reformat and freshen comments on the interface now that its located
in the right place so that the substantive changes don't triger this.

llvm-svn: 244197
2015-08-06 07:33:15 +00:00
Chandler Carruth
08eebe2074 [GMR] Add a late run of GlobalsModRef to the main pass pipeline behind
the general GMR-in-non-LTO flag.

Without this, we have the global information during the CGSCC pipeline
for GVN and such, but don't have it available during the late loop
optimizations such as the vectorizer. Moreover, after the CGSCC pipeline
has finished we have substantially more accurate and refined call graph
information, function annotations, etc, which will make GMR even more
powerful than it is early in the pipelien.

Note that we have to play silly games with preserving AliasAnalysis
(which is now trivially preserved) in order to let a module analysis
magically be preserved into the entire function pass pipeline.
Simultaneously we have to not make GMR an immutable pass in order to be
able to re-run it and collect fresh data on the final call graph.

llvm-svn: 242999
2015-07-23 09:34:01 +00:00
Chandler Carruth
e9ea5a66f2 [GMR] Add a flag to enable GlobalsModRef in the normal compilation
pipeline.

Even before I started improving its runtime, it was already crazy fast
once the call graph exists, and if we can get it to be conservatively
correct, will still likely catch a lot of interesting and useful cases.
So it may well be useful to enable by default.

But more importantly for me, this should make it easier for me to test
that changes aren't breaking it in fundamental ways by enabling it for
normal builds.

llvm-svn: 242895
2015-07-22 11:57:28 +00:00
Tobias Grosser
39a7bd182e Add PM extension point EP_VectorizerStart
This extension point allows passes to be executed right before the vectorizer
and other highly target specific optimizations are run.

llvm-svn: 242389
2015-07-16 08:20:37 +00:00
Alexey Bataev
da33d80e9a Disable loop re-rotation for -Oz (patch by Andrey Turetsky)
After changes in rL231820 loop re-rotation is performed even in -Oz mode. Since loop rotation is disabled for -Oz, it seems loop re-rotation should be disabled too.
Differential Revision: http://reviews.llvm.org/D10961

llvm-svn: 241897
2015-07-10 10:37:09 +00:00
Teresa Johnson
d3a33a16bb Resubmit "Add new EliminateAvailableExternally module pass" (r239480)
This change includes a fix for https://code.google.com/p/chromium/issues/detail?id=499508#c3,
which required updating the visibility for symbols with eliminated definitions.

--Original Commit Message--

Add new EliminateAvailableExternally module pass, which is performed in
O2 compiles just before GlobalDCE, unless we are preparing for LTO.

This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.

If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.

llvm-svn: 241466
2015-07-06 16:22:42 +00:00
Teresa Johnson
43a65d9529 Revert commit r239480 as it causes https://code.google.com/p/chromium/issues/detail?id=499508#c3.
llvm-svn: 239589
2015-06-12 03:12:00 +00:00
Teresa Johnson
232fa9af3b Add new EliminateAvailableExternally module pass, which is performed in
O2 compiles just before GlobalDCE, unless we are preparing for LTO.

This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.

If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.

llvm-svn: 239480
2015-06-10 17:49:28 +00:00
Akira Hatanaka
d9699bc7bd Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions
that was resetting it.

Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis. 
 
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.

Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.

rdar://problem/13752163

Differential Revision: http://reviews.llvm.org/D10099

llvm-svn: 239427
2015-06-09 19:07:19 +00:00
Wei Mi
6a671635e6 Remove the InstructionSimplifierPass immediately after InstructionCombiningPass.
InstructionCombiningPass was added after LoopUnrollPass in r237395. Because
InstructionCombiningPass is strictly more powerful than InstructionSimplifierPass,
remove the unnecessary InstructionSimplifierPass.

Differential Revision: http://reviews.llvm.org/D9838

llvm-svn: 237702
2015-05-19 16:09:11 +00:00
Jingyue Wu
25e2500ac8 [NFC] remove an extra new line
llvm-svn: 237462
2015-05-15 18:32:21 +00:00
Jingyue Wu
154eb5aa1d Add a speculative execution pass
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.

Credit goes to Jingyue Wu for writing an earlier version of this pass.

Patched by Bjarke Roune. 

Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.

Reviewers: eliben, dberlin, meheff, jingyue, hfinkel

Reviewed By: jingyue, hfinkel

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9360

llvm-svn: 237459
2015-05-15 17:54:48 +00:00
Wei Mi
bf727ba371 Add another InstCombine pass after LoopUnroll.
This is to cleanup some redundency generated by LoopUnroll pass. Such redundency may not be cleaned up by existing passes after LoopUnroll.

Differential Revision: http://reviews.llvm.org/D9777

llvm-svn: 237395
2015-05-14 22:02:54 +00:00
Adam Nemet
938d3d63d6 New Loop Distribution pass
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass.  Loop Distribution becomes
the second user of this analysis.

The pass is off by default and can be enabled
with -enable-loop-distribution.  There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.

I decided to remove the control-dependence calculation from this first
version.  This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately.  Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop.  This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.

The pass keeps DominatorTree and LoopInfo updated.  I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops.  SimplifyLoop is violated in some cases and I have a FIXME
covering this.

Reviewers: hfinkel, nadav, aschwaighofer

Reviewed By: aschwaighofer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8831

llvm-svn: 237358
2015-05-14 12:05:18 +00:00
Karthik Bhat
8210fdf26e Add support to interchange loops with reductions.
This patch enables interchanging of tightly nested loops with reductions.
Differential Revision: http://reviews.llvm.org/D8314

llvm-svn: 235571
2015-04-23 04:51:44 +00:00
James Molloy
0cbb2a8603 Reapply r233175 and r233183: float2int.
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.

llvm-svn: 233370
2015-03-27 10:36:57 +00:00
Nick Lewycky
ffb0864b44 Revert r233175 and r233183 with it. This pulls float2int back out of the tree, due to PR23038.
llvm-svn: 233350
2015-03-27 02:00:11 +00:00
James Molloy
cb75d92458 Reapply r233062: "float2int": Add a new pass to demote from float to int where possible.
Now with a fix for PR23008 and extra regression test.

llvm-svn: 233175
2015-03-25 10:03:42 +00:00
Hans Wennborg
e42c64551a Revert r233062 ""float2int": Add a new pass to demote from float to int where possible."
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"

Also reverting follow-up r233064.

llvm-svn: 233105
2015-03-24 20:07:08 +00:00
James Molloy
408df5160c "float2int": Add a new pass to demote from float to int where possible.
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.

This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.

This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.

To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.

llvm-svn: 233062
2015-03-24 11:15:23 +00:00
Duncan P. N. Exon Smith
ab58a568ee Verifier: Remove the separate -verify-di pass
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`.  This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.

Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).

llvm-svn: 232772
2015-03-19 22:24:17 +00:00
Peter Collingbourne
070843d60b libLTO, llvm-lto, gold: Introduce flag for controlling optimization level.
This change also introduces a link-time optimization level of 1. This
optimization level runs only the globaldce pass as well as cleanup passes for
passes that run at -O0, specifically simplifycfg which cleans up lowerbitsets.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266951.html

llvm-svn: 232769
2015-03-19 22:01:00 +00:00
Duncan P. N. Exon Smith
0a93e2db9c PassManagerBuilder: Remove effectively dead 'StripDebug' option
`StripDebug` was only used by tools/opt/opt.cpp in
`AddStandardLinkPasses()`, but opt.cpp adds the same pass based on its
command-line flag before it calls `AddStandardLinkPasses()`.  Stripping
debug info twice isn't very useful.

llvm-svn: 232765
2015-03-19 21:37:17 +00:00
Kevin Qin
49bc764310 Reapply 'Run LICM pass after loop unrolling pass.'
It's firstly committed at r231630, and reverted at r231635.

Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.

llvm-svn: 232011
2015-03-12 05:36:01 +00:00
Michael Zolotukhin
267e12f714 Enable loop-rotate before loop-vectorize by default
llvm-svn: 231820
2015-03-10 19:07:41 +00:00
Kevin Qin
65b07b8e1b Revert r231630 - Run LICM pass after loop unrolling pass.
As it broke llvm bootstrap.

llvm-svn: 231635
2015-03-09 07:26:37 +00:00
Kevin Qin
a998735def Run LICM pass after loop unrolling pass.
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.

llvm-svn: 231630
2015-03-09 06:14:07 +00:00
Karthik Bhat
88db86dd29 Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.

For e.g. given a loop like -
  for(int i=0;i<N;i++)
    for(int j=0;j<N;j++)
      A[j][i] = A[j][i]+B[j][i];

is interchanged to -
  for(int j=0;j<N;j++)
    for(int i=0;i<N;i++)
      A[j][i] = A[j][i]+B[j][i];

This pass is currently disabled by default.

To give a brief introduction it consists of 3 stages-

LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.

LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.

TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.

A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499

llvm-svn: 231458
2015-03-06 10:11:25 +00:00
Peter Collingbourne
e6909c8e8b Introduce bitset metadata format and bitset lowering pass.
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.

The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.

Differential Revision: http://reviews.llvm.org/D7288

llvm-svn: 230054
2015-02-20 20:30:47 +00:00
Hal Finkel
2bb61ba2fe [BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.

Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128& foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).

I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

llvm-svn: 229462
2015-02-17 01:36:59 +00:00
James Molloy
83570247f1 Run LICM as part of the cleanup phase from the scalar optimizer.
Things like LoopUnrolling can produce loop invariant values - make sure
we pick them up.

llvm-svn: 229419
2015-02-16 18:59:54 +00:00
Chandler Carruth
30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth
1efa12d6d8 [PM] Sink the population of the pass manager with target-specific
analyses back into the LTO code generator.

The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.

This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.

llvm-svn: 227576
2015-01-30 13:33:42 +00:00
Eric Christopher
b9f60c17dc Remove unused include.
llvm-svn: 227170
2015-01-27 05:58:44 +00:00
Chandler Carruth
b98f63dbdb [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
Chandler Carruth
62d4215baa [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
Roman Divacky
d2b9a1b890 Disable header duplication at -Oz in loop-rotate pass.
llvm-svn: 222562
2014-11-21 19:53:24 +00:00
Arnold Schwaighofer
eb1a38fa73 Add an option to the LTO code generator to disable vectorization during LTO
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.

r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.

We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.

llvm-svn: 220652
2014-10-26 21:50:58 +00:00
Nick Lewycky
592d84974c If requested, apply function merging at -O0 too. It's useful there to reduce the time to compile.
llvm-svn: 220537
2014-10-23 23:49:31 +00:00
JF Bastien
f42a6ea5ac LTO: respect command-line options that disable vectorization.
Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags.

Reviewers: nadav, aschwaighofer, yijiang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5884

llvm-svn: 220345
2014-10-21 23:18:21 +00:00
Chandler Carruth
7b8297a61e Add some optional passes around the vectorizer to both better prepare
the IR going into it and to clean up the IR produced by the vectorizers.

Note that these are *off by default* right now while folks collect data
on whether the performance tradeoff is reasonable.

In a build of the 'opt' binary, I see about 2% compile time regression
due to this change on average. This is in my mind essentially the worst
expected case: very little of the opt binary is going to *benefit* from
these extra passes.

I've seen several benchmarks improve in performance my small amounts due
to running these passes, and there are certain (rare) cases where these
passes make a huge difference by either enabling the vectorizer at all
or by hoisting runtime checks out of the outer loop. My primary
motivation is to prevent people from seeing runtime check overhead in
benchmarks where the existing passes and optimizers would be able to
eliminate that.

I've chosen the sequence of passes based on the kinds of things that
seem likely to be relevant for the code at each stage: rotaing loops for
the vectorizer, finding correlated values, loop invariants, and
unswitching opportunities from any runtime checks, and cleaning up
commonalities exposed by the SLP vectorizer.

I'll be pinging existing threads where some of these issues have come up
and will start new threads to get folks to benchmark and collect data on
whether this is the right tradeoff or we should do something else.

llvm-svn: 219644
2014-10-14 00:31:29 +00:00
Nick Lewycky
9e6d184803 Add control of function merging to the PMBuilder.
llvm-svn: 217731
2014-09-13 21:46:00 +00:00
Rafael Espindola
c435adcde0 Add doInitialization/doFinalization to DataLayoutPass.
With this a DataLayoutPass can be reused for multiple modules.

Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.

Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.

llvm-svn: 217548
2014-09-10 21:27:43 +00:00
Gerolf Hoflehner
008e5cdcba [PassManager] Adding Hidden attribute to EnableMLSM option
llvm-svn: 217539
2014-09-10 20:24:03 +00:00
Gerolf Hoflehner
24815d9b8f [MergedLoadStoreMotion] Move pass enabling option to PassManagerBuilder
llvm-svn: 217538
2014-09-10 19:55:29 +00:00
Hal Finkel
d67e463901 Add an AlignmentFromAssumptions Pass
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.

The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could).  Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).

llvm-svn: 217344
2014-09-07 20:05:11 +00:00
James Molloy
6b95d8ed36 Enable noalias metadata by default and swap the order of the SLP and Loop vectorizers by default.
After some time maturing, hopefully the flags themselves will be removed.

llvm-svn: 217144
2014-09-04 13:23:08 +00:00
Hal Finkel
445dda5c4a Add pass-manager flags to use CFL AA
Add -use-cfl-aa (and -use-cfl-aa-in-codegen) to add CFL AA in the default pass
managers (for easy testing).

llvm-svn: 216978
2014-09-02 22:12:54 +00:00
Rafael Espindola
7cebf36a95 Move some logic to populateLTOPassManager.
This will avoid code duplication in the next commit which calls it directly
from the gold plugin.

llvm-svn: 216211
2014-08-21 20:03:44 +00:00
Rafael Espindola
216e0c0617 Respect LibraryInfo in populateLTOPassManager and use it. NFC.
llvm-svn: 216203
2014-08-21 18:49:52 +00:00
Rafael Espindola
e07caad9e7 Handle inlining in populateLTOPassManager like in populateModulePassManager.
No functionality change.

llvm-svn: 216178
2014-08-21 13:35:30 +00:00
Rafael Espindola
208bc533cd Move DisableGVNLoadPRE from populateLTOPassManager to PassManagerBuilder.
llvm-svn: 216174
2014-08-21 13:13:17 +00:00
James Molloy
568da0990e Add a new option -run-slp-after-loop-vectorization.
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.

llvm-svn: 214963
2014-08-06 12:56:19 +00:00
Rafael Espindola
f9e52cf015 Don't internalize all but main by default.
This is mostly a cleanup, but it changes a fairly old behavior.

Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.

Now to get a usable behavior out of opt one doesn't need the funny
looking command line:

opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts

llvm-svn: 214919
2014-08-05 20:10:38 +00:00
Hal Finkel
9414665a3b Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

llvm-svn: 213864
2014-07-24 14:25:39 +00:00
Gerolf Hoflehner
f27ae6cdcf MergedLoadStoreMotion pass
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.

llvm-svn: 213396
2014-07-18 19:13:09 +00:00
Gerolf Hoflehner
65b13324e1 Run interprocedural const prop before global optimizer
Exposes more constant globals that can be removed by
the global optimizer. A specific example is the removal
of the static global block address array in
clang/test/CodeGen/indirect-goto.c. This change impacts only
lower optimization levels. With LTO interprocedural
const prop runs already before global opt.

llvm-svn: 212284
2014-07-03 19:28:15 +00:00
Michael J. Spencer
289067cc3d Add LoadCombine pass.
This pass is disabled by default. Use -combine-loads to enable in -O[1-3]

Differential revision: http://reviews.llvm.org/D3580

llvm-svn: 209791
2014-05-29 01:55:07 +00:00
Peter Collingbourne
0a4376190f Add an extension point for peephole optimizers.
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.

Differential Revision: http://reviews.llvm.org/D3905

llvm-svn: 209595
2014-05-25 10:27:02 +00:00