teak-llvm

mirror of https://github.com/Gericom/teak-llvm.git synced 2025-06-26 06:48:51 -04:00

Author	SHA1	Message	Date
Vedant Kumar	bd94b4287c	[HotColdSplit] Do not split out `resume` instructions Resumes that are not reachable from a cleanup landing pad are considered to be unreachable. It’s not safe to split them out. rdar://47808235 llvm-svn: 353242	2019-02-05 23:39:02 +00:00
Vedant Kumar	db3f9774ee	[HotColdSplit] Introduce a cost model to control splitting behavior The main goal of the model is to avoid increasing function size, as that would eradicate any memory locality benefits from splitting. This happens when: - There are too many inputs or outputs to the cold region. Argument materialization and reloads of outputs have a cost. - The cold region has too many distinct exit blocks, causing a large switch to be formed in the caller. - The code size cost of the split code is less than the cost of a set-up call. A secondary goal is to prevent excessive overall binary size growth. With the cost model in place, I experimented to find a splitting threshold that works well in practice. To make warm & cold code easily separable for analysis purposes, I moved split functions to a "cold" section. I experimented with thresholds between [0, 4] and set the default to the threshold which minimized geomean __text size. Experiment data from building LNT+externals for X86 (N = 639 programs, all sizes in bytes): \| Configuration \| __text geom size \| __cold geom size \| TEXT geom size \| \| -Os \| 1736.3 \| 0, n=0 \| 10961.6 \| \| -Os, thresh=0 \| 1740.53 \| 124.482, n=134 \| 11014 \| \| -Os, thresh=1 \| 1734.79 \| 57.8781, n=90 \| 10978.6 \| \| -Os, thresh=2 \| 1733.85 \| 65.6604, n=61 \| 10977.6 \| \| -Os, thresh=3 \| 1733.85 \| 65.3071, n=61 \| 10977.6 \| \| -Os, thresh=4 \| 1735.08 \| 67.5156, n=54 \| 10965.7 \| \| -Oz \| 1554.4 \| 0, n=0 \| 10153 \| \| -Oz, thresh=2 \| 1552.2 \| 65.633, n=61 \| 10176 \| \| -O3 \| 2563.37 \| 0, n=0 \| 13105.4 \| \| -O3, thresh=2 \| 2559.49 \| 71.1072, n=61 \| 13162.4 \| Picking thresh=2 reduces the geomean __text section size by 0.14% at -Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that TEXT size is page-aligned, whereas section sizes are byte-aligned. Experiment data from building LNT+externals for ARM64 (N = 558 programs, all sizes in bytes): \| Configuration \| __text geom size \| __cold geom size \| TEXT geom size \| \| -Os \| 1763.96 \| 0, n=0 \| 42934.9 \| \| -Os, thresh=2 \| 1760.9 \| 76.6755, n=61 \| 42934.9 \| Picking thresh=2 reduces the geomean __text section size by 0.17% at -Os and causes no growth in the TEXT segment. Measurements were done with D57082 (r352080) applied. Differential Revision: https://reviews.llvm.org/D57125 llvm-svn: 352228	2019-01-25 18:30:37 +00:00
Vedant Kumar	9d70f2b939	[HotColdSplit] Describe the pass in more detail, NFC llvm-svn: 352161	2019-01-25 03:22:38 +00:00
Vedant Kumar	65de025d64	[HotColdSplit] Split more aggressively before/after cold invokes While a cold invoke itself and its unwind destination can't be extracted, code which unconditionally executes before/after the invoke may still be profitable to extract. With cost model changes from D57125 applied, this gives a 3.5% increase in split text across LNT+externals on arm64 at -Os. llvm-svn: 352160	2019-01-25 03:22:23 +00:00
Florian Hahn	bed7f9eab2	Revert "[HotColdSplitting] Get DT and PDT from the pass manager." This reverts commit `a6982414ed` (llvm-svn: 352036), because it causes a memory leak in the pass manager. Failing bot http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio llvm-svn: 352041	2019-01-24 11:22:08 +00:00
Florian Hahn	a6982414ed	[HotColdSplitting] Get DT and PDT from the pass manager. Instead of manually computing DT and PDT, we can get the from the pass manager, which ideally has them already cached. With the new pass manager, we could even preserve DT/PDT on a per function basis in a module pass. I think this also addresses the TODO about re-using the computed DTs for BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we will fetch the cached version later. Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D57092 llvm-svn: 352036	2019-01-24 09:44:52 +00:00
Florian Hahn	68cea130df	[HotColdSplitting] Remove unused SSAUpdater.h include (NFC). llvm-svn: 351945	2019-01-23 11:51:38 +00:00
Vedant Kumar	cde65c0fac	[HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC The splitting pass does not need BFI unless the Module actually has a profile summary. Do not calcualte BFI unless the summary is present. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.4% of the total to under 0.1%. llvm-svn: 351894	2019-01-22 22:49:22 +00:00
Vedant Kumar	38874c8f7b	[HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC The splitting pass does not need (post)domtrees until after it's found a cold block. Defer domtree calculation until a cold block is found. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.8% of the total to 0.4%. llvm-svn: 351892	2019-01-22 22:49:08 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Vedant Kumar	b755a2df51	[HotColdSplit] Mark inherently cold functions as such If an inherently cold function is found, mark it as cold. For now this means applying the `cold` and `minsize` attributes. As a drive-by, revisit and clean up the criteria for considering a function for splitting. Add tests. llvm-svn: 351623	2019-01-19 02:38:47 +00:00
Vedant Kumar	4de1962bb9	[HotColdSplit] Remove a set which tracked split functions (NFC) Use the begin/end iterator idiom to avoid visiting split functions, instead of doing a set lookup. llvm-svn: 351622	2019-01-19 02:38:17 +00:00
Vedant Kumar	f529b50702	[HotColdSplit] Allow outlining with live outputs Prior to r348205, extracting code regions with live output values was disabled because of a miscompilation (PR39433). Lift the restriction as PR39433 has been addressed. Tested on LNT+externals, on a run of check-llvm in a stage2 build, and with a full build of iOS (with hot/cold splitting enabled). As a drive-by, remove an errant TODO. llvm-svn: 351492	2019-01-17 22:36:05 +00:00
Vedant Kumar	b70e20db62	[HotColdSplit] Consider resume instructions to be cold Resuming exception unwinding is roughly as unlikely as throwing an exception. Tested on LNT+externals (in particular, the C++ EH regression tests provide end-to-end test coverage), as well as with a full build of iOS. llvm-svn: 351491	2019-01-17 22:35:47 +00:00
Vedant Kumar	4541be0686	[HotColdSplit] Relax requirement that the cold sink block be extractable Relaxing this requirement creates opportunities to split code dominated by an EH pad. Tested on LNT+externals. llvm-svn: 351483	2019-01-17 21:42:36 +00:00
Vedant Kumar	32a014d048	[HotColdSplit] Simplify tests by lowering their splitting thresholds This gets rid of the brittle/mysterious calls to @sink()/@sideeffect() peppered throughout the test cases. They are no longer needed to force splitting to occur. llvm-svn: 351480	2019-01-17 21:29:34 +00:00
Benjamin Kramer	b17d2136ea	Give helper classes/functions local linkage. NFC. llvm-svn: 351016	2019-01-12 18:36:22 +00:00
Vedant Kumar	b3a7cae045	[HotColdSplitting] Disable outlining landingpad instructions (PR39917) It's currently not safe to outline landingpad instructions (see llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of previous landingpad instructions in a function alters the lowering of subsequent landingpads by renumbering type info ID's. Outlining a landingpad therefore breaks exception handling & unwinding. llvm-svn: 348870	2018-12-11 18:05:31 +00:00
Vedant Kumar	03f9f15b16	[HotColdSplitting] Refine definition of unlikelyExecuted The splitting pass uses its 'unlikelyExecuted' predicate to statically decide which blocks are cold. - Do not treat noreturn calls as if they are cold unless they are actually marked cold. This is motivated by functions like exit() and longjmp(), which are not beneficial to outline. - Do not treat inline asm as an outlining barrier. In practice asm("") is frequently used to inhibit basic block merging; enabling outlining in this case results in substantial memory savings. - Treat invokes of cold functions as cold. As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's no longer needed. The pass can identify & outline blocks dominated by EH pads, so there's no need to special-case __cxa_begin_catch etc. Differential Revision: https://reviews.llvm.org/D54244 llvm-svn: 348640	2018-12-07 20:24:04 +00:00
Vedant Kumar	03aaa3e2aa	[HotColdSplitting] Outline more than once per function Algorithm: Identify maximal cold regions and put them in a worklist. If a candidate region overlaps with another, discard it. While the worklist is full, remove a single-entry sub-region from the worklist and attempt to outline it. By the non-overlap property, this should not invalidate parts of the domtree pertaining to other outlining regions. Testing: LNT results on X86 are clean. With test-suite + externals, llvm outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file 483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm outlines over 100 times in some functions (mostly EH paths). There was not a significant performance impact pre vs. post-patch. Differential Revision: https://reviews.llvm.org/D53887 llvm-svn: 348639	2018-12-07 20:23:52 +00:00
Vedant Kumar	e7b789b529	[ProfileSummary] Standardize methods and fix comment Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182	2018-11-19 05:23:16 +00:00
Vedant Kumar	d2a895a972	[HotColdSplitting] Use TTI to inform outlining threshold Using TargetTransformInfo allows the splitting pass to factor in the code size cost of instructions as it decides whether or not outlining is profitable. This did not regress the overall amount of outlining seen on the handful of internal frameworks I tested. Thanks to Jun Bum Lim for suggesting this! Differential Revision: https://reviews.llvm.org/D53835 llvm-svn: 346108	2018-11-04 23:11:57 +00:00
Vedant Kumar	dd4be53b20	[HotColdSplitting] Allow outlining single-block cold regions It can be profitable to outline single-block cold regions because they may be large. Allow outlining single-block regions if they have over some threshold of non-debug, non-terminator instructions. I chose 3 as the threshold after experimenting with several internal frameworks. In practice, reducing the threshold further did not give much improvement, whereas increasing it resulted in substantial regressions. Differential Revision: https://reviews.llvm.org/D53824 llvm-svn: 345524	2018-10-29 19:15:39 +00:00
Vedant Kumar	c299006879	[HotColdSplitting] Identify larger cold regions using domtree queries The current splitting algorithm works in three stages: 1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks outside of the set of hot blocks and start outlining. While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified: - An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo. - It isn't necessary to limit outlining to single-exit regions. This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach. Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out. Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested. Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions. Differential Revision: https://reviews.llvm.org/D53627 llvm-svn: 345209	2018-10-24 22:15:41 +00:00
Teresa Johnson	c8dba682bb	[hot-cold-split] Name split functions with ".cold" suffix Summary: The current default of appending "_"+entry block label to the new extracted cold function breaks demangling. Change the deliminator from "_" to "." to enable demangling. Because the header block label will be empty for release compile code, use "extracted" after the "." when the label is empty. Additionally, add a mechanism for the client to pass in an alternate suffix applied after the ".", and have the hot cold split pass use "cold."+Count, where the Count is currently 1 but can be used to uniquely number multiple cold functions split out from the same function with D53588. Reviewers: sebpop, hiraditya Subscribers: llvm-commits, erik.pilkington Differential Revision: https://reviews.llvm.org/D53534 llvm-svn: 345178	2018-10-24 18:53:47 +00:00
Vedant Kumar	503154615d	[HotColdSplitting] Attach MinSize to outlined code Outlined code is cold by assumption, so it makes sense to optimize it for minimal code size rather than performance. After r344869 moved the splitting pass to the end of the IR pipeline, this does not result in much of a code size reduction. This is probably because a comparatively small number backend transforms make use of the MinSize hint. Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of 919 bytes of TEXT reduction. I didn't measure a significant performance impact. Differential Revision: https://reviews.llvm.org/D53518 llvm-svn: 345072	2018-10-23 19:41:12 +00:00
Teresa Johnson	f431a2f261	[hot-cold-split] Add opt remark on success Summary: Emit optimization remark on successful hot cold split. Reviewers: sebpop, hiraditya Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53512 llvm-svn: 344938	2018-10-22 19:06:42 +00:00
Lang Hames	8f9a2446e0	Change a TerminatorInst* to an Instruction* in HotColdSplitting.cpp. r344558 added an assignment to a TerminatorInst* from BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an Instruction* rather than a TerminatorInst* since r344504 so this fails to compile. Changing the variable to an Instruction* should get the bots building again. llvm-svn: 344566	2018-10-15 22:27:03 +00:00
Sebastian Pop	542e522b87	[hot-cold-split] fix static analysis of cold regions Make the code of blockEndsInUnreachable to match the function blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have added a note to make sure the code of this function will not be modified unless the back-end version is also modified. An early return before outlining has been added to avoid outlining the full function body when the first block in the function is marked cold. The static analysis of cold code has been amended to avoid marking the whole function as cold by back-propagation because the back-propagation would mark blocks with return statements as cold. The patch adds debug statements to help discover these problems. Differential Revision: https://reviews.llvm.org/D52904 llvm-svn: 344558	2018-10-15 21:43:11 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Aditya Kumar	a27014b851	Improve static analysis of cold basic blocks Differential Revision: https://reviews.llvm.org/D52704 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: sebpop llvm-svn: 343663	2018-10-03 06:21:05 +00:00
Aditya Kumar	9e20ade72a	Add support for new pass manager Modified the testcases to use both pass managers Use single commandline flag for both pass managers. Differential Revision: https://reviews.llvm.org/D52708 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: tejohnson, brzycki llvm-svn: 343662	2018-10-03 05:55:20 +00:00
Sebastian Pop	0f30f08b02	HotColdSplit: fix invalid SSA due to outlining The test used to fail with an invalid phi node: the two predecessors were outlined and the SSA representation was left invalid. The patch adds the exit block to the cold region. llvm-svn: 342277	2018-09-14 20:36:19 +00:00
Sebastian Pop	3abcf69074	HotColdSplit: fix isSingleEntrySingleExit remove duplicate entries from isSingleEntrySingleExit: the Entry block is already added by the loop over the dominance frontier. Remove the heuristic from isOutlineCandidate that a region is too small when it only contains a basic block. With this change we now grow regions starting from a block and we continue adding to the ValidColdRegion. Check the heuristic just before code generation. llvm-svn: 342276	2018-09-14 20:36:14 +00:00
Sebastian Pop	1217160bb3	HotColdSplit: add back propagation to extend cold regions Also fix a problem in forward propagation: const TerminatorInst *TI = It->getTerminator(); was set outside the while loop that iterates over It. llvm-svn: 342275	2018-09-14 20:36:10 +00:00
Sebastian Pop	a1f20fcc96	HotColdSplitting: check that target supports cold calling convention Before tagging a function with coldcc make sure the target supports cold calling convention. Without this patch HotColdSplitting pass fails on aarch64 with: fatal error: error in backend: Unsupported calling convention. llvm-svn: 341838	2018-09-10 15:08:02 +00:00
Aditya Kumar	801394a3d7	Hot cold splitting pass Find cold blocks based on profile information (or optionally with static analysis). Forward propagate profile information to all cold-blocks. Outline a cold region. Set calling conv and prof hint for the callsite of the outlined function. Worked in collaboration with: Sebastian Pop <s.pop@samsung.com> Differential Revision: https://reviews.llvm.org/D50658 llvm-svn: 341669	2018-09-07 15:03:49 +00:00

37 Commits