Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Summary:
Previously we took an unsigned.
Hooray for type-safety.
Reviewers: chandlerc
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D22282
llvm-svn: 275591
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.
This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.
llvm-svn: 274483
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.
llvm-svn: 274337
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer, when trying to expand a physical register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.
We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.
Fixes PR27765.
Differential Revision: http://reviews.llvm.org/D21061
llvm-svn: 272644
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Summary:
This patch is adding support for the MSVC buffer security check implementation
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler is adding these instructions to perform stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the local computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-compute the global canary,
* [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
* If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
* modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
* if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).
Modifications
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results
* IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
llvm-svn: 272053
When processing inline asm that contains errors, make sure we can recover
gracefully by creating an UNDEF SDValue for the inline asm statement before
returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for
consumers that don't exit on the first error that is emitted (e.g. clang)
and that would assert later on.
Fixes PR24071.
Patch by Diana Picus.
llvm-svn: 269811
Allow two users of the condition if the other user
is also a min/max select. i.e.
%c = icmp slt i32 %x, %y
%min = select i1 %c, i32 %x, i32 %y
%max = select i1 %c, i32 %y, i32 %x
llvm-svn: 269699
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
llvm-svn: 266806
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
Previously, we were using isGCRelocate predicates. Using a subclass of IntrinsicInst is far more idiomatic. The refactoring also enables a couple of minor simplifications and code sharing.
llvm-svn: 266098
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).
llvm-svn: 265851
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal. This change has LLVM lower
```
%val = call @llvm.experimental.deoptimize
ret %val
```
to effectively
```
call @__llvm_deoptimize()
unreachable
```
instead.
llvm-svn: 265502
At IR level, the swifterror argument is an input argument with type
ErrorObject**. For targets that support swifterror, we want to optimize it
to behave as an inout value with type ErrorObject*; it will be passed in a
fixed physical register.
The main idea is to track the virtual registers for each swifterror value. We
define swifterror values as AllocaInsts with swifterror attribute or a function
argument with swifterror attribute.
In SelectionDAGISel.cpp, we set up swifterror values (SwiftErrorVals) before
handling the basic blocks.
When iterating over all basic blocks in RPO, before actually visiting the basic
block, we call mergeIncomingSwiftErrors to merge incoming swifterror values when
there are multiple predecessors or to simply propagate them. There, we create a
virtual register for each swifterror value in the entry block. For predecessors
that are not yet visited, we create virtual registers to hold the swifterror
values at the end of the predecessor. The assignments are saved in
SwiftErrorWorklist and will be materialized at the end of visiting the basic
block.
When visiting a load from a swifterror value, we copy from the current virtual
register assignment. When visiting a store to a swifterror value, we create a
virtual register to hold the swifterror value and update SwiftErrorMap to
track the current virtual register assignment.
Differential Revision: http://reviews.llvm.org/D18108
llvm-svn: 265433
A ``swifterror`` attribute can be applied to a function parameter or an
AllocaInst.
This commit does not include any target-specific change. The target-specific
optimization will come as a follow-up patch.
Differential Revision: http://reviews.llvm.org/D18092
llvm-svn: 265189
Re-enable an assertion enabled by Justin Lebar in rL265092. rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types. Change the test case to not do
that.
llvm-svn: 265099
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support
Reviewers: hans, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18321
llvm-svn: 264756
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues
llvm-svn: 264689
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).
Reviewers: reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18429
llvm-svn: 264329
Summary:
After this change, deopt operand bundles can be lowered directly by
SelectionDAG into STATEPOINT instructions (which are then lowered to a
call or sequence of nop, with an associated __llvm_stackmaps entry0.
This obviates the need to round-trip deoptimization state through
gc.statepoint via RewriteStatepointsForGC.
Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin
Subscribers: sanjoy, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D18257
llvm-svn: 264015
SelectionDAGBuilder::populateCallLoweringInfo is now used instead of
SelectionDAGBuilder::lowerCallOperands. The populateCallLoweringInfo
interface is more composable in face of design changes like
http://reviews.llvm.org/D18106
llvm-svn: 263663
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.
Reviewers: arsenm, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17940
llvm-svn: 263022
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
(This is the second attemp to commit this patch, after fixing pr26652 & pr26653).
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261804
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261070
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Summary:
Rename to getCatchSwitchParentPad, to make it more clear which ancestor
the "parent" in question is. Add a comment pointing out the key feature
that the returned pad indicates which funclet contains the successor
block.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16222
llvm-svn: 257933
The functionality that calculateCatchReturnSuccessorColors provides was
once non-trivial: it was a computation layered on top of funclet
coloring.
These days, LLVM IR directly encodes what
calculateCatchReturnSuccessorColors computed, obsoleting the need for
it.
No functionality change is intended.
llvm-svn: 256965
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Differential Revision: http://reviews.llvm.org/D15544
llvm-svn: 256890
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst. Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.
Reviewers: sanjoy, reames
Subscribers: reames, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15762
llvm-svn: 256811
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Differential Revision: http://reviews.llvm.org/D15054
llvm-svn: 256494
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.
Reviewers: sanjoy, reames
Subscribers: sanjoy, chenli, llvm-commits
Differential Revision: http://reviews.llvm.org/D15719
llvm-svn: 256262
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Reviewers: aprantl
Subscribers: dsanders
Differential Revision: http://reviews.llvm.org/D14186
llvm-svn: 256077
Summary: This patch adds a check in visitLandingPad to see if landingpad's result type is token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.
Reviewers: JosephTremoulet, majnemer, rnk
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15405
llvm-svn: 255749
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
When target has MMX registers, configure MVT::f128 in FR128RegClass,
with TypeSoftenFloat action, and custom actions for some opcodes.
Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
Fix infinite loop when f128 type can have
VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
Differential Revision: http://reviews.llvm.org/D11438
llvm-svn: 255558
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
Summary:
Previously SelectionDAGBuilder asserted that the pointer operands of
memcpy / memset / memmove intrinsics are in address space < 256. This assert
implicitly assumed the X86 backend, where all address spaces < 256 are
equivalent to address space 0 from the code generator's point of view. On some
targets (R600 and NVPTX) several address spaces < 256 have a target-defined
meaning, so this assert made little sense for these targets.
This patch removes this wrong assertion and adds extra checks before lowering
these intrinsics to library calls. If a pointer operand can't be casted to
address space 0 without changing semantics, a fatal error is reported to the
user.
The new behavior should be valid for all targets that give address spaces != 0
a target-specified meaning (NVPTX, R600, X86). NVPTX lowers big or
variable-sized memory intrinsics before SelectionDAG construction. All other
memory intrinsics are inlined (the threshold is set very high for this target).
R600 doesn't support memcpy / memset / memmove library calls (previously the
illegal emission of a call to such library function triggered an error
somewhere in the code generator). X86 now emits inline loads and stores for
address spaces 256 and 257 up to the same threshold that is used for address
space 0 and reports a fatal error otherwise.
I call this a "partial fix" because there are still cases that can't be
lowered. A fatal error is reported in these cases.
Reviewers: arsenm, theraven, compnerd, hfinkel
Subscribers: hfinkel, llvm-commits, alex
Differential Revision: http://reviews.llvm.org/D7241
llvm-svn: 255441
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
Cost calculation for vector GEP failed with due to invalid cast to GEP index operand.
The bug is fixed, added a test.
http://reviews.llvm.org/D14976
llvm-svn: 254408
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.
Patch by Max Ostapenko.
Differential Revision: http://reviews.llvm.org/D14983
llvm-svn: 254404
SDAG currently can emit debug location for function parameters when
an llvm.dbg.declare points to either a function argument SSA temp,
or to an AllocaInst. This change extends this logic by adding a
fallback case when neither of the above is true.
This is required for SafeStack, which may copy the contents of a
byval function argument into something that is not an alloca, and
then describe the target as the new location of the said argument.
llvm-svn: 254352
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
This change introduces an instrumentation intrinsic instruction for
value profiling purposes, the lowering of the instrumentation intrinsic
and raw reader updates. The raw profile data files for llvm-profdata
testing are updated.
llvm-svn: 253484
The virtual register containing the address for returned value on
stack should in the DAG be represented with a CopyFromReg node and not
a Register node. Otherwise, InstrEmitter will not make sure that it
ends up in the right register class for the target instruction.
SystemZ needs this, becuause the reg class for address registers is a
subset of the general 64 bit register class.
test/SystemZ/CodeGen/args-07.ll and args-04.ll updated to run with
-verify-machineinstrs.
Reviewed by Hal Finkel.
llvm-svn: 253461
Summary:
Now that there is a one-to-one mapping from MachineFunction to
WinEHFuncInfo, we don't need to use a DenseMap to select the right
WinEHFuncInfo for the current funclet.
The main challenge here is that X86WinEHStatePass is an IR pass that
doesn't have access to the MachineFunction. I gave it its own
WinEHFuncInfo object that it uses to calculate state numbers, which it
then throws away. As long as nobody creates or removes EH pads between
this pass and SDAG construction, we will get the same state numbers.
The other thing X86WinEHStatePass does is to mark the EH registration
node. Instead of communicating which alloca was the registration through
WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This
intrinsic generates no code and simply marks the alloca in use.
Reviewers: JCTremoulet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14668
llvm-svn: 253378
Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target.
This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support.
The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently).
llvm-svn: 252878
We don't currently have any runtime library functions for operations on
f16 values (other than conversions to and from f32 and f64), so we
should always promote it to f32, even if that is not a legal type. In
that case, the f32 values would be softened to f32 library calls.
SoftenFloatRes_FP_EXTEND now needs to check the promoted operand's type,
as it may ne a no-op or require a different library call.
getCopyFromParts and getCopyToParts now need to cope with a
floating-point value stored in a larger integer part, as is the case for
any target that needs to store an f16 value in a 32-bit integer
register.
Differential Revision: http://reviews.llvm.org/D12856
llvm-svn: 252459
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
llvm-svn: 252028
When optimization is disabled, edge weights that are stored in MBB won't be used so that we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But that the weight list is empty doesn't mean disabled optimization (as is stated several times in MachineBasicBlock.cpp): it may also mean all successors just have default weights.
We should discourage using default weights when adding successors, because it is very easy for users to forget update the correct edge weights instead of using default ones (one exception is that the MBB only has one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimizations is disabled.
In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this as for many uses of addSuccessor() whether optimization is disabled or not is not checked. But it can guarantee that if optimization is enabled, then the weight list always has the same size of the successor list.
Differential revision: http://reviews.llvm.org/D13963
llvm-svn: 251429
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13747
llvm-svn: 250588
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.
Differential revision: http://reviews.llvm.org/D13354
llvm-svn: 250119
On targets where f32 is not legal, we have to look through a BITCAST SDNode to
find the register that an argument is stored in when emitting debug info, or we
will not be able to emit a DW_AT_location for it.
Differential Revision: http://reviews.llvm.org/D13005
llvm-svn: 250056
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
llvm-svn: 249918
Summary:
Set the pad MBB as a funclet entry for CoreCLR as well as MSVCCXX, and
update state numbering to put the catchpad block rather than its normal
successor into the unwind map.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13492
llvm-svn: 249569
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination. This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.
Instead, use the CatchPad's location for the funclet prologue. This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.
Differential Revision: http://reviews.llvm.org/D13424
llvm-svn: 249483
Summary:
- Add CoreCLR to if/else ladders and switches as appropriate.
- Rename isMSVCEHPersonality to isFuncletEHPersonality to better
reflect what it captures.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13449
llvm-svn: 249455
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of. Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.
llvm-svn: 249052
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.
Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call. Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.
The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.
llvm-svn: 248959
Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.
Differential Revision: http://reviews.llvm.org/D13261
llvm-svn: 248824
Fixed the issue that when there is an edge from the jump table to the default statement, we should check it directly instead of checking if the sibling of the jump table header is a successor of the jump table header, which may not be the default statment but a successor of it.
llvm-svn: 248354
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing,
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests:
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.
This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.
This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.
Differential Revision: http://reviews.llvm.org/D12095
llvm-svn: 247815
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.
This small example now compiles and executes successfully on win32:
extern "C" int printf(const char *, ...) noexcept;
struct Dtor {
~Dtor() { printf("~Dtor\n"); }
};
void has_cleanup() {
Dtor o;
throw 42;
}
int main() {
try {
has_cleanup();
} catch (int) {
printf("caught it\n");
}
}
Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.
llvm-svn: 247219
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.
The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.
llvm-svn: 247192
Typically these are catchpads, which hold data used to decide whether to
catch the exception or continue unwinding. We also shouldn't create MBBs
for catchendpads, cleanupendpads, or terminatepads, since no real code
can live in them.
This fixes a problem where MI passes (like the register allocator) would
try to put code into catchpad blocks, which are not executed by the
runtime. In the new world, blocks ending in invokes now have many
possible successors.
llvm-svn: 247102
Summary:
Add a `cleanupendpad` instruction, used to mark exceptional exits out of
cleanups (for languages/targets that can abort a cleanup with another
exception). The `cleanupendpad` instruction is similar to the `catchendpad`
instruction in that it is an EH pad which is the target of unwind edges in
the handler and which itself has an unwind edge to the next EH action.
The `cleanupendpad` instruction, similar to `cleanupret` has a `cleanuppad`
argument indicating which cleanup it exits. The unwind successors of a
`cleanuppad`'s `cleanupendpad`s must agree with each other and with its
`cleanupret`s.
Update WinEHPrepare (and docs/tests) to accomodate `cleanupendpad`.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12433
llvm-svn: 246751
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12343
llvm-svn: 246691
Vector 'getelementptr' with scalar base is an opportunity for gather/scatter intrinsic to generate a better sequence.
While looking for uniform base, we want to use the scalar base pointer of GEP, if exists.
Differential Revision: http://reviews.llvm.org/D11121
llvm-svn: 246622
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.
For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.
There are some exceptions:
For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.
Differential Revision: http://reviews.llvm.org/D12418
llvm-svn: 246522
This reverts commit r246371, as it cause a rather obscure bug in AArch64
test-suite paq8p (time outs, seg-faults). I'll investigate it before
reapplying.
llvm-svn: 246379
Value *getSplatValue(Value *Val);
It complements the CreateVectorSplat(), which creates 2 instructions - insertelement and shuffle with all-zero mask.
The new function recognizes the pattern - insertelement+shuffle and returns the splat value (or nullptr).
It also returns a splat value form ConstantDataVector, for completeness.
Differential Revision: http://reviews.llvm.org/D11124
llvm-svn: 246371
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.
Differential Revision: http://reviews.llvm.org/D12391
llvm-svn: 246129
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.
Differential Revision: http://reviews.llvm.org/D12166#fae6eca7
llvm-svn: 246104
When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.
Differential Revision: http://reviews.llvm.org/D12249
llvm-svn: 245976
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.
For example on AArch32 (V8), we have scalar fminnm but not fmin.
Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.
llvm-svn: 245196
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.
This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.
This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.
This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.
Reviewers: Akira Hatanaka
llvm-svn: 244693
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.
matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.
C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.
ARM and AArch64 at least have different instructions for these different cases.
llvm-svn: 244580
around a DataLayout interface in favor of directly querying DataLayout.
This wrapper specifically helped handle the case where this no
DataLayout, but LLVM now requires it simplifynig all of this. I've
updated callers to directly query DataLayout. This in turn exposed
a bunch of places where we should have DataLayout readily available but
don't which I've fixed. This then in turn exposed that we were passing
DataLayout around in a bunch of arguments rather than making it readily
available so I've also fixed that.
No functionality changed.
llvm-svn: 244189
We can't propagate FMF partially without breaking DAG-level CSE. We either need to
relax CSE to account for mismatched FMF as a temporary work-around or fully propagate
FMF throughout the DAG.
Surprisingly, there are no existing regression tests for this, but here's an example:
define float @fmf(float %a, float %b) {
%mul1 = fmul fast float %a, %b
%nega = fsub fast float 0.0, %a
%mul2 = fmul fast float %nega, %b
%abx2 = fsub fast float %mul1, %mul2
ret float %abx2
}
$ llc -o - badflags.ll -march=x86-64 -mattr=fma -enable-unsafe-fp-math -enable-fmf-dag=0
...
vmulss %xmm1, %xmm0, %xmm0
vaddss %xmm0, %xmm0, %xmm0
retq
$ llc -o - badflags.ll -march=x86-64 -mattr=fma -enable-unsafe-fp-math -enable-fmf-dag=1
...
vmulss %xmm1, %xmm0, %xmm2
vfmadd213ss %xmm2, %xmm1, %xmm0 <--- failed to recognize that (a * b) was already calculated
retq
llvm-svn: 244053
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.
Most of the testcase updates were generated by the following sed script:
find test/ -name "*.ll" -o -name "*.mir" |
xargs grep -l 'DILocalVariable' |
xargs sed -i '' \
-e 's/tag: DW_TAG_arg_variable, //' \
-e 's/tag: DW_TAG_auto_variable, //'
There were only a handful of tests in `test/Assembly` that I needed to
update by hand.
(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`. I've added a FIXME to that effect.)
llvm-svn: 243774
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Differential Revision: http://reviews.llvm.org/D11097
llvm-svn: 243766