For some targets, it is useful to be able to look at the original
type of an argument without having to dig through the original IR.
This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset
was not taking into account the offset of structure elements.
Patch by: Justin Holewinski
Tom Stellard:
- Changed the type of ArgVT to EVT, so it can store non-simple types
like v3i32.
llvm-svn: 193214
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way the were. I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.
This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736
llvm-svn: 190328
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().
Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());
llvm-svn: 189227
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
Previously, generation of stack protectors was done exclusively in the
pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated
splitting basic blocks at the IR level to create the success/failure basic
blocks in the tail of the basic block in question. As a result of this,
calls that would have qualified for the sibling call optimization were no
longer eligible for optimization since said calls were no longer right in
the "tail position" (i.e. the immediate predecessor of a ReturnInst
instruction).
Then it was noticed that since the sibling call optimization causes the
callee to reuse the caller's stack, if we could delay the generation of
the stack protector check until later in CodeGen after the sibling call
decision was made, we get both the tail call optimization and the stack
protector check!
A few goals in solving this problem were:
1. Preserve the architecture independence of stack protector generation.
2. Preserve the normal IR level stack protector check for platforms like
OpenBSD for which we support platform specific stack protector
generation.
The main problem that guided the present solution is that one can not
solve this problem in an architecture independent manner at the IR level
only. This is because:
1. The decision on whether or not to perform a sibling call on certain
platforms (for instance i386) requires lower level information
related to available registers that can not be known at the IR level.
2. Even if the previous point were not true, the decision on whether to
perform a tail call is done in LowerCallTo in SelectionDAG which
occurs after the Stack Protector Pass. As a result, one would need to
put the relevant callinst into the stack protector check success
basic block (where the return inst is placed) and then move it back
later at SelectionDAG/MI time before the stack protector check if the
tail call optimization failed. The MI level option was nixed
immediately since it would require platform specific pattern
matching. The SelectionDAG level option was nixed because
SelectionDAG only processes one IR level basic block at a time
implying one could not create a DAG Combine to move the callinst.
To get around this problem a few things were realized:
1. While one can not handle multiple IR level basic blocks at the
SelectionDAG Level, one can generate multiple machine basic blocks
for one IR level basic block. This is how we handle bit tests and
switches.
2. At the MI level, tail calls are represented via a special return
MIInst called "tcreturn". Thus if we know the basic block in which we
wish to insert the stack protector check, we get the correct behavior
by always inserting the stack protector check right before the return
statement. This is a "magical transformation" since no matter where
the stack protector check intrinsic is, we always insert the stack
protector check code at the end of the BB.
Given the aforementioned constraints, the following solution was devised:
1. On platforms that do not support SelectionDAG stack protector check
generation, allow for the normal IR level stack protector check
generation to continue.
2. On platforms that do support SelectionDAG stack protector check
generation:
a. Use the IR level stack protector pass to decide if a stack
protector is required/which BB we insert the stack protector check
in by reusing the logic already therein. If we wish to generate a
stack protector check in a basic block, we place a special IR
intrinsic called llvm.stackprotectorcheck right before the BB's
returninst or if there is a callinst that could potentially be
sibling call optimized, before the call inst.
b. Then when a BB with said intrinsic is processed, we codegen the BB
normally via SelectBasicBlock. In said process, when we visit the
stack protector check, we do not actually emit anything into the
BB. Instead, we just initialize the stack protector descriptor
class (which involves stashing information/creating the success
mbbb and the failure mbb if we have not created one for this
function yet) and export the guard variable that we are going to
compare.
c. After we finish selecting the basic block, in FinishBasicBlock if
the StackProtectorDescriptor attached to the SelectionDAGBuilder is
initialized, we first find a splice point in the parent basic block
before the terminator and then splice the terminator of said basic
block into the success basic block. Then we code-gen a new tail for
the parent basic block consisting of the two loads, the comparison,
and finally two branches to the success/failure basic blocks. We
conclude by code-gening the failure basic block if we have not
code-gened it already (all stack protector checks we generate in
the same function, use the same failure basic block).
llvm-svn: 188755
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.
In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.
In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).
llvm-svn: 188728
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did. I've split this out into a
subroutine so that it can be used for other upcoming patches.
I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads. I don't have a testcase for that because
we don't do any interesting scheduling on z yet.
llvm-svn: 188540
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.
For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).
This will be used by the PowerPC backend in a follow-up commit.
llvm-svn: 187926
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, EXTRACT_SUBVECTOR. The default implementation returns
the result from TargetLowering::getPointerTy()
The previous code was using TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets. However,
using TargetLowering::getPointerTy() can be a problem for targets with
pointer sizes that differ across address spaces. On such targets,
when vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that
is a different size than the pointer can lead to inefficient
pointer calculations, (e.g. 64-bit adds for a 32-bit address space).
There is no intended functionality change with this patch.
llvm-svn: 187748
For a testcase like the following:
typedef unsigned long uint64_t;
typedef struct {
uint64_t lo;
uint64_t hi;
} blob128_t;
void add_128_to_128(const blob128_t *in, blob128_t *res) {
asm ("PAND %1, %0" : "+Q"(*res) : "Q"(*in));
}
where we'll fail to allocate the register for the output constraint,
our matching input constraint will not find a register to match,
and could try to search past the end of the current operands array.
On the idea that we'd like to attempt to keep compilation going
to find more errors in the module, change the error cases when
we're visiting inline asm IR to return immediately and avoid
trying to create a node in the DAG. This leaves us with only
a single error message per inline asm instruction, but allows us
to safely keep going in the general case.
llvm-svn: 187470
Change the informal convention of DBG_VALUE machine instructions so that
we can express a register-indirect address with an offset of 0.
The old convention was that a DBG_VALUE is a register-indirect value if
the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE
is register-indirect if the first operand is a register and the second
operand is an immediate. For plain register values the combination reg,
reg is used. MachineInstrBuilder::BuildMI knows how to build the new
DBG_VALUES.
rdar://problem/13658587
llvm-svn: 185966
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
llvm-svn: 185956
This prevents the emission of DAG-generated vreg definitions after a
tail call be dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).
llvm-svn: 185754
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg node weren't guaranteed to be scheduled at the top of the
basic block.
This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.
This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.
A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.
llvm-svn: 185617
Stop using the ISD::EXCEPTIONADDR and ISD::EHSELECTION when lowering
landing pad arguments. These nodes were previously legalized into
CopyFromReg nodes, but that never worked properly because the
CopyFromReg node weren't guaranteed to be scheduled at the top of the
basic block.
This meant the exception pointer and selector registers could be
clobbered before being copied to a virtual register.
This patch copies the two physical registers to virtual registers at
the beginning of the basic block, and lowers the landingpad instruction
directly to two CopyFromReg nodes reading the *virtual* registers. This
is safe because virtual registers don't get clobbered.
A future patch will remove the ISD::EXCEPTIONADDR and ISD::EHSELECTION
nodes.
llvm-svn: 185595
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
llvm-svn: 185135
value is zero.
This allows optmizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
coded) when optimizations apply.
<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.
llvm-svn: 184222
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indicies, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.
Ideally we could ensure that frame indicies had been eliminated by the
time we reached an assembly or dwarf generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.
llvm-svn: 184066
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite".
This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.
llvm-svn: 182739
Use a field in the SelectionDAGNode object to track its IR ordering.
This adds fields and utility classes without changing existing
interfaces or functionality.
llvm-svn: 182701
report a fatal error. This allows us to continue processing the translation
unit. Test case to come on the clang side because we need an inline asm
diagnostics handler in place.
rdar://13446483
llvm-svn: 180873
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.
rdar://problem/13658587
llvm-svn: 180816
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.
NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook. They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
llvm-svn: 177820
Code generation makes some basic assumptions about the IR it's been given. In
particular, if there is only one 'invoke' in the function, then that invoke
won't be going away. However, with the advent of the `llvm.donothing' intrinsic,
those invokes may go away. If all of them go away, the landing pad no longer has
any users. This confuses the back-end, which asserts.
This happens with SjLj exceptions, because that's the model that modifies the IR
based on there being invokes, etc. in the function.
Remove any invokes of `llvm.donothing' during SjLj EH preparation. This will
give us a CFG that the back-end won't be confused about. If all of the invokes
in a function are removed, then the SjLj EH prepare pass won't insert the bogus
code the relies upon the invokes being there.
<rdar://problem/13228754&13316637>
llvm-svn: 176677
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
but TLI.getShiftAmountTy() so far only return scalar type. As a
result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
return target-specificed scalar type or the same vector type as the
1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
type.
llvm-svn: 176364
SelectionDAGIsel::LowerArguments needs a function, not a basic block. So it
makes sense to pass it the function instead of extracting a basic-block from
the function and then tossing it. This is also more self-documenting (functions
have arguments, BBs don't).
In addition, added comments to a couple of Select* methods.
llvm-svn: 176305
memory intrinsics in the SDAG builder.
When alignment is zero, the lang ref says that *no* alignment
assumptions can be made. This is the exact opposite of the internal API
contracts of the DAG where alignment 0 indicates that the alignment can
be made to be anything desired.
There is another, more explicit alignment that is better suited for the
role of "no alignment at all": an alignment of 1. Map the intrinsic
alignment to this early so that we don't end up generating aligned DAGs.
It is really terrifying that we've never seen this before, but we
suddenly started generating a large number of alignment 0 memcpys due to
the new code to do memcpy-based copying of POD class members. That patch
contains a bug that rounds bitfield alignments down when they are the
first field. This can in turn produce zero alignments.
This fixes weird crashes I've seen in library users of LLVM on 32-bit
hosts, etc.
llvm-svn: 176022
Aside from the question of whether we report a warning or an error when we
can't satisfy a requested stack object alignment, the current implementation
of this is not good. We're not providing any source location in the diagnostics
and the current warning is not connected to any warning group so you can't
control it. We could improve the source location somewhat, but we can do a
much better job if this check is implemented in the front-end, so let's do that
instead. <rdar://problem/13127907>
llvm-svn: 174741
Previously we tried to infer it from the bit width size, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.
llvm-svn: 173138
- recognize string "{memory}" in the MI generation
- mark as mayload/maystore when there's a memory clobber constraint.
PR14859.
Patch by Krzysztof Parzyszek
llvm-svn: 172228
requirement when creating stack objects in MachineFrameInfo.
Add CreateStackObjectWithMinAlign to throw error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. Same is true for CreateVariableSizedObject.
Will not emit error in CreateSpillStackObject or CreateStackObject.
As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.
rdar://12713765
llvm-svn: 172027
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
This is the second attempt. In the first attempt (r169837), a few
getSimpleVT() were hoisted too far, detected by bootstrap failures.
llvm-svn: 170104
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
llvm-svn: 169837
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
llvm-svn: 169536
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
the MachineInstr MayLoad/MayLoad flags are based on the tablegen implementation.
For inline assembly, however, we need to compute these based on the constraints.
Revert r166929 as this is no longer needed, but leave the test case in place.
rdar://12033048 and PR13504
llvm-svn: 167040
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().
The X86 backend is already able to handle debugtrap(). This patch is to:
1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
make the __builtin_debugtrap() "available" to all existing ports without the hassle of
changing their code.
3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
__builtin_trap() will be expanded into the function call of the specified trap function.
This behavior may need change in the future.
The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86.
llvm-svn: 166300
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.
Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.
Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values.
llvm-svn: 165616
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>
llvm-svn: 164662
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>
llvm-svn: 164657
Provide interface in TargetLowering to set or get the minimum number of basic
blocks whereby jump tables are generated for switch statements rather than an
if sequence.
getMinimumJumpTableEntries() defaults to 4.
setMinimumJumpTableEntries() allows target configuration.
This patch changes the default for the Hexagon architecture to 5
as it improves performance on some benchmarks.
llvm-svn: 164628
the case of multiple edges from one block to another.
A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.
Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.
llvm-svn: 162572
SelectionDAG's 'init' has not been called when the SelectionDAGBuilder is
constructed (in SelectionDAGISel's constructor), so this was previously always
initialized with 0.
llvm-svn: 162333
IR that hasn't been through SimplifyCFG can look like this:
br i1 %b, label %r, label %r
Make sure we don't create duplicate Machine CFG edges in this case.
Fix the machine code verifier to accept conditional branches with a
single CFG edge.
llvm-svn: 162230
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.
llvm-svn: 161282
The previous change caused fast isel to not attempt handling any calls to
builtin functions. That included things like "printf" and caused some
noticable regressions in compile time. I wanted to avoid having fast isel
keep a separate list of functions that had to be kept in sync with what the
code in SelectionDAGBuilder.cpp was handling. I've resolved that here by
moving the list into TargetLibraryInfo. This is somewhat redundant in
SelectionDAGBuilder but it will ensure that we keep things consistent.
llvm-svn: 161263
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin. I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.
llvm-svn: 161262
IntegersSubsetMapping
- Replaced type of Items field from std::list with std::map. In neares future I'll test it with DenseMap and do the correspond replacement
if possible.
llvm-svn: 159703
IntegersSubsetMapping
- Replaced type of Items field from std::list with std::map. In neares future I'll test it with DenseMap and do the correspond replacement
if possible.
llvm-svn: 159659
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
llvm-svn: 159312
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).
This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.
Fast mode - allows formation of fused FP ops whenever they're profitable.
Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.
Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.
Note: This option only controls formation of fused ops by the optimizers. Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
llvm-svn: 158956
expression (a * b + c) that can be implemented as a fused multiply-add (fma)
if the target determines that this will be more efficient. This intrinsic
will be used to implement FP_CONTRACT support and an aggressive FMA formation
mode.
If your target has a fast FMA instruction you should override the
isFMAFasterThanMulAndAdd method in TargetLowering to return true.
llvm-svn: 158014
IntegersSubsetGeneric, IntegersSubsetMapping: added IntTy template parameter, that allows use either APInt or IntItem. This change allows to write unittest for these classes.
llvm-svn: 157880
IntegersSubset devided into IntegersSubsetGeneric and into IntegersSubset itself. The first has no references to ConstantInt and works with IntItem only.
IntegersSubsetMapping also made generic. Here added second template parameter "IntegersSubsetTy" that allows to use on of two IntegersSubset types described below.
llvm-svn: 157815
Implemented IntItem - the wrapper around APInt. Why not to use APInt item directly right now?
1. It will very difficult to implement case ranges as series of small patches. We got several large and heavy patches. Each patch will about 90-120 kb. If you replace ConstantInt with APInt in SwitchInst you will need to changes at the same time all Readers,Writers and absolutely all passes that uses SwitchInst.
2. We can implement APInt pool inside and save memory space. E.g. we use several switches that works with 256 bit items (switch on signatures, or strings). We can avoid value duplicates in this case.
3. IntItem can be easyly easily replaced with APInt.
4. Currenly we can interpret IntItem both as ConstantInt and as APInt. It allows to provide SwitchInst methods that works with ConstantInt for non-updated passes.
Why I need it right now? Currently I need to update SimplifyCFG pass (EqualityComparisons). I need to work with APInts directly a lot, so peaces of code
ConstantInt *V = ...;
if (V->getValue().ugt(AnotherV->getValue()) {
...
}
will look awful. Much more better this way:
IntItem V = ConstantIntVal->getValue();
if (AnotherV < V) {
}
Of course any reviews are welcome.
P.S.: I'm also going to rename ConstantRangesSet to IntegersSubset, and CRSBuilder to IntegersSubsetMapping (allows to map individual subsets of integers to the BasicBlocks).
Since in future these classes will founded on APInt, it will possible to use them in more generic ways.
llvm-svn: 157576
SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move
the most likely condition to the front so it is checked first and the others can
be skipped. This is currently not as effective as it could be because SimplifyCFG
destroys profiling metadata when merging branches and switches. Merging branch
weight metadata is tricky though.
This code touches at most 3 cases so I didn't use a proper sorting algorithm.
llvm-svn: 157521
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479
SelectionDAGBuilder::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
llvm-svn: 157046
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
llvm-svn: 153817
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
llvm-svn: 152532
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
llvm-svn: 152297
When the GEP index is a vector of pointers, the code that calculated the size
of the element started from the vector type, and not the contained pointer type.
As a result, instead of looking at the data element pointed by the vector, this
code used the size of the vector. This works for 32bit members (on 32bit
systems), but not for other types. Added code to peel the vector type and
added a test.
llvm-svn: 151626
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
variable declaration as an argument because we want that address
anyhow for our debug information.
This seems to fix rdar://9965111, at least we have more debug
information than before and from reading the assembly it appears
to be the correct location.
llvm-svn: 151335
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
llvm-svn: 149481
Before we'd get:
$ clang t.c
fatal error: error in backend: Invalid operand for inline asm constraint 'i'!
Now we get:
$ clang t.c
t.c:16:5: error: invalid operand for inline asm constraint 'i'!
"movq (%4), %%mm0\n"
^
Which at least gets us the inline asm that is the problem.
llvm-svn: 147502
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
llvm-svn: 145060
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses
llvm-svn: 144100
This code makes different decisions when compiled into x87 instructions
because of different rounding behavior. That caused phase 2/3
miscompares on 32-bit Linux when the phase 1 compiler was built with gcc
(using x87), and the phase 2 compiler was built with clang (using SSE).
This fixes PR11200.
llvm-svn: 143006
This isn't put into the 'clear()' method because the information needs to stick
around (at least for a little bit) after the selection DAG is built.
llvm-svn: 142032
The inline asm operand constraint is initially encoded in the virtual
register for the operand, but that register class may change during
coalescing, and the original constraint is lost.
Encode the original register class as part of the flag word for each
inline asm operand. This makes it possible to recover the actual
constraint required by inline asm, just like we can for normal
instructions.
llvm-svn: 141833
This intrinsic is used to pass the index of the function context to the back-end
for further processing. The back-end is in charge of filling in the rest of the
entries.
llvm-svn: 140676
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
llvm-svn: 138621
The landingpad instruction is lowered into the EXCEPTIONADDR and EHSELECTION
SDNodes. The information from the landingpad instruction is harvested by the
'AddLandingPadInfo' function. The new EH uses the current EH scheme in the
back-end. This will change once we switch over to the new scheme. (Reviewed by
Jakob!)
llvm-svn: 137880
This generates the SDNodes for the new exception handling scheme. It takes the
two values coming from the landingpad instruction and assigns them to the
EXCEPTIONADDR and EHSELECTION nodes.
llvm-svn: 137873
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
This generates the correct SDNodes for the landingpad instruction. It makes an
assumption that the result of the landingpad instruction has at least two
values. And that the first value is a pointer to the exception object and the
second value is the "selector."
llvm-svn: 136430
'atomicrmw' instructions, which allow representing all the current atomic
rmw intrinsics.
The allowed operands for these instructions are heavily restricted at the
moment; we can probably loosen it a bit, but supporting general
first-class types (where it makes sense) might get a bit complicated,
given how SelectionDAG works.
As an initial cut, these operations do not support specifying an alignment,
but it would be possible to add if we think it's useful. Specifying an
alignment lower than the natural alignment would be essentially
impossible to support on anything other than x86, but specifying a greater
alignment would be possible. I can't think of any useful optimizations which
would use that information, but maybe someone else has ideas.
Optimizer/codegen support coming soon.
llvm-svn: 136404
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
llvm-svn: 135180
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.
struct foo { char x[24]; };
long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
movl 4(%esp), %eax
subl 8(%esp), %eax
sarl $3, %eax
imull $-1431655765, %eax, %eax
instead of
movl 4(%esp), %eax
subl 8(%esp), %eax
movl $715827883, %ecx
imull %ecx
movl %edx, %eax
shrl $31, %eax
sarl $2, %edx
addl %eax, %edx
movl %edx, %eax
llvm-svn: 134695
Both become <earlyclobber> defs on the INLINEASM MachineInstr, but we
now use two different asm operand kinds.
The new Kind_Clobber is treated identically to the old
Kind_RegDefEarlyClobber for now, but x87 floating point stack inline
assembly does care about the difference.
This will pop a register off the stack:
asm("fstp %st" : : "t"(x) : "st");
While this will pop the input and push an output:
asm("fst %st" : "=&t"(r) : "t"(x));
We need to know if ST0 was a clobber or an output operand, and we can't
depend on <dead> flags for that.
llvm-svn: 133902
BranchProbabilityInfo (expect setEdgeWeight which is not available here).
Branch Weights are kept in MachineBasicBlocks. To turn off this analysis
set -use-mbpi=false.
llvm-svn: 133184
This virtual function will replace allocation_order_begin/end as the one
to override when implementing custom allocation orders. It is simpler to
have one function return an ArrayRef than having two virtual functions
computing different ends of the same array.
Use getRawAllocationOrder() in place of allocation_order_begin() where
it makes sense, but leave some clients that look like they really want
the filtered allocation orders from RegisterClassInfo.
llvm-svn: 133170
Instead of scalarizing, and doing an element-by-element truncat, use vector
truncate.
Add support for scalarization of vectors: i8 -> <1 x i1> (from Duncan's
testcase).
llvm-svn: 132892
intrinsic call. This prevents it from being reordered so that it appears
*before* the setjmp intrinsic (thus making it completely useless).
<rdar://problem/9409683>
llvm-svn: 131174
It needed to be moved closer to the setjmp statement, because the code directly
after the setjmp needs to know about values that are on the stack. Also, the
'bitcast' of the function context was causing a dead load. This wouldn't be too
horrible, except that at -O0 it wasn't optimized out, and because it wasn't
using the correct base pointer (if there is a VLA), it would try to access a
value from a garbage address.
<rdar://problem/9130540>
llvm-svn: 128873
It couldn't be used outside of the file because SDISelAsmOperandInfo
is local to SelectionDAGBuilder.cpp. Making it a static function avoids
a weird linkage dance.
llvm-svn: 128342
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766
generating i8 shift amounts for things like i1024 types. Add
an assert in getNode to prevent this from occuring in the future,
fix the buggy transformation, revert my previous patch, and
document this gotcha in ISDOpcodes.h
llvm-svn: 125465
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.
This allows memory instructions to be moved around INLINEASM instructions.
llvm-svn: 123044
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
llvm-svn: 121120
catastrophic compilation time in the event of unreasonable LLVM
IR. Code quality is a separate issue--someone upstream needs to do a
better job of reducing to llvm.memcpy. If the situation can be reproduced with
any supported frontend, then it will be a separate bug.
llvm-svn: 118904
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
using the calling convention stuff, I chose to do a mainly
low impact change instead.
llvm-svn: 118167