Do not export __llvm_profile_counter_bias when profiling is enabled
because this symbol is hidden and cannot be exported.
Should fix this bot error:
```
URL: http://green.lab.llvm.org/green/job/clang-stage1-RA/5678/consoleFull
Problem: Command Output (stdout):
--
ld: warning: cannot export hidden symbol ___llvm_profile_counter_bias
from
/Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/lib/clang/11.0.0/lib/darwin/libclang_rt.profile_osx.a(InstrProfilingBiasVar.c.o)
ld: warning: cannot export hidden symbol ___llvm_profile_counter_bias
from
/Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/lib/clang/11.0.0/lib/darwin/libclang_rt.profile_osx.a(InstrProfilingBiasVar.c.o)
```
The issue was reported by @xazax.hun here: https://reviews.llvm.org/D69825#1827826
"This patch (D69825) breaks scan-build-py which parses the output of "-###" to get -cc1 command. There might be other tools with the same problems. Could we either remove (in-process) from CC1Command::Print or add a line break?
Having the last line as a valid invocation is valuable and there might be tools relying on that."
Differential Revision: https://reviews.llvm.org/D72982
This change replaces the manual building of executable paths
using llvm::sys::path::append with GetProgramPath.
This enables adding other paths in case executables reside
in different directories and makes the code easier to read.
Differential Revision: https://reviews.llvm.org/D72903
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.
This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.
The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).
Differential Revision: https://reviews.llvm.org/D69740
Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit. For use cases
that only care about one or the other this will save significant overhead
and code size.
Differential Revision: https://reviews.llvm.org/D72890
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365
Flags are clang's default UI is flags.
We can have an env var in addition to that, but in D69825 nobody has yet
mentioned why this needs an env var, so omit it for now. If someone
needs to set the flag via env var, the existing CCC_OVERRIDE_OPTIONS
mechanism works for it (set CCC_OVERRIDE_OPTIONS=+-fno-integrated-cc1
for example).
Also mention the cc1-in-process change in the release notes.
Also spruce up the test a bit so it actually tests something :)
Differential Revision: https://reviews.llvm.org/D72769
These driver options perform some checking and delegate to MC options -x86-align-branch* and -x86-branches-within-32B-boundaries.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D72463
The option will limit debug info by only emitting complete class
type information when its constructor is emitted.
This patch changes comparisons with LimitedDebugInfo to use the new
level instead.
Differential Revision: https://reviews.llvm.org/D72427
With this patch, the clang tool will now call the -cc1 invocation directly inside the same process. Previously, the -cc1 invocation was creating, and waiting for, a new process.
This patch therefore reduces the number of created processes during a build, thus it reduces build times on platforms where process creation can be costly (Windows) and/or impacted by a antivirus.
It also makes debugging a bit easier, as there's no need to attach to the secondary -cc1 process anymore, breakpoints will be hit inside the same process.
Crashes or signaling inside the -cc1 invocation will have the same side-effect as before, and will be reported through the same means.
This behavior can be controlled at compile-time through the CLANG_SPAWN_CC1 cmake flag, which defaults to OFF. Setting it to ON will revert to the previous behavior, where any -cc1 invocation will create/fork a secondary process.
At run-time, it is also possible to tweak the CLANG_SPAWN_CC1 environment variable. Setting it and will override the compile-time setting. A value of 0 calls -cc1 inside the calling process; a value of 1 will create a secondary process, as before.
Differential Revision: https://reviews.llvm.org/D69825
which is the default TLS model for non-PIC objects. This allows large/
many thread local variables or a compact/fast code in an executable.
Specification is same as that of GCC. For example, the code model
option precedes the TLS size option.
TLS access models other than local-exec are not changed. It means
supoort of the large code model is only in the local exec TLS model.
Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewd By: peter.smith
Committed by: peter.smith
Differential Revision: https://reviews.llvm.org/D71688
All 130+ f_Group flags that take an argument allow it after a '=',
except for fdebug-complation-dir. Add a Joined<> alias so that
it behaves consistently with all the other f_Group flags.
(Keep the old Separate flag for backwards compat.)
Follow-up of D72014. It is more appropriate to use a target
feature instead of a SubTypeArch to express the difference.
Reviewed By: #powerpc, jhibbits
Differential Revision: https://reviews.llvm.org/D72433
In the backend, this feature is implemented with the function attribute
"patchable-function-entry". Both the attribute and XRay use
TargetOpcode::PATCHABLE_FUNCTION_ENTER, so the two features are
incompatible.
Reviewed By: ostannard, MaskRay
Differential Revision: https://reviews.llvm.org/D72222
Architecturally, it's allowed to have MVE-I without an FPU, thus
-mfpu=none should not disable MVE-I, or moves to/from FP-registers.
This patch removes `+/-fpregs` from features unconditionally added to
target feature list, depending on FPU and moves the logic to Clang
driver, where the negative form (`-fpregs`) is conditionally added to
the target features list for the cases of `-mfloat-abi=soft`, or
`-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative
form is added by the driver, the positive one is derived from other
features in the backend.
Differential Revision: https://reviews.llvm.org/D71843
Apple's CPUs are called A7-A13 in official communication, occasionally with
weird suffixes which we probably don't need to care about. This adds each one
and describes its features. It also switches the default CPU to the canonical
name for Cyclone, but leaves legacy support in so that existing bitcode still
compiles.
According to D53384, the default was switched from -fno-PIC to -fPIC to
work around a -fsanitize=leak bug on big-endian.
This gratuitous difference between little-endian and big-endian is
undesired, and not acceptable on powerpc64-unknown-freebsd. If
-fsanitize=leak still has the problem, we should consider defaulting to
-fPIC/-fPIE only when -fsanitize=leak is specified (see SanitizerArgs::requiresPIE())
powerpc64-ibm-aix is unaffected: it still defaults to -fPIC.
powerpc64-linux-musl is unaffected (-fPIE since D39588): it still defaults to -fPIE.
Reviewed By: #powerpc, jhibbits
Differential Revision: https://reviews.llvm.org/D72363
Summary:
Every powerpc64le platform uses elfv2.
For powerpc64, the environments "elfv1" and "elfv2" were added for
FreeBSD ELFv1->ELFv2 migration in D61950. FreeBSD developers have
decided to use OS versions to select ABI, and no one is relying on the
environments.
Also use elfv2 on powerpc64-linux-musl.
Users can always use -mabi=elfv1 and -mabi=elfv2 to override the default
ABI.
Reviewed By: adalava
Differential Revision: https://reviews.llvm.org/D72352
Linux' current addLibCxxIncludePaths and addLibStdCxxIncludePaths
are actually almost non-Linux-specific at all, and can be reused
almost as such for all gcc toolchains. Only keep
Android/Freescale/Cray hacks in Linux's version.
Patch by sthibaul (Samuel Thibault)
Differential Revision: https://reviews.llvm.org/D69758
-mpacked-stack is currently not supported with -mbackchain, so this should
result in a compilation error message instead of being silently ignored.
Review: Ulrich Weigand
gcc/config/{i386,s390} support -mnop-mcount. We currently only support
-mnop-mcount for SystemZ. The function attribute "mnop-mcount" is
ignored on other targets.
gcc/config/{i386,s390} support -mfentry. We currently only support
-mfentry for X86 and SystemZ. TargetOpcode::FENTRY_CALL is not handled
on other targets.
% clang -target aarch64 -pg -mfentry a.c -c
fatal error: error in backend: Not supported instr: <MCInst 21>
-mfentry, -mrecord-mcount, and -mnop-mcount were invented for Linux
ftrace. Linux uses $(call cc-option-yn,-mrecord-mcount) to detect if the
specific feature is available. Reject unsupported features so that Linux
build system will not wrongly consider them available and cause
build/runtime failures.
Note, GCC has stricter checks that we do not implement, e.g. -fpic/-fpie
-fnop-mcount is not allowed on x86, -fpic/-fpie -mfentry is not allowed
on x86-32.
GCC's x86 and s390 ports support -mrecord-mcount. Other ports reject the
option.
aarch64-linux-gnu-gcc: error: unrecognized command line option ‘-mrecord-mcount’
Allowing this option can cause failures when building Linux kernel for
aarch64, powerpc64, etc, which will think the feature is available if
the clang command returns 0.
Method '-[NSCoder decodeValueOfObjCType:at:]' is not only deprecated
but also a security hazard, hence a loud check.
Differential Revision: https://reviews.llvm.org/D71728
Recognize -mrecord-mcount from the command line and add a function attribute
"mrecord-mcount" when passed.
Only valid on SystemZ (when used with -mfentry).
Review: Ulrich Weigand
https://reviews.llvm.org/D71627
Summary: Per D62731, the behavior of clang with `-frounding-math` is no worse than when the rounding flag was completely ignored, so remove this unnecessary warning.
Reviewers: mibintc, chandlerc, echristo, rjmccall, kpn, erichkeane, rsmith, andrew.w.kaylor
Reviewed By: mibintc
Subscribers: merge_guards_bot, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71671
On Darwin, when used for generating a linked binary from a source file
(through an intermediate object file), the driver will invoke `cc1` to
generate a temporary object file. The temporary remark file will now be
emitted next to the object file, which will then be picked up by
`dsymutil` and emitted in the .dSYM bundle.
This is available for all formats except YAML since by default, YAML
doesn't need a section and the remark file will be lost.
When clang is invoked with a source file without -c or -S, it creates a
cc1 job, a linker job and if debug info is requested, a dsymutil job. In
case of remarks, we should also create a dsymutil job to avoid losing
the remarks that will be generated in a tempdir that gets removed.
Differential Revision: https://reviews.llvm.org/D71675