mirror of
https://github.com/Gericom/teak-llvm.git
synced 2025-06-20 20:15:49 -04:00

This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF.

What is BTF?
============

First, BPF is a Linux kernel virtual machine and is widely used for tracing, networking and security.
  https://www.kernel.org/doc/Documentation/networking/filter.txt
  https://cilium.readthedocs.io/en/v1.2/bpf/

BTF is the debug info format for BPF, introduced in the Linux patch
  69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
in the patch set mentioned in the LWN article below.
  https://lwn.net/Articles/752047/

The BTF format is specified in the above GitHub commit. In summary, its layout looks like
  struct btf_header
  type subsection (a list of types)
  string subsection (a list of strings)

With such information, the kernel and user space are able to pretty print a particular bpf map key/value. One possible example:
  Without BTF:
    key: [ 0x01, 0x01, 0x00, 0x00 ]
  With BTF:
    key: struct t { a : 1; b : 1; c : 0}
where the struct is defined as
  struct t { char a; char b; short c; };

How is BTF generated?
=====================

Currently, BTF is generated through pahole.
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69
and is available in pahole v1.12
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4

Basically, the bpf program needs to be compiled with -g so that dwarf sections are generated. pahole is enhanced such that a .BTF section can be generated from the dwarf. The format of the .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel.

8a138aed4a
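The map-key example above can be reproduced by hand. The sketch below hard-codes `struct t` and copies the four key bytes into it with `std::memcpy`. A real BTF-aware printer would instead discover member names, offsets and sizes from the .BTF type subsection, which this sketch does not parse. The function name `prettyPrintKey` is illustrative only.

```cpp
#include <cstring>
#include <string>

// The struct from the example above: 1 + 1 + 2 bytes, no padding on common ABIs.
struct t {
  char a;
  char b;
  short c;
};

// Interprets raw map-key bytes as struct t and formats it like the
// "With BTF" line. This is a hand-written sketch, not a BTF decoder.
std::string prettyPrintKey(const unsigned char (&Key)[4]) {
  static_assert(sizeof(t) == 4, "unexpected padding in struct t");
  t Value;
  std::memcpy(&Value, Key, sizeof(Value));
  // char and short promote to int, so std::to_string(int) is selected.
  return "struct t { a : " + std::to_string(Value.a) +
         "; b : " + std::to_string(Value.b) +
         "; c : " + std::to_string(Value.c) + "}";
}
```

Without the type information, the loader could only print the raw byte array, which is the "Without BTF" line.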
The .BTF section layout is also specified in this patch, in the file include/llvm/BinaryFormat/BTF.h.

What use cases does this patch try to address?
==============================================

Currently, only the bpf instruction stream is required to be passed to the kernel. The kernel verifies it, JITs it if configured to do so, attaches it to a particular kernel attachment point, and later executes it when a particular event happens.

This patch tries to expand BTF to support the two use cases below:
  (1). BPF supports subroutine calls. During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This requires passing a unique identifier for each subroutine to the kernel; the subroutine name is a natural choice.
  (2). If a particular jitted instruction is hot, we want the user to know which source line this jitted instruction belongs to. This requires the source information to be available to various profiling tools.

Note that in a single ELF file,
  . there may be multiple loadable bpf programs,
  . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections; the bpf loader needs to merge them into a single consecutive insn stream before loading it into the kernel.
For example:
  section .text: subroutines funcFoo
  section _progA: calling funcFoo
  section _progB: calling funcFoo
The bpf loader could construct two loadable bpf instruction streams and load them into the kernel:
  . _progA funcFoo
  . _progB funcFoo
So per-ELF-section function offsets and instruction offsets need to be adjusted before being passed to the kernel, and the kernel essentially expects only one code section, regardless of how many there are in the ELF file.

What do we propose and why?
===========================

To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file, which is the input of the bpf loader.
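The offset adjustment described above can be sketched as follows. The `FuncRecord` and `CodeSection` types are hypothetical and heavily simplified, and a real loader would also have to fix up the call instructions themselves. The point is only that every per-section byte offset gets rebased by the size of the code placed before that section in the merged stream.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-section record: a function starting at a byte offset
// within its own ELF section (in the spirit of func_info).
struct FuncRecord {
  std::string Name;
  uint32_t InsnOff; // byte offset relative to the start of its section
};

struct CodeSection {
  std::string Name;
  uint32_t SizeInBytes;
  std::vector<FuncRecord> Funcs;
};

// Concatenates sections in order (e.g. _progA followed by .text) and rebases
// each record so that it is relative to the start of the merged stream.
std::vector<FuncRecord> mergeSections(const std::vector<CodeSection> &Parts) {
  std::vector<FuncRecord> Merged;
  uint32_t Base = 0; // bytes already placed in the merged stream
  for (const CodeSection &S : Parts) {
    for (const FuncRecord &F : S.Funcs)
      Merged.push_back({F.Name, Base + F.InsnOff});
    Base += S.SizeInBytes;
  }
  return Merged;
}
```

Because funcFoo is appended after a different program each time, it ends up at a different offset in the _progA stream than in the _progB stream, which is why the adjustment cannot be done once at compile time.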
A separate section is preferred since the loader may need to manipulate it before loading part of its data into the kernel.

The .BTF.ext section has a header similar to that of the .BTF section, and it contains two subsections, for func_info and line_info.
  . func_info maps a func insn byte offset to a func type in the .BTF type subsection.
  . line_info maps an insn byte offset to line info.
  . both the func_info and line_info subsections are organized by ELF PROGBITS AX sections.

pahole is not a good place to implement .BTF.ext, as pahole is mostly for structure hole information, and more importantly, we want to pass the actual code to the kernel.
  . A bpf program is typically small, so the storage overhead should be small.
  . In bpf land, it is entirely possible that an application loads a bpf program into the kernel and then quits, so holding the debug info in the user space application is not practical, as you may not even know who loaded the bpf program.
  . Having the source code kept directly by the kernel eases deployment, since the original source code does not need to be shipped to every host, and the kernel-devel package does not need to be deployed even if kernel headers are used.

LLVM is a good place to implement it.
  . The only reliable time to get the source code is at compilation time. This results in both more accurate information and easier deployment, as stated above.
  . Another consideration is JIT. Projects like bcc (https://github.com/iovisor/bcc) use MCJIT to compile a C program into bpf insns and load them into the kernel. The LLVM-generated BTF sections will be readily available for such cases as well.

Design and implementation of emitting .BTF/.BTF.ext sections
============================================================

The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified.
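To make the line_info use case concrete, here is a sketch of how a profiler might map a hot instruction back to a source line. It looks up the last record whose instruction byte offset does not exceed the given offset. `LineRecord` is a hypothetical, simplified record; the actual encoding is defined in include/llvm/BinaryFormat/BTF.h and refers to strings by string-table offset rather than storing plain values.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical, simplified line_info record for one code section.
struct LineRecord {
  uint32_t InsnOff; // instruction byte offset within the section
  uint32_t Line;    // source line number
};

// Returns the line of the last record with InsnOff <= ByteOff.
// Records must be sorted by InsnOff.
std::optional<uint32_t> lineForInsn(const std::vector<LineRecord> &Records,
                                    uint32_t ByteOff) {
  // First record that starts strictly after ByteOff.
  auto It = std::upper_bound(
      Records.begin(), Records.end(), ByteOff,
      [](uint32_t Off, const LineRecord &R) { return Off < R.InsnOff; });
  if (It == Records.begin())
    return std::nullopt; // offset precedes the first record
  return std::prev(It)->Line;
}
```

A binary search is a natural fit here because the compiler emits records in instruction order, so each section's records are already sorted.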
Note that dwarf sections are still generated, as dwarf is used by user space tools such as llvm-objdump for the BPF target.

This patch also contains tests that verify the generated .BTF and .BTF.ext sections for all supported types and for the func_info and line_info subsections. The patch has also been tested against the Linux kernel bpf sample tests and selftests.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D53736

llvm-svn: 347999
325 lines
9.8 KiB
C++
//===-- llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp -------*- C++ -*--===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// Common functionality for different debug information format backends.
// LLVM currently supports DWARF, CodeView and BTF.
//
//===----------------------------------------------------------------------===//

#include "DebugHandlerBase.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/Twine.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/MC/MCStreamer.h"

using namespace llvm;

#define DEBUG_TYPE "dwarfdebug"

Optional<DbgVariableLocation>
DbgVariableLocation::extractFromMachineInstruction(
    const MachineInstr &Instruction) {
  DbgVariableLocation Location;
  if (!Instruction.isDebugValue())
    return None;
  if (!Instruction.getOperand(0).isReg())
    return None;
  Location.Register = Instruction.getOperand(0).getReg();
  Location.FragmentInfo.reset();
  // We only handle expressions generated by DIExpression::appendOffset,
  // which doesn't require a full stack machine.
  int64_t Offset = 0;
  const DIExpression *DIExpr = Instruction.getDebugExpression();
  auto Op = DIExpr->expr_op_begin();
  while (Op != DIExpr->expr_op_end()) {
    switch (Op->getOp()) {
    case dwarf::DW_OP_constu: {
      int Value = Op->getArg(0);
      ++Op;
      if (Op != DIExpr->expr_op_end()) {
        switch (Op->getOp()) {
        case dwarf::DW_OP_minus:
          Offset -= Value;
          break;
        case dwarf::DW_OP_plus:
          Offset += Value;
          break;
        default:
          continue;
        }
      }
    } break;
    case dwarf::DW_OP_plus_uconst:
      Offset += Op->getArg(0);
      break;
    case dwarf::DW_OP_LLVM_fragment:
      Location.FragmentInfo = {Op->getArg(1), Op->getArg(0)};
      break;
    case dwarf::DW_OP_deref:
      Location.LoadChain.push_back(Offset);
      Offset = 0;
      break;
    default:
      return None;
    }
    ++Op;
  }

  // Do one final implicit DW_OP_deref if this was an indirect DBG_VALUE
  // instruction.
  // FIXME: Replace these with DIExpression.
  if (Instruction.isIndirectDebugValue())
    Location.LoadChain.push_back(Offset);

  return Location;
}

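The loop above only understands the offset-style expressions produced by `DIExpression::appendOffset`. The standalone sketch below models that folding on a hypothetical, simplified opcode list; `Opc`, `ExprOp` and `foldLoadChain` are illustrative and are not LLVM types. Constant offsets accumulate, and each `DW_OP_deref` records the pending offset in the load chain and resets it. The sketch also stops cleanly if a `constu` is the last operation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the DWARF operations handled above.
enum class Opc { Constu, Plus, Minus, PlusUconst, Deref, Other };

struct ExprOp {
  Opc Kind;
  uint64_t Arg = 0; // operand of Constu / PlusUconst; unused otherwise
};

// Each load-chain entry is the offset added to the address before one
// dereference. Returns std::nullopt for operations the model rejects.
std::optional<std::vector<int64_t>>
foldLoadChain(const std::vector<ExprOp> &Ops, bool IsIndirect) {
  std::vector<int64_t> LoadChain;
  int64_t Offset = 0;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    switch (Ops[I].Kind) {
    case Opc::Constu: {
      const int64_t Value = static_cast<int64_t>(Ops[I].Arg);
      // Only "constu N, plus" and "constu N, minus" change the offset.
      if (I + 1 < Ops.size() && Ops[I + 1].Kind == Opc::Plus) {
        Offset += Value;
        ++I; // consume the operator
      } else if (I + 1 < Ops.size() && Ops[I + 1].Kind == Opc::Minus) {
        Offset -= Value;
        ++I;
      }
      break;
    }
    case Opc::PlusUconst:
      Offset += static_cast<int64_t>(Ops[I].Arg);
      break;
    case Opc::Deref:
      LoadChain.push_back(Offset);
      Offset = 0;
      break;
    default: // a bare Plus/Minus, or anything else
      return std::nullopt;
    }
  }
  // An indirect DBG_VALUE implies one final dereference.
  if (IsIndirect)
    LoadChain.push_back(Offset);
  return LoadChain;
}
```

For example, `plus_uconst 8, deref, constu 4, minus` on an indirect DBG_VALUE folds to the load chain `{8, -4}`.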
DebugHandlerBase::DebugHandlerBase(AsmPrinter *A) : Asm(A), MMI(Asm->MMI) {}

// Each LexicalScope has first instruction and last instruction to mark
// beginning and end of a scope respectively. Create an inverse map that lists
// the scopes that start (and end) with an instruction. One instruction may
// start (or end) multiple scopes. Ignore scopes that are not reachable.
void DebugHandlerBase::identifyScopeMarkers() {
  SmallVector<LexicalScope *, 4> WorkList;
  WorkList.push_back(LScopes.getCurrentFunctionScope());
  while (!WorkList.empty()) {
    LexicalScope *S = WorkList.pop_back_val();

    const SmallVectorImpl<LexicalScope *> &Children = S->getChildren();
    if (!Children.empty())
      WorkList.append(Children.begin(), Children.end());

    if (S->isAbstractScope())
      continue;

    for (const InsnRange &R : S->getRanges()) {
      assert(R.first && "InsnRange does not have first instruction!");
      assert(R.second && "InsnRange does not have second instruction!");
      requestLabelBeforeInsn(R.first);
      requestLabelAfterInsn(R.second);
    }
  }
}

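`identifyScopeMarkers` walks the lexical-scope tree with an explicit worklist rather than recursion. The sketch below reproduces that traversal on a hypothetical `Scope` type (not LLVM's `LexicalScope`), collecting instruction ids in sets instead of calling `requestLabelBeforeInsn`/`requestLabelAfterInsn`. Abstract scopes still have their children queued; they simply contribute no labels of their own.

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified scope tree node.
struct Scope {
  bool IsAbstract = false;
  std::vector<std::pair<int, int>> Ranges; // (first, last) instruction ids
  std::vector<Scope *> Children;
};

struct LabelRequests {
  std::set<int> Before; // instructions needing a label before them
  std::set<int> After;  // instructions needing a label after them
};

// Iterative depth-first walk using a vector as a stack.
LabelRequests collectScopeMarkers(Scope *Root) {
  LabelRequests Req;
  std::vector<Scope *> WorkList{Root};
  while (!WorkList.empty()) {
    Scope *S = WorkList.back();
    WorkList.pop_back();
    // Children are queued even when S itself is abstract.
    WorkList.insert(WorkList.end(), S->Children.begin(), S->Children.end());
    if (S->IsAbstract)
      continue;
    for (const auto &R : S->Ranges) {
      Req.Before.insert(R.first);
      Req.After.insert(R.second);
    }
  }
  return Req;
}
```

Using sets mirrors the fact that one instruction may start or end several scopes but needs only one label.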
// Return Label preceding the instruction.
MCSymbol *DebugHandlerBase::getLabelBeforeInsn(const MachineInstr *MI) {
  MCSymbol *Label = LabelsBeforeInsn.lookup(MI);
  assert(Label && "Didn't insert label before instruction");
  return Label;
}

// Return Label immediately following the instruction.
MCSymbol *DebugHandlerBase::getLabelAfterInsn(const MachineInstr *MI) {
  return LabelsAfterInsn.lookup(MI);
}

// Return the function-local offset of an instruction.
const MCExpr *
DebugHandlerBase::getFunctionLocalOffsetAfterInsn(const MachineInstr *MI) {
  MCContext &MC = Asm->OutContext;

  MCSymbol *Start = Asm->getFunctionBegin();
  const auto *StartRef = MCSymbolRefExpr::create(Start, MC);

  MCSymbol *AfterInsn = getLabelAfterInsn(MI);
  assert(AfterInsn && "Expected label after instruction");
  const auto *AfterRef = MCSymbolRefExpr::create(AfterInsn, MC);

  return MCBinaryExpr::createSub(AfterRef, StartRef, MC);
}

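`getFunctionLocalOffsetAfterInsn` does not compute a number. It builds the symbolic expression "label after MI minus function begin", which the assembler resolves after layout. For illustration only, the sketch below evaluates that difference for a hypothetical list of instruction sizes. The sizes are an assumption of the example; on BPF, instructions are 8 bytes each, except the 64-bit immediate load, which occupies 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Evaluates "label after instruction Idx" minus "function begin" for a
// hypothetical function placed at address FunctionBegin, given the encoded
// size in bytes of each instruction. Idx must be a valid index.
uint64_t functionLocalOffsetAfter(uint64_t FunctionBegin,
                                  const std::vector<uint64_t> &InsnSizes,
                                  std::size_t Idx) {
  // Address of the temporary label emitted right after instruction Idx.
  uint64_t AfterLabel = FunctionBegin;
  for (std::size_t I = 0; I <= Idx; ++I)
    AfterLabel += InsnSizes[I];
  // The same difference that MCBinaryExpr::createSub(AfterRef, StartRef, MC)
  // describes symbolically.
  return AfterLabel - FunctionBegin;
}
```

Keeping the value symbolic lets the debug-info emitter reference offsets before the final encoding of every instruction is known.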
/// If this type is derived from a base type then return base type size.
uint64_t DebugHandlerBase::getBaseTypeSize(const DITypeRef TyRef) {
  DIType *Ty = TyRef.resolve();
  assert(Ty);
  DIDerivedType *DDTy = dyn_cast<DIDerivedType>(Ty);
  if (!DDTy)
    return Ty->getSizeInBits();

  unsigned Tag = DDTy->getTag();

  if (Tag != dwarf::DW_TAG_member && Tag != dwarf::DW_TAG_typedef &&
      Tag != dwarf::DW_TAG_const_type && Tag != dwarf::DW_TAG_volatile_type &&
      Tag != dwarf::DW_TAG_restrict_type && Tag != dwarf::DW_TAG_atomic_type)
    return DDTy->getSizeInBits();

  DIType *BaseType = DDTy->getBaseType().resolve();

  if (!BaseType)
    return 0;

  // If this is a derived type, go ahead and get the base type, unless it's a
  // reference then it's just the size of the field. Pointer types have no need
  // of this since they're a different type of qualification on the type.
  if (BaseType->getTag() == dwarf::DW_TAG_reference_type ||
      BaseType->getTag() == dwarf::DW_TAG_rvalue_reference_type)
    return Ty->getSizeInBits();

  return getBaseTypeSize(BaseType);
}

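`getBaseTypeSize` looks through members, typedefs and cv/restrict/atomic qualifiers to the underlying type, but stops at references. Below is a minimal sketch of the same recursion over a hypothetical, reduced type model: restrict, atomic and rvalue references are omitted, and the `Tag`/`Type` names are illustrative rather than LLVM's.

```cpp
#include <cstdint>

// Hypothetical, reduced type model; tags loosely follow DWARF's.
enum class Tag { Base, Member, Typedef, Const, Volatile, Pointer, Reference };

struct Type {
  Tag Kind;
  uint64_t SizeInBits;            // often 0 for typedefs/qualifiers here
  const Type *BaseType = nullptr; // underlying type, if any
};

uint64_t baseTypeSize(const Type *Ty) {
  switch (Ty->Kind) {
  case Tag::Member:
  case Tag::Typedef:
  case Tag::Const:
  case Tag::Volatile:
    break; // transparent layers: keep walking
  default:
    return Ty->SizeInBits; // base types, pointers, references
  }
  const Type *Base = Ty->BaseType;
  if (!Base)
    return 0; // e.g. "const void"
  // A layer over a reference reports its own size, not the referee's.
  if (Base->Kind == Tag::Reference)
    return Ty->SizeInBits;
  return baseTypeSize(Base);
}
```

So a `typedef const int MyInt` resolves to 32 bits, while a pointer reports its own 64 bits without looking at its pointee.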
static bool hasDebugInfo(const MachineModuleInfo *MMI,
                         const MachineFunction *MF) {
  if (!MMI->hasDebugInfo())
    return false;
  auto *SP = MF->getFunction().getSubprogram();
  if (!SP)
    return false;
  assert(SP->getUnit());
  auto EK = SP->getUnit()->getEmissionKind();
  if (EK == DICompileUnit::NoDebug)
    return false;
  return true;
}

void DebugHandlerBase::beginFunction(const MachineFunction *MF) {
  PrevInstBB = nullptr;

  if (!Asm || !hasDebugInfo(MMI, MF)) {
    skippedNonDebugFunction();
    return;
  }

  // Grab the lexical scopes for the function, if we don't have any of those
  // then we're not going to be able to do anything.
  LScopes.initialize(*MF);
  if (LScopes.empty()) {
    beginFunctionImpl(MF);
    return;
  }

  // Make sure that each lexical scope will have a begin/end label.
  identifyScopeMarkers();

  // Calculate history for local variables.
  assert(DbgValues.empty() && "DbgValues map wasn't cleaned!");
  assert(DbgLabels.empty() && "DbgLabels map wasn't cleaned!");
  calculateDbgEntityHistory(MF, Asm->MF->getSubtarget().getRegisterInfo(),
                            DbgValues, DbgLabels);
  LLVM_DEBUG(DbgValues.dump());

  // Request labels for the full history.
  for (const auto &I : DbgValues) {
    const auto &Ranges = I.second;
    if (Ranges.empty())
      continue;

    // The first mention of a function argument gets the CurrentFnBegin
    // label, so arguments are visible when breaking at function entry.
    const DILocalVariable *DIVar = Ranges.front().first->getDebugVariable();
    if (DIVar->isParameter() &&
        getDISubprogram(DIVar->getScope())->describes(&MF->getFunction())) {
      LabelsBeforeInsn[Ranges.front().first] = Asm->getFunctionBegin();
      if (Ranges.front().first->getDebugExpression()->isFragment()) {
        // Mark all non-overlapping initial fragments.
        for (auto I = Ranges.begin(); I != Ranges.end(); ++I) {
          const DIExpression *Fragment = I->first->getDebugExpression();
          if (std::all_of(Ranges.begin(), I,
                          [&](DbgValueHistoryMap::InstrRange Pred) {
                            return !Fragment->fragmentsOverlap(
                                Pred.first->getDebugExpression());
                          }))
            LabelsBeforeInsn[I->first] = Asm->getFunctionBegin();
          else
            break;
        }
      }
    }

    for (const auto &Range : Ranges) {
      requestLabelBeforeInsn(Range.first);
      if (Range.second)
        requestLabelAfterInsn(Range.second);
    }
  }

  // Ensure there is a symbol before DBG_LABEL.
  for (const auto &I : DbgLabels) {
    const MachineInstr *MI = I.second;
    requestLabelBeforeInsn(MI);
  }

  PrevInstLoc = DebugLoc();
  PrevLabel = Asm->getFunctionBegin();
  beginFunctionImpl(MF);
}

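For a parameter whose first DBG_VALUE describes only a fragment, `beginFunction` gives the function-begin label to the longest prefix of history entries whose fragments do not overlap any earlier entry. The sketch below reproduces just that prefix computation with a hypothetical `Fragment` type. The overlap test is the usual half-open interval check, assumed here to match what `DIExpression::fragmentsOverlap` does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit range of a variable fragment.
struct Fragment {
  uint64_t OffsetInBits;
  uint64_t SizeInBits;
};

// Half-open ranges [Off, Off + Size) overlap if each starts before the
// other ends.
bool fragmentsOverlap(const Fragment &A, const Fragment &B) {
  return A.OffsetInBits < B.OffsetInBits + B.SizeInBits &&
         B.OffsetInBits < A.OffsetInBits + A.SizeInBits;
}

// Number of leading history entries that can share the function-begin
// label: stop at the first fragment that overlaps any earlier one.
std::size_t countInitialDisjointFragments(const std::vector<Fragment> &History) {
  std::size_t N = 0;
  for (; N < History.size(); ++N)
    for (std::size_t P = 0; P < N; ++P)
      if (fragmentsOverlap(History[N], History[P]))
        return N;
  return N;
}
```

For a 64-bit argument split into two 32-bit halves, both halves are visible at function entry; a later redefinition of the low half ends the prefix.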
void DebugHandlerBase::beginInstruction(const MachineInstr *MI) {
  if (!MMI->hasDebugInfo())
    return;

  assert(CurMI == nullptr);
  CurMI = MI;

  // Insert labels where requested.
  DenseMap<const MachineInstr *, MCSymbol *>::iterator I =
      LabelsBeforeInsn.find(MI);

  // No label needed.
  if (I == LabelsBeforeInsn.end())
    return;

  // Label already assigned.
  if (I->second)
    return;

  if (!PrevLabel) {
    PrevLabel = MMI->getContext().createTempSymbol();
    Asm->OutStreamer->EmitLabel(PrevLabel);
  }
  I->second = PrevLabel;
}

void DebugHandlerBase::endInstruction() {
  if (!MMI->hasDebugInfo())
    return;

  assert(CurMI != nullptr);
  // Don't create a new label after DBG_VALUE and other instructions that don't
  // generate code.
  if (!CurMI->isMetaInstruction()) {
    PrevLabel = nullptr;
    PrevInstBB = CurMI->getParent();
  }

  DenseMap<const MachineInstr *, MCSymbol *>::iterator I =
      LabelsAfterInsn.find(CurMI);
  CurMI = nullptr;

  // No label needed.
  if (I == LabelsAfterInsn.end())
    return;

  // Label already assigned.
  if (I->second)
    return;

  // We need a label after this instruction.
  if (!PrevLabel) {
    PrevLabel = MMI->getContext().createTempSymbol();
    Asm->OutStreamer->EmitLabel(PrevLabel);
  }
  I->second = PrevLabel;
}

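`beginInstruction` and `endInstruction` share one pending label, `PrevLabel`. A fresh temporary symbol is emitted only when no label has been emitted since the last instruction that generated code, so meta instructions such as DBG_VALUE never force a new label. The simulation below models that bookkeeping with integers in place of `MCSymbol`s. The `Insn` flags and `assignLabels` are hypothetical, and the "label already assigned" shortcut (used for parameters pinned to the function-begin label) is not modeled.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical instruction-stream entry.
struct Insn {
  bool IsMeta;      // e.g. DBG_VALUE: emits no code
  bool WantsBefore; // a label was requested before this instruction
  bool WantsAfter;  // a label was requested after this instruction
};

struct LabelAssignment {
  std::map<std::size_t, int> Before, After; // instruction index -> label id
  int LabelsEmitted = 0;                    // fresh labels created
};

LabelAssignment assignLabels(const std::vector<Insn> &Stream) {
  LabelAssignment Out;
  int PrevLabel = 0;    // id 0 stands in for the function-begin label
  bool HavePrev = true; // beginFunction sets PrevLabel to function begin
  // Reuse the pending label, or "emit" a new one if none is pending.
  auto getLabel = [&]() {
    if (!HavePrev) {
      PrevLabel = ++Out.LabelsEmitted;
      HavePrev = true;
    }
    return PrevLabel;
  };
  for (std::size_t I = 0; I < Stream.size(); ++I) {
    if (Stream[I].WantsBefore) // beginInstruction
      Out.Before[I] = getLabel();
    if (!Stream[I].IsMeta) // endInstruction: real code invalidates the label
      HavePrev = false;
    if (Stream[I].WantsAfter) // endInstruction
      Out.After[I] = getLabel();
  }
  return Out;
}
```

In the test stream, the label emitted after the first instruction is reused before the following DBG_VALUE and before the next real instruction, so only two fresh labels are created for five requests.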
void DebugHandlerBase::endFunction(const MachineFunction *MF) {
  if (hasDebugInfo(MMI, MF))
    endFunctionImpl(MF);
  DbgValues.clear();
  DbgLabels.clear();
  LabelsBeforeInsn.clear();
  LabelsAfterInsn.clear();
}