Commit 07a9c5e

Add debuginfos documentation and link to it from debuginfod (#242)
1 parent b3280bb commit 07a9c5e

4 files changed

+103
-37
lines changed

docs/debuginfod.md

Lines changed: 16 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -1,49 +1,28 @@
1-
# debuginfod client
1+
# debuginfod support
22

3-
In order to symbolize ingested profiles, Parca needs to have debug information
4-
for the binaries that are being profiled. Debug information, also
5-
referred to as _debuginfos_, can be ELF object files, DWARF debug data, and
6-
source code. However, sometimes, application packages distributed by various Linux
7-
distros [strip](https://man7.org/linux/man-pages/man1/strip.1.html) away debug
8-
information to minimize the size of the binaries. Thankfully, there are publicly accessible
9-
servers, distributing debug information for various Linux package managers and distributions.
3+
:::tip
104

11-
[debuginfod](https://www.mankier.com/8/debuginfod) is an HTTP file server that serves debug
12-
information to clients based on the build IDs of the binaries. You can find out the build ID
13-
of a binary using the `file` command on Linux.
5+
This page assumes familiarity with debuginfos. If you are new to them, read the [debuginfos](debuginfos) docs page first.
146

15-
Here is an example to find out the build ID of a zsh shell:
7+
:::
8+
9+
Unfortunately, packages distributed by Linux distros often have their debuginfos [stripped](https://man7.org/linux/man-pages/man1/strip.1.html) away to minimize the size of the binaries.
10+
11+
Thankfully, there are publicly accessible servers that distribute debuginfos for various Linux package managers and distributions.
12+
13+
[debuginfod](https://www.mankier.com/8/debuginfod) is an HTTP file server that serves debuginfos to clients based on the Build IDs of the binaries. You can find out the Build ID of a binary using the `file` command on Linux.
14+
15+
Here is an example of finding the Build ID of the zsh binary:
1616

1717
```
1818
$ file /bin/zsh
1919
2020
/bin/zsh: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=24fcd0179bb3aa797de6a570c2359e528f7638c0, for GNU/Linux 3.2.0, stripped
2121
```
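The Build ID can also be pulled out of that output programmatically. The following is a minimal Python sketch, assuming the `BuildID[...]=<hex>` field format shown in the `file` output above; the helper name `extract_build_id` and the regular expression are illustrative and not part of Parca or debuginfod.

```python
import re

# Sample output of `file /bin/zsh`, copied from the example above.
FILE_OUTPUT = (
    "/bin/zsh: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), "
    "dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, "
    "BuildID[sha1]=24fcd0179bb3aa797de6a570c2359e528f7638c0, "
    "for GNU/Linux 3.2.0, stripped"
)

# Assumed field format: "BuildID[<kind>]=<lowercase hex digits>".
BUILD_ID_RE = re.compile(r"BuildID\[[^\]]+\]=([0-9a-f]+)")


def extract_build_id(file_output):
    """Return the hex Build ID found in `file` output, or None if absent."""
    match = BUILD_ID_RE.search(file_output)
    return match.group(1) if match else None


print(extract_build_id(FILE_OUTPUT))
```

A binary built without a Build ID has no such field in its `file` output, in which case the helper returns `None`.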
22-
Parca integrates with debuginfod to query for upstream debuginfod files and then
23-
stores them for potential later use. The default debuginfod server used by Parca is
24-
at https://debuginfod.elfutils.org .
25-
26-
## Implementation
27-
28-
Primarily, Parca looks for the relevant debug information files in its default
29-
symbol store. However, if debug info files are not found in the symbol store,
30-
Parca will try to fetch corresponding debuginfo files from the upstream
31-
[debuginfod servers](https://sourceware.org/elfutils/Debuginfod.html), and store
32-
them in the Parca symbol store, associated with the unique build ID of the object files.
33-
34-
The symbol store is a wrapper around the [Parca object store](https://www.parca.dev/docs/storage#storing-debug-information)
35-
to hold debug information. By default, Parca is configured to use the `/tmp`
36-
directory on local disk. This can be reconfigured to use any other user-specified
37-
location by editing the Parca configuration file.
38-
39-
The debuginfod client is implemented as a read-through client storage cache.
40-
An HTTP client is implemented to send requests to the upstream debuginfod servers.
41-
The client queries the server addresses sequentially until it finds the suitable
42-
debuginfo files. The downloaded object files are then stored in a `parca/debuginfod`
43-
bucket with the build ID as the key.
44-
45-
Users can add private debuginfod servers to be queried through the
46-
` --debuginfod-upstream-servers` flag.
22+
23+
Parca integrates with debuginfod to query upstream debuginfod servers for debuginfos and then stores them for potential later use. The default debuginfod server used by Parca is: https://debuginfod.elfutils.org
24+
25+
To retrieve debuginfos from a different set of debuginfod servers, use the `--debuginfod-upstream-servers` flag.
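To make the lookup concrete: a debuginfod server exposes debuginfos at the path `/buildid/<BUILDID>/debuginfo`. The sketch below only builds the request URLs for a list of servers, in order; it performs no network requests and is not Parca's actual implementation. The function name `debuginfo_urls` and the second server address are hypothetical.

```python
DEFAULT_SERVER = "https://debuginfod.elfutils.org"


def debuginfo_urls(build_id, servers=(DEFAULT_SERVER,)):
    """Build one debuginfo request URL per server, in the order given."""
    return [
        # Drop any trailing slash so the path is joined with exactly one "/".
        f"{server.rstrip('/')}/buildid/{build_id}/debuginfo"
        for server in servers
    ]


# Build ID taken from the `file /bin/zsh` example above.
BUILD_ID = "24fcd0179bb3aa797de6a570c2359e528f7638c0"

# The second server is a made-up private address, used only for illustration.
for url in debuginfo_urls(BUILD_ID, [DEFAULT_SERVER, "https://debuginfod.example.internal/"]):
    print(url)
```

A client would typically try these URLs one after another and stop at the first server that returns the file.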
4726

4827
## Additional Resources
4928

docs/debuginfos.md

Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
# Debuginfos
2+
3+
Raw profiling data is just memory addresses that represent function call stacks, together with how often each stack was observed. For example, a function call stack might look like this:
4+
5+
```
6+
0x0b
7+
0x2a
8+
0x43
9+
```
10+
11+
In order for humans to understand what these memory addresses represent, we need a mapping from memory addresses to function names. That mapping is what is commonly referred to as "debuginfos".
12+
13+
Debuginfos take the form of sections within an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) binary (ELF is the format of binaries used on Linux). ELF binaries are made up of sections, and some of these sections contain the debuginfos. Most commonly, debuginfos are in the [DWARF format](https://dwarfstd.org/doc/DWARF5.pdf).
14+
15+
Let's look at the DWARF of a tiny C program.
16+
17+
```c
18+
#include <stdio.h>
19+
int main() {
20+
  printf("Hello, World!");
21+
  return 0;
22+
}
23+
```
24+
25+
Compile it, enabling DWARF to be emitted (`-g`):
26+
27+
```bash
28+
zig cc -o mainc -g -target x86_64-linux main.c
29+
```
30+
31+
> Note: Any C compiler could have been used here but Zig's cross-compile support is very convenient, as it works well on any platform.
32+
33+
And let's use the `dwarfdump` tool to print everything.
34+
35+
```dwarfdump
36+
$ dwarfdump --show-form mainc
37+
mainc: file format elf64-x86-64
38+
39+
.debug_info contents:
40+
0x00000000: Compile Unit: length = 0x00000047, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000004b)
41+
42+
0x0000000b: DW_TAG_compile_unit
43+
DW_AT_producer [DW_FORM_strp] ("Homebrew clang version 13.0.1")
44+
DW_AT_language [DW_FORM_data2] (DW_LANG_C99)
45+
DW_AT_name [DW_FORM_strp] ("main.c")
46+
DW_AT_stmt_list [DW_FORM_sec_offset] (0x00000000)
47+
DW_AT_comp_dir [DW_FORM_strp] ("/Users/brancz/src/github.com/polarsignals/polarsignals/pkg/debuginfo/objfile/testdata")
48+
DW_AT_low_pc [DW_FORM_addr] (0x0000000000201e20)
49+
DW_AT_high_pc [DW_FORM_data4] (0x00000016)
50+
51+
0x0000002a: DW_TAG_subprogram
52+
DW_AT_low_pc [DW_FORM_addr] (0x0000000000201e20)
53+
DW_AT_high_pc [DW_FORM_data4] (0x00000016)
54+
DW_AT_frame_base [DW_FORM_exprloc] (DW_OP_reg6 RBP)
55+
DW_AT_GNU_all_call_sites [DW_FORM_flag_present] (true)
56+
DW_AT_name [DW_FORM_strp] ("main")
57+
DW_AT_decl_file [DW_FORM_data1] ("/Users/brancz/src/github.com/polarsignals/polarsignals/pkg/debuginfo/objfile/testdata/main.c")
58+
DW_AT_decl_line [DW_FORM_data1] (2)
59+
DW_AT_type [DW_FORM_ref4] (0x00000043 "int")
60+
DW_AT_external [DW_FORM_flag_present] (true)
61+
62+
0x00000043: DW_TAG_base_type
63+
DW_AT_name [DW_FORM_strp] ("int")
64+
DW_AT_encoding [DW_FORM_data1] (DW_ATE_signed)
65+
DW_AT_byte_size [DW_FORM_data1] (0x04)
66+
67+
0x0000004a: NULL
68+
```
69+
70+
Looking at this output, we see the compilation unit, which is the top-level unit, and right underneath it a `DW_TAG_subprogram`, which is our `main` function. Its `DW_AT_low_pc` attribute has the form `DW_FORM_addr` (an address, 8 bytes wide here because `addr_size = 0x08`) and describes the start of our function's memory range. Its `DW_AT_high_pc` attribute has the form `DW_FORM_data4` (a 4-byte constant); because it is a constant rather than an address, it is an offset from `DW_AT_low_pc` to the end of our function's memory range. The memory range is therefore `[0x201e20, 0x201e36)`, since `0x201e20 + 0x16 = 0x201e36`. Lastly, and important for symbolization, there is the `DW_AT_name` attribute with the form `DW_FORM_strp`, which is a string.
71+
72+
For symbolization, this means that any memory address we encounter in the range `[0x201e20, 0x201e36)`, that is, from `0x201e20` up to but not including `0x201e36`, belongs to the `main` function.
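The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how Parca implements symbolization: the `SUBPROGRAMS` table is filled in by hand from the `dwarfdump` output rather than parsed from the binary, and the names `SUBPROGRAMS`, `RANGES`, and `symbolize` are hypothetical.

```python
from bisect import bisect_right

# (DW_AT_low_pc, DW_AT_high_pc offset, DW_AT_name), copied by hand from the
# DW_TAG_subprogram entry in the dwarfdump output.
SUBPROGRAMS = [(0x201E20, 0x16, "main")]

# Turn each entry into a half-open range [low, high), sorted by start address.
RANGES = sorted((low, low + offset, name) for low, offset, name in SUBPROGRAMS)
STARTS = [low for low, _, _ in RANGES]


def symbolize(address):
    """Return the name of the function containing `address`, or None."""
    # Find the last range whose start address is <= address.
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        return None
    low, high, name = RANGES[index]
    # The end address is exclusive, matching [0x201e20, 0x201e36).
    return name if low <= address < high else None


print(symbolize(0x201E2A))
print(symbolize(0x201E36))
```

Keeping the ranges sorted allows a binary search, which matters once a binary contains many functions instead of the single one in this example.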

sidebars.js

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,7 @@ module.exports = {
3838
"ingestion",
3939
"storage",
4040
"symbolization",
41+
"debuginfos",
4142
"debuginfod",
4243
{
4344
type: "link",

wordlist.txt

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -325,3 +325,17 @@ yaml
325325
Yomi
326326
Youtube
327327
zsh
328+
printf
329+
Zig's
330+
dwarfdump
331+
mainc
332+
DW
333+
Homebrew
334+
objfile
335+
pc
336+
stmt
337+
strp
338+
testdata
339+
RBP
340+
decl
341+
exprloc
