Commit 8824931

chore(build, crashtracker): build w/ debug symbols (#14286)
Build all native extensions with debug symbols by default:

- `Cython.Distutils.Extension` and `setuptools.Extension`: these inherit `CFLAGS` from Python's sysconfig (`python -m sysconfig | grep CFLAGS`), which shows `CFLAGS = "-fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall"`, so they are always built with debug symbols.
- `setuptools_rust.Extension`: we only have one Rust native extension, in `src/native`. This PR explicitly sets `debug = "line-tables-only"`.
- Our own `CMakeExtension`: use CMake's `RelWithDebInfo` build type. By default this would change the optimization flag for extensions built with CMake, but only one is affected.

Extension | Release -O flag | RelWithDebInfo -O flag
-- | -- | --
ddtrace.internal.datadog.profiling.ddup._ddup | -Os as defined in [AnalysisFunc.cmake](https://github.com/DataDog/dd-trace-py/blob/4ed3d21d4efd662c640720fe0ddf88088026e1a6/ddtrace/internal/datadog/profiling/cmake/AnalysisFunc.cmake#L11) | -Os as defined in [AnalysisFunc.cmake](https://github.com/DataDog/dd-trace-py/blob/4ed3d21d4efd662c640720fe0ddf88088026e1a6/ddtrace/internal/datadog/profiling/cmake/AnalysisFunc.cmake#L26)
ddtrace.internal.datadog.profiling.stack_v2._stack_v2 | -Os, same as above | -Os, same as above
ddtrace.appsec._iast._taint_tracking._native | Not explicitly overridden, probably -O3 | Not explicitly overridden, probably -O2

For both Linux and macOS, we emit debug symbols and then strip them from the shared libraries. This step runs before auditing/delocating the wheel in the build GH action. The build GH action also uploads zipped debug symbols for Linux and macOS as artifacts, so that external developers can download and use them if they wish. For our internal crashtracker use, we need to upload them to our symbols backend; that will be done in a follow-up PR.

We'd like to upload debug symbols for all future releases and release candidates by default. For other builds, we'd have a manually triggered job.

When any of the .so/.dylib files has no debug symbols, except for those set to be ignored (e.g. libddwaf, which is already stripped of debug symbols), the build GH action fails and reports which files lack debug symbols. See https://github.com/DataDog/dd-trace-py/actions/runs/17137131985/job/48615775619?pr=14389

```
ERROR: Failed to generate debug symbols for the following libraries:
  - ddtrace/appsec/_iast/_stacktrace.cpython-38-aarch64-linux-gnu.so
  - ddtrace/appsec/_iast/_ast/iastpatch.cpython-38-aarch64-linux-gnu.so
  - ddtrace/appsec/_iast/_taint_tracking/_native.cpython-38-aarch64-linux-gnu.so
  ...
This indicates that these binaries were built without debug symbols (-g flag) or they were already stripped
ERROR: Failed to extract debug symbols from wheel
```

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
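The commit message notes that Cython and setuptools extensions inherit `CFLAGS` from Python's sysconfig. As a minimal sketch of checking that claim on a given interpreter, the helper below looks for a debug-info flag in a `CFLAGS` string. The function name `has_debug_flag` is hypothetical, and the rule "any token starting with `-g` other than `-g0`" is a simplifying assumption: it does not model flag ordering or compiler-specific behaviour.

```python
import shlex
import sysconfig


def has_debug_flag(cflags: str) -> bool:
    """Return True if a CFLAGS string contains a debug-info flag.

    Simplifying assumption: any token starting with "-g" (e.g. -g, -g3,
    -ggdb3) requests debug info, except "-g0", which disables it.
    """
    tokens = shlex.split(cflags)
    return any(tok.startswith("-g") and tok != "-g0" for tok in tokens)


if __name__ == "__main__":
    # get_config_var may return None on platforms that do not define CFLAGS.
    current = sysconfig.get_config_var("CFLAGS") or ""
    print("CFLAGS:", current)
    print("debug symbols requested:", has_debug_flag(current))
```

Running the script with the interpreter used for the build gives the same information as `python -m sysconfig | grep CFLAGS`, reduced to a yes/no answer.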
1 parent bc343f1 commit 8824931

11 files changed: +643 -20 lines changed

.github/workflows/build_python_3.yml

Lines changed: 16 additions & 0 deletions
@@ -70,14 +70,22 @@ jobs:
 # `platform.mac_ver()` reports incorrect MacOS version at 11.0
 # See: https://stackoverflow.com/a/65402241
 CIBW_ENVIRONMENT_MACOS: CMAKE_BUILD_PARALLEL_LEVEL=24 SYSTEM_VERSION_COMPAT=0 CMAKE_ARGS="-DNATIVE_TESTING=OFF"
+# cibuildwheel repair will copy anything's under /output directory from the
+# build container to the host machine. This is a bit hacky way, but seems
+# to be the only way getting debug symbols out from the container while
+# we don't mess up with RECORD file.
 CIBW_REPAIR_WHEEL_COMMAND_LINUX: |
+  mkdir -p /output/debugwheelhouse &&
+  python scripts/extract_debug_symbols.py {wheel} --output-dir /output/debugwheelhouse &&
   python scripts/zip_filter.py {wheel} \*.c \*.cpp \*.cc \*.h \*.hpp \*.pyx \*.md &&
   mkdir ./tempwheelhouse &&
   unzip -l {wheel} | grep '\.so' &&
   auditwheel repair -w ./tempwheelhouse {wheel} &&
   mv ./tempwheelhouse/*.whl {dest_dir} &&
   rm -rf ./tempwheelhouse
 CIBW_REPAIR_WHEEL_COMMAND_MACOS: |
+  mkdir -p ./debugwheelhouse &&
+  python scripts/extract_debug_symbols.py {wheel} --output-dir ./debugwheelhouse &&
   python scripts/zip_filter.py {wheel} \*.c \*.cpp \*.cc \*.h \*.hpp \*.pyx \*.md &&
   MACOSX_DEPLOYMENT_TARGET=12.7 delocate-wheel --require-archs {delocate_archs} -w {dest_dir} -v {wheel}
 CIBW_REPAIR_WHEEL_COMMAND_WINDOWS: python scripts/zip_filter.py "{wheel}" "*.c" "*.cpp" "*.cc" "*.h" "*.hpp" "*.pyx" "*.md" && mv "{wheel}" "{dest_dir}"
@@ -126,3 +134,11 @@ jobs:
   with:
     name: wheels-${{ env.ARTIFACT_NAME }}
     path: ./wheelhouse/*.whl
+
+- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+  if: runner.os != 'Windows'
+  with:
+    name: debug-symbols-${{ env.ARTIFACT_NAME }}
+    path: |
+      ./debugwheelhouse/*.zip
+      ./wheelhouse/debugwheelhouse/*.zip
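The repair commands above call `scripts/extract_debug_symbols.py`, whose contents are not shown in this section. As a rough sketch of the per-library steps it would plausibly need, the helper below only builds the command lines without running them. The function name `extraction_commands`, the `strip` flags, and the `--add-gnu-debuglink` step are assumptions for illustration; only `objcopy --only-keep-debug` and `dsymutil` are named in the commit's documentation.

```python
from pathlib import PurePosixPath


def extraction_commands(lib: str, system: str) -> list[list[str]]:
    """Build (but do not run) commands that split debug info out of one library.

    `system` follows platform.system() naming: "Linux" or "Darwin".
    """
    if system == "Linux":
        # foo.so -> foo.debug, matching the ".debug" files described in the docs.
        debug_file = str(PurePosixPath(lib).with_suffix(".debug"))
        return [
            ["objcopy", "--only-keep-debug", lib, debug_file],
            # Assumed: drop debug sections from the shipped library.
            ["strip", "--strip-debug", lib],
            # Assumed: record a link so debuggers can locate the .debug file.
            ["objcopy", f"--add-gnu-debuglink={debug_file}", lib],
        ]
    if system == "Darwin":
        dsym_bundle = lib + ".dSYM"
        return [
            ["dsymutil", lib, "-o", dsym_bundle],
            # Assumed: strip debug symbols from the shipped library.
            ["strip", "-S", lib],
        ]
    raise ValueError(f"unsupported system: {system}")


if __name__ == "__main__":
    for cmd in extraction_commands("ddtrace/appsec/_iast/_stacktrace.so", "Linux"):
        print(" ".join(cmd))
```

Returning argument lists rather than executing them keeps the sketch easy to inspect; each list could be passed to `subprocess.run(cmd, check=True)` on a machine that has the relevant toolchain installed.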

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -195,3 +195,6 @@ tests/appsec/iast/fixtures/taint_sinks/not_exists.txt
 # .env file
 .env
 .envrc
+
+*.debug
+*.dSYM/

.gitlab/download-wheels-from-gh-actions.sh

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ fi

 echo "Github workflow finished. Downloading wheels"
 # download all wheels
-gh run download $RUN_ID --repo DataDog/dd-trace-py
+gh run download $RUN_ID --repo DataDog/dd-trace-py --pattern "wheels-*" --pattern "source-dist*"

 cd ..
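The added `--pattern` arguments keep the new `debug-symbols-*` artifacts out of the wheel download. The sketch below illustrates that filtering with Python's `fnmatch`. It assumes `gh` applies ordinary glob matching to artifact names, and the artifact names in the example are hypothetical stand-ins for `wheels-${{ env.ARTIFACT_NAME }}` and `debug-symbols-${{ env.ARTIFACT_NAME }}`.

```python
from fnmatch import fnmatchcase

# Patterns copied from the updated `gh run download` command.
PATTERNS = ("wheels-*", "source-dist*")


def selected_artifacts(names, patterns=PATTERNS):
    """Return the artifact names that match at least one glob pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


if __name__ == "__main__":
    # Hypothetical artifact names, for illustration only.
    artifacts = [
        "wheels-cp39-manylinux_x86_64",
        "debug-symbols-cp39-manylinux_x86_64",
        "source-dist",
    ]
    print(selected_artifacts(artifacts))
```

With these inputs the debug-symbol artifact is skipped, which is the effect the change to the download script is after.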

ddtrace/appsec/_iast/_taint_tracking/native.cpp

Lines changed: 12 additions & 10 deletions
@@ -42,11 +42,12 @@ static PyMethodDef AspectsMethods[] = {
     { nullptr, nullptr, 0, nullptr }
 };

-static struct PyModuleDef aspects = { PyModuleDef_HEAD_INIT,
-                                      .m_name = PY_MODULE_NAME_ASPECTS,
-                                      .m_doc = "Taint tracking Aspects",
-                                      .m_size = -1,
-                                      .m_methods = AspectsMethods };
+// Mark the module as used to prevent it from being stripped.
+static struct PyModuleDef aspects __attribute__((used)) = { PyModuleDef_HEAD_INIT,
+                                                            .m_name = PY_MODULE_NAME_ASPECTS,
+                                                            .m_doc = "Taint tracking Aspects",
+                                                            .m_size = -1,
+                                                            .m_methods = AspectsMethods };

 static PyMethodDef OpsMethods[] = {
     { "new_pyobject_id", (PyCFunction)api_new_pyobject_id, METH_FASTCALL, "new pyobject id" },
@@ -55,11 +56,12 @@ static PyMethodDef OpsMethods[] = {
     { nullptr, nullptr, 0, nullptr }
 };

-static struct PyModuleDef ops = { PyModuleDef_HEAD_INIT,
-                                  .m_name = PY_MODULE_NAME_ASPECTS,
-                                  .m_doc = "Taint tracking operations",
-                                  .m_size = -1,
-                                  .m_methods = OpsMethods };
+// Mark the module as used to prevent it from being stripped.
+static struct PyModuleDef ops __attribute__((used)) = { PyModuleDef_HEAD_INIT,
+                                                        .m_name = PY_MODULE_NAME_ASPECTS,
+                                                        .m_doc = "Taint tracking operations",
+                                                        .m_size = -1,
+                                                        .m_methods = OpsMethods };

 /**
  * This function initializes the native module.

ddtrace/internal/datadog/profiling/cmake/AnalysisFunc.cmake

Lines changed: 11 additions & 5 deletions
@@ -26,9 +26,6 @@ function(add_ddup_config target)
             "$<$<CONFIG:RelWithDebInfo>:-Os;-ggdb3>" -fno-semantic-interposition)
     endif()

-    # Common link options
-    target_link_options(${target} PRIVATE "$<$<CONFIG:RelWithDebInfo>:>")
-
     if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
         # macOS-specific linker options
         target_link_options(${target} PRIVATE "$<$<CONFIG:Release>:-Wl,-dead_strip>")
@@ -46,11 +43,19 @@ function(add_ddup_config target)
             -Wl,--exclude-libs,ALL)
     endif()

-    # If we can IPO, then do so
+    # If we can IPO, then do so.
     check_ipo_supported(RESULT result)

     if(result)
-        set_property(TARGET ${target} PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
+        if(CMAKE_CXX_COMPILER_ID MATCHES "AppleClang")
+            # When using AppleClang, explicitly use thin LTO to match Rust's thin LTO strategy. And set the object path
+            # for debug symbols.
+            target_compile_options(${target} PRIVATE -flto=thin)
+            target_link_options(${target} PRIVATE -flto=thin)
+            target_link_options(${target} PRIVATE -Wl,-object_path_lto,${CMAKE_CURRENT_BINARY_DIR}/${target}_lto.o)
+        else()
+            set_property(TARGET ${target} PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
+        endif()
     endif()

     # Propagate sanitizers
@@ -85,4 +90,5 @@ function(add_ddup_config target)
     # The main targets, ddup, crashtracker, stack_v2, and dd_wrapper are built as dynamic libraries, so PIC is required.
     # And setting this is also fine for tests as they're loading those dynamic libraries.
     set_target_properties(${target} PROPERTIES POSITION_INDEPENDENT_CODE ON)
+
 endfunction()

docs/debug_symbols.rst

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+Debugging Native Extensions with Debug Symbols
+==============================================
+
+dd-trace-py is built with debug symbols by default, and packaged separately from the main wheel files to reduce the size of the primary distribution packages.
+
+Debug Symbol Files
+------------------
+
+The project generates debug symbols during the build process:
+
+- **Linux**: ``.debug`` files (using ``objcopy --only-keep-debug``)
+- **macOS**: ``.dSYM`` bundles (using ``dsymutil``)
+
+These debug symbols are extracted from the main wheels and packaged into separate `.zip` files with the naming convention:
+
+::
+
+    {original-wheel-name}-debug-symbols.zip
+
+For example:
+
+- ``ddtrace-1.20.0-cp39-cp39-linux_x86_64.whl`` → ``ddtrace-1.20.0-cp39-cp39-linux_x86_64-debug-symbols.zip``
+- ``ddtrace-1.20.0-cp39-cp39-macosx_10_9_x86_64.whl`` → ``ddtrace-1.20.0-cp39-cp39-macosx_10_9_x86_64-debug-symbols.zip``
+
+Build Process
+-------------
+
+The debug symbols are handled automatically during the CI build process:
+
+1. Wheels are built with debug symbols included
+2. Debug symbols are extracted using the ``scripts/extract_debug_symbols.py`` script
+3. Debug symbols are removed from the main wheel to reduce size
+4. Separate debug symbol packages are created and uploaded as artifacts
+
+Usage
+-----
+
+To use debug symbols for debugging or crash analysis:
+
+1. Download the appropriate debug symbol package for your platform and Python version
+2. Extract the debug symbol files to the same directory as the corresponding `.so` files.
+   Typically, the site-packages directory where ddtrace is installed.
+3. Your debugger or crash analysis tool should automatically find the debug symbols
+4. To view assembly with code side by side, you also need the source code, and
+   set substitute paths in your debugger to the source code directory. For example,
+   ``_stack_v2.cpython-313-x86_64-linux-gnu.so`` is compiled from
+   echion as specified in ``ddtrace/internal/datadog/profiling/stack_v2/CMakeLists.txt``.
+   So you first need to check out the echion repository and checkout the commit hash.
+   Then, set substitute paths in gdb to the echion source code directory.
+   Typically, if you run ``disas /m <symbol>`` in gdb, it will tell you the full
+   file path of the source code as the following:
+
+   .. code-block:: bash
+
+      (gdb) disas /m Frame::read
+      Dump of assembler code for function _ZN5Frame4readEP19_PyInterpreterFramePS1_:
+      269 /project/build/cmake.linux-x86_64-cpython-313/ddtrace.internal.datadog.profiling.stack_v2._stack_v2/_deps/echion-src/echion/frame.cc: No such file or directory.
+      0x000000000000ece4 <+0>: push %r12
+      0x000000000000ece6 <+2>: mov %rdi,%r8
+      0x000000000000ece9 <+5>: push %rbp
+      0x000000000000ecea <+6>: mov %rsi,%rbp
+      0x000000000000eced <+9>: push %rbx
+      0x000000000000ecee <+10>: sub $0x60,%rsp
+
+      270 in /project/build/cmake.linux-x86_64-cpython-313/ddtrace.internal.datadog.profiling.stack_v2._stack_v2/_deps/echion-src/echion/frame.cc
+      271 in /project/build/cmake.linux-x86_64-cpython-313/ddtrace.internal.datadog.profiling.stack_v2._stack_v2/_deps/echion-src/echion/frame.cc
+
+   Then you can set substitute paths in gdb to the echion source code directory
+
+   .. code-block:: bash
+
+      (gdb) set substitute-path /project/build/cmake.linux-x86_64-cpython-313/ddtrace.internal.datadog.profiling.stack_v2._stack_v2/_deps/echion-src/echion /path/to/echion/source/code
+
+   Run ``disas /m Frame::read`` again to see the assembly with code side by side.
+
+   .. code-block:: bash
+
+      (gdb) disas /m Frame::read
+      Dump of assembler code for function _ZN5Frame4readEP19_PyInterpreterFramePS1_:
+      warning: Source file is more recent than executable.
+      269 {
+      0x000000000000ece4 <+0>: push %r12
+      0x000000000000ece6 <+2>: mov %rdi,%r8
+      0x000000000000ece9 <+5>: push %rbp
+      0x000000000000ecea <+6>: mov %rsi,%rbp
+      0x000000000000eced <+9>: push %rbx
+      0x000000000000ecee <+10>: sub $0x60,%rsp
+
+      270 #if PY_VERSION_HEX >= 0x030b0000
+      271 _PyInterpreterFrame iframe;
+
+      272 #if PY_VERSION_HEX >= 0x030d0000
+      273 // From Python versions 3.13, f_executable can have objects other than
+      274 // code objects for an internal frame. We need to skip some frames if
+      275 // its f_executable is not code as suggested here:
+      276 // https://github.com/python/cpython/issues/100987#issuecomment-1485556487
+      277 PyObject f_executable;
+
+      278
+      279 for (; frame_addr; frame_addr = frame_addr->previous)
+      0x000000000000ecf7 <+19>: test %r8,%r8
+      0x000000000000ecfa <+22>: je 0xed91 <_ZN5Frame4readEP19_PyInterpreterFramePS1_+173>
+      0x000000000000ed88 <+164>: mov 0x8(%rbx),%r8
+      0x000000000000ed8c <+168>: jmp 0xecf7 <_ZN5Frame4readEP19_PyInterpreterFramePS1_+19>
+
+   On lldb, you can find the source code full path by running ``image lookup -n Frame::read --verbose``,
+   and set the source code path using ``settings set target.source-map <expected-path> <actual-path>``.
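The naming convention described in this document maps each wheel filename to its debug-symbol archive. A minimal sketch of that mapping follows. The function name `debug_symbols_zip_name` is hypothetical and is not taken from `scripts/extract_debug_symbols.py`.

```python
def debug_symbols_zip_name(wheel_filename: str) -> str:
    """Map a wheel filename to the name of its debug-symbols archive.

    Implements the documented convention:
    {original-wheel-name}-debug-symbols.zip
    """
    suffix = ".whl"
    if not wheel_filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    return wheel_filename[: -len(suffix)] + "-debug-symbols.zip"


if __name__ == "__main__":
    print(debug_symbols_zip_name("ddtrace-1.20.0-cp39-cp39-linux_x86_64.whl"))
```

This can help when locating the right archive among the uploaded `debug-symbols-*` artifacts for an installed wheel.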

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -283,6 +283,7 @@ Indices and tables
     basic_usage
     advanced_usage
     build_system
+    debug_symbols
     benchmarks
     contributing
     troubleshooting

docs/spelling_wordlist.txt

Lines changed: 3 additions & 1 deletion
@@ -95,6 +95,7 @@ dramatiq
 Dramatiq
 dsn
 dunder
+echion
 eg
 elasticsearch
 elasticsearch1
@@ -116,6 +117,7 @@ flamegraph
 fnmatch
 formatter
 freezegun
+gdb
 genai
 generativeai
 gevent
@@ -340,4 +342,4 @@ wsgi
 xfail
 yaaredis
 openai-agents
-validators
+validators
