[PROF-12372] (Draft) Hack for experimental support for publishing process context for fullhost profiler #4865

Draft
wants to merge 2 commits into
base: master

@ivoanjo ivoanjo commented Aug 22, 2025

What does this PR do?

This PR imports the experimental code from
https://github.com/DataDog/fullhost-code-hotspots-wip/pull/2 (Datadog-internal repo for now!) into Ruby so we can test out the code on Ruby apps easily. See that original PR for more details on what that is and how it works.

I don't expect this PR to ever be merged; it's only for easier testing.

Motivation:

I'm opening this draft PR so I can add this to test apps in our reliability environment a bit more easily.

Change log entry

None. This is not expected to be merged.

Additional Notes:

Since this is a quick hack, I piggy-backed on the existing process discovery support, rather than separating it out too much.

How to test the change?

The PR linked above has a script that can be used to read the data.

Here's how it looks for me:

$ sudo otel_process_ctx_dump.sh 205306
Found OTEL context for PID 205306
Start address: 7f0f25efd000
00000000  4f 54 45 4c 5f 43 54 58  01 00 00 00 93 00 00 00  |OTEL_CTX........|
00000010  60 8a 1c 91 9c 5c 00 00                           |`....\..|
00000018
Parsed struct:
  otel_process_ctx_signature : "OTEL_CTX"
  otel_process_ctx_version   : 1
  otel_process_payload_size  : 147
  otel_process_payload       : 0x00005c9c911c8a60

Payload dump (147 bytes):
00000000  de 00 03 da 00 0c 73 65  72 76 69 63 65 2e 6e 61  |......service.na|
00000010  6d 65 da 00 11 74 65 73  74 2d 73 65 72 76 69 63  |me...test-servic|
00000020  65 2d 6e 61 6d 65 da 00  13 73 65 72 76 69 63 65  |e-name...service|
00000030  2e 69 6e 73 74 61 6e 63  65 2e 69 64 da 00 24 64  |.instance.id..$d|
00000040  35 30 63 33 33 63 30 2d  31 37 39 33 2d 34 37 63  |50c33c0-1793-47c|
00000050  30 2d 61 38 63 62 2d 65  38 36 62 38 62 36 61 35  |0-a8cb-e86b8b6a5|
00000060  38 62 36 da 00 1b 64 65  70 6c 6f 79 6d 65 6e 74  |8b6...deployment|
00000070  2e 65 6e 76 69 72 6f 6e  6d 65 6e 74 2e 6e 61 6d  |.environment.nam|
00000080  65 da 00 0f 74 68 69 73  2d 69 73 2d 74 68 65 2d  |e...this-is-the-|
00000090  65 6e 76                                          |env|
00000093
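
The layout in the dump can be cross-checked with a few lines of Ruby. This is only a rough sketch: it assumes the header is the 24 little-endian bytes shown under "Parsed struct", and it infers that the payload is MessagePack because the `de` and `da` tags match MessagePack's map16 and str16 encodings (the original PR is the authority on the actual format). The method and constant names are hypothetical, and the decoder handles only the two tags that appear in this dump.

```ruby
# Bytes copied from the two hex dumps above.
HEADER_BYTES = %w[
  4f 54 45 4c 5f 43 54 58 01 00 00 00 93 00 00 00
  60 8a 1c 91 9c 5c 00 00
].map(&:hex).pack("C*")

PAYLOAD_BYTES = <<~HEX.split.map(&:hex).pack("C*")
  de 00 03 da 00 0c 73 65 72 76 69 63 65 2e 6e 61
  6d 65 da 00 11 74 65 73 74 2d 73 65 72 76 69 63
  65 2d 6e 61 6d 65 da 00 13 73 65 72 76 69 63 65
  2e 69 6e 73 74 61 6e 63 65 2e 69 64 da 00 24 64
  35 30 63 33 33 63 30 2d 31 37 39 33 2d 34 37 63
  30 2d 61 38 63 62 2d 65 38 36 62 38 62 36 61 35
  38 62 36 da 00 1b 64 65 70 6c 6f 79 6d 65 6e 74
  2e 65 6e 76 69 72 6f 6e 6d 65 6e 74 2e 6e 61 6d
  65 da 00 0f 74 68 69 73 2d 69 73 2d 74 68 65 2d
  65 6e 76
HEX

# Assumed layout: 8-byte signature, u32 version, u32 payload size,
# u64 payload address, all little-endian with no padding.
def parse_header(bytes)
  signature, version, payload_size, payload_address = bytes.unpack("a8L<L<Q<")
  { signature: signature, version: version,
    payload_size: payload_size, payload_address: payload_address }
end

# Decodes only MessagePack str16 (0xda) and map16 (0xde), the two tags
# present in the dump. Returns [value, offset_after_value].
def decode_msgpack_subset(bytes, offset = 0)
  tag = bytes.getbyte(offset)
  raise ArgumentError, "truncated input at offset #{offset}" if tag.nil?

  case tag
  when 0xda # str16: 2-byte big-endian length, then the string bytes
    length = bytes.byteslice(offset + 1, 2).unpack1("n")
    value = bytes.byteslice(offset + 3, length).force_encoding(Encoding::UTF_8)
    [value, offset + 3 + length]
  when 0xde # map16: 2-byte big-endian entry count, then key/value pairs
    count = bytes.byteslice(offset + 1, 2).unpack1("n")
    offset += 3
    map = {}
    count.times do
      key, offset = decode_msgpack_subset(bytes, offset)
      value, offset = decode_msgpack_subset(bytes, offset)
      map[key] = value
    end
    [map, offset]
  else
    raise ArgumentError, format("unsupported tag 0x%02x at offset %d", tag, offset)
  end
end

header = parse_header(HEADER_BYTES)
attributes, consumed = decode_msgpack_subset(PAYLOAD_BYTES)

puts format("version=%d payload_size=%d payload=0x%016x",
            header[:version], header[:payload_size], header[:payload_address])
puts "decoded #{consumed} payload bytes"
attributes.each { |key, value| puts "#{key} = #{value}" }
```

The number of bytes consumed by the decoder should match `otel_process_payload_size` from the header, which is a quick sanity check that the header and payload belong together.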


datadog-official bot commented Aug 22, 2025

⚠️ Tests

⚠️ Warnings

🧪 1 Test failed

Datadog::Core::Configuration::Components::new is expected to receive build_health_metrics(##>}>, #, #) 1 time from rspec (Datadog)
Failure/Error: memfd = _native_store_tracer_metadata(logger, **metadata)
./lib/datadog/core/process_discovery.rb:19:in `_native_store_tracer_metadata'
./lib/datadog/core/process_discovery.rb:19:in `get_and_store_metadata'
./lib/datadog/core/configuration/components.rb:129:in `initialize'
./spec/datadog/core/configuration/components_spec.rb:29:in `new'
./spec/datadog/core/configuration/components_spec.rb:29:in `block (2 levels) in <top (required)>'
./spec/datadog/core/configuration/components_spec.rb:103:in `block (3 levels) in <top (required)>'
./spec/datadog/core/configuration/components_spec.rb:64:in `block (3 levels) in <top (required)>'
/usr/local/bundle/gems/climate_control-1.2.0/lib/climate_control.rb:24:in `block in modify'
/usr/local/bundle/gems/climate_control-1.2.0/lib/climate_control.rb:15:in `modify'
...

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: c90f267

pr-commenter bot commented Aug 22, 2025

Benchmarks

Benchmark execution time: 2025-08-22 11:11:51

Comparing candidate commit df68e4d in PR branch ivoanjo/experimental-process-ctx with baseline commit 319db66 in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:profiling - gvl benchmark samples

  • 🟥 throughput [-722.920op/s; -713.443op/s] or [-5.149%; -5.082%]

The semantics for `shutdown!` on reconfiguration are a bit different
from what I expected so the code to drop the context wasn't quite
correct. In particular, when reconfiguration happens, the new component
gets started before the older one shuts down; this is fine generally
but since the context is a singleton it means my approach of dropping
on `shutdown!` was not correct as it was tearing down the context
after updating it.

As a simplification, let's never drop the context.

+ Also fix updating the runtime-id on forks; the process discovery
module is actually incorrect here since it wasn't handling forks, so I
applied a somewhat heavy-handed fix.
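
The ordering problem and the fork handling described in this commit message can be sketched roughly as follows. This is an illustration only, not the PR's code: `ProcessContextSketch`, `ComponentSketch`, and the injectable `pid_source` are hypothetical names, and mapping the runtime id onto `service.instance.id` is an assumption based on the dump in the PR description.

```ruby
require "securerandom"

# Hypothetical stand-in for the singleton process context. The pid lookup
# is injectable so the fork case can be exercised without really forking.
class ProcessContextSketch
  attr_reader :published

  def initialize(pid_source: -> { Process.pid })
    @pid_source = pid_source
    @mutex = Mutex.new
    @owner_pid = nil
    @runtime_id = nil
    @published = nil
  end

  # Overwrites the context in place. There is intentionally no method to
  # drop it, mirroring the "never drop the context" simplification.
  def publish(service:, env:)
    @mutex.synchronize do
      current_pid = @pid_source.call
      unless @owner_pid == current_pid
        # First publish in this process, including a forked child:
        # generate a fresh runtime id instead of reusing the parent's.
        @owner_pid = current_pid
        @runtime_id = SecureRandom.uuid
      end
      @published = {
        "service.name" => service,
        "service.instance.id" => @runtime_id, # assumed mapping
        "deployment.environment.name" => env
      }
    end
  end
end

# Hypothetical component that publishes on start.
class ComponentSketch
  def initialize(context, service:, env:)
    @context = context
    @context.publish(service: service, env: env)
  end

  # Deliberately leaves the context alone: on reconfiguration the new
  # component has already published by the time the old one shuts down.
  def shutdown!; end
end

context = ProcessContextSketch.new
old_component = ComponentSketch.new(context, service: "old-name", env: "staging")
ComponentSketch.new(context, service: "new-name", env: "staging")
old_component.shutdown! # runs after the new component has started
puts context.published["service.name"]
```

Had `shutdown!` cleared the context, the last line would find nothing to print, because the old component's teardown runs after the new component's update.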
@github-actions github-actions bot added the core Involves Datadog core libraries label Aug 22, 2025