feat: add process tags to traces #5033

wantsui · 2025-11-07T22:46:53Z

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

After this gets merged, the next step is to add it for the other products.

To run the tests in Docker:

docker compose run --rm tracer-3.3 /bin/bash
bundle exec rake compile
bundle exec rake test:core_with_rails

Main tests:

BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb
bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

We're trying to add process tags to various payloads so they can be used in different use cases.

Note I still want to try adding server type but I'll have to tackle that in a separate PR.

Change log entry

Yes. Add process tags to the trace payloads.

Additional Notes:

How to test the change?

github-actions · 2025-11-07T22:47:06Z

Thank you for updating Change log entry section 👏

datadog-official · 2025-11-07T22:51:11Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 6042830

marcotc · 2025-11-10T21:12:26Z

lib/datadog/tracing/transport/trace_formatter.rb

+        def tag_process_tags!
+          return unless trace.experimental_propagate_process_tags_enabled
+          process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
+          return if process_tags.empty?


This is impossible, right? If so, we can remove it, since keeping it suggests an uncertainty that doesn't exist.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!

pr-commenter · 2025-11-10T22:17:05Z

Benchmarks

Benchmark execution time: 2025-11-18 23:19:43

Comparing candidate commit 6042830 in PR branch add-process-tags-to-tracing with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:tracing - Tracing.log_correlation

🟥 throughput [-10853.442op/s; -10495.857op/s] or [-9.774%; -9.452%]

wantsui · 2025-11-11T16:29:31Z

spec/datadog/tracing/transport/trace_formatter_spec.rb

+          format!
+          expect(first_span.meta).to include('_dd.tags.process')
+          expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
+          # TODO figure out if we need an assertion for the value, ie


@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

I wouldn't test realistic values.

The main thing to test here is that it's respecting the configuration option, which you did.

> The main thing to test here is that it's respecting the configuration option.

Thanks! In that case it doesn't seem like I need to make any changes to the assertions then?

github-actions · 2025-11-11T18:43:19Z

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method. It increases the percentage of typed methods from 54.67% to 54.77% (+0.1%).

Partially typed methods (+1-0)

❌ Introduced:

sig/datadog/core/normalizer.rbs:12
└── def self.normalize: (untyped original_value, ?remove_digit_start_char: bool) -> ::String

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

Strech · 2025-11-14T09:41:18Z

lib/datadog/core/normalizer.rb

+        normalized_value.sub!(LEADING_INVALID_CHARS, "")
+        normalized_value.sub!(TRAILING_UNDERSCORES, "")
+        normalized_value.squeeze!('_')
+        normalized_value = normalized_value[MAX_CHARACTER_LENGTH]


Is there value in using a range when it could be normalized_value[0, 200]?

Not really, so I removed the range approach.

I played around with a few things here: 4747259 and now it looks like this:

normalized_value.slice!(MAX_CHARACTER_LENGTH..-1) if normalized_value.length > MAX_CHARACTER_LENGTH

(the conditional portion should help skip this operation if the text is small enough)

Let me know if this is more in line with what you're thinking of!
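A small sketch of the two truncation styles discussed in this thread. `MAX_CHARACTER_LENGTH = 200` comes from the diff; the sample strings are made up for illustration, and this is not the code that was merged.

```ruby
MAX_CHARACTER_LENGTH = 200

long_value = 'a' * 250

# Non-mutating: String#[] with (start, length) returns a new, truncated copy.
copy = long_value[0, MAX_CHARACTER_LENGTH]

# Mutating: String#slice! removes the tail in place and returns the removed part.
in_place = long_value.dup
removed = in_place.slice!(MAX_CHARACTER_LENGTH..-1)

# The length guard skips the slice! call entirely for short values.
short_value = +'web-app'
short_value.slice!(MAX_CHARACTER_LENGTH..-1) if short_value.length > MAX_CHARACTER_LENGTH
```

The in-place form avoids allocating a second 200-character string, which is presumably why it pairs well with the other bang methods in the normalizer.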

Strech · 2025-11-14T09:43:12Z

sig/datadog/core/environment/process.rbs

+        @serialized: untyped
+
+        def self?.entrypoint_workdir: () -> untyped
+
+        def self?.entrypoint_type: () -> untyped
+
+        def self?.entrypoint_name: () -> untyped
+
+        def self?.entrypoint_basedir: () -> untyped
+        def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String
+        def self?.serialized: () -> untyped


Minor: Could you please type it? You can use Codex; it's good at it.

Yes! Thanks for pointing this out! Addressed in adfa416!

Strech · 2025-11-14T09:43:42Z

sig/datadog/core/normalizer.rbs

+  module Core
+    module Normalizer
+      INVALID_TAG_CHARACTERS: ::Regexp
+      def self.normalize: (untyped original_value) -> ("" | untyped)


untyped will cover everything, but still, it's not untyped, it's a ::String?

Addressed in adfa416!

marcotc · 2025-11-14T21:50:38Z

lib/datadog/core/environment/process.rb

+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR, entrypoint_workdir) if entrypoint_workdir
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_NAME, entrypoint_name) if entrypoint_name
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_BASEDIR, entrypoint_basedir) if entrypoint_basedir
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_TYPE, entrypoint_type) if entrypoint_type


We only need to specify namespacing in Ruby up until the common point between: the current class or module we are in; and the object we want to reference.
In this case, we are in Datadog::Core::Environment::Process and want to reference Datadog::Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR.

We can remove the prefix namespace that is identical. For example Ext::TAG_ENTRYPOINT_WORKDIR will work here.

BUT, Ruby namespace resolution is very lenient, and we will try to match Ext (from Ext::TAG_ENTRYPOINT_WORKDIR), in order, to: Datadog::Core::Environment::Process::Ext, Datadog::Core::Environment::Ext, Datadog::Core::Ext, Datadog::Ext, and ::Ext.
This is important because the namespace matching doesn't try to match the complete Ext::TAG_ENTRYPOINT_WORKDIR path; it only tries to match the first token you provided: the Ext in Ext::TAG_ENTRYPOINT_WORKDIR.
And because more than one of these locations in the possible search logic are realistic matches, we should be a bit more specific than Ext::TAG_ENTRYPOINT_WORKDIR.

A good practice is to stop at the closest common namespace location. In this case, it would be Environment. So I suggest using Environment::Ext::TAG_ENTRYPOINT_WORKDIR (and the equivalent for the other constants) here.
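The lookup behavior described above can be reproduced in isolation. This is a minimal sketch: the `Demo` namespace, the nested `Process::Ext` module, and the constant value are all hypothetical and exist only to show how a closer `Ext` shadows the intended one.

```ruby
module Demo
  module Core
    module Environment
      module Ext
        TAG_ENTRYPOINT_WORKDIR = 'demo.workdir' # placeholder value, not the real tag name
      end

      module Process
        # A nested Ext that sits closer in the lexical scope than Environment::Ext.
        module Ext
          UNRELATED = 'something else'
        end

        def self.ambiguous
          # `Ext` resolves to Demo::Core::Environment::Process::Ext,
          # which does not define TAG_ENTRYPOINT_WORKDIR.
          Ext::TAG_ENTRYPOINT_WORKDIR
        rescue NameError => e
          e.class
        end

        def self.qualified
          # `Environment` is found in the enclosing lexical scope first,
          # then the rest of the path is followed from there.
          Environment::Ext::TAG_ENTRYPOINT_WORKDIR
        end
      end
    end
  end
end
```

Here `ambiguous` ends up with a `NameError` because only the first token (`Ext`) is matched against the enclosing scopes, while `qualified` reaches the intended constant.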

Thanks for the explanation! I'll keep this in mind going forward!
Addressed in: 31d9796

marcotc · 2025-11-14T21:51:13Z

lib/datadog/core/environment/process.rb

+        # Returns the entrypoint type of the process
+        # @return [String] the type of the process, which is fixed in Ruby
+        def entrypoint_type
+          Core::Environment::Ext::PROCESS_TYPE


We can remove Core:: from this constant access (see comment in def serialized).

Thanks for this note! See 31d9796!

marcotc · 2025-11-14T22:51:45Z

lib/datadog/core/normalizer.rb

+      # - Trailing underscores are removed
+      # - Consecutive underscores are merged into a single underscore
+      # - Maximum length is 200 characters
+      def self.normalize(original_value)


Given how many operations happen inside this method, I recommend adding a "fast-case", where we do some checks and return immediately if the provided original_value is already valid.
This suggestion is equivalent to the early return by the agent here.

I suggest trying to use a regular expression, instead of implementing the agent's isNormalizedASCIITag in Ruby, since Ruby code is slower than Go code, but Ruby regex is pretty fast.

Something like:

return original_value if original_value.size <= MAX_CHARACTER_LENGTH && original_value.match?(VALID_ASCII_TAG)

The hypothetical VALID_ASCII_TAG doesn't have to catch all valid cases: it's a trade-off between matching most valid tags vs making the regex complicated and slow. As long as it never matches invalid tags, it's all good.
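A minimal sketch of the suggested fast case, not the merged implementation. The module name, the `VALID_ASCII_TAG` and `INVALID_CHARACTERS` patterns, and the exact slow-path rules are assumptions chosen for illustration; only `MAX_CHARACTER_LENGTH = 200` and the general steps come from the diff. Note that the Ruby method is `String#match?`.

```ruby
module TagValueNormalizerSketch
  MAX_CHARACTER_LENGTH = 200

  # Hypothetical conservative pattern: lowercase ASCII runs joined by single
  # underscores. It rejects some valid values, but under the slow-path rules
  # assumed below it never accepts a value that would need changes.
  VALID_ASCII_TAG = %r{\A[a-z0-9:./-]+(?:_[a-z0-9:./-]+)*\z}

  # Hypothetical set of characters that get replaced with an underscore.
  INVALID_CHARACTERS = %r{[^a-z0-9_:./-]}

  def self.normalize(original_value)
    value = original_value.to_s

    # Fast case: already short enough and already in normalized form.
    return value if value.size <= MAX_CHARACTER_LENGTH && value.match?(VALID_ASCII_TAG)

    # Slow case: downcase returns a new string, so the bang methods below
    # never touch the caller's object.
    normalized = value.downcase
    normalized.gsub!(INVALID_CHARACTERS, '_')
    normalized.squeeze!('_')
    normalized.sub!(/\A_+/, '')
    # Truncate first, then drop any underscore left at the cut point.
    normalized[0, MAX_CHARACTER_LENGTH].sub(/_+\z/, '')
  end
end
```

The size check runs before the regex so that the pattern is only ever evaluated against at most 200 characters.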

Addressed in be9587d ! Let me know if this is better now!

marcotc · 2025-11-14T22:53:34Z

lib/datadog/core/normalizer.rb

+      TRAILING_UNDERSCORES = %r{_++\z}
+      MAX_CHARACTER_LENGTH = 200
+
+      # Based on https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L131


In general here: do we need NormalizeTag or NormalizeTagValue for process tags? https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L120-L129

Good catch. In a2643a6, I kept "tag" in the names of the originally defined constants, but adjusted the logic so that normalization is applied only to the string values.

mabdinur

Left some comments based on my review of the Python PR. I'll defer to Marco for the final approval. Overall this looks good to me.

mabdinur · 2025-11-17T20:15:44Z

lib/datadog/core/environment/process.rb

+        # @return [String] the last segment of the base directory of the script
+        def entrypoint_basedir
+          current_basedir = File.expand_path(File.dirname($0))
+          normalized_basedir = current_basedir.tr(File::SEPARATOR, '/')


I don't think we do this normalization in the Python implementation. We should align on the same approach (either normalize in the SDK or do it in one central place like the Agent).

Good catch, I made an adjustment in a2643a6 so it's the same as the python tracer behavior now.

mabdinur · 2025-11-17T20:24:08Z

lib/datadog/core/normalizer.rb

+        normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_')
+        normalized_value.sub!(LEADING_INVALID_CHARS, "")
+        normalized_value.sub!(TRAILING_UNDERSCORES, "")
+        normalized_value.squeeze!('_') if normalized_value.include?('__')


Should we use _ for invalid values? This is a character that is common in file names. It will be hard to distinguish a legitimate underscore from one that replaced an invalid character.

Can we defer this normalization to the Agent? It would be nice if we could centralize this logic instead of duplicating it across SDKs

Hmmm, for process tags, the agent is just one of the transport targets.

But, since the transport format is a tag value for traces, we have to either live with this ambiguity around _, or create an escape pattern for valid _s.
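The ambiguity discussed in this thread is easy to reproduce. In this sketch the character class is a hypothetical stand-in for the normalizer's invalid-character pattern, and the input names are made up.

```ruby
# Hypothetical stand-in for the normalizer's invalid-character pattern.
INVALID_TAG_CHARACTERS_SKETCH = %r{[^a-z0-9_:./-]}

def replace_invalid(value)
  value.downcase.gsub(INVALID_TAG_CHARACTERS_SKETCH, '_')
end

with_space      = replace_invalid('my app') # the space is replaced with "_"
with_underscore = replace_invalid('my_app') # the underscore is kept as-is

# Two different inputs produce the same output, so the original
# character cannot be recovered from the normalized value.
collision = (with_space == with_underscore)
```

Avoiding the collision would require an escape sequence for literal underscores, which is the trade-off mentioned above.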

mabdinur · 2025-11-17T20:24:14Z

lib/datadog/core/normalizer.rb

+        normalized_value.sub!(LEADING_INVALID_CHARS, "")
+        normalized_value.sub!(TRAILING_UNDERSCORES, "")
+        normalized_value.squeeze!('_') if normalized_value.include?('__')
+        normalized_value.slice!(MAX_CHARACTER_LENGTH..-1) if normalized_value.length > MAX_CHARACTER_LENGTH


The python implementation is a bit simpler. I don't think it enforces a max length

I was basing this off the Trace Agent: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L17.

The python implementation currently fails some of those tests and I left a comment in the dd-trace-py PR about it.

That said, I will update this logic to back off sooner if the tag is already valid!

Size check is sane and reasonable! We should add it to python if we have the chance.

I compared the Trace Agent implementation with the Python one again, and the main takeaway is that span tag values are allowed to start with a digit.

I ended up following the trace agent more closely so it passes the case where emoji get added: be9587d

Note - the trace agent has an interesting bytes test that I cannot get to pass in Ruby:

# This test case doesn't work with the current logic because it yields 202 characters
# {in: 'A' + ('0' * 200) + ' ' + ('0' * 11), out: 'a' + ('0' * 200) + '_0'},

(Sometimes it can go over 200 characters in the Trace Agent so that's the only test I am skipping for now)

mabdinur · 2025-11-17T20:26:02Z

lib/datadog/tracing/transport/trace_formatter.rb

          tag_sampling_priority!
          tag_profiling_enabled!
          tag_apm_tracing_disabled!
+          tag_process_tags!


Do we need to move this check into:

if first_span

?

yes! (and add a test to assert that we do check for first_span).

Fixed in 6042830!
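A rough sketch of what guarding on the first span looks like. `SpanSketch`, `FormatterSketch`, and their constructor arguments are hypothetical stand-ins, not the tracer's real classes; only the `_dd.tags.process` key is taken from the spec shown earlier in this PR.

```ruby
# Hypothetical stand-in for a span that carries a meta hash.
SpanSketch = Struct.new(:meta)

# Hypothetical stand-in for the trace formatter.
class FormatterSketch
  PROCESS_TAGS_KEY = '_dd.tags.process'

  def initialize(spans, process_tags:, enabled:)
    @spans = spans
    @process_tags = process_tags
    @enabled = enabled
  end

  def format!
    first_span = @spans.first
    return unless first_span # an empty trace has nothing to tag

    tag_process_tags!(first_span)
  end

  private

  # Only the first span receives the tag, and only when the feature is enabled.
  def tag_process_tags!(span)
    return unless @enabled

    span.meta[PROCESS_TAGS_KEY] = @process_tags
  end
end
```

Tagging a single span keeps the payload overhead constant regardless of how many spans the trace contains.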

mabdinur · 2025-11-17T20:27:25Z

sig/datadog/core/environment/process.rbs

+        @serialized: ::String
+
+        def self?.entrypoint_workdir: () -> ::String
+
+        def self?.entrypoint_type: () -> ::String
+
+        def self?.entrypoint_name: () -> ::String
+
+        def self?.entrypoint_basedir: () -> ::String
+        def self?.serialized_kv_helper: (::String key, ::String value) -> ::String


Should most of these methods/fields be private? I think we only need to expose self?.serialized

It looks like only serialized needs to be public.
We should make everything else private.

If I do that then the process_spec needs to be adjusted 👀
I'll make the test adjustments and see if this helps.

Adjusted in 47efb90
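One way to expose only `serialized` while keeping the helpers private is sketched below. The module name, the `key:value` format, and the comma separator are assumptions for illustration; the real module may be structured differently.

```ruby
module ProcessTagsSketch
  extend self

  # The only public entry point. The result is memoized so the string is
  # built once per process.
  def serialized
    @serialized ||= [
      "workdir:#{entrypoint_workdir}",
      "name:#{entrypoint_name}"
    ].join(',')
  end

  private

  # Last segment of the current working directory.
  def entrypoint_workdir
    File.basename(Dir.pwd)
  end

  # File name of the script that started the process.
  def entrypoint_name
    File.basename($0)
  end
end
```

Specs that previously called the helpers directly would need to go through `serialized`, or use `send` if testing a private helper is really necessary.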

marcotc · 2025-11-18T00:09:24Z

lib/datadog/core/environment/process.rb

+        # Returns the last segment of the working directory of the process
+        # @return [String] the last segment of the working directory
+        def entrypoint_workdir
+          File.basename(Dir.pwd)
+        end
+
+        # Returns the entrypoint type of the process
+        # @return [String] the type of the process, which is fixed in Ruby
+        def entrypoint_type
+          Environment::Ext::PROCESS_TYPE
+        end
+
+        # Returns the file name of the entrypoint script
+        # @return [String] the base name of the script
+        def entrypoint_name
+          File.basename($0)
+        end
+
+        # Returns the last segment of the base directory of the process
+        # @return [String] the last segment of the base directory of the script
+        def entrypoint_basedir


Can you add a simple example string to these methods (except entrypoint_type).
For example /home/server/app/script.rb -> ... (insert real output).

Addressed in 47efb90 !
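For reference, this is how the underlying `File` calls behave on the example path from the comment above. The fixed path stands in for `$0` and `Dir.pwd`, which vary per process; the variable names mirror the methods under review but this is only an illustration.

```ruby
script = '/home/server/app/script.rb'

# File name of the script: "script.rb"
entrypoint_name = File.basename(script)

# Directory containing the script: "/home/server/app"
script_directory = File.dirname(script)

# Last segment of that directory: "app"
entrypoint_basedir = File.basename(script_directory)

# What File.basename(Dir.pwd) yields when the process runs from /home/server/app: "app"
entrypoint_workdir = File.basename('/home/server/app')
```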

Add initial attempt at adding process related tags on trace payloads. This is still missing memoization and additional tests.

1d8bab2

github-actions bot added the core and tracing labels Nov 7, 2025

wantsui added the AI Generated label Nov 7, 2025

Add test for multiple calls to the formatter tags

58592a3

marcotc reviewed Nov 10, 2025 (6 reviews)

lib/datadog/core/environment/process.rb (outdated, 4 threads)

lib/datadog/tracing/configuration/settings.rb (outdated, 1 thread)

wantsui and others added 6 commits November 10, 2025 16:29

Add tests for trace formatter spec to assert that the first span of the payload has the process tag only when the feature is enabled.

7dc9184

it turns out you cannot just pin things to rails 7 due to newer ruby versions so this fixes that.

cad26a6

Update lib/datadog/core/environment/process.rb

f31440a

Co-authored-by: Marco Costa <[email protected]>

fix string and rename formatted_process_tags_k1_v1 to serialized

cfec602

remove unneeded line

8dae705

remove server type for now until more research is done

055586f

Add new tag normalizer logic following the trace agent.

cacb500

wantsui commented Nov 11, 2025

wantsui added 2 commits November 11, 2025 13:38

lint fix

7661a3f

add missing files from prototype command

7825940

wantsui added 3 commits November 11, 2025 13:47

Add missing constants to ext rbs file

5de6efd

jruby fix for the process spec

f5ca84a

remove the active record during rails creation because it caused a jruby conflict with sqlite and it is not needed for this test

9ad5be5

wantsui mentioned this pull request Nov 11, 2025

swap out the existing headers normalization logic with the tag normalizer #5041

Draft

wantsui requested a review from vandonr November 12, 2025 15:27

Merge branch 'master' into add-process-tags-to-tracing

4073ab5

Strech reviewed Nov 14, 2025

wantsui and others added 5 commits November 14, 2025 13:47

Remove the rails gem install from process_spec

22a3680

Remove 1 sec delay.

5784833

Update sig/datadog/core/environment/ext.rbs

2b705e3

Co-authored-by: Sergey Fedorov <[email protected]>

Update lib/datadog/tracing/transport/trace_formatter.rb

e3deb4c

Co-authored-by: Sergey Fedorov <[email protected]>

Add improvements for long strings.

4747259

github-advanced-security bot found potential problems Nov 14, 2025

lib/datadog/core/normalizer.rb (fixed)

wantsui added 4 commits November 14, 2025 16:09

small improvement to the whitespace removal.

41bc6c0

Add upper bound to regex to avoid the polynomial regex on uncontrolled data error.

c3605c0

Change untyped to string.

adfa416

Use possessive quantifiers in regex instead of limiting the upper bound to 200 characters

0dff545

marcotc reviewed Nov 14, 2025

wantsui added 4 commits November 14, 2025 16:54

Fix types for steep check command

7d8da40

Remove unneeded Core prefix

31d9796

lint fixes

3672a8a

restructure folder lookup so it works on the macos ci tests

23d9769

marcotc reviewed Nov 14, 2025

mabdinur reviewed Nov 17, 2025

wantsui added 2 commits November 17, 2025 16:44

fixes for local mac development.

7615906

Add missing trace agent test cases.

d4c6a91

marcotc reviewed Nov 18, 2025

wantsui added 6 commits November 18, 2025 10:59

Fix lint

433b250

Change methods to private. Also add comments with examples

47efb90

Fix basedir logic and adjust tests (and also fix the private change)

a2643a6

Fix steepcheck error

ccd4971

Add in byte logic to handle emojis with early backoff and allow starting digits for tag values.

be9587d

Move process tags only to the first span and adjust tests

6042830
