Aaalibaba42
Contributor

What does this PR do?

Migrate from the regex crate to regex-lite.

Motivation

The regex crate is very fast, but it adds a lot to the size of the binaries. regex-lite may introduce performance regressions, but it should produce noticeably smaller binaries.

How to test the change?

I believe there is a benchmark for the size of the artifacts (and for performance). We can use those results to decide whether this change is worth it.

@codecov-commenter
codecov-commenter commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.64%. Comparing base (a7c8765) to head (f1e15d4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
- Coverage   71.65%   71.64%   -0.01%     
==========================================
  Files         354      354              
  Lines       56063    56027      -36     
==========================================
- Hits        40172    40143      -29     
+ Misses      15891    15884       -7     
Components Coverage Δ
datadog-crashtracker 49.30% <ø> (+0.02%) ⬆️
datadog-crashtracker-ffi 5.93% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 90.54% <ø> (+0.23%) ⬆️
data-pipeline-ffi 88.19% <ø> (ø)
ddcommon 84.29% <ø> (ø)
ddcommon-ffi 73.84% <ø> (ø)
ddtelemetry 59.98% <ø> (-0.04%) ⬇️
ddtelemetry-ffi 21.24% <ø> (ø)
dogstatsd-client 83.26% <ø> (ø)
datadog-ipc 82.39% <ø> (ø)
datadog-profiling 76.90% <ø> (ø)
datadog-profiling-ffi 62.12% <ø> (ø)
datadog-sidecar 36.30% <ø> (-0.78%) ⬇️
datdog-sidecar-ffi 7.60% <ø> (-3.77%) ⬇️
spawn-worker 55.35% <ø> (ø)
tinybytes 92.22% <ø> (ø)
datadog-trace-normalization 98.24% <ø> (ø)
datadog-trace-obfuscation 94.17% <100.00%> (ø)
datadog-trace-protobuf 77.10% <ø> (ø)
datadog-trace-utils 89.98% <ø> (+0.23%) ⬆️
datadog-tracer-flare 54.62% <ø> (+0.10%) ⬆️
datadog-log 76.31% <ø> (ø)

@pr-commenter
pr-commenter bot commented Sep 19, 2025

Benchmarks

Comparison

Benchmark execution time: 2025-09-22 09:37:10

Comparing candidate commit f1e15d4 in PR branch jwiriath/regex-to-regex-lite with baseline commit a7c8765 in branch main.

Found 1 performance improvements and 15 performance regressions! Performance is the same for 37 metrics, 2 unstable metrics.

scenario:benching serializing traces from their internal representation to msgpack

  • 🟥 execution_time [+627.194µs; +641.417µs] or [+4.349%; +4.447%]

scenario:credit_card/is_card_number/37828224631000521389798

  • 🟥 execution_time [+7.676µs; +7.702µs] or [+17.065%; +17.122%]
  • 🟥 throughput [-3250376.673op/s; -3240357.203op/s] or [-14.621%; -14.575%]

scenario:credit_card/is_card_number/x371413321323331

  • 🟥 execution_time [+736.703ns; +739.160ns] or [+12.928%; +12.971%]
  • 🟥 throughput [-20151008.625op/s; -20088536.766op/s] or [-11.483%; -11.447%]

scenario:credit_card/is_card_number_no_luhn/ 378282246310005

  • 🟥 execution_time [+5.333µs; +5.408µs] or [+9.926%; +10.065%]
  • 🟥 throughput [-1702650.092op/s; -1679984.397op/s] or [-9.148%; -9.026%]

scenario:credit_card/is_card_number_no_luhn/378282246310005

  • 🟥 execution_time [+5.462µs; +5.527µs] or [+10.898%; +11.028%]
  • 🟥 throughput [-1983426.659op/s; -1959537.433op/s] or [-9.940%; -9.821%]

scenario:credit_card/is_card_number_no_luhn/37828224631000521389798

  • 🟥 execution_time [+7.675µs; +7.702µs] or [+17.059%; +17.117%]
  • 🟥 throughput [-3248662.459op/s; -3238542.518op/s] or [-14.617%; -14.571%]

scenario:credit_card/is_card_number_no_luhn/x371413321323331

  • 🟥 execution_time [+737.592ns; +740.325ns] or [+12.942%; +12.990%]
  • 🟥 throughput [-20171865.421op/s; -20103643.403op/s] or [-11.497%; -11.458%]

scenario:ip_address/quantize_peer_ip_address_benchmark

  • 🟥 execution_time [+3.199µs; +3.213µs] or [+63.804%; +64.084%]

scenario:sql/obfuscate_sql_string

  • 🟩 execution_time [-4.384µs; -4.307µs] or [-4.901%; -4.815%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+11.061µs; +11.076µs] or [+453.505%; +454.124%]
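Since the baseline details are omitted, the percentages above can still be sanity-checked from the candidate means in the details below. A minimal sketch, assuming the reported delta is simply candidate minus baseline (the function names are illustrative, not part of the benchmark tooling):

```rust
/// Recover the baseline value from the candidate value and the reported
/// absolute delta (assumes delta = candidate - baseline).
fn baseline_from_delta(candidate: f64, delta: f64) -> f64 {
    candidate - delta
}

/// Relative change of `candidate` versus `baseline`, in percent.
fn relative_change_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}
```

For tags/replace_trace_tags, the candidate mean is 13.507µs and the delta is about +11.07µs, which puts the baseline at about 2.44µs and the relative change at about 454%, consistent with the reported interval.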

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
receiver_entry_point/report/2597 execution_time 6.252ms 6.303ms ± 0.044ms 6.292ms ± 0.013ms 6.307ms 6.370ms 6.491ms 6.611ms 5.08% 3.725 18.500 0.69% 0.003ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
receiver_entry_point/report/2597 execution_time [6.297ms; 6.309ms] or [-0.096%; +0.096%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 61.214ms 62.097ms ± 2.497ms 61.812ms ± 0.091ms 61.906ms 62.127ms 81.159ms 83.624ms 35.29% 7.928 61.511 4.01% 0.177ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [61.751ms; 62.443ms] or [-0.557%; +0.557%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching serializing traces from their internal representation to msgpack execution_time 15.005ms 15.057ms ± 0.039ms 15.049ms ± 0.016ms 15.066ms 15.141ms 15.182ms 15.289ms 1.59% 2.893 11.677 0.26% 0.003ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching serializing traces from their internal representation to msgpack execution_time [15.051ms; 15.062ms] or [-0.035%; +0.035%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 3.892µs 3.915µs ± 0.003µs 3.915µs ± 0.002µs 3.917µs 3.921µs 3.922µs 3.925µs 0.25% -1.141 10.491 0.08% 0.000µs 1 200
credit_card/is_card_number/ throughput 254806035.762op/s 255426487.563op/s ± 216339.309op/s 255437136.921op/s ± 134799.127op/s 255577912.933op/s 255677035.899op/s 255729243.895op/s 256931183.773op/s 0.58% 1.170 10.698 0.08% 15297.499op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 76.998µs 78.677µs ± 0.750µs 78.653µs ± 0.497µs 79.160µs 79.824µs 80.390µs 80.750µs 2.67% 0.081 -0.194 0.95% 0.053µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 12383834.513op/s 12711421.517op/s ± 121057.209op/s 12714144.128op/s ± 80711.920op/s 12791887.647op/s 12910348.047op/s 12983453.621op/s 12987356.961op/s 2.15% -0.030 -0.218 0.95% 8560.037op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 70.384µs 72.190µs ± 0.804µs 72.107µs ± 0.591µs 72.777µs 73.452µs 74.187µs 74.540µs 3.37% 0.258 -0.205 1.11% 0.057µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 13415554.921op/s 13854010.704op/s ± 153847.022op/s 13868309.812op/s ± 114112.398op/s 13955036.923op/s 14093004.511op/s 14186791.223op/s 14207674.291op/s 2.45% -0.200 -0.245 1.11% 10878.627op/s 1 200
credit_card/is_card_number/37828224631 execution_time 3.895µs 3.914µs ± 0.003µs 3.914µs ± 0.002µs 3.916µs 3.919µs 3.922µs 3.923µs 0.23% -0.989 8.301 0.08% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 254896074.730op/s 255467614.807op/s ± 195731.934op/s 255482767.718op/s ± 111370.177op/s 255587308.625op/s 255724120.460op/s 255792916.920op/s 256754450.896op/s 0.50% 1.010 8.443 0.08% 13840.338op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 67.227µs 68.874µs ± 0.790µs 68.843µs ± 0.551µs 69.339µs 70.319µs 70.837µs 70.973µs 3.09% 0.314 -0.292 1.14% 0.056µs 1 200
credit_card/is_card_number/378282246310005 throughput 14089873.319op/s 14521227.224op/s ± 165922.593op/s 14525840.531op/s ± 116081.430op/s 14650964.111op/s 14787367.406op/s 14850804.913op/s 14874918.239op/s 2.40% -0.259 -0.334 1.14% 11732.499op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 52.496µs 52.670µs ± 0.085µs 52.661µs ± 0.063µs 52.727µs 52.823µs 52.859µs 52.894µs 0.44% 0.357 -0.475 0.16% 0.006µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 18905632.535op/s 18986263.160op/s ± 30519.501op/s 18989208.437op/s ± 22764.688op/s 19010509.012op/s 19029172.870op/s 19042461.474op/s 19048951.997op/s 0.31% -0.350 -0.481 0.16% 2158.055op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 6.429µs 6.436µs ± 0.008µs 6.435µs ± 0.002µs 6.437µs 6.444µs 6.473µs 6.506µs 1.10% 5.214 34.790 0.12% 0.001µs 1 200
credit_card/is_card_number/x371413321323331 throughput 153709974.205op/s 155368544.364op/s ± 191355.119op/s 155406060.851op/s ± 46831.996op/s 155449883.868op/s 155510203.083op/s 155530321.814op/s 155537666.389op/s 0.08% -5.179 34.347 0.12% 13530.850op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 3.891µs 3.914µs ± 0.003µs 3.914µs ± 0.002µs 3.915µs 3.918µs 3.920µs 3.922µs 0.20% -2.113 18.451 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 255002719.197op/s 255504200.557op/s ± 189727.674op/s 255508751.175op/s ± 107750.856op/s 255616944.777op/s 255721079.824op/s 255767217.996op/s 257014897.207op/s 0.59% 2.148 18.777 0.07% 13415.773op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 64.915µs 65.151µs ± 0.137µs 65.116µs ± 0.091µs 65.240µs 65.403µs 65.519µs 65.652µs 0.82% 0.826 0.346 0.21% 0.010µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 15231718.451op/s 15348953.930op/s ± 32305.325op/s 15357269.166op/s ± 21509.657op/s 15375668.389op/s 15389948.287op/s 15395892.887op/s 15404744.661op/s 0.31% -0.816 0.316 0.21% 2284.331op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 58.636µs 59.097µs ± 0.229µs 59.065µs ± 0.139µs 59.238µs 59.543µs 59.666µs 59.825µs 1.29% 0.755 0.331 0.39% 0.016µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 16715378.886op/s 16921713.389op/s ± 65307.234op/s 16930445.500op/s ± 39988.669op/s 16965962.473op/s 17010264.042op/s 17026554.619op/s 17054285.496op/s 0.73% -0.735 0.289 0.38% 4617.919op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 3.896µs 3.914µs ± 0.003µs 3.914µs ± 0.001µs 3.915µs 3.919µs 3.920µs 3.929µs 0.39% -0.040 11.354 0.07% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 254499942.476op/s 255486790.398op/s ± 184578.107op/s 255502631.685op/s ± 96212.596op/s 255595695.035op/s 255692669.540op/s 255747433.085op/s 256670288.179op/s 0.46% 0.069 11.433 0.07% 13051.643op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 55.312µs 55.612µs ± 0.152µs 55.586µs ± 0.090µs 55.678µs 55.901µs 56.008µs 56.167µs 1.04% 0.996 1.048 0.27% 0.011µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 17804068.012op/s 17982009.202op/s ± 49040.463op/s 17990077.574op/s ± 29027.711op/s 18018919.866op/s 18041614.708op/s 18072613.565op/s 18079323.030op/s 0.50% -0.979 1.007 0.27% 3467.684op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 52.494µs 52.682µs ± 0.085µs 52.674µs ± 0.055µs 52.730µs 52.836µs 52.908µs 53.004µs 0.63% 0.674 0.871 0.16% 0.006µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 18866470.667op/s 18981882.054op/s ± 30503.861op/s 18984786.542op/s ± 19691.265op/s 19004151.932op/s 19026017.604op/s 19035389.872op/s 19049769.428op/s 0.34% -0.662 0.842 0.16% 2156.949op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 6.429µs 6.438µs ± 0.009µs 6.436µs ± 0.003µs 6.440µs 6.448µs 6.477µs 6.514µs 1.21% 4.445 27.469 0.14% 0.001µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 153514337.340op/s 155319458.670op/s ± 220513.605op/s 155374200.117op/s ± 71623.326op/s 155429034.144op/s 155492302.563op/s 155526197.709op/s 155548377.597op/s 0.11% -4.404 27.000 0.14% 15592.667op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [3.915µs; 3.915µs] or [-0.012%; +0.012%] None None None
credit_card/is_card_number/ throughput [255396505.015op/s; 255456470.111op/s] or [-0.012%; +0.012%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [78.573µs; 78.780µs] or [-0.132%; +0.132%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [12694644.152op/s; 12728198.882op/s] or [-0.132%; +0.132%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [72.079µs; 72.302µs] or [-0.154%; +0.154%] None None None
credit_card/is_card_number/ 378282246310005 throughput [13832688.986op/s; 13875332.422op/s] or [-0.154%; +0.154%] None None None
credit_card/is_card_number/37828224631 execution_time [3.914µs; 3.915µs] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/37828224631 throughput [255440488.244op/s; 255494741.371op/s] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/378282246310005 execution_time [68.764µs; 68.983µs] or [-0.159%; +0.159%] None None None
credit_card/is_card_number/378282246310005 throughput [14498231.948op/s; 14544222.499op/s] or [-0.158%; +0.158%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [52.658µs; 52.682µs] or [-0.022%; +0.022%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [18982033.450op/s; 18990492.869op/s] or [-0.022%; +0.022%] None None None
credit_card/is_card_number/x371413321323331 execution_time [6.435µs; 6.437µs] or [-0.017%; +0.017%] None None None
credit_card/is_card_number/x371413321323331 throughput [155342024.385op/s; 155395064.343op/s] or [-0.017%; +0.017%] None None None
credit_card/is_card_number_no_luhn/ execution_time [3.913µs; 3.914µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/ throughput [255477906.126op/s; 255530494.988op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [65.132µs; 65.170µs] or [-0.029%; +0.029%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [15344476.723op/s; 15353431.138op/s] or [-0.029%; +0.029%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [59.065µs; 59.128µs] or [-0.054%; +0.054%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [16912662.434op/s; 16930764.343op/s] or [-0.053%; +0.053%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [3.914µs; 3.914µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [255461209.647op/s; 255512371.148op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [55.590µs; 55.633µs] or [-0.038%; +0.038%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [17975212.666op/s; 17988805.739op/s] or [-0.038%; +0.038%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [52.670µs; 52.694µs] or [-0.022%; +0.022%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [18977654.513op/s; 18986109.596op/s] or [-0.022%; +0.022%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [6.437µs; 6.440µs] or [-0.020%; +0.020%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [155288897.605op/s; 155350019.735op/s] or [-0.020%; +0.020%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 84.840µs 85.108µs ± 0.242µs 85.078µs ± 0.084µs 85.166µs 85.322µs 86.180µs 87.411µs 2.74% 5.562 44.207 0.28% 0.017µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [85.075µs; 85.142µs] or [-0.039%; +0.039%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.196µs 3.230µs ± 1.438µs 3.009µs ± 0.032µs 3.041µs 3.678µs 14.217µs 14.803µs 391.98% 7.283 54.380 44.42% 0.102µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [3.031µs; 3.429µs] or [-6.171%; +6.171%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 17.821µs 26.217µs ± 10.174µs 18.136µs ± 0.184µs 35.885µs 45.194µs 45.963µs 52.101µs 187.28% 0.693 -1.041 38.71% 0.719µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [24.807µs; 27.627µs] or [-5.378%; +5.378%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
ip_address/quantize_peer_ip_address_benchmark execution_time 8.173µs 8.219µs ± 0.031µs 8.214µs ± 0.024µs 8.241µs 8.266µs 8.271µs 8.389µs 2.13% 1.025 3.144 0.37% 0.002µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark execution_time [8.215µs; 8.224µs] or [-0.052%; +0.052%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 34.621µs 35.001µs ± 0.646µs 34.721µs ± 0.045µs 34.786µs 36.298µs 36.411µs 38.815µs 11.79% 2.253 5.734 1.84% 0.046µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [34.912µs; 35.091µs] or [-0.256%; +0.256%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 13.395µs 13.507µs ± 0.051µs 13.491µs ± 0.029µs 13.569µs 13.590µs 13.597µs 13.602µs 0.82% 0.474 -1.109 0.38% 0.004µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [13.500µs; 13.514µs] or [-0.052%; +0.052%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 8.235ms 8.252ms ± 0.012ms 8.250ms ± 0.007ms 8.257ms 8.273ms 8.290ms 8.314ms 0.78% 1.684 4.742 0.14% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [8.250ms; 8.254ms] or [-0.020%; +0.020%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 159.461µs 160.679µs ± 0.735µs 160.621µs ± 0.164µs 160.790µs 161.070µs 161.330µs 170.359µs 6.06% 11.499 149.388 0.46% 0.052µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [160.578µs; 160.781µs] or [-0.063%; +0.063%] None None None

Group 13

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 241.571ns 253.043ns ± 12.345ns 246.923ns ± 3.515ns 257.873ns 276.068ns 287.156ns 290.078ns 17.48% 1.243 0.311 4.87% 0.873ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [251.332ns; 254.754ns] or [-0.676%; +0.676%] None None None

Group 14

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 185.297µs 185.789µs ± 0.407µs 185.716µs ± 0.163µs 185.886µs 186.303µs 187.681µs 188.201µs 1.34% 3.195 13.334 0.22% 0.029µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 5313473.694op/s 5382473.503op/s ± 11723.536op/s 5384555.896op/s ± 4723.548op/s 5388830.021op/s 5393977.465op/s 5396066.543op/s 5396748.729op/s 0.23% -3.162 13.106 0.22% 828.979op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 17.901µs 18.004µs ± 0.049µs 18.004µs ± 0.038µs 18.037µs 18.084µs 18.107µs 18.165µs 0.90% 0.192 -0.360 0.27% 0.003µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 55049597.837op/s 55544744.841op/s ± 150370.588op/s 55543640.393op/s ± 116354.232op/s 55665574.331op/s 55778327.761op/s 55835442.898op/s 55862145.369op/s 0.57% -0.179 -0.377 0.27% 10632.806op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 10.388µs 10.479µs ± 0.045µs 10.480µs ± 0.031µs 10.510µs 10.551µs 10.594µs 10.607µs 1.22% 0.203 -0.218 0.43% 0.003µs 1 200
normalization/normalize_name/normalize_name/good throughput 94273795.384op/s 95429466.563op/s ± 411938.026op/s 95421193.050op/s ± 279715.142op/s 95711941.864op/s 96098517.040op/s 96247420.271op/s 96268220.766op/s 0.89% -0.181 -0.242 0.43% 29128.417op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [185.733µs; 185.846µs] or [-0.030%; +0.030%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [5380848.734op/s; 5384098.272op/s] or [-0.030%; +0.030%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [17.997µs; 18.010µs] or [-0.038%; +0.038%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [55523904.923op/s; 55565584.758op/s] or [-0.038%; +0.038%] None None None
normalization/normalize_name/normalize_name/good execution_time [10.473µs; 10.485µs] or [-0.060%; +0.060%] None None None
normalization/normalize_name/normalize_name/good throughput [95372375.915op/s; 95486557.212op/s] or [-0.060%; +0.060%] None None None

Group 15

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz f1e15d4 1758533031 jwiriath/regex-to-regex-lite
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 533.883µs 534.913µs ± 0.847µs 534.685µs ± 0.298µs 535.013µs 536.767µs 538.131µs 538.661µs 0.74% 2.210 5.180 0.16% 0.060µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1856455.376op/s 1869466.801op/s ± 2948.606op/s 1870261.294op/s ± 1041.787op/s 1871202.954op/s 1872189.355op/s 1872473.791op/s 1873068.742op/s 0.15% -2.199 5.126 0.16% 208.498op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 380.284µs 380.842µs ± 0.290µs 380.793µs ± 0.159µs 381.009µs 381.391µs 381.673µs 381.750µs 0.25% 0.705 0.204 0.08% 0.020µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2619516.285op/s 2625761.563op/s ± 1997.563op/s 2626098.394op/s ± 1099.386op/s 2627087.875op/s 2628600.614op/s 2629227.426op/s 2629610.951op/s 0.13% -0.701 0.197 0.08% 141.249op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 190.077µs 190.543µs ± 0.512µs 190.461µs ± 0.142µs 190.639µs 190.867µs 191.491µs 195.399µs 2.59% 7.357 63.755 0.27% 0.036µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5117726.071op/s 5248206.235op/s ± 13842.836op/s 5250422.609op/s ± 3918.342op/s 5253875.163op/s 5257776.692op/s 5258928.893op/s 5261023.666op/s 0.20% -7.263 62.597 0.26% 978.836op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 36.880µs 37.129µs ± 0.154µs 37.143µs ± 0.129µs 37.233µs 37.395µs 37.431µs 37.484µs 0.92% 0.225 -0.961 0.41% 0.011µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 26677726.938op/s 26933335.340op/s ± 111943.712op/s 26923244.176op/s ± 93966.056op/s 27046940.294op/s 27088859.394op/s 27096564.788op/s 27115261.554op/s 0.71% -0.213 -0.973 0.41% 7915.616op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 45.997µs 46.105µs ± 0.052µs 46.099µs ± 0.029µs 46.133µs 46.186µs 46.235µs 46.469µs 0.80% 1.893 10.835 0.11% 0.004µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 21519913.213op/s 21689670.351op/s ± 24415.519op/s 21692488.964op/s ± 13766.855op/s 21704344.505op/s 21722964.142op/s 21735614.148op/s 21740395.999op/s 0.22% -1.862 10.578 0.11% 1726.438op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [534.796µs; 535.031µs] or [-0.022%; +0.022%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1869058.153op/s; 1869875.450op/s] or [-0.022%; +0.022%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [380.802µs; 380.882µs] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2625484.720op/s; 2626038.406op/s] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [190.472µs; 190.614µs] or [-0.037%; +0.037%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5246287.751op/s; 5250124.719op/s] or [-0.037%; +0.037%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [37.108µs; 37.151µs] or [-0.058%; +0.058%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [26917821.018op/s; 26948849.662op/s] or [-0.058%; +0.058%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [46.098µs; 46.112µs] or [-0.016%; +0.016%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [21686286.595op/s; 21693054.108op/s] or [-0.016%; +0.016%] None None None

Baseline

Omitted due to size.

@Aaalibaba42 Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 87e6cfa to b9ee66d Compare September 22, 2025 08:21
@Aaalibaba42
Contributor Author

tools:

  • Regex used to find typedefs/defines at the beginning of C files (like #define MY_VAR 35) -> Not trivial, but I think it is doable without regex and without being too expensive
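
To give an idea of what a regex-free version could look like for the #define case, here is a rough sketch. `parse_define` is a hypothetical helper, not the code in tools, and it only handles object-like defines with a value:

```rust
/// Parse a line such as `#define MY_VAR 35` into `("MY_VAR", "35")`.
/// Returns `None` for anything that is not an object-like define with a value.
fn parse_define(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim_start().strip_prefix('#')?;
    let rest = rest.trim_start().strip_prefix("define")?;
    // Require whitespace after `define` so that `#defined` is rejected.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let rest = rest.trim_start();
    // The macro name is the longest run of identifier characters.
    let name_end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    let (name, tail) = rest.split_at(name_end);
    if name.is_empty() || name.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Anything other than whitespace right after the name (for example the
    // `(` of a function-like macro) is out of scope for this sketch.
    if !tail.is_empty() && !tail.starts_with(char::is_whitespace) {
        return None;
    }
    let value = tail.trim();
    if value.is_empty() {
        None
    } else {
        Some((name, value))
    }
}
```

Typedefs would need their own handling, which is where the "not trivial" part comes in.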

datadog-trace-obfuscation:

  • ip_address.rs: Only splits the protocol from the address. Maybe we could do without a regex here, but it would be hard to beat the regex crates on performance
  • replacer.rs: Used to run the clients' regexes for obfuscation, so we can't do without it
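
For the ip_address.rs case, a hand-written split could look roughly like the sketch below. The scheme list and the "colon plus two or three slashes" rule are assumptions for illustration, and this version only looks at the start of the input, which may differ from what the real pattern does:

```rust
/// Hypothetical scheme list; the real one lives in `ip_address.rs`.
const SCHEMES: [&str; 3] = ["http", "https", "ftp"];

/// Split `input` into `(protocol_prefix, rest)`. The prefix is a known scheme
/// followed by `:` and two or three slashes; it is empty when none is found.
fn split_protocol(input: &str) -> (&str, &str) {
    if let Some(colon) = input.find(':') {
        let (scheme, after) = input.split_at(colon);
        if SCHEMES.contains(&scheme) {
            // Count the slashes that follow the colon, capped at three.
            let slashes = after[1..]
                .bytes()
                .take(3)
                .take_while(|&b| b == b'/')
                .count();
            if slashes >= 2 {
                return input.split_at(colon + 1 + slashes);
            }
        }
    }
    ("", input)
}
```

Whether something like this is actually faster than regex or regex-lite would have to be measured with the existing ip_address benchmark.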

ddcommon:

  • azure_app_services.rs: Parses and extracts the "resource group" of Azure. Given the regex pattern, we should be able to do it without regexes relatively quickly
  • entity_id/unix/mod.rs: Only used for testing in this file; I don't even think it would end up in the release crate with the configuration
  • entity_id/unix/container_id.rs: Used to match the cgroup to identify the running container ID; this would be a bit harder to do without
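
As an illustration of the azure_app_services.rs point, the resource group could be pulled out with plain string splitting. This sketch assumes the input looks like `<subscription>+<resource-group>-<region>webspace` with an optional `-Linux` suffix; that format and the function name are assumptions, not taken from the file:

```rust
/// Extract the resource group from an owner name of the assumed form
/// `<subscription>+<resource-group>-<region>webspace[-Linux]`.
fn extract_resource_group(owner_name: &str) -> Option<&str> {
    // Everything after the first `+` holds the group, region and suffix.
    let (_, rest) = owner_name.split_once('+')?;
    let rest = rest.strip_suffix("-Linux").unwrap_or(rest);
    let rest = rest.strip_suffix("webspace")?;
    // The resource group may itself contain `-`, so split on the last one.
    let (group, region) = rest.rsplit_once('-')?;
    if group.is_empty() || region.is_empty() {
        None
    } else {
        Some(group)
    }
}
```

The existing unit tests for the regex version would be the reference for any edge cases this misses.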

datadog-live-debugger:

  • expr_eval.rs: I don't have the full context here, but when a condition is checked against strings, it can take the form of a regex match. So we can't do without regexes if the pattern is not known beforehand.
  • redacted_names.rs: Most of the file uses the regex_automata crate; the regex crate is used only once, to escape regular-expression metacharacters. I don't know whether we could do without it
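
If escaping metacharacters is the only use of the regex crate in redacted_names.rs, a small local helper might be enough. A minimal sketch, where the set of escaped characters is my assumption of what needs escaping and should be checked against the regex crate documentation before relying on it:

```rust
/// Characters treated as regex metacharacters in this sketch (assumed set).
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Return `text` with every metacharacter prefixed by a backslash, so the
/// result matches `text` literally when used as a pattern.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if is_meta_character(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

Escaping more characters than strictly necessary is harmless for literal matching, which is why the set errs on the wide side.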

data-pipeline:

  • CAN'T MIGRATE: the testing done with httpmock implements traits from the regex crate that are not the same as those of the regex-lite crate (I think; I did not investigate 100%, but the compilation message led me to believe this)
  • src/telemetry/mod.rs: Only used for testing. Same as above, I don't believe this would even be present in the final release binary

@Aaalibaba42 Aaalibaba42 marked this pull request as ready for review September 22, 2025 12:11
@Aaalibaba42 Aaalibaba42 requested review from a team as code owners September 22, 2025 12:11
@Aaalibaba42
Contributor Author

The job for the size benchmark (https://gitlab.ddbuild.io/DataDog/apm-reliability/libddprof-build/-/jobs/1140399351) failed, so I'm pasting the results here:

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 70.98 MB | 67.98 MB | -4.22% (-2.99 MB) 💪 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.19 MB | 6.69 MB | -6.94% (-511.95 KB) 💪 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 9.25 MB | 8.65 MB | -6.49% (-615.48 KB) 💪 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 83.24 MB | 79.85 MB | -4.07% (-3.39 MB) 💪 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 18.39 MB | 16.83 MB | -8.46% (-1.55 MB) 💪 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 65.01 KB | 65.01 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 124.93 MB | 120.41 MB | -3.61% (-4.51 MB) 💪 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 653.09 MB | 641.35 MB | -1.79% (-11.73 MB) 💪 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 5.89 MB | 5.36 MB | -8.88% (-536.00 KB) 💪 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 65.01 KB | 65.01 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 17.36 MB | 16.24 MB | -6.43% (-1.11 MB) 💪 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 32.22 MB | 30.14 MB | -6.44% (-2.07 MB) 💪 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 15.67 MB | 14.73 MB | -6.05% (-971.50 KB) 💪 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 66.01 KB | 66.01 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 127.26 MB | 124.42 MB | -2.22% (-2.83 MB) 💪 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 643.25 MB | 634.37 MB | -1.38% (-8.87 MB) 💪 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 4.49 MB | 4.10 MB | -8.80% (-405.50 KB) 💪 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 66.01 KB | 66.01 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 18.50 MB | 17.62 MB | -4.77% (-904.00 KB) 💪 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 30.26 MB | 28.72 MB | -5.08% (-1.53 MB) 💪 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 63.63 MB | 60.13 MB | -5.49% (-3.49 MB) 💪 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.50 MB | 7.93 MB | -6.79% (-591.96 KB) 💪 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 78.03 MB | 74.42 MB | -4.62% (-3.61 MB) 💪 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 9.84 MB | 9.21 MB | -6.46% (-652.00 KB) 💪 |

Contributor

@paullegranddc paullegranddc left a comment

One note about specifying the dependency, but otherwise LGTM.


[dependencies]
regex = "1"
regex-lite = "^0.1"
Contributor

"^0.1" is equivalent to "0.1", so you don't really need the caret:
https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#caret-requirements

Also, you should probably add it as a workspace dependency, with the minimum constraint set to the latest available version ("0.1.7" right now).
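
As a sketch of what that suggestion could look like, the fragment below declares the dependency once at the workspace root and inherits it in a member crate. Which member crates would inherit it is an assumption here; the version is the "0.1.7" mentioned above.

```toml
# Root Cargo.toml of the workspace (placement assumed for illustration)
[workspace.dependencies]
regex-lite = "0.1.7"

# Cargo.toml of a member crate that uses regex-lite
[dependencies]
regex-lite = { workspace = true }
```

This keeps the version requirement in one place, so bumping it later is a single-line change.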

Contributor Author

@Aaalibaba42 Aaalibaba42 Sep 23, 2025

What about the places where using a whole regex engine (even the lite one) is likely superfluous, as described in this comment: #1232 (comment)

Is it worth exploring? Would it be a separate PR?

@Aaalibaba42
Contributor Author

Aaalibaba42 commented Sep 23, 2025

There is also the potential pitfall of Unicode support: one of the corners regex-lite cuts to stay small is a little correctness, notably around Unicode. For the places where this PR changes regex to regex-lite:

  • are there instances where Unicode input could be used?
  • and if so, would the corners cut by regex-lite lead to bugs?
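
One way to reason about the second question is to model the difference between an ASCII-only word class and a Unicode-aware one. The sketch below uses only the standard library and does not call either regex crate. It assumes `\w` is ASCII-only in regex-lite and Unicode-aware by default in regex, and `char::is_alphanumeric` only approximates the Unicode word class. All function names are hypothetical.

```rust
/// Models an ASCII-only `\w`: ASCII letters, ASCII digits and underscore.
fn is_word_ascii(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Approximates a Unicode-aware `\w`. The real class is slightly broader
/// (for example, it also covers combining marks).
fn is_word_unicode(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Returns the characters of `input` on which the two models disagree,
/// i.e. where a pattern relying on `\w` could behave differently.
fn word_class_mismatches(input: &str) -> Vec<char> {
    input
        .chars()
        .filter(|&c| is_word_ascii(c) != is_word_unicode(c))
        .collect()
}
```

Inputs that are always ASCII, such as hexadecimal container IDs, produce no mismatches under this model, whereas user-provided text such as `café_42` does.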
