Migrate from regex to regex-lite #1232

Aaalibaba42 · 2025-09-19T15:37:52Z

What does this PR do?

Migrate from the regex crate to regex-lite.

Motivation

The regex crate is very fast, but it takes up a lot of space in the binaries. regex-lite may introduce a performance regression, but it should produce noticeably smaller binaries.

How to test the change?

I believe there are benchmarks for both artifact size and performance. We can use them to evaluate whether this change is worth it.

codecov-commenter · 2025-09-19T16:02:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.64%. Comparing base (a7c8765) to head (f1e15d4).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
- Coverage   71.65%   71.64%   -0.01%     
==========================================
  Files         354      354              
  Lines       56063    56027      -36     
==========================================
- Hits        40172    40143      -29     
+ Misses      15891    15884       -7

Components	Coverage Δ
datadog-crashtracker	`49.30% <ø> (+0.02%)`	⬆️
datadog-crashtracker-ffi	`5.93% <ø> (ø)`
datadog-alloc	`98.73% <ø> (ø)`
data-pipeline	`90.54% <ø> (+0.23%)`	⬆️
data-pipeline-ffi	`88.19% <ø> (ø)`
ddcommon	`84.29% <ø> (ø)`
ddcommon-ffi	`73.84% <ø> (ø)`
ddtelemetry	`59.98% <ø> (-0.04%)`	⬇️
ddtelemetry-ffi	`21.24% <ø> (ø)`
dogstatsd-client	`83.26% <ø> (ø)`
datadog-ipc	`82.39% <ø> (ø)`
datadog-profiling	`76.90% <ø> (ø)`
datadog-profiling-ffi	`62.12% <ø> (ø)`
datadog-sidecar	`36.30% <ø> (-0.78%)`	⬇️
datdog-sidecar-ffi	`7.60% <ø> (-3.77%)`	⬇️
spawn-worker	`55.35% <ø> (ø)`
tinybytes	`92.22% <ø> (ø)`
datadog-trace-normalization	`98.24% <ø> (ø)`
datadog-trace-obfuscation	`94.17% <100.00%> (ø)`
datadog-trace-protobuf	`77.10% <ø> (ø)`
datadog-trace-utils	`89.98% <ø> (+0.23%)`	⬆️
datadog-tracer-flare	`54.62% <ø> (+0.10%)`	⬆️
datadog-log	`76.31% <ø> (ø)`

pr-commenter · 2025-09-19T16:52:08Z

Benchmarks

Comparison

Benchmark execution time: 2025-09-22 09:37:10

Comparing candidate commit f1e15d4 in PR branch jwiriath/regex-to-regex-lite with baseline commit a7c8765 in branch main.

Found 1 performance improvements and 15 performance regressions! Performance is the same for 37 metrics, 2 unstable metrics.

scenario:benching serializing traces from their internal representation to msgpack

🟥 execution_time [+627.194µs; +641.417µs] or [+4.349%; +4.447%]

scenario:credit_card/is_card_number/37828224631000521389798

🟥 execution_time [+7.676µs; +7.702µs] or [+17.065%; +17.122%]
🟥 throughput [-3250376.673op/s; -3240357.203op/s] or [-14.621%; -14.575%]

scenario:credit_card/is_card_number/x371413321323331

🟥 execution_time [+736.703ns; +739.160ns] or [+12.928%; +12.971%]
🟥 throughput [-20151008.625op/s; -20088536.766op/s] or [-11.483%; -11.447%]

scenario:credit_card/is_card_number_no_luhn/ 378282246310005

🟥 execution_time [+5.333µs; +5.408µs] or [+9.926%; +10.065%]
🟥 throughput [-1702650.092op/s; -1679984.397op/s] or [-9.148%; -9.026%]

scenario:credit_card/is_card_number_no_luhn/378282246310005

🟥 execution_time [+5.462µs; +5.527µs] or [+10.898%; +11.028%]
🟥 throughput [-1983426.659op/s; -1959537.433op/s] or [-9.940%; -9.821%]

scenario:credit_card/is_card_number_no_luhn/37828224631000521389798

🟥 execution_time [+7.675µs; +7.702µs] or [+17.059%; +17.117%]
🟥 throughput [-3248662.459op/s; -3238542.518op/s] or [-14.617%; -14.571%]

scenario:credit_card/is_card_number_no_luhn/x371413321323331

🟥 execution_time [+737.592ns; +740.325ns] or [+12.942%; +12.990%]
🟥 throughput [-20171865.421op/s; -20103643.403op/s] or [-11.497%; -11.458%]

scenario:ip_address/quantize_peer_ip_address_benchmark

🟥 execution_time [+3.199µs; +3.213µs] or [+63.804%; +64.084%]

scenario:sql/obfuscate_sql_string

🟩 execution_time [-4.384µs; -4.307µs] or [-4.901%; -4.815%]

scenario:tags/replace_trace_tags

🟥 execution_time [+11.061µs; +11.076µs] or [+453.505%; +454.124%]

Candidate

Candidate benchmark details

Group 1

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
receiver_entry_point/report/2597	execution_time	6.252ms	6.303ms ± 0.044ms	6.292ms ± 0.013ms	6.307ms	6.370ms	6.491ms	6.611ms	5.08%	3.725	18.500	0.69%	0.003ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
receiver_entry_point/report/2597	execution_time	[6.297ms; 6.309ms] or [-0.096%; +0.096%]	None	None	None

Group 2

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching deserializing traces from msgpack to their internal representation	execution_time	61.214ms	62.097ms ± 2.497ms	61.812ms ± 0.091ms	61.906ms	62.127ms	81.159ms	83.624ms	35.29%	7.928	61.511	4.01%	0.177ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching deserializing traces from msgpack to their internal representation	execution_time	[61.751ms; 62.443ms] or [-0.557%; +0.557%]	None	None	None

Group 3

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching serializing traces from their internal representation to msgpack	execution_time	15.005ms	15.057ms ± 0.039ms	15.049ms ± 0.016ms	15.066ms	15.141ms	15.182ms	15.289ms	1.59%	2.893	11.677	0.26%	0.003ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching serializing traces from their internal representation to msgpack	execution_time	[15.051ms; 15.062ms] or [-0.035%; +0.035%]	None	None	None

Group 4

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
credit_card/is_card_number/	execution_time	3.892µs	3.915µs ± 0.003µs	3.915µs ± 0.002µs	3.917µs	3.921µs	3.922µs	3.925µs	0.25%	-1.141	10.491	0.08%	0.000µs	1	200
credit_card/is_card_number/	throughput	254806035.762op/s	255426487.563op/s ± 216339.309op/s	255437136.921op/s ± 134799.127op/s	255577912.933op/s	255677035.899op/s	255729243.895op/s	256931183.773op/s	0.58%	1.170	10.698	0.08%	15297.499op/s	1	200
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	76.998µs	78.677µs ± 0.750µs	78.653µs ± 0.497µs	79.160µs	79.824µs	80.390µs	80.750µs	2.67%	0.081	-0.194	0.95%	0.053µs	1	200
credit_card/is_card_number/ 3782-8224-6310-005	throughput	12383834.513op/s	12711421.517op/s ± 121057.209op/s	12714144.128op/s ± 80711.920op/s	12791887.647op/s	12910348.047op/s	12983453.621op/s	12987356.961op/s	2.15%	-0.030	-0.218	0.95%	8560.037op/s	1	200
credit_card/is_card_number/ 378282246310005	execution_time	70.384µs	72.190µs ± 0.804µs	72.107µs ± 0.591µs	72.777µs	73.452µs	74.187µs	74.540µs	3.37%	0.258	-0.205	1.11%	0.057µs	1	200
credit_card/is_card_number/ 378282246310005	throughput	13415554.921op/s	13854010.704op/s ± 153847.022op/s	13868309.812op/s ± 114112.398op/s	13955036.923op/s	14093004.511op/s	14186791.223op/s	14207674.291op/s	2.45%	-0.200	-0.245	1.11%	10878.627op/s	1	200
credit_card/is_card_number/37828224631	execution_time	3.895µs	3.914µs ± 0.003µs	3.914µs ± 0.002µs	3.916µs	3.919µs	3.922µs	3.923µs	0.23%	-0.989	8.301	0.08%	0.000µs	1	200
credit_card/is_card_number/37828224631	throughput	254896074.730op/s	255467614.807op/s ± 195731.934op/s	255482767.718op/s ± 111370.177op/s	255587308.625op/s	255724120.460op/s	255792916.920op/s	256754450.896op/s	0.50%	1.010	8.443	0.08%	13840.338op/s	1	200
credit_card/is_card_number/378282246310005	execution_time	67.227µs	68.874µs ± 0.790µs	68.843µs ± 0.551µs	69.339µs	70.319µs	70.837µs	70.973µs	3.09%	0.314	-0.292	1.14%	0.056µs	1	200
credit_card/is_card_number/378282246310005	throughput	14089873.319op/s	14521227.224op/s ± 165922.593op/s	14525840.531op/s ± 116081.430op/s	14650964.111op/s	14787367.406op/s	14850804.913op/s	14874918.239op/s	2.40%	-0.259	-0.334	1.14%	11732.499op/s	1	200
credit_card/is_card_number/37828224631000521389798	execution_time	52.496µs	52.670µs ± 0.085µs	52.661µs ± 0.063µs	52.727µs	52.823µs	52.859µs	52.894µs	0.44%	0.357	-0.475	0.16%	0.006µs	1	200
credit_card/is_card_number/37828224631000521389798	throughput	18905632.535op/s	18986263.160op/s ± 30519.501op/s	18989208.437op/s ± 22764.688op/s	19010509.012op/s	19029172.870op/s	19042461.474op/s	19048951.997op/s	0.31%	-0.350	-0.481	0.16%	2158.055op/s	1	200
credit_card/is_card_number/x371413321323331	execution_time	6.429µs	6.436µs ± 0.008µs	6.435µs ± 0.002µs	6.437µs	6.444µs	6.473µs	6.506µs	1.10%	5.214	34.790	0.12%	0.001µs	1	200
credit_card/is_card_number/x371413321323331	throughput	153709974.205op/s	155368544.364op/s ± 191355.119op/s	155406060.851op/s ± 46831.996op/s	155449883.868op/s	155510203.083op/s	155530321.814op/s	155537666.389op/s	0.08%	-5.179	34.347	0.12%	13530.850op/s	1	200
credit_card/is_card_number_no_luhn/	execution_time	3.891µs	3.914µs ± 0.003µs	3.914µs ± 0.002µs	3.915µs	3.918µs	3.920µs	3.922µs	0.20%	-2.113	18.451	0.07%	0.000µs	1	200
credit_card/is_card_number_no_luhn/	throughput	255002719.197op/s	255504200.557op/s ± 189727.674op/s	255508751.175op/s ± 107750.856op/s	255616944.777op/s	255721079.824op/s	255767217.996op/s	257014897.207op/s	0.59%	2.148	18.777	0.07%	13415.773op/s	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	64.915µs	65.151µs ± 0.137µs	65.116µs ± 0.091µs	65.240µs	65.403µs	65.519µs	65.652µs	0.82%	0.826	0.346	0.21%	0.010µs	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	15231718.451op/s	15348953.930op/s ± 32305.325op/s	15357269.166op/s ± 21509.657op/s	15375668.389op/s	15389948.287op/s	15395892.887op/s	15404744.661op/s	0.31%	-0.816	0.316	0.21%	2284.331op/s	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	58.636µs	59.097µs ± 0.229µs	59.065µs ± 0.139µs	59.238µs	59.543µs	59.666µs	59.825µs	1.29%	0.755	0.331	0.39%	0.016µs	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	16715378.886op/s	16921713.389op/s ± 65307.234op/s	16930445.500op/s ± 39988.669op/s	16965962.473op/s	17010264.042op/s	17026554.619op/s	17054285.496op/s	0.73%	-0.735	0.289	0.38%	4617.919op/s	1	200
credit_card/is_card_number_no_luhn/37828224631	execution_time	3.896µs	3.914µs ± 0.003µs	3.914µs ± 0.001µs	3.915µs	3.919µs	3.920µs	3.929µs	0.39%	-0.040	11.354	0.07%	0.000µs	1	200
credit_card/is_card_number_no_luhn/37828224631	throughput	254499942.476op/s	255486790.398op/s ± 184578.107op/s	255502631.685op/s ± 96212.596op/s	255595695.035op/s	255692669.540op/s	255747433.085op/s	256670288.179op/s	0.46%	0.069	11.433	0.07%	13051.643op/s	1	200
credit_card/is_card_number_no_luhn/378282246310005	execution_time	55.312µs	55.612µs ± 0.152µs	55.586µs ± 0.090µs	55.678µs	55.901µs	56.008µs	56.167µs	1.04%	0.996	1.048	0.27%	0.011µs	1	200
credit_card/is_card_number_no_luhn/378282246310005	throughput	17804068.012op/s	17982009.202op/s ± 49040.463op/s	17990077.574op/s ± 29027.711op/s	18018919.866op/s	18041614.708op/s	18072613.565op/s	18079323.030op/s	0.50%	-0.979	1.007	0.27%	3467.684op/s	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	52.494µs	52.682µs ± 0.085µs	52.674µs ± 0.055µs	52.730µs	52.836µs	52.908µs	53.004µs	0.63%	0.674	0.871	0.16%	0.006µs	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	18866470.667op/s	18981882.054op/s ± 30503.861op/s	18984786.542op/s ± 19691.265op/s	19004151.932op/s	19026017.604op/s	19035389.872op/s	19049769.428op/s	0.34%	-0.662	0.842	0.16%	2156.949op/s	1	200
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	6.429µs	6.438µs ± 0.009µs	6.436µs ± 0.003µs	6.440µs	6.448µs	6.477µs	6.514µs	1.21%	4.445	27.469	0.14%	0.001µs	1	200
credit_card/is_card_number_no_luhn/x371413321323331	throughput	153514337.340op/s	155319458.670op/s ± 220513.605op/s	155374200.117op/s ± 71623.326op/s	155429034.144op/s	155492302.563op/s	155526197.709op/s	155548377.597op/s	0.11%	-4.404	27.000	0.14%	15592.667op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
credit_card/is_card_number/	execution_time	[3.915µs; 3.915µs] or [-0.012%; +0.012%]	None	None	None
credit_card/is_card_number/	throughput	[255396505.015op/s; 255456470.111op/s] or [-0.012%; +0.012%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	[78.573µs; 78.780µs] or [-0.132%; +0.132%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	throughput	[12694644.152op/s; 12728198.882op/s] or [-0.132%; +0.132%]	None	None	None
credit_card/is_card_number/ 378282246310005	execution_time	[72.079µs; 72.302µs] or [-0.154%; +0.154%]	None	None	None
credit_card/is_card_number/ 378282246310005	throughput	[13832688.986op/s; 13875332.422op/s] or [-0.154%; +0.154%]	None	None	None
credit_card/is_card_number/37828224631	execution_time	[3.914µs; 3.915µs] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/37828224631	throughput	[255440488.244op/s; 255494741.371op/s] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/378282246310005	execution_time	[68.764µs; 68.983µs] or [-0.159%; +0.159%]	None	None	None
credit_card/is_card_number/378282246310005	throughput	[14498231.948op/s; 14544222.499op/s] or [-0.158%; +0.158%]	None	None	None
credit_card/is_card_number/37828224631000521389798	execution_time	[52.658µs; 52.682µs] or [-0.022%; +0.022%]	None	None	None
credit_card/is_card_number/37828224631000521389798	throughput	[18982033.450op/s; 18990492.869op/s] or [-0.022%; +0.022%]	None	None	None
credit_card/is_card_number/x371413321323331	execution_time	[6.435µs; 6.437µs] or [-0.017%; +0.017%]	None	None	None
credit_card/is_card_number/x371413321323331	throughput	[155342024.385op/s; 155395064.343op/s] or [-0.017%; +0.017%]	None	None	None
credit_card/is_card_number_no_luhn/	execution_time	[3.913µs; 3.914µs] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/	throughput	[255477906.126op/s; 255530494.988op/s] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	[65.132µs; 65.170µs] or [-0.029%; +0.029%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	[15344476.723op/s; 15353431.138op/s] or [-0.029%; +0.029%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	[59.065µs; 59.128µs] or [-0.054%; +0.054%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	[16912662.434op/s; 16930764.343op/s] or [-0.053%; +0.053%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	execution_time	[3.914µs; 3.914µs] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	throughput	[255461209.647op/s; 255512371.148op/s] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	execution_time	[55.590µs; 55.633µs] or [-0.038%; +0.038%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	throughput	[17975212.666op/s; 17988805.739op/s] or [-0.038%; +0.038%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	[52.670µs; 52.694µs] or [-0.022%; +0.022%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	[18977654.513op/s; 18986109.596op/s] or [-0.022%; +0.022%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	[6.437µs; 6.440µs] or [-0.020%; +0.020%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	throughput	[155288897.605op/s; 155350019.735op/s] or [-0.020%; +0.020%]	None	None	None

Group 5

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	84.840µs	85.108µs ± 0.242µs	85.078µs ± 0.084µs	85.166µs	85.322µs	86.180µs	87.411µs	2.74%	5.562	44.207	0.28%	0.017µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[85.075µs; 85.142µs] or [-0.039%; +0.039%]	None	None	None

Group 6

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
write only interface	execution_time	1.196µs	3.230µs ± 1.438µs	3.009µs ± 0.032µs	3.041µs	3.678µs	14.217µs	14.803µs	391.98%	7.283	54.380	44.42%	0.102µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
write only interface	execution_time	[3.031µs; 3.429µs] or [-6.171%; +6.171%]	None	None	None

Group 7

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
two way interface	execution_time	17.821µs	26.217µs ± 10.174µs	18.136µs ± 0.184µs	35.885µs	45.194µs	45.963µs	52.101µs	187.28%	0.693	-1.041	38.71%	0.719µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
two way interface	execution_time	[24.807µs; 27.627µs] or [-5.378%; +5.378%]	None	None	None

Group 8

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
ip_address/quantize_peer_ip_address_benchmark	execution_time	8.173µs	8.219µs ± 0.031µs	8.214µs ± 0.024µs	8.241µs	8.266µs	8.271µs	8.389µs	2.13%	1.025	3.144	0.37%	0.002µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark	execution_time	[8.215µs; 8.224µs] or [-0.052%; +0.052%]	None	None	None

Group 9

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
redis/obfuscate_redis_string	execution_time	34.621µs	35.001µs ± 0.646µs	34.721µs ± 0.045µs	34.786µs	36.298µs	36.411µs	38.815µs	11.79%	2.253	5.734	1.84%	0.046µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
redis/obfuscate_redis_string	execution_time	[34.912µs; 35.091µs] or [-0.256%; +0.256%]	None	None	None

Group 10

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
tags/replace_trace_tags	execution_time	13.395µs	13.507µs ± 0.051µs	13.491µs ± 0.029µs	13.569µs	13.590µs	13.597µs	13.602µs	0.82%	0.474	-1.109	0.38%	0.004µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
tags/replace_trace_tags	execution_time	[13.500µs; 13.514µs] or [-0.052%; +0.052%]	None	None	None

Group 11

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
concentrator/add_spans_to_concentrator	execution_time	8.235ms	8.252ms ± 0.012ms	8.250ms ± 0.007ms	8.257ms	8.273ms	8.290ms	8.314ms	0.78%	1.684	4.742	0.14%	0.001ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
concentrator/add_spans_to_concentrator	execution_time	[8.250ms; 8.254ms] or [-0.020%; +0.020%]	None	None	None

Group 12

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching string interning on wordpress profile	execution_time	159.461µs	160.679µs ± 0.735µs	160.621µs ± 0.164µs	160.790µs	161.070µs	161.330µs	170.359µs	6.06%	11.499	149.388	0.46%	0.052µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching string interning on wordpress profile	execution_time	[160.578µs; 160.781µs] or [-0.063%; +0.063%]	None	None	None

Group 13

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_trace/test_trace	execution_time	241.571ns	253.043ns ± 12.345ns	246.923ns ± 3.515ns	257.873ns	276.068ns	287.156ns	290.078ns	17.48%	1.243	0.311	4.87%	0.873ns	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_trace/test_trace	execution_time	[251.332ns; 254.754ns] or [-0.676%; +0.676%]	None	None	None

Group 14

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	185.297µs	185.789µs ± 0.407µs	185.716µs ± 0.163µs	185.886µs	186.303µs	187.681µs	188.201µs	1.34%	3.195	13.334	0.22%	0.029µs	1	200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	5313473.694op/s	5382473.503op/s ± 11723.536op/s	5384555.896op/s ± 4723.548op/s	5388830.021op/s	5393977.465op/s	5396066.543op/s	5396748.729op/s	0.23%	-3.162	13.106	0.22%	828.979op/s	1	200
normalization/normalize_name/normalize_name/bad-name	execution_time	17.901µs	18.004µs ± 0.049µs	18.004µs ± 0.038µs	18.037µs	18.084µs	18.107µs	18.165µs	0.90%	0.192	-0.360	0.27%	0.003µs	1	200
normalization/normalize_name/normalize_name/bad-name	throughput	55049597.837op/s	55544744.841op/s ± 150370.588op/s	55543640.393op/s ± 116354.232op/s	55665574.331op/s	55778327.761op/s	55835442.898op/s	55862145.369op/s	0.57%	-0.179	-0.377	0.27%	10632.806op/s	1	200
normalization/normalize_name/normalize_name/good	execution_time	10.388µs	10.479µs ± 0.045µs	10.480µs ± 0.031µs	10.510µs	10.551µs	10.594µs	10.607µs	1.22%	0.203	-0.218	0.43%	0.003µs	1	200
normalization/normalize_name/normalize_name/good	throughput	94273795.384op/s	95429466.563op/s ± 411938.026op/s	95421193.050op/s ± 279715.142op/s	95711941.864op/s	96098517.040op/s	96247420.271op/s	96268220.766op/s	0.89%	-0.181	-0.242	0.43%	29128.417op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	[185.733µs; 185.846µs] or [-0.030%; +0.030%]	None	None	None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	[5380848.734op/s; 5384098.272op/s] or [-0.030%; +0.030%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	execution_time	[17.997µs; 18.010µs] or [-0.038%; +0.038%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	throughput	[55523904.923op/s; 55565584.758op/s] or [-0.038%; +0.038%]	None	None	None
normalization/normalize_name/normalize_name/good	execution_time	[10.473µs; 10.485µs] or [-0.060%; +0.060%]	None	None	None
normalization/normalize_name/normalize_name/good	throughput	[95372375.915op/s; 95486557.212op/s] or [-0.060%; +0.060%]	None	None	None

Group 15

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`f1e15d4`	1758533031	jwiriath/regex-to-regex-lite

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	533.883µs	534.913µs ± 0.847µs	534.685µs ± 0.298µs	535.013µs	536.767µs	538.131µs	538.661µs	0.74%	2.210	5.180	0.16%	0.060µs	1	200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	1856455.376op/s	1869466.801op/s ± 2948.606op/s	1870261.294op/s ± 1041.787op/s	1871202.954op/s	1872189.355op/s	1872473.791op/s	1873068.742op/s	0.15%	-2.199	5.126	0.16%	208.498op/s	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	380.284µs	380.842µs ± 0.290µs	380.793µs ± 0.159µs	381.009µs	381.391µs	381.673µs	381.750µs	0.25%	0.705	0.204	0.08%	0.020µs	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	2619516.285op/s	2625761.563op/s ± 1997.563op/s	2626098.394op/s ± 1099.386op/s	2627087.875op/s	2628600.614op/s	2629227.426op/s	2629610.951op/s	0.13%	-0.701	0.197	0.08%	141.249op/s	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	190.077µs	190.543µs ± 0.512µs	190.461µs ± 0.142µs	190.639µs	190.867µs	191.491µs	195.399µs	2.59%	7.357	63.755	0.27%	0.036µs	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	5117726.071op/s	5248206.235op/s ± 13842.836op/s	5250422.609op/s ± 3918.342op/s	5253875.163op/s	5257776.692op/s	5258928.893op/s	5261023.666op/s	0.20%	-7.263	62.597	0.26%	978.836op/s	1	200
normalization/normalize_service/normalize_service/[empty string]	execution_time	36.880µs	37.129µs ± 0.154µs	37.143µs ± 0.129µs	37.233µs	37.395µs	37.431µs	37.484µs	0.92%	0.225	-0.961	0.41%	0.011µs	1	200
normalization/normalize_service/normalize_service/[empty string]	throughput	26677726.938op/s	26933335.340op/s ± 111943.712op/s	26923244.176op/s ± 93966.056op/s	27046940.294op/s	27088859.394op/s	27096564.788op/s	27115261.554op/s	0.71%	-0.213	-0.973	0.41%	7915.616op/s	1	200
normalization/normalize_service/normalize_service/test_ASCII	execution_time	45.997µs	46.105µs ± 0.052µs	46.099µs ± 0.029µs	46.133µs	46.186µs	46.235µs	46.469µs	0.80%	1.893	10.835	0.11%	0.004µs	1	200
normalization/normalize_service/normalize_service/test_ASCII	throughput	21519913.213op/s	21689670.351op/s ± 24415.519op/s	21692488.964op/s ± 13766.855op/s	21704344.505op/s	21722964.142op/s	21735614.148op/s	21740395.999op/s	0.22%	-1.862	10.578	0.11%	1726.438op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	[534.796µs; 535.031µs] or [-0.022%; +0.022%]	None	None	None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	[1869058.153op/s; 1869875.450op/s] or [-0.022%; +0.022%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	[380.802µs; 380.882µs] or [-0.011%; +0.011%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	[2625484.720op/s; 2626038.406op/s] or [-0.011%; +0.011%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	[190.472µs; 190.614µs] or [-0.037%; +0.037%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	[5246287.751op/s; 5250124.719op/s] or [-0.037%; +0.037%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	execution_time	[37.108µs; 37.151µs] or [-0.058%; +0.058%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	throughput	[26917821.018op/s; 26948849.662op/s] or [-0.058%; +0.058%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	execution_time	[46.098µs; 46.112µs] or [-0.016%; +0.016%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	throughput	[21686286.595op/s; 21693054.108op/s] or [-0.016%; +0.016%]	None	None	None

Baseline

Omitted due to size.

Aaalibaba42 · 2025-09-22T09:29:47Z

tools:

Regex is used to find typedefs/defines at the beginning of C files (like #define MY_VAR 35). Not trivial, but I think it is doable without regex and without being too expensive.
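
As a rough sketch of what a regex-free version could look like for the simple define case, using only the standard library. The function name `parse_define` is hypothetical, and this is not the code in tools; it only handles object-like defines and ignores typedefs and function-like macros.

```rust
/// Parses a line such as `#define MY_VAR 35` into (name, value).
/// Returns None for anything that is not a simple object-like define.
/// Hypothetical helper, not taken from the tools crate.
fn parse_define(line: &str) -> Option<(&str, &str)> {
    // Allow leading whitespace and whitespace between `#` and `define`.
    let rest = line.trim_start().strip_prefix('#')?;
    let rest = rest.trim_start().strip_prefix("define")?;
    // `define` must be followed by whitespace, otherwise `#defineX` would match.
    if !rest.starts_with(|c: char| c.is_whitespace()) {
        return None;
    }
    let rest = rest.trim_start();
    // The macro name is a run of identifier characters.
    let name_len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    let (name, value) = rest.split_at(name_len);
    if name.is_empty() || name.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    // Function-like macros (`#define F(x) ...`) are out of scope for this sketch.
    if value.starts_with('(') {
        return None;
    }
    Some((name, value.trim()))
}
```

The whole thing is a handful of prefix checks and one scan over the line, so it should not be expensive compared to compiling and running a pattern.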

datadog-trace-obfuscation:

ip_address.rs: Only separates the protocol from the address. We could maybe do without regex here, but a hand-written version would be hard to make faster than the regex crates.
replacer.rs: Runs the clients' own regexes for obfuscation, so we can't do without it.
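
For the ip_address.rs case, a minimal sketch of splitting a protocol prefix from the address with plain string operations might look like the following. The name `split_protocol` and the list of schemes are assumptions made for illustration; the actual pattern in ip_address.rs may accept different inputs.

```rust
/// Schemes recognised by this sketch. This list is an assumption,
/// not the set used by datadog-trace-obfuscation.
const SCHEMES: [&str; 4] = ["http", "https", "ftp", "file"];

/// Splits `input` into an optional scheme and the remainder after `://`.
/// Inputs without a known scheme are returned unchanged.
fn split_protocol(input: &str) -> (Option<&str>, &str) {
    if let Some((scheme, rest)) = input.split_once("://") {
        if SCHEMES.iter().any(|s| *s == scheme) {
            return (Some(scheme), rest);
        }
    }
    (None, input)
}
```

Whether this beats the regex crates on real inputs would have to be measured with the existing ip_address benchmark.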

ddcommon:

azure_app_services.rs: Parses and extracts the Azure "resource group". Given the regex pattern, we should be able to do it without regexes relatively quickly.
entity_id/unix/mod.rs: Only used for testing in this file; I don't think it would even be in the release crate with the current configuration.
entity_id/unix/container_id.rs: Used to match cgroup entries to identify the running container ID; this would be a bit harder to do without.
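
For the azure_app_services.rs case, here is a sketch of a regex-free extraction. It assumes the input has the shape `<subscription>+<resource-group>-<region>webspace` with an optional `-Linux` suffix; that shape and the name `extract_resource_group` are assumptions for illustration and should be checked against the actual pattern in the file.

```rust
/// Extracts the resource group from an owner-name string.
/// Assumed input shape: "<subscription>+<resource-group>-<region>webspace[-Linux]".
/// Hypothetical helper, not the ddcommon implementation.
fn extract_resource_group(owner_name: &str) -> Option<&str> {
    // Everything before the first '+' is assumed to be the subscription id.
    let (_, rest) = owner_name.split_once('+')?;
    // Drop the optional "-Linux" suffix, then require the "webspace" marker.
    let rest = rest.strip_suffix("-Linux").unwrap_or(rest);
    let rest = rest.strip_suffix("webspace")?;
    // What remains is "<resource-group>-<region>"; the group may contain '-',
    // so split on the last one.
    let (group, region) = rest.rsplit_once('-')?;
    if group.is_empty() || region.is_empty() {
        return None;
    }
    Some(group)
}
```

Edge cases such as a '+' inside the resource group would need to be compared against the behavior of the existing regex before swapping it out.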

datadog-live-debugger:

expr_eval.rs: I don't have the full context here, but when a condition is checked on strings, it can take the form of a regex match. So we can't do without regex if the pattern is not known beforehand.
redacted_names.rs: Most of the file uses the regex_automata crate; the regex crate is used only once, to escape regular expression metacharacters. I don't know whether we could do without it.
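
For the redacted_names.rs case, escaping metacharacters is simple enough to sketch by hand. The names `escape_regex` and `is_meta` are hypothetical, and the character set below is an assumption about what needs escaping; it should be checked against the regex syntax actually accepted by regex_automata.

```rust
/// Returns true for characters this sketch treats as regex metacharacters.
/// The set is an assumption and may not match the regex crate exactly.
fn is_meta(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Escapes every metacharacter in `text` with a backslash so the result
/// matches `text` literally when used as a pattern.
fn escape_regex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if is_meta(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

With something like this, the single use of the regex crate in that file would no longer need the dependency, assuming the escaped output is accepted by the pattern syntax in use.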

data-pipeline:

CAN'T MIGRATE: testing is done with httpmock, whose trait implementations target the regex crate and do not match the regex-lite crate (I think; I did not fully investigate, but the compilation errors led me to believe this).
src/telemetry/mod.rs: Only used for testing. As above, I don't believe this would even be present in the final release binary.

Aaalibaba42 · 2025-09-22T12:20:37Z

https://gitlab.ddbuild.io/DataDog/apm-reliability/libddprof-build/-/jobs/1140399351 Job for size benchmark failed, so I'm pasting the results here:

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a	70.98 MB	67.98 MB	-4.22% (-2.99 MB) 💪
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so	7.19 MB	6.69 MB	-6.94% (-511.95 KB) 💪

aarch64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so	9.25 MB	8.65 MB	-6.49% (-615.48 KB) 💪
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a	83.24 MB	79.85 MB	-4.07% (-3.39 MB) 💪

libdatadog-x64-windows

Artifact	Baseline	Commit	Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll	18.39 MB	16.83 MB	-8.46% (-1.55 MB) 💪
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib	65.01 KB	65.01 KB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb	124.93 MB	120.41 MB	-3.61% (-4.51 MB) 💪
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib	653.09 MB	641.35 MB	-1.79% (-11.73 MB) 💪
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll	5.89 MB	5.36 MB	-8.88% (-536.00 KB) 💪
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib	65.01 KB	65.01 KB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb	17.36 MB	16.24 MB	-6.43% (-1.11 MB) 💪
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib	32.22 MB	30.14 MB	-6.44% (-2.07 MB) 💪

libdatadog-x86-windows

Artifact	Baseline	Commit	Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll	15.67 MB	14.73 MB	-6.05% (-971.50 KB) 💪
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib	66.01 KB	66.01 KB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb	127.26 MB	124.42 MB	-2.22% (-2.83 MB) 💪
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib	643.25 MB	634.37 MB	-1.38% (-8.87 MB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll	4.49 MB	4.10 MB	-8.80% (-405.50 KB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib	66.01 KB	66.01 KB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb	18.50 MB	17.62 MB	-4.77% (-904.00 KB) 💪
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib	30.26 MB	28.72 MB	-5.08% (-1.53 MB) 💪

x86_64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a	63.63 MB	60.13 MB	-5.49% (-3.49 MB) 💪
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so	8.50 MB	7.93 MB	-6.79% (-591.96 KB) 💪

x86_64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a	78.03 MB	74.42 MB	-4.62% (-3.61 MB) 💪
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so	9.84 MB	9.21 MB	-6.46% (-652.00 KB) 💪

paullegranddc

One note about specifying the dependency, but otherwise LGTM

paullegranddc · 2025-09-23T14:10:44Z

tools/Cargo.toml


 [dependencies]
-regex = "1"
+regex-lite = "^0.1"


"^0.1" is equivalent to "0.1", so you don't really need to add the caret
https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#caret-requirements

Also, you should probably add it as a workspace dependency, with a minimum constraint set to the highest minor available ("0.1.7" right now)
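
To make the caret rule concrete, here is a minimal sketch of how a caret requirement selects versions, following the rules in the Cargo reference linked above. `caret_matches` is a hypothetical helper, not Cargo's implementation: it only handles full (major, minor, patch) triples and ignores pre-release tags. A bare "0.1" is treated as 0.1.0 here, which is right for this case but does not generalize to every partial requirement (for example "0.0").

```rust
/// Hypothetical helper: does version `v` satisfy the caret requirement `^req`?
/// Versions are (major, minor, patch) triples; pre-release tags are ignored.
fn caret_matches(req: (u64, u64, u64), v: (u64, u64, u64)) -> bool {
    // A caret requirement never accepts anything older than the stated version.
    if v < req {
        return false;
    }
    match req {
        // ^0.0.z: only that exact patch version is compatible.
        (0, 0, _) => v == req,
        // ^0.y.z: the minor version is the compatibility boundary.
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        // ^x.y.z with x > 0: the major version is the boundary.
        (major, _, _) => v.0 == major,
    }
}

fn main() {
    // "0.1" and "^0.1" both mean >=0.1.0, <0.2.0.
    assert!(caret_matches((0, 1, 0), (0, 1, 7)));
    assert!(!caret_matches((0, 1, 0), (0, 2, 0)));
    // "0.1.7" raises the floor but keeps the same upper bound.
    assert!(!caret_matches((0, 1, 7), (0, 1, 6)));
    assert!(caret_matches((0, 1, 7), (0, 1, 9)));
}
```

So writing "0.1.7" instead of "0.1" only changes the lowest version Cargo may pick; both stay below 0.2.0.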

What about the instances where using a whole regex engine (even the lite one) is likely superfluous, as described in this comment: #1232 (comment)

Is it worth exploring? Would it be a separate PR?

Aaalibaba42 · 2025-09-23T14:52:24Z

There is also the potential pitfall of Unicode support: one of the corners regex-lite cut to be smol was to sacrifice a little correctness, notably around Unicode. For the places where this PR changes regex to regex-lite:

Are there instances where Unicode input could show up?
If so, would the corners cut by regex-lite lead to bugs?
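
As an illustration of the kind of difference at stake: regex-lite documents that classes such as `\w`, `\d` and `\s` are ASCII-only, whereas the regex crate makes them Unicode-aware by default. The sketch below does not call either crate; it uses standard-library character predicates as a stand-in for the two behaviors, so the helper names are hypothetical and the Unicode predicate is only an approximation of what `\w` matches.

```rust
/// ASCII-only notion of a word character (stand-in for regex-lite's `\w`).
fn is_word_ascii(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Unicode-aware approximation of a word character (stand-in for regex's `\w`).
fn is_word_unicode(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Returns the leading run of word characters, similar to matching `^\w*`.
fn leading_word(s: &str, is_word: fn(char) -> bool) -> &str {
    let end = s
        .char_indices()
        .find(|&(_, c)| !is_word(c))
        .map_or(s.len(), |(i, _)| i);
    &s[..end]
}

fn main() {
    let input = "caf\u{e9} au lait"; // "café au lait"
    // The Unicode-aware predicate keeps the accented letter...
    assert_eq!(leading_word(input, is_word_unicode), "caf\u{e9}");
    // ...while the ASCII-only one stops right before it.
    assert_eq!(leading_word(input, is_word_ascii), "caf");
}
```

Patterns that only ever see ASCII input (hex IDs, digits, fixed keywords) would not be affected by this; patterns applied to user-provided text might be.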

github-actions bot added the mini-agent label Sep 19, 2025

Aaalibaba42 added 4 commits September 22, 2025 10:19

feat(tools): Migrated from regex to regex-lite

7b949eb

chore: bundle-license thingy

fc6a230

feat(datadog-trace-obfuscation): Migrated from regex to regex-lite

b579f07

chore: rebased main (a7c8765)

b9ee66d

Aaalibaba42 force-pushed the jwiriath/regex-to-regex-lite branch from 87e6cfa to b9ee66d Compare September 22, 2025 08:21

feat(ddcommon): Migrated from regex to regex-lite

9df1a1e

github-actions bot added the common label Sep 22, 2025

feat(datadog-live-debugger): Migrated from regex to regex-lite

f1e15d4

Aaalibaba42 marked this pull request as ready for review September 22, 2025 12:11

Aaalibaba42 requested review from a team as code owners September 22, 2025 12:11

paullegranddc reviewed Sep 23, 2025
