-
Notifications
You must be signed in to change notification settings - Fork 15
Migrate from regex to regex-lite #1232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1232 +/- ##
==========================================
- Coverage 71.65% 71.64% -0.01%
==========================================
Files 354 354
Lines 56063 56027 -36
==========================================
- Hits 40172 40143 -29
+ Misses 15891 15884 -7
🚀 New features to boost your workflow:
|
BenchmarksComparisonBenchmark execution time: 2025-09-22 09:37:10 Comparing candidate commit f1e15d4 in PR branch Found 1 performance improvements and 15 performance regressions! Performance is the same for 37 metrics, 2 unstable metrics. scenario:benching serializing traces from their internal representation to msgpack
scenario:credit_card/is_card_number/37828224631000521389798
scenario:credit_card/is_card_number/x371413321323331
scenario:credit_card/is_card_number_no_luhn/ 378282246310005
scenario:credit_card/is_card_number_no_luhn/378282246310005
scenario:credit_card/is_card_number_no_luhn/37828224631000521389798
scenario:credit_card/is_card_number_no_luhn/x371413321323331
scenario:ip_address/quantize_peer_ip_address_benchmark
scenario:sql/obfuscate_sql_string
scenario:tags/replace_trace_tags
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
Group 14
Group 15
BaselineOmitted due to size. |
87e6cfa
to
b9ee66d
Compare
tools:
datadog-trace-obfuscation:
ddcommon:
datadog-live-debugger:
data-pipeline:
|
https://gitlab.ddbuild.io/DataDog/apm-reliability/libddprof-build/-/jobs/1140399351 Job for size benchmark failed, so I'm pasting the results here: Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One not about specifying the dependency but otherwise LGTM
|
||
[dependencies] | ||
regex = "1" | ||
regex-lite = "^0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"^0.1" is equivalent to "0.1" so you don't really need to add it
https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#caret-requirements
Also you should probably add it as a workspace dependencies, with a minimum constraint to the highest minor available ("0.1.7" right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about instances where it is likely using a whole regex engine (even the lite one) is superfluous as described in this comment: #1232 (comment)
Is it worth exploring ? Would it be a separate PR ?
There is also the potential pitfall of Unicode support: One of the corners cut by regex-lite to be smol was to sacrifice a little correctness, notably around Unicode support. In the instances where this PR changes regex to regex-lite:
|
What does this PR do?
Migrate from the
regex
crate toregex-lite
.Motivation
The
regex
crate is very fast, but takes up lots of space in the binaries.regex-lite
might introduce regression in performance, but we should see it's much more optimized in space.How to test the change?
There is (I believe so) a benchmark on the size of the artifacts (and the performance). We can evaluate if this change is worth it based on those.