feat: new string format "extended-unicode" #760
base: main
I wonder if setting the threshold (via `fastJson(schema, { stringOptimizations: {...} })`) would do the job without introducing a non-standard format. A non-standard format may break other side features, such as a Swagger viewer.
I have also introduced the "unsafe" string format into FJS, but `stringOptimizations` seems good to me for both `unsafe` and `dirty`.
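A minimal JavaScript sketch of the threshold idea follows. The `makeStringSerializer` factory, the `longStringThreshold` option name, and the `STR_ESCAPE` pattern are hypothetical illustrations, not fast-json-stringify's source or its `stringOptimizations` API. Strings at or above the threshold skip the escape scan and go straight to `JSON.stringify`; shorter strings keep the regex fast path.

```javascript
'use strict'

// Hypothetical escape check: control characters, '"', '\' and surrogate
// code units. The real STR_ESCAPE in fast-json-stringify may differ.
const STR_ESCAPE = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/

// Hypothetical factory; `longStringThreshold` is an illustrative option name.
function makeStringSerializer ({ longStringThreshold = Infinity } = {}) {
  return function serializeString (str) {
    // Long strings: assume escaping is likely and skip the regex scan.
    if (str.length >= longStringThreshold) {
      return JSON.stringify(str)
    }
    // Short strings: if nothing needs escaping, just add the quotes.
    return STR_ESCAPE.test(str) ? JSON.stringify(str) : '"' + str + '"'
  }
}

// Usage: strings of 64 or more characters bypass the escape scan.
const serialize = makeStringSerializer({ longStringThreshold: 64 })
console.log(serialize('plain text')) // "plain text"
console.log(serialize('a\nb'))       // "a\nb"
```

The trade-off is that a long string with nothing to escape pays for `JSON.stringify` instead of a plain concatenation, which is why a tunable threshold, rather than a fixed rule, seems to be what the comment has in mind.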
Signed-off-by: Nigro Simone <[email protected]>
I asked Claude for some technically descriptive alternatives to the name `dirty`. This is what Claude answered:
Looking at the code and documentation, the `dirty` format is used for strings that are "known to contain non-printable characters or surrogate pairs."
Here are some more technically descriptive name suggestions:
Most technically accurate:
- `non-printable`: directly describes what it handles (non-printable characters)
- `surrogate-pairs`: references the specific Unicode issue it addresses
- `extended-unicode`: indicates it handles extended Unicode characters
More general but descriptive:
- `escaped`: indicates the string needs special character escaping
- `control-chars`: references control/non-printable characters
- `binary-safe`: indicates it safely handles binary/control characters
My recommendation would be `non-printable`, as it most accurately describes the primary use case: strings containing characters that aren't standard printable ASCII/Unicode characters. This is more technically precise than "dirty" while remaining clear about its purpose.
Alternatively, `extended-unicode` would work well if you want to emphasize that it's specifically for handling Unicode edge cases like surrogate pairs.
I personally lean toward `extended-unicode`.
I don't have any objection, but I don't see the bigger picture.
@Eomm PTAL
For strings known to contain non-printable characters or surrogate pairs.
For very long strings that are known for certain to contain characters to escape (e.g. a long product description with newlines), calling `asString` is useless: it is just a waste of time, because it always executes the `STR_ESCAPE` regex.
Checklist
- run `npm run test` and `npm run benchmark`
- … and the Code of conduct
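The `asString` cost described in this PR can be sketched in JavaScript as follows. `asStringChecked` and `asStringKnownEscaped` are hypothetical names, and the regex only approximates `STR_ESCAPE`; none of this is copied from fast-json-stringify. For a value the schema already marks as needing escaping, the scan in the checked path always matches and only adds overhead, so the dedicated path calls `JSON.stringify` directly and produces the same output.

```javascript
'use strict'

// Approximation of STR_ESCAPE (assumption; not copied from fast-json-stringify).
const STR_ESCAPE = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/

// Generic path: scan first, escape only if needed.
function asStringChecked (str) {
  if (!STR_ESCAPE.test(str)) {
    return '"' + str + '"'
  }
  return JSON.stringify(str)
}

// Hypothetical path for a format known to need escaping
// (`dirty` / `extended-unicode` in this PR): skip the scan entirely.
function asStringKnownEscaped (str) {
  return JSON.stringify(str)
}

// A long product description with newlines: the scan always matches,
// so both paths return the same JSON, but the second skips the regex test.
const description = 'Line one of the description.\n'.repeat(1000)
console.log(asStringChecked(description) === asStringKnownEscaped(description)) // true
```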
Benchmark