toke.c dont call libc's memcmp() to test 1 byte in Perl_scan_str() #23533

Open
wants to merge 1 commit into
base: blead

Contributor

@bulk88 bulk88 commented Aug 3, 2025

delim_byte_len is almost always 1, and open_delim_str is almost always '"' or '\'' or something similar. I'm not sure exactly which PP code makes delim_byte_len something other than 1; that case is too rare to optimize for, but it still must be supported.

Just test the char directly if its length is 1. Invoking libc memcmp() requires 4 ABI inputs on any CPU. Most of the code paths above the memEQ() lines set delim_byte_len from constants initialized directly inside Perl_scan_str(), but one branch uses "utf8_to_uv_or_die(,,&delim_byte_len)", which optimizes to Perl_utf8_to_uvchr_buf_helper(,,,&delim_byte_len). That makes the value in STRLEN delim_byte_len unbounded as far as any CC can tell: every CC must assume the value Perl_utf8_to_uvchr_buf_helper() stored in delim_byte_len could be as large as a 4.7GB DVD or 25GB BD .iso file, so it cannot replace the memcmp() call with an inline 1 byte compare on its own.
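A minimal sketch of the 1 byte fast path described above. This is not the code in toke.c: `delim_matches` is a hypothetical helper, and it assumes the caller has already checked that at least `len` bytes are readable at `s`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from toke.c): returns true when the bytes at
 * s begin with the delimiter delim[0..len-1].
 * Assumption: the caller guarantees len readable bytes at both pointers. */
static bool
delim_matches(const char *s, const char *delim, size_t len)
{
    assert(len >= 1);

    /* Common case: a 1 byte delimiter such as '"' or '\''.
     * A direct char compare avoids the libc call entirely. */
    if (len == 1)
        return *s == *delim;

    /* Rare case: a multi-byte (e.g. UTF-8) delimiter. */
    return memcmp(s, delim, len) == 0;
}
```

The multi-byte branch keeps the original memcmp() behavior, so only the common case changes shape.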

Put the retval of SvGROW() (the pointer to the string buffer) to use instead of discarding it.

Don't let the C auto var delim_byte_len escape via the "&" op through utf8_to_uv_or_die(). Once its address escapes, no CC can keep delim_byte_len in a register anymore; it must be reread from the C stack after every possible call.
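One way to keep a length variable from escaping, sketched under assumptions: `toy_decode` is a hypothetical stand-in for a decoder with an out-parameter (it is not Perl's UTF-8 API and its length rule is not real UTF-8), and `first_char_len` and `count_matches` are illustrative names. Only the short-lived `tmp` has its address taken; `delim_len` is assigned by value, so the compiler is free to keep it in a register across the calls in the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder with an out-parameter. Toy rule, NOT real UTF-8:
 * ASCII lead bytes are 1 byte long, anything else is treated as 2. */
static unsigned long
toy_decode(const char *s, size_t *len_out)
{
    const unsigned char c = (unsigned char)s[0];
    *len_out = (c < 0x80) ? 1 : 2;
    return c;
}

/* Confine the address-taken variable to this small function. */
static size_t
first_char_len(const char *s)
{
    size_t tmp;                 /* only tmp escapes via '&' */
    (void)toy_decode(s, &tmp);
    return tmp;                 /* handed back by value */
}

/* Counts occurrences of the delimiter in buf[0..n-1]. */
size_t
count_matches(const char *buf, size_t n, const char *delim)
{
    /* delim_len never has its address taken in this function. */
    const size_t delim_len = first_char_len(delim);
    size_t hits = 0;

    for (size_t i = 0; i + delim_len <= n; i++) {
        if (memcmp(buf + i, delim, delim_len) == 0)
            hits++;
    }
    return hits;
}
```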


  • This set of changes does not require a perldelta entry.

Contributor

@leonerd leonerd left a comment

LGTM.

Commit message is reasonably descriptive and readable as to what the change actually is.
