buffer: improve performance of multiple Buffer operations#61871

Open
thisalihassan wants to merge 6 commits into nodejs:main from thisalihassan:buffer-perf-improvements

@thisalihassan (Contributor) commented Feb 17, 2026

Summary

Multiple performance improvements to Buffer operations, verified with benchmarks (15-30 runs, comparing old vs new binaries built from the same tree).

Changes

Buffer.copyBytesFrom() (+100-210%)
Avoid intermediate TypedArrayPrototypeSlice allocation by calculating byte offsets directly into the source TypedArray's underlying ArrayBuffer.
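
For readers following along, here is a rough sketch of the difference between the two approaches. It is not the code in lib/buffer.js: `copyViaSlice` and `copyViaByteOffsets` are hypothetical names, and argument validation is omitted.

```javascript
// Hypothetical helpers for illustration; not the actual lib/buffer.js code.

// Slice-based shape: slice() allocates an intermediate typed array,
// and its bytes are then copied a second time into the Buffer.
function copyViaSlice(view, offset = 0, length) {
  const end = length === undefined ? undefined : offset + length;
  const sliced = view.slice(offset, end); // intermediate allocation + copy
  return Buffer.from(
    new Uint8Array(sliced.buffer, sliced.byteOffset, sliced.byteLength));
}

// Direct shape: compute the byte range inside the source ArrayBuffer
// and copy it once.
function copyViaByteOffsets(view, offset = 0, length) {
  const viewLength = view.length;
  if (offset >= viewLength) return Buffer.alloc(0);
  const end = length === undefined ?
    viewLength : Math.min(offset + length, viewLength);
  const bytesPerElement = view.BYTES_PER_ELEMENT;
  const src = new Uint8Array(
    view.buffer,
    view.byteOffset + offset * bytesPerElement,
    (end - offset) * bytesPerElement);
  return Buffer.from(src); // single copy
}

const u16 = new Uint16Array([0x0102, 0x0304, 0x0506, 0x0708]);
console.log(copyViaSlice(u16, 1, 2).equals(copyViaByteOffsets(u16, 1, 2)));
```

Both helpers return the same bytes; the second one skips the temporary typed array.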

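Buffer.prototype.fill("t", "ascii") (+26-37%)
Extend the existing single-character fast path in fill() to ASCII encoding, so a one-character fill value is passed as its numeric byte value.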

ASCII indexOf (+14-46%)
Call indexOfString directly for ASCII encoding instead of first converting the search value to a Buffer via fromStringFast and then calling indexOfBuffer. ASCII and Latin-1 share the same byte values for characters 0-127.
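
A small illustration of why the two paths are interchangeable for ASCII needles. It only exercises the public Buffer API, not the internal indexOfString/indexOfBuffer bindings, and the sample strings are made up.

```javascript
// Illustration only: public API, not the internal bindings.
const haystack = Buffer.from('Bob, Alice @ example', 'ascii');

// For pure-ASCII needles, 'ascii' and 'latin1' searches land on the same
// index, as does searching with a needle pre-converted to a Buffer.
const asAscii = haystack.indexOf('Alice', 0, 'ascii');
const asLatin1 = haystack.indexOf('Alice', 0, 'latin1');
const asBuffer = haystack.indexOf(Buffer.from('Alice', 'ascii'));

console.log(asAscii, asLatin1, asBuffer);
```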

swap16/32/64 (+3-38%)
Add V8 Fast API C++ functions (FastSwap16/32/64) alongside the existing slow path. Largest gains at len=256 (+35%).
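
For reference, the observable behaviour that both the slow and fast paths have to preserve can be written in a few lines of JavaScript. This is only a model of the semantics; `swapReference` is a hypothetical helper and says nothing about how the C++ side is implemented.

```javascript
// Reference semantics only; the PR's change is in C++ and is not shown here.
// `swapReference` is a hypothetical helper name.
function swapReference(buf, width) {
  if (buf.length % width !== 0) {
    throw new RangeError(`length must be a multiple of ${width}`);
  }
  const out = Buffer.from(buf); // copy so the input is left untouched
  for (let i = 0; i < out.length; i += width) {
    out.subarray(i, i + width).reverse(); // reverse each group in place
  }
  return out;
}

// len=256 is the size where the PR reports its largest swap16 gain.
const input = Buffer.from(Array.from({ length: 256 }, (_, i) => i));
const same16 = swapReference(input, 2).equals(Buffer.from(input).swap16());
const same32 = swapReference(input, 4).equals(Buffer.from(input).swap32());
const same64 = swapReference(input, 8).equals(Buffer.from(input).swap64());

console.log(same16, same32, same64);
```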

Benchmark results

Key results (15-30 runs, *** = p < 0.001):

Benchmark                                             Improvement
copyBytesFrom (offset, Uint8Array, len=256)           +210% ***
copyBytesFrom (offset+length, Uint8Array, len=256)    +206% ***
swap16 len=256                                        +38% ***
fill("t", "ascii") size=8192                          +37% ***
indexOf ASCII 'Alice'                                 +46% ***
indexOf ASCII '@'                                     +31% ***
fill("t", "ascii") size=65536                         +26% ***
swap64 len=768 aligned                                +12% ***

No regressions observed. Full benchmark CSV attached.
compare-all-buffers-final.csv

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/performance

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Feb 17, 2026
- copyBytesFrom: calculate byte offsets directly instead of
  slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions
@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from d2ba38f to 495feb5 Compare February 17, 2026 21:41
Comment on lines 1210 to 1217
void FastSwap16(Local<Value> receiver,
                Local<Value> buffer_obj,
                // NOLINTNEXTLINE(runtime/references)
                FastApiCallbackOptions& options) {
  HandleScope scope(options.isolate);
  ArrayBufferViewContents<char> buffer(buffer_obj);
  CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length()));
}
Member

These fast callbacks are non-identical to the conventional callbacks they shadow.

  • The existing callbacks validate their argument and throw to JS if invalid, whereas your fast callbacks hard-crash the process. It might be better to validate in the JS layer, then use the same unwrapping logic on both sides.
  • Your fast callback cannot have a different return convention to the conventional callback. You will need to remove the return value from the conventional callback.
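
One way to picture the suggestion above: all validation lives in the JS wrapper, so the native function can assume a valid Buffer and return nothing on both the conventional and fast paths. The sketch below is only a model. `nativeSwap16Stub` is a hypothetical stand-in for the binding, and the error types and messages are placeholders rather than the ones lib/buffer.js uses.

```javascript
// Hypothetical stand-in for the native binding: it assumes valid input,
// mutates in place, and has no return value.
function nativeSwap16Stub(buf) {
  for (let i = 0; i < buf.length; i += 2) {
    const tmp = buf[i];
    buf[i] = buf[i + 1];
    buf[i + 1] = tmp;
  }
}

// JS-layer wrapper: every check that can throw happens here, before the
// native call, so the native side never needs to throw or crash.
function swap16Checked(buf) {
  if (!Buffer.isBuffer(buf)) {
    throw new TypeError('expected a Buffer'); // placeholder message
  }
  if (buf.length % 2 !== 0) {
    throw new RangeError('Buffer size must be a multiple of 16-bits');
  }
  nativeSwap16Stub(buf);
  return buf; // the wrapper, not the binding, supplies the return value
}

console.log(swap16Checked(Buffer.from([1, 2, 3, 4])));
```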

lib/buffer.js Outdated
Member

Same behaviour AFAICT, but encodingsMap.ascii seems more appropriate.

Contributor Author

Yes, same behaviour. IndexOfString only has branches for UCS2, UTF8, and Latin1. Adding an ASCII branch to IndexOfString would just duplicate the Latin1 branch, since ASCII is a subset of Latin1.

Member

That's my point, these are still discrete encodings even though the behaviour here is the same, so this should pass encodingsMap.ascii to the binding and IndexOfString should add a condition to send this down the same path.

Comment on lines 1210 to 1217
void FastSwap16(Local<Value> receiver,
                Local<Value> buffer_obj,
                // NOLINTNEXTLINE(runtime/references)
                FastApiCallbackOptions& options) {
  HandleScope scope(options.isolate);
  ArrayBufferViewContents<char> buffer(buffer_obj);
  CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length()));
}
Member

Fast callbacks should include debug tracking and call tests.

Contributor Author

Thanks for sharing these. I will update the code.

@Renegade334 Renegade334 added performance Issues and PRs related to the performance of Node.js. needs-benchmark-ci PR that need a benchmark CI run. labels Feb 17, 2026

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 89.70588% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.72%. Comparing base (4f13746) to head (59f5d09).
⚠️ Report is 10 commits behind head on main.

Files with missing lines    Patch %    Lines
src/node_buffer.cc          72.00%     0 Missing and 7 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #61871   +/-   ##
=======================================
  Coverage   89.72%   89.72%           
=======================================
  Files         675      675           
  Lines      204806   204875   +69     
  Branches    39355    39369   +14     
=======================================
+ Hits       183761   183824   +63     
+ Misses      13330    13329    -1     
- Partials     7715     7722    +7     
Files with missing lines    Coverage Δ
lib/buffer.js               99.78% <100.00%> (+<0.01%) ⬆️
src/node_buffer.cc          68.06% <72.00%> (+0.08%) ⬆️

... and 32 files with indirect coverage changes

@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch 3 times, most recently from 64a4b55 to 1395d2f Compare February 17, 2026 23:41
- Guard ensureUint8ArrayToHex against --no-js-base-64 flag by
  falling back to C++ hexSlice when toHex is unavailable
- Remove THROW_AND_RETURN_UNLESS_BUFFER and return value from
  slow Swap16/32/64 to match fast path conventions (JS validates)
- Add TRACK_V8_FAST_API_CALL to FastSwap16/32/64
- Add test/parallel/test-buffer-swap-fast.js for fast API verification
@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from 1395d2f to 01ba74f Compare February 17, 2026 23:42
Member

ArrayBufferViewContents is wrong here, as buffer.data() may be a stack-allocated copy of the byte data rather than the data itself. SPREAD_BUFFER_ARG is the correct macro to use here, as per the conventional callback.

Contributor Author

@Renegade334 Replaced ArrayBufferViewContents with SPREAD_BUFFER_ARG in all three fast swap callbacks.

@ChALkeR (Member) left a comment

For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)

See also #60249 (comment)

 } else if (value.length === 1) {
   // Fast path: If `value` fits into a single byte, use that numeric value.
-  if (normalizedEncoding === 'utf8') {
+  if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') {
Member

Currently, ascii behaves exactly like latin1
Unsure if by design or accidentally
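
For context, the public API shows the same thing when encoding a string into a Buffer, while decoding differs. The snippet below is only an illustration of that documented behaviour and does not touch the fill() code path under review.

```javascript
// Encoding: 'ascii' and 'latin1' produce the same byte for U+00E9.
const encodedAscii = Buffer.from('\u00e9', 'ascii');
const encodedLatin1 = Buffer.from('\u00e9', 'latin1');

// Decoding: 'ascii' additionally clears the high bit of each byte,
// so 0xe9 decodes differently under the two encodings.
const decodedAscii = Buffer.from([0xe9]).toString('ascii');
const decodedLatin1 = Buffer.from([0xe9]).toString('latin1');

console.log(encodedAscii, encodedLatin1, decodedAscii, decodedLatin1);
```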

@thisalihassan (Contributor Author), Feb 18, 2026

Yes, this is safe by design. I am just extending the existing single-byte numeric optimization to cover ASCII, since the guard already constrains it to the valid ASCII range.
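
A minimal sketch of the fast path being discussed, assuming the guard is a `code < 128` check on the character code. `normalizeFillValue` is a hypothetical helper and not the actual structure of fill() in lib/buffer.js.

```javascript
// Hypothetical helper; assumes the guard is `code < 128`.
function normalizeFillValue(value, normalizedEncoding) {
  if (typeof value === 'string' && value.length === 1 &&
      (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii')) {
    const code = value.charCodeAt(0);
    // Below 128, UTF-8 and ASCII both encode the character as this one byte,
    // so the numeric value can be used directly.
    if (code < 128) return code;
  }
  return value; // anything else keeps going through the string path
}

const viaFastPath = Buffer.alloc(8).fill(normalizeFillValue('t', 'ascii'));
const viaString = Buffer.alloc(8).fill('t', 'ascii');

console.log(viaFastPath.equals(viaString));
```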

@thisalihassan (Contributor Author)

Note on toBase64 / toBase64url:

I also tried replacing the C++ base64Slice/base64urlSlice bindings with V8's Uint8Array.prototype.toBase64() (similar to the toHex change), but it caused a 35-54% regression across all buffer sizes. I reverted base64/base64url and kept only the toHex optimization, which showed a clear +26-37% win.

@ChALkeR (Member) commented Feb 18, 2026

@thisalihassan toHex doesn't show a win anymore with the nbytes update, which should land soon (it has already landed in nbytes).

Instead, it's ~3x slower.

See nodejs/nbytes#12

@thisalihassan (Contributor Author) commented Feb 18, 2026

Hi @ChALkeR, thanks for flagging; I was not aware. I benchmarked the nibble approach locally and it's indeed a much bigger win (~3x vs my ~30% with toHex). I reverted the toHex path entirely; the other changes in this PR are unaffected.

Should I include the nbytes nibble HexEncode optimization in this PR or keep them as separate PRs?

PS: One test is failing, /test/parallel/test-debugger-restart-message.js. I believe it's a known Mac issue and unrelated to my changes.

Remove V8 Uint8Array.prototype.toHex() path for Buffer.toString('hex')
in favour of the upcoming nbytes HexEncode improvement (nodejs/nbytes#12)
which is ~3x faster through the existing C++ hexSlice path.

Refs: nodejs/nbytes#12
Comment on lines -1203 to -1204
-  Environment* env = Environment::GetCurrent(args);
-  THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]);
Member

Why did we remove these lines?

Contributor Author

_swap16 is an internal binding and this is always a Buffer, so the THROW_AND_RETURN_UNLESS_BUFFER check could never fail. I also removed Environment, since it was only needed by that macro.

Member

Each of these methods should test the output of the relevant swap methods.

Contributor Author

Done. Each test now verifies the swapped output.

Contributor Author

Can you please re-review when you get time?

 } else if (value.length === 1) {
   // Fast path: If `value` fits into a single byte, use that numeric value.
-  if (normalizedEncoding === 'utf8') {
+  if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') {
Member

Please update the comment to explain why we can add ASCII here.

 if (length !== undefined) {
   validateInteger(length, 'length', 0);
-  end = offset + length;
+  end = MathMin(offset + length, viewLength);
Member

Can you add a comment explaining why we have this change now?

Contributor Author

Sure, the clamping is required because we switched from TypedArrayPrototypeSlice (which clamped silently)
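
To make the difference concrete: slice() tolerates an end index past the view, whereas constructing a view over the underlying ArrayBuffer with an out-of-range length throws. The snippet is a standalone illustration with made-up values, not the PR's code.

```javascript
const view = new Uint8Array([1, 2, 3, 4]);
const offset = 2;
const length = 10; // deliberately longer than what remains in the view

// slice() clamps the end index silently.
const sliced = view.slice(offset, offset + length);

// A view over the same ArrayBuffer with the unclamped length is rejected.
let threw = false;
try {
  new Uint8Array(view.buffer, view.byteOffset + offset, length);
} catch (err) {
  threw = err instanceof RangeError;
}

// Clamping explicitly restores the slice-like behaviour.
const end = Math.min(offset + length, view.length);
const clamped = new Uint8Array(
  view.buffer, view.byteOffset + offset, end - offset);

console.log(sliced, threw, clamped);
```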
