buffer: improve performance of multiple Buffer operations#61871
buffer: improve performance of multiple Buffer operations#61871thisalihassan wants to merge 6 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
- copyBytesFrom: calculate byte offsets directly instead of
slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions
d2ba38f to
495feb5
Compare
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
These fast callbacks are non-identical to the conventional callbacks they shadow.
- The existing callbacks validate their argument and throws to JS if invalid, whereas your fast callbacks hard-crash the process. It might be better to validate in the JS layer, then use the same unwrapping logic on both sides.
- Your fast callback cannot have a different return convention to the conventional callback. You will need to remove the return value from the conventional callback.
lib/buffer.js
Outdated
There was a problem hiding this comment.
Same behaviour AFAICT, but encodingsMap.ascii seems more appropriate.
There was a problem hiding this comment.
Yes same behaviour IndexOfString only has branches for UCS2, UTF8, and Latin1, Adding an ASCII branch to IndexOfString would just be a duplicate of the Latin1 branch since ASCII is a subset of Latin1.
There was a problem hiding this comment.
That's my point, these are still discrete encodings even though the behaviour here is the same, so this should pass encodingsMap.ascii to the binding and IndexOfString should add a condition to send this down the same path.
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
Fast callbacks should include debug tracking and call tests.
There was a problem hiding this comment.
Thanks for sharing these I will update the code
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61871 +/- ##
=======================================
Coverage 89.72% 89.72%
=======================================
Files 675 675
Lines 204806 204875 +69
Branches 39355 39369 +14
=======================================
+ Hits 183761 183824 +63
+ Misses 13330 13329 -1
- Partials 7715 7722 +7
🚀 New features to boost your workflow:
|
64a4b55 to
1395d2f
Compare
- Guard ensureUint8ArrayToHex against --no-js-base-64 flag by falling back to C++ hexSlice when toHex is unavailable - Remove THROW_AND_RETURN_UNLESS_BUFFER and return value from slow Swap16/32/64 to match fast path conventions (JS validates) - Add TRACK_V8_FAST_API_CALL to FastSwap16/32/64 - Add test/parallel/test-buffer-swap-fast.js for fast API verification
1395d2f to
01ba74f
Compare
src/node_buffer.cc
Outdated
There was a problem hiding this comment.
ArrayBufferViewContents is wrong here, as buffer.data() may be a stack-allocated copy of the byte data rather than the data itself. SPREAD_BUFFER_ARG is the correct macro to use here, as per the conventional callback.
There was a problem hiding this comment.
@Renegade334 replaced ArrayBufferViewContents with SPREAD_BUFFER_ARG in all three fast swap callbacks
There was a problem hiding this comment.
For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)
See also #60249 (comment)
| } else if (value.length === 1) { | ||
| // Fast path: If `value` fits into a single byte, use that numeric value. | ||
| if (normalizedEncoding === 'utf8') { | ||
| if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') { |
There was a problem hiding this comment.
Currently, ascii behaves exactly like latin1
Unsure if by design or accidentally
There was a problem hiding this comment.
yes, this is safe by design I am just extending the existing single byte numeric optimization to cover ASCII, since the guard already constrains it to the valid ASCII range.
|
Note on toBase64 / toBase64url: I also tried replacing the C++ base64Slice/base64urlSlice bindings with V8's Uint8Array.prototype.toBase64() (similar to the toHex change) but it caused a 35-54% regression across all buffer sizes so I reverted base64/base64url and kept only the toHex optimization which showed a clear +26-37% win. |
|
@thisalihassan toHex doesn't show a win anymore with Instead, it's ~3x slower. See nodejs/nbytes#12 |
|
Hi @ChALkeR thanks for flagging I was not aware. I benchmarked the nibble approach locally and it's indeed a much bigger win (~3x vs my ~30% with toHex). Reverted the toHex path entirely the other changes in this PR are unaffected. Should I include the nbytes nibble HexEncode optimization in this PR or keep them as separate PRs?
|
Remove V8 Uint8Array.prototype.toHex() path for Buffer.toString('hex')
in favour of the upcoming nbytes HexEncode improvement (nodejs/nbytes#12)
which is ~3x faster through the existing C++ hexSlice path.
Refs: nodejs/nbytes#12
| Environment* env = Environment::GetCurrent(args); | ||
| THROW_AND_RETURN_UNLESS_BUFFER(env, args[0]); |
There was a problem hiding this comment.
_swap16 is an internal binding and this is always a Buffer, so the THROW_AND_RETURN_UNLESS_BUFFER check could never fail also removed Environment since it was only needed by that macro.
There was a problem hiding this comment.
Each of these methods should test the output of relevant swrap methods.
There was a problem hiding this comment.
Done each test now verifies the swapped output
There was a problem hiding this comment.
can you please re-review when you get time?
| } else if (value.length === 1) { | ||
| // Fast path: If `value` fits into a single byte, use that numeric value. | ||
| if (normalizedEncoding === 'utf8') { | ||
| if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') { |
There was a problem hiding this comment.
Please update comment to why we can add ASCII here.
| if (length !== undefined) { | ||
| validateInteger(length, 'length', 0); | ||
| end = offset + length; | ||
| end = MathMin(offset + length, viewLength); |
There was a problem hiding this comment.
Can you add a comment to why we have this change now?
There was a problem hiding this comment.
Sure, the clamping is required because we switched from TypedArrayPrototypeSlice (which clamped silently)
Summary
Multiple performance improvements to Buffer operations, verified with benchmarks (15-30 runs, comparing old vs new binaries built from same tree).
Changes
Buffer.copyBytesFrom()(+100-210%)Avoid intermediate
TypedArrayPrototypeSliceallocation by calculating byte offsets directly into the source TypedArray's underlying ArrayBuffer.Buffer.prototype.fill("t", "ascii")(+26-37%)ASCII
indexOf(+14-46%)Call
indexOfStringdirectly for ASCII encoding instead of first converting the search value to a Buffer viafromStringFastand then callingindexOfBuffer. ASCII and Latin-1 share the same byte values for characters 0-127.swap16/32/64(+3-38%)Add V8 Fast API C++ functions (
FastSwap16/32/64) alongside the existing slow path. Largest gains at len=256 (+35%).Benchmark results
Key results (15-30 runs,
***= p < 0.001):No regressions observed. Full benchmark CSV attached.
compare-all-buffers-final.csv