util: add fast path to stripVTControlCharacters#61833
util: add fast path to stripVTControlCharacters#61833privatenumber wants to merge 5 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #61833 +/- ##
=======================================
Coverage 89.72% 89.73%
=======================================
Files 675 675
Lines 204797 204804 +7
Branches 39344 39353 +9
=======================================
+ Hits 183752 183773 +21
+ Misses 13324 13291 -33
- Partials 7721 7740 +19
🚀 New features to boost your workflow:
|
addaleax
left a comment
There was a problem hiding this comment.
Can you leave a comment in the source about why this short-circuiting is added? Because it still "looks" redundant when just reading the source.
lib/internal/util/inspect.js
Outdated
| if (!StringPrototypeIncludes(str, '\u001B') && | ||
| !StringPrototypeIncludes(str, '\u009B')) | ||
| return str; |
There was a problem hiding this comment.
You are doing two passes here, why not use a simple regex here that only checks these 2 characters in one pass?
| if (!StringPrototypeIncludes(str, '\u001B') && | |
| !StringPrototypeIncludes(str, '\u009B')) | |
| return str; | |
| if (!RegExpPrototypeTest(/[\u001B\u009B]/, str)) | |
| return str; |
Did you test the performance of this? I think this would be even faster
There was a problem hiding this comment.
It's unlikely that a Regex test will be faster.
Possibly this is better:
if (StringPrototypeIndexOf(str, '\u001B') === -1 &&
StringPrototypeIndexOf(str, '\u009B') === -1)
There was a problem hiding this comment.
Done — added a comment in bacd1a7 explaining why the short-circuit is there.
There was a problem hiding this comment.
@gurgunday Thanks for the suggestion! I benchmarked a single RegExpPrototypeTest(/[\u001B\u009B]/, str) call (one pass) vs two StringPrototypeIndexOf calls (two passes). Even with the extra pass, two indexOf calls are faster — especially for longer strings:
=== Short (~50 chars) ===
2x indexOf 17.6 ns/op
1x RegExp.test 29.5 ns/op
RegExp.test / indexOf = 1.67x
=== Long (~9991 chars) ===
2x indexOf 542.6 ns/op
1x RegExp.test 3991.4 ns/op
RegExp.test / indexOf = 7.36x
Single-char indexOf is a simpler operation than regex character class matching, so two indexOf passes still wins over one regex pass.
Switched to StringPrototypeIndexOf per @RafaelGSS's suggestion.
Benchmark code
const short = 'Hello, World! This is a test string without ANSI.';
const long = 'Long plain text without ANSI. '.repeat(333);
const re = /[\u001B\u009B]/;
const n = 2_000_000;
const shortArr = Array.from({length: 100}, (_, i) => short + String(i));
const longArr = Array.from({length: 100}, (_, i) => long + String(i));
function bench(label, fn) {
fn();
const runs = [];
for (let r = 0; r < 5; r++) {
const start = performance.now();
fn();
runs.push(performance.now() - start);
}
runs.sort((a, b) => a - b);
const median = runs[Math.floor(runs.length / 2)];
const perOp = (median / n * 1e6).toFixed(1);
console.log(label.padEnd(30) + median.toFixed(0).padStart(6) + ' ms (' + perOp + ' ns/op)');
return median;
}
console.log('=== Short (~' + shortArr[0].length + ' chars) ===');
const a = bench(' indexOf', () => { let r; for (let i = 0; i < n; i++) { const s = shortArr[i % 100]; r = s.indexOf('\u001B') === -1 && s.indexOf('\u009B') === -1; } return r; });
const b = bench(' RegExp.test', () => { let r; for (let i = 0; i < n; i++) { const s = shortArr[i % 100]; r = !re.test(s); } return r; });
console.log(' RegExp.test / indexOf = ' + (b / a).toFixed(2) + 'x');
console.log('\n=== Long (~' + longArr[0].length + ' chars) ===');
const c = bench(' indexOf', () => { let r; for (let i = 0; i < n; i++) { const s = longArr[i % 100]; r = s.indexOf('\u001B') === -1 && s.indexOf('\u009B') === -1; } return r; });
const d = bench(' RegExp.test', () => { let r; for (let i = 0; i < n; i++) { const s = longArr[i % 100]; r = !re.test(s); } return r; });
console.log(' RegExp.test / indexOf = ' + (d / c).toFixed(2) + 'x');
RafaelGSS
left a comment
There was a problem hiding this comment.
Please, share nodejs benchmark results. The one you attached in the PR description doesn't seem Node.js one.
lib/internal/util/inspect.js
Outdated
| if (!StringPrototypeIncludes(str, '\u001B') && | ||
| !StringPrototypeIncludes(str, '\u009B')) | ||
| return str; |
There was a problem hiding this comment.
It's unlikely that a Regex test will be faster.
Possibly this is better:
if (StringPrototypeIndexOf(str, '\u001B') === -1 &&
StringPrototypeIndexOf(str, '\u009B') === -1)
|
@RafaelGSS Switched to Node.js benchmark results using ~1.93x faster for short non-ANSI strings, ~6.91x for long non-ANSI strings. Negligible impact on ANSI input. |
Problem
stripVTControlCharactersalways runs a regex replacement, even when the input contains no ANSI escape codes. This is the common case — most strings passed through are already plain text (e.g. checking/sanitizing user input, processing log lines that are mostly text).Changes
Adds a
StringPrototypeIndexOfguard for ESC (\u001B) and CSI (\u009B) before the regex. Since all ANSI escape sequences start with one of these introducers, their absence means no ANSI codes exist and we can return the string immediately — skipping regex execution.Based on the same optimization proposed in chalk/strip-ansi#54.
Also adds a benchmark and test coverage for the fast path, 7-bit ESC sequences, and 8-bit CSI sequences.
Benchmarks
Node.js benchmark results using
benchmark/compare.js(30 runs, two separate builds):~1.93x faster for short non-ANSI strings, ~6.91x for long non-ANSI strings. Negligible impact on ANSI input.