Skip to content

util: add fast path to stripVTControlCharacters#61833

Open
privatenumber wants to merge 5 commits intonodejs:mainfrom
privatenumber:strip-vt-fast-path
Open

util: add fast path to stripVTControlCharacters#61833
privatenumber wants to merge 5 commits intonodejs:mainfrom
privatenumber:strip-vt-fast-path

Conversation

@privatenumber
Copy link
Contributor

@privatenumber privatenumber commented Feb 15, 2026

Problem

stripVTControlCharacters always runs a regex replacement, even when the input contains no ANSI escape codes. This is the common case — most strings passed through are already plain text (e.g. checking/sanitizing user input, processing log lines that are mostly text).

Changes

Adds a StringPrototypeIndexOf guard for ESC (\u001B) and CSI (\u009B) before the regex. Since all ANSI escape sequences start with one of these introducers, their absence means no ANSI codes exist and we can return the string immediately — skipping regex execution.

Based on the same optimization proposed in chalk/strip-ansi#54.

Also adds a benchmark and test coverage for the fast path, 7-bit ESC sequences, and 8-bit CSI sequences.

Benchmarks

Node.js benchmark results using benchmark/compare.js (30 runs, two separate builds):

                                                         confidence improvement accuracy (*)   (**)  (***)
util/strip-vt-control-characters.js input='noAnsi-short' n=1000000        ***     93.43 %       ±3.42% ±4.56% ±5.94%
util/strip-vt-control-characters.js input='noAnsi-long'  n=1000000        ***    591.26 %       ±9.71% ±12.97% ±16.97%
util/strip-vt-control-characters.js input='ansi-short'   n=1000000                -0.32 %       ±1.39% ±1.85% ±2.41%
util/strip-vt-control-characters.js input='ansi-long'    n=1000000                 1.74 %       ±2.55% ±3.39% ±4.42%

~1.93x faster for short non-ANSI strings, ~6.91x for long non-ANSI strings. Negligible impact on ANSI input.

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/performance

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. util Issues and PRs related to the built-in util module. labels Feb 15, 2026
@privatenumber privatenumber marked this pull request as ready for review February 15, 2026 03:15
@codecov
Copy link

codecov bot commented Feb 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.73%. Comparing base (9cc7fcc) to head (911e04c).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #61833   +/-   ##
=======================================
  Coverage   89.72%   89.73%           
=======================================
  Files         675      675           
  Lines      204797   204804    +7     
  Branches    39344    39353    +9     
=======================================
+ Hits       183752   183773   +21     
+ Misses      13324    13291   -33     
- Partials     7721     7740   +19     
Files with missing lines Coverage Δ
lib/internal/util/inspect.js 99.67% <100.00%> (+<0.01%) ⬆️

... and 33 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment in the source about why this short-circuiting is added? Because it still "looks" redundant when just reading the source.

@addaleax addaleax added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 15, 2026
@gurgunday gurgunday added the needs-benchmark-ci PR that need a benchmark CI run. label Feb 15, 2026
Comment on lines 3039 to 3041
if (!StringPrototypeIncludes(str, '\u001B') &&
!StringPrototypeIncludes(str, '\u009B'))
return str;
Copy link
Member

@gurgunday gurgunday Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are doing two passes here, why not use a simple regex here that only checks these 2 characters in one pass?

Suggested change
if (!StringPrototypeIncludes(str, '\u001B') &&
!StringPrototypeIncludes(str, '\u009B'))
return str;
if (!RegExpPrototypeTest(/[\u001B\u009B]/, str))
return str;

Did you test the performance of this? I think this would be even faster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unlikely that a Regex test will be faster.

Possibly this is better:

if (StringPrototypeIndexOf(str, '\u001B') === -1 &&
    StringPrototypeIndexOf(str, '\u009B') === -1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — added a comment in bacd1a7 explaining why the short-circuit is there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gurgunday Thanks for the suggestion! I benchmarked a single RegExpPrototypeTest(/[\u001B\u009B]/, str) call (one pass) vs two StringPrototypeIndexOf calls (two passes). Even with the extra pass, two indexOf calls are faster — especially for longer strings:

=== Short (~50 chars) ===
  2x indexOf                  17.6 ns/op
  1x RegExp.test              29.5 ns/op
  RegExp.test / indexOf = 1.67x

=== Long (~9991 chars) ===
  2x indexOf                 542.6 ns/op
  1x RegExp.test            3991.4 ns/op
  RegExp.test / indexOf = 7.36x

Single-char indexOf is a simpler operation than regex character class matching, so two indexOf passes still wins over one regex pass.

Switched to StringPrototypeIndexOf per @RafaelGSS's suggestion.

Benchmark code
const short = 'Hello, World! This is a test string without ANSI.';
const long = 'Long plain text without ANSI. '.repeat(333);
const re = /[\u001B\u009B]/;
const n = 2_000_000;

const shortArr = Array.from({length: 100}, (_, i) => short + String(i));
const longArr = Array.from({length: 100}, (_, i) => long + String(i));

function bench(label, fn) {
  fn();
  const runs = [];
  for (let r = 0; r < 5; r++) {
    const start = performance.now();
    fn();
    runs.push(performance.now() - start);
  }
  runs.sort((a, b) => a - b);
  const median = runs[Math.floor(runs.length / 2)];
  const perOp = (median / n * 1e6).toFixed(1);
  console.log(label.padEnd(30) + median.toFixed(0).padStart(6) + ' ms  (' + perOp + ' ns/op)');
  return median;
}

console.log('=== Short (~' + shortArr[0].length + ' chars) ===');
const a = bench('  indexOf', () => { let r; for (let i = 0; i < n; i++) { const s = shortArr[i % 100]; r = s.indexOf('\u001B') === -1 && s.indexOf('\u009B') === -1; } return r; });
const b = bench('  RegExp.test', () => { let r; for (let i = 0; i < n; i++) { const s = shortArr[i % 100]; r = !re.test(s); } return r; });
console.log('  RegExp.test / indexOf = ' + (b / a).toFixed(2) + 'x');

console.log('\n=== Long (~' + longArr[0].length + ' chars) ===');
const c = bench('  indexOf', () => { let r; for (let i = 0; i < n; i++) { const s = longArr[i % 100]; r = s.indexOf('\u001B') === -1 && s.indexOf('\u009B') === -1; } return r; });
const d = bench('  RegExp.test', () => { let r; for (let i = 0; i < n; i++) { const s = longArr[i % 100]; r = !re.test(s); } return r; });
console.log('  RegExp.test / indexOf = ' + (d / c).toFixed(2) + 'x');

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, share nodejs benchmark results. The one you attached in the PR description doesn't seem Node.js one.

Comment on lines 3039 to 3041
if (!StringPrototypeIncludes(str, '\u001B') &&
!StringPrototypeIncludes(str, '\u009B'))
return str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unlikely that a Regex test will be faster.

Possibly this is better:

if (StringPrototypeIndexOf(str, '\u001B') === -1 &&
    StringPrototypeIndexOf(str, '\u009B') === -1)

@RafaelGSS RafaelGSS added the performance Issues and PRs related to the performance of Node.js. label Feb 15, 2026
@privatenumber
Copy link
Contributor Author

@RafaelGSS Switched to StringPrototypeIndexOf in 911e04c.

Node.js benchmark results using benchmark/compare.js with 30 runs, comparing two separate builds (without vs with fast path):

                                                         confidence improvement accuracy (*)   (**)  (***)
util/strip-vt-control-characters.js input='noAnsi-short' n=1000000        ***     93.43 %       ±3.42% ±4.56% ±5.94%
util/strip-vt-control-characters.js input='noAnsi-long'  n=1000000        ***    591.26 %       ±9.71% ±12.97% ±16.97%
util/strip-vt-control-characters.js input='ansi-short'   n=1000000                -0.32 %       ±1.39% ±1.85% ±2.41%
util/strip-vt-control-characters.js input='ansi-long'    n=1000000                 1.74 %       ±2.55% ±3.39% ±4.42%

~1.93x faster for short non-ANSI strings, ~6.91x for long non-ANSI strings. Negligible impact on ANSI input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-benchmark-ci PR that need a benchmark CI run. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js. request-ci Add this label to start a Jenkins CI on a PR. util Issues and PRs related to the built-in util module.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants