Reference for stage 3 regex-escaping #36928
Changes from all commits
557085a
a1724ca
f657890
e0e5e03
b416287
39e2cfa
09f6a73
dcd50d5
b5b5eb1
53d173f
ee33968
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
--- | ||
title: RegExp.escape() | ||
slug: Web/JavaScript/Reference/Global_Objects/RegExp/escape | ||
page-type: javascript-static-method | ||
browser-compat: javascript.builtins.RegExp.escape | ||
--- | ||
|
||
{{JSRef}} | ||
|
||
The **`RegExp.escape()`** static method [escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. | ||
|
||
When dynamically creating a {{jsxref("RegExp")}} with user-provided content, consider using this function to sanitize the input (unless the input is actually intended to contain regex syntax). In addition, don't try to re-implement its functionality by, for example, using {{jsxref("String.prototype.replaceAll()")}} to insert a `\` before all syntax characters. `RegExp.escape()` is designed to use escape sequences that work in many more edge cases/contexts than hand-crafted code is likely to achieve. | ||
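As a rough illustration of why hand-rolled escaping is fragile, consider the sketch below. `naiveEscape` is a hypothetical helper written for this example (it is not part of any library) that backslash-escapes only the classic syntax characters. Because it leaves `-` untouched, a `-` embedded in a character class silently becomes a range operator. The fallback string `"\\x2d"` is an assumption for runtimes that lack `RegExp.escape()`; it is the `\x` escape this page describes for `-`.

```js
// Hypothetical hand-rolled escaper: prefixes classic syntax characters with "\".
function naiveEscape(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// The caller wants a class that matches "a", the user's character, or "z".
const userChar = "-";

// Naive escaping leaves "-" alone, so the class turns into the range a-z.
const naiveClass = new RegExp(`[a${naiveEscape(userChar)}z]`);
console.log(naiveClass.source); // "[a-z]"
console.log(naiveClass.test("m")); // true (unintended match)

// Assumption: fall back to the documented "\x2d" escape where
// RegExp.escape() is unavailable.
const safeChar =
  typeof RegExp.escape === "function" ? RegExp.escape(userChar) : "\\x2d";
const safeClass = new RegExp(`[a${safeChar}z]`);
console.log(safeClass.test("m")); // false
console.log(safeClass.test("-")); // true
```

The `\x2d` form works both inside and outside a character class, which is the kind of context-independence a simple backslash prefix does not give.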
|
||
## Syntax | ||
|
||
```js-nolint | ||
RegExp.escape(string) | ||
``` | ||
|
||
### Parameters | ||
|
||
- `string` | ||
- : The string to escape. | ||
|
||
### Return value | ||
|
||
A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. Namely, the following things in the input string are replaced: | ||
|
||
- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and below, the two backslashes in a string literal denote a single backslash character). This step ensures that if the escaped string is embedded into a larger pattern where it's immediately preceded by `\1`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of that escape sequence. | ||
- Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`, and `RegExp.escape("(foo)")` returns `"\\(foo\\)"`. | ||
- Other punctuators, including `,`, `-`, `=`, `<`, `>`, `#`, `&`, `!`, `%`, `:`, `;`, `@`, `~`, `'`, `` ` ``, and `"`, are escaped using the `\x` syntax. For example, `RegExp.escape("foo-bar")` returns `"\\x66oo\\x2dbar"`. These characters cannot be escaped by prefixing with `\` because, for example, `/foo\-bar/u` is a syntax error. | ||
- The characters with their own [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) sequences: `\f` (U+000C FORM FEED), `\n` (U+000A LINE FEED), `\r` (U+000D CARRIAGE RETURN), `\t` (U+0009 CHARACTER TABULATION), and `\v` (U+000B LINE TABULATION), are replaced with their escape sequences. For example, `RegExp.escape("foo\nbar")` returns `"\\x66oo\\nbar"`. | ||
- The space character is escaped as `"\\x20"`. | ||
- Other non-ASCII [line break and white space characters](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#white_space) are replaced with one or two `\uXXXX` escape sequences representing their UTF-16 code units. For example, `RegExp.escape("foo\u2028bar")` returns `"\\x66oo\\u2028bar"`. | ||
- [Lone surrogates](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters) are replaced with their `\uXXXX` escape sequences. For example, `RegExp.escape("foo\uD800bar")` returns `"\\x66oo\\ud800bar"`. | ||
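
Two of the rules above can be checked without calling `RegExp.escape()` at all, by writing the escaped forms by hand. The following is only a sketch: the `compiles` helper and the variable names are hypothetical, and the patterns were picked for illustration.

```js
// 1. Leading character. In a non-Unicode pattern, "\x0" on its own matches
//    the text "x0". Appending an unescaped "a" fuses it into "\x0a",
//    which is a line feed instead.
const prefix = "\\x0";
const fused = new RegExp(prefix + "a");
console.log(fused.test("x0a")); // false
console.log(fused.test("\n")); // true

// "\\x61" is how the first rule escapes a leading "a"; the prefix keeps
// its own meaning and the whole text matches again.
const separated = new RegExp(prefix + "\\x61");
console.log(separated.test("x0a")); // true

// 2. Punctuators. "\-" is rejected in Unicode mode, while "\x2d" is accepted.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}
console.log(compiles("foo\\-bar", "u")); // false
console.log(compiles("foo\\x2dbar", "u")); // true
console.log(new RegExp("foo\\x2dbar", "u").test("foo-bar")); // true
```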
|
||
### Exceptions | ||
|
||
- {{jsxref("TypeError")}} | ||
- : Thrown if `string` is not a string. | ||
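
Because non-string arguments throw rather than being coerced, callers holding a number or other value need to convert it themselves. A minimal sketch, assuming a hypothetical `describeEscape` helper that also guards against runtimes where `RegExp.escape()` does not exist:

```js
// Hypothetical helper: reports what happens when `value` is passed to
// RegExp.escape(), without letting a TypeError propagate.
function describeEscape(value) {
  if (typeof RegExp.escape !== "function") {
    return "RegExp.escape() is not available in this runtime";
  }
  try {
    return RegExp.escape(value);
  } catch (error) {
    if (error instanceof TypeError) return "TypeError";
    throw error;
  }
}

console.log(describeEscape(123)); // a number is not converted to a string
console.log(describeEscape(String(123))); // explicit conversion avoids the error
```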
|
||
## Examples | ||
|
||
### Using RegExp.escape() | ||
|
||
The following examples demonstrate various inputs and outputs for the `RegExp.escape()` method. | ||
|
||
```js | ||
RegExp.escape("Buy it. use it. break it. fix it."); | ||
// "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\." | ||
RegExp.escape("foo.bar"); // "\\x66oo\\.bar" | ||
RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar" | ||
RegExp.escape("foo\nbar"); // "\\x66oo\\nbar" | ||
RegExp.escape("foo\uD800bar"); // "\\x66oo\\ud800bar" | ||
RegExp.escape("foo\u2028bar"); // "\\x66oo\\u2028bar" | ||
``` | ||
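
One way to sanity-check outputs like these is a round trip: compile the escaped string and confirm that it matches the original text and nothing more. The sketch below assumes a hypothetical `matchesOwnEscape` helper and returns `undefined` where `RegExp.escape()` is unavailable.

```js
// Hypothetical helper: true if the escaped form of `text`, anchored at both
// ends, matches `text` itself. Returns undefined without RegExp.escape().
function matchesOwnEscape(text) {
  if (typeof RegExp.escape !== "function") return undefined;
  const pattern = new RegExp(`^(?:${RegExp.escape(text)})$`, "u");
  return pattern.test(text);
}

const samples = [
  "foo.bar",
  "foo-bar",
  "foo\nbar",
  "foo\u2028bar",
  "(a|b)*",
  "1 + 1 = 2?",
];
for (const sample of samples) {
  console.log(JSON.stringify(sample), matchesOwnEscape(sample));
}
```

The `u` flag is used deliberately: it is the stricter mode, so an escape that compiles there is a stronger check than one that compiles only in the legacy mode.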
|
||
### Using RegExp.escape() with the RegExp constructor | ||
|
||
The primary use case of `RegExp.escape()` is when you want to embed a string into a larger regex pattern, and you want to ensure that the string is treated as a literal pattern, not as regex syntax. Consider the following naïve example that strips a domain from URLs: | ||
|
||
```js | ||
function removeDomain(text, domain) { | ||
return text.replace(new RegExp(`https?://${domain}(?=/)`, "g"), ""); | ||
} | ||
|
||
const input = | ||
"Consider using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string."; | ||
const domain = "developer.mozilla.org"; | ||
console.log(removeDomain(input, domain)); | ||
// Consider using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string. | ||
``` | ||
|
||
Inserting the `domain` above results in the regular expression pattern `https?://developer.mozilla.org(?=/)`, where the "." character is a regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means the pattern also matches strings with any character in place of each ".", such as `developer-mozilla.org`. Therefore, it would incorrectly also change the following text: | ||
|
||
```js | ||
const input = | ||
"This is not an MDN link: https://developer-mozilla.org/, be careful!"; | ||
const domain = "developer.mozilla.org"; | ||
console.log(removeDomain(input, domain)); | ||
// This is not an MDN link: /, be careful! | ||
``` | ||
|
||
To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern: | ||
|
||
```js | ||
function removeDomain(text, domain) { | ||
return text.replace( | ||
new RegExp(`https?://${RegExp.escape(domain)}(?=/)`, "g"), | ||
"", | ||
); | ||
} | ||
``` | ||
|
||
Now this function does exactly what we intend, and does not transform `developer-mozilla.org` URLs. | ||
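
The difference can be checked with a short sketch. The variable names here are chosen for illustration, and the hard-coded fallback is an assumption for runtimes without `RegExp.escape()`: it is the string that the escaping rules in the Return value section would produce for this domain.

```js
const domain = "developer.mozilla.org";

// Assumption: fall back to the hand-written escaped form when
// RegExp.escape() is unavailable.
const escapedDomain =
  typeof RegExp.escape === "function"
    ? RegExp.escape(domain)
    : "\\x64eveloper\\.mozilla\\.org";
const pattern = new RegExp(`https?://${escapedDomain}(?=/)`, "g");

const lookalike =
  "This is not an MDN link: https://developer-mozilla.org/, be careful!";
const genuine = "See https://developer.mozilla.org/en-US/docs/Web for details.";

// The look-alike domain is left untouched.
console.log(lookalike.replace(pattern, "") === lookalike); // true
// The real domain is still removed.
console.log(genuine.replace(pattern, "")); // "See /en-US/docs/Web for details."
```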
|
||
## Specifications | ||
|
||
{{Specifications}} | ||
|
||
## Browser compatibility | ||
|
||
{{Compat}} | ||
|
||
## See also | ||
|
||
- [Polyfill of `RegExp.escape` in `core-js`](https://github.com/zloirock/core-js#regexp-escaping) | ||
are you still only linking to core-js polyfills? if not, it'd be great to include https://www.npmjs.com/package/regexp.escape

I personally would not object, but I would need to see what others say. In the meantime let's hold the same policy that other maintainers have found acceptable, which is to only consistently include core-js polyfills. Hope you would understand.

Sure. Where can I go to revisit this discussion? Last time I tried it never went anywhere, and another polyfill maintainer tried recently and it seemed to get overly heated.

I'll discuss it internally.

It looks we still hold https://github.com/orgs/mdn/discussions/475 as our conclusion; i.e. as long as someone steps out to endorse es-shims we are happy to add it, and we should formally document a list of trusted polyfill sources and reject everything not in this list (with another process to endorse more). I'm happy to add es-shims, but I'm too busy to initiate the process in the next few months.

Who would need to endorse it? Who endorsed core-js originally?

Does it require an MDN team member to initiate the process, or can I?

I think that endorsement was implicitly given due to its wide usage via Babel. Happy to formalize the decision in the meta docs.

Just someone who raised a PR and got it merged without any objections. We don't know exactly yet as it's entirely new to us. I bet you can, too, as long as you get multiple feedback on your PR. I imagine it would be put under either https://developer.mozilla.org/en-US/docs/MDN/Writing_guidelines/What_we_write/Criteria_for_inclusion or https://developer.mozilla.org/en-US/docs/MDN/Writing_guidelines/Writing_style_guide#external_links, as a new section called "Inclusion of polyfills" that basically restates the conclusion in https://github.com/orgs/mdn/discussions/475 and say "we only allow polyfills from trusted sources, which are the following:" |
||
- {{jsxref("RegExp")}} | ||
Josh-Cena marked this conversation as resolved.