Commit 5b3ca00

Merge pull request #1944 from LukasKalbertodt/cr-lf-fixes
Fix and clarify CR LF normalization and CR in string literals
2 parents e0625a7 + f4e3da2 commit 5b3ca00

2 files changed (+4, -5 lines)

src/input-format.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ r[input.crlf]
 ## CRLF normalization
 
 Each pair of characters `U+000D` (CR) immediately followed by `U+000A` (LF) is replaced by a single `U+000A` (LF).
+This happens once, not repeatedly, so after the normalization, there can still exist `U+000D` (CR) immediately followed by `U+000A` (LF) in the input (e.g. if the raw input contained "CR CR LF LF").
 
 Other occurrences of the character `U+000D` (CR) are left in place (they are treated as [whitespace]).
 
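The added sentence is easy to misread as a repeated rewrite, so here is a minimal Rust sketch of the single-pass rule as the text states it. `normalize_crlf` is a hypothetical helper for illustration, not rustc's lexer code, and it assumes the input has already been decoded to a `&str`.

```rust
/// Hypothetical sketch of the CRLF normalization rule: every CR immediately
/// followed by LF becomes a single LF, in ONE pass over the input.
/// Illustrative only; this is not rustc's implementation.
fn normalize_crlf(input: &str) -> String {
    // `str::replace` scans left to right, replaces non-overlapping matches,
    // and never rescans its own output, so this is a single pass.
    input.replace("\r\n", "\n")
}

fn main() {
    // An ordinary Windows line ending: CR LF becomes LF.
    assert_eq!(normalize_crlf("a\r\nb"), "a\nb");

    // The example from the text, "CR CR LF LF": only the middle CR LF pair
    // is replaced, leaving "CR LF LF", which still contains a CR LF pair.
    let out = normalize_crlf("\r\r\n\n");
    assert_eq!(out, "\r\n\n");
    assert!(out.contains("\r\n"));

    // A CR that is not followed by LF is left in place.
    assert_eq!(normalize_crlf("a\rb"), "a\rb");

    println!("{:?}", out); // prints "\r\n\n"
}
```

Because `str::replace` does not look at the text it has just produced, the "CR CR LF LF" input keeps one CR LF pair, matching the example in the added sentence.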

src/tokens.md

Lines changed: 3 additions & 5 deletions
@@ -60,8 +60,6 @@ Literals are tokens used in [literal expressions].
 
 [^nsets]: The number of `#`s on each side of the same literal must be equivalent.
 
-> [!NOTE]
-> Character and string literal tokens never include the sequence of `U+000D` (CR) immediately followed by `U+000A` (LF): this pair would have been previously transformed into a single `U+000A` (LF).
 
 #### ASCII escapes
 

@@ -198,9 +196,9 @@ which must be _escaped_ by a preceding `U+005C` character (`\`).
 
 r[lex.token.literal.str.linefeed]
 Line-breaks, represented by the character `U+000A` (LF), are allowed in string literals.
+The character `U+000D` (CR) may not appear in a string literal.
 When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
 See [String continuation escapes] for details.
-The character `U+000D` (CR) may not appear in a string literal other than as part of such a string continuation escape.
 
 r[lex.token.literal.char-escape]
 #### Character escapes

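Since CRLF normalization runs on the whole input before tokenization, a string continuation escape typed as `\` CR LF in a file with Windows line endings reaches the lexer as `\` LF. That is consistent with the rule now being stated without the old exception. The sketch below is a hypothetical model (the helper names `normalize_crlf` and `check_no_cr` are illustrative) that checks only the CR rule on a literal's body and ignores every other lexing step.

```rust
/// Hypothetical helper: single-pass CRLF normalization, as in `input.crlf`.
fn normalize_crlf(input: &str) -> String {
    input.replace("\r\n", "\n")
}

/// Hypothetical helper modelling only the new CR rule: the body of a string
/// literal (the text between the quotes, after normalization) must not
/// contain U+000D. On failure it returns the byte offset of the first CR.
fn check_no_cr(body: &str) -> Result<(), usize> {
    match body.find('\r') {
        Some(offset) => Err(offset),
        None => Ok(()),
    }
}

fn main() {
    // A continuation escape typed with Windows line endings:
    // backslash, CR, LF, then indentation on the next line.
    let raw = "hello \\\r\n    world";
    let normalized = normalize_crlf(raw);

    // Normalization has already turned `\` CR LF into `\` LF ...
    assert_eq!(normalized, "hello \\\n    world");
    // ... so no CR is left in the body and the check passes.
    assert_eq!(check_no_cr(&normalized), Ok(()));

    // A bare CR (not followed by LF) survives normalization and is rejected.
    let bare = normalize_crlf("hello\rworld");
    assert_eq!(check_no_cr(&bare), Err(5));

    println!("ok");
}
```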
@@ -323,9 +321,9 @@ below.
 
 r[lex.token.str-byte.linefeed]
 Line-breaks, represented by the character `U+000A` (LF), are allowed in byte string literals.
+The character `U+000D` (CR) may not appear in a byte string literal.
 When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
 See [String continuation escapes] for details.
-The character `U+000D` (CR) may not appear in a byte string literal other than as part of such a string continuation escape.
 
 r[lex.token.str-byte.escape]
 Some additional _escapes_ are available in either byte or non-raw byte string

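The same constraints can be seen with real byte string literals, which compile on stable Rust: a continuation escape drops the line break and the following indentation, and because a literal CR may not appear inside the quotes, a CR byte is written with the `\r` escape. The function names `joined` and `crlf_line` are illustrative.

```rust
/// A byte string written with a string continuation escape. The backslash,
/// the line break, and the leading whitespace on the next line are not part
/// of the value.
fn joined() -> &'static [u8] {
    b"Hello, \
      world"
}

/// A literal CR may not appear in a byte string literal, so a CR byte is
/// written with the `\r` escape instead.
fn crlf_line() -> &'static [u8] {
    b"line\r\n"
}

fn main() {
    assert_eq!(joined(), b"Hello, world");
    assert_eq!(crlf_line(), &[b'l', b'i', b'n', b'e', 0x0D, 0x0A]);
    println!("{}", String::from_utf8_lossy(joined())); // prints Hello, world
}
```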
@@ -429,9 +427,9 @@ permitted within a C string.
 
 r[lex.token.str-c.linefeed]
 Line-breaks, represented by the character `U+000A` (LF), are allowed in C string literals.
+The character `U+000D` (CR) may not appear in a C string literal.
 When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
 See [String continuation escapes] for details.
-The character `U+000D` (CR) may not appear in a C string literal other than as part of such a string continuation escape.
 
 r[lex.token.str-c.escape]
 Some additional _escapes_ are available in non-raw C string literals. An escape
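For C string literals the same CR rule applies before the value becomes a NUL-terminated string. The following is a simplified, hypothetical model: `c_string_value` is an illustrative name, it assumes the body is already CRLF-normalized, it copies escapes other than continuations through unprocessed, and it treats the whitespace skipped after a continuation as spaces, tabs, and LFs only (the Reference's section on string continuation escapes gives the exact rule). Only `std::ffi::CString::new` is a real standard-library API here.

```rust
use std::ffi::CString;

/// Hypothetical model of `lex.token.str-c.linefeed` for the body of a C
/// string literal (the text between the quotes) AFTER CRLF normalization.
/// Only the CR rule and string continuation escapes are modelled; any other
/// escape is copied through unprocessed, which a real lexer would not do.
fn c_string_value(body: &str) -> Result<CString, String> {
    // The rule as now stated: no CR anywhere in the literal, no exception.
    if let Some(i) = body.find('\r') {
        return Err(format!("CR at byte offset {}", i));
    }
    let mut out = String::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Continuation escape: drop the backslash, the LF, and the
            // whitespace after it (simplified here to spaces, tabs, and LFs).
            Some('\n') => {
                while let Some(&(' ' | '\t' | '\n')) = chars.peek() {
                    chars.next();
                }
            }
            // Any other escape is kept verbatim in this simplified model.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    // CString::new appends the trailing NUL and rejects interior NUL bytes.
    CString::new(out).map_err(|e| e.to_string())
}

fn main() {
    // A continuation escape; any CR LF typed in the file is already an LF here.
    let value = c_string_value("foo \\\n    bar").unwrap();
    assert_eq!(value.as_bytes(), b"foo bar");
    assert_eq!(value.as_bytes_with_nul(), b"foo bar\0");

    // A bare CR in the body is rejected.
    assert!(c_string_value("foo\rbar").is_err());

    println!("{}", value.to_str().unwrap());
}
```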
