Regex misses Windows-style newlines #822

@BigZaphod

Description

I ran into a problem while trying to strip HTML whitespace from a string. The pattern I used comes from many online examples and appears to match HTML's rules for what counts as whitespace, but in Swift it does not seem to catch Windows-style two-byte newlines (\r\n) in my input.

I did find a workaround by turning on the .matchingSemantics(.unicodeScalar) mode, but that was pretty unexpected and so I'm filing this on the off chance it's an actual bug.
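A possible explanation for why the semantics mode matters, offered as a sketch rather than a confirmed diagnosis: Swift's String treats the "\r\n" pair as a single Character (one grapheme cluster), while the unicode scalar view sees two separate scalars. The snippet below only inspects the string model; it does not call into the regex engine, and the variable names are illustrative.

```swift
// Sketch: how Swift's String model sees the "\r\n" pair from the reproduction.
let crlfSample = "Hello\r\nWorld!"

// Character view: "\r\n" counts as ONE grapheme cluster.
let characterCount = crlfSample.count
// Unicode scalar view: "\r" and "\n" are TWO separate scalars.
let scalarCount = crlfSample.unicodeScalars.count

// The combined Character is equal to neither "\r" nor "\n" on its own,
// which would be consistent with a character class like [\t\n\r ] not
// matching it when the regex compares whole Characters.
let crlfCharacter: Character = "\r\n"
let equalsCR = crlfCharacter == "\r"
let equalsLF = crlfCharacter == "\n"

print(characterCount, scalarCount, equalsCR, equalsLF)
// prints "12 13 false false"
```

If that is what is happening, switching to .unicodeScalar semantics would make the class compare against individual scalars, which matches the workaround I observed.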

Reproduction

import Foundation

// I'm building a string from hex here so that we don't lose the Windows newlines
// somewhere along the way. I tried copy-pasting the offending string into the
// editor, but I think Xcode or something else was converting things when I did
// that. This seemed a good enough way to ensure nothing gets confused anywhere.

let bytes: [CChar] = [
    0x48, 0x65, 0x6C, 0x6C, 0x6F,
    0x0D, 0x0A, // Windows-style two byte newline (\r\n)
    0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21,
    0x00
]

let clip = String(utf8String: bytes)!

// This first pattern does not catch the Windows-style newline. In fact it looks
// like it misses it entirely which is not what I expected at all. The printed
// string contains the Windows newline within and no replacing occurred.
let pattern1 = #/[\t\n\r ]+/#
print(clip.replacing(pattern1, with: ", "))

// <this line intentionally left blank>
print()

// Changing the pattern to use the unicodeScalar semantics seems to work around
// it, and the Windows-style newline is properly replaced. I don't know if
// this is more "technically correct" or if I'm hitting a bug here. It seems
// unexpected. The above regex is one I see referenced all over the web for
// matching the same whitespace definition that HTML uses, so it seemed odd
// that it didn't work in Swift without turning on a flag first.
let pattern2 = pattern1.matchingSemantics(.unicodeScalar)
print(clip.replacing(pattern2, with: ", "))

Here's an Xcode playground file:
RegexBug2.playground.zip

Expected behavior

I expected the pattern to catch Windows-style newlines, but it didn't!
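For comparison, here is a regex-free sketch that produces the output I expected by walking the unicode scalar view directly. The names htmlWhitespace and collapseHTMLWhitespace are hypothetical helpers made up for this illustration, and the set contains only the four characters from my pattern, not a complete definition of HTML whitespace.

```swift
// The same four characters as the [\t\n\r ] class in the pattern.
let htmlWhitespace: Set<Unicode.Scalar> = ["\t", "\n", "\r", " "]

// Replaces each run of the scalars above with `replacement`.
// Works on unicode scalars, so "\r\n" is seen as two whitespace scalars
// that fall into the same run.
func collapseHTMLWhitespace(_ input: String, with replacement: String) -> String {
    var output = String.UnicodeScalarView()
    var insideRun = false
    for scalar in input.unicodeScalars {
        if htmlWhitespace.contains(scalar) {
            // Emit the replacement once per run of whitespace scalars.
            if !insideRun {
                output.append(contentsOf: replacement.unicodeScalars)
                insideRun = true
            }
        } else {
            output.append(scalar)
            insideRun = false
        }
    }
    return String(output)
}

print(collapseHTMLWhitespace("Hello\r\nWorld!", with: ", "))
// prints "Hello, World!"
```

This is only meant to show the result I was expecting from the first pattern, not to suggest how the regex engine should implement it.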

Environment

Xcode 26.0 beta 5 (17A5295f)
swift-driver version: 1.127.11.2 Apple Swift version 6.2 (swiftlang-6.2.0.16.14 clang-1700.3.16.4)
Target: arm64-apple-macosx15.0

Additional information

No response
