Description
I ran into a problem while trying to strip HTML whitespace from a string. The pattern I used comes from many online examples and appears to match the rules for what HTML considers whitespace, but in Swift it does not seem to catch Windows-style two-byte newlines (\r\n) in my input.
I did find a workaround by turning on the .matchingSemantics(.unicodeScalar) mode, but that was pretty unexpected and so I'm filing this on the off chance it's an actual bug.
Reproduction
import Foundation
// I'm building a string from hex here so that we don't lose the Windows newlines
// somewhere along the way. I tried copy-pasting the offending string into the
// editor, but I think Xcode or something else was converting things when I did
// that. This seemed a good enough way to ensure nothing gets confused anywhere.
let bytes: [CChar] = [
    0x48, 0x65, 0x6C, 0x6C, 0x6F,
    0x0D, 0x0A, // Windows-style two byte newline (\r\n)
    0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21,
    0x00
]
let clip = String(utf8String: bytes)!
// This first pattern does not catch the Windows-style newline. In fact, it looks
// like it misses it entirely, which is not what I expected at all. The printed
// string still contains the Windows newline and no replacement occurred.
let pattern1 = #/[\t\n\r ]+/#
print(clip.replacing(pattern1, with: ", "))
// <this line intentionally left blank>
print()
// Changing the pattern to use unicodeScalar semantics seems to work around
// it, and the Windows-style newline is properly replaced. I don't know if
// this is more "technically correct" or if I'm hitting a bug here. It seems
// unexpected. The above regex is one I see referenced all over the web for
// matching the same whitespace definition that HTML uses, so it seemed odd
// that it didn't work in Swift without turning on a flag first.
let pattern2 = pattern1.matchingSemantics(.unicodeScalar)
print(clip.replacing(pattern2, with: ", "))
Here's an Xcode playground file:
RegexBug2.playground.zip
Expected behavior
I expected the pattern to catch Windows-style newlines, but it didn't!
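One possible explanation, offered as a guess rather than a confirmed diagnosis: Swift's String treats \r\n as a single Character, so under the default grapheme-cluster matching semantics a character class that lists \r and \n separately may have no member equal to that combined Character. The check below uses only standard library properties to show how the string is segmented. The name `sample` is made up for illustration and is not part of the reproduction above.

```swift
// Hypothetical sample string; `sample` is not part of the original reproduction.
let sample = "Hello\r\nWorld!"

// Count the elements in the Character view and in the Unicode scalar view.
let characterCount = sample.count
let scalarCount = sample.unicodeScalars.count

// Check whether a lone "\n" or the combined "\r\n" appears as a Character.
let hasLoneLineFeed = sample.contains(Character("\n"))
let hasCombinedNewline = sample.contains(Character("\r\n"))

print(characterCount, scalarCount, hasLoneLineFeed, hasCombinedNewline)
// prints "12 13 false true"
```

If that is what's happening, it would be consistent with the .unicodeScalar workaround, since that mode matches against the individual scalars rather than whole Characters.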
Environment
Xcode 26.0 beta 5 (17A5295f)
swift-driver version: 1.127.11.2 Apple Swift version 6.2 (swiftlang-6.2.0.16.14 clang-1700.3.16.4)
Target: arm64-apple-macosx15.0
Additional information
No response