Open
Description
SPACE_OR_PUNCTUATION changed at some point and is now the regexp: /[\n\r\p{Z}\p{P}]+/u
Because \p{P} matches all Unicode punctuation characters, this also matches the quotation characters " and ', which gives, for me at least, unwanted and incorrect results.
For example: song's is now split into "song" and "s", so every standalone "s" token in a document matches. Further, documents which don't include "song" but do include "s" also match. You can see this using: Demo
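The split can be reproduced outside the library with plain JavaScript. This is only a minimal sketch: the regexp is copied from the description above, and the variable names are mine, not the library's.

```javascript
// Regexp as quoted in this issue (copied here, not imported from the library).
const SPACE_OR_PUNCTUATION = /[\n\r\p{Z}\p{P}]+/u;

// The ASCII apostrophe and double quote both fall under \p{P}.
const apostropheIsPunctuation = /\p{P}/u.test("'");
const doubleQuoteIsPunctuation = /\p{P}/u.test('"');

// Splitting on the regexp therefore breaks the possessive into two tokens.
const tokens = "song's".split(SPACE_OR_PUNCTUATION);

console.log(apostropheIsPunctuation, doubleQuoteIsPunctuation); // true true
console.log(tokens); // [ 'song', 's' ]
```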
The older SPACE_OR_PUNCTUATION regexp did not use the new Unicode categories and did not match " and ' etc.
From reading Unicode Character Categories, I can't see how any of the Punctuation categories can be used as-is for SPACE_OR_PUNCTUATION. That said, I hadn't come across these categories before now.
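One possible workaround, sketched under my own assumptions: keep the Unicode categories but exclude " and ' with a negative lookahead. The names SPACE_OR_PUNCTUATION_EXCEPT_QUOTES and tokenizeKeepingQuotes are hypothetical and not part of the library, and whether double quotes should really stay inside tokens is a judgment call.

```javascript
// Hypothetical alternative pattern (not the library's): same character class,
// but the lookahead refuses to consume the ASCII apostrophe or double quote.
const SPACE_OR_PUNCTUATION_EXCEPT_QUOTES = /(?:(?!['"])[\n\r\p{Z}\p{P}])+/u;

// Hypothetical helper: split on the pattern and drop empty tokens.
const tokenizeKeepingQuotes = (text) =>
  text.split(SPACE_OR_PUNCTUATION_EXCEPT_QUOTES).filter((token) => token.length > 0);

const possessive = tokenizeKeepingQuotes("song's");
const sentence = tokenizeKeepingQuotes("The song's title, please.");

console.log(possessive); // [ "song's" ]
console.log(sentence);   // [ 'The', "song's", 'title', 'please' ]
```

A function like this could presumably be supplied through the tokenize option when constructing the index. Note that typographic quotes such as ’ would still be split unless they are added to the lookahead.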