How to decide a grapheme break efficiently in mid-string? #314
Replies: 1 comment
Good question. If you scrutinize the algorithm described in the Unicode standard, it should be possible to start in the middle of the string and scan backwards to the last character from which the state can be determined; in most cases you shouldn't have to scan all the way to the beginning of the string. The most rigorous version of this algorithm would depend on the Unicode version, so ideally it should be implemented within utf8proc itself. However, a simplified version of the algorithm would be to search backwards through the characters for a …
Sorry if this is a naive question; I am quite new to Unicode. The warning in the documentation says: "utf8proc_grapheme_break_stateful must be called IN ORDER on ALL potential breaks in a string. However, it is safe to reset the state to zero after a grapheme break." So how do I recognize a break in the middle of a string? Is starting from the beginning of the string the only way? I need to recognize a break before a given offset. After reading UAX #29, I found that only a few cases need the state. Could I call the stateless version and decide whether there is a break based on the codepoint category? Is that correct?
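To make the trade-off concrete, here is a minimal sketch of one conservative way to answer "is there a break before offset pos" without replaying the whole string. It is not taken from utf8proc and is not necessarily the simplification the reply had in mind. Assumptions: the text is already decoded into an array of codepoints; the resync point is the start of the string or the position right after a line feed (UAX #29 rule GB4 always breaks after a control, so no state carries across it); break_before and toy_break are hypothetical names. toy_break is a stand-in that models only regional-indicator (flag) pairing so the example compiles on its own; in real code a thin wrapper around utf8proc_grapheme_break_stateful, which takes the same three arguments, would be passed instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback with the same argument shape as a stateful break test:
   returns true if there is a break between a and b, updating *state. */
typedef bool (*break_fn)(int32_t a, int32_t b, int32_t *state);

static bool is_ri(int32_t c) { return c >= 0x1F1E6 && c <= 0x1F1FF; }

/* Toy stand-in (NOT the real algorithm): regional indicators pair up
   into flags, and everything else is treated as a break. */
static bool toy_break(int32_t a, int32_t b, int32_t *state) {
    if (is_ri(a) && is_ri(b) && *state == 0) {
        *state = 1;      /* a starts a pair and b completes it */
        return false;
    }
    *state = 0;
    return true;
}

/* Hypothetical helper: is there a grapheme break before cps[pos]? */
static bool break_before(const int32_t *cps, size_t len, size_t pos,
                         break_fn f) {
    assert(pos <= len);
    if (pos == 0 || pos == len) return true;   /* start or end of text */

    /* Scan backwards to a position that is a boundary regardless of
       state: the start of the string or just after a line feed. */
    size_t start = pos;
    while (start > 0 && cps[start - 1] != 0x000A) start--;

    /* Replay every potential break in order from the resync point,
       resetting the state to zero after each break. */
    int32_t state = 0;
    bool brk = true;     /* stays true when pos == start */
    for (size_t i = start + 1; i <= pos; i++) {
        brk = f(cps[i - 1], cps[i], &state);
        if (brk) state = 0;
    }
    return brk;
}
```

With four regional indicators in a row, calling the break function with a fresh state on the middle pair reports no break, while the replay correctly splits them into two flags. That is the kind of case where a purely stateless check by category goes wrong. The cost of this sketch is a scan back to the previous line feed, which can be long on lines without newlines.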