Support Cantonese #200
|
Update after discussion with @dmort27: when parsing a word with several characters, also add the Jyutping of each character to the dictionary to avoid the OOV problem. This will definitely introduce some issues, as some characters may have more than one pronunciation. I still get an issue with the Jyutping -> IPA mapping. E.g., The step (2) is totally unexpected, as the |
|
Is t really supposed to turn into aspirated t in all environments? |
|
No, so step (2) is not expected. The rule |
|
maybe make t -> tʰ word-initial? seems like u already covered the cases where [t] is word-final |
|
Can you also update the README.md too? Thanks! |
|
Closing & reopening PR for triggering CI |
|
@jctian98, can u resolve the merge conflicts? Also, there is a slight change in the coding style of download.py, you might want to update it. |
|
I added some special characters to indicate the I think this PR is ready to review now. @dmort27 |
|
It seems that the above issue has been added to the tests and resolved. Merging the PR. Thanks a lot everyone! |

Add Cantonese support, which mainly follows the Chinese implementation.
Key features:
(1) The Cantonese-to-Jyutping dictionary from CC-Canto.
(2) Jyutping-to-IPA dictionary from Wiki
Other important points:
(1) In the Jyutping-to-IPA rules, `m` and `ng` have different IPA parsing depending on context. Specifically, they can (a) represent isolated syllables, or (b) act as a consonant. So we first check (a).
(2) The order of Jyutping-to-IPA rules matters: Jyutping items with more characters are checked first.
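
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `jyutping_to_ipa` and the tiny rule table are hypothetical, tone digits are simply dropped, and the real mapping covers far more Jyutping items.

```python
# Hypothetical, illustrative subset of a Jyutping-to-IPA table.
# Not the table used in this PR.
SYLLABIC = {
    "m": "m\u0329",        # syllabic m (m + combining vertical line below)
    "ng": "\u014b\u030d",  # syllabic ng (eng + combining vertical line above)
}

RULES = {
    "ng": "\u014b",   # velar nasal used as a consonant
    "aa": "a\u02d0",  # long a
    "m": "m",
    "n": "n",
    "g": "k",
    "a": "\u0250",    # short a
    "o": "\u0254\u02d0",
}

# Longer Jyutping items are tried first, so "ng" wins over "n" + "g".
ORDERED_KEYS = sorted(RULES, key=len, reverse=True)


def jyutping_to_ipa(syllable: str) -> str:
    """Convert one Jyutping syllable to IPA (tone digits are ignored here)."""
    body = syllable.rstrip("0123456789")

    # First check whether the whole syllable is an isolated "m" or "ng".
    if body in SYLLABIC:
        return SYLLABIC[body]

    # Otherwise scan left to right, always taking the longest matching rule.
    out = []
    i = 0
    while i < len(body):
        for key in ORDERED_KEYS:
            if body.startswith(key, i):
                out.append(RULES[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no rule matches {body[i:]!r} in {syllable!r}")
    return "".join(out)
```

With this table, `jyutping_to_ipa("ng5")` takes the isolated-syllable branch, while `jyutping_to_ipa("ngo5")` treats `ng` as a consonant and yields `ŋɔː`. If the shorter rules were tried first, `ng` would be split into `n` + `g`, which is why the ordering matters.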
I haven't found a good test example for Chinese to follow, maybe because it needs file downloading.
Any suggestion is welcome.