Problem with diacritics and transliterating to lists

I've run into an issue in several languages where diacritics lead to strange behavior when transliterating to a list or to a delimited string.

Example in French:
```
import epitran

lang_code = 'fra-Latn'  # French, Latin script
epi = epitran.Epitran(lang_code)
word = "mobilisèrent"
print(epi.trans_list(word))                      # list of segments
print(epi.trans_delimiter(word))                 # default delimiter
print(epi.trans_delimiter(word, delimiter='~'))  # custom delimiter
```
which yields the following output:
```
['m', 'ɔ', 'b', 'i', 'l', 'i', 'z', 'ə', '̀', 'ʀ', 'ɑ̃']
m ɔ b i l i z ə ̀ ʀ ɑ̃
m~ɔ~b~i~l~i~z~ə~̀~ʀ~ɑ̃
```
With a space as the delimiter, the diacritic attaches itself to the next letter. With any other delimiter, such as a tilde, the output contains an extra delimiter and the diacritic then modifies that delimiter (a tilde in this case, but the same happens with any chosen delimiter).
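To make the symptom concrete, here is a small sketch that uses only the standard `unicodedata` module and the list copied from the output above (Epitran itself is not called). It assumes the stray element is U+0300, which is what the output appears to contain, and shows that joining the list puts that mark directly after a delimiter.

```python
import unicodedata

# Segments copied from the output above; escapes make the combining marks visible.
segments = ['m', 'ɔ', 'b', 'i', 'l', 'i', 'z', 'ə', '\u0300', 'ʀ', 'ɑ\u0303']

stray = segments[8]
print(unicodedata.name(stray))       # COMBINING GRAVE ACCENT
print(unicodedata.combining(stray))  # nonzero, so this is a combining mark

# Joining treats the lone mark as its own segment, so it lands right after a '~'.
joined = '~'.join(segments)
idx = joined.index('\u0300')
print(repr(joined[idx - 1]))         # '~'
```

Because a combining mark is drawn on whatever character precedes it, the grave accent ends up rendered on the delimiter instead of on the `ə`.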

This happens in other languages as well. So far I've also tried Portuguese and Italian, with the same result.

Is this expected behavior, or is there some kind of trick I am unaware of? To my understanding, a diacritic is not an additional phoneme but a modifier. I also understand that Unicode places combining diacritics after the base character they modify, so is this perhaps an encoding issue?
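One way to probe the encoding question, as a sketch only: compare the composed (NFC) and decomposed (NFD) forms of the input with the standard `unicodedata` module. Whether Epitran normalizes its input internally is not established here, so this only shows what the two forms look like. The `merge_combining` helper is hypothetical (it is not part of Epitran) and simply re-attaches lone combining marks to the preceding segment.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

word = "mobilis\u00e8rent"  # 'è' written as one precomposed code point
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)
print(len(nfc), len(nfd))     # 12 13
print(code_points(nfd)[7:9])  # ['U+0065', 'U+0300']: 'e' plus combining grave

def merge_combining(segments):
    """Hypothetical helper: fold segments made only of combining marks into the previous one."""
    merged = []
    for seg in segments:
        if merged and seg and all(unicodedata.combining(ch) for ch in seg):
            merged[-1] += seg
        else:
            merged.append(seg)
    return merged

segments = ['m', 'ɔ', 'b', 'i', 'l', 'i', 'z', 'ə', '\u0300', 'ʀ', 'ɑ\u0303']
print(merge_combining(segments))
```

The merged list has ten segments and no stray mark, so delimiters no longer pick up the accent. This is only a cosmetic workaround: it does not tell us whether `ə` plus a grave accent is the intended transcription of `è`.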

Problem with diacritics and transliterating to lists #174
