Description
The Danish pre-processor file (epitran/data/pre/dan-Latn.txt) has a symbol name mismatch that causes RuleFileError: Undefined symbol: ::consonant:: when loading the dan-Latn language.
Version
epitran 1.34.0 (installed via pip)
Problem
In data/pre/dan-Latn.txt, the symbols are defined with plural names:
::vowels:: = a|e|i|o|u|y|æ|ø|å
::consonants:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
But they are used throughout the file with singular names:
a -> aː / _[^::consonant::][^::consonant::]
a -> a / _::consonant::::consonant::
...
r -> 0 / _::vowel::
This causes the Rules._sub_symbols() method to raise an exception because ::consonant:: and ::vowel:: are never defined.
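To make the mismatch easy to spot without loading epitran, here is a minimal standalone sketch that lists symbols used in rule lines but never defined. It is not epitran's own implementation: the helper name `undefined_symbols` and the two regular expressions are assumptions for illustration, and the sample text is a shortened version of the lines quoted above.

```python
import re

# Assumed syntax, based on the lines quoted in this issue:
# a definition looks like "::name:: = ...", a use is any "::name::" in a rule.
SYMBOL_DEF = re.compile(r"^\s*(::\w+::)\s*=")
SYMBOL_USE = re.compile(r"::\w+::")


def undefined_symbols(rule_text):
    """Return the sorted symbols used in rule lines but not defined earlier."""
    defined = set()
    undefined = set()
    for line in rule_text.splitlines():
        match = SYMBOL_DEF.match(line)
        if match:
            # Definition line: record the name and skip its right-hand side.
            defined.add(match.group(1))
            continue
        for symbol in SYMBOL_USE.findall(line):
            if symbol not in defined:
                undefined.add(symbol)
    return sorted(undefined)


sample = (
    "::vowels:: = a|e|i|o|u|y|æ|ø|å\n"
    "::consonants:: = b|c|d\n"
    "a -> aː / _[^::consonant::][^::consonant::]\n"
    "r -> 0 / _::vowel::\n"
)
print(undefined_symbols(sample))  # ['::consonant::', '::vowel::']
```

The sketch ignores symbols that appear on the right-hand side of a definition, which is enough for the case reported here.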
Expected Behavior
The symbols should be consistently named. The post-processor file (data/post/dan-Latn.txt) and other language files such as cym-Latn.txt use singular names, so the convention appears to be:
::vowel:: = ...
::consonant:: = ...
Steps to Reproduce
import epitran
epi = epitran.Epitran("dan-Latn")
Error Output
epitran.rules.RuleFileError: Undefined symbol: ::consonant::
Full traceback:
File ".../epitran/simple.py", line 66, in __init__
self.preprocessor = PrePostProcessor(code, 'pre', False)
File ".../epitran/ppprocessor.py", line 27, in __init__
self.rules = self._read_rules(code, fix, rev)
File ".../epitran/ppprocessor.py", line 36, in _read_rules
return Rules([Path(str(resource_path))])
File ".../epitran/rules.py", line 33, in __init__
rules = self._read_rule_file(rule_file)
File ".../epitran/rules.py", line 47, in _read_rule_file
rules.append(self._read_rule(i, line))
File ".../epitran/rules.py", line 76, in _read_rule
line = self._sub_symbols(line)
File ".../epitran/rules.py", line 65, in _sub_symbols
raise RuleFileError('Undefined symbol: {}'.format(s))
epitran.rules.RuleFileError: Undefined symbol: ::consonant::
Suggested Fix
Change lines 1-2 in epitran/data/pre/dan-Latn.txt from:
::vowels:: = a|e|i|o|u|y|æ|ø|å
::consonants:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
to:
::vowel:: = a|e|i|o|u|y|æ|ø|å
::consonant:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
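For anyone who wants to patch a local copy before a release lands, the rename can be sketched as a small text transformation. This is only an illustration: `RENAMES` and `rename_definitions` are hypothetical names, and the function works on a string rather than on the installed file, so reading and writing the file is left to the reader.

```python
import re

# Hypothetical mapping from the plural definition names to the singular ones.
RENAMES = {"::vowels::": "::vowel::", "::consonants::": "::consonant::"}


def rename_definitions(rule_text):
    """Rename symbol definitions at the start of a line; leave rules untouched."""
    def fix(match):
        name = match.group(1)
        return RENAMES.get(name, name)

    # Only match "::name::" when it starts a line and is followed by "=".
    return re.sub(r"^(::\w+::)(?=\s*=)", fix, rule_text, flags=re.MULTILINE)


before = "::vowels:: = a|e\n::consonants:: = b|c\nr -> 0 / _::vowel::\n"
print(rename_definitions(before))
```

Restricting the substitution to definition lines keeps the rule lines, which already use the singular names, unchanged.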