Bug Report: Symbol name mismatch in Danish (dan-Latn) pre-processor file #242

@andreas-solti

Description

The Danish pre-processor file (epitran/data/pre/dan-Latn.txt) has a symbol name mismatch that causes RuleFileError: Undefined symbol: ::consonant:: when loading the dan-Latn language.

Version

epitran 1.34.0 (installed via pip)

Problem

In data/pre/dan-Latn.txt, the symbols are defined with plural names:

::vowels:: = a|e|i|o|u|y|æ|ø|å
::consonants:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z

But they are used throughout the file with singular names:

a -> aː / _[^::consonant::][^::consonant::]
a -> a / _::consonant::::consonant::
...
r -> 0 / _::vowel::

This causes the Rules._sub_symbols() method to raise an exception because ::consonant:: and ::vowel:: are never defined.
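
To illustrate the failure mode, here is a minimal sketch of symbol substitution that raises when a rule references a name with no definition. It is not epitran's actual code: `sub_symbols` and `UndefinedSymbolError` are hypothetical stand-ins, and the real `Rules._sub_symbols()` may expand symbols differently (for example, by grouping the alternatives).

```python
import re

# Matches symbol references such as ::vowel:: or ::consonant::
SYMBOL_RE = re.compile(r"::\w+::")


class UndefinedSymbolError(Exception):
    """Hypothetical stand-in for epitran's RuleFileError."""


def sub_symbols(line, symbols):
    """Replace each ::name:: in line with its definition; raise if it has none."""
    def expand(match):
        token = match.group(0)
        if token not in symbols:
            raise UndefinedSymbolError("Undefined symbol: {}".format(token))
        return symbols[token]
    return SYMBOL_RE.sub(expand, line)


# Definitions as they appear in the file (plural) vs. as the rules expect (singular)
plural = {"::vowels::": "a|e|i|o|u|y|æ|ø|å"}
singular = {"::vowel::": "a|e|i|o|u|y|æ|ø|å"}
rule = "r -> 0 / _::vowel::"

try:
    sub_symbols(rule, plural)
except UndefinedSymbolError as err:
    print(err)  # Undefined symbol: ::vowel::

print(sub_symbols(rule, singular))  # r -> 0 / _a|e|i|o|u|y|æ|ø|å
```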

Expected Behavior

The symbols should be consistently named. Looking at the post-processor file (data/post/dan-Latn.txt) and other language files like cym-Latn.txt, the convention is to use singular names:

::vowel:: = ...
::consonant:: = ...

Steps to Reproduce

import epitran
epi = epitran.Epitran("dan-Latn")

Error Output

epitran.rules.RuleFileError: Undefined symbol: ::consonant::
Full traceback:

File ".../epitran/simple.py", line 66, in __init__
    self.preprocessor = PrePostProcessor(code, 'pre', False)
File ".../epitran/ppprocessor.py", line 27, in __init__
    self.rules = self._read_rules(code, fix, rev)
File ".../epitran/ppprocessor.py", line 36, in _read_rules
    return Rules([Path(str(resource_path))])
File ".../epitran/rules.py", line 33, in __init__
    rules = self._read_rule_file(rule_file)
File ".../epitran/rules.py", line 47, in _read_rule_file
    rules.append(self._read_rule(i, line))
File ".../epitran/rules.py", line 76, in _read_rule
    line = self._sub_symbols(line)
File ".../epitran/rules.py", line 65, in _sub_symbols
    raise RuleFileError('Undefined symbol: {}'.format(s))
epitran.rules.RuleFileError: Undefined symbol: ::consonant::

Suggested Fix

Change lines 1-2 in epitran/data/pre/dan-Latn.txt from:

::vowels:: = a|e|i|o|u|y|æ|ø|å
::consonants:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z

to:

::vowel:: = a|e|i|o|u|y|æ|ø|å
::consonant:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
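
A quick way to confirm the rename is to compare the names a rule file defines against the names its rules use. The check below is only a sketch under assumptions: `undefined_symbols` is a hypothetical helper, not part of epitran, and it treats any line of the form `::name:: = ...` as a definition regardless of where it appears in the file. The sample text reuses the definitions and three of the rules quoted above.

```python
import re

DEF_RE = re.compile(r"^::(\w+)::\s*=")   # a definition line, e.g. ::vowel:: = a|e|i
USE_RE = re.compile(r"::(\w+)::")        # any symbol reference inside a rule


def undefined_symbols(rule_text):
    """Return names used in rules but never defined, in order of first use."""
    lines = [ln.strip() for ln in rule_text.splitlines() if ln.strip()]
    defined = set()
    rules = []
    for ln in lines:
        match = DEF_RE.match(ln)
        if match:
            defined.add(match.group(1))
        else:
            rules.append(ln)
    missing = []
    for ln in rules:
        for name in USE_RE.findall(ln):
            if name not in defined and name not in missing:
                missing.append(name)
    return missing


RULES = """\
a -> aː / _[^::consonant::][^::consonant::]
a -> a / _::consonant::::consonant::
r -> 0 / _::vowel::
"""
BROKEN = """\
::vowels:: = a|e|i|o|u|y|æ|ø|å
::consonants:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
""" + RULES
FIXED = """\
::vowel:: = a|e|i|o|u|y|æ|ø|å
::consonant:: = b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z
""" + RULES

print(undefined_symbols(BROKEN))  # ['consonant', 'vowel']
print(undefined_symbols(FIXED))   # []
```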
