@juliaseungjoobaek
Contributor

@juliaseungjoobaek juliaseungjoobaek commented Mar 26, 2025

No description provided.

@juliaseungjoobaek
Contributor Author

What kind of change does this PR introduce? (Language addition, bug fix, feature, docs update, ...)
This PR introduces a language addition and a test suite enhancement for Danish within the Epitran module:
- Language addition: adds the mapping, preprocessor, and postprocessor rules that enable Epitran to convert Danish orthography to phonetic transcription.
- Test suite enhancement: introduces tests covering a range of Danish phonetic phenomena:
  - Contextual vowel length variations.
  - Contextual consonant alternations.
  - Diphthong formation.
  - Loan word pronunciation.
  - Behavior of "r" after a vowel.
- Key changes and additions:
  - Rules that set vowel length based on the surrounding consonants.
  - Rules for the context-dependent pronunciations of consonants such as "d", "g", "j", and "k".
  - Handling of Danish diphthongs.
  - Handling of the sound change affecting "r" after a vowel.
  - Handling of loan word pronunciation.

Checklist
[Checked] Have you added adequate tests on epitran/test?
[Checked] Have you updated the language list in README.md?
Sources of information for the test samples (I'm a native speaker, books, online resources, ...)
"Danish Orthography"
https://en.wikipedia.org/wiki/Danish_orthography
"Danish phonology"
https://en.wikipedia.org/wiki/Danish_phonology
"A Pronunciation Guide To The Danish Alphabet"
https://www.babbel.com/en/magazine/danish-alphabet
Søballe Horslund, C.; Puggaard, R.; Jørgensen, H. "A phonetically-based phoneme analysis of the Danish consonant system."
(Marie Skłodowska-Curie Action. Project: What makes the Danish sound system so difficult for non-native learners? Acronym: LxDP)

Sources of information for the rules (I'm a native speaker, books, online resources, ...)
"Danish Orthography"
https://en.wikipedia.org/wiki/Danish_orthography
"Danish phonology"
https://en.wikipedia.org/wiki/Danish_phonology
"A Pronunciation Guide To The Danish Alphabet"
https://www.babbel.com/en/magazine/danish-alphabet
Søballe Horslund, C.; Puggaard, R.; Jørgensen, H. "A phonetically-based phoneme analysis of the Danish consonant system."
(Marie Skłodowska-Curie Action. Project: What makes the Danish sound system so difficult for non-native learners? Acronym: LxDP)

What is the current behavior? (You can also link to an open issue here)
- Absence of Danish support:
  - Epitran currently has no data files (mapping, preprocessor, and postprocessor rules) for Danish orthography.
  - The dan-Latn language code is not recognized by the Epitran library, so attempting to transliterate Danish text results in errors or inaccurate phonetic transcriptions.
- Lack of Danish tests:
  - The Epitran test suite contains no tests that validate Danish transliteration.

What is the new behavior (if this is a feature change)?
- Danish language support: this PR introduces the dan-Latn language module, which includes:
  - dan-Latn.csv: a mapping file that associates Danish orthographic characters with their basic phonetic representations.
  - dan-Latn-pre.txt: a preprocessor file containing rules that apply contextual transformations to Danish orthography before mapping. These rules account for:
    - Vowel length variations based on surrounding consonants.
    - Contextual consonant pronunciations (e.g., "d" as /ð/, "g" as /j/).
    - Diphthong formation.
    - Loan word pronunciations.
    - The sound change affecting "r" after a vowel.
  - dan-Latn-post.txt: a postprocessor file containing rules that refine the phonetic output, addressing:
    - Diphthong adjustments.
    - Glottal stop approximations.
    - Symbol simplification.

  With these files in place, Epitran recognizes the dan-Latn code and transliterates Danish text into phonetic transcriptions that reflect the contextual phenomena listed above.
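The three-stage flow described above (preprocessor rules, then character mapping, then postprocessor rules) can be sketched as a toy pipeline. This is only an illustration and not the contents of the dan-Latn files: the rule contexts, the tiny mapping table, and the names `PRE_RULES`, `MAPPING`, `POST_RULES`, `apply_rules`, and `transliterate` are all assumptions made up for this example, and Python regexes stand in for Epitran's own rule notation.

```python
import re

VOWELS = "aeiouyæøå"

# Stage 1 (stands in for dan-Latn-pre.txt): contextual rewrites on the orthography.
# Hypothetical rule: a word-final "d" directly after a vowel becomes "ð".
PRE_RULES = [(re.compile(rf"(?<=[{VOWELS}])d\b"), "ð")]

# Stage 2 (stands in for dan-Latn.csv): context-free symbol-to-symbol mapping.
# Hypothetical, deliberately tiny table.
MAPPING = {"m": "m", "a": "a", "d": "d", "ð": "ð"}

# Stage 3 (stands in for dan-Latn-post.txt): cleanup of the phonetic string.
# Hypothetical rule: collapse a run of length marks into a single one.
POST_RULES = [(re.compile(r"ː+"), "ː")]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def transliterate(word):
    """Run the toy pre -> map -> post pipeline on a single word."""
    word = apply_rules(word.lower(), PRE_RULES)
    # Symbols missing from the table pass through unchanged.
    mapped = "".join(MAPPING.get(ch, ch) for ch in word)
    return apply_rules(mapped, POST_RULES)


print(transliterate("mad"))  # "d" is in the rewrite context
print(transliterate("dam"))  # "d" is word-initial, so the pre rule does not fire
```

Keeping contextual rules out of the mapping table is what lets the same letter ("d" here) surface differently depending on its neighbors while the table itself stays context-free.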
- Enhanced test suite: the PR adds tests that validate the Danish transliteration rules. These tests cover:
  - Contextual vowel length.
  - Consonant alternations.
  - Diphthongs.
  - Loan word pronunciations.
  - The sound change affecting "r" after a vowel.
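A table-driven layout is one way such tests could be organized: each phenomenon contributes (orthography, expected transcription) pairs, and every pair is checked as its own sub-test. The sketch below is an assumption about structure only and is not the test file added by this PR; `build_suite`, `run_suite`, and the stand-in transliterator are hypothetical, and the single pair shown is a placeholder rather than a case taken from the PR.

```python
import unittest


def build_suite(transliterate, cases):
    """Build a suite that checks every (orthography, expected) pair in `cases`."""

    class TestDanish(unittest.TestCase):
        def test_pairs(self):
            for orth, expected in cases:
                # subTest reports each failing pair separately.
                with self.subTest(orth=orth):
                    self.assertEqual(transliterate(orth), expected)

    return unittest.defaultTestLoader.loadTestsFromTestCase(TestDanish)


def run_suite(suite):
    """Run the suite and return the collected TestResult."""
    result = unittest.TestResult()
    suite.run(result)
    return result


# Stand-in transliterator so the sketch runs without Epitran installed.
# In a real test this would be the transliterate method of an Epitran
# instance created for "dan-Latn".
stand_in = {"mad": "mað"}.get

result = run_suite(build_suite(stand_in, [("mad", "mað")]))
print(result.wasSuccessful())
```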

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No. This PR adds a new language module (dan-Latn) without modifying or removing any existing functionality in Epitran.
The change is purely additive, so users who rely on Epitran for other languages will see no change in behavior and do not need to change their applications.
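For reference, a minimal usage sketch, assuming the `epitran` package is installed from a version that includes this PR's data files. The wrapper name `to_ipa` is hypothetical; `epitran.Epitran(code)` and its `transliterate` method are the library's existing entry points, and other language codes are constructed the same way as before.

```python
def to_ipa(text, code="dan-Latn"):
    """Transliterate `text` with Epitran, defaulting to the new Danish module."""
    # Imported lazily so this sketch can be loaded without Epitran installed.
    import epitran

    epi = epitran.Epitran(code)
    return epi.transliterate(text)


# Example call (requires Epitran with the dan-Latn files):
# print(to_ipa("mad"))
```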

@juliaseungjoobaek juliaseungjoobaek changed the title from "Added Danish mapping" to "Added Danish Epitran module" Mar 26, 2025
Owner

@dmort27 dmort27 left a comment

The README needs to be updated.

@dmort27 dmort27 merged commit 9544a54 into dmort27:master Oct 16, 2025
3 checks passed