An open-source alternative to Miraa, a Japanese transcription and translation app. This project is a multilingual audio analysis and alignment system for transcription, translation, and visualization.
- Processes music into aligned, translated text, with explanations generated by an LLM
- Supports English-Japanese (en-ja) transcription and translation
- Visualizes timing, confidence, and alignment results
- Searches Japanese dictionaries to find definitions of words
- Separates audio sources using Demucs, with masking to improve separation quality
- Automatically downloads audio from YouTube to use as input
- Splits audio into stems (vocal & instrumental)
- Audio preprocessing & segmentation (VAD-based)
- Speech recognition & translation using pretrained models
- Lyrics retrieval via the Genius API
- Saves results to a .json file for easy inspection and debugging
- HTML-based interactive visualization (soon)
- Python
- PyTorch / torchaudio
- Speech & translation models
- Transformers
- HTML / JS for dashboards
- External APIs & web scraping
- Improved alignment accuracy through iterative refinement (fine-tuning)
- Explanation of Japanese songs and their meaning (for studying the language)
- Robust handling of noisy real-world audio (actual songs)
- Scalable pipeline design (soon)
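
To make the "VAD-based segmentation" and ".json output" items above more concrete, here is a minimal sketch of the idea. It is an assumption-laden illustration, not this project's implementation: it swaps in a simple RMS-energy threshold in place of a pretrained VAD model, and the names `Segment`, `detect_segments`, and `segments_to_json`, along with every default parameter, are hypothetical.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    """A detected region of activity (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_segments(samples, sample_rate, frame_ms=20,
                    threshold=0.05, min_gap_frames=5):
    """Group above-threshold frames into segments.

    A segment is closed only after `min_gap_frames` consecutive quiet
    frames, so shorter pauses stay inside the same segment.
    All defaults are illustrative assumptions, not tuned values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = frame_rms(samples, frame_len)

    def to_seconds(frame_index):
        return frame_index * frame_len / sample_rate

    segments = []
    start = None   # index of the first active frame in the open segment
    silence = 0    # count of consecutive quiet frames seen so far
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap_frames:
                end = idx - silence + 1  # exclusive end: first quiet frame
                segments.append(Segment(to_seconds(start), to_seconds(end)))
                start = None
                silence = 0
    if start is not None:
        # Audio ended while a segment was open; drop trailing quiet frames.
        end = len(energies) - silence
        segments.append(Segment(to_seconds(start), to_seconds(end)))
    return segments


def segments_to_json(segments):
    """Serialize segments to a JSON string for inspection or debugging."""
    return json.dumps([asdict(s) for s in segments], indent=2)


if __name__ == "__main__":
    # Synthetic input: quiet, then a constant-amplitude burst, then quiet.
    audio = [0.0] * 200 + [0.5] * 400 + [0.0] * 200
    print(segments_to_json(detect_segments(audio, sample_rate=1000)))
```

An energy threshold is easy to reason about but fragile on real songs, where instrumental bleed keeps energy high; that is one reason to run segmentation on the separated vocal stem or to use a trained VAD model instead.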
Actively iterating and experimenting
This diagram shows how the program functions:
