This is a local Python application that evaluates spoken German sentences using OpenAI's Whisper model. Users upload audio (`.webm`, typically sent from a frontend), and the app returns a pronunciation score and feedback against a set of expected sentences.
```
├── main.py            # Backend logic (or whisper_utils.py for modular usage)
├── whisper_utils.py   # (Optional) Modular Whisper processing
├── requirements.txt   # Python dependencies
├── render.yaml        # (Ignore if running locally)
├── temp_audio.webm    # Temp uploaded audio file (auto-created/deleted)
└── __pycache__/       # Python cache
```
- Accepts an audio recording (WebM format).
- Transcribes the audio using OpenAI Whisper.
- Compares it against a predefined sentence (based on `phrase_id`).
- Scores the pronunciation using character-level similarity (see the sketch after this list).
- Outputs:
  - Expected vs spoken sentence
  - Accuracy score
  - Mispronounced letters
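The scoring step can be pictured as a character-level diff. A minimal sketch, assuming Python's standard `difflib` (the function name `score_pronunciation` is illustrative, not the app's actual API):

```python
from difflib import SequenceMatcher

def score_pronunciation(expected: str, spoken: str) -> tuple[float, list[str]]:
    """Return a 0-100 similarity score plus expected letters never matched."""
    matcher = SequenceMatcher(None, expected.lower(), spoken.lower())
    score = round(matcher.ratio() * 100, 1)

    # Indices of expected characters covered by some matching block.
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))

    missed = [ch for i, ch in enumerate(expected.lower())
              if i not in matched and ch.isalpha()]
    return score, missed
```

With the README's example inputs (`"Ich bin müde"` vs `"und bin mut"`) this yields the mispronounced letters i, c, h, ü, d, e; the exact score depends on the similarity formula used, so the app's 50.0 may be computed slightly differently.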
```bash
git clone https://github.com/krish-1010/whisper-backend
cd whisper-backend
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Dependencies include:
- `openai-whisper`
- `ffmpeg-python` (ensure system `ffmpeg` is installed)
- `uvicorn`, `fastapi` (for API usage)
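A plausible `requirements.txt` for this stack might look like the following (versions omitted; the repo's actual file may differ, and `python-multipart` is listed because FastAPI requires it for multipart file uploads):

```
openai-whisper
ffmpeg-python
fastapi
uvicorn
python-multipart
```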
Download ffmpeg:
- Windows: https://www.gyan.dev/ffmpeg/builds/
- Linux/macOS: via `brew` or `apt`

Ensure `ffmpeg` is on your system PATH.
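To verify the install, run:

```bash
ffmpeg -version   # should print version info if ffmpeg is on PATH
```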
```bash
uvicorn main:app --reload
```
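If you are wiring this up yourself, the endpoint in `main.py` presumably looks roughly like the sketch below. This is a minimal illustration, not the repo's exact code: it assumes FastAPI's `UploadFile` and the standard `whisper` package, and `score_pronunciation` is the hypothetical helper sketched earlier.

```python
import whisper
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # "tiny"/"small" are faster alternatives

EXPECTED_PHRASES = {"1-1": "Ich bin müde", "2-3": "Kannst du helfen?"}

@app.post("/evaluate/{phrase_id}")
async def evaluate(phrase_id: str, file: UploadFile = File(...)):
    if phrase_id not in EXPECTED_PHRASES:
        raise HTTPException(status_code=404, detail="Unknown phrase_id")
    # Whisper transcribes from a file path, so persist the upload temporarily.
    with open("temp_audio.webm", "wb") as f:
        f.write(await file.read())
    result = model.transcribe("temp_audio.webm", language="de")
    spoken = result["text"].strip()
    expected = EXPECTED_PHRASES[phrase_id]
    score, missed = score_pronunciation(expected, spoken)  # see sketch above
    return {"expected": expected, "spoken": spoken, "score": score,
            "feedback": f"🗣️ Mispronounced letters: {', '.join(missed)}"}
```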
```bash
curl -X POST "http://localhost:8000/evaluate/1-1" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_audio_file.webm"
```
You can find all supported sentence IDs and phrases inside `main.py` (or `whisper_utils.py`) under `EXPECTED_PHRASES`. Example:

```python
"1-1": "Ich bin müde",
"2-3": "Kannst du helfen?",
```
```json
{
  "expected": "Ich bin müde",
  "spoken": "und bin mut",
  "score": 50.0,
  "feedback": "🗣️ Mispronounced letters: i, c, h, ü, d, e"
}
```
- WebM audio is expected (convert if needed).
- You may extend this with a frontend or voice recording interface.
- For offline usage, the `tiny` or `small` Whisper models are ideal (see the snippet below).
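For example, a quick offline transcription check with the tiny model (standard `whisper` API; the model is downloaded once, then cached locally):

```python
import whisper

model = whisper.load_model("tiny")  # smallest model; fast and cache-friendly
result = model.transcribe("temp_audio.webm", language="de")
print(result["text"])
```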
MIT