fix: ensure even size by padding with extra character #20
Tatwansh wants to merge 2 commits into susiai:master from
Conversation
Hi @Orbiter, could you please review this change? I've added padding to ensure the buffer size is even, which should help with alignment. Let me know if you'd like any adjustments or have questions.
Pull request overview
This PR addresses failures when decoding odd-length audio buffers by padding the decoded bytes to an even length before converting to int16, preventing ValueError during processing.
Changes:
- Pad odd-length decoded `audio_data` with a null byte prior to `np.frombuffer(..., dtype=np.int16)`.
- Add automatic `.env` loading (via `python-dotenv`) and set the Whisper model `device` based on CUDA availability.
- Update Whisper model loading/transcription calls to pass/use the selected device.
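The padding fix described above can be sketched as follows. This is a minimal standalone sketch, not the actual code from `transcribe_utils.py`; the helper name `decode_audio` is hypothetical:

```python
import numpy as np

def decode_audio(audio_data: bytes) -> np.ndarray:
    """Hypothetical helper: decode raw PCM bytes into int16 samples."""
    # np.frombuffer(..., dtype=np.int16) raises ValueError when the buffer
    # length is not a multiple of 2, so pad odd-length input with a null byte.
    if len(audio_data) % 2 != 0:
        audio_data += b"\x00"
    return np.frombuffer(audio_data, dtype=np.int16)
```

Note the trade-off: the appended null byte produces one extra (near-silent) sample at the tail rather than dropping data, which is harmless for transcription.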
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| requirements.txt | Adds python-dotenv dependency to support .env loading. |
| django/transcribe_app/transcribe_utils.py | Pads odd-length audio buffers and adjusts environment loading + Whisper device/model usage. |
```diff
-audio_tensor = torch.from_numpy(audio_array)
-result = model_fast.transcribe(audio_tensor, temperature=0)
+audio_tensor = torch.from_numpy(audio_array).to(device=device)
+result = model_smart.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)
```
This branch used to transcribe with the “fast” model but now always uses model_smart. Unless this is intentional, it’s likely a performance regression and makes model_fast unused. Consider switching back to model_fast here (or selecting the model based on a setting / chunk duration) and keep model_smart for higher-quality or retry paths.
Suggested change:
```diff
-result = model_smart.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)
+result = model_fast.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)
```
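The reviewer's alternative of selecting the model per chunk could be sketched like this. Everything here is a hypothetical illustration (the helper name, the threshold, and the duration parameter are assumptions, not part of the PR):

```python
def select_model(duration_s: float, model_fast, model_smart, threshold_s: float = 8.0):
    """Hypothetical helper: prefer the fast model for short chunks and
    reserve the smart model for longer, quality-sensitive audio."""
    return model_fast if duration_s < threshold_s else model_smart
```

Usage would be `model = select_model(chunk_duration, model_fast, model_smart)` before calling `model.transcribe(...)`, keeping both loaded models in use.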
```diff
+from dotenv import load_dotenv
+
+# looks for a .env file in the project (or pass an explicit path)
+BASE_DIR = Path(__file__).resolve().parent.parent
+load_dotenv(BASE_DIR / '.env')
```
BASE_DIR is set to Path(__file__).resolve().parent.parent (…/django), but django/HACKING.md instructs creating .env in the repo root. As a result load_dotenv(BASE_DIR / '.env') will not load the intended file, and defaults like WHISPER_SERVER_USE='true' may be used unexpectedly. Point dotenv at the repo root (e.g., Path(__file__).resolve().parents[2]) or use dotenv.find_dotenv() to search upward from this module (or load the env file from Django settings/manage.py instead of at import time).
Suggested change:
```diff
-from dotenv import load_dotenv
-# looks for a .env file in the project (or pass an explicit path)
-BASE_DIR = Path(__file__).resolve().parent.parent
-load_dotenv(BASE_DIR / '.env')
+from dotenv import load_dotenv, find_dotenv
+# Search upward from this module for the project's .env file.
+ENV_FILE = find_dotenv(filename='.env')
+if ENV_FILE:
+    load_dotenv(ENV_FILE)
```
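The path arithmetic behind the reviewer's `parents[2]` remark can be checked in isolation. The `/repo` layout below is a made-up stand-in for the actual checkout path:

```python
from pathlib import Path

# Assumed layout: <repo>/django/transcribe_app/transcribe_utils.py
module = Path("/repo/django/transcribe_app/transcribe_utils.py")

base_dir = module.parent.parent   # /repo/django  -- the PR's current BASE_DIR
repo_root = module.parents[2]     # /repo         -- where HACKING.md puts .env
```

Because `base_dir` resolves to `…/django` rather than the repo root, `load_dotenv(base_dir / '.env')` would miss a root-level `.env` file, which is the mismatch the review comment describes.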
```diff
 if os.path.exists(os.path.join(models_path, model_fast_name + ".pt")):
-    model_fast = whisper.load_model(model_fast_name, in_memory=True, download_root=models_path)
+    model_fast = whisper.load_model(model_fast_name, in_memory=True, download_root=models_path, device=device)
 else:
-    model_fast = whisper.load_model(model_fast_name, in_memory=True)
+    model_fast = whisper.load_model(model_fast_name, in_memory=True, device=device)
 if os.path.exists(os.path.join(models_path, model_smart_name + ".pt")):
-    model_smart = whisper.load_model(model_smart_name, in_memory=True, download_root=models_path)
+    model_smart = whisper.load_model(model_smart_name, in_memory=True, download_root=models_path, device=device)
 else:
-    model_smart = whisper.load_model(model_smart_name, in_memory=True)
+    model_smart = whisper.load_model(model_smart_name, in_memory=True, device=device)
```
model_fast is loaded here but never used anywhere in this module (only model_smart.transcribe(...) is called). This adds startup time and memory use unnecessarily; either use model_fast for transcription (and keep model_smart for fallback/quality) or stop loading the unused model.
This change pads the buffer with an additional null byte to maintain an even size, satisfying the two-byte alignment that int16 decoding requires.
Resolves #18