
fix: ensure even size by padding with extra character #20

Open
Tatwansh wants to merge 2 commits into susiai:master from Tatwansh:master

Conversation

@Tatwansh Tatwansh commented Mar 7, 2026

This change pads the buffer with an additional character to maintain an even size, addressing alignment or size requirements in the processing logic.

Resolves #18
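
The fix described above can be sketched as follows. This is a minimal, self-contained illustration of the padding technique (the helper name `bytes_to_int16` is invented for this example; the PR's actual code lives in `django/transcribe_app/transcribe_utils.py`):

```python
import numpy as np

def bytes_to_int16(audio_data: bytes) -> np.ndarray:
    """Decode a raw byte buffer as int16 samples, padding odd lengths.

    np.frombuffer(..., dtype=np.int16) raises ValueError when the buffer
    size is not a multiple of 2, so one null byte is appended to restore
    2-byte alignment before decoding.
    """
    if len(audio_data) % 2 != 0:
        audio_data += b"\x00"
    return np.frombuffer(audio_data, dtype=np.int16)
```

Padding with a zero byte means the final (partial) sample decodes to a near-silent value rather than crashing the pipeline.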

Tatwansh (Author) commented Mar 7, 2026

Hi @Orbiter,

Could you please review this change? I've added padding to ensure the buffer size is even, which should help with alignment. Let me know if you'd like any adjustments or have questions.

@mariobehling mariobehling requested a review from Copilot April 21, 2026 22:02

Copilot AI left a comment


Pull request overview

This PR addresses failures when decoding odd-length audio buffers by padding the decoded bytes to an even length before converting to int16, preventing ValueError during processing.

Changes:

  • Pad odd-length decoded audio_data with a null byte prior to np.frombuffer(..., dtype=np.int16).
  • Add automatic .env loading (via python-dotenv) and set Whisper model device based on CUDA availability.
  • Update Whisper model loading/transcription calls to pass/use the selected device.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

File | Description
requirements.txt | Adds python-dotenv dependency to support .env loading.
django/transcribe_app/transcribe_utils.py | Pads odd-length audio buffers and adjusts environment loading + Whisper device/model usage.


- audio_tensor = torch.from_numpy(audio_array)
- result = model_fast.transcribe(audio_tensor, temperature=0)
+ audio_tensor = torch.from_numpy(audio_array).to(device=device)
+ result = model_smart.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)

Copilot AI Apr 21, 2026


This branch used to transcribe with the “fast” model but now always uses model_smart. Unless this is intentional, it’s likely a performance regression and makes model_fast unused. Consider switching back to model_fast here (or selecting the model based on a setting / chunk duration) and keep model_smart for higher-quality or retry paths.

Suggested change
- result = model_smart.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)
+ result = model_fast.transcribe(audio_tensor, temperature=0, fp16=torch.cuda.is_available(), condition_on_previous_text=False)

Comment on lines +19 to +23
+ from dotenv import load_dotenv
+
+ # looks for a .env file in the project (or pass an explicit path)
+ BASE_DIR = Path(__file__).resolve().parent.parent
+ load_dotenv(BASE_DIR / '.env')

Copilot AI Apr 21, 2026


BASE_DIR is set to Path(__file__).resolve().parent.parent (…/django), but django/HACKING.md instructs creating .env in the repo root. As a result load_dotenv(BASE_DIR / '.env') will not load the intended file, and defaults like WHISPER_SERVER_USE='true' may be used unexpectedly. Point dotenv at the repo root (e.g., Path(__file__).resolve().parents[2]) or use dotenv.find_dotenv() to search upward from this module (or load the env file from Django settings/manage.py instead of at import time).

Suggested change
- from dotenv import load_dotenv
- # looks for a .env file in the project (or pass an explicit path)
- BASE_DIR = Path(__file__).resolve().parent.parent
- load_dotenv(BASE_DIR / '.env')
+ from dotenv import load_dotenv, find_dotenv
+ # Search upward from this module for the project's .env file.
+ ENV_FILE = find_dotenv(filename='.env')
+ if ENV_FILE:
+     load_dotenv(ENV_FILE)
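
The path arithmetic behind this comment can be checked with plain `pathlib`. The directory layout below is assumed from the review (a repo root containing a `django/` subfolder; the `repo/` name is illustrative):

```python
from pathlib import Path

# Hypothetical layout, assumed from the review comment:
#   repo/.env
#   repo/django/transcribe_app/transcribe_utils.py
module = Path("repo/django/transcribe_app/transcribe_utils.py")

base_dir = module.parent.parent   # repo/django  -- what the PR computes
repo_root = module.parents[2]     # repo         -- where HACKING.md puts .env

print(base_dir.name)   # django
print(repo_root.name)  # repo
```

So `base_dir / '.env'` looks in `repo/django/.env` and silently misses `repo/.env`, which is why the review suggests `parents[2]` or `find_dotenv()`.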

Comment on lines 65 to +72
  if os.path.exists(os.path.join(models_path, model_fast_name + ".pt")):
-     model_fast = whisper.load_model(model_fast_name, in_memory=True, download_root=models_path)
+     model_fast = whisper.load_model(model_fast_name, in_memory=True, download_root=models_path, device=device)
  else:
-     model_fast = whisper.load_model(model_fast_name, in_memory=True)
+     model_fast = whisper.load_model(model_fast_name, in_memory=True, device=device)
  if os.path.exists(os.path.join(models_path, model_smart_name + ".pt")):
-     model_smart = whisper.load_model(model_smart_name, in_memory=True, download_root=models_path)
+     model_smart = whisper.load_model(model_smart_name, in_memory=True, download_root=models_path, device=device)
  else:
-     model_smart = whisper.load_model(model_smart_name, in_memory=True)
+     model_smart = whisper.load_model(model_smart_name, in_memory=True, device=device)

Copilot AI Apr 21, 2026


model_fast is loaded here but never used anywhere in this module (only model_smart.transcribe(...) is called). This adds startup time and memory use unnecessarily; either use model_fast for transcription (and keep model_smart for fallback/quality) or stop loading the unused model.
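One way to avoid paying startup time and memory for a model that may never be used is to load lazily on first request. A minimal sketch of the pattern, with a stand-in loader in place of the real `whisper.load_model(...)` so it stays runnable here:

```python
from functools import lru_cache

def _load(name: str) -> str:
    # Stand-in for whisper.load_model(name, in_memory=True, device=device);
    # returns a token so the caching pattern is demonstrable without the library.
    return f"model:{name}"

@lru_cache(maxsize=None)
def get_model(name: str):
    # First call loads the model; subsequent calls reuse the cached object.
    # A model that is never requested (e.g. the unused fast model) is never loaded.
    return _load(name)
```

With this shape, module import stays cheap and each transcription path asks for exactly the model it needs.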


Labels

None yet

Development

Successfully merging this pull request may close these issues.

received Value Error for odd length audio_data from buffer

2 participants