transcribe_app#25

Open
CHINMAYK121 wants to merge 2 commits into susiai:master from CHINMAYK121:master

Conversation

@CHINMAYK121

What does this PR do?

This PR addresses the zero test coverage in transcribe_app/tests.py by adding 57 comprehensive test cases across all core modules. It also makes whisper/torch imports conditional so the app can load in server-mode and tests can run without the heavy ML dependencies (~2GB).

Changes

1. django/transcribe_app/tests.py — Comprehensive Test Suite

| Category | # Tests | What's Covered |
| --- | --- | --- |
| is_valid() validation | 16 | Valid sentences, empty/None input, forbidden phrases (thank you, bye!, click click, etc.), forbidden exact strings (eh., bye.), Korean hallucination chars, long words (>40 chars), case-insensitivity |
| clean_old_transcripts() | 4 | Removal of transcripts >2h old, recent transcripts preserved, mixed old/new, empty tenant cleanup |
| add_to_audio_stack() / get_transcripts() | 5 | Queue creation per tenant, multi-chunk same tenant, multi-tenant isolation, unknown/known tenant retrieval |
| Serializers | 7 | TranscribeInputSerializer (valid all fields, required only, missing audio, missing chunk_id, empty), TranscribeResponseSerializer, SizeResponseSerializer |
| REST API Integration | 25 | All 9 endpoints: POST /api/transcribe, GET /api/get_transcript, GET /api/get_first_transcript, GET /api/pop_first_transcript, GET /api/get_latest_transcript, GET /api/pop_latest_transcript, GET /api/delete_transcript, GET /api/list_transcripts, GET /api/transcripts_size |

2. django/transcribe_app/transcribe_utils.py — Conditional Imports

# Before
import whisper
import torch

# After
try:
    import whisper
    import torch
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False

- Add 57 test cases covering all core functionality:
  - is_valid() transcript validation (16 tests)
  - clean_old_transcripts() cleanup logic (4 tests)
  - add_to_audio_stack()/get_transcripts() storage (5 tests)
  - Serializer validation (7 tests)
  - REST API integration for all 9 endpoints (25 tests)
- Make whisper/torch imports conditional via HAS_WHISPER guard
  so the app can load and tests can run without heavy ML packages
- All tests pass in ~1.4 seconds
Add comprehensive test suite for transcribe_app (57 tests)

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a broad Django test suite for transcribe_app and updates transcribe_utils to conditionally import heavy ML dependencies so the app/tests can run without whisper/torch.

Changes:

  • Added extensive unit + integration tests covering validators, transcript storage utilities, serializers, and REST endpoints.
  • Made whisper/torch optional via try/except and gated local-model initialization behind HAS_WHISPER.
  • Added a warning path when ML deps are absent and no local model can be loaded.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| django/transcribe_app/transcribe_utils.py | Adds conditional imports and guards local Whisper model loading when ML deps are not installed. |
| django/transcribe_app/tests.py | Introduces a large unit/integration test suite for core utilities, serializers, and API endpoints. |

import numpy as np
from unittest.mock import patch, MagicMock

from django.test import TestCase, override_settings

Copilot AI Apr 21, 2026

Several imports added here appear unused in the current file content (e.g., patch/MagicMock/override_settings and TranscriptResponseSerializer/ListTranscriptsResponseSerializer). If they aren’t needed, remove them to avoid linter failures and keep the test module focused; otherwise, add the missing tests/usages that require them.

Suggested change
- from django.test import TestCase, override_settings
+ from django.test import TestCase

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +22
from unittest.mock import patch, MagicMock

from django.test import TestCase, override_settings
from rest_framework.test import APIClient
from rest_framework import status

from transcribe_app.serializers import (
    TranscribeInputSerializer,
    TranscribeResponseSerializer,
    TranscriptResponseSerializer,
    ListTranscriptsResponseSerializer,

Copilot AI Apr 21, 2026

Several imports added here appear unused in the current file content (e.g., patch/MagicMock/override_settings and TranscriptResponseSerializer/ListTranscriptsResponseSerializer). If they aren’t needed, remove them to avoid linter failures and keep the test module focused; otherwise, add the missing tests/usages that require them.

Suggested change
- from unittest.mock import patch, MagicMock
- from django.test import TestCase, override_settings
- from rest_framework.test import APIClient
- from rest_framework import status
- from transcribe_app.serializers import (
-     TranscribeInputSerializer,
-     TranscribeResponseSerializer,
-     TranscriptResponseSerializer,
-     ListTranscriptsResponseSerializer,
+ from django.test import TestCase
+ from rest_framework.test import APIClient
+ from rest_framework import status
+ from transcribe_app.serializers import (
+     TranscribeInputSerializer,
+     TranscribeResponseSerializer,

import time
import json
import base64
import numpy as np

Copilot AI Apr 21, 2026

These tests introduce a runtime dependency on NumPy just to generate silent audio bytes. If NumPy isn’t already a required dependency for the Django test environment, this can cause CI failures or unnecessarily bloat the test stack. Consider generating the same silent PCM bytes using the standard library (e.g., b'\x00\x00' * 16000 for int16 silence) to avoid requiring NumPy.

Suggested change
- import numpy as np

Comment on lines +338 to +339
# Generate minimal valid audio (1 second of silence as int16)
audio_data = np.zeros(16000, dtype=np.int16).tobytes()

Copilot AI Apr 21, 2026

These tests introduce a runtime dependency on NumPy just to generate silent audio bytes. If NumPy isn’t already a required dependency for the Django test environment, this can cause CI failures or unnecessarily bloat the test stack. Consider generating the same silent PCM bytes using the standard library (e.g., b'\x00\x00' * 16000 for int16 silence) to avoid requiring NumPy.
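
A minimal sketch of the standard-library alternative, for illustration only. It builds the same byte string that `np.zeros(16000, dtype=np.int16).tobytes()` produces (every byte is zero, so byte order does not matter). The helper name `silent_pcm16` and the base64 step are assumptions; the PR's actual payload field names are not shown in this thread.

```python
import base64

SAMPLE_RATE = 16000  # samples per second, matching the 16000-sample buffer in the test


def silent_pcm16(seconds: float = 1.0, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Return `seconds` of int16 PCM silence (two zero bytes per sample)."""
    sample_count = int(seconds * sample_rate)
    return b"\x00\x00" * sample_count


# One second of silence, equivalent to the NumPy-generated buffer.
audio_data = silent_pcm16()

# Assumption: the test sends the audio base64-encoded inside a JSON payload.
audio_b64 = base64.b64encode(audio_data).decode("ascii")
```

Because the helper depends only on `bytes` arithmetic, the test module can drop `import numpy as np` entirely if nothing else in it uses NumPy.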

Comment on lines +348 to +352
response = self.client.post(
    '/api/transcribe',
    data=json.dumps(payload),
    content_type='application/json',
)

Copilot AI Apr 21, 2026

The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.

chunk_late: {'transcript': 'Keep me.', 'translated': False},
}

response = self.client.get('/api/pop_first_transcript', {'tenant_id': tenant})

Copilot AI Apr 21, 2026

The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.

chunk_late: {'transcript': 'Pop me.', 'translated': False},
}

response = self.client.get('/api/pop_latest_transcript', {'tenant_id': tenant})

Copilot AI Apr 21, 2026

The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.


def test_non_ascii_only_is_invalid(self):
    """A transcript with only non-ASCII or space characters should be invalid."""
    self.assertFalse(is_valid(" "))  # only spaces (code > 32 check fails)

Copilot AI Apr 21, 2026

The docstring says 'only non-ASCII or space characters', but the test case only checks spaces (which are ASCII). Update the docstring to match what’s being tested, or add an additional assertion covering the non-ASCII-only scenario described.

Suggested change
- self.assertFalse(is_valid(" "))  # only spaces (code > 32 check fails)
+ self.assertFalse(is_valid(" "))  # only spaces (code > 32 check fails)
+ self.assertFalse(is_valid("你好"))  # only non-ASCII characters

Comment on lines +71 to +72
else:
    logger.warning("Whisper/Torch not installed. Transcription will not work without a whisper server.")

Copilot AI Apr 21, 2026

This warning executes at module import time, which can add noise to logs during test runs or in deployments where local transcription is intentionally disabled. Consider deferring the warning until the code path that actually requires local Whisper is exercised (or gating it behind an explicit configuration flag), so importing the module remains quiet unless transcription is requested.
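
One possible shape for deferring the warning, sketched under assumptions: the helper `local_whisper_available` and the flag `_missing_warning_emitted` are hypothetical names, not part of the PR. The import guard stays at module level, but the log line is emitted only the first time a caller actually asks for local transcription.

```python
import logging

logger = logging.getLogger("transcribe_demo")  # hypothetical logger name

try:
    import whisper  # noqa: F401  (heavy optional dependency)
    import torch  # noqa: F401
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False

# Tracks whether the "missing dependency" warning has already been logged.
_missing_warning_emitted = False


def local_whisper_available(has_whisper: bool = HAS_WHISPER) -> bool:
    """Report whether local Whisper can be used, warning once if it cannot.

    The `has_whisper` parameter exists only to make the behavior easy to test.
    """
    global _missing_warning_emitted
    if has_whisper:
        return True
    if not _missing_warning_emitted:
        logger.warning(
            "Whisper/Torch not installed. Transcription will not work "
            "without a whisper server."
        )
        _missing_warning_emitted = True
    return False
```

With this layout, importing the module produces no log output; only the code path that needs the local model would call `local_whisper_available()` and trigger the warning.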

Comment on lines +11 to +16
try:
    import whisper
    import torch
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False

Copilot AI Apr 21, 2026

The new conditional-import behavior (specifically the ImportError path where HAS_WHISPER becomes False) isn’t covered by tests in this PR. Add a test that verifies the module can be imported/reloaded when whisper/torch are unavailable and that the code selects the non-local-transcription path (e.g., via importlib.reload with patched imports).
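
A hedged sketch of how such a test could work. It does not touch the real `transcribe_utils`; instead it writes a throwaway module (`guarded_utils_demo`, a hypothetical name) containing only the guarded import, then imports and reloads it while `whisper` and `torch` are masked. Setting an entry in `sys.modules` to `None` makes the corresponding `import` statement raise `ImportError`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest.mock import patch

# Stand-in for the guarded import block at the top of transcribe_utils.py.
MODULE_SOURCE = textwrap.dedent(
    """
    try:
        import whisper
        import torch
        HAS_WHISPER = True
    except ImportError:
        HAS_WHISPER = False
    """
)


def has_whisper_when_ml_deps_missing() -> bool:
    """Import and reload the demo module with whisper/torch made unimportable."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "guarded_utils_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            # A None entry in sys.modules forces ImportError for that name.
            with patch.dict(sys.modules, {"whisper": None, "torch": None}):
                module = importlib.import_module("guarded_utils_demo")
                module = importlib.reload(module)
                return module.HAS_WHISPER
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("guarded_utils_demo", None)


result = has_whisper_when_ml_deps_missing()
```

In the project's suite the same `patch.dict` plus `importlib.reload` pattern would presumably target `transcribe_app.transcribe_utils` directly, followed by a second reload after the patch exits so later tests see the module in its normal state. Any other import-time side effects in that module would need to be accounted for.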

2 participants