transcribe_app by CHINMAYK121 · Pull Request #25 · susiai/susi_translator

CHINMAYK121 · 2026-03-20T12:47:10Z

What does this PR do?

This PR addresses the zero test coverage in transcribe_app/tests.py by adding 57 comprehensive test cases across all core modules. It also makes whisper/torch imports conditional so the app can load in server-mode and tests can run without the heavy ML dependencies (~2GB).

Changes

1. django/transcribe_app/tests.py — Comprehensive Test Suite

Category	# Tests	What's Covered
is_valid() validation	16	Valid sentences, empty/None input, forbidden phrases (`thank you`, `bye!`, `click click`, etc.), forbidden exact strings (`eh.`, `bye.`), Korean hallucination chars, long words (>40 chars), case-insensitivity
clean_old_transcripts()	4	Removal of transcripts >2h old, recent transcripts preserved, mixed old/new, empty tenant cleanup
add_to_audio_stack() / get_transcripts()	5	Queue creation per tenant, multi-chunk same tenant, multi-tenant isolation, unknown/known tenant retrieval
Serializers	7	TranscribeInputSerializer (valid all fields, required only, missing audio, missing chunk_id, empty), TranscribeResponseSerializer, SizeResponseSerializer
REST API Integration	25	All 9 endpoints: `POST /api/transcribe`, `GET /api/get_transcript`, `GET /api/get_first_transcript`, `GET /api/pop_first_transcript`, `GET /api/get_latest_transcript`, `GET /api/pop_latest_transcript`, `GET /api/delete_transcript`, `GET /api/list_transcripts`, `GET /api/transcripts_size`

2. django/transcribe_app/transcribe_utils.py — Conditional Imports

# Before
import whisper
import torch

# After
try:
    import whisper
    import torch
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False
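The flag can then gate local-model initialization, so the module still imports cleanly in server mode. The sketch below is a minimal illustration of that idea. It assumes the openai-whisper package's `whisper.load_model(name, device=...)` entry point. The helper name `load_local_model` and the default model size `"base"` are hypothetical and are not taken from the PR.

```python
try:
    import whisper
    import torch
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False


def load_local_model(model_name: str = "base"):
    """Hypothetical helper: load a local Whisper model only when the ML deps exist.

    Returns None in server mode (whisper/torch not installed), so callers can
    fall back to a remote whisper server instead of crashing at import time.
    """
    if not HAS_WHISPER:
        return None
    # Prefer the GPU when torch reports one; otherwise run on the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return whisper.load_model(model_name, device=device)
```

Callers would check for `None` before attempting local transcription.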

Add comprehensive test suite for transcribe_app (57 tests)

- Add 57 test cases covering all core functionality:
  - is_valid() transcript validation (16 tests)
  - clean_old_transcripts() cleanup logic (4 tests)
  - add_to_audio_stack()/get_transcripts() storage (5 tests)
  - Serializer validation (7 tests)
  - REST API integration for all 9 endpoints (25 tests)
- Make whisper/torch imports conditional via a HAS_WHISPER guard so the app can load and tests can run without heavy ML packages
- All tests pass in ~1.4 seconds

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a broad Django test suite for transcribe_app and updates transcribe_utils to conditionally import heavy ML dependencies so the app/tests can run without whisper/torch.

Changes:

Added extensive unit + integration tests covering validators, transcript storage utilities, serializers, and REST endpoints.
Made whisper/torch optional via try/except and gated local-model initialization behind HAS_WHISPER.
Added a warning path when ML deps are absent and no local model can be loaded.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.

File	Description
django/transcribe_app/transcribe_utils.py	Adds conditional imports and guards local Whisper model loading when ML deps are not installed.
django/transcribe_app/tests.py	Introduces a large unit/integration test suite for core utilities, serializers, and API endpoints.


Copilot · 2026-04-21T22:10:10Z

+import numpy as np
+from unittest.mock import patch, MagicMock
+
+from django.test import TestCase, override_settings


Several imports added here appear unused in the current file content (e.g., patch/MagicMock/override_settings and TranscriptResponseSerializer/ListTranscriptsResponseSerializer). If they aren’t needed, remove them to avoid linter failures and keep the test module focused; otherwise, add the missing tests/usages that require them.

Suggested change

- from django.test import TestCase, override_settings
+ from django.test import TestCase
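The check described in this comment can be roughly illustrated with a small pass over the standard-library `ast` module. The pass lists names that an import binds but that the module never reads. This is a simplified stand-in for what pyflakes/flake8 report as F401 ("imported but unused"). It ignores `__all__`, string annotations, and deliberate re-exports. `SAMPLE` is a hypothetical test module modeled on the imports quoted in the review.

```python
import ast


def unused_imports(source: str) -> set:
    """Return names bound by import statements that are never read in `source`."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b` binds `a`; `import a.b as c` binds `c`.
            for alias in node.names:
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Names that are read anywhere; `status.HTTP_200_OK` reads `status`.
    loaded = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return imported - loaded


SAMPLE = '''
from unittest.mock import patch, MagicMock
from django.test import TestCase, override_settings
from rest_framework.test import APIClient
from rest_framework import status
from transcribe_app.serializers import (
    TranscribeInputSerializer,
    TranscribeResponseSerializer,
    TranscriptResponseSerializer,
    ListTranscriptsResponseSerializer,
)

class SerializerTests(TestCase):
    def test_input(self):
        client = APIClient()
        serializer = TranscribeInputSerializer(data={})
        response = TranscribeResponseSerializer(data={})
        self.assertEqual(status.HTTP_200_OK, 200)
'''

print(sorted(unused_imports(SAMPLE)))
```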

Copilot · 2026-04-21T22:10:10Z

+from unittest.mock import patch, MagicMock
+
+from django.test import TestCase, override_settings
+from rest_framework.test import APIClient
+from rest_framework import status
+
+from transcribe_app.serializers import (
+    TranscribeInputSerializer,
+    TranscribeResponseSerializer,
+    TranscriptResponseSerializer,
+    ListTranscriptsResponseSerializer,


Several imports added here appear unused in the current file content (e.g., patch/MagicMock/override_settings and TranscriptResponseSerializer/ListTranscriptsResponseSerializer). If they aren’t needed, remove them to avoid linter failures and keep the test module focused; otherwise, add the missing tests/usages that require them.

Suggested change

- from unittest.mock import patch, MagicMock
- from django.test import TestCase, override_settings
- from rest_framework.test import APIClient
- from rest_framework import status
- from transcribe_app.serializers import (
-     TranscribeInputSerializer,
-     TranscribeResponseSerializer,
-     TranscriptResponseSerializer,
-     ListTranscriptsResponseSerializer,
+ from django.test import TestCase
+ from rest_framework.test import APIClient
+ from rest_framework import status
+ from transcribe_app.serializers import (
+     TranscribeInputSerializer,
+     TranscribeResponseSerializer,

Copilot · 2026-04-21T22:10:10Z

+import time
+import json
+import base64
+import numpy as np


These tests introduce a runtime dependency on NumPy just to generate silent audio bytes. If NumPy isn't already a required dependency for the Django test environment, this can cause CI failures or unnecessarily bloat the test stack. Consider generating the same silent PCM bytes with the standard library (e.g., b'\x00\x00' * 16000 for int16 silence) to avoid requiring NumPy.

Suggested change

- import numpy as np
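The standard-library replacement suggested here can be sketched as follows. Both variants produce the same 32,000 bytes as `np.zeros(16000, dtype=np.int16).tobytes()`. Every int16 zero is two zero bytes, so byte order does not matter for silence. The `array` variant assumes the usual 2-byte C `short`, which holds on mainstream platforms.

```python
import array

SAMPLES = 16000  # 1 second at 16 kHz, matching np.zeros(16000, dtype=np.int16)

# Option 1: int16 silence is simply two zero bytes per sample.
silence = b"\x00\x00" * SAMPLES

# Option 2: array('h') stores signed 16-bit samples in native byte order,
# like ndarray.tobytes() does for dtype=np.int16; useful for non-silent data too.
silence_from_array = array.array("h", [0] * SAMPLES).tobytes()

print(len(silence), silence == silence_from_array)
```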

Copilot · 2026-04-21T22:10:11Z

+        # Generate minimal valid audio (1 second of silence as int16)
+        audio_data = np.zeros(16000, dtype=np.int16).tobytes()


These tests introduce a runtime dependency on NumPy just to generate silent audio bytes. If NumPy isn't already a required dependency for the Django test environment, this can cause CI failures or unnecessarily bloat the test stack. Consider generating the same silent PCM bytes with the standard library (e.g., b'\x00\x00' * 16000 for int16 silence) to avoid requiring NumPy.
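A NumPy-free payload for the `/api/transcribe` test might be built as below. This page does not show the request format, so the sketch makes an assumption: the endpoint takes base64-encoded audio in an `audio` field, together with `chunk_id` and `tenant_id`. The test module does import `base64`, and the serializer tests mention required `audio` and `chunk_id` fields. Still, the exact encoding and any optional fields should be checked against `TranscribeInputSerializer`. The helper name `build_transcribe_payload` is hypothetical.

```python
import base64
import json


def build_transcribe_payload(chunk_id: str, tenant_id: str, seconds: int = 1) -> str:
    """Hypothetical helper: JSON body carrying `seconds` of 16 kHz int16 silence."""
    pcm = b"\x00\x00" * (16000 * seconds)  # replaces np.zeros(..., dtype=np.int16).tobytes()
    payload = {
        "audio": base64.b64encode(pcm).decode("ascii"),  # assumed base64 text field
        "chunk_id": chunk_id,
        "tenant_id": tenant_id,
    }
    return json.dumps(payload)


body = build_transcribe_payload("chunk-1", "tenant-a")
# Inside the Django test case this would be sent as:
# self.client.post('/api/transcribe', data=body, content_type='application/json')
```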

Copilot · 2026-04-21T22:10:11Z

+        response = self.client.post(
+            '/api/transcribe',
+            data=json.dumps(payload),
+            content_type='application/json',
+        )


The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.

Copilot · 2026-04-21T22:10:12Z

+            chunk_late: {'transcript': 'Keep me.', 'translated': False},
+        }
+
+        response = self.client.get('/api/pop_first_transcript', {'tenant_id': tenant})


The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.

Copilot · 2026-04-21T22:10:13Z

+            chunk_late: {'transcript': 'Pop me.', 'translated': False},
+        }
+
+        response = self.client.get('/api/pop_latest_transcript', {'tenant_id': tenant})


The integration tests hard-code URL paths as strings. This makes the suite brittle if routing changes (e.g., trailing slash settings, URL namespace changes, or DRF router adjustments). Prefer resolving URLs via Django’s reverse() (or DRF’s route names) so the tests track the URLconf instead of duplicating it.

Copilot · 2026-04-21T22:10:13Z

+
+    def test_non_ascii_only_is_invalid(self):
+        """A transcript with only non-ASCII or space characters should be invalid."""
+        self.assertFalse(is_valid("   "))  # only spaces (code > 32 check fails)


The docstring says 'only non-ASCII or space characters', but the test case only checks spaces (which are ASCII). Update the docstring to match what’s being tested, or add an additional assertion covering the non-ASCII-only scenario described.

Suggested change

- self.assertFalse(is_valid("   "))  # only spaces (code > 32 check fails)
+ self.assertFalse(is_valid("   "))  # only spaces (code > 32 check fails)
+ self.assertFalse(is_valid("你好"))  # only non-ASCII characters
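Before adopting the suggested non-ASCII assertion, it may be worth checking it against the rule the inline comment cites. This page does not show the body of `is_valid()`. As a hypothetical stand-in, assume the rule is roughly "keep the transcript only if some character has a code point above 32". Whitespace-only input fails that rule. CJK text such as "你好" passes it, so `is_valid("你好")` would be False only if some other rule in `is_valid()` rejects it.

```python
def has_visible_char(text: str) -> bool:
    """Hypothetical stand-in for the 'code > 32' check: any character above ASCII space."""
    return any(ord(ch) > 32 for ch in text)


print(has_visible_char("   "))   # spaces are U+0020 (32), so this prints False
print(has_visible_char("你好"))  # U+4F60 and U+597D are far above 32, so this prints True
```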

Copilot · 2026-04-21T22:10:13Z

+else:
+    logger.warning("Whisper/Torch not installed. Transcription will not work without a whisper server.")


This warning executes at module import time, which can add noise to logs during test runs or in deployments where local transcription is intentionally disabled. Consider deferring the warning until the code path that actually requires local Whisper is exercised (or gating it behind an explicit configuration flag), so importing the module remains quiet unless transcription is requested.
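One way to defer the warning is to emit it the first time local transcription is actually requested, rather than at import time. The sketch below does this, and it warns only once per process. The logger name, the `ensure_local_whisper` helper, and the module-level `HAS_WHISPER = False` stand-in are hypothetical. In the PR, the flag comes from the try/except import.

```python
import logging

logger = logging.getLogger("transcribe_utils_sketch")  # hypothetical logger name

HAS_WHISPER = False  # stand-in for the flag set by the try/except import

_missing_deps_warned = False


def ensure_local_whisper() -> bool:
    """Return True if local Whisper can be used; otherwise warn once and return False.

    The warning fires on the first request for local transcription,
    so merely importing the module stays quiet.
    """
    global _missing_deps_warned
    if HAS_WHISPER:
        return True
    if not _missing_deps_warned:
        logger.warning(
            "Whisper/Torch not installed. Transcription will not work without a whisper server."
        )
        _missing_deps_warned = True
    return False
```

A transcription function would call `ensure_local_whisper()` and route to the whisper server when it returns False.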

Copilot · 2026-04-21T22:10:14Z

+try:
+    import whisper
+    import torch
+    HAS_WHISPER = True
+except ImportError:
+    HAS_WHISPER = False


The new conditional-import behavior (specifically the ImportError path where HAS_WHISPER becomes False) isn’t covered by tests in this PR. Add a test that verifies the module can be imported/reloaded when whisper/torch are unavailable and that the code selects the non-local-transcription path (e.g., via importlib.reload with patched imports).
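Here is a sketch of the requested test, with some assumptions spelled out. To stay self-contained, it writes a throwaway module (`fake_transcribe_utils`, a hypothetical name) that contains the same try/except block. It then reloads that module with `whisper` and `torch` mapped to `None` in `sys.modules`, which makes importing them raise ImportError. A second case maps both names to empty stub modules to cover the `True` branch. In the project, the same pattern would target `transcribe_app.transcribe_utils`. Reloading that module re-runs its import-time side effects, such as model loading or logging.

```python
import importlib
import sys
import tempfile
import textwrap
import types
import unittest
from pathlib import Path
from unittest import mock

MODULE_SOURCE = textwrap.dedent("""
    try:
        import whisper
        import torch
        HAS_WHISPER = True
    except ImportError:
        HAS_WHISPER = False
""")


class ConditionalImportTest(unittest.TestCase):
    def setUp(self):
        # Write the stand-in module to a temp dir and import it from there.
        self.tmp = tempfile.TemporaryDirectory()
        Path(self.tmp.name, "fake_transcribe_utils.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, self.tmp.name)
        importlib.invalidate_caches()
        self.module = importlib.import_module("fake_transcribe_utils")

    def tearDown(self):
        sys.path.remove(self.tmp.name)
        sys.modules.pop("fake_transcribe_utils", None)
        self.tmp.cleanup()

    def test_missing_ml_deps_sets_flag_false(self):
        # A None entry in sys.modules makes `import whisper` raise ImportError.
        with mock.patch.dict(sys.modules, {"whisper": None, "torch": None}):
            reloaded = importlib.reload(self.module)
        self.assertFalse(reloaded.HAS_WHISPER)

    def test_present_ml_deps_sets_flag_true(self):
        stubs = {"whisper": types.ModuleType("whisper"), "torch": types.ModuleType("torch")}
        with mock.patch.dict(sys.modules, stubs):
            reloaded = importlib.reload(self.module)
        self.assertTrue(reloaded.HAS_WHISPER)
```

This runs under `python -m unittest`. In the Django suite, the class would typically subclass `django.test.SimpleTestCase` instead.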

CHINMAYK121 added 2 commits March 20, 2026 18:10

Merge pull request #1 from CHINMAYK121/add-comprehensive-tests

a96b2eb

Add comprehensive test suite for transcribe_app (57 tests)

mariobehling requested a review from Copilot April 21, 2026 22:02

Copilot started reviewing on behalf of mariobehling April 21, 2026 22:07

Copilot AI reviewed Apr 21, 2026

CHINMAYK121 wants to merge 2 commits into susiai:master from CHINMAYK121:master
