Docs/fix dotenv and hardware detection #36
Pull request overview
This PR aims to improve local configurability/portability for the transcription server by formalizing dependencies (requirements + pyproject.toml) and updating the Flask transcription pipeline and clients.
Changes:
- Replaced the minimal `requirements.txt` with a fully specified dependency list and added a `pyproject.toml` for `uv`/packaging workflows.
- Updated `flask/transcribe_server.py` to handle JSON POSTs (and a streaming fallback), adjusted whisper-server request logic, and cleaned up transcript endpoints.
- Updated audio grabber client(s) UI and Python sender behavior and refreshed README setup instructions.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| requirements.txt | Replaces legacy dependency list with pinned/minimum versions and adds python-dotenv. |
| pyproject.toml | Introduces PEP 621 project metadata + dependencies for uv sync/packaging. |
| flask/transcribe_server.py | Refactors transcription request handling and transcript retrieval logic; adjusts server startup. |
| flask/transcribe_listener.html | Minor HTML formatting change. |
| flask/audio_grabber.py | Changes sending behavior and updates the transcribe endpoint port. |
| flask/audio_grabber.html | Adds backend selector + CSRF handling; UI restructure. |
| README.md | Updates setup instructions and documents pip vs uv workflows. |
```diff
 if __name__ == '__main__':
     threading.Thread(target=process_audio, daemon=True).start()
-    app.run(host='0.0.0.0', port=5040, debug=True)
+    app.run(host='0.0.0.0', port=5040, debug=False)
```
The Flask server is still hardcoded to run on port 5040, but the PR description says the default should be 5000 / configurable via FLASK_PORT, and audio_grabber.py now posts to 5000. This will prevent the client from reaching the server unless the ports are aligned. Make the port configurable (e.g., read FLASK_PORT) and ensure all clients/default UI values match the server default.
Suggested change:

```diff
-    app.run(host='0.0.0.0', port=5040, debug=False)
+    port_value = os.getenv('FLASK_PORT', '5000')
+    try:
+        flask_port = int(port_value)
+    except ValueError:
+        logger.warning("Invalid FLASK_PORT '%s'; falling back to 5000", port_value)
+        flask_port = 5000
+    app.run(host='0.0.0.0', port=flask_port, debug=False)
```
```diff
 # we either use a local in-code model or access a whisper.cpp server
 use_whisper_server = os.getenv('WHISPER_SERVER_USE', 'false') == 'true'
 #model_name = os.getenv('WHISPER_MODEL', 'tiny')     # 39M
 #model_name = os.getenv('WHISPER_MODEL', 'base')     # 74M
-model_fast_name = os.getenv('WHISPER_MODEL', 'small')   # 244M
-model_smart_name = os.getenv('WHISPER_MODEL', 'medium') # 769M
 #model_name = os.getenv('WHISPER_MODEL', 'large-v3') # 1550M
+model_fast_name = os.getenv('WHISPER_MODEL_FAST', 'small')    # 244M
+model_smart_name = os.getenv('WHISPER_MODEL_SMART', 'medium') # 769M
```
The PR description mentions loading a .env file (python-dotenv) and GPU auto-detection/logging, but transcribe_server.py does not call load_dotenv() and there is no device selection/logging (no torch.cuda checks). Either implement these startup behaviors or adjust the PR description/docs to match what the code actually does.
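A minimal sketch of the missing startup behavior, assuming `python-dotenv` from the new requirements.txt; the `torch` import is guarded so the sketch still runs on a machine without it (function and logger names here are illustrative, not from the PR):

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    from dotenv import load_dotenv
    load_dotenv()  # pick up FLASK_PORT, WHISPER_* etc. from a local .env, if present
except ImportError:
    logger.warning("python-dotenv not installed; relying on the process environment")

def pick_device():
    """Return 'cuda' when torch sees a GPU, else 'cpu', logging the choice."""
    try:
        import torch
        if torch.cuda.is_available():
            logger.info("GPU detected: %s", torch.cuda.get_device_name(0))
            return "cuda"
    except ImportError:
        logger.warning("torch not importable; defaulting to CPU")
    logger.info("no GPU detected; using CPU")
    return "cpu"

device = pick_device()
```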
```diff
 def clean_old_transcripts():
-    current_time = int(time.time() * 1000)  # Current time in milliseconds
-    two_hours_ago = current_time - (2 * 60 * 60 * 1000)  # Two hours ago in milliseconds
+    current_time = int(time.time() * 1000)
+    two_hours_ago = current_time - (2 * 60 * 60 * 1000)
     with threading.Lock():
         # make a list of tenant_ids to delete
         to_delete = []
         # iterate over all dictionaries in transcriptd
         for tenant_id in transcriptd.keys():
             transcripts = transcriptd[tenant_id]
             to_delete = [chunk_id for chunk_id in transcripts if int(chunk_id) < two_hours_ago]
             for chunk_id in to_delete:
                 del transcripts[chunk_id]
             # its possible that the tenant_id has no more transcripts
             if len(transcripts) == 0:
                 to_delete.append(tenant_id)

         # delete the tenant_ids
         for tenant_id in to_delete:
             del transcriptd[tenant_id]
```
clean_old_transcripts() has two issues: (1) it uses `with threading.Lock():`, which creates a brand-new lock instance on each call and therefore synchronizes nothing; and (2) `to_delete` is reused for chunk_ids and then later treated as tenant_ids, so `del transcriptd[tenant_id]` can attempt to delete non-existent keys (chunk IDs) and raise KeyError. Use separate collections (e.g., chunks_to_delete per tenant and tenants_to_delete overall) and delete tenants based on that second list while holding a shared lock.
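A possible rewrite along those lines; the module-level lock name `transcript_lock` is an assumption, not something the PR defines:

```python
import threading
import time

transcript_lock = threading.Lock()  # one shared lock for all transcriptd access
transcriptd = {}                    # tenant_id -> {chunk_id: {'transcript': ...}}

def clean_old_transcripts():
    cutoff = int(time.time() * 1000) - 2 * 60 * 60 * 1000  # two hours ago, in ms
    with transcript_lock:
        tenants_to_delete = []
        for tenant_id, transcripts in transcriptd.items():
            # chunk IDs are millisecond timestamps; collect expired ones per tenant
            chunks_to_delete = [c for c in transcripts if int(c) < cutoff]
            for chunk_id in chunks_to_delete:
                del transcripts[chunk_id]
            if not transcripts:
                tenants_to_delete.append(tenant_id)
        for tenant_id in tenants_to_delete:  # only tenant IDs, never chunk IDs
            del transcriptd[tenant_id]
```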
```python
def merge_and_split_transcripts(transcripts):
    # Iterate through the sorted transcript keys.
    sec = ".!?"
    merged_transcripts = ""
    result = {}
    for key in transcripts.keys():
        if not merged_transcripts:
            # If merged_transcripts is empty, start with the first transcript.
            merged_transcripts += transcripts[key].strip()
        else:
            # Append the transcript to the merged string with a space and lowercase the following first character.
            t = transcripts[key].strip()
            if len(t) > 1:
                merged_transcripts += " " + t[0].lower() + t[1:]
            else:
                merged_transcripts += " " + t
```
merge_and_split_transcripts() calls transcripts[key].strip(), but transcriptd stores entries as dicts like {'transcript': ...}. With the current storage format, this function will raise an AttributeError when sentences=true. Update it to read/write the nested transcript field (or change transcriptd to store plain strings consistently).
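One way to make the function match the storage format — reading the nested `'transcript'` field and returning the same value shape — sketched here with the sentence-splitting on `.!?` deliberately omitted:

```python
def merge_and_split_transcripts(transcripts):
    """Merge chunk entries shaped {'transcript': ...} into one running string.

    Returns the merged text keyed by the first chunk_id, preserving the
    nested {'transcript': ...} value shape the endpoints expect.
    (Re-splitting the merged text into sentences is omitted in this sketch.)
    """
    merged = ""
    for key in sorted(transcripts, key=int):
        t = transcripts[key]['transcript'].strip()  # read the nested field, not a str
        if not merged:
            merged = t
        elif len(t) > 1:
            merged += " " + t[0].lower() + t[1:]
        else:
            merged += " " + t
    if not transcripts:
        return {}
    first_key = min(transcripts, key=int)
    return {first_key: {'transcript': merged}}
```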
```python
# Fix: Try regular JSON first (sent by audio_grabber.py)
data = request.get_json(silent=True)
if data:
    audio_b64 = data.get('audio_b64')
    chunk_id = data.get('chunk_id')
    tenant_id = data.get('tenant_id', '0000')

    # Fix: validate required fields and return 400 if missing
    if not audio_b64 or not chunk_id:
        return jsonify({'error': 'Missing required fields: audio_b64 and chunk_id'}), 400
```
data = request.get_json(silent=True) followed by if data: treats an empty JSON object ({}) as "no JSON" and falls through to the streaming branch. Use an explicit is not None check (and then validate required fields) so malformed/empty JSON requests reliably return 400 instead of switching behavior based on truthiness.
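A framework-agnostic sketch of the distinction; `json_body` stands in for the return value of `request.get_json(silent=True)`, and the function name is illustrative:

```python
def route_transcribe_body(json_body):
    """Decide how to handle the request body.

    json_body is None when the body was not JSON at all, and a dict otherwise.
    An empty dict {} is falsy but still valid JSON, so compare against None.
    """
    if json_body is not None:          # explicit check, not `if json_body:`
        if not json_body.get('audio_b64') or not json_body.get('chunk_id'):
            return 'error', 400        # empty/malformed JSON reliably gets 400
        return 'json', 200
    return 'stream', None              # only non-JSON bodies hit the fallback
```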
```html
<button id="stopBtn" disabled>Stop</button>
<div id="controls">
    <input type="text" id="serverhost" placeholder="Transcribe Host" value="localhost">
    <input type="text" id="serverport" placeholder="Transcribe Port" value="5040">
```
The HTML client still defaults to port 5040, while audio_grabber.py now posts to 5000 and the PR description says the Flask default should be 5000. Align this default with the actual server port (or surface the configured FLASK_PORT) to avoid confusing connection failures for users.
Suggested change:

```diff
-<input type="text" id="serverport" placeholder="Transcribe Port" value="5040">
+<input type="text" id="serverport" placeholder="Transcribe Port" value="5000">
```
```python
logger.info(f"VALID transcript for chunk_id {chunk_id}: {transcript}")
with threading.Lock():  # Ensure thread-safe access to shared resources
    # we must distinguish between the case where the chunk_id is already in the transcripts
    # this can happen quite often because the client will generate a new chunk_id only when
    # the recorded audio has silence. So all chunks are those pieces with speech without a pause.

    # get the current transcripts for the tenant_id
    with threading.Lock():
        transcripts = transcriptd.get(tenant_id, None)
```
with threading.Lock(): creates a brand-new lock each time, so it does not actually synchronize access to transcriptd across threads. Use a single module-level lock (e.g., transcript_lock = threading.Lock()) and acquire that same lock for all reads/writes that need to be thread-safe (including in request handlers).
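A sketch of the shared-lock pattern the comment describes; the function names are illustrative placeholders, not code from the PR:

```python
import threading

transcript_lock = threading.Lock()  # created once at module import time
transcriptd = {}

def store_transcript(tenant_id, chunk_id, transcript):
    with transcript_lock:  # every writer contends on this one lock object
        transcriptd.setdefault(tenant_id, {})[chunk_id] = {'transcript': transcript}

def get_tenant_transcripts(tenant_id):
    with transcript_lock:  # readers take the SAME lock, so they see consistent state
        return dict(transcriptd.get(tenant_id, {}))
```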
```python
sentences = request.args.get('sentences', default='false') == 'true'
if sentences: t = merge_and_split_transcripts(t)
chunk_id = request.args.get('chunk_id')
if chunk_id in t:
    return jsonify({'chunk_id': chunk_id, 'transcript': t[chunk_id]['transcript']})
else:
```
When sentences=true, t is replaced with the return value of merge_and_split_transcripts(t), but the handler still assumes t[chunk_id] is a dict containing ['transcript']. This will break once sentence merging is enabled unless the merge function preserves the same value shape. Ensure the "sentences" mode returns the same structure expected by these handlers (this affects several endpoints, not just /get_transcript).
```diff
 if use_whisper_server:
     # Fix: properly call whisper.cpp /inference endpoint and parse response
     try:
         files = {'file': ('audio.wav', audio_array.tobytes(), 'audio/wav')}
         data = {'response_format': 'json'}
-        response = requests.post(whisper_server, files=files, data=data)
+        response = requests.post(f"{whisper_server}/inference", files=files, data=data)
         response.raise_for_status()
         result_json = response.json()
         transcript = result_json.get('text', '').strip()
 else:
     result = model_fast.transcribe(audio_tensor, temperature=0)
```
In whisper-server mode, this uploads audio_array.tobytes() after audio_array has been converted to normalized float32 samples, but the multipart part is labeled audio/wav and named audio.wav. This is not a valid WAV file (no header, wrong sample format), so whisper.cpp is unlikely to decode it correctly. Send a real WAV payload (e.g., wrap the original int16 PCM at 16kHz into a WAV container) or use the server's expected raw-audio format if supported.
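One way to build a valid payload with only the standard library, assuming the original capture is 16-bit mono PCM at 16 kHz (the function name is a suggestion, not existing code):

```python
import io
import struct
import wave

def pcm16_to_wav_bytes(samples, sample_rate=16000):
    """Wrap raw int16 PCM samples (mono) in a WAV container so whisper.cpp
    receives a file it can actually decode, header included."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)       # 2 bytes per sample = int16
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack('<%dh' % len(samples), *samples))
    return buf.getvalue()

# usage sketch, before any float32 normalization of the samples:
# files = {'file': ('audio.wav', pcm16_to_wav_bytes(int16_samples), 'audio/wav')}
```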
```python
def start(self):
    self.stream.start_stream()
    self.send_thread = threading.Thread(target=self.send_chunk)
    self.send_thread.start()
```
start() spawns a thread that calls send_chunk() exactly once, while send_chunk() is also called synchronously from audio_callback(). This extra thread is unnecessary and increases the risk of concurrent access to self.buffer/self.chunk_id without any locking. Either remove the thread and keep sending in the callback, or move all network sending into a dedicated loop/thread and protect shared state appropriately.
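The second option could look like this sketch, where `post_fn` stands in for the actual `requests.post` call and the class name is hypothetical:

```python
import queue
import threading

class ChunkSender:
    """A single dedicated sender thread consuming a queue, so the audio
    callback only enqueues and never touches the network or shared buffers."""

    def __init__(self, post_fn):
        self.post_fn = post_fn  # e.g. lambda chunk: requests.post(url, json=chunk)
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def submit(self, chunk):
        """Called from audio_callback; Queue provides the locking."""
        self._q.put(chunk)

    def stop(self):
        self._q.put(None)  # sentinel ends the loop after pending chunks drain
        self._thread.join()

    def _run(self):
        while True:
            chunk = self._q.get()
            if chunk is None:
                break
            self.post_fn(chunk)
```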
Fixes #19
Changes made:
Tested locally: chunks received, FP32 fallback working on CPU, POST /transcribe returning 200.