
Docs/fix dotenv and hardware detection#36

Open
Tanya0605 wants to merge 11 commits into susiai:master from Tanya0605:docs/fix-dotenv-and-hardware-detection

Conversation

@Tanya0605
Contributor

Fixes #19

Changes made:

  • transcribe_server.py: added python-dotenv loading (load_dotenv() called at startup, falls back to system env vars if no .env present)
  • transcribe_server.py: added GPU auto-detection — uses CUDA if available, falls back to CPU automatically, logs which device is being used on startup
  • transcribe_server.py: FLASK_PORT now configurable via .env (defaults to 5000)
  • audio_grabber.py: fixed hardcoded port 5040 → 5000 (was causing Connection aborted errors; chunks never reached Flask)
  • Added .env.example with all supported config variables and comments
  • .env added to .gitignore to prevent accidental secret commits

Tested locally: chunks received, FP32 fallback working on CPU, POST /transcribe returning 200.


Copilot AI left a comment


Pull request overview

This PR aims to improve local configurability/portability for the transcription server by formalizing dependencies (requirements + pyproject.toml) and updating the Flask transcription pipeline and clients.

Changes:

  • Replaced the minimal requirements.txt with a fully specified dependency list and added a pyproject.toml for uv/packaging workflows.
  • Updated flask/transcribe_server.py to handle JSON POSTs (and a streaming fallback), adjusted whisper-server request logic, and cleaned up transcript endpoints.
  • Updated audio grabber client(s) UI and Python sender behavior and refreshed README setup instructions.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 11 comments.

Summary per file:

  • requirements.txt: Replaces the legacy dependency list with pinned/minimum versions and adds python-dotenv.
  • pyproject.toml: Introduces PEP 621 project metadata and dependencies for uv sync/packaging.
  • flask/transcribe_server.py: Refactors transcription request handling and transcript retrieval logic; adjusts server startup.
  • flask/transcribe_listener.html: Minor HTML formatting change.
  • flask/audio_grabber.py: Changes sending behavior and updates the transcribe endpoint port.
  • flask/audio_grabber.html: Adds backend selector and CSRF handling; UI restructure.
  • README.md: Updates setup instructions and documents pip vs uv workflows.


app.run(host='0.0.0.0', port=5040, debug=True)
if __name__ == '__main__':
    threading.Thread(target=process_audio, daemon=True).start()
    app.run(host='0.0.0.0', port=5040, debug=False)

Copilot AI Apr 21, 2026


The Flask server is still hardcoded to run on port 5040, but the PR description says the default should be 5000 / configurable via FLASK_PORT, and audio_grabber.py now posts to 5000. This will prevent the client from reaching the server unless the ports are aligned. Make the port configurable (e.g., read FLASK_PORT) and ensure all clients/default UI values match the server default.

Suggested change
- app.run(host='0.0.0.0', port=5040, debug=False)
+ port_value = os.getenv('FLASK_PORT', '5000')
+ try:
+     flask_port = int(port_value)
+ except ValueError:
+     logger.warning("Invalid FLASK_PORT '%s'; falling back to 5000", port_value)
+     flask_port = 5000
+ app.run(host='0.0.0.0', port=flask_port, debug=False)

Comment on lines 24 to 28
# we either use a local in-code model or access a whisper.cpp server
use_whisper_server = os.getenv('WHISPER_SERVER_USE', 'false') == 'true'
#model_name = os.getenv('WHISPER_MODEL', 'tiny') # 39M
#model_name = os.getenv('WHISPER_MODEL', 'base') # 74M
- model_fast_name = os.getenv('WHISPER_MODEL', 'small') # 244M
- model_smart_name = os.getenv('WHISPER_MODEL', 'medium') # 769M
#model_name = os.getenv('WHISPER_MODEL', 'large-v3') # 1550M
+ model_fast_name = os.getenv('WHISPER_MODEL_FAST', 'small') # 244M
+ model_smart_name = os.getenv('WHISPER_MODEL_SMART', 'medium') # 769M

Copy link

Copilot AI Apr 21, 2026


The PR description mentions loading a .env file (python-dotenv) and GPU auto-detection/logging, but transcribe_server.py does not call load_dotenv() and there is no device selection/logging (no torch.cuda checks). Either implement these startup behaviors or adjust the PR description/docs to match what the code actually does.
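The missing startup behavior could look like the following sketch; the `logger` name mirrors the server's existing logging, and the guarded imports are an assumption so the snippet also runs where python-dotenv or torch is absent:

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("transcribe_server")

try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env if present; system env vars still apply
except ImportError:
    logger.warning("python-dotenv not installed; using system env vars only")

try:
    import torch
    # auto-detect GPU: prefer CUDA, fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # no torch in this environment

logger.info("Using device: %s", device)
```

The chosen device string can then be passed to the model load (e.g. `whisper.load_model(name, device=device)`).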

Comment on lines 148 to 161
def clean_old_transcripts():
-     current_time = int(time.time() * 1000) # Current time in milliseconds
-     two_hours_ago = current_time - (2 * 60 * 60 * 1000) # Two hours ago in milliseconds
+     current_time = int(time.time() * 1000)
+     two_hours_ago = current_time - (2 * 60 * 60 * 1000)
      with threading.Lock():
          # make a list of tenant_ids to delete
          to_delete = []
          # iterate over all dictionaries in transcriptd
          for tenant_id in transcriptd.keys():
              transcripts = transcriptd[tenant_id]
              to_delete = [chunk_id for chunk_id in transcripts if int(chunk_id) < two_hours_ago]
              for chunk_id in to_delete:
                  del transcripts[chunk_id]
              # its possible that the tenant_id has no more transcripts
              if len(transcripts) == 0:
                  to_delete.append(tenant_id)

          # delete the tenant_ids
          for tenant_id in to_delete:
              del transcriptd[tenant_id]

Copilot AI Apr 21, 2026


clean_old_transcripts() has two issues: (1) it also uses with threading.Lock(): which is ineffective because it creates a new lock instance; and (2) to_delete is reused for chunk_ids and then later treated as tenant_ids, so del transcriptd[tenant_id] can attempt to delete non-existent keys (chunk IDs) and raise KeyError. Use separate collections (e.g., chunks_to_delete per tenant and tenants_to_delete overall) and delete tenants based on that second list while holding a shared lock.
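A sketch of the suggested rework, using separate collections and one shared lock (`transcript_lock` is a hypothetical name; in the real module it should be the same lock used by the request handlers):

```python
import threading
import time

transcript_lock = threading.Lock()  # single shared lock, created once at module level
transcriptd = {}  # tenant_id -> {chunk_id: {'transcript': ...}}

def clean_old_transcripts():
    two_hours_ago = int(time.time() * 1000) - 2 * 60 * 60 * 1000
    with transcript_lock:
        tenants_to_delete = []
        for tenant_id, transcripts in transcriptd.items():
            # collect stale chunk ids per tenant, then delete them
            chunks_to_delete = [c for c in transcripts if int(c) < two_hours_ago]
            for chunk_id in chunks_to_delete:
                del transcripts[chunk_id]
            # track emptied tenants in a *separate* list
            if not transcripts:
                tenants_to_delete.append(tenant_id)
        for tenant_id in tenants_to_delete:
            del transcriptd[tenant_id]
```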

Comment on lines 164 to 176
def merge_and_split_transcripts(transcripts):
    # Iterate through the sorted transcript keys.
    sec = ".!?"
    merged_transcripts = ""
    result = {}
    for key in transcripts.keys():
        if not merged_transcripts:
            # If merged_transcripts is empty, start with the first transcript.
            merged_transcripts += transcripts[key].strip()
        else:
            # Append the transcript to the merged string with a space and lowercase the following first character.
            t = transcripts[key].strip()
            if len(t) > 1:
                merged_transcripts += " " + t[0].lower() + t[1:]
            else:
                merged_transcripts += " " + t

Copilot AI Apr 21, 2026


merge_and_split_transcripts() calls transcripts[key].strip(), but transcriptd stores entries as dicts like {'transcript': ...}. With the current storage format, this function will raise an AttributeError when sentences=true. Update it to read/write the nested transcript field (or change transcriptd to store plain strings consistently).
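One possible shape-preserving rework, under a few stated assumptions: chunk keys are numeric strings, merged sentences receive synthetic index keys (the original chunk ids are lost), and the naive regex split stands in for whatever sentence logic the project intends:

```python
import re

def merge_and_split_transcripts(transcripts):
    # Read the nested 'transcript' field that transcriptd actually stores,
    # and return entries in the same {'transcript': ...} shape the endpoints expect.
    merged = ""
    for key in sorted(transcripts.keys(), key=int):
        t = transcripts[key]["transcript"].strip()
        if not merged:
            merged = t
        elif len(t) > 1:
            # join with a space and lowercase the first character
            merged += " " + t[0].lower() + t[1:]
        else:
            merged += " " + t
    # naive split at sentence-ending punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", merged) if s]
    return {str(i): {"transcript": s} for i, s in enumerate(sentences)}
```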

Comment on lines +238 to +247
# Fix: Try regular JSON first (sent by audio_grabber.py)
data = request.get_json(silent=True)
if data:
    audio_b64 = data.get('audio_b64')
    chunk_id = data.get('chunk_id')
    tenant_id = data.get('tenant_id', '0000')

    # Fix: validate required fields and return 400 if missing
    if not audio_b64 or not chunk_id:
        return jsonify({'error': 'Missing required fields: audio_b64 and chunk_id'}), 400

Copilot AI Apr 21, 2026


data = request.get_json(silent=True) followed by if data: treats an empty JSON object ({}) as "no JSON" and falls through to the streaming branch. Use an explicit is not None check (and then validate required fields) so malformed/empty JSON requests reliably return 400 instead of switching behavior based on truthiness.
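The truthiness pitfall can be modeled without Flask; `choose_branch` below is a hypothetical stand-in for the handler's branching, taking the value that `request.get_json(silent=True)` would return:

```python
def choose_branch(data):
    """data is what request.get_json(silent=True) returns:
    None for a non-JSON body, but possibly an empty dict {}."""
    if data is not None:
        # {} is valid JSON: validate fields and reject with 400, don't fall through
        if not data.get("audio_b64") or not data.get("chunk_id"):
            return 400
        return 200
    return "streaming"  # only genuinely non-JSON bodies reach the fallback
```

With `if data:` instead of `if data is not None:`, an empty JSON object would silently take the streaming branch.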

Comment thread flask/audio_grabber.html
<button id="stopBtn" disabled>Stop</button>
<div id="controls">
<input type="text" id="serverhost" placeholder="Transcribe Host" value="localhost">
<input type="text" id="serverport" placeholder="Transcribe Port" value="5040">

Copilot AI Apr 21, 2026


The HTML client still defaults to port 5040, while audio_grabber.py now posts to 5000 and the PR description says the Flask default should be 5000. Align this default with the actual server port (or surface the configured FLASK_PORT) to avoid confusing connection failures for users.

Suggested change
- <input type="text" id="serverport" placeholder="Transcribe Port" value="5040">
+ <input type="text" id="serverport" placeholder="Transcribe Port" value="5000">

Comment on lines 115 to 117
logger.info(f"VALID transcript for chunk_id {chunk_id}: {transcript}")
- with threading.Lock(): # Ensure thread-safe access to shared resources
  # we must distinguish between the case where the chunk_id is already in the transcripts
  # this can happen quite often because the client will generate a new chunk_id only when
  # the recorded audio has silence. So all chunks are those pieces with speech without a pause.

  # get the current transcripts for the tenant_id
+ with threading.Lock():
      transcripts = transcriptd.get(tenant_id, None)

Copilot AI Apr 21, 2026


with threading.Lock(): creates a brand-new lock each time, so it does not actually synchronize access to transcriptd across threads. Use a single module-level lock (e.g., transcript_lock = threading.Lock()) and acquire that same lock for all reads/writes that need to be thread-safe (including in request handlers).
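A minimal sketch of the shared-lock pattern; `transcript_lock` and `store_transcript` are hypothetical names for illustration:

```python
import threading

# One module-level lock shared by every reader/writer of transcriptd.
transcript_lock = threading.Lock()
transcriptd = {}

def store_transcript(tenant_id, chunk_id, transcript):
    # Acquires the *same* lock object every time, unlike an inline
    # `with threading.Lock():`, which creates a fresh, never-contended lock.
    with transcript_lock:
        transcriptd.setdefault(tenant_id, {})[chunk_id] = {"transcript": transcript}
```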

Comment on lines +289 to 294
sentences = request.args.get('sentences', default='false') == 'true'
if sentences: t = merge_and_split_transcripts(t)
chunk_id = request.args.get('chunk_id')
if chunk_id in t:
    return jsonify({'chunk_id': chunk_id, 'transcript': t[chunk_id]['transcript']})
else:

Copilot AI Apr 21, 2026


When sentences=true, t is replaced with the return value of merge_and_split_transcripts(t), but the handler still assumes t[chunk_id] is a dict containing ['transcript']. This will break once sentence merging is enabled unless the merge function preserves the same value shape. Ensure the "sentences" mode returns the same structure expected by these handlers (this affects several endpoints, not just /get_transcript).

Comment on lines +95 to +103
if use_whisper_server:
    # Fix: properly call whisper.cpp /inference endpoint and parse response
    try:
        files = {'file': ('audio.wav', audio_array.tobytes(), 'audio/wav')}
        data = {'response_format': 'json'}
-       response = requests.post(whisper_server, files=files, data=data)
- else:
-     result = model_fast.transcribe(audio_tensor, temperature=0)
+       response = requests.post(f"{whisper_server}/inference", files=files, data=data)
+       response.raise_for_status()
+       result_json = response.json()
+       transcript = result_json.get('text', '').strip()

Copilot AI Apr 21, 2026


In whisper-server mode, this uploads audio_array.tobytes() after audio_array has been converted to normalized float32 samples, but the multipart part is labeled audio/wav and named audio.wav. This is not a valid WAV file (no header, wrong sample format), so whisper.cpp is unlikely to decode it correctly. Send a real WAV payload (e.g., wrap the original int16 PCM at 16kHz into a WAV container) or use the server's expected raw-audio format if supported.
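One way to build a valid payload with only the standard library, assuming the client still has the original 16 kHz little-endian int16 PCM bytes before float conversion:

```python
import io
import wave

def pcm16_to_wav_bytes(pcm_bytes, sample_rate=16000, channels=1):
    """Wrap raw int16 PCM in a minimal WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)         # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```

The multipart upload would then use something like `files = {'file': ('audio.wav', pcm16_to_wav_bytes(raw_pcm), 'audio/wav')}`, where `raw_pcm` is a hypothetical name for the original int16 capture rather than the normalized float32 array.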

Comment thread flask/audio_grabber.py
Comment on lines 102 to +105
def start(self):
    self.stream.start_stream()
    self.send_thread = threading.Thread(target=self.send_chunk)
    self.send_thread.start()

Copilot AI Apr 21, 2026


start() spawns a thread that calls send_chunk() exactly once, while send_chunk() is also called synchronously from audio_callback(). This extra thread is unnecessary and increases the risk of concurrent access to self.buffer/self.chunk_id without any locking. Either remove the thread and keep sending in the callback, or move all network sending into a dedicated loop/thread and protect shared state appropriately.
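A sketch of the dedicated sender-loop alternative; `ChunkSender` is hypothetical, and the injected `post` callable stands in for the real HTTP send. A thread-safe queue means the audio callback never touches shared buffers directly:

```python
import queue
import threading

class ChunkSender:
    """One long-lived sender thread draining a queue, instead of a
    one-shot thread plus synchronous sends from the audio callback."""

    def __init__(self, post):
        self.post = post              # e.g. a wrapper around requests.post
        self.outbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def enqueue(self, chunk):
        # Called from the audio callback; Queue.put is thread-safe.
        self.outbox.put(chunk)

    def stop(self):
        self.outbox.put(None)         # sentinel ends the loop
        self.thread.join()

    def _run(self):
        while True:
            chunk = self.outbox.get()
            if chunk is None:
                break
            self.post(chunk)
```

The audio callback would then call `enqueue()` instead of performing network I/O itself.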



Development

Successfully merging this pull request may close these issues.

Add .env support and enable system-aware PyTorch model loading for local builds

2 participants