Refactor server error handling, add client device selection, and improve transcript UI #28

Open
weshaan wants to merge 1 commit into susiai:master from weshaan:master

Conversation

weshaan commented Mar 26, 2026

Overview

This PR introduces quality-of-life improvements across the server, client, and UI components of the real-time audio transcription system. The goal is to improve reliability, usability, and ease of setup for both technical and non-technical users.


Key Improvements

1. Server Enhancements (transcribe_server.py)

  • Refactored server architecture for improved stability and maintainability
  • Added robust error handling and clearer logging for easier debugging
  • Simplified model loading with more user-friendly error messages
  • Introduced a health check endpoint for monitoring server status
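
A health check of this kind can be as small as a function that reports whether the model finished loading. The following is a minimal sketch, not the PR's code: the `health_status` name, the response fields, and the use of 503 when no model is loaded are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def health_status(model: Optional[Any]) -> Tuple[Dict[str, Any], int]:
    """Build a (JSON body, HTTP status) pair for a health probe.

    Hypothetical helper: the field names and the 503 status are assumptions.
    """
    loaded = model is not None
    body = {"status": "ok" if loaded else "unavailable", "model_loaded": loaded}
    return body, 200 if loaded else 503


if __name__ == "__main__":
    print(health_status(None))      # model failed to load
    print(health_status(object()))  # any non-None object stands in for a model
```

In a Flask view, the pair could be returned as `jsonify(body), status`, so a monitor can tell "server up, model missing" apart from "server down".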

2. Client Improvements (audio_grabber.py)

  • Enabled flexible audio device selection
  • Added command-line arguments for configurable server connection
  • Improved error handling and runtime stability
  • Implemented clearer status messages and graceful shutdown behavior
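
As an illustration of the configurable server connection and device selection described above, here is a minimal `argparse` sketch. The flag names (`--host`, `--port`, `--device`, `--list-devices`) and their defaults are assumptions, not taken from the PR; only the `/transcribe` path comes from the review below.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI for a hypothetical audio client; every flag name here is assumed."""
    parser = argparse.ArgumentParser(
        description="Send microphone audio to a transcription server."
    )
    parser.add_argument("--host", default="localhost",
                        help="server hostname (assumed default)")
    parser.add_argument("--port", type=int, default=5000,
                        help="server port (assumed default)")
    parser.add_argument("--device", type=int, default=None,
                        help="input device index; omit to use the system default")
    parser.add_argument("--list-devices", action="store_true",
                        help="print available input devices and exit")
    return parser


def server_url(args: argparse.Namespace) -> str:
    """Build the URL that audio chunks would be POSTed to."""
    return f"http://{args.host}:{args.port}/transcribe"


if __name__ == "__main__":
    print(server_url(build_parser().parse_args()))
```

Keeping `--device` optional lets the client fall back to the default input device when the user passes nothing.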

3. UI Enhancements (transcribe_listener.html)

  • Added real-time status and error indicators for better user awareness
  • Implemented auto-scrolling for continuous transcript updates
  • Introduced a “Copy Transcript” feature
  • Improved connection handling and user feedback

4. Setup Simplification (start_server.bat)

  • Added a one-click Windows batch script to:
    • Install dependencies
    • Launch the server
  • Reduces setup complexity, especially for non-technical users

5. Documentation Updates (README.md)

  • Rewrote setup and usage instructions for clarity
  • Added a structured overview of features and improvements
  • Improved onboarding experience

Impact

  • Improved system reliability through better error handling and logging
  • Enhanced user experience across CLI and web interface
  • Faster onboarding with simplified setup and documentation
  • Increased accessibility for users with varying technical backgrounds


Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors the transcription server, updates the audio client for device selection and CLI configurability, and refreshes the browser transcript UI for improved user feedback.

Changes:

  • Simplifies transcribe_server.py to a basic Flask API with startup model loading and /health.
  • Reworks audio_grabber.py into a CLI-driven loop with device selection and per-chunk POSTs.
  • Updates transcribe_listener.html with connect/status UI, auto-scroll, and copy-to-clipboard.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| flask/transcribe_server.py | Replaces prior server architecture with a simplified Flask server + /health + /transcribe. |
| flask/transcribe_listener.html | Adds connect/status/error UI and transcript UX improvements (auto-scroll, copy). |
| flask/audio_grabber.py | Switches from class/threaded design to argparse + device selection + looped POST to server. |
| README.md | Updates setup/run instructions to match the new workflow and features. |

Comment on lines 48 to 49
else:
    model_smart = whisper.load_model(model_smart_name, in_memory=True)

Copilot AI Apr 21, 2026

This block contains a stray else: (and references to model_smart_name) after app.run(...), which makes the module syntactically invalid. Remove these leftover lines from the previous implementation so the file can import and run.

Suggested change
- else:
-     model_smart = whisper.load_model(model_smart_name, in_memory=True)

if model is None:
    return jsonify({'error': 'Model not loaded'}), 500
try:
    data = request.get_json()

Copilot AI Apr 21, 2026

request.get_json() can return None (e.g., invalid/missing JSON), which would raise an AttributeError on data.get(...) and result in a 500 instead of a 400. Use request.get_json(silent=True) and explicitly return a 400 when the JSON body is missing/invalid.

Suggested change
- data = request.get_json()
+ data = request.get_json(silent=True)
+ if data is None:
+     return jsonify({'error': 'Missing or invalid JSON body'}), 400

Comment on lines +42 to +44
except Exception as e:
    logger.error(f"Transcription error: {e}")
    return jsonify({'error': str(e)}), 500

Copilot AI Apr 21, 2026

Returning str(e) to clients can leak internal details (paths, environment info, library internals). Prefer logger.exception(...) for server-side diagnostics and return a generic error message to the client (optionally with a stable error code).

Suggested change
- except Exception as e:
-     logger.error(f"Transcription error: {e}")
-     return jsonify({'error': str(e)}), 500
+ except Exception:
+     logger.exception("Transcription error")
+     return jsonify({'error': 'Internal server error'}), 500
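
To illustrate the "generic message plus stable error code" idea outside of Flask, here is a small framework-free sketch. The `safe_transcribe` wrapper, the `"text"` response key, and the `TRANSCRIPTION_FAILED` code are hypothetical names chosen for this example; the PR's actual response shape is not shown on this page.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("transcribe_server")


def safe_transcribe(
    transcribe: Callable[[bytes], str], audio: bytes
) -> Tuple[Dict[str, Any], int]:
    """Run a transcription callable and return a (body, status) pair.

    On failure the full traceback goes to the server log only; the client
    receives a generic message and a stable, hypothetical error code.
    """
    try:
        return {"text": transcribe(audio)}, 200
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Transcription error")
        return {"error": "Internal server error", "code": "TRANSCRIPTION_FAILED"}, 500
```

The exception text, which may contain file paths or library internals, never reaches the response body.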

Comment on lines +36 to +39
import base64
import numpy as np
audio_bytes = base64.b64decode(audio_b64)
audio_np = np.frombuffer(audio_bytes, np.int16).astype(np.float32) / 32768.0

Copilot AI Apr 21, 2026

Importing base64 and numpy inside the request handler adds overhead on every request. Move these imports to module scope so they are imported once at startup.
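
For reference, the quoted lines decode base64-encoded 16-bit PCM and scale each sample by 1/32768 into the range [-1.0, 1.0). The sketch below shows the same conversion with imports at module scope. It deliberately uses the standard-library `array` module in place of NumPy so that it runs without third-party packages, and `decode_pcm16` is a hypothetical name, not a function from the PR.

```python
import base64
from array import array
from typing import List


def decode_pcm16(audio_b64: str) -> List[float]:
    """Decode base64 16-bit PCM (native byte order) into floats in [-1.0, 1.0).

    Raises ValueError if the decoded byte count is not a multiple of the
    sample size.
    """
    audio_bytes = base64.b64decode(audio_b64)
    samples = array("h")  # 'h' = signed 16-bit integers
    samples.frombytes(audio_bytes)
    # Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    raw = array("h", [0, 16384, -32768]).tobytes()
    print(decode_pcm16(base64.b64encode(raw).decode("ascii")))
```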

Comment on lines +64 to 68
</body>
</html>
}
latestChunkId = null;
}

Copilot AI Apr 21, 2026

There is leftover JavaScript after the closing </html> tag, which makes the HTML invalid and will likely break script execution. Remove these trailing lines (they appear to be remnants of the previous session logic).

Comment on lines +44 to +45
fetch(`http://${host}:${port}/latest`)
.then(r => r.json())

Copilot AI Apr 21, 2026

The UI polls GET /latest, but the refactored server in this PR only defines /health and POST /transcribe. This will cause persistent connection errors. Either implement a compatible GET /latest endpoint on the server (with defined response shape), or update the UI to use an endpoint that actually exists.
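
If the first option is chosen, the server needs somewhere to keep the most recent result between the client's POST and the UI's poll. The following is one possible sketch of that state holder, under stated assumptions: the `LatestTranscript` class and the `chunk_id`/`text` response fields are invented for illustration, since the response shape the UI expects is not shown here.

```python
import threading
from typing import Any, Dict


class LatestTranscript:
    """Thread-safe holder for the most recent transcription result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._chunk_id = 0
        self._text = ""

    def update(self, text: str) -> int:
        """Store a new result and return its monotonically increasing id."""
        with self._lock:
            self._chunk_id += 1
            self._text = text
            return self._chunk_id

    def snapshot(self) -> Dict[str, Any]:
        """Return a JSON-serialisable view, e.g. for a GET /latest handler."""
        with self._lock:
            return {"chunk_id": self._chunk_id, "text": self._text}
```

A POST /transcribe handler would call `update(...)` after each successful transcription, and a GET /latest handler would return `snapshot()`. The increasing id lets a polling page skip results it has already displayed.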

Comment on lines +58 to +61
document.getElementById('connectBtn').onclick = connect;
document.getElementById('copyBtn').onclick = function() {
  navigator.clipboard.writeText(transcript);
  setStatus('Transcript copied!');

Copilot AI Apr 21, 2026

navigator.clipboard.writeText(...) is async and may fail in non-secure contexts (e.g., when opening the file directly). Handle the returned Promise (show an error on rejection), and consider providing a fallback (e.g., selecting text in a temporary textarea) so the feature works more reliably.

Suggested change
- document.getElementById('connectBtn').onclick = connect;
- document.getElementById('copyBtn').onclick = function() {
-   navigator.clipboard.writeText(transcript);
-   setStatus('Transcript copied!');
+ function fallbackCopyTextToClipboard(text) {
+   return new Promise((resolve, reject) => {
+     const textArea = document.createElement('textarea');
+     textArea.value = text;
+     textArea.setAttribute('readonly', '');
+     textArea.style.position = 'fixed';
+     textArea.style.top = '-9999px';
+     textArea.style.left = '-9999px';
+     document.body.appendChild(textArea);
+     textArea.focus();
+     textArea.select();
+     try {
+       const successful = document.execCommand('copy');
+       document.body.removeChild(textArea);
+       if (successful) {
+         resolve();
+       } else {
+         reject(new Error('Copy command was unsuccessful.'));
+       }
+     } catch (e) {
+       document.body.removeChild(textArea);
+       reject(e);
+     }
+   });
+ }
+ function copyTextToClipboard(text) {
+   if (navigator.clipboard && window.isSecureContext) {
+     return navigator.clipboard.writeText(text);
+   }
+   return fallbackCopyTextToClipboard(text);
+ }
+ document.getElementById('connectBtn').onclick = connect;
+ document.getElementById('copyBtn').onclick = function() {
+   copyTextToClipboard(transcript)
+     .then(() => setStatus('Transcript copied!'))
+     .catch(e => setError('Copy failed: ' + e));

Comment thread flask/audio_grabber.py
Comment on lines +53 to 57
if __name__ == '__main__':
    main()
        if len(self.buffer) > 0:
            print("send chunk")
            self.send_chunk()

Copilot AI Apr 21, 2026

There are leftover indented lines from the previous class-based implementation after main(), which makes the script syntactically invalid. Remove these trailing lines so the module can run.
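
A leftover like this can be caught before review by parsing the file without running it. The sketch below uses `ast.parse` for that; the `syntax_error_in` helper is a hypothetical name, and the exact indentation in the `BROKEN` sample is an assumption, since the original whitespace is not preserved on this page.

```python
import ast
from typing import Optional


def syntax_error_in(source: str, filename: str = "<candidate>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if it parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


# Assumed reconstruction of the tail of the file: stray, deeper-indented
# lines following the call to main().
BROKEN = '''\
def main():
    pass

if __name__ == '__main__':
    main()
        if len(buffer) > 0:
            print("send chunk")
'''

if __name__ == "__main__":
    print(syntax_error_in(BROKEN))
```

Running `python -m py_compile flask/audio_grabber.py` from the command line performs a similar check on the real file.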

Comment thread flask/audio_grabber.py
Comment on lines +13 to +17
def list_devices(audio):
    print("Available audio input devices:")
    for i in range(audio.get_device_count()):
        info = audio.get_device_info_by_index(i)
        print(f"{i}: {info['name']}")

Copilot AI Apr 21, 2026

This prints all devices, including those with no input channels, which can confuse users when selecting a microphone. Filter to devices where maxInputChannels > 0, and consider printing the channel count (and/or marking the default input device) to make selection more reliable.
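
One way the filtering could look is sketched below. It assumes `audio` is a PyAudio-style object whose `get_device_info_by_index` returns a dict with `index`, `name`, and `maxInputChannels` keys; the helper names `input_devices` and `format_devices` are hypothetical and not part of the PR.

```python
from typing import Any, Dict, List, Optional


def input_devices(audio: Any) -> List[Dict[str, Any]]:
    """Return info dicts only for devices that can record audio."""
    devices = []
    for i in range(audio.get_device_count()):
        info = audio.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            devices.append(info)
    return devices


def format_devices(
    devices: List[Dict[str, Any]], default_index: Optional[int] = None
) -> List[str]:
    """Render one line per device, with channel count and a default marker."""
    lines = []
    for info in devices:
        marker = " (default)" if info["index"] == default_index else ""
        lines.append(
            f"{info['index']}: {info['name']} [{info['maxInputChannels']} ch]{marker}"
        )
    return lines
```

With PyAudio, the default index could be taken from `audio.get_default_input_device_info()["index"]`, wrapped in a `try` block because that call raises when no default input device exists.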

Comment thread README.md
### 2. Start the transcription server

You can use the one-click script (Windows):

Copilot AI Apr 21, 2026

The README instructs users to run start_server.bat, but that file is not included in the provided diffs for this PR. If the batch script is intended to be added/updated here (as stated in the PR description), ensure it’s actually part of the PR; otherwise adjust the README instructions.
