Fix/issue 40 dict keys type error by prakashwaddar628 · Pull Request #42 · susiai/susi_translator

prakashwaddar628 · 2026-04-17T02:52:52Z

This PR fixes the `TypeError: 'dict_keys' object is not subscriptable` that occurs in Python 3 when merging transcripts without ending punctuation. It also standardizes the transcription utility logic in the Django app by replacing the identity placeholder with the actual merging logic.

#40

Copilot

Pull request overview

Fixes a Python 3 dict_keys indexing crash in transcript sentence-merging, and updates the Django app to use a real merge/split implementation instead of an identity placeholder, along with documentation cleanups.

Changes:

- Fix the Python 3 `dict_keys` non-subscriptable access by converting the keys view to a list before indexing.
- Replace the Django transcription merge placeholder with the merge/split implementation.
- Expand and modernize project and API documentation; add standalone reproduction and verification scripts.
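
For readers unfamiliar with the crash, here is a minimal sketch of the failure and the fix described above. The chunk IDs and the `{'transcript': ...}` event shape are illustrative, taken from the values quoted elsewhere in this review, not copied from the repository.

```python
# Hypothetical transcript store; chunk IDs and event shape are illustrative.
transcripts = {
    "1700000000000": {"transcript": "Hello everyone"},
    "1700000001000": {"transcript": "welcome to the event"},
}

try:
    # In Python 3, keys() returns a view object that does not support indexing.
    transcripts.keys()[-1]
except TypeError as exc:
    error_message = str(exc)

# Materialize the view as a list before indexing.
# Dicts preserve insertion order (guaranteed since Python 3.7),
# so this is the most recently inserted key.
last_key = list(transcripts.keys())[-1]
```

Note that "last inserted" is not necessarily "latest timestamp"; that only holds if chunks arrive in order.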

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `verify_fix.py` | Adds an isolation script to validate merge/split behavior (currently has a failing expectation and data-shape mismatch). |
| `reproduce_bug.py` | Adds a minimal reproduction script for the `dict_keys` indexing error. |
| `flask/transcribe_server.py` | Applies the `list(transcripts.keys())[-1]` fix and updates docstrings (merge code still has a data-shape issue). |
| `django/transcribe_app/views.py` | Improves endpoint docstrings and parameter documentation. |
| `django/transcribe_app/transcribe_utils.py` | Switches Django to use merge/split logic instead of identity placeholder (current implementation returns inconsistent value types). |
| `django/docs/index.md` | Rewrites docs landing page content (drops `disable_toc: true` directive used elsewhere). |
| `README.md` | Updates project overview and quick-start instructions. |
| `.gitignore` | Adds venv ignore pattern. |

Comments suppressed due to low confidence (1)

django/transcribe_app/transcribe_utils.py:303

`merge_and_split_transcripts` is documented as iterating over sorted chunk IDs, but it currently loops over `transcripts.keys()` without sorting. Since chunk IDs are timestamp-like strings, this can merge/split sentences in the wrong order if inserts arrive out of order. Consider iterating with `for chunk_id in sorted(transcripts.keys(), key=int):` (or otherwise guaranteeing chronological order).

def merge_and_split_transcripts(transcripts):
    # Iterate through the sorted transcript keys.
    sec = ".!?"
    merged_transcripts = ""
    result = {}
    for chunk_id in transcripts.keys():
        transcript_event = transcripts[chunk_id]
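
To illustrate the ordering concern, the sketch below compares insertion order with numeric order. It assumes chunk IDs are strings that parse as integers; `ordered_chunk_ids` is a hypothetical helper, not a function from the repository.

```python
def ordered_chunk_ids(transcripts):
    """Return chunk IDs in chronological order, assuming numeric-string IDs."""
    return sorted(transcripts.keys(), key=int)


# Hypothetical chunks that arrived out of order.
transcripts = {
    "1700000002000": {"transcript": "to the event."},
    "1700000000000": {"transcript": "Hello everyone,"},
    "1700000001000": {"transcript": "welcome"},
}

insertion_order = list(transcripts.keys())
chronological = ordered_chunk_ids(transcripts)

# key=int also matters when IDs differ in length:
# plain string sorting would place "10" before "9".
mixed_length = sorted(["10", "9"], key=int)
```

If chunk IDs can ever be non-numeric, `key=int` would raise a `ValueError`, so the assumption is worth checking against the real ID format.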

Copilot · 2026-04-21T22:06:23Z

+# SUSI Translator Documentation

-# Transcribe and Translate every audio stream
+Welcome to the SUSI Translator documentation. This project provides a robust, real-time audio transcription and translation system.


The front-matter line `disable_toc: true` was removed from the top of `index.md`, but other pages in `django/docs/` still use it. If the docs renderer relies on this directive, the index page may start showing an unwanted or incorrect table of contents, or be parsed differently from the other docs pages. Consider keeping `disable_toc: true` at the top for consistency.

Copilot · 2026-04-21T22:06:24Z

+    assert res["1700000001000"] == "welcome to the event"
+    print("Test passed: Unpunctuated tail handled correctly.")


merge_and_split_transcripts_logic() merges all provided chunks into merged_transcripts and, when there is no punctuation at all, assigns the entire merged string to the last key. With the current input, res['1700000001000'] will be "Hello everyone welcome to the event", so this assertion will fail. Either adjust the expected output or change the logic/test case to reflect the intended behavior.

Suggested change

-    assert res["1700000001000"] == "welcome to the event"
-    print("Test passed: Unpunctuated tail handled correctly.")
+    assert res["1700000001000"] == "Hello everyone welcome to the event"
+    print("Test passed: Unpunctuated content is merged and assigned to the last key.")
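
The behavior behind this suggestion can be checked with a simplified sketch of the no-punctuation path. It mirrors the string-valued data shape used by `verify_fix.py`, not the production event dicts; `merge_unpunctuated` and the input fixture are assumptions, since the script's actual input is not quoted in this review.

```python
def merge_unpunctuated(transcripts):
    """Join all chunks into one string and assign it to the last key.

    Simplified sketch: sentence splitting is left out, so this only models
    input that contains no sentence-ending punctuation at all.
    """
    merged = ""
    for key in transcripts:
        text = transcripts[key].strip()
        if not merged:
            merged = text
        elif len(text) > 1:
            # Lower-case the first letter of a continuation chunk.
            merged += " " + text[0].lower() + text[1:]
        else:
            merged += " " + text

    result = {}
    keys = list(transcripts.keys())
    if keys:
        result[keys[-1]] = merged
    return result


# Hypothetical fixture with two unpunctuated chunks.
res = merge_unpunctuated({
    "1700000000000": "Hello everyone",
    "1700000001000": "Welcome to the event",
})
```

Under these assumptions the whole merged string lands on the last key, which is why the original expectation of `"welcome to the event"` cannot hold.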

Copilot · 2026-04-21T22:06:24Z

+    for key in transcripts.keys():
+        if not merged_transcripts:
+            merged_transcripts += transcripts[key].strip()
+        else:
+            t = transcripts[key].strip()
+            if len(t) > 1:
+                merged_transcripts += " " +  t[0].lower() + t[1:]
+            else:
+                merged_transcripts += " " + t
+


This verification script claims to be a copy of the fixed production logic, but it treats transcripts[key] as a string. In the Flask/Django servers, transcript collections are dicts of events (e.g., {'transcript': ...}), so this script isn’t exercising the same data shape as production. Consider updating the fixture inputs and access pattern to match production, otherwise the script can give a false sense of correctness.
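
A small sketch of the mismatch, assuming the `{'transcript': str}` event shape mentioned above. The fixtures and the `chunk_text` helper are hypothetical.

```python
# Shape used by the verification script: values are plain strings.
string_fixture = {"1700000000000": "  Hello everyone "}

# Shape described for the Flask/Django servers: values are event dicts.
event_fixture = {"1700000000000": {"transcript": "  Hello everyone "}}


def chunk_text(transcripts, key):
    """Read one chunk's text, assuming the {'transcript': str} event shape."""
    return transcripts[key]["transcript"].strip()


# String-style access works on the script's fixture...
string_text = string_fixture["1700000000000"].strip()

# ...but the same access on a production-shaped event fails.
try:
    event_fixture["1700000000000"].strip()
except AttributeError as exc:
    shape_error = str(exc)

event_text = chunk_text(event_fixture, "1700000000000")
```

A test that feeds only string values can pass while the production path still raises.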

Copilot · 2026-04-21T22:06:24Z

+def merge_and_split_transcripts(transcripts):
    # Iterate through the sorted transcript keys.
    sec = ".!?"
    merged_transcripts = ""


By renaming/enabling this implementation as merge_and_split_transcripts, the Django API will now use it when sentences=true. However, the current implementation builds result entries as plain strings during sentence splitting but later treats result[last_chunk_id] as a dict with a transcript field. This produces inconsistent return types and will break callers that expect t[chunk_id]['transcript'] for every entry. Update the function so it always returns the same event dict shape as the input (e.g., each value is a dict containing at least {'transcript': str} plus any other metadata).
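
One way to keep the return type consistent is sketched below. This is not the repository's implementation: `merge_and_split_sketch` is a hypothetical name, it omits the lower-casing of continuation chunks, and it splits naively on `.`, `!`, and `?` (so a decimal such as "3.5" would be cut).

```python
SENTENCE_END = ".!?"


def merge_and_split_sketch(transcripts):
    """Merge chunk texts and re-split at sentence ends.

    Every value in the result is an event dict with a 'transcript' key,
    matching the assumed input shape.
    """
    result = {}
    pending = ""  # text not yet closed by sentence-ending punctuation
    chunk_ids = sorted(transcripts.keys(), key=int)

    for chunk_id in chunk_ids:
        event = transcripts[chunk_id]
        pending = f"{pending} {event['transcript'].strip()}".strip()
        # Index just past the last sentence-ending character, or 0 if none.
        cut = max(pending.rfind(ch) for ch in SENTENCE_END) + 1
        if cut > 0:
            # Copy the event's other fields and replace only the text.
            result[chunk_id] = {**event, "transcript": pending[:cut].strip()}
            pending = pending[cut:].strip()

    # Attach any unpunctuated tail to the last chunk, keeping the dict shape.
    if pending and chunk_ids:
        last_id = chunk_ids[-1]
        previous = result.get(last_id, {**transcripts[last_id], "transcript": ""})
        joined = f"{previous['transcript']} {pending}".strip()
        result[last_id] = {**previous, "transcript": joined}
    return result


# Hypothetical input: one complete sentence followed by an unpunctuated tail.
merged = merge_and_split_sketch({
    "1700000000000": {"transcript": "Hello everyone. Welcome"},
    "1700000001000": {"transcript": "to the event"},
})
```

With this shape, a caller that reads `t[chunk_id]['transcript']` works for every entry, including the tail.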

Copilot · 2026-04-21T22:06:25Z

+        keys = list(transcripts.keys())
+        if keys:
+            last_key = keys[-1]
+            p = result.get(last_key)
+            if p:
+                result[last_key] = p + " " + merged_transcripts
+            else:
+                result[last_key] = merged_transcripts


In merge_and_split_transcripts, transcripts[key] appears to be treated as a string (calling .strip()), but this server stores transcript entries as dicts like {'transcript': transcript}. If this function is ever invoked (e.g., when sentences=true is fixed/used), it will raise an AttributeError and the merge logic won’t work. Update the merge logic to read/write transcripts[key]['transcript'] (and ensure the returned result matches the dict structure the endpoints expect).

kirtisrivastava22 · 2026-04-22T04:35:10Z

Hi, I saw that a PR is already linked to this issue.
I’d be happy to help review, test, or contribute improvements if needed. I’ve recently contributed and have the project set up locally.
Let me know if I can assist!

kirtisrivastava22 · 2026-04-22T05:42:37Z

I noticed the Copilot review flagged a few additional issues — the function
also needs to access transcripts[key]['transcript'] instead of treating
values as plain strings, and return consistent dict shapes so callers
like get_transcript don't break.

I've opened PR #45 which covers these cases along with the original fix
and includes tests. Tagging @mariobehling in case it's helpful for review!

prakashwaddar628 added 3 commits March 16, 2026 11:20

6d317f9 feat: Implement a Flask audio transcription server utilizing Whisper models and a background processing queue.
9589384 chore: add virtual environment directories to .gitignore.
1fcb8b0 feat: implement asynchronous audio transcription service with Whisper support

mariobehling requested a review from Copilot April 21, 2026 21:59

Copilot started reviewing on behalf of mariobehling April 21, 2026 22:00

Copilot AI reviewed Apr 21, 2026

prakashwaddar628 wants to merge 3 commits into susiai:master from prakashwaddar628:fix/issue-40-dict-keys-TypeError

3 participants
