Conversation

@stevenlafl

[Screenshots attached: Screenshot_20240719_110444, Screenshot_20240719_110509, Screenshot_20240719_110535]

stevenlafl marked this pull request as draft July 19, 2024 17:07
@abb128 (Owner) commented Jul 25, 2024

FYI the flatpak app doesn't have the network permission, so this wouldn't work without adding it, and that would kinda violate the offline, privacy-first aspects of the app. I will likely not be merging this, but you're welcome to maintain your own fork with these online features added.

@stevenlafl (Author)

It might be a good start for a local model - though I am not familiar enough with CPU-only ones. I'll think about how that might work.

My use case is to transcribe meetings and then be able to ask questions about the transcript or generate summaries. A Krisp-like setup is the desired end state.

@abb128 (Owner) commented Jul 26, 2024

Do you think it might be better to improve copy-pasting transcripts instead, to better facilitate pasting it into ChatGPT or the user's assistant of choice?

@stevenlafl (Author) commented Jul 26, 2024

Short answer: yes, given the aims and scope you mention for this project.

Ideally I'd need it to:

  1. Start/stop "sessions" on demand and (optionally) automatically, based on a silence-duration threshold. I have a lot of back-to-back meetings, and I may simply forget to restart the app.
  2. Copy transcripts out easily, optionally with timestamps (perhaps a checkbox?). A "copy latest session transcript to clipboard" button would work well.

The reason I did it this way is that I sometimes get distracted and need parts summarized, to regain context quickly while a conversation is actively happening. It is certainly still doable with a copy button, though; a rough sketch of both ideas follows.
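
To make that concrete, here is a minimal sketch of silence-based session boundaries plus timestamped clipboard text. It is plain Python, completely separate from LiveCaptions' actual code, and the frame-energy threshold and silence duration are made-up numbers, not tested values:

```python
import time

SILENCE_RMS = 0.01     # made-up energy threshold: frames below this count as silence
SILENCE_SECS = 120.0   # made-up gap (two minutes of silence) that ends a session


class Session:
    """Collects (seconds_since_start, text) lines for one meeting."""

    def __init__(self):
        self.started = time.time()
        self.lines = []

    def add_line(self, text):
        self.lines.append((time.time() - self.started, text))

    def transcript(self, with_timestamps=False):
        """Text for a 'copy latest session transcript to clipboard' button."""
        if with_timestamps:
            return "\n".join(
                f"[{int(t) // 60:02d}:{int(t) % 60:02d}] {s}" for t, s in self.lines
            )
        return "\n".join(s for _, s in self.lines)


class SessionSplitter:
    """Ends the current session after SILENCE_SECS of continuous silence and
    starts a new one as soon as speech resumes."""

    def __init__(self):
        self.current = None
        self.finished = []
        self.silent_since = None

    def on_audio_frame(self, rms, now):
        if rms < SILENCE_RMS:
            if self.silent_since is None:
                self.silent_since = now
            elif self.current and now - self.silent_since >= SILENCE_SECS:
                self.finished.append(self.current)  # auto-stop: long silence
                self.current = None
        else:
            self.silent_since = None
            if self.current is None:
                self.current = Session()            # speech resumed: new session

    def on_transcribed_line(self, text):
        if self.current is not None:
            self.current.add_line(text)
```

The clipboard step itself would just hand `transcript()` to whatever clipboard API the UI toolkit exposes.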

Optionally (not sure how feasible):

  1. Speaker diarization, though this is probably more for the april-asr project and may be far, far out of scope. A simple '>' marker, as used on actual live TV transcripts, would denote when a new speaker starts talking. The "voice print" only needs to change enough to trigger that; it doesn't need to actively track how many speakers there are. A rough sketch of that trigger is below.
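
For the '>' idea specifically, here is a minimal sketch of just the change detection, assuming some hypothetical per-utterance `embedding` vector as input; as far as I know april-asr does not expose anything like that today, so the embedding source (and the distance threshold) are entirely made up:

```python
import math

CHANGE_THRESHOLD = 0.35  # made-up cosine-distance threshold for "sounds like a new speaker"


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm if norm else 0.0


class SpeakerChangeMarker:
    """Prefixes a line with '>' whenever the voice print drifts far enough from
    the previous utterance. No attempt to count or identify speakers."""

    def __init__(self):
        self.prev = None

    def mark(self, line_text, embedding):
        changed = (self.prev is not None
                   and cosine_distance(self.prev, embedding) > CHANGE_THRESHOLD)
        self.prev = embedding
        return "> " + line_text if changed else line_text
```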
