Add support for retrieving the effective output of the bot #158

@mattieruth

Description

Currently, we have three bot output events:

  1. bot-llm-text: This event sends the text from every LLMTextFrame, which in effect is a token-by-token stream of what the LLM outputs.
  2. bot-tts-text: This event sends the text from every TTSTextFrame, which, depending on the TTS, is either a word-by-word stream timed to when the TTS speaks each word, or a sentence-by-sentence stream of what it is supposed to say if timing is not supported.
  3. bot-transcription: This event is currently generated from the LLMTextFrames, but aggregated sentence by sentence.
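To illustrate the difference between the first and third events, here is a minimal sketch of how a token-by-token stream could be aggregated into sentence-level chunks. This is a hypothetical standalone helper, not the actual Pipecat aggregation code:

```python
import re

def aggregate_sentences(tokens):
    """Aggregate a token-by-token LLM stream (as in bot-llm-text)
    into sentence-level chunks (as in bot-transcription).

    Illustrative sketch only; the real aggregation logic lives in
    the bot pipeline.
    """
    buffer = ""
    sentences = []
    for token in tokens:
        buffer += token
        # Flush a sentence whenever the buffer contains sentence-final
        # punctuation followed by whitespace.
        while (match := re.search(r"^(.*?[.!?])\s+", buffer, re.DOTALL)):
            sentences.append(match.group(1).strip())
            buffer = buffer[match.end():]
    # Emit any trailing partial sentence at end of stream.
    if buffer.strip():
        sentences.append(buffer.strip())
    return sentences
```

For example, the token stream `["Hel", "lo there. ", "How are", " you?"]` would be emitted as two sentence events rather than four token events.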

None of these provides what a client likely ACTUALLY wants: a way to acquire the effective output of the bot. What does the bot say, and when? What about when it's not speaking? bot-llm-text does not take interruptions into account. bot-tts-text does not account for non-interrupted LLM output that has been filtered out of what is to be spoken (when skip_tts is turned on). And bot-transcription neither indicates whether a sentence was actually spoken nor supports word-by-word timing.
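To make the gap concrete, a gap-closing "effective output" event might carry a payload like the following. All field names here are assumptions for discussion, not part of any current Pipecat API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BotOutputText:
    """Hypothetical payload for an 'effective output' event.

    Field names are assumptions sketched for this issue, not an
    existing Pipecat event schema.
    """
    text: str                 # the text the bot actually produced
    spoken: bool              # False if skip_tts filtered it out
    interrupted: bool         # True if playback was cut off mid-utterance
    timestamp_ms: Optional[int] = None  # word/sentence timing, when the TTS supports it
```

With something like this, a client could reconstruct what the bot said (spoken and unspoken), in order, with timing where available.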

This issue exists to identify this gap and prioritize fixing it.
