Add support for retrieving the effective output of the bot #158

@mattieruth

Description

Currently, we have three bot output events:

  1. bot-llm-text: This event sends the text from every LLMTextFrame, which in effect is a token-by-token stream of what the LLM outputs.
  2. bot-tts-text: This event sends the text from every TTSTextFrame, which, depending on the TTS, is either a word-by-word stream timed to when the TTS speaks each word, or a sentence-by-sentence stream of what it is supposed to say if timing is not supported.
  3. bot-transcription: This event is currently generated from the LLMTextFrames, but aggregated sentence by sentence.
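To illustrate the difference between the first and third events, here is a minimal sketch of how a token-by-token stream could be aggregated into sentence-level chunks. This is a hypothetical standalone helper, not the actual Pipecat aggregation code:

```python
import re

def aggregate_sentences(tokens):
    """Aggregate a token-by-token LLM stream (as in bot-llm-text)
    into sentence-level chunks (as in bot-transcription).

    Illustrative sketch only; the real aggregation logic lives in
    the bot pipeline.
    """
    buffer = ""
    sentences = []
    for token in tokens:
        buffer += token
        # Flush a sentence whenever the buffer contains sentence-final
        # punctuation followed by whitespace.
        while (match := re.search(r"^(.*?[.!?])\s+", buffer, re.DOTALL)):
            sentences.append(match.group(1).strip())
            buffer = buffer[match.end():]
    # Emit any trailing partial sentence at end of stream.
    if buffer.strip():
        sentences.append(buffer.strip())
    return sentences
```

For example, the token stream `["Hel", "lo there. ", "How are", " you?"]` would be emitted as two sentence events rather than four token events.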

None of these provides what a client likely ACTUALLY wants: a way to acquire the effective output of the bot. What does the bot say, and when? What about when it's not speaking? bot-llm-text does not take interruptions into account. bot-tts-text does not account for non-interrupted LLM output that has been filtered out of what is to be spoken (when skip_tts is turned on). And bot-transcription neither indicates whether a sentence was actually spoken nor supports word-by-word timing.
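To make the gap concrete, a gap-closing "effective output" event might carry a payload like the following. All field names here are assumptions for discussion, not part of any current Pipecat API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BotOutputText:
    """Hypothetical payload for an 'effective output' event.

    Field names are assumptions sketched for this issue, not an
    existing Pipecat event schema.
    """
    text: str                 # the text the bot actually produced
    spoken: bool              # False if skip_tts filtered it out
    interrupted: bool         # True if playback was cut off mid-utterance
    timestamp_ms: Optional[int] = None  # word/sentence timing, when the TTS supports it
```

With something like this, a client could reconstruct what the bot said (spoken and unspoken), in order, with timing where available.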

This issue exists to identify this gap and prioritize fixing it.
