pipecat-ai/pipecat #2899

Description
Currently, we have three bot output events (a sketch of where the first two originate follows this list):
- bot-llm-text: This event sends the text from every LLMTextFrame, which in effect is a token-by-token stream of what the LLM outputs.
- bot-tts-text: This event sends the text from every TTSTextFrame, which, depending on the TTS service, is either a word-by-word stream timed to when the TTS speaks each word, or a sentence-by-sentence stream of what the bot is supposed to say when timing is not supported.
- bot-transcription: This event is currently generated from the LLMTextFrames, aggregated sentence by sentence.
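
For orientation, here is a minimal server-side sketch of where the first two streams come from, assuming the current FrameProcessor API (process_frame / push_frame) and the LLMTextFrame / TTSTextFrame types named above. BotTextTap is a hypothetical name, not an existing Pipecat processor; it just logs the text each event would carry:

```python
from pipecat.frames.frames import Frame, LLMTextFrame, TTSTextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class BotTextTap(FrameProcessor):
    """Hypothetical pass-through processor that logs the text behind
    bot-llm-text (LLMTextFrame) and bot-tts-text (TTSTextFrame)."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMTextFrame):
            print(f"bot-llm-text: {frame.text!r}")  # token-by-token stream
        elif isinstance(frame, TTSTextFrame):
            print(f"bot-tts-text: {frame.text!r}")  # word/sentence stream

        await self.push_frame(frame, direction)
```
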
None of these provides what a client likely ACTUALLY wants, which is a way to acquire the effective output of the bot. What does the bot say? When? Even when it's not speaking?
- bot-llm-text does not account for interruptions.
- bot-tts-text does not include any non-interrupted LLM output that has been filtered out of what is to be spoken (when skip_tts is turned on).
- bot-transcription doesn't say whether a sentence was actually spoken and doesn't support word-by-word timing.
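
One hypothetical shape for the missing event, derived only from the requirements above (every name here is illustrative, not an existing Pipecat API): a per-utterance record carrying the text, whether it was actually spoken, whether it was cut off, and optional word timings.

```python
from dataclasses import dataclass, field


@dataclass
class WordTiming:
    """Timing for a single spoken word, when the TTS reports it."""
    word: str
    start_s: float  # seconds from the start of the utterance


@dataclass
class BotOutputEvent:
    """Illustrative payload for a unified 'effective bot output' event."""
    text: str                     # the text the bot actually emitted
    spoken: bool                  # False for skip_tts / filtered-out text
    interrupted: bool = False     # True if cut off mid-utterance
    words: list[WordTiming] = field(default_factory=list)  # empty if timing unsupported
```

An event like this could be emitted once per utterance, with spoken=False covering the skip_tts case and interrupted=True marking text the user cut off.
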
This issue is in place to identify this gap and prioritize fixing it.