Listener: speaker filter, wake debounce, TTS lead-in, skip follow-up + Bugbot fixes #1
- Rolling pre-wake-word buffer + ElevenLabs Scribe diarization; keep first speaker only to reduce cross-talk from concurrent speech
- Strip wake phrases from diarized/fallback transcripts (avoid Hey Jarvis as intent)
- Spotify duck target volume 45% for usable playback floor
- README: voice pipeline + how speaker filtering works; .env.example toggles
- Orchestrator/refactor, tests, and related app updates (local workspace)
Cursor Bugbot has reviewed your changes and found 4 potential issues.
    pre_ww_samples = int(PRE_WAKEWORD_BUFFER_SECS * TARGET_RATE)
    self._pre_wakeword_buffer: collections.deque[np.ndarray] = collections.deque(
        maxlen=pre_ww_samples // WAKEWORD_CHUNK + 10
    )
Pre-wake buffer retains too little audio
High Severity
_pre_wakeword_buffer capacity is sized with WAKEWORD_CHUNK, but appended chunks are ~30ms callback chunks. This under-allocates the rolling buffer, so configured pre-wake audio seconds cannot be retained and diarization loses wake-word anchoring.
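The sizing mismatch is easy to check with arithmetic. The sketch below is illustrative only: the constant values (16 kHz target rate, a 2 s window, 1280-sample wake-word frames, 480-sample callback chunks) and the helper names are assumptions, not values taken from the repository. Only the `+ 10` slack mirrors the reviewed snippet.

```python
import collections

# Assumed values for illustration; the real constants live in the listener module.
TARGET_RATE = 16_000            # samples per second after downsampling
PRE_WAKEWORD_BUFFER_SECS = 2.0  # seconds of audio to keep before the wake word
WAKEWORD_CHUNK = 1280           # 80 ms wake-word frame at 16 kHz
CALLBACK_CHUNK = 480            # ~30 ms mic callback chunk at 16 kHz


def deque_maxlen(buffer_secs: float, chunk_samples: int, slack: int = 10) -> int:
    """Number of chunks needed to hold buffer_secs of audio, plus slack."""
    total_samples = int(buffer_secs * TARGET_RATE)
    return total_samples // chunk_samples + slack


def retained_seconds(maxlen: int, appended_chunk_samples: int) -> float:
    """Seconds of audio a full deque holds, given the size of the appended chunks."""
    return maxlen * appended_chunk_samples / TARGET_RATE


# Sized by wake-word frames but filled with callback chunks: window is too short.
wrong = deque_maxlen(PRE_WAKEWORD_BUFFER_SECS, WAKEWORD_CHUNK)
# Sized by the chunks that are actually appended: window is fully covered.
right = deque_maxlen(PRE_WAKEWORD_BUFFER_SECS, CALLBACK_CHUNK)

pre_wakeword_buffer = collections.deque(maxlen=right)
```

Under these assumed numbers the mis-sized deque holds 35 chunks, about 1.05 s of audio instead of the configured 2 s, while sizing by the callback chunk gives 76 chunks and covers the whole window.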
      if self._whisper_model == "elevenlabs":
          logger.info("[DEBUG] ElevenLabs SDK not connected, trying REST API")
    -     text = self._transcribe_elevenlabs(audio)
    +     text = self._transcribe_elevenlabs(audio, use_diarization=use_speaker_filter)
Local fallback keeps wake phrase in command
Medium Severity
When speaker filtering prepends pre-wake audio, local Whisper fallback transcribes that combined clip but never calls _strip_wake_phrase. If ElevenLabs REST fails, wake phrases can be sent to intent routing.
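One way to close this gap is to run the same stripping step on every transcript, whichever backend produced it. The sketch below is a hypothetical stand-in for `_strip_wake_phrase`: the phrase list, the regular expression, and the function name are assumptions for illustration, not the project's implementation.

```python
import re

# Hypothetical wake-phrase list; the real listener may match other variants.
WAKE_PHRASES = ("hey jarvis", "jarvis")

# Match a wake phrase only at the start of the transcript, followed by a word
# boundary and any trailing punctuation or whitespace.
_WAKE_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(p) for p in WAKE_PHRASES) + r")\b[\s,.!?:;-]*",
    flags=re.IGNORECASE,
)


def strip_wake_phrase(text: str) -> str:
    """Remove a leading wake phrase and the punctuation that follows it."""
    return _WAKE_RE.sub("", text, count=1).strip()
```

Calling this after both the REST transcription and the local Whisper fallback would keep a string such as "Hey Jarvis, play some jazz" from reaching intent routing with the wake phrase attached. Anchoring at the start leaves a phrase like "play jarvis radio" untouched.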
        settings=settings,
        memory=self.memory_store,
        bedrock_client=self.brain._bedrock,
    )
Orchestrator created without Elasticsearch dependencies
Medium Severity
ZiriOrchestrator is now instantiated without es_store/hybrid_searcher, so both stay None and Elasticsearch turn indexing paths never run.
    new_vol = current + delta
    if delta < 0:
        new_vol = max(_SPOTIFY_VOLUME_FLOOR, new_vol)
    else:
Volume-down may raise Spotify volume
Medium Severity
Negative volume adjustments clamp to a 40% floor unconditionally. If current Spotify volume is already below 40 (for example after spotify.set_volume to 35), saying quieter can increase volume up to 40%.
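A minimal sketch of a clamp that avoids this, assuming integer percentages and a 40% floor as described in the comment. The function name `adjust_volume` and the 100% ceiling in the volume-up branch are assumptions, since the reviewed snippet is cut off at `else:`.

```python
_SPOTIFY_VOLUME_FLOOR = 40  # percent; value taken from the review comment


def adjust_volume(current: int, delta: int) -> int:
    """Return the new volume (0-100) after applying a relative change."""
    new_vol = current + delta
    if delta < 0:
        if current >= _SPOTIFY_VOLUME_FLOOR:
            # Normal case: never drop below the usable playback floor.
            new_vol = max(_SPOTIFY_VOLUME_FLOOR, new_vol)
        else:
            # Already below the floor: allow going lower, never higher.
            new_vol = max(0, new_vol)
    else:
        # Assumed ceiling for volume-up; not shown in the reviewed snippet.
        new_vol = min(100, new_vol)
    return new_vol
```

With this guard, a "quieter" request at 35% lowers the volume to 25% instead of raising it to 40%, while the same request at 50% still stops at the floor.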
- Raise wake_word_threshold, consecutive frames + cooldown (settings + listener)
- TTS: ~50ms output lead-in for CoreAudio stream startup (audio_player)
- Spotify skip: track-id success, fast pause path, MUSIC_SKIP_NO_NEXT UX
- Follow-up listening + route hint; speaker filter tweaks; brain routing fixes
- Docs/env examples; routing tests + spotify skip test
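The wake debounce combines three checks: a score threshold, a run of consecutive frames above it, and a cooldown after each trigger. The sketch below shows one way these could fit together. The class name, default values, and injectable clock are all hypothetical and do not come from the listener code.

```python
import time
from typing import Callable


class WakeDebouncer:
    """Fire only after N consecutive high-score frames, then hold a cooldown."""

    def __init__(
        self,
        threshold: float = 0.6,        # assumed value, not the project's setting
        required_frames: int = 3,      # assumed value
        cooldown_secs: float = 2.0,    # assumed value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._required = required_frames
        self._cooldown = cooldown_secs
        self._clock = clock
        self._streak = 0
        self._cooldown_until = float("-inf")

    def update(self, score: float) -> bool:
        """Feed one wake-word score; return True when a wake should fire."""
        now = self._clock()
        if now < self._cooldown_until:
            # Ignore everything during the cooldown, including high scores.
            self._streak = 0
            return False
        # A frame below the threshold breaks the run.
        self._streak = self._streak + 1 if score >= self._threshold else 0
        if self._streak >= self._required:
            self._streak = 0
            self._cooldown_until = now + self._cooldown
            return True
        return False
```

Passing the clock in as a parameter keeps the cooldown testable without sleeping. A single spurious high-score frame never fires, and a second trigger cannot follow the first until the cooldown has elapsed.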
- scripts/macos/: LaunchAgent plist + short README
- tests/fixtures/audio/sample.aiff (was test.aiff at root)
- README: root summary table, accurate app/tests layout
- .dockerignore: plist path under scripts/macos
…volume floor
- Size pre-wake deque by ~30ms downsampled mic chunks (not 80ms WW frames)
- Strip wake phrase after local Whisper / combined-clip paths when diarization path used
- Pass es_store + hybrid_searcher into ZiriOrchestrator (turn indexing restored)
- Volume down: apply 40% floor only when current >= floor (no boost from 35%)


Summary
Always-on listener: speaker-aware STT, fewer false wakes, cleaner playback, Spotify skip/follow-up UX, and Bugbot follow-ups. Orchestrator again indexes turns to Elasticsearch when configured.
Listener & STT
- Pre-wake deque `maxlen` matches ~30 ms downsampled mic chunks (not 80 ms wake-word frames) so the full pre-wake window is retained.
- Wake phrase is stripped after local Whisper / combined-clip paths when the diarization path is used (`use_speaker_filter`).
- Raised `WAKE_WORD_THRESHOLD`, consecutive-frame confirmation, cooldown (settings + listener).
- TTS: ~50 ms output lead-in (`audio_player`) to avoid CoreAudio stream-start ramp clipping the first syllable.
- Follow-up listening after `MUSIC_SKIP_NO_NEXT` (no second wake word); route hints for bare playlist names; optional follow-up without diarization (settings).

Spotify & tools
- Pass `es_store` + `hybrid_searcher` into `ZiriOrchestrator` so ES turn indexing runs again.

Routing & brain
- Routing fixes (`brain`, tests + `routing_eval.jsonl`).

Repo hygiene
- `scripts/macos/` LaunchAgent plist + README; sample audio under `tests/fixtures/audio/`; README project tree refresh.

Testing
- `pytest` (or `make test`): routing, orchestrator, Spotify skip tests.
- `make start`: wake stability, first syllable of TTS, skip/no-next follow-up; with `ELASTICSEARCH_URL` set, verify indexing if desired.
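For the "first syllable of TTS" check, it may help to see what a ~50 ms lead-in amounts to. This is a rough sketch that assumes signed 16-bit PCM bytes, where zero bytes are silence. The function name and parameters are hypothetical, and the real `audio_player` may pad the stream differently.

```python
def add_lead_in(
    pcm: bytes,
    sample_rate: int,
    lead_in_ms: int = 50,
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample; 2 = signed 16-bit PCM
) -> bytes:
    """Prepend lead_in_ms of silence so stream start-up does not clip speech."""
    frames = sample_rate * lead_in_ms // 1000
    silence = b"\x00" * (frames * channels * sample_width)
    return silence + pcm
```

At an assumed 24 kHz mono output, 50 ms is 1200 frames, or 2400 bytes of silence before the first spoken sample.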