
Listener: speaker filter, wake debounce, TTS lead-in, skip follow-up + Bugbot fixes #1

Merged: adityasingh2400 merged 4 commits into master from feature/listener-speaker-filter-diarization on Mar 19, 2026

adityasingh2400 (Owner) commented Mar 19, 2026

Summary

Always-on listener: speaker-aware STT, fewer false wakes, cleaner playback, Spotify skip/follow-up UX, and Bugbot follow-ups. Orchestrator again indexes turns to Elasticsearch when configured.

Listener & STT

  • Pre-wake rolling buffer + ElevenLabs diarization → keep only the wake-word speaker; strip wake phrases before routing.
  • Buffer sizing fix: deque maxlen matches ~30 ms downsampled mic chunks (not 80 ms wake-word frames) so the full pre-wake window is retained.
  • Local Whisper fallback: strip the wake phrase when transcribing the combined anchor + command clip that was built for diarization (use_speaker_filter).
  • False wake reduction: higher default WAKE_WORD_THRESHOLD, consecutive-frame confirmation, cooldown (settings + listener).
  • TTS clarity: ~50 ms silent output lead-in before playback (audio_player) to avoid CoreAudio stream-start ramp clipping the first syllable.
  • Follow-up listening after MUSIC_SKIP_NO_NEXT (no second wake word); route hints for bare playlist names; optional follow-up without diarization (settings).
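The false-wake changes above (higher threshold, consecutive-frame confirmation, cooldown) combine naturally into a small debouncer. This is a sketch, not the listener's code: `WakeDebouncer`, its field names, and every default value are hypothetical; only the three ideas come from the PR.

```python
from dataclasses import dataclass


@dataclass
class WakeDebouncer:
    """Hypothetical helper: fire a wake only after `consecutive_frames`
    wake-word scores in a row reach `threshold`, then ignore detections
    for `cooldown_secs`. All defaults here are assumptions."""

    threshold: float = 0.6
    consecutive_frames: int = 3
    cooldown_secs: float = 2.0
    _hits: int = 0
    _last_wake: float = float("-inf")

    def update(self, score: float, now: float) -> bool:
        """Feed one wake-word frame score; return True when a wake is confirmed."""
        if now - self._last_wake < self.cooldown_secs:
            self._hits = 0  # still cooling down after the previous wake
            return False
        # Count consecutive frames at or above the threshold; any miss resets the run.
        self._hits = self._hits + 1 if score >= self.threshold else 0
        if self._hits >= self.consecutive_frames:
            self._hits = 0
            self._last_wake = now
            return True
        return False


# Example with 80 ms frames: one low-score frame breaks the run.
debouncer = WakeDebouncer(threshold=0.5, consecutive_frames=3, cooldown_secs=1.0)
scores = [0.9, 0.2, 0.9, 0.9, 0.9]
results = [debouncer.update(s, i * 0.08) for i, s in enumerate(scores)]
```

A single spurious high-score frame never wakes the listener, and a second detection inside the cooldown is ignored.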

Spotify & tools

  • Duck Spotify to 45% during an interaction; skip determines success from the track ID and uses a fast pause path; unduck with a ramp.
  • Volume-down bugfix: the 40% floor applies only when the current volume is already ≥ 40% (saying “quieter” no longer raises 35 → 40).
  • Hub: pass es_store + hybrid_searcher into ZiriOrchestrator so ES turn indexing runs again.
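The "unduck ramp" is only named in the PR. One way to build such a ramp is to step the volume linearly from the 45% duck level back to the pre-interaction level. The helper names, step count, and delay below are assumptions, and `set_volume` stands in for whatever call actually sets Spotify's volume.

```python
import time
from typing import Callable, List


def volume_ramp(start: int, target: int, steps: int) -> List[int]:
    """Evenly spaced integer volume levels from `start` toward `target`,
    ending exactly at `target` (the start level itself is not repeated)."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [round(start + (target - start) * i / steps) for i in range(1, steps + 1)]


def unduck(set_volume: Callable[[int], None], duck_level: int, restore_to: int,
           steps: int = 5, step_secs: float = 0.05) -> None:
    """Restore volume gradually instead of jumping straight from the duck level."""
    for level in volume_ramp(duck_level, restore_to, steps):
        set_volume(level)      # hypothetical callback, e.g. a Spotify volume setter
        time.sleep(step_secs)


calls: List[int] = []
unduck(calls.append, duck_level=45, restore_to=80, steps=5, step_secs=0)
```

The same `volume_ramp` helper also works in the ducking direction by swapping start and target.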

Routing & brain

  • Deterministic routing fixes: playlist vs. play, shuffle phrasing, follow-up hints (brain, tests + routing_eval.jsonl).

Repo hygiene

  • scripts/macos/ LaunchAgent plist + README; sample audio under tests/fixtures/audio/; README project tree refresh.

Testing

  • pytest (or make test): routing, orchestrator, Spotify skip tests.
  • Manual: make start, then check wake stability, the first syllable of TTS, and the skip/no-next follow-up; with ELASTICSEARCH_URL set, optionally verify turn indexing.

- Rolling pre-wake-word buffer + ElevenLabs Scribe diarization; keep first
  speaker only to reduce cross-talk from concurrent speech
- Strip wake phrases from diarized/fallback transcripts (avoid routing "Hey Jarvis" as intent)
- Spotify duck target volume 45% for usable playback floor
- README: voice pipeline + how speaker filtering works; .env.example toggles
- Orchestrator refactor, tests, and related app updates (local workspace)
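The speaker filter can be sketched as follows, assuming the diarized transcript has already been parsed into word records with a start time and a speaker label (the PR relies on ElevenLabs Scribe diarization for those labels; the `Word` type, its fields, and `filter_to_wake_speaker` are hypothetical). The speaker heard last before the wake-word detection point is treated as the anchor, and only that speaker's later words are kept. This follows the "wake-word speaker" wording in the PR summary rather than the commit's "first speaker".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical word record parsed from a diarized transcript."""
    text: str
    start: float      # seconds from the start of the combined clip
    speaker_id: str


def filter_to_wake_speaker(words: List[Word], wake_end: float) -> str:
    """Keep only command words spoken by the person who said the wake word.

    `words` must be sorted by start time. `wake_end` is the offset in the
    combined (pre-wake buffer + command) clip where the wake word was detected.
    """
    anchor: Optional[str] = None
    for w in words:
        if w.start >= wake_end:
            break
        anchor = w.speaker_id  # last speaker heard before the detection point
    if anchor is None:
        # No speech in the pre-wake window: fall back to the unfiltered transcript.
        return " ".join(w.text for w in words)
    return " ".join(
        w.text for w in words if w.start >= wake_end and w.speaker_id == anchor
    )


words = [
    Word("hey", 0.2, "speaker_0"), Word("jarvis", 0.5, "speaker_0"),
    Word("play", 1.1, "speaker_0"), Word("did", 1.2, "speaker_1"),
    Word("jazz", 1.4, "speaker_0"), Word("you", 1.5, "speaker_1"),
]
command = filter_to_wake_speaker(words, wake_end=1.0)
```

Cross-talk from `speaker_1` is dropped, leaving "play jazz" for routing.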

cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Comment thread app/core/listener.py (outdated)

    pre_ww_samples = int(PRE_WAKEWORD_BUFFER_SECS * TARGET_RATE)
    self._pre_wakeword_buffer: collections.deque[np.ndarray] = collections.deque(
        maxlen=pre_ww_samples // WAKEWORD_CHUNK + 10
    )

Pre-wake buffer retains too little audio

High Severity

_pre_wakeword_buffer capacity is sized with WAKEWORD_CHUNK, but appended chunks are ~30ms callback chunks. This under-allocates the rolling buffer, so configured pre-wake audio seconds cannot be retained and diarization loses wake-word anchoring.
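The arithmetic behind this finding can be checked directly. The sketch assumes a 16 kHz downsampled rate, a 1.5 s pre-wake window, 1280-sample (80 ms) wake-word frames, and 480-sample (~30 ms) mic chunks; none of these values are taken from the repository, and `pre_wake_maxlen` is a hypothetical helper (it rounds up where the original snippet used floor division).

```python
import collections
import math

TARGET_RATE = 16_000            # assumption: 16 kHz after downsampling
PRE_WAKEWORD_BUFFER_SECS = 1.5  # assumption: configured pre-wake window
WAKEWORD_CHUNK = 1280           # assumption: 80 ms wake-word frame at 16 kHz
MIC_CHUNK = 480                 # assumption: ~30 ms downsampled mic callback chunk


def pre_wake_maxlen(buffer_secs: float, rate: int, chunk_samples: int, slack: int = 10) -> int:
    """Deque slots needed to hold `buffer_secs` of audio when each appended
    item is `chunk_samples` long, plus a few slots of slack."""
    samples = int(buffer_secs * rate)
    return math.ceil(samples / chunk_samples) + slack


def retained_secs(maxlen: int, chunk_samples: int, rate: int) -> float:
    """Seconds of audio actually held once the deque is full."""
    return maxlen * chunk_samples / rate


# Buggy sizing: divides by the 80 ms frame size, but 30 ms chunks are appended.
buggy = pre_wake_maxlen(PRE_WAKEWORD_BUFFER_SECS, TARGET_RATE, WAKEWORD_CHUNK)
# Fixed sizing: divides by the size of the chunks that are actually appended.
fixed = pre_wake_maxlen(PRE_WAKEWORD_BUFFER_SECS, TARGET_RATE, MIC_CHUNK)

buffer = collections.deque(maxlen=fixed)
for _ in range(500):                 # simulate a long stream of mic callbacks
    buffer.append([0] * MIC_CHUNK)   # stand-in for an np.ndarray chunk
```

With these numbers the frame-based sizing gives 29 slots, which holds only 0.87 s of 30 ms chunks, while sizing by the appended chunk gives 60 slots (1.8 s: the full window plus the 10-chunk slack).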

Comment thread app/core/listener.py

     if self._whisper_model == "elevenlabs":
         logger.info("[DEBUG] ElevenLabs SDK not connected, trying REST API")
    -    text = self._transcribe_elevenlabs(audio)
    +    text = self._transcribe_elevenlabs(audio, use_diarization=use_speaker_filter)

Local fallback keeps wake phrase in command

Medium Severity

When speaker filtering prepends pre-wake audio, local Whisper fallback transcribes that combined clip but never calls _strip_wake_phrase. If ElevenLabs REST fails, wake phrases can be sent to intent routing.
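A minimal sketch of the missing step, assuming the local Whisper path returns plain text: strip the wake phrase before the transcript is routed. The finding refers to an existing `_strip_wake_phrase`, whose real behavior is not shown here; the standalone `strip_wake_phrase` below is hypothetical and keeps only what follows the last wake-phrase match, so pre-wake speech in the combined clip is discarded too.

```python
import re
from typing import Tuple

# Assumption: the configured wake phrase(s); "Hey Jarvis" is the example used in this PR.
WAKE_PHRASES: Tuple[str, ...] = ("hey jarvis",)


def strip_wake_phrase(text: str, phrases: Tuple[str, ...] = WAKE_PHRASES) -> str:
    """Return only the text after the last wake phrase, so neither the phrase
    nor any pre-wake speech before it reaches intent routing."""
    last_end = None
    for phrase in phrases:
        # Allow punctuation or extra spaces between words: "Hey, Jarvis" still matches.
        pattern = r"\b" + r"\W+".join(re.escape(w) for w in phrase.split()) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            if last_end is None or match.end() > last_end:
                last_end = match.end()
    if last_end is None:
        return text.strip()  # no wake phrase found: leave the command unchanged
    # Drop punctuation left between the wake phrase and the command.
    return text[last_end:].lstrip(" ,.!?;:-").strip()
```

For example, "so anyway hey, jarvis. Skip this song" becomes "Skip this song".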

Comment thread app/hub.py

        settings=settings,
        memory=self.memory_store,
        bedrock_client=self.brain._bedrock,
    )

Orchestrator created without Elasticsearch dependencies

Medium Severity

ZiriOrchestrator is now instantiated without es_store/hybrid_searcher, so both stay None and Elasticsearch turn indexing paths never run.
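The failure mode here is silent: optional dependencies that default to None switch a feature off without raising anything. A minimal sketch of that pattern, using a hypothetical `Orchestrator` stand-in and an in-memory store in place of Elasticsearch (neither is the project's code):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class InMemoryTurnStore:
    """Hypothetical stand-in for an Elasticsearch-backed turn store."""
    turns: List[Dict[str, Any]] = field(default_factory=list)

    def index_turn(self, turn: Dict[str, Any]) -> None:
        self.turns.append(turn)


@dataclass
class Orchestrator:
    """Hypothetical stand-in for ZiriOrchestrator's optional ES wiring."""
    es_store: Optional[InMemoryTurnStore] = None
    hybrid_searcher: Optional[Any] = None

    def record_turn(self, turn: Dict[str, Any]) -> bool:
        if self.es_store is None:
            return False  # indexing silently skipped, as when hub.py omitted es_store
        self.es_store.index_turn(turn)
        return True


store = InMemoryTurnStore()
unwired = Orchestrator()                                        # dependencies omitted
wired = Orchestrator(es_store=store, hybrid_searcher=object())  # dependencies passed through
```

Making these parameters keyword-only without defaults, or logging a warning when they are None while ELASTICSEARCH_URL is set, would make a missing dependency visible instead of silent.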

Comment thread app/core/tool_runner.py

    new_vol = current + delta
    if delta < 0:
        new_vol = max(_SPOTIFY_VOLUME_FLOOR, new_vol)
    else:

Volume-down may raise Spotify volume

Medium Severity

Negative volume adjustments clamp to a 40% floor unconditionally. If current Spotify volume is already below 40 (for example after spotify.set_volume to 35), saying quieter can increase volume up to 40%.
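The fix in the follow-up commit (apply the floor only when the current volume is already at or above it) can be sketched as a pure function. `adjust_volume` is hypothetical, and the 0-100 clamp is an assumption because the original snippet elides the `else` branch.

```python
_SPOTIFY_VOLUME_FLOOR = 40  # the 40% floor described in this PR


def adjust_volume(current: int, delta: int, floor: int = _SPOTIFY_VOLUME_FLOOR) -> int:
    """Apply a relative volume change.

    Volume-down stops at `floor` only when the current volume is already at or
    above it, so "quieter" can never raise the volume. Results are clamped to 0-100.
    """
    new_vol = current + delta
    if delta < 0 and current >= floor:
        new_vol = max(floor, new_vol)
    return max(0, min(100, new_vol))
```

With the unconditional floor, `current=35, delta=-10` returned 40; with the guarded floor it returns 25.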

- Raise wake_word_threshold, consecutive frames + cooldown (settings + listener)
- TTS: ~50ms output lead-in for CoreAudio stream startup (audio_player)
- Spotify skip: track-id success, fast pause path, MUSIC_SKIP_NO_NEXT UX
- Follow-up listening + route hint; speaker filter tweaks; brain routing fixes
- Docs/env examples; routing tests + spotify skip test
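The TTS lead-in amounts to giving the output stream a short stretch of silence before speech. The sketch below assumes signed 16-bit interleaved PCM and simply prepends zero bytes; `with_silent_lead_in` is hypothetical, and the actual audio_player change may instead write silence to the stream before playback starts.

```python
def with_silent_lead_in(pcm: bytes, sample_rate: int, channels: int = 1,
                        sample_width: int = 2, lead_in_ms: int = 50) -> bytes:
    """Prepend `lead_in_ms` of digital silence to interleaved PCM audio.

    Zero bytes are silence for signed integer PCM (e.g. 16-bit), so any
    ramp the output device applies at stream start lands on silence
    instead of the first syllable.
    """
    frames = sample_rate * lead_in_ms // 1000
    return bytes(frames * channels * sample_width) + pcm
```

At 24 kHz mono 16-bit, 50 ms is 1200 frames, i.e. 2400 zero bytes ahead of the speech.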
cursor Bot commented Mar 19, 2026

You have used all of your free Bugbot PR reviews.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

- scripts/macos/: LaunchAgent plist + short README
- tests/fixtures/audio/sample.aiff (was test.aiff at root)
- README: root summary table, accurate app/tests layout
- .dockerignore: plist path under scripts/macos
…volume floor

- Size pre-wake deque by ~30ms downsampled mic chunks (not 80ms WW frames)
- Strip wake phrase after local Whisper / combined-clip paths when diarization path used
- Pass es_store + hybrid_searcher into ZiriOrchestrator (turn indexing restored)
- Volume down: apply 40% floor only when current >= floor (no boost from 35%)

adityasingh2400 changed the title from "Listener: ElevenLabs diarization speaker filter + Spotify duck 45% + docs" to "Listener: speaker filter, wake debounce, TTS lead-in, skip follow-up + Bugbot fixes" on Mar 19, 2026
adityasingh2400 merged commit c95f247 into master on Mar 19, 2026
2 of 3 checks passed