Skip to content

Latest commit

 

History

History
200 lines (148 loc) · 5.45 KB

File metadata and controls

200 lines (148 loc) · 5.45 KB

Pause Feature Documentation

Overview

The Chatterbox TTS Audiobook Edition now includes automatic pause insertion based on line breaks (returns) in your text input. For every line break (\n or \r\n) detected in your text, the system will automatically add a 0.1-second pause to the generated audio.

How It Works

Return-Based Pause System

  • Detection: The system counts all line breaks in your input text
  • Duration: Each line break adds exactly 0.1 seconds of silence
  • Accumulation: Multiple line breaks accumulate (10 returns = 1 second pause)
  • Debug Output: Terminal shows pause information when audio is generated

Example

Input Text:

Hello, this is the first line.
This is the second line.

This line comes after an empty line.
Final line.

Result:

  • 4 line breaks detected
  • 0.4 seconds of total pause time added
  • Debug output: 🔇 Detected 4 line breaks → 0.4s total pause time

Features Supported

✅ Speech-to-Text Generation

  • Single text input with returns
  • Pauses added to the end of generated speech
  • Debug output in terminal

✅ Single-Voice Audiobook Creation

  • Text processing before chunking
  • Pauses distributed throughout the audiobook
  • Project metadata includes pause information

✅ Multi-Voice Audiobook Creation

  • Character dialogue with natural pauses
  • Pause processing applied before voice assignment
  • Debug output shows total pause time added

✅ Batch Audiobook Processing

  • Automatic pause processing for all files in batch
  • Individual pause calculations per file

Technical Implementation

Text Processing Pipeline

  1. Input Text Analysis

    processed_text, return_count, total_pause_duration = process_text_for_pauses(text, 0.1)
  2. Silence Generation

    pause_audio = create_silence_audio(total_pause_duration, sample_rate)
  3. Audio Combination

    final_audio = np.concatenate([speech_audio, pause_audio])

Debug Output Examples

Speech-to-Text:

🔇 Detected 3 line breaks → 0.3s total pause time
🔇 Added 0.3s pause to speech (3 returns)

Audiobook Creation:

🔇 Detected 15 line breaks → 1.5s total pause time
🔇 Adding 1.5s pause (15 returns × 0.1s each)

Usage Guidelines

Best Practices

  1. Natural Breaks: Use line breaks where you want natural pauses in speech
  2. Paragraph Separation: Double line breaks create longer pauses
  3. Dialogue: Separate character lines for better multi-voice audiobooks
  4. Punctuation: Combine with punctuation for maximum effect

Example Text Formatting

Good for Natural Speech:

Welcome to our story.
Let me tell you about a magical place.

In this place, anything is possible.
The adventure begins now.

Good for Multi-Voice Audiobooks:

[Narrator] The sun was setting over the hills.

[Character1] "We need to find shelter soon."

[Character2] "I see a cave up ahead.
Let's hurry before it gets dark."

[Narrator] They rushed toward the cave.

Configuration

Pause Duration

  • Current Setting: 0.1 seconds per return
  • Location: Hardcoded in processing functions
  • Customization: Can be modified in src/audiobook/processing.py

Sample Rate

  • Default: 24,000 Hz
  • Compatibility: Automatically matches model output
  • Quality: High enough for natural-sounding pauses

Testing

Test Script

Run the included test script to verify functionality:

python test_pause_functionality.py

Manual Testing

  1. Create text with line breaks
  2. Generate speech or audiobook
  3. Check terminal for debug output
  4. Listen for pauses in generated audio

Troubleshooting

Common Issues

No Pauses Heard:

  • Check if text actually contains line breaks (\n)
  • Verify debug output appears in terminal
  • Ensure audio player supports the full generated file

Pauses Too Long/Short:

  • Current setting is 0.1s per return (not configurable via UI)
  • Multiple consecutive returns will create longer pauses
  • This is intended behavior for paragraph breaks

Debug Output Missing:

  • Check terminal/console where the application is running
  • Ensure you're using the updated functions
  • Verify pause processing is enabled

Future Enhancements

Potential Improvements

  • User-configurable pause duration
  • Different pause types (comma, period, paragraph)
  • Visual indicators in the UI
  • Pause preview before generation
  • Advanced pause distribution algorithms

Integration Ideas

  • Export settings in voice profiles
  • Project-level pause configuration
  • Advanced text markup for pause control
  • Audio timeline with pause indicators

Technical Details

Files Modified

  • src/audiobook/processing.py - Core pause processing functions
  • gradio_tts_app_audiobook.py - Main TTS integration
  • test_pause_functionality.py - Test and verification script

Functions Added

  • process_text_for_pauses() - Text analysis and preprocessing
  • create_silence_audio() - Silence generation
  • insert_pauses_between_chunks() - Audio combination with pauses
  • process_text_with_distributed_pauses() - Advanced chunk processing

Compatibility

  • ✅ Windows, macOS, Linux
  • ✅ CPU and GPU processing modes
  • ✅ All supported audio formats
  • ✅ Existing voice profiles and projects
  • ✅ Batch processing workflows

Note: This feature is automatically enabled and requires no configuration. Simply use line breaks in your text where you want pauses, and the system will handle the rest!