gensay


A multi-provider text-to-speech (TTS) tool that implements the Apple macOS /usr/bin/say command interface while supporting multiple TTS backends including Chatterbox (local AI), OpenAI, ElevenLabs, and Amazon Polly.

Features

  • macOS say Compatible: Drop-in replacement for the macOS say command with identical CLI interface
  • Multiple TTS Providers: Extensible provider system with support for the native macOS say, Chatterbox (local AI), OpenAI, ElevenLabs, Amazon Polly, and a mock provider for testing
  • Smart Text Chunking: Intelligently splits long text for optimal TTS processing
  • Audio Caching: Automatic caching with LRU eviction to speed up repeated synthesis
  • Progress Tracking: Built-in progress bars with tqdm and customizable callbacks
  • Multiple Audio Formats: Support for AIFF, WAV, M4A, MP3, CAF, FLAC, AAC, OGG
  • Background Pre-caching: Queue and cache audio chunks in the background (Chatterbox only)
  • Interactive REPL Mode: Start an interactive session with provider initialized once for repeated use
  • Named Pipe Listener: Listen on a FIFO for text input from other processes

Installation

It's 2026, use uv

gensay is intended to be used as a CLI tool that is a drop-in replacement for the macOS say CLI.

System Dependencies (ElevenLabs provider only)

PortAudio is required if you plan to use the ElevenLabs provider. The pyaudio dependency needs the PortAudio C library to compile successfully.

Other providers (macOS, OpenAI, Amazon Polly, Chatterbox) do not require PortAudio.

Homebrew (macOS):

brew install portaudio

Nix:

nix-env -iA nixpkgs.portaudio

Install gensay

# Install as a tool
uv tool install gensay

# With extras: ElevenLabs provider (requires PortAudio, see above)
pip install 'gensay[elevenlabs]'

# With extras: Chatterbox provider (local Text-to-Speech model, ~2GB PyTorch dependencies)
uv tool install 'gensay[chatterbox]' \
  --with git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates

# Or add to your project
uv add gensay

# From source (with automatic PortAudio path configuration)
git clone https://github.com/anthonywu/gensay
cd gensay
just setup

Optional Dependencies

# Audio format conversion (for non-native formats like MP3, OGG, FLAC)
# Requires ffmpeg installed on system
pip install 'gensay[audio-formats]'

# Install all optional dependencies
pip install 'gensay[all]'

Installation Help:

For developer/maintainer installation, just setup automatically configures PortAudio and FFmpeg paths for both Nix and Homebrew.

Developer/Maintainer Build Dependencies

PortAudio Paths (for ElevenLabs)

Homebrew:

export C_INCLUDE_PATH="$(brew --prefix portaudio)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(brew --prefix portaudio)/lib:$LIBRARY_PATH"

Nix:

export C_INCLUDE_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/lib:$LIBRARY_PATH"

Then install into local venv:

uv sync --all-extras
# temporarily, we have to use a special release of the chatterbox library to allow for dependency resolution
uv pip install git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates

FFmpeg Library Path (for Chatterbox on macOS)

Chatterbox uses TorchCodec which requires FFmpeg libraries at runtime. On macOS, set DYLD_LIBRARY_PATH before running gensay:

Homebrew:

export DYLD_LIBRARY_PATH="$(brew --prefix ffmpeg)/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"

Nix:

# Find the ffmpeg-lib output in the Nix store
FFMPEG_LIB=$(nix-store -qR "$(which ffmpeg)" | grep 'ffmpeg.*-lib$')
export DYLD_LIBRARY_PATH="$FFMPEG_LIB/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"

Note: DYLD_LIBRARY_PATH must be set before the Python process starts; it cannot be set from within Python.
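
If you drive gensay from Python (for example, from a build script), the variable can still be supplied to the child process environment, since it only has to exist before that process starts. A minimal sketch, assuming a Homebrew ffmpeg install on Apple Silicon (adjust the library path for your setup):

import os
import subprocess

# Build an environment for the child process with the FFmpeg libraries visible.
env = dict(os.environ)
env["DYLD_LIBRARY_PATH"] = "/opt/homebrew/opt/ffmpeg/lib:" + env.get("DYLD_LIBRARY_PATH", "")

# The gensay child process sees DYLD_LIBRARY_PATH from the moment it starts.
subprocess.run(["gensay", "--provider", "chatterbox", "Hello"], env=env, check=True)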

Quick Start

# Basic usage - speaks the text
gensay "Hello, world!"

# Use specific voice
gensay -v Samantha "Hello from Samantha"

# Save to audio file
gensay -o greeting.m4a "Welcome to gensay"

# List available voices (two ways)
gensay -v '?'
gensay --list-voices

Command Line Usage

Basic Options

# Speak text
gensay "Hello, world!"

# Read from file
gensay -f document.txt

# Read from stdin
echo "Hello from pipe" | gensay -f -

# Specify voice
gensay -v Alex "Hello from Alex"

# Adjust speech rate (words per minute)
gensay -r 200 "Speaking faster"

# Save to file
gensay -o output.m4a "Save this speech"

# Specify audio format
gensay -o output.wav --format wav "Different format"

Provider Selection

# Use macOS native say command
gensay --provider macos "Using system TTS"

# List voices for specific provider
gensay --provider macos --list-voices
gensay --provider mock --list-voices

# Use mock provider for testing
gensay --provider mock "Testing without real TTS"

# Use Chatterbox explicitly
gensay --provider chatterbox "Local AI voice"

# Default provider depends on platform
gensay "Hello"  # Uses 'macos' on macOS, 'chatterbox' on other platforms

Advanced Options

# Show progress bar
gensay --progress "Long text with progress tracking"

# Pre-cache audio chunks in background
gensay --provider chatterbox --cache-ahead "Pre-process this text"

# Adjust chunk size
gensay --chunk-size 1000 "Process in larger chunks"

# Cache management
gensay --cache-stats     # Show cache statistics
gensay --clear-cache     # Clear all cached audio
gensay --no-cache "Text" # Disable cache for this run

Interactive Modes and Performance Optimization

REPL Mode

Start an interactive session where the provider is initialized once and reused for each prompt. This avoids the overhead of re-initializing the provider.

Tip: For Chatterbox and other local AI models, model loading from disk to memory is expensive (several seconds). Use --repl or --listen mode to load the model once and process many prompts without reloading.

# Start REPL mode (--repl, --interactive, and -i are all equivalent)
gensay --repl
gensay --interactive
gensay -i

# With a specific provider and voice
gensay --provider openai -v nova --repl

# Chatterbox with REPL (recommended - keeps model loaded)
gensay -p chatterbox -i

In REPL mode:

  • Type text and press Enter to speak it
  • Type exit or quit to exit
  • Press Ctrl+C or Ctrl+D to exit

Named Pipe (FIFO) Listener

Listen on a named pipe for text input, allowing other processes to send text to be spoken. Useful for integrating TTS into scripts or other applications.

Tip: Like REPL mode, --listen keeps the provider loaded between requests—ideal for Chatterbox and other local models where initialization is slow.

# Start listening on default pipe (/tmp/gensay.pipe)
gensay --listen

# Use a custom pipe path
gensay --listen /tmp/my-tts.pipe

# With a specific provider (Chatterbox benefits most from persistent mode)
gensay --provider chatterbox --listen
gensay --provider polly -v Joanna --listen

From another terminal or script, send text to the pipe:

echo "Hello from another process" > /tmp/gensay.pipe

The listener runs until interrupted with Ctrl+C. The named pipe is created automatically if it doesn't exist.
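
The same can be done from Python, which is convenient when another application wants to send status messages to the listener. A minimal sketch, assuming the listener is already running on the default pipe path:

from pathlib import Path

pipe = Path("/tmp/gensay.pipe")

# Opening a FIFO for writing blocks until a reader (gensay --listen) has it open,
# so start the listener first.
with pipe.open("w") as fifo:
    fifo.write("Build finished successfully\n")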

Python API

Basic Usage

from gensay import ChatterboxProvider, TTSConfig, AudioFormat

# Create provider
provider = ChatterboxProvider()

# Speak text
provider.speak("Hello from Python")

# Save to file
provider.save_to_file("Save this", "output.m4a")

# List voices
voices = provider.list_voices()
for voice in voices:
    print(f"{voice['id']}: {voice['name']}")

Advanced Configuration

from gensay import ChatterboxProvider, TTSConfig, AudioFormat

# Configure TTS
config = TTSConfig(
    voice="default",
    rate=150,
    format=AudioFormat.M4A,
    cache_enabled=True,
    extra={
        'show_progress': True,
        'chunk_size': 500
    }
)

# Create provider with config
provider = ChatterboxProvider(config)

# Add progress callback
def on_progress(progress: float, message: str):
    print(f"Progress: {progress:.0%} - {message}")

config.progress_callback = on_progress

# Use the configured provider
provider.speak("Text with all options configured")

Text Chunking

from gensay import chunk_text_for_tts, TextChunker

# Simple chunking
chunks = chunk_text_for_tts(long_text, max_chunk_size=500)

# Advanced chunking with custom strategy
chunker = TextChunker(
    max_chunk_size=1000,
    strategy="paragraph",  # or "sentence", "word", "character"
    overlap_size=50
)
chunks = chunker.chunk_text(document)

Provider Configurations

ElevenLabs

  1. Install the optional dependency (requires PortAudio):
    pip install 'gensay[elevenlabs]'
  2. Get an API key from ElevenLabs
  3. Set the environment variable:
    export ELEVENLABS_API_KEY="your-api-key"

# List ElevenLabs voices
gensay --provider elevenlabs --list-voices

# Use a specific ElevenLabs voice
gensay --provider elevenlabs -v Rachel "Hello from ElevenLabs"

# Save to file with high quality
gensay --provider elevenlabs -o speech.mp3 "High quality AI speech"

OpenAI TTS

  1. Get an API key from OpenAI Platform
  2. Set the environment variable:
    export OPENAI_API_KEY="sk-..."

# List OpenAI voices
gensay --provider openai --list-voices

# Use a specific voice (alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer)
gensay --provider openai -v nova "Hello from OpenAI"

# Save to file
gensay --provider openai -o speech.mp3 "OpenAI TTS output"

OpenAI offers two models via config.extra['model']:

  • tts-1 (default): Faster, lower latency
  • tts-1-hd: Higher quality audio
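
The model is selected through the extra mapping on TTSConfig (see the Python API section). A minimal sketch; the OpenAI provider class name used here (OpenAIProvider) is an assumption and is not spelled out in this README:

from gensay import TTSConfig

# Request the higher-quality model via provider-specific extras.
config = TTSConfig(voice="nova", extra={"model": "tts-1-hd"})

# Hypothetical: pass the config to the OpenAI provider the same way
# ChatterboxProvider accepts one in the Python API examples.
# provider = OpenAIProvider(config)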

Amazon Polly

Option A - Environment variables:

  1. Sign in to AWS Console
  2. Go to IAM → Users → Create user
  3. Attach the AmazonPollyReadOnlyAccess policy
  4. Create access keys under Security credentials → Access keys
  5. Export the credentials as environment variables:
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"

Option B - AWS CLI v2:

This lets you sign in through the AWS Command Line Interface.

export AWS_DEFAULT_REGION=us-west-2
# on your desktop with a browser
aws login --region
# in an env without a browser
aws login --region --remote

# List Polly voices (60+ voices in many languages)
gensay --provider polly --list-voices

# Use a specific voice
gensay --provider polly -v Joanna "Hello from Amazon Polly"

# Save to file
gensay --provider polly -o speech.mp3 "Polly TTS output"

Polly supports multiple engines via config.extra['engine']:

  • neural (default): Higher quality, natural-sounding
  • standard: Lower cost, available for all voices
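
As with OpenAI, the engine is chosen through TTSConfig.extra. A minimal sketch; the Polly provider class name (PollyProvider) is an assumption and is not spelled out in this README:

from gensay import TTSConfig

# Use the standard engine for lower cost; omit 'engine' to keep the neural default.
config = TTSConfig(voice="Joanna", extra={"engine": "standard"})

# Hypothetical: provider = PollyProvider(config)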

Advanced Features

Caching System

The caching system automatically stores generated audio to speed up repeated synthesis:

from gensay import TTSCache

# Create cache instance
cache = TTSCache(
    enabled=True,
    max_size_mb=10000,
    max_items=1000
)

# Get cache statistics
stats = cache.get_stats()
print(f"Cache size: {stats['size_mb']:.2f} MB")
print(f"Cached items: {stats['items']}")

# Clear cache
cache.clear()

Cache Location

Cache files are stored in platform-specific user cache directories:

  • macOS: ~/Library/Caches/gensay
  • Linux: ~/.cache/gensay
  • Windows: %LOCALAPPDATA%\gensay\gensay\Cache
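
These locations correspond to what the platformdirs package returns for user_cache_dir("gensay", "gensay"). Whether gensay computes the path this way internally is not documented here; the snippet below simply prints the directory matching the list above for the current platform:

# Requires the platformdirs package (pip install platformdirs).
from platformdirs import user_cache_dir

print(user_cache_dir("gensay", "gensay"))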

Managing Cache

# Show cache statistics
gensay --cache-stats

# Clear all cached audio
gensay --clear-cache

# Disable caching for a specific command
gensay --no-cache "Text to synthesize without caching"

Manual Deletion

To manually delete the cache, remove the cache directory:

# macOS/Linux
rm -rf ~/Library/Caches/gensay  # macOS
rm -rf ~/.cache/gensay          # Linux

# Windows (PowerShell)
Remove-Item -Recurse -Force $env:LOCALAPPDATA\gensay\gensay\Cache

Creating Custom Providers

from gensay.providers import TTSProvider, TTSConfig, AudioFormat
from typing import Optional, Union, Any
from pathlib import Path

class MyCustomProvider(TTSProvider):
    def speak(self, text: str, voice: Optional[str] = None,
              rate: Optional[int] = None) -> None:
        # Your implementation
        self.update_progress(0.5, "Halfway done")
        # ... generate and play audio ...
        self.update_progress(1.0, "Complete")

    def save_to_file(self, text: str, output_path: Union[str, Path],
                     voice: Optional[str] = None, rate: Optional[int] = None,
                     format: Optional[AudioFormat] = None) -> Path:
        # Your implementation
        return Path(output_path)

    def list_voices(self) -> list[dict[str, Any]]:
        return [
            {'id': 'voice1', 'name': 'Voice One', 'language': 'en-US'}
        ]

    def get_supported_formats(self) -> list[AudioFormat]:
        return [AudioFormat.WAV, AudioFormat.MP3]
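
A custom provider can then be used directly; constructing it with no arguments is assumed here to work the same way as ChatterboxProvider() in the examples above:

# Instantiate and use the custom provider like any built-in one.
provider = MyCustomProvider()
provider.speak("Testing the custom provider")
print(provider.list_voices())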

Async Support

All providers support async operations:

import asyncio
from gensay import ChatterboxProvider

async def main():
    provider = ChatterboxProvider()

    # Async speak
    await provider.speak_async("Async speech")

    # Async save
    await provider.save_to_file_async("Async save", "output.m4a")

asyncio.run(main())

Development

This project uses just for common development tasks. First, install just:

# macOS (using Nix which you already have)
nix-env -iA nixpkgs.just

# Or using Homebrew
brew install just

# Or using cargo
cargo install just

Getting Started

# Setup development environment
just setup

# Run tests
just test

# Run all quality checks
just check

# See all available commands
just

Common Development Commands

Testing

# Run all tests
just test

# Run tests with coverage
just test-cov

# Run specific test
just test-specific tests/test_providers.py::test_mock_provider_speak

# Quick test (mock provider only)
just quick-test

Code Quality

# Run linter
just lint

# Auto-fix linting issues
just lint-fix

# Format code
just format

# Type checking
just typecheck

# Run all checks (lint, format, typecheck)
just check

# Pre-commit checks (format, lint, test)
just pre-commit

Running the CLI

# Run with mock provider
just run-mock "Hello, world!"
just run-mock -v '?'

# Run with macOS provider
just run-macos "Hello from macOS"

# Cache management
just cache-stats
just cache-clear

Development Utilities

# Run example script
just demo

# Clean build artifacts
just clean

# Build package
just build

Manual Setup (without just)

If you prefer not to use just, here are the equivalent commands:

# Setup
uv venv
uv pip install -e ".[dev]"

# Testing
uv run pytest -v
uv run pytest --cov=gensay --cov-report=term-missing

# Linting and formatting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uvx ty check src

Project Structure

gensay/
├── src/gensay/
│   ├── __init__.py
│   ├── main.py              # CLI entry point
│   ├── providers/           # TTS provider implementations
│   │   ├── base.py         # Abstract base provider
│   │   ├── chatterbox.py   # Chatterbox provider
│   │   ├── macos_say.py    # macOS say wrapper
│   │   └── ...            # Other providers
│   ├── cache.py            # Caching system
│   └── text_chunker.py     # Text chunking logic
├── tests/                  # Test suite
├── examples/               # Example scripts
├── justfile                # Development commands
└── README.md

Code Style Guide

  • Python 3.11+ with type hints
  • Follow PEP8 and Google Python Style Guide
  • Use ruff for linting and formatting
  • Keep docstrings concise but informative
  • Prefer pathlib.Path over os.path
  • Use pytest for testing

License

gensay is distributed under the terms of the MIT license.
