This repository contains code for manipulating perceptual voice quality (PVQ) features.
git clone https://github.com/fgnt/pvq_manipulation.git
cd pvq_manipulation
pip install -e .
gh release download v1.0.0 --repo fgnt/pvq_manipulation --dir ./saved_models
To get started, follow the Example_Notebook.ipynb. It demonstrates how to load the model, prepare an audio file, and apply perceptual voice quality manipulations step by step.
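Before opening the notebook, it can help to confirm that the release download actually populated ./saved_models. The following is a minimal sketch, assuming the release assets end up as regular files somewhere under that directory; the helper name `list_downloaded_models` is hypothetical and not part of pvq_manipulation.

```python
from pathlib import Path


def list_downloaded_models(model_dir="saved_models"):
    """Return all regular files found under model_dir, sorted by path.

    Hypothetical helper for illustration; not part of pvq_manipulation.
    Returns an empty list if the directory does not exist.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return []
    # Walk the directory recursively and keep only regular files.
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_downloaded_models()
    if files:
        for path in files:
            print(path)
    else:
        print("No files in ./saved_models; re-run the gh release download step.")
```

If the list is empty, the download step most likely did not complete, and the notebook would presumably fail when it tries to load the model.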
The method for manipulating perceptual voice qualities was introduced in the paper "Speech synthesis along perceptual voice quality dimensions":
@inproceedings{rautenberg2025speech,
title={Speech synthesis along perceptual voice quality dimensions},
author={Rautenberg, Frederik and Kuhlmann, Michael and Seebauer, Fritz and Wiechmann, Jana and Wagner, Petra and Haeb-Umbach, Reinhold},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}
The approach for controlling creakiness intensity was presented in the paper "Synthesizing Speech with Selected Perceptual Voice Qualities--A Case Study with Creaky Voice":
@inproceedings{rautenberg25_interspeech,
title = {{Synthesizing Speech with Selected Perceptual Voice Qualities – A Case Study with Creaky Voice}},
author = {Frederik Rautenberg and Fritz Seebauer and Jana Wiechmann and Michael Kuhlmann and Petra Wagner and Reinhold Haeb-Umbach},
year = {2025},
booktitle = {{Interspeech 2025}},
pages = {1633--1637},
doi = {10.21437/Interspeech.2025-1443},
issn = {2958-1796},
}