
VoxCPM2 Chirp/Click Artifact & Voice Consistency in One-Shot Cloning #272

@dcox761


Thank you for all your hard work on VoxCPM and the recent VoxCPM2 release. So far I am very impressed with the results for a project I am working on. Using it with nanovllm-voxcpm on an AWS EC2 g6.xlarge (NVIDIA L4, 24 GB), I am getting streaming responses within 250 ms, and LoRA training completes 2000 steps in approximately 15 minutes.

Could you provide some guidance on a small but annoying issue?

One-shot voice cloning with the base VoxCPM2 model produces a chirp/click at the start of every generated audio segment. This appears to be the tail of the reference audio leaking through the DiT's prefix_feat_cond conditioning. For example, if my reference audio ends with "enemy", the generated audio appears to start with "emy".

Although I have attempted the changes described below through nanovllm-voxcpm, I can easily reproduce this issue with the default VoxCPM LoRA WebUI.

With these changes, some audio segments appear to work fine for some voices but not for others, and I am unable to work out what the main factor could be.
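To narrow down which segments are affected, a rough objective check could complement listening tests. The sketch below is a hypothetical NumPy heuristic (`frame_spectra` and `tail_head_similarity` are illustrative names, not part of VoxCPM or nanovllm-voxcpm). It assumes mono float arrays at the same sample rate, subtracts each clip's average log spectrum to remove overall voice timbre, and compares what remains of the reference tail with the generated head. A score near 1 suggests the generated segment opens with sounds resembling the end of the reference; the 200 ms window and any cut-off are guesses that would need calibrating against segments already labelled by ear.

```python
import numpy as np


def frame_spectra(x: np.ndarray, frame: int = 1024, hop: int = 256) -> np.ndarray:
    """Log-magnitude spectra (frames x bins) using a Hann window."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame:
        # Zero-pad very short inputs so at least one frame exists.
        x = np.pad(x, (0, frame - len(x)))
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * window for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def tail_head_similarity(ref: np.ndarray, gen: np.ndarray, sr: int,
                         window_ms: float = 200.0) -> float:
    """Heuristic score in [-1, 1]: cosine similarity between the reference tail
    and the generated head, each expressed relative to its own clip's average
    log spectrum (a crude way to cancel out shared speaker timbre)."""
    n = int(sr * window_ms / 1000)
    ref_tail = frame_spectra(ref[-n:]).mean(axis=0) - frame_spectra(ref).mean(axis=0)
    gen_head = frame_spectra(gen[:n]).mean(axis=0) - frame_spectra(gen).mean(axis=0)
    denom = np.linalg.norm(ref_tail) * np.linalg.norm(gen_head)
    if denom < 1e-12:
        # One of the windows is indistinguishable from its clip average.
        return 0.0
    return float(np.dot(ref_tail, gen_head) / denom)
```

Scoring every segment this way and sorting by score might show whether the artifact tracks properties of the reference clip (final phoneme, trailing silence, loudness) rather than the voice itself.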

Things attempted:

  1. Blank audio padding on the reference clip - Did not help; the last audio patch is still used as DiT conditioning, and silence padding made it worse
  2. Reference mode (send ref_audio_latents_base64 only) - Eliminated the chirp but significantly degraded voice quality; reverted
  3. Zero prefix_feat_cond patch (to nanovllm-voxcpm) - Zeroed the DiT conditioning during prefill to match training behaviour; did not completely resolve the issue
  4. Dual mode - ref_audio + prompt latents together - Sent both for maximum voice conditioning; did not completely resolve the issue
  5. PCM chirp trim - 100ms (HACK) - Pipeline-level trim of the first ~2 patches when cloning is active; addresses the residual LM continuation artifact but still does not work completely
  6. Transcript mismatch - Extra words in the reference text obviously cause problems, but even with a carefully reviewed transcript the chirp/click persists
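Item 5 can be expressed as a streaming-safe wrapper. A hard cut at an arbitrary sample can itself produce a click, so the hypothetical sketch below (`trim_stream` is an illustrative name, not a nanovllm-voxcpm API) drops the first `trim_ms` of audio across however many chunks it spans and then applies a short linear fade-in. It assumes chunks arrive as mono float PCM arrays at a known sample rate `sr`; the 100 ms default mirrors item 5, while the 10 ms fade length is an assumption. Like the original hack, it masks the artifact rather than fixing the conditioning.

```python
from typing import Iterable, Iterator

import numpy as np


def trim_stream(chunks: Iterable[np.ndarray], sr: int,
                trim_ms: float = 100.0, fade_ms: float = 10.0) -> Iterator[np.ndarray]:
    """Drop the first trim_ms of a chunked PCM stream, then fade in linearly.

    The trim is counted in samples, so it works regardless of chunk size.
    """
    to_drop = int(round(sr * trim_ms / 1000))
    fade_len = int(round(sr * fade_ms / 1000))
    faded = 0  # number of samples already scaled by the fade-in ramp
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float32)
        if to_drop > 0:
            drop = min(to_drop, len(chunk))
            chunk = chunk[drop:]
            to_drop -= drop
        if len(chunk) == 0:
            continue  # this chunk was consumed entirely by the trim
        if faded < fade_len:
            n = min(fade_len - faded, len(chunk))
            # Ramp continues across chunk boundaries: 0, 1/fade_len, 2/fade_len, ...
            ramp = np.arange(faded, faded + n, dtype=np.float32) / fade_len
            chunk = chunk.copy()  # avoid modifying the caller's buffer
            chunk[:n] *= ramp
            faded += n
        yield chunk
```

Because the trim is counted in samples rather than chunks, it behaves the same whether the first chunk is shorter or longer than `trim_ms`, at the cost of withholding that much generated audio before the first output chunk.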
