pitch-shift

Canonical pitch-shifting algorithms in functional JavaScript.
Frequency-domain: vocoder, phaseLock, transient, formant, sms, hpss, paulstretch.
Time-domain: ola, wsola, psola, granular, sample, hybrid.
Consistent unified API: batch, stream, multi-channel. Part of the audiojs ecosystem.

Install

npm install pitch-shift

Usage

import transient from 'pitch-shift/transient.js'

// Batch
let pitched = transient(audio, { semitones: 5 })

// Stream
let write = transient({ ratio: 1.5 })
let output = write(inputBlock)
let tail = write()  // flush

// Stereo
let [L, R] = transient([left, right], { ratio: 1.5 })
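
The streaming contract shown above (call the writer once per block, then once with no arguments to flush) can be driven over a long buffer with a small loop. The following is a minimal sketch, not part of the package: `processInBlocks` and `passthrough` are hypothetical names, and it assumes every call returns a Float32Array (possibly empty). The stand-in writer only copies its input so the example runs without the library.

```javascript
// Hypothetical helper: feed `input` to a streaming writer in fixed-size blocks,
// flush once at the end, and concatenate everything the writer returned.
let processInBlocks = (write, input, blockSize = 1024) => {
  let chunks = []
  for (let i = 0; i < input.length; i += blockSize) {
    chunks.push(write(input.subarray(i, i + blockSize)))
  }
  chunks.push(write()) // flush, as in the streaming usage above

  let total = chunks.reduce((n, c) => n + c.length, 0)
  let out = new Float32Array(total)
  let offset = 0
  for (let c of chunks) {
    out.set(c, offset)
    offset += c.length
  }
  return out
}

// Stand-in writer (assumption, not the library): copies each block,
// returns an empty tail on flush.
let passthrough = (block) => block ? Float32Array.from(block) : new Float32Array(0)

let input = Float32Array.from({ length: 2500 }, (_, i) => Math.sin(i / 10))
let output = processInBlocks(passthrough, input, 1024)
```

With a real writer such as `transient({ ratio: 1.5 })` in place of `passthrough`, the same loop should apply, provided the writer follows the contract above.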

Algorithms

Algorithm	Domain	Best for	shift
pitchShift	auto	content-aware default	1.781
transient	STFT	music with percussion ★	1.781
phaseLock	STFT	general music	1.775
vocoder	STFT	simple tonal	1.491
formant	STFT	voice (no chipmunk)	1.593
hpss	STFT	mixed music (drums+tonal)	1.464
sms	sinusoidal	harmonic/tonal	1.761
paulstretch	STFT	ambient, extreme shifts	2.339
wsola	time	speech, low-latency	1.672
psola	time	speech, mono voice	1.767
ola	time	baseline	2.050
granular	time	creative textures	1.905
sample	time	sampler/tracker playback	1.655
hybrid	hybrid	mixed dynamic material	1.925

Frequency-domain algorithms shift bins natively; time-domain algorithms run their namesake stretcher from time-stretch and then sinc-resample to the target pitch. The shift column is the log-magnitude distance to a canonical shifted reference (lower is better). Run npm run quality for all metrics.

All algorithms accept ratio (1.5 ≈ +7 semitones, 2 = one octave up), semitones, frameSize (default 2048), and hopSize (default frameSize/4).
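
The relation between ratio and semitones is the standard equal-temperament one, ratio = 2^(semitones/12). A small sketch of the conversion follows; the helper names are illustrative and are not exports of this package.

```javascript
// Equal temperament: 12 semitones per octave, one octave = frequency ratio 2.
let semitonesToRatio = (semitones) => 2 ** (semitones / 12)
let ratioToSemitones = (ratio) => 12 * Math.log2(ratio)

let fifthUp = ratioToSemitones(1.5)   // about 7.02, i.e. roughly +7 semitones
let fourthUp = semitonesToRatio(5)    // about 1.335
let octaveUp = semitonesToRatio(12)   // 2
```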

`pitchShift`

Content-aware auto-selector. Picks: voice/speech → psola, tonal → sms, else → transient.

import pitchShift from 'pitch-shift'

pitchShift(audio, { semitones: 5 })
pitchShift(audio, { ratio: 1.5, content: 'voice' })
pitchShift(audio, { ratio: 2, method: 'formant' })

Param	Default	Description
`content`	`music`	`music`, `voice`/`speech`, `tonal`
`method`	auto	Force a specific algorithm by name
`formant`	`false`	Wrap in formant preservation
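
The selection rule described above can be written as a plain lookup. This is a sketch of the documented mapping only; `pickMethod` is a hypothetical name, and how the package actually dispatches (including the `formant` wrapper option) is not shown here.

```javascript
// Mirrors the documented rule: an explicit `method` wins, otherwise
// voice/speech -> psola, tonal -> sms, anything else -> transient.
let pickMethod = ({ content = 'music', method } = {}) => {
  if (method) return method
  if (content === 'voice' || content === 'speech') return 'psola'
  if (content === 'tonal') return 'sms'
  return 'transient'
}

let chosen = [
  pickMethod(),                                        // 'transient'
  pickMethod({ content: 'voice' }),                    // 'psola'
  pickMethod({ content: 'tonal' }),                    // 'sms'
  pickMethod({ content: 'voice', method: 'formant' }), // 'formant'
]
```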

Frequency domain

`transient`

Peak-locked phase vocoder with spectral-flux transient detection. On transient frames, synthesis phase resets to analysis phase, preserving attacks. Between transients, behaves like phaseLock.

import transient from 'pitch-shift/transient.js'

transient(audio, { ratio: 1.5 })
transient(audio, { semitones: 5, transientThreshold: 2.0 })

Param	Default	Description
`transientThreshold`	`1.5`	z-score over log-flux EMA (higher = fewer resets)

Preserves phase coherence, partial structure, attack localization on detected transients.
Destroys formants; misses quiet transients at too-high threshold.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.000	0.988	1.619	0.991	1.781

Formant dist 1.619 because bin-shift moves the spectral envelope with the partials — use formant to preserve it.

Use when: Music with drums — the default choice.
Not for: Voice where formant preservation matters.
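
To make the "z-score over log-flux EMA" wording concrete, here is one way such a detector could look. It is a sketch under assumptions: the smoothing factor `alpha`, the initial variance, and the function names are illustrative and are not taken from the package source.

```javascript
// Half-wave-rectified spectral flux between two magnitude frames.
let spectralFlux = (prevMag, mag) => {
  let flux = 0
  for (let k = 0; k < mag.length; k++) flux += Math.max(0, mag[k] - prevMag[k])
  return flux
}

// Returns a stateful detector; call it once per frame with the magnitude spectrum.
let createTransientDetector = ({ threshold = 1.5, alpha = 0.1 } = {}) => {
  let prev = null, mean = 0, variance = 1, warm = false
  return (mag) => {
    let logFlux = prev ? Math.log(spectralFlux(prev, mag) + 1e-9) : null
    prev = Float32Array.from(mag)
    if (logFlux === null) return false                 // first frame: nothing to compare
    if (!warm) { mean = logFlux; warm = true; return false }
    let z = (logFlux - mean) / Math.sqrt(variance + 1e-12)
    // update the exponential moving mean and variance after scoring
    let d = logFlux - mean
    mean += alpha * d
    variance = (1 - alpha) * (variance + alpha * d * d)
    return z > threshold
  }
}

let detect = createTransientDetector({ threshold: 1.5 })
let quiet = new Float32Array(8).fill(0.1)
let loud = new Float32Array(8).fill(1)
let flags = []
for (let i = 0; i < 10; i++) flags.push(detect(quiet))
flags.push(detect(loud)) // sudden broadband rise in energy
```

A higher threshold means fewer frames are flagged, which matches the "higher = fewer resets" note in the parameter table.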

`phaseLock`

Laroche-Dolson peak-locked phase vocoder. Peaks scatter to shifted bins; non-peak bins lock their phase relative to the nearest peak, keeping the vertical phase relationship inside each sinusoidal lobe intact.

import phaseLock from 'pitch-shift/phase-lock.js'

phaseLock(audio, { ratio: 1.5 })

Preserves phase coherence around peaks, partial structure.
Destroys transients (still smeared, less than vocoder), formants.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.000	0.988	1.623	0.991	1.775

Nearly identical to transient on non-percussive material. The 0.006 shift gap is the transient reset cost on synthetic fixtures that have no transients.

Use when: General music — the "try this first" phase vocoder.
Not for: Music with drums (use transient), voice (use formant).
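
The locking step can be illustrated in isolation. The sketch below applies identity phase locking to a single frame without any bin shifting: each bin takes the synthesis phase of its nearest peak plus its original phase offset from that peak. Function names and the `peakSynthPhase` layout (an array indexed by bin, meaningful only at peak bins) are assumptions made for the example.

```javascript
// Local maxima of a magnitude spectrum (strictly greater than both neighbours).
let findPeaks = (mag) => {
  let peaks = []
  for (let k = 1; k < mag.length - 1; k++) {
    if (mag[k] > mag[k - 1] && mag[k] > mag[k + 1]) peaks.push(k)
  }
  return peaks
}

// Every bin copies the phase rotation of its nearest peak, so phase
// differences inside each lobe stay what they were at analysis time.
let lockPhases = (mag, anaPhase, peakSynthPhase) => {
  let peaks = findPeaks(mag)
  let out = Float64Array.from(anaPhase)
  if (!peaks.length) return out
  let j = 0
  for (let k = 0; k < mag.length; k++) {
    // advance to the peak nearest to bin k (ties stay with the lower peak)
    while (j + 1 < peaks.length && Math.abs(peaks[j + 1] - k) < Math.abs(peaks[j] - k)) j++
    let p = peaks[j]
    out[k] = peakSynthPhase[p] + (anaPhase[k] - anaPhase[p])
  }
  return out
}

let mag = [0, 1, 3, 1, 0, 1, 4, 1, 0]                      // peaks at bins 2 and 6
let anaPhase = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
let peakSynthPhase = new Float64Array(9)
peakSynthPhase[2] = 1.0
peakSynthPhase[6] = 2.0
let locked = lockPhases(mag, anaPhase, peakSynthPhase)
```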

`vocoder`

SMB/Bernsee bin-shift. Computes true instantaneous frequency per bin from consecutive-frame phase advance, scatters peaks to shifted bins, accumulates synthesis phase at the shifted frequency.

import vocoder from 'pitch-shift/vocoder.js'

vocoder(audio, { ratio: 1.5 })

Preserves dominant-partial pitch, long-horizon phase per bin.
Destroys transients, vertical phase coherence ("phasiness"), formants.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.000	0.983	1.343	0.922	1.491

Phase coh 0.922 from independent per-bin phase accumulation — no inter-bin locking. Lower shift score than phaseLock because the simpler scatter avoids peak-assignment artifacts on pure tones.

Use when: Simple tonal material, educational baseline.
Not for: Music with percussion, voice.
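
The "true instantaneous frequency from consecutive-frame phase advance" step is the classic phase-vocoder estimate: subtract the phase advance expected for the bin centre, wrap the remainder to the principal value, and convert it to a frequency offset. A minimal sketch, with illustrative names that are not taken from the package:

```javascript
// Wrap a phase value to its principal value around zero.
let wrapPhase = (x) => x - 2 * Math.PI * Math.round(x / (2 * Math.PI))

// Frequency (Hz) of the component in bin k, from the phases of two frames
// that are hopSize samples apart.
let instantaneousFrequency = (k, prevPhase, phase, { frameSize, hopSize, sampleRate }) => {
  let expected = 2 * Math.PI * k * hopSize / frameSize      // advance of the bin centre
  let deviation = wrapPhase(phase - prevPhase - expected)   // what is left over
  let binOffset = deviation * frameSize / (2 * Math.PI * hopSize)
  return (k + binOffset) * sampleRate / frameSize
}

// A 450 Hz sine falls in bin 21 (centre about 452.2 Hz) at these settings.
let opts = { frameSize: 2048, hopSize: 512, sampleRate: 44100 }
let advance = 2 * Math.PI * 450 * opts.hopSize / opts.sampleRate
let f = instantaneousFrequency(21, 0.3, 0.3 + advance, opts)
```

The estimate is only unambiguous while the wrapped deviation stays within ±π, which is one reason phase vocoders use overlapping frames (here hopSize = frameSize/4).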

`formant`

Cepstral envelope preservation wrapping a peak-locked vocoder. Extracts spectral envelope via cepstral liftering from temporally-smoothed magnitude, flattens the spectrum, applies peak-locked pitch shift on the flat residual, re-imposes the original envelope.

import formant from 'pitch-shift/formant.js'

formant(audio, { semitones: 5 })
formant(audio, { ratio: 0.75, envelopeWidth: 16 })

Param	Default	Description
`envelopeWidth`	`max(8, N/64)`	Cepstrum lifter cutoff (quefrency bins)

Preserves formant envelope (absolute Hz), vocal-tract character.
Destroys transients (same as vocoder); risks cepstral ringing on sparse spectra.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.000	0.988	0.921	0.980	1.593

Best formant dist (0.921) by construction — the envelope is explicitly separated and re-applied. Slightly worse shift score than vocoder because the lifter→flatten→re-impose chain introduces spectral rounding.

Use when: Voice shifting without chipmunk / giant artifact.
Not for: Percussion-heavy material (transients smear).
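
Cepstral liftering, the envelope-extraction step described above, can be sketched without an FFT library by using direct cosine sums. This is an illustration only: it is O(N²), omits the temporal smoothing mentioned above, and the function name is hypothetical. `width` plays the role of `envelopeWidth`.

```javascript
// Smooth spectral envelope by low-quefrency liftering of the real cepstrum.
// `mag` is the full N-bin magnitude spectrum (symmetric, as for real input).
let cepstralEnvelope = (mag, width) => {
  let N = mag.length
  let logMag = Array.from(mag, (m) => Math.log(m + 1e-12))

  // real cepstrum: inverse DFT of the log-magnitude (cosine terms only, by symmetry)
  let cep = new Float64Array(N)
  for (let n = 0; n < N; n++) {
    let sum = 0
    for (let k = 0; k < N; k++) sum += logMag[k] * Math.cos(2 * Math.PI * k * n / N)
    cep[n] = sum / N
  }

  // lifter: keep quefrencies below `width` and their mirror images, zero the rest
  for (let n = width; n <= N - width; n++) cep[n] = 0

  // back to the spectral domain and undo the log
  let env = new Float64Array(N)
  for (let k = 0; k < N; k++) {
    let sum = 0
    for (let n = 0; n < N; n++) sum += cep[n] * Math.cos(2 * Math.PI * k * n / N)
    env[k] = Math.exp(sum)
  }
  return env
}

// A spectrum that is pure fine ripple (quefrency 16) flattens completely at width 8.
let rippled = Float64Array.from({ length: 64 }, (_, k) => 1 + 0.3 * Math.cos(Math.PI * k / 2))
let envelope = cepstralEnvelope(rippled, 8)
```

Dividing the magnitude spectrum by this envelope gives the flat residual that is pitch-shifted; multiplying the shifted result by the same envelope restores the formants at their original positions.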

`hpss`

Fitzgerald median-filter harmonic/percussive separation. Time-axis and frequency-axis medians produce soft Wiener masks splitting the spectrogram. Harmonic component is vocoder-shifted; percussive component passes through with original phase.

import hpss from 'pitch-shift/hpss.js'

hpss(audio, { ratio: 1.5 })
hpss(audio, { ratio: 1.5, hpssTimeWidth: 31, hpssFreqWidth: 31 })

Param	Default	Description
`hpssTimeWidth`	`17`	Median window width (frames)
`hpssFreqWidth`	`17`	Median window width (bins)
`hpssPower`	`2`	Soft-mask exponent

Preserves percussive onset locations (unshifted) and harmonic pitch (shifted).
Destroys signal quality at ambiguous mask boundaries (leakage in both directions).

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.052	0.996	1.267	0.922	1.464

Best overall shift score — keeping percussion unshifted sidesteps most artifacts. Alias 0.052 from residual harmonic energy leaking through the percussive mask.

Use when: Mixed music where drums should stay stationary while melody shifts.
Not for: Solo tonal material (unnecessary separation overhead).
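
The median-filter separation can be sketched on a small magnitude spectrogram. The code below is an unoptimised illustration of the idea (median along time for the harmonic estimate, median along frequency for the percussive estimate, then a soft mask with exponent `power`); the function names and the edge handling are assumptions, not the package's implementation.

```javascript
let median = (values) => {
  let s = Array.from(values).sort((a, b) => a - b)
  let mid = s.length >> 1
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2
}

// S[t][k] = magnitude of frame t, bin k. Returns soft masks of the same shape
// whose values sum to 1 in every cell.
let hpssMasks = (S, { timeWidth = 17, freqWidth = 17, power = 2 } = {}) => {
  let T = S.length, K = S[0].length
  let ht = timeWidth >> 1, hf = freqWidth >> 1
  let harmonic = [], percussive = []
  for (let t = 0; t < T; t++) {
    harmonic.push(new Float64Array(K))
    percussive.push(new Float64Array(K))
    for (let k = 0; k < K; k++) {
      // median along time keeps horizontal ridges (sustained partials)
      let alongTime = []
      for (let i = Math.max(0, t - ht); i <= Math.min(T - 1, t + ht); i++) alongTime.push(S[i][k])
      // median along frequency keeps vertical ridges (broadband onsets)
      let alongFreq = []
      for (let j = Math.max(0, k - hf); j <= Math.min(K - 1, k + hf); j++) alongFreq.push(S[t][j])
      let h = median(alongTime) ** power
      let p = median(alongFreq) ** power
      let m = h / (h + p + 1e-12)
      harmonic[t][k] = m
      percussive[t][k] = 1 - m
    }
  }
  return { harmonic, percussive }
}

// Toy spectrogram: a sustained partial in bin 4 and a click in frame 4.
let S = Array.from({ length: 9 }, (_, t) =>
  Float64Array.from({ length: 9 }, (_, k) => (k === 4 || t === 4 ? 1 : 0)))
let masks = hpssMasks(S, { timeWidth: 5, freqWidth: 5 })
```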

`sms`

Spectral Modeling Synthesis. Parabolic-interpolated peak picking builds sinusoidal tracks (freq, mag, phase); each peak's lobe is copied intact to round(f·ratio). Stochastic residual shifts to ratio-scaled bins with analysis phase.

import sms from 'pitch-shift/sms.js'

sms(audio, { ratio: 2 })
sms(audio, { ratio: 1.5, maxTracks: 40 })

Param	Default	Description
`maxTracks`	`Infinity`	Max simultaneous sinusoidal tracks
`minMag`	`1e-4`	Peak detection threshold (linear)

Preserves formant envelope (lobes scale freely with peaks), harmonic structure, tonal clarity.
Destroys transients, noise-like textures (absorbed into residual), polyphony beyond maxTracks.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.002	0.953	2.028	0.922	1.761

Lower attack corr (0.953) because sinusoidal modeling smooths onset transients into the residual. Formant dist 2.028 despite natural lobe scaling — the residual component carries unshifted energy.

Use when: Sustained tonal / harmonic instruments, vowels.
Not for: Percussion, noise-heavy material.
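
Parabolic interpolation refines a peak's position to a fraction of a bin by fitting a parabola through the log-magnitudes of the peak bin and its two neighbours. The sketch below shows the standard three-point formula; the function name is illustrative.

```javascript
// Fit a parabola through logMag[k-1], logMag[k], logMag[k+1] and return the
// fractional bin of its vertex together with the interpolated peak height.
let interpolatePeak = (logMag, k) => {
  let a = logMag[k - 1], b = logMag[k], c = logMag[k + 1]
  let offset = 0.5 * (a - c) / (a - 2 * b + c)   // within ±0.5 for a true local maximum
  return { bin: k + offset, value: b - 0.25 * (a - c) * offset }
}

// Samples of an exact parabola with its vertex at bin 10.3 and height 5.
let logMag = Array.from({ length: 20 }, (_, x) => 5 - 2 * (x - 10.3) ** 2)
let peak = interpolatePeak(logMag, 10)

// Destination bin for a shift by ratio 1.5, following the round(f·ratio) rule above.
let targetBin = Math.round(peak.bin * 1.5)
```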

`paulstretch`

Large-frame (16k) phase randomization. Magnitudes pulled from source bins at k/ratio; phases drawn uniformly from [0, 2π) every frame. Destroys temporal structure by design.

import paulstretch from 'pitch-shift/paulstretch.js'

paulstretch(audio, { ratio: 1.5 })

Preserves long-term magnitude-spectrum statistics.
Destroys phase, transients, rhythm — by design.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.3	0.232	0.954	7.113	—	2.339

Worst shift score (2.339) and formant dist (7.113) because random phases smear spectral energy across the frame — the smear is the aesthetic. Stream-vs-batch decorrelates (—) because random phase is non-deterministic.

Use when: Ambient/drone textures, extreme shift ratios.
Not for: Anything requiring temporal precision.
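
One synthesis frame of this scheme can be sketched as follows. The linear interpolation between neighbouring source bins and the handling of positions beyond the spectrum are assumptions made for the example; the description above only states that magnitudes come from k/ratio and phases are uniform random.

```javascript
// Output bin k takes its magnitude from source position k / ratio and a
// fresh uniform random phase in [0, 2π).
let randomPhaseShift = (mag, ratio, random = Math.random) => {
  let K = mag.length
  let re = new Float64Array(K), im = new Float64Array(K)
  for (let k = 0; k < K; k++) {
    let src = k / ratio
    let i = Math.floor(src), frac = src - i
    if (i >= K) continue   // source position beyond the spectrum: leave silent
    let m = i + 1 < K ? mag[i] * (1 - frac) + mag[i + 1] * frac : mag[i]
    let phase = 2 * Math.PI * random()
    re[k] = m * Math.cos(phase)
    im[k] = m * Math.sin(phase)
  }
  return { re, im }
}

// A single partial in bin 10 moves to bin 20 at ratio 2.
let mag = new Float64Array(32)
mag[10] = 1
let frame = randomPhaseShift(mag, 2)
```

Because `random` is injectable, a seeded generator can be passed in when repeatable output is needed.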

Time domain

`wsola`

WSOLA time-stretch + sinc resample. Searches each grain position ±tolerance samples for maximum cross-correlation with the previous grain's tail, eliminating phase cancellation before resampling to the target pitch.

import wsola from 'pitch-shift/wsola.js'

wsola(audio, { ratio: 0.85 })
wsola(audio, { ratio: 1.5, tolerance: 512 })

Param	Default	Description
`tolerance`	`frameSize/4`	Similarity search radius (±samples)

Preserves local waveform shape, attack envelopes.
Destroys formants (shifted by resample), phase coherence across long spans.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
1.00	0.2	0.005	0.995	2.345	0.866	1.672

f0 err 1.00 Hz from sinc resample quantization (time-domain algorithms round the stretch ratio to grain boundaries). Attack corr 0.995 is the best among time-domain algorithms (tied with granular; only hpss scores higher, at 0.996): the similarity search preserves waveform continuity.

Use when: Speech, low-latency, anywhere the phase vocoder's frame latency is unacceptable.
Not for: Polyphonic music with sustained tones.
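
The similarity search at the heart of WSOLA can be sketched as a brute-force scan. Here `target` stands for the natural continuation of the previous grain, and the score is a normalised cross-correlation; the function name and the exact normalisation are illustrative choices, not the package's code.

```javascript
// Offset in [-tolerance, tolerance] around `position` at which `input`
// best matches `target`.
let findBestOffset = (input, target, position, tolerance) => {
  let best = 0, bestScore = -Infinity
  for (let d = -tolerance; d <= tolerance; d++) {
    let start = position + d
    if (start < 0 || start + target.length > input.length) continue
    let dot = 0, energy = 0
    for (let i = 0; i < target.length; i++) {
      dot += input[start + i] * target[i]
      energy += input[start + i] ** 2
    }
    let score = dot / Math.sqrt(energy + 1e-12)
    if (score > bestScore) { bestScore = score; best = d }
  }
  return best
}

// Sine with a 50-sample period: the segment starting at 300 repeats at 400,
// so a search centred on 413 should step back by 13 samples.
let input = Float64Array.from({ length: 1000 }, (_, i) => Math.sin(2 * Math.PI * i / 50))
let target = input.subarray(300, 364)
let offset = findBestOffset(input, target, 413, 20)
```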

`psola`

PSOLA time-stretch + sinc resample. Autocorrelation detects pitch periods; two-period Hann grains are placed at pitch-synchronous intervals, preserving formants in the stretch stage.

import psola from 'pitch-shift/psola.js'

psola(audio, { ratio: 0.75, sampleRate: 48000 })
psola(audio, { ratio: 1.5, minFreq: 100, maxFreq: 400 })

Param	Default	Description
`sampleRate`	`44100`	For pitch detection range
`minFreq`	`70`	Lowest expected pitch (Hz)
`maxFreq`	`600`	Highest expected pitch (Hz)

Preserves waveform-per-period shape, formants, voiced-speech naturalness.
Destroys polyphony (assumes single pitch contour), unvoiced regions (pitch-mark jitter).

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.66	0.2	0.005	0.941	2.340	0.998	1.767

Best phase coherence (0.998) — pitch-synchronous grains align perfectly with the waveform period. Lower attack corr (0.941) from pitch-mark jitter on non-periodic onsets.

Use when: Monophonic speech, solo voice, single melodic instrument.
Not for: Polyphonic material, chords.
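
Autocorrelation pitch-period detection, restricted to the lag range implied by minFreq and maxFreq, can be sketched as below. This is a bare-bones illustration (no voicing decision, no peak refinement) and the function name is hypothetical.

```javascript
// Pitch period in samples: the lag with the largest autocorrelation inside
// the range sampleRate / maxFreq .. sampleRate / minFreq.
let detectPeriod = (frame, { sampleRate = 44100, minFreq = 70, maxFreq = 600 } = {}) => {
  let minLag = Math.floor(sampleRate / maxFreq)
  let maxLag = Math.min(Math.ceil(sampleRate / minFreq), frame.length - 1)
  let bestLag = 0, bestValue = -Infinity
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0
    for (let i = 0; i + lag < frame.length; i++) sum += frame[i] * frame[i + lag]
    if (sum > bestValue) { bestValue = sum; bestLag = lag }
  }
  return bestLag
}

// 441 Hz at 44.1 kHz has a period of exactly 100 samples.
let frame = Float64Array.from({ length: 2048 }, (_, i) => Math.sin(2 * Math.PI * 441 * i / 44100))
let period = detectPeriod(frame)
let f0 = 44100 / period
```

Narrowing minFreq and maxFreq to the expected voice range shrinks the lag search and reduces octave errors, which is what the `minFreq` and `maxFreq` parameters above are for.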

`ola`

Plain OLA time-stretch + sinc resample. Overlap-add without similarity search — the baseline the others improve on.

import ola from 'pitch-shift/ola.js'

ola(audio, { ratio: 1.5 })

Preserves amplitude envelope.
Destroys pitch accuracy, formants, transients, phase coherence.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
39.59	0.1	0.005	0.977	2.360	0.992	2.050

f0 err 39.59 Hz is the worst by far. Without similarity search, grains land at arbitrary phase offsets, and the resulting destructive interference shifts the perceived pitch. The onset err of 0.388 (see the full quality table) has the same cause.

Use when: Reference baseline, or the simplest possible shift for comparison.
Not for: Anything quality-sensitive.

`granular`

Small-grain (1024) WSOLA time-stretch + sinc resample. Grain-rate artifacts are intentionally prominent — the texture is the point.

import granular from 'pitch-shift/granular.js'

granular(audio, { ratio: 1.3 })

Preserves grain-local timbre, characteristic textural quality.
Destroys pitch accuracy on complex tones, smooth envelopes.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.95	0.2	0.005	0.995	2.796	0.945	1.905

Worst formant dist among time-domain algorithms (2.796) because the small grains create audible spectral ripples.

Use when: Creative/textural effects where grain character is desired.
Not for: Transparent pitch shifting.

`sample`

Playback-rate pitch shift. Hann-windowed sinc interpolation at a fractional read-head stepped by ratio per output sample. No time preservation — higher pitch = shorter clip.

import sample from 'pitch-shift/sample.js'

sample(instrumentBuffer, { semitones: 7 })
sample(audio, { ratio: 2, sincRadius: 16 })

Param	Default	Description
`sincRadius`	`8`	Windowed-sinc half-width (samples)

Preserves waveform identity (literally the same audio, faster/slower), formants — everything scales together.
Destroys time: output duration = input_length / ratio, zero-padded to match API.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
2.50	0.1	0.007	0.951	2.245	0.170	1.655

Phase coh 0.170 because the modulation rate itself shifts with the pitch (a 5 Hz tremolo becomes 7.5 Hz at ratio 1.5). This is correct behavior for a sampler — not an artifact.

Use when: Instrument one-shots, ROM-sample playback, tracker-style.
Not for: Time-preserving pitch shift.
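
Playback-rate shifting with a Hann-windowed sinc interpolator can be sketched as follows. This is a simplified illustration: function names are hypothetical, the output is truncated to the shortened length rather than zero-padded, and the kernel is not widened to lowpass the signal when ratio > 1, so it makes no claim about the package's aliasing behaviour.

```javascript
// Hann-windowed sinc, zero outside |x| < radius.
let kernel = (x, radius) => {
  if (Math.abs(x) >= radius) return 0
  let sinc = x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x)
  return sinc * (0.5 + 0.5 * Math.cos(Math.PI * x / radius))
}

// Read `input` at a fractional position by weighting nearby samples with the kernel.
let readAt = (input, pos, radius) => {
  let centre = Math.floor(pos), sum = 0
  for (let i = centre - radius + 1; i <= centre + radius; i++) {
    if (i >= 0 && i < input.length) sum += input[i] * kernel(pos - i, radius)
  }
  return sum
}

// Step the read head by `ratio` per output sample: higher pitch, shorter clip.
let playAtRate = (input, ratio, radius = 8) => {
  let out = new Float32Array(Math.floor(input.length / ratio))
  for (let n = 0; n < out.length; n++) out[n] = readAt(input, n * ratio, radius)
  return out
}

let input = Float32Array.from({ length: 400 }, (_, i) => Math.sin(2 * Math.PI * i / 100))
let same = playAtRate(input, 1)       // unchanged
let octave = playAtRate(input, 2)     // half as long, every second sample
let fifth = playAtRate(input, 1.5)    // fractional read positions
```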

`hybrid`

Runs phaseLock and wsola in parallel, crossfades sample-by-sample by spectral-flux transient confidence. Tonal regions resolve via the phase vocoder; attacks resolve via WSOLA similarity search.

import hybrid from 'pitch-shift/hybrid.js'

hybrid(audio, { ratio: 1.5 })
hybrid(audio, { ratio: 1.5, hybridThreshold: 0.6 })

Param	Default	Description
`hybridThreshold`	`0.8`	Spectral-flux z-score for full WSOLA blend

Preserves tonal phase coherence + attack shape — simultaneously.
Destroys CPU budget (≈2×), formants.

f0 err	THD%	alias	attack corr	formant dist	phase coh	shift
0.00	0.0	0.000	0.988	2.538	0.879	1.925

Phase coh 0.879 comes from crossfade blending: the detector's confidence curve creates micro-transitions between two engines with different phase trajectories. The effect is most visible on synthetic fixtures that have no transients to trigger the WSOLA path.

Use when: Mixed dynamic material where a single domain compromises the other.
Not for: Pure tonal (just use phaseLock) or pure percussive (just use transient).
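
The per-sample crossfade can be sketched as below. The linear ramp from a z-score of 0 up to `threshold` is an assumption made for illustration; the description above only states that the full WSOLA blend is reached at `hybridThreshold`. All names are hypothetical.

```javascript
// Map a transient z-score to a blend weight in [0, 1]:
// 0 = phase-vocoder output only, 1 = WSOLA output only (at or above threshold).
let blendWeight = (z, threshold = 0.8) => Math.min(1, Math.max(0, z / threshold))

// Crossfade two equal-length renderings sample by sample.
// confidence[i] is the transient z-score at sample i.
let crossfade = (tonal, attack, confidence, threshold = 0.8) => {
  let out = new Float32Array(tonal.length)
  for (let i = 0; i < out.length; i++) {
    let w = blendWeight(confidence[i], threshold)
    out[i] = (1 - w) * tonal[i] + w * attack[i]
  }
  return out
}

let tonal = new Float32Array(4).fill(1)     // stand-in for phaseLock output
let attack = new Float32Array(4).fill(-1)   // stand-in for wsola output
let mixed = crossfade(tonal, attack, [0, 0.4, 0.8, 2])
```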

Full quality table

Algorithm	f0 err	THD%	alias	stream corr	cent err	onset err	attack corr	formant dist	phase coh	shift
`hpss`	0.00	0.0	0.052	1.000	0.007	0.000	0.996	1.267	0.922	1.464
`vocoder`	0.00	0.0	0.000	1.000	0.006	0.000	0.983	1.343	0.922	1.491
`formant`	0.00	0.0	0.000	1.000	0.061	0.000	0.988	0.921	0.980	1.593
`sample`	2.50	0.1	0.007	1.000	0.003	0.000	0.951	2.245	0.170	1.655
`wsola`	1.00	0.2	0.005	1.000	0.003	0.000	0.995	2.345	0.866	1.672
`sms`	0.00	0.0	0.002	1.000	0.001	0.000	0.953	2.028	0.922	1.761
`psola`	0.66	0.2	0.005	1.000	0.003	0.000	0.941	2.340	0.998	1.767
`phaseLock`	0.00	0.0	0.000	1.000	0.012	0.000	0.988	1.623	0.991	1.775
`pitchShift`	0.00	0.0	0.000	1.000	0.012	0.000	0.988	1.619	0.991	1.781
`transient`	0.00	0.0	0.000	1.000	0.012	0.000	0.988	1.619	0.991	1.781
`granular`	0.95	0.2	0.005	1.000	0.019	0.000	0.995	2.796	0.945	1.905
`hybrid`	0.00	0.0	0.000	1.000	0.004	0.000	0.988	2.538	0.879	1.925
`ola`	39.59	0.1	0.005	1.000	0.042	0.388	0.977	2.360	0.992	2.050
`paulstretch`	0.00	0.3	0.232	—	0.005	0.000	0.954	7.113	—	2.339

Column definitions

f0 err (Hz) — pitch accuracy shifting 440→660 Hz sine.
THD% — harmonic distortion on shifted pure sine.
alias — energy above Nyquist when shifting 14 kHz ×2.
stream corr — streaming vs batch correlation. — = decorrelates by design.
cent err — spectral centroid ratio error on a 3-partial chord.
onset err — impulse-train period error after shift.
attack corr — plucked-string attack envelope correlation.
formant dist — cepstral envelope distance on synthetic vowel. Lower = formants preserved.
phase coh — AM-envelope coherence on 5 Hz tremolo. — for paulstretch (non-deterministic).
shift — log-magnitude distance to canonical shifted reference, averaged over four fixtures. Lower is better; the full quality table is sorted by this column, best first.

Variable pitch

Frequency-domain algorithms + sample accept time-varying ratio — a function (t) => ratio or Float32Array. Time-domain algorithms (ola, wsola, psola, granular, hybrid) apply a single global ratio.

// Vibrato: ±10% at 5 Hz
let vibrato = phaseLock(audio, {
  ratio: (t) => 1 + 0.1 * Math.sin(2 * Math.PI * 5 * t),
  sampleRate: 44100,
})

Pitch correction

Combine with a pitch detector: detect per-frame f0, snap to target scale, pass as ratio function. Use formant for natural voice, phaseLock for hard-tune effect, sms for harmonic instruments.

import { yin } from 'pitch-detection'
import { formant } from 'pitch-shift'

// Detect f0 once per hop over 2048-sample analysis windows
let hop = 512, sr = 44100
let pitchFrames = []
for (let i = 0; i + 2048 <= audio.length; i += hop) {
  let r = yin(audio.subarray(i, i + 2048), { fs: sr })
  pitchFrames.push(r ? { freq: r.freq, clarity: r.clarity } : null)
}

// C major scale, octave 4 only (Hz); add more octaves for wider-ranging input
let scale = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88]
// Nearest scale note, measured as log-frequency (musical) distance
let snap = (f) => scale.reduce((a, b) =>
  Math.abs(Math.log2(b / f)) < Math.abs(Math.log2(a / f)) ? b : a
)

let corrected = formant(audio, {
  ratio: (t) => {
    // Look up the analysis frame nearest to time t (seconds)
    let p = pitchFrames[Math.min(Math.round(t * sr / hop), pitchFrames.length - 1)]
    // Leave unvoiced or low-confidence frames unshifted
    return (!p || p.clarity < 0.5) ? 1 : snap(p.freq) / p.freq
  },
  sampleRate: sr,
})
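
If the target is the full chromatic scale rather than a fixed list of notes, snapping can be done in closed form and works in any octave. A small sketch, assuming 12-tone equal temperament with A4 = 440 Hz; the helper names are illustrative and not part of this package.

```javascript
// Nearest equal-tempered pitch to f (Hz), relative to the reference a4.
let snapChromatic = (f, a4 = 440) => a4 * 2 ** (Math.round(12 * Math.log2(f / a4)) / 12)

// Correction ratio for one detected frame, usable as the return value of a ratio function.
let correctionRatio = (f) => snapChromatic(f) / f

let a4 = snapChromatic(445)   // 440
let c4 = snapChromatic(260)   // about 261.63
let c3 = snapChromatic(130)   // about 130.81
```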

Quality Tools

npm test          # correctness
npm run quality   # measured metrics
npm run bench     # performance

Dependencies

time-stretch — Time-domain stretchers (WSOLA, PSOLA)
fourier-transform — FFT
window-function — Hann windowing

Migration from v0.0.0

The pitch-shift package name was previously held by mikolalysenko/pitch-shift (2013, v0.0.0), a single WSOLA/TD-PSOLA implementation. The same approach is available here as wsola or psola, with batch, streaming, and multi-channel support.

// v0.0.0 (old)
var shifter = require('pitch-shift')(onData, t => ratio, { frameSize: 2048 })
shifter.feed(float32Array)

// v1 (this package)
import { wsola } from 'pitch-shift'
let write = wsola({ ratio })
let out = write(float32Array)
let tail = write()  // flush
