The PacBio PureTarget Carrier Pipeline is a WDL-based workflow designed to genotype tandem repeat regions and homologous genes using PacBio PureTarget data. It orchestrates several established PacBio tools in a containerized environment to deliver reproducible, per-sample and multi-sample results.
Current version: 3.2.1 (released 2026-02-25).
For a complete list of changes, see the changelog or the git history.
For frequently asked questions, please refer to the FAQ section. For questions about running PTCP on DNAnexus, please see the PTCP on DNAnexus page.
If you have questions or issues running PTCP on your local system, you can contact PacBio support at support@pacb.com. Please ensure PureTarget Carrier Pipeline is in the subject line.
- Workflow overview
- Running PTCP
- 2.1. PTCP on HPC
- 2.2. PTCP on DNAnexus
- Inputs and configuration
- Workflow outputs
- DISCLAIMER
Figure: PTCP workflow overview. The pipeline processes genomic data through modules running in parallel per sample, followed by aggregate quality control and correction steps.
Upon invocation, PTCP processes each sample independently:
- Alignment (pbmm2): Aligns HiFi (and optional fail) reads to the reference genome.
- Tandem repeat genotyping (TRGT): Generates per-sample VCFs with genotypes for all targeted regions, spanning BAMs containing the reads used for genotyping, and per-locus plots (motif and waterfall). It also extracts the reads (including optional fail reads) that overlap the specified tandem repeat loci.
- Gene phasing & analysis (Paraphase): Phases reads within configured gene families, estimates copy number, calls small variants for each haplotype, and optionally annotates them with known variants.
- Structural variation calling (Sawfish): Extracts and realigns the aligned reads for configured genes, calls large structural variants, and reports them in per-sample VCFs.
- QC reporting (ptcp-qc): Aggregates coverage, mapping quality, and genotyping metrics into both sample-level and cohort-level JSON reports for comprehensive quality control.
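The shape of this flow (independent per-sample work, then one cohort-level aggregation) can be sketched in Python. This is only an illustration under stated assumptions: `process_sample`, `aggregate_qc`, and `run_cohort` are hypothetical names, no PacBio tool is actually invoked, and the stage order within a sample is simplified. The real pipeline is defined in WDL and its dependency graph may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Stage labels mirror the list above; they are placeholders, not PTCP code.
PER_SAMPLE_STAGES = ["pbmm2", "trgt", "paraphase", "sawfish"]


def process_sample(sample_id):
    """Record the per-sample stages and return a sample-level report."""
    completed = []
    for stage in PER_SAMPLE_STAGES:
        # A real workflow would launch the tool here; this sketch only logs the step.
        completed.append(stage)
    return {"sample": sample_id, "stages": completed}


def aggregate_qc(sample_reports):
    """Combine sample-level reports into one cohort-level summary."""
    return {
        "n_samples": len(sample_reports),
        "samples": sorted(report["sample"] for report in sample_reports),
    }


def run_cohort(sample_ids):
    """Process samples concurrently, then aggregate once over all of them."""
    # Samples do not depend on each other, so they can run side by side.
    with ThreadPoolExecutor(max_workers=4) as pool:
        reports = list(pool.map(process_sample, sample_ids))
    return reports, aggregate_qc(reports)


if __name__ == "__main__":
    _, cohort = run_cohort(["sampleA", "sampleB"])
    print(cohort)
```

The point of the sketch is the separation of concerns: a failure or rerun for one sample does not touch the others, while the cohort-level step only starts once every sample-level report exists.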
PTCP is available to run locally on a high-performance computing (HPC) system or in the cloud with DNAnexus. Details for both options are provided below:
Running PTCP on an HPC system is possible with some configuration. For instructions on setting up the pipeline on your HPC system, please see the PTCP on HPC page.
PTCP has been integrated into the DNAnexus platform. For instructions on getting set up there and running the pipeline, please see the PTCP on DNAnexus page.
PTCP requires six primary input types, all specified in a JSON file:
1. PacBio sequencing data (`.bam`)
2. Sample sheet (`.csv`)
3. Reference genome (`.fa` and `.fai`)
4. Regions and annotations (`.bed` and `.vcf`)
5. Configuration file (`.yaml`)
6. PTCP dependencies image (a local `.sif` or Docker image)
In practice, only the sequencing data and sample sheet change between runs. The other inputs (3–6) are usually set up once when the pipeline is installed and reused for later runs.
Details on each of these input types and how to generate them can be found on the Input Files page.
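Because all inputs are collected in one JSON file, a quick pre-flight check can catch a wrong file type before a run is submitted. The sketch below is a minimal example, not part of PTCP: the key names in `EXPECTED_SUFFIXES` are hypothetical (the real inputs JSON may use different keys; see the Input Files page), and the extensions are taken only from the list above. The dependencies image is not checked, since it may be a Docker image reference rather than a file path.

```python
import json

# Hypothetical key names; the real PTCP inputs JSON may name these differently.
EXPECTED_SUFFIXES = {
    "reads": (".bam",),
    "sample_sheet": (".csv",),
    "reference_fasta": (".fa",),
    "reference_index": (".fai",),
    "regions_bed": (".bed",),
    "annotations_vcf": (".vcf",),
    "config": (".yaml",),
}


def check_inputs(inputs):
    """Return a list of problems found in an inputs mapping (empty if none)."""
    problems = []
    for key, suffixes in EXPECTED_SUFFIXES.items():
        value = inputs.get(key)
        if value is None:
            problems.append(f"missing: {key}")
        elif not str(value).endswith(suffixes):
            problems.append(f"unexpected extension for {key}: {value}")
    return problems


def check_inputs_file(path):
    """Load an inputs JSON file from disk and check it."""
    with open(path) as handle:
        return check_inputs(json.load(handle))


if __name__ == "__main__":
    example = {
        "reads": "run1.hifi_reads.bam",
        "sample_sheet": "samples.csv",
        "reference_fasta": "ref.fa",
        "reference_index": "ref.fa.fai",
        "regions_bed": "targets.bed",
        "annotations_vcf": "known_variants.vcf",
        "config": "ptcp.yaml",
    }
    print(check_inputs(example))
```

A check like this only validates file names. It does not confirm that the files exist, are indexed, or match the reference build.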
PTCP generates many output files per sample in the selected output folder. More details on the output files and their formats can be found on the Output Files page.
THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
