
Drawing PRC curves

Jochen Weile edited this page Jun 5, 2023 · 1 revision

Drawing PRC curves for VE maps

Precision-Recall Curves (PRCs) describe the performance of a quantitative predictor (such as a VE map or a computational tool such as VARITY) against a gold standard (such as a set of known pathogenic and benign variants). For every score threshold, a PRC plots the predictor's precision (i.e. the fraction of variants flagged as pathogenic that are truly pathogenic) against its recall (i.e. sensitivity, the fraction of all pathogenic variants that are flagged) at that same threshold.
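To make the two definitions concrete, here is a minimal Python sketch that computes precision and recall at a single threshold. tileseqMave itself is written in R, and Python is used here only for illustration. The sketch assumes that higher scores mean "more likely pathogenic" and that every variant scoring at or above the threshold is flagged. The scores, labels and the helper precision_recall_at are made up for this example.

```python
# Made-up toy data; higher score = more likely pathogenic (assumed orientation).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]  # True = pathogenic

def precision_recall_at(threshold, scores, labels):
    """Flag every variant scoring >= threshold, then compare with the labels."""
    flagged = [lab for s, lab in zip(scores, labels) if s >= threshold]
    true_pos = sum(flagged)   # flagged variants that are actually pathogenic
    total_pos = sum(labels)   # all pathogenic variants in the gold standard
    # Precision is undefined when nothing is flagged; return None in that case.
    precision = true_pos / len(flagged) if flagged else None
    recall = true_pos / total_pos
    return precision, recall

print(precision_recall_at(0.75, scores, labels))
```

At a threshold of 0.75, three variants are flagged and two of them are pathogenic, so precision is 2/3 and recall is 2/4 = 0.5.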

To condense a predictor's performance into a single number, we can use either the total area under the curve (AUPRC) or the highest recall achieved at any threshold where precision is at least 90% (R90P).
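A rough Python sketch of how the full curve and both summary numbers could be computed on small made-up data, sweeping the threshold over every observed score (higher score is again assumed to mean more pathogenic; all function names are hypothetical). The area is taken with the trapezoidal rule, which is only one of several interpolation choices for PR curves, so the AUPRC that yogiroc reports need not match this sketch exactly.

```python
# Made-up toy data; higher score = more likely pathogenic (assumed orientation).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]  # True = pathogenic

def pr_curve(scores, labels):
    """(recall, precision) at each distinct threshold, strictest first."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        flagged = [lab for s, lab in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        points.append((tp / total_pos, tp / len(flagged)))
    return points

def auprc(points):
    """Trapezoidal area under the curve, anchored at recall 0 with the
    precision of the strictest threshold."""
    area = 0.0
    prev_r, prev_p = 0.0, points[0][1]
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area

def r90p(points, min_precision=0.9):
    """Highest recall among thresholds with precision >= min_precision."""
    recalls = [r for r, p in points if p >= min_precision]
    return max(recalls) if recalls else 0.0

points = pr_curve(scores, labels)
print(round(auprc(points), 3), r90p(points))
```

For this toy data only the two strictest thresholds reach 90% precision, so R90P is 0.5, and the trapezoidal AUPRC is about 0.835.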

Installing yogiroc

Although tileseqMave comes with an integrated tool for drawing PRCs, that tool depends on an additional R library, yogiroc, which must be installed separately.

#Open an interactive R session
R
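#install the remotes package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
#install yogiroc from GitHub
remotes::install_github("jweile/yogiroc")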
#close R
q()

Generating a set of reference variants

You can use tileseqMave to automatically generate a reference set from ClinVar variants and gnomAD controls. For example, to generate a reference set for BRCA1 with a minimum gnomAD allele frequency cutoff of $10^{-5}$ and a minimum ClinVar review status of 1 star:

tsm referenceSets BRCA1 --mafMin 1e-5 --starsMin 1 

The output will automatically be written to a file called BRCA1_refVars.csv.
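The exact selection logic lives inside tileseqMave, but the role of the two cutoffs can be sketched in Python as follows. This sketch assumes that ClinVar variants below the star cutoff are dropped, that the remaining pathogenic/likely pathogenic and benign/likely benign calls are kept, and that gnomAD variants at or above the allele-frequency cutoff serve as benign controls. The Variant record, its fields and reference_label are hypothetical and do not describe the column layout of BRCA1_refVars.csv.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    # Hypothetical record; not the column layout of BRCA1_refVars.csv.
    name: str
    source: str         # "clinvar" or "gnomad"
    significance: str   # ClinVar call, e.g. "Pathogenic"; "" for gnomAD
    stars: int          # ClinVar review stars; 0 for gnomAD
    allele_freq: float  # gnomAD allele frequency; 0.0 if unknown

PATHOGENIC = {"Pathogenic", "Likely pathogenic"}
BENIGN = {"Benign", "Likely benign"}

def reference_label(v: Variant, maf_min: float = 1e-5, stars_min: int = 1) -> Optional[str]:
    """Assumed rules: return 'pathogenic', 'benign', or None if excluded."""
    if v.source == "clinvar":
        if v.stars < stars_min:
            return None  # review status too weak (cf. --starsMin)
        if v.significance in PATHOGENIC:
            return "pathogenic"
        if v.significance in BENIGN:
            return "benign"
        return None  # e.g. uncertain significance
    if v.source == "gnomad" and v.allele_freq >= maf_min:
        return "benign"  # frequent enough to act as a control (cf. --mafMin)
    return None

variants = [
    Variant("v1", "clinvar", "Pathogenic", 2, 0.0),
    Variant("v2", "clinvar", "Pathogenic", 0, 0.0),
    Variant("v3", "gnomad", "", 0, 3e-5),
    Variant("v4", "gnomad", "", 0, 2e-6),
]
print({v.name: reference_label(v) for v in variants})
# {'v1': 'pathogenic', 'v2': None, 'v3': 'benign', 'v4': None}
```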

If the target gene is associated with multiple traits on ClinVar, the script will print a warning. To restrict the reference set to a specific trait, use the --trait option. It accepts a regular expression, so a trait that is recorded under several different names can still be matched.
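As an illustration of regex-based trait matching, the Python snippet below uses one pattern to select two differently worded BRCA1-related trait names while excluding a third. The trait strings are ClinVar-style examples, not output from the tool. tileseqMave is an R package and may apply its own regex dialect and case handling, so treat this as a sketch of the idea rather than of the tool's behaviour. The simple constructs used here (a character class and .*) mean the same thing in R and Python.

```python
import re

# Illustrative ClinVar-style trait names; check the names listed in your warning.
traits = [
    "Hereditary breast and ovarian cancer syndrome",
    "Breast-ovarian cancer, familial 1",
    "Fanconi anemia, complementation group S",
]

# "breast" with either capitalisation of the first letter, then anything, then "ovarian".
pattern = re.compile(r"[Bb]reast.*ovarian")
selected = [t for t in traits if pattern.search(t)]
print(selected)
```

The same pattern could then be passed to the documented option, quoted to protect it from the shell, as --trait "[Bb]reast.*ovarian".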

Drawing the PRC curve

Once the reference set has been generated, you can draw the PRC curve. The map(s) and any additional predictors you want to plot must be provided in MaveDB format.

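For example, to compare a BRCA1 VE map against PROVEAN predictions (PROVEAN scores decrease towards pathogenicity, hence the order d):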

tsm drawPRC BRCA1_VEmap.csv BRCA1_refVars.csv --predictors BRCA1_PROVEAN_predictions.csv --predictorNames PROVEAN --predictorOrders d
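The a/d letters only tell the tool which direction counts as "more pathogenic". One simple way to handle this, shown here as a hypothetical Python helper rather than tileseqMave's actual code, is to negate descending scores so that a higher value always means more pathogenic before the curve is computed. The PROVEAN-like values are invented.

```python
def orient_scores(scores, order):
    """Return scores oriented so that higher always means 'more pathogenic'.

    order: 'a' if scores ascend towards pathogenicity, 'd' if they descend
    (the same letters as --predictorOrders).
    """
    if order == "a":
        return list(scores)
    if order == "d":
        return [-s for s in scores]
    raise ValueError(f"order must be 'a' or 'd', got {order!r}")

# Invented PROVEAN-like scores: more negative = more deleterious, hence 'd'.
provean = [-6.1, -3.0, -0.4, 1.2]
print(orient_scores(provean, "d"))  # [6.1, 3.0, 0.4, -1.2]
```

Negation keeps the ranking of the variants intact while reversing its direction, which is all a precision-recall curve depends on.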

Full usage information:

usage: tsm drawPRC [--] [--help] [--labelScores] 
       [--predictors PREDICTORS] [--predictorNames PREDICTORNAMES]
       [--predictorOrders PREDICTORORDERS] [--outfile OUTFILE]
       [--posRanges POSRANGES] [--logfile LOGFILE] map reference

Draw a PRC curve against a reference set

positional arguments:
  map                VE map file in MaveDB format
  reference          reference variant set (from referenceSets.R)

flags:
  -h, --help         show this help message and exit
  -l, --labelScores  Draw score labels along plot

optional arguments:
  -p, --predictors   comma-separated list of files with other
                     predictors (in MaveDB format)
  --predictorNames   comma-separated list of names for the above
                     predictors
  --predictorOrders  comma-separated list of letters 'a' or 'd' for 
                     whether predictor scores are (a)scending
                     towards pathogenicity or (d)escending towards
                     pathogenicity.
  -o, --outfile      The desired prefix for the output file name.
  --posRanges        Positional ranges within the map to be plotted
                     separately. E.g '1-24,25-66,67-
  --logfile          The desired log file location. [default: prc.log]
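The --posRanges option splits the map into positional segments that are plotted separately. The Python sketch below shows one plausible interpretation: parse a comma-separated list of start-end pairs and group variant positions by the segment they fall into. Inclusive bounds and the requirement that both ends of every range are given are assumptions of this sketch, and the helper names and the example range string are hypothetical.

```python
def parse_pos_ranges(spec):
    """Parse a string like '1-50,51-120' into a list of (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((int(start), int(end)))
    return ranges

def split_by_ranges(positions, ranges):
    """Group positions by the range they fall in (bounds inclusive)."""
    return {r: [p for p in positions if r[0] <= p <= r[1]] for r in ranges}

ranges = parse_pos_ranges("1-50,51-120")
groups = split_by_ranges([3, 48, 51, 99, 130], ranges)
print(groups)  # {(1, 50): [3, 48], (51, 120): [51, 99]}
```

Position 130 lies outside every range and is therefore left out of all groups.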