Drawing PRC curves
Precision-recall curves (PRCs) describe the performance of a quantitative predictor (such as a VE map or a computational tool such as VARITY) against a gold standard (such as a set of known pathogenic and benign variants). A PRC plots the predictor's precision (i.e. the fraction of variants flagged as positive that are truly pathogenic) at a given score threshold against its recall (i.e. sensitivity, the fraction of pathogenic variants that were flagged as positive) at the same threshold, which shows the trade-off between the two.
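To make the two definitions concrete, here is a minimal Python sketch that computes precision and recall at a single threshold. It is not part of tileseqMave or yogiroc; the function name `precision_recall`, the toy scores, and the assumption that lower scores indicate pathogenicity are all illustrative.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall at one score threshold.

    Assumption (illustrative): a variant is called positive when its
    score is at or below the threshold, i.e. low scores are damaging.
    `labels` holds True for pathogenic and False for benign variants.
    """
    calls = [s <= threshold for s in scores]
    tp = sum(1 for c, l in zip(calls, labels) if c and l)        # true positives
    fp = sum(1 for c, l in zip(calls, labels) if c and not l)    # false positives
    fn = sum(1 for c, l in zip(calls, labels) if not c and l)    # missed pathogenic
    # Convention chosen here: precision is 1.0 when nothing is called positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: six variants, three of them pathogenic.
scores = [0.05, 0.10, 0.40, 0.55, 0.90, 1.00]
labels = [True, True, False, True, False, False]

print(precision_recall(scores, labels, 0.6))  # (0.75, 1.0)
```

Sweeping the threshold over all observed scores and plotting the resulting (recall, precision) pairs gives the curve.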
To condense the predictor's performance into a single number, we can look at the total area under the curve (called AUPRC) or at the recall at whichever threshold brings the precision to at least 90% (R90P).
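The two summary statistics can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by yogiroc: it assumes lower scores indicate pathogenicity, uses plain trapezoidal integration, and extends the first observed precision back to recall 0. All function names and the toy data are hypothetical.

```python
def pr_points(scores, labels):
    """(recall, precision) pairs, one per distinct threshold, in threshold order."""
    n_pos = sum(labels)
    points = []
    for t in sorted(set(scores)):
        # Variants scoring at or below t are called positive (assumption).
        tp = sum(1 for s, l in zip(scores, labels) if s <= t and l)
        called = sum(1 for s in scores if s <= t)
        points.append((tp / n_pos, tp / called))
    return points


def auprc(points):
    """Trapezoidal area under the curve, starting at recall 0."""
    pts = [(0.0, points[0][1])] + list(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


def r90p(points, min_precision=0.9):
    """Highest recall reached while precision is at least `min_precision`."""
    ok = [r for r, p in points if p >= min_precision]
    return max(ok) if ok else 0.0


# Toy data: six variants, three of them pathogenic.
scores = [0.05, 0.10, 0.40, 0.55, 0.90, 1.00]
labels = [True, True, False, True, False, False]

points = pr_points(scores, labels)
print(round(auprc(points), 4))  # 0.9028
print(round(r90p(points), 4))   # 0.6667
```

Real implementations often differ in how they interpolate between points and in how they handle imbalanced reference sets, so the numbers reported by the tool need not match this sketch exactly.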
Even though tileseqMave comes with an integrated tool for drawing PRCs, it requires the installation of an additional library, yogiroc:
#Open an interactive R session
R
#Install the yogiroc package from GitHub (requires the remotes package)
remotes::install_github("jweile/yogiroc")
#close R
q()
You can use tileseqMave to automatically generate a reference set from ClinVar and gnomAD controls for you. For example, the following command generates a reference set for BRCA1 with a minimum gnomAD allele frequency cutoff (--mafMin) of 1e-5:
tsm referenceSets BRCA1 --mafMin 1e-5 --starsMin 1
The output will automatically be written to a file called BRCA1_refVars.csv.
If the target gene is associated with multiple different traits on ClinVar, the script will show a warning message about this. To filter down to a specific trait, you can use the --trait option, which also accepts regular expressions in case the same trait is represented by different names.
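As an illustration of why a regular expression helps here, the Python sketch below matches one pattern against several differently worded trait names. The trait names and the pattern are illustrative only, and the matching rules are an assumption: tileseqMave is written in R, so details such as case sensitivity may differ from what is shown.

```python
import re

# Illustrative trait names; the same condition appears under two wordings.
trait_names = [
    "Hereditary breast and ovarian cancer syndrome",
    "Breast-ovarian cancer, familial 1",
    "Fanconi anemia, complementation group S",
]

# One pattern that covers both wordings (case-insensitive here by choice).
pattern = re.compile(r"breast.*ovarian", re.IGNORECASE)

matching = [t for t in trait_names if pattern.search(t)]
for name in matching:
    print(name)
```

A pattern of this kind would then be passed as the value of the --trait option; quoting it on the command line keeps the shell from interpreting characters such as `*`.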
After the reference set has been generated, you can draw the PRC. The map(s) and predictors you want to plot must be provided in MaveDB format.
For example:
tsm drawPRC BRCA1_VEmap.csv BRCA1_refVars.csv --predictors BRCA1_PROVEAN_predictions.csv --predictorNames PROVEAN --predictorOrders d
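The --predictorOrders value tells the tool which direction of a predictor's scale points towards pathogenicity. PROVEAN is given as `d` above because more negative PROVEAN scores indicate more damaging variants. The Python sketch below shows one way such a flag could be applied, by flipping the sign of ascending predictors so that a single "lower means more pathogenic" rule covers both cases. This is an assumption about the general idea, not tileseqMave's actual implementation; `orient_scores` and the example scores are hypothetical.

```python
def orient_scores(scores, order):
    """Return scores oriented so that lower values mean more pathogenic.

    order: 'd' if scores descend towards pathogenicity (kept as-is),
           'a' if scores ascend towards pathogenicity (sign flipped).
    """
    if order == "d":
        return list(scores)
    if order == "a":
        return [-s for s in scores]
    raise ValueError("order must be 'a' or 'd'")


def rank_most_pathogenic_first(scores, order):
    """Indices of variants, from most to least pathogenic-looking."""
    oriented = orient_scores(scores, order)
    return sorted(range(len(oriented)), key=lambda i: oriented[i])


provean_like = [-6.2, 0.3, -2.9]    # descending predictor ('d')
ascending_like = [0.9, 0.1, 0.7]    # hypothetical ascending predictor ('a')

print(rank_most_pathogenic_first(provean_like, "d"))    # [0, 2, 1]
print(rank_most_pathogenic_first(ascending_like, "a"))  # [0, 2, 1]
```

When several predictors are supplied, the lists passed to --predictors, --predictorNames and --predictorOrders are comma-separated, so their entries presumably need to appear in the same order.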
Full usage information:
usage: tsm drawPRC [--] [--help] [--labelScores]
       [--predictors PREDICTORS] [--predictorNames PREDICTORNAMES]
       [--predictorOrders PREDICTORORDERS] [--outfile OUTFILE]
       [--posRanges POSRANGES] [--logfile LOGFILE] map reference

Draw a PRC curve against a reference set

positional arguments:
  map                  VE map file in MaveDB format
  reference            reference variant set (from referenceSets.R)

flags:
  -h, --help           show this help message and exit
  -l, --labelScores    Draw score labels along plot

optional arguments:
  -p, --predictors     comma-separated list of files with other
                       predictors (in MaveDB format)
  --predictorNames     comma-separated list of names for the above
                       predictors
  --predictorOrders    comma-separated list letters 'a' or 'd' for
                       whether predictor scores are (a)scending
                       towards pathogenicity or (d)escending towards
                       pathogenicity.
  -o, --outfile        The desired prefix for the output file name.
  --posRanges          Positional ranges within the map to be plotted
                       separately. E.g '1-24,25-66,67-
  --logfile            The desired log file location. [default: prc.log]
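The --posRanges value is a comma-separated list of `start-end` ranges. The Python sketch below parses such a string and finds the range a given position falls into. It is only an illustration of the format: the function names are hypothetical, and treating both ends of a range as inclusive is an assumption rather than documented behaviour.

```python
def parse_pos_ranges(spec):
    """Parse a string such as '1-24,25-66' into a list of (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def range_for_position(pos, ranges):
    """Return the (start, end) range containing `pos`, or None if there is none.

    Assumption: both range ends are inclusive.
    """
    for start, end in ranges:
        if start <= pos <= end:
            return (start, end)
    return None


ranges = parse_pos_ranges("1-24,25-66")
print(ranges)                          # [(1, 24), (25, 66)]
print(range_for_position(30, ranges))  # (25, 66)
print(range_for_position(70, ranges))  # None
```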