Update case studies for 1.9

kishwarshafin · copybara-github · commit f32edca605bb · 2025-05-12T16:37:13.000-07:00
PiperOrigin-RevId: 757960591
diff --git a/docs/deepvariant-fast-pipeline-case-study.md b/docs/deepvariant-fast-pipeline-case-study.md
@@ -48,13 +48,13 @@ Please refer to the following documentation for more details.
 [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
 
 For this case study we used the
-[script](https://github.com/google/deepvariant/blob/r1.8.0/scripts/install_nvidia_docker.sh)
+[script](https://github.com/google/deepvariant/blob/r1.9/scripts/install_nvidia_docker.sh)
-that automates the CUDA and container tools kit installation.
+that automates the CUDA and NVIDIA Container Toolkit installation.
 
 Please note that the script takes about 30 minutes to run.
 
 ```bash
-wget https://raw.githubusercontent.com/google/deepvariant/refs/heads/r1.8.0/scripts/install_nvidia_docker.sh
+wget https://raw.githubusercontent.com/google/deepvariant/refs/heads/r1.9/scripts/install_nvidia_docker.sh
 chmod +x install_nvidia_docker.sh
 ./install_nvidia_docker.sh
 ```
@@ -64,7 +64,7 @@ chmod +x install_nvidia_docker.sh
 ### Get DeepVariant Docker image
 
 ```bash
-BIN_VERSION="1.8.0"
+BIN_VERSION="1.9.0"
 sudo docker pull google/deepvariant:"${BIN_VERSION}-gpu"
 ```
 
@@ -217,9 +217,9 @@ variants.gvcf.chr20.vcf
-With the same settings the pipeline takes approximately 10 minutes.
+With the same settings the pipeline takes approximately 13 minutes.
 
 ```
-real    8m15.252s
-user    0m0.007s
-sys     0m0.035s
+real    12m45.795s
+user    0m0.018s
+sys     0m0.038s
 ```
 
 ## Benchmark output
@@ -256,8 +256,8 @@ time sudo docker run \
 ```
 Benchmarking Summary:
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
-INDEL    ALL        10628     10543        85        22403        74      11375     40     29       0.992002          0.993290        0.507744         0.992646                     NaN                     NaN                   1.748961                   2.138647
-INDEL   PASS        10628     10543        85        22403        74      11375     40     29       0.992002          0.993290        0.507744         0.992646                     NaN                     NaN                   1.748961                   2.138647
-  SNP    ALL        70166     70101        65       105602        71      35342     12     12       0.999074          0.998989        0.334672         0.999032                2.296566                1.713281                   1.883951                   1.503192
-  SNP   PASS        70166     70101        65       105602        71      35342     12     12       0.999074          0.998989        0.334672         0.999032                2.296566                1.713281                   1.883951                   1.503192
+INDEL    ALL        10628     10553        75        22560        72      11522     37     28       0.992943          0.993477        0.510727         0.993210                     NaN                     NaN                   1.748961                   2.180292
+INDEL   PASS        10628     10553        75        22560        72      11522     37     28       0.992943          0.993477        0.510727         0.993210                     NaN                     NaN                   1.748961                   2.180292
+  SNP    ALL        70166     70106        60       102415        69      32148      9      9       0.999145          0.999018        0.313899         0.999081                2.296566                 1.72911                   1.883951                   1.442237
+  SNP   PASS        70166     70106        60       102415        69      32148      9      9       0.999145          0.999018        0.313899         0.999081                2.296566                 1.72911                   1.883951                   1.442237
 ```
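The `METRIC.F1_Score` column in the hap.py summary is the harmonic mean of `METRIC.Recall` and `METRIC.Precision`, F1 = 2PR / (P + R). The following is a minimal sketch for sanity-checking the updated rows: it recomputes F1 from the two printed columns with `awk`. The function name `f1` is hypothetical and is not part of DeepVariant or hap.py. Because the inputs are already rounded to six decimals, the result may occasionally differ from hap.py's value in the last digit.

```bash
# f1 RECALL PRECISION (hypothetical helper): harmonic mean of recall and
# precision, printed to six decimals like the hap.py summary.
f1() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.6f\n", 2 * r * p / (r + p) }'
}

# Recall and precision from the updated 1.9.0 rows above.
f1 0.992943 0.993477   # INDEL
f1 0.999145 0.999018   # SNP
```

For these rows the helper prints `0.993210` (INDEL) and `0.999081` (SNP), which matches the `METRIC.F1_Score` column.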
diff --git a/docs/deepvariant-training-case-study.md b/docs/deepvariant-training-case-study.md
@@ -534,7 +534,7 @@ sudo docker run --gpus 1 \
   --disable_small_model
 ```
 
-Starting in v1.8.0, by default we use a small model to classify some
+By default, we use a small model to classify some
 candidates. In this example, we set `--disable_small_model` so
-that small model is disabled. This allows us to run all examples
+that the small model is disabled. This allows us to run all examples
 through the model we just trained.
diff --git a/docs/deepvariant-vg-case-study.md b/docs/deepvariant-vg-case-study.md
@@ -172,7 +172,6 @@ Get the same reference we used for
 
 ```bash
 FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids
-
 curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > ${DATA_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
 samtools faidx ${DATA_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
 ```
@@ -184,7 +183,7 @@ And then, run DeepVariant.
 [DeepVariant Case Study](deepvariant-case-study.md).)
 
 ```bash
-BIN_VERSION="1.8.0"
+BIN_VERSION="1.9.0"
 
 sudo docker pull google/deepvariant:"${BIN_VERSION}"
 
@@ -204,9 +203,9 @@ time sudo docker run \
 
 Stage                            | Time (minutes)
 -------------------------------- | -----------------
-make_examples                    | 59m19.845s
-call_variants                    | 49m41.643s
-postprocess_variants (with gVCF) | 7m46.195s
+make_examples                    | 81m11.112s
+call_variants                    | 38m27.228s
+postprocess_variants (with gVCF) | 9m13.565s
 
 
 ### Run hap.py
@@ -244,16 +243,16 @@ Output:
 ```
 Benchmarking Summary:
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
-INDEL    ALL       504501    502210      2291       954974      1522     429900    956    362       0.995459          0.997101        0.450169         0.996279                     NaN                     NaN                   1.489759                   1.942299
-INDEL   PASS       504501    502210      2291       954974      1522     429900    956    362       0.995459          0.997101        0.450169         0.996279                     NaN                     NaN                   1.489759                   1.942299
-  SNP    ALL      3327496   3316336     11160      3823082      4229     500683   1696    356       0.996646          0.998727        0.130963         0.997686                2.102576                1.990152                   1.535137                   1.449299
-  SNP   PASS      3327496   3316336     11160      3823082      4229     500683   1696    356       0.996646          0.998727        0.130963         0.997686                2.102576                1.990152                   1.535137                   1.449299
+INDEL    ALL       504501    502342      2159       956579      1444     431515    881    290       0.995721           0.99725        0.451102         0.996485                     NaN                     NaN                   1.489759                   1.924206
+INDEL   PASS       504501    502342      2159       956579      1444     431515    881    290       0.995721           0.99725        0.451102         0.996485                     NaN                     NaN                   1.489759                   1.924206
+  SNP    ALL      3327496   3319188      8308      4031912      5621     705300   1705    469       0.997503           0.99831        0.174929         0.997907                2.102576                1.889869                   1.535137                   1.312185
+  SNP   PASS      3327496   3319188      8308      4031912      5621     705300   1705    469       0.997503           0.99831        0.174929         0.997907                2.102576                1.889869                   1.535137                   1.312185
 ```
 
 This can be compared with
-https://github.com/google/deepvariant/blob/r1.8/docs/metrics.md#accuracy.
+https://github.com/google/deepvariant/blob/r1.9/docs/metrics.md#accuracy.
 
-Which shows that `vg giraffe` improves F1:
+This shows that `vg giraffe` improves F1:
 
-- Indel F1: 0.995945 --> 0.996279
-- SNP F1: 0.996213 --> 0.997686
+- Indel F1: 0.995845 --> 0.996485
+- SNP F1: 0.996133 --> 0.997907
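One way to read the F1 comparison above is in terms of the residual error, 1 - F1. Near 1.0, a small absolute gain in F1 can remove a large share of the remaining errors. The following is a minimal sketch, assuming `awk` is available. `improvement` is a hypothetical helper name, and treating 1 - F1 as an error rate is a rough proxy chosen for illustration, not a metric that hap.py reports.

```bash
# improvement BASELINE_F1 NEW_F1 (hypothetical helper): print the absolute
# F1 gain and the percentage of the baseline residual error (1 - F1) removed.
improvement() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    gain = b - a                          # absolute F1 difference
    reduction = 100 * (b - a) / (1 - a)   # percent of baseline (1 - F1) removed
    printf "gain=%.6f error_reduction=%.1f%%\n", gain, reduction
  }'
}

# Baseline F1 from metrics.md vs. the vg giraffe run, as listed above.
improvement 0.995845 0.996485   # Indel
improvement 0.996133 0.997907   # SNP
```

With these values the helper prints `gain=0.000640 error_reduction=15.4%` for indels and `gain=0.001774 error_reduction=45.9%` for SNPs.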