bihealth/swibrid_paper

paper repository for Vazquez-Garcia & Obermayer et al.

contents:

  • data: contains input data for paper figures (prepared using prepare_data.R)

  • paper_figures.Rmd: R code to produce paper figures (uses data in data, nothing else needed)

    sessionInfo **R version 4.3.2 (2023-10-31)**

    Platform: x86_64-pc-linux-gnu (64-bit)

    locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

    attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base

    other attached packages: dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5), RColorBrewer(v.1.1-3), variancePartition(v.1.32.5), BiocParallel(v.1.36.0), limma(v.3.58.1), readxl(v.1.4.3), pROC(v.1.18.5), glmnet(v.4.1-8), Matrix(v.1.6-5), car(v.3.1-2), carData(v.3.0-5), ggrepel(v.0.9.5), circlize(v.0.4.16), ComplexHeatmap(v.2.18.0), cowplot(v.1.1.3), scales(v.1.3.0), caret(v.6.0-94), lattice(v.0.21-9), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), tidyverse(v.2.0.0), dplyr(v.1.1.4), ggpubr(v.0.6.0) and ggplot2(v.3.5.1)

    loaded via a namespace (and not attached): bitops(v.1.0-7), Rdpack(v.2.6), gridExtra(v.2.3), rlang(v.1.1.3), magrittr(v.2.0.3), clue(v.0.3-65), GetoptLong(v.1.0.5), matrixStats(v.1.2.0), compiler(v.4.3.2), png(v.0.1-8), vctrs(v.0.6.5), reshape2(v.1.4.4), pkgconfig(v.2.0.3), shape(v.1.4.6.1), crayon(v.1.5.2), backports(v.1.4.1), pander(v.0.6.5), caTools(v.1.18.2), utf8(v.1.2.4), prodlim(v.2023.08.28), tzdb(v.0.4.0), nloptr(v.2.0.3), xfun(v.0.42), EnvStats(v.2.8.1), recipes(v.1.0.10), remaCor(v.0.0.18), broom(v.1.0.5), parallel(v.4.3.2), cluster(v.2.1.4), R6(v.2.5.1), stringi(v.1.8.3), boot(v.1.3-28.1), parallelly(v.1.37.0), rpart(v.4.1.21), numDeriv(v.2016.8-1.1), cellranger(v.1.1.0), Rcpp(v.1.0.12), iterators(v.1.0.14), knitr(v.1.45), future.apply(v.1.11.1), IRanges(v.2.36.0), splines(v.4.3.2), nnet(v.7.3-19), timechange(v.0.3.0), tidyselect(v.1.2.0), viridis(v.0.6.5), rstudioapi(v.0.15.0), abind(v.1.4-5), timeDate(v.4032.109), gplots(v.3.1.3.1), doParallel(v.1.0.17), codetools(v.0.2-19), listenv(v.0.9.1), lmerTest(v.3.1-3), plyr(v.1.8.9), Biobase(v.2.62.0), withr(v.3.0.0), future(v.1.33.1), survival(v.3.5-7), pillar(v.1.9.0), KernSmooth(v.2.23-22), foreach(v.1.5.2), stats4(v.4.3.2), generics(v.0.1.3), S4Vectors(v.0.40.2), hms(v.1.1.3), aod(v.1.3.3), munsell(v.0.5.0), minqa(v.1.2.6), globals(v.0.16.2), RhpcBLASctl(v.0.23-42), class(v.7.3-22), glue(v.1.7.0), tools(v.4.3.2), fANCOVA(v.0.6-1), data.table(v.1.15.0), ModelMetrics(v.1.2.2.2), gower(v.1.0.1), ggsignif(v.0.6.4), mvtnorm(v.1.2-4), rbibutils(v.2.2.16), ipred(v.0.9-14), colorspace(v.2.1-0), nlme(v.3.1-163), cli(v.3.6.2), fansi(v.1.0.6), viridisLite(v.0.4.2), lava(v.1.8.0), corpcor(v.1.6.10), gtable(v.0.3.4), rstatix(v.0.7.2), digest(v.0.6.34), BiocGenerics(v.0.48.1), pbkrtest(v.0.5.2), rjson(v.0.2.21), lifecycle(v.1.0.4), hardhat(v.1.3.1), GlobalOptions(v.0.1.2), statmod(v.1.5.0) and MASS(v.7.3-60)

  • swibrid_runs: contains config files for various SWIBRID runs on human or mouse data, or the simulations

    • benchmarks: config files for the benchmarks

      • dense: using dense MSA, can be run as is using swibrid test in that folder
      • sparse: using sparse MSA. for this, the sparsecluster package needs to be installed
    • mouse: config files for mouse data

      • download raw fastq files from SRA (accession PRJNA1190672) into raw_data and run demultiplex_dataset.sh; this will put fastq and info.csv files for individual samples into input and make it possible to run all samples in one go
      • download mm10 genome from UCSC or elsewhere
      • download gencode M12 reference and use swibrid prepare_annotation
      • use config.yaml for running all mouse data
      • use config_noSg.yaml for running everything only on Sm + Sa (potentially restrict info files in input to reads with Sa primer)
    • human: config files for human data

      raw sequencing data for human donors cannot be shared due to patient privacy legislation

      • demultiplex_dataset.sh is used to demultiplex input for each run, demultiplexed fastq and info.csv files would be expected in input
      • get hg38 genome and gencode v33 reference, create LAST index
      • config.yaml for "regular" runs
      • config_reads_averaging.yaml to average features over reads instead of clusters
      • combine_replicates.sh to pool reads from technical replicates
      • plot_bars.sh and plot_bars.py to plot isotype fractions as in Fig. 1
      • plot_circles.sh and plot_circles.py to create bubble plots of Fig. 1
      • plot_clustering.sh to create read plots for Fig. 1 and S2
      • plot_breakpoints.sh and plot_breakpoint_stats.py to create breakpoint matrix plot of Fig. 2A
    • external: config files for public datasets (Vincendeau et al. and Panchakshari et al.)

      • for Vincendeau et al., download data from SRA (PRJNA831666) into the Vincendeau subfolder and run make_info.py on every sample to create dummy files with primer locations
      • for Panchakshari et al., use get_data.sh in the HTGTS folder to download data, collapse read mates with bbmerge and create info files
  • supplementary_note.ipynb: python code to make plots for supplementary note (needs numpy, scipy, pandas, seaborn)
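
Before opening supplementary_note.ipynb, it can help to confirm that the four packages named above are available in the active Python environment. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` and the constant `NOTEBOOK_REQUIREMENTS` are made up for illustration, and the check only tests whether each package can be located, not whether its version matches what the authors used.

```python
from importlib.util import find_spec

# Packages the README lists for supplementary_note.ipynb.
# (Constant name is hypothetical, chosen for this sketch.)
NOTEBOOK_REQUIREMENTS = ["numpy", "scipy", "pandas", "seaborn"]


def missing_modules(names):
    """Return the entries of `names` that cannot be located as importable modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(NOTEBOOK_REQUIREMENTS)
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("all packages needed for supplementary_note.ipynb are importable")
```

Any package reported as missing would need to be installed with whatever package manager the environment uses before running the notebook.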

About

Code and scripts for SWIBRID publication
