bihealth/swibrid_paper: Code and scripts for SWIBRID publication

paper repository for Vazquez-Garcia & Obermayer et al.

contents:

data: contains input data for paper figures (prepared using prepare_data.R)
paper_figures.Rmd: R code to produce paper figures (uses data in data, nothing else needed)

sessionInfo
**R version 4.3.2 (2023-10-31)**
Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5), RColorBrewer(v.1.1-3), variancePartition(v.1.32.5), BiocParallel(v.1.36.0), limma(v.3.58.1), readxl(v.1.4.3), pROC(v.1.18.5), glmnet(v.4.1-8), Matrix(v.1.6-5), car(v.3.1-2), carData(v.3.0-5), ggrepel(v.0.9.5), circlize(v.0.4.16), ComplexHeatmap(v.2.18.0), cowplot(v.1.1.3), scales(v.1.3.0), caret(v.6.0-94), lattice(v.0.21-9), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), tidyverse(v.2.0.0), dplyr(v.1.1.4), ggpubr(v.0.6.0) and ggplot2(v.3.5.1)

loaded via a namespace (and not attached): bitops(v.1.0-7), Rdpack(v.2.6), gridExtra(v.2.3), rlang(v.1.1.3), magrittr(v.2.0.3), clue(v.0.3-65), GetoptLong(v.1.0.5), matrixStats(v.1.2.0), compiler(v.4.3.2), png(v.0.1-8), vctrs(v.0.6.5), reshape2(v.1.4.4), pkgconfig(v.2.0.3), shape(v.1.4.6.1), crayon(v.1.5.2), backports(v.1.4.1), pander(v.0.6.5), caTools(v.1.18.2), utf8(v.1.2.4), prodlim(v.2023.08.28), tzdb(v.0.4.0), nloptr(v.2.0.3), xfun(v.0.42), EnvStats(v.2.8.1), recipes(v.1.0.10), remaCor(v.0.0.18), broom(v.1.0.5), parallel(v.4.3.2), cluster(v.2.1.4), R6(v.2.5.1), stringi(v.1.8.3), boot(v.1.3-28.1), parallelly(v.1.37.0), rpart(v.4.1.21), numDeriv(v.2016.8-1.1), cellranger(v.1.1.0), Rcpp(v.1.0.12), iterators(v.1.0.14), knitr(v.1.45), future.apply(v.1.11.1), IRanges(v.2.36.0), splines(v.4.3.2), nnet(v.7.3-19), timechange(v.0.3.0), tidyselect(v.1.2.0), viridis(v.0.6.5), rstudioapi(v.0.15.0), abind(v.1.4-5), timeDate(v.4032.109), gplots(v.3.1.3.1), doParallel(v.1.0.17), codetools(v.0.2-19), listenv(v.0.9.1), lmerTest(v.3.1-3), plyr(v.1.8.9), Biobase(v.2.62.0), withr(v.3.0.0), future(v.1.33.1), survival(v.3.5-7), pillar(v.1.9.0), KernSmooth(v.2.23-22), foreach(v.1.5.2), stats4(v.4.3.2), generics(v.0.1.3), S4Vectors(v.0.40.2), hms(v.1.1.3), aod(v.1.3.3), munsell(v.0.5.0), minqa(v.1.2.6), globals(v.0.16.2), RhpcBLASctl(v.0.23-42), class(v.7.3-22), glue(v.1.7.0), tools(v.4.3.2), fANCOVA(v.0.6-1), data.table(v.1.15.0), ModelMetrics(v.1.2.2.2), gower(v.1.0.1), ggsignif(v.0.6.4), mvtnorm(v.1.2-4), rbibutils(v.2.2.16), ipred(v.0.9-14), colorspace(v.2.1-0), nlme(v.3.1-163), cli(v.3.6.2), fansi(v.1.0.6), viridisLite(v.0.4.2), lava(v.1.8.0), corpcor(v.1.6.10), gtable(v.0.3.4), rstatix(v.0.7.2), digest(v.0.6.34), BiocGenerics(v.0.48.1), pbkrtest(v.0.5.2), rjson(v.0.2.21), lifecycle(v.1.0.4), hardhat(v.1.3.1), GlobalOptions(v.0.1.2), statmod(v.1.5.0) and MASS(v.7.3-60)
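
To compare the package versions listed above against another R installation, the `name(v.version)` entries can be turned into a lookup table. The following is only a minimal sketch in Python: the function name `parse_session_packages` and the regular expression are illustrative assumptions and are not part of this repository.

```python
import re

# Matches entries such as "lme4(v.1.1-35.1)" as they appear in the sessionInfo text.
# The pattern is an assumption based on the entries listed in this README.
PKG_PATTERN = re.compile(r"([A-Za-z][A-Za-z0-9.]*)\(v\.([^)]+)\)")


def parse_session_packages(text):
    """Return a {package: version} dict from a sessionInfo package list (hypothetical helper)."""
    return {name: version for name, version in PKG_PATTERN.findall(text)}


# Short excerpt copied from the "other attached packages" line.
excerpt = "dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5) and ggplot2(v.3.5.1)"
versions = parse_session_packages(excerpt)
for name, version in sorted(versions.items()):
    print(f"{name}=={version}")
```

The resulting dictionary can then be compared with the versions reported by `sessionInfo()` on the machine where `paper_figures.Rmd` is knitted.
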
swibrid_runs: contains config files for various SWIBRID runs on human or mouse data, or the simulations
- benchmarks: config files for the benchmarks
  - dense: uses a dense MSA; can be run as is with swibrid test in that folder
  - sparse: uses a sparse MSA; this requires the sparsecluster package to be installed
- mouse: config files for mouse data
  - download raw fastq files from SRA (accession PRJNA1190672) into raw_data and run demultiplex_dataset.sh; this will put fastq and info.csv files for individual samples into input and make it possible to run all samples in one go
  - download mm10 genome from UCSC or elsewhere
  - download gencode M12 reference and use swibrid prepare_annotation
  - use config.yaml for running all mouse data
  - use config_noSg.yaml for running everything only on Sm + Sa (potentially restrict info files in input to reads with Sa primer)
- human: config files for human data
  
  raw sequencing data for human donors cannot be shared due to patient privacy legislation
  - demultiplex_dataset.sh is used to demultiplex input for each run, demultiplexed fastq and info.csv files would be expected in input
  - get hg38 genome and gencode v33 reference, create LAST index
  - config.yaml for "regular" runs
  - config_reads_averaging.yaml to average features over reads rather than clusters
  - combine_replicates.sh to pool reads from technical replicates
  - plot_bars.sh and plot_bars.py to plot isotype fractions as in Fig. 1
  - plot_circles.sh and plot_circles.py to create bubble plots of Fig. 1
  - plot_clustering.sh to create read plots for Fig. 1 and S2
  - plot_breakpoints.sh and plot_breakpoint_stats.py to create breakpoint matrix plot of Fig. 2A
- external: config files for public datasets (Vincendeau et al. and Panchakshari et al.)
  - for Vincendeau et al., download data from SRA (PRJNA831666) into the Vincendeau subfolder and run make_info.py on every sample to create dummy files with primer locations
  - for Panchakshari et al., use get_data.sh in the HTGTS folder to download data, collapse read mates with bbmerge and create info files
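
Several of the steps above expect a fastq file and an info.csv file for every sample in input. A quick consistency check before starting a run could look like the sketch below. The file name suffixes `.fastq.gz` and `_info.csv` and the function name `find_incomplete_samples` are assumptions made for illustration; adjust them to whatever demultiplex_dataset.sh actually writes.

```python
from pathlib import Path


def find_incomplete_samples(input_dir, fastq_suffix=".fastq.gz", info_suffix="_info.csv"):
    """Return sample names that have a fastq file or an info file, but not both.

    The suffixes are assumed naming conventions, not taken from the repository.
    """
    input_dir = Path(input_dir)
    fastq = {p.name[: -len(fastq_suffix)] for p in input_dir.glob(f"*{fastq_suffix}")}
    info = {p.name[: -len(info_suffix)] for p in input_dir.glob(f"*{info_suffix}")}
    # Symmetric difference: samples present in exactly one of the two sets.
    return sorted(fastq ^ info)


if __name__ == "__main__":
    incomplete = find_incomplete_samples("input")
    if incomplete:
        print("samples missing a fastq or info file:", ", ".join(incomplete))
    else:
        print("every sample in input has both files")
```
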
supplementary_note.ipynb: python code to make plots for supplementary note (needs numpy, scipy, pandas, seaborn)
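
Before opening the notebook, it can help to confirm that the four packages named above are importable. This is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Packages named in this README as requirements for supplementary_note.ipynb.
REQUIRED = ("numpy", "scipy", "pandas", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the subset of top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all packages needed for supplementary_note.ipynb are importable")
```

`find_spec` only looks the packages up and does not import them, so the check is fast and has no side effects.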
