Code for the paper "High-Throughput DNA melt measurements enable improved models of DNA folding thermodynamics" (https://doi.org/10.1101/2024.01.08.574731). "NNN" stands for "Not-Nearest-Neighbor".
Jupyter notebooks 01.1 to 01.5 correspond to the 5 main figures.
Notebook 01.0_DataPrep.ipynb performs data cleaning and train-val-test split from the output of the preprocessing pipeline.
Functions used to generate the figures are defined in `nnn/`.
Three major conda environments were used:
- `nnn.yml` for most analyses in the repository
- `torch.yml` for training and running graph neural networks
- `nn_train.yml` for fitting and running linear regression models; also available as a Singularity container, defined in `nn_train.def`
To install, make sure conda is already installed, then run `conda env create -f {path/to/yml/file}`. For example, `conda env create -f envs/nnn.yml`.
The local conda environment `nnn` is also exported directly to `envs/nnn_environment.yml`. It is provided for record keeping; `envs/nnn.yml` is still recommended for installing from scratch.
Note for container users: if you are running this in a minimal Linux container, you may need to install system build tools manually beforehand, e.g. `sudo apt install build-essential`.
The RiboGraphViz package is required for some visualization tasks. It is not directly available on PyPI or conda. To install, you can either:

Option 1 (recommended by the original authors): clone and install manually, following the instructions at https://github.com/DasLab/RiboGraphViz:

```
git clone https://github.com/DasLab/RiboGraphViz.git
cd RiboGraphViz
pip install .
```

Option 2 (alternative): install directly via pip with dependencies:

```
pip install networkx matplotlib seaborn git+https://github.com/DasLab/RiboGraphViz
```

Note: while the RiboGraphViz developers have expressed interest in adding the package to PyPI, this has not yet been done as of this writing.
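After installing, a quick import check can confirm the package is visible in your environment. This is a minimal sketch: the module name `RiboGraphViz` is assumed from the repository name, so confirm it against the upstream README.

```python
# Quick sanity check that RiboGraphViz installed correctly.
# The module name is assumed from the repository name; see the upstream README.
import RiboGraphViz

print(RiboGraphViz.__file__)  # prints the path of the installed package
```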
NUPACK 4 (v4.0.0.27) was installed manually from file, as it requires a free licence to download (https://docs.nupack.org/). As of Oct. 2024, NUPACK is temporarily free for academic personal use but may require a paid subscription in the future.
- Register a new account to get an academic licence. Verify your email.
- After logging in and acknowledging the licence, you will find download links at https://www.nupack.org/download/software. Click `NUPACK 4.0.0.28` to download the zip file.
- Unzip the zip file and go to the directory `nupack-4.0.0.28/package`.
- Choose one of the four `cp38` wheel files depending on your operating system, e.g. `nupack-4.0.0.28-cp38-cp38-macosx_10_13_x86_64.whl` on macOS.
- Run `conda activate nnn` to activate the `nnn` environment.
- Run `pip install {path/to/your/nupack/whl/file}` to install the NUPACK python module.
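To verify the wheel installed correctly into the `nnn` environment, a minimal import check such as the following can be run (it assumes the package exposes a `__version__` attribute):

```python
# Minimal sanity check that the NUPACK python module installed into `nnn`.
import nupack

print(nupack.__version__)  # assumes the package exposes __version__
```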
The parameter estimation process can be replicated by following these steps:

- Prepare the environment. Install the conda environment specified in `envs/nn_train.yml` with `conda env create -f envs/nn_train.yml`. This yaml file specifies the required packages and their versions.
  - a. Alternatively, use a Singularity container. The build file of this container is `envs/nn_train.def`.
- Activate the environment: `conda activate nn_train`.
- Run the script. Enter `python run_nn_train.py` in the command line. You may also submit it as a job, using `run_nn_train.sh` as a template (you will need to modify the slurm settings). In `run_nn_train.py`, edit the `config` dictionary to run models with different settings; edit the `myrange` list to change the percentage of training data used for the plots (see the sketch after this list).
  - a. Alternatively, run the notebook interactively. Launch `jupyter lab` and run the notebook `03.2_TrainNN.ipynb` in the `nnn_paper` repository.
  - b. Note that in either the script or notebook setting, you will be prompted to log in to `wandb` to log the model training runs. This helps keep track of models trained with different settings.
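The sketch below illustrates the kind of edits described above. The key names inside `config` are hypothetical placeholders, not the actual keys in `run_nn_train.py`; check the script itself for the real settings.

```python
# Hypothetical sketch of the settings edited in run_nn_train.py.
# The config keys below are illustrative assumptions, not the script's real keys.
config = {
    'model': 'linear_regression',  # assumed: which model variant to train
    'learning_rate': 1e-3,         # assumed: optimizer step size
    'n_epochs': 100,               # assumed: training length
}

# Percentages of training data used for the plots (list name from the README).
myrange = [10, 25, 50, 75, 100]
```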
Python scripts in `scripts/` generate the sequences in the variant library and are helpful for understanding the library design logic.
Run `gnn_run.py` in the `torch` environment, pointing it to the path of the saved model state dict file.
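As a rough sketch of what this entails, the snippet below loads a saved state dict with PyTorch. `GNNModel` is a hypothetical placeholder for the model class defined in the repository; the actual loading logic and command-line interface live in `gnn_run.py`.

```python
# Minimal sketch of loading a saved PyTorch state dict, as gnn_run.py is
# expected to do internally. GNNModel is a placeholder, not the repo's class.
import torch

state_dict = torch.load('path/to/model_state_dict.pt', map_location='cpu')
# model = GNNModel(...)             # hypothetical: instantiate the repo's GNN
# model.load_state_dict(state_dict)
# model.eval()                      # switch to inference mode
```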
For any questions, contact Yuxi Ke ([email protected]).
Jan. 2024, updated Oct. 2024