bduranvinet/SENTINEL-ed

SENTINEL - Smart Environmental Nucleic-acid Tracking using Inference from Neural-networks for Early-warning Localization

Important

This is an educational repository for installing ADAPT dependencies for CRISPR-based environmental biosurveillance (CRISPR-eBx) deployments, such as SENTINEL education (SENTINEL-ed). Many things in this repository have been adapted from the Original ADAPT repository to enhance educational accessibility. The original repository is recommended if you want to develop ADAPT further. There is a copy of the ADAPT v1.6.0 repository in this repo if ever required.

Note

Our SENTINEL article, in which this pipeline and similar procedures were used, has been accepted by the journal Environmental DNA.

You can explore our article at Durán-Vinet et al. CRISPR-based environmental biosurveillance assisted via artificial intelligence design of guide-RNAs (2025)

Current supporters

Genetics Otago (GO)

image

Southern Environmental DNA Society (SeDNAs)

image

Consortium for the Application of CRISPR in Ecology (CACE)

image

Want to sponsor or support SENTINEL?

Get in touch! You can contact GO, SeDNAs, or email benjamin.duran-vinet@postgrad.otago.ac.nz directly.

SENTINEL introduction

SENTINEL is an integrated tool for CRISPR-based environmental biosurveillance (CRISPR-eBx). SENTINEL currently leverages ADAPT (Activity-informed Design with All-inclusive Patrolling of Targets) [article here]. ADAPT is an end-to-end artificial intelligence system trained on ~19,000 guide-target pairs, providing a robust and comprehensive platform for target discovery in the environmental nucleic acid field that can accelerate environmental biosurveillance of emergent biosecurity threats or aid endangered species detection. ADAPT designs aim to be highly sensitive and specific, and its well-established command options allow extensive customisation for flexible assay design. ADAPT designs are also scalable and can be locally deployed.

This ADAPT platform will provide suitable ranked guide-target pairs for the given target (set of primers and a spacer sequence, see below).

Figure 2  ADAPT concept art

Illustration from Durán-Vinet et al. (in press).

The ultimate objective of SENTINEL is to streamline and accelerate CRISPR-eBx deployments, whether from air, soil or water samples. So far, only water has been tested, but air and soil environmental samples are a promising deployment.

Most CRISPR-eBx deployments use an isothermal amplification method, typically RPA (recombinase polymerase amplification) or LAMP (loop-mediated isothermal amplification). A quick illustration is shown below for RPA-CRISPR-Cas13a.

RPA-CRISPR-Cas13a

Illustration from Durán-Vinet et al. (in press).


Quick glossary

Important

Reading this glossary will enhance the overall experience when going through the workshop and the manual.

CRISPR-eBx, CRISPR-based environmental biosurveillance, an alternative use of CRISPR to detect target species presence from environmental samples, targeting environmental nucleic acids. This could be for various applications, including environmental viral vectors, elusive endangered species, invasive species, diseases, genetic modifications and more.

CRISPR-Dx, CRISPR-based diagnostics, CRISPR-Dx is the original use of CRISPR-based detection tools. However, these clinical deployments are not considered the same as an environmental deployment, i.e., CRISPR-eBx.

DR, Direct repeat, section of the gRNA specific for each Cas nuclease.

eNAs, Environmental nucleic acids, refers to all the DNA or RNA shed or left behind in the environment by organisms or agents. eNAs include environmental DNA (eDNA) and environmental RNA (eRNA).

Guide-target pairs, these are defined as the construct of primers and specific gRNAs that will detect a certain genomic region.

gRNA, small RNA molecules that program and guide the Cas nuclease into a specific target by nucleic acid complementarity. These are also sometimes called CRISPR RNAs (crRNAs).

LAMP, Loop-mediated isothermal amplification. One of the most used isothermal amplification methods; it uses 4-6 primers and runs at 60-70°C.

RPA, Recombinase polymerase amplification. One of the most used isothermal amplification methods; it uses 2 primers and can run at 25-42°C.


Some noteworthy deployments of CRISPR-based environmental biosurveillance are:


Some noteworthy reviews and perspectives on the CRISPR-based detection field are:


Before starting

This repository is educational and is not intended to cover all of bioinformatics. Instead, it is intended to teach you to use one specific tool among the many applications that Conda can run. If you run into any issues when exploring ADAPT further, feel free to contact the original ADAPT authors or me at benjamin.duran-vinet@postgrad.otago.ac.nz


Installing ADAPT v1.6.0

Dependencies

Important

The dependencies listed below are automatically installed via Bioconda, and they differ from the original ADAPT. These changes have been tested and do not change the quality of the results, but allow better compatibility across systems. If you wish to install the original package, it has to be installed via pip or by downloading the repository, but this is ONLY for advanced/dev users.

Moreover, the original version of ADAPT can only be installed in Linux or Windows due to major incompatibilities with the arm64 architecture with Protobuf.

ADAPT will install:

Using the thermodynamic modules of ADAPT requires:

To run ADAPT, you will need conda installed on Windows Subsystem for Linux (WSL) or macOS. Please make sure you install it before installing ADAPT; a short walkthrough is included below.


Step 0. Installing Windows Subsystem for Linux (WSL)

Warning

This is only for Windows users who haven't used Windows Subsystem for Linux before.

If you are a Windows 10 user, please follow this Win10 tutorial

If you are a Windows 11 user, please follow this Win11 tutorial

After you have installed Ubuntu subsystem, open the terminal and go to Step 1.1


Step 1. Installing miniconda3 into your system

Step 1.1 Status check

Important

If you have used the terminal before, you might have conda installed already.

Please check if you already have conda installed

conda --version

This should provide a number (format xx.x.x). If you got a number as shown below, proceed to step 3 (Creating an environment; Linux and Mac users)

image

If you got an error, conda is not installed on your computer, then proceed to step 1.2 (Linux subsystem users) or 1.3 (Mac users).


Step 1.2 Installing conda for Linux subsystem

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

After this, restart your terminal. You should have a (base) before your prompt.


Step 1.3 Installing conda for macOS

Caution

Please ensure you install the correct conda version for your Mac, as there are two macOS architectures (x86_64 for Intel Macs and arm64 for Apple Silicon); installing the wrong one will cause downstream compilation errors.

For the arm64 architecture (Apple Silicon, i.e., M1 and onwards):

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh

For the x86_64 architecture (Intel Macs):

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

These commands download the latest conda installer for the corresponding architecture. Please double-check your architecture before proceeding (running 'uname -m' in the terminal prints it).
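The architecture check can also be sketched in Python. This is only an illustration, assuming a python3 interpreter is already available on the Mac; the helper name installer_for is hypothetical and the installer file names are those published at repo.anaconda.com/miniconda. Note that a Python build running under Rosetta reports x86_64 even on Apple Silicon, so treat the result as a hint rather than a guarantee.

```python
import platform

# Installer file names published at https://repo.anaconda.com/miniconda/
INSTALLERS = {
    "arm64": "Miniconda3-latest-MacOSX-arm64.sh",    # Apple Silicon (M1 onwards)
    "x86_64": "Miniconda3-latest-MacOSX-x86_64.sh",  # Intel Macs
}


def installer_for(machine: str) -> str:
    """Return the Miniconda installer name for a macOS architecture string."""
    if machine not in INSTALLERS:
        raise ValueError(f"Unrecognised macOS architecture: {machine!r}")
    return INSTALLERS[machine]


if __name__ == "__main__":
    machine = platform.machine()
    print("Detected architecture:", machine)
    # Only suggest an installer when actually running on macOS.
    if platform.system() == "Darwin" and machine in INSTALLERS:
        print("Matching installer:", installer_for(machine))
```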

Then run:

bash Miniconda3-latest-MacOSX-*.sh

The installer will ask you to accept the terms and conditions: press the ENTER key until a yes|no prompt appears.

image

Type 'yes' and press ENTER. This will install miniconda.

After this, restart your terminal. You should have a (base) before your prompt as shown below: image


Step 2. Quick basic terminal commands training

pwd
image

'pwd' command allows you to see your current path (where you are in your terminal).

ls
image

'ls' command allows you to see the files and directories in your current path.

mkdir SENTINELv1
image

'mkdir' command will create a directory in your current path that you saw with 'pwd'. You can see this new directory using 'ls'.

cd SENTINELv1/
image

'cd' allows you to go into a specified directory or path. You can use TAB to autocomplete. 'ls' will show that this directory is empty.

cd ..
image

'cd ..' will take you one directory back. You can check your new position with 'pwd'. You will see that your prompt has changed too.


Step 3. Creating an environment

Note

The steps below are for both Windows Subsystem for Linux and Mac users, as the command lines are similar.

Tip

Environments are useful tools in bioinformatics as they allow us to have a unique 'bench' for specific work; this way, we avoid creating incompatibilities in our main system files.

Step 3.1 Specific environment creation

conda create -n SENTINELv1 python=3.8
image

This will create a specific environment that has Python version 3.8.x. It will also install other dependencies and libraries. Type y and press ENTER. This will take some minutes to run.


Step 3.2 Working directories creation

mkdir SENTINELv1
cd SENTINELv1
mkdir input output
ls
image

If you have already created a 'SENTINELv1' folder, use the last three lines instead.


Step 3.3 ADAPT installation

Warning

Ensure you activate the environment you created in step 3.1 before continuing.

conda activate SENTINELv1
image

Your prompt environment should change from (base) to (SENTINELv1). If it activated correctly, you should see Python 3.8.x when using 'conda list'.

conda list
image

Proceed to install

conda install -c bioconda "adapt[thermo]"

Type 'y' and then ENTER. The installation can take some minutes as some dependencies are heavy (mainly TensorFlow). TensorFlow is a software library for machine learning deployments, which is the backbone of ADAPT.

Then:

pip install primer3-py==0.6.1

This should have ADAPT 1.6.0 ready to go. Run the command below.

design.py --help
image

If something similar displays, then ADAPT has been successfully installed. 'design.py' calls for the workflow activation, while '--help' provides specific information to run the workflow. Check out the Commands Explained guide for more details.
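As an optional extra check, the state of the environment can be inspected from Python. The snippet below is a minimal sketch, assuming it is run with the Python interpreter of the activated SENTINELv1 environment; the helper name check_environment is illustrative and not part of ADAPT, and the importable module names adapt and primer3 are assumptions about how the installed packages expose themselves.

```python
import importlib.util
import shutil
import sys


def check_environment(expected_python=(3, 8)):
    """Return a list of human-readable problems (empty when all checks pass)."""
    problems = []
    # The environment in Step 3.1 was created with python=3.8.
    if sys.version_info[:2] != expected_python:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {found} is active, expected 3.8")
    # design.py should be reachable from the terminal once ADAPT is installed.
    if shutil.which("design.py") is None:
        problems.append("design.py is not on PATH (is the environment activated?)")
    # Assumed module names for ADAPT and primer3-py.
    for module in ("adapt", "primer3"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python module '{module}' was not found")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Environment looks ready for ADAPT.")
```

An empty list only means these basic checks passed; 'design.py --help' remains the reference test.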


Running ADAPT

Tip

Before starting

Always keep a log of your --seed number; this ensures reproducibility should you need to repeat a run.

You can use Sublime Text for quick command changes and Benchling to keep a trackable record of what you have run through the terminal.

ADAPT can only read .FASTA files. ADAPT will always assume that input files are aligned unless otherwise specified. It is recommended that files are aligned and examined carefully before running them through ADAPT.
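Because ADAPT assumes aligned input, a quick sanity check before a run can save time. The following is a minimal sketch, not part of ADAPT: it parses FASTA text and tests whether all sequences have the same length, which is a necessary (but not sufficient) sign of an alignment. The function names read_fasta and looks_aligned and the example sequences are hypothetical.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif records:
            header, seq = records[-1]
            records[-1] = (header, seq + line)
    return records


def looks_aligned(records: List[Tuple[str, str]]) -> bool:
    """True when every sequence has the same length (gap characters included)."""
    return len({len(seq) for _, seq in records}) <= 1


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    example = [">seq1", "ACGT-ACGT", ">seq2", "ACGTTACGT", ">seq3", "ACGTACGT"]
    records = read_fasta(example)
    print(looks_aligned(records))      # seq3 is one base shorter -> False
    print(looks_aligned(records[:2]))  # equal lengths -> True
```

To check a real file, pass an open file handle instead of the list, for example read_fasta(open("input/Test.fasta")). Equal lengths do not prove the alignment is good, so visual inspection is still recommended.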

Step 4. ADAPT example run

Note

You can optionally use your own FASTA file. In fact, it is encouraged so you can get a useful product from this workshop. You can follow the same instructions provided below, just make sure to use the correct file name.

Step 4.1 Preparing example file

In this same repository, go to the Example folder and download Test.fasta

image image

Move Test.fasta into the 'input' directory that was previously created. You can do this manually in Windows or Mac.

To do this, go to your input folder

cd SENTINELv1/input/

Tip

If you are a Windows user use:

explorer.exe .

Tip

If you are a Mac user use:

open .

This will open your current folder, where you can drag and drop the Test.fasta file. Then run 'ls' to confirm it also appears in the terminal. image


Step 4.2 Running pipeline

Before running the pipeline, make sure you are in the 'SENTINELv1' folder, as the command shown below has the path directories to work from 'SENTINELv1'. You can call ADAPT from your home folder, but you would need to add the full path so ADAPT can find the input files and deposit the output file properly.

Use 'pwd' to know your current PATH position. You should be in the 'input' folder. If so, use:

cd ..

image

Then, please copy the following commands and paste them into your terminal.

Warning

Make sure you change your output file name before running a new ADAPT query, or it will get OVERWRITTEN, and previous results will be PERMANENTLY LOST.

Long sequences may require special options to run smoothly, or may eventually need a server. See Commands Explained for more information.

Ensure you are in the SENTINELv1 folder before running the command below, or an error will occur.

Ensure you have the conda environment activated!

design.py complete-targets fasta ./input/Test.fasta -o ./output/example-output1 --obj maximize-activity --id-m 4 --id-frac 0.01 -gl 28 -gm 0 -pl 30 -pm 5 -pp 0.98 --primer-gc-content-bounds 0.35 0.70 --maximization-algorithm random-greedy --predict-cas13a-activity-model --best-n-targets 10 --seed 001 --verbose

'./' refers to your current location in the terminal, so the above command will only work if you are in the /SENTINELv1/ folder.
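The overwrite warning and the seed-logging tip can be combined in a small helper. The sketch below is an illustration under stated assumptions, not part of ADAPT: free_output_prefix and log_run are hypothetical names, the '.tsv' extension used in the demonstration is an assumption about how ADAPT names its output file, and the log format is arbitrary. To stay safe regardless of the extension, any file whose name starts with the chosen output prefix counts as a collision.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def _prefix_in_use(candidate: Path) -> bool:
    """True if any file in the target folder starts with the candidate name."""
    folder = candidate.parent
    if not folder.is_dir():
        return False
    return any(p.name.startswith(candidate.name) for p in folder.iterdir())


def free_output_prefix(prefix: str) -> str:
    """Return prefix if unused, otherwise prefix-2, prefix-3, and so on."""
    candidate = Path(prefix)
    counter = 1
    while _prefix_in_use(candidate):
        counter += 1
        candidate = Path(f"{prefix}-{counter}")
    return str(candidate)


def log_run(log_path: str, seed: int, command: str) -> None:
    """Append a timestamped record of the seed and command to a text log."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp}\tseed={seed}\t{command}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "example-output1")
        print(free_output_prefix(prefix) == prefix)      # nothing there yet
        Path(prefix + ".tsv").touch()                    # pretend a run finished
        print(Path(free_output_prefix(prefix)).name)     # a new, unused name
        log_run(str(Path(tmp) / "runs.log"), 1, "design.py complete-targets ...")
```

The returned prefix can then be pasted after '-o' in the command above, and the log file gives a record of which seed produced which output.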

This is how it will look when running

image

Step 4.3 Output retrieval

You can see your output file as follows

cd output
ls
image

Tip

For MacOS users:

open .
image

Tip

For Windows Subsystem for Linux users:

explorer.exe .

Open the results from Finder (Mac) or Explorer (Windows), but not through the terminal. Results will look similar to those options shown below

image

OR

image

To make the results more accessible, if they are open as a text file, copy all the elements and paste them into Excel. This Excel file will be used in Step 5. Alternatively, you can also open it directly as an Excel file.

An explanation of the most important values is given in Understanding Output.


Making ADAPT output functional

Note

To accelerate and enhance the visualisation of results, we recommend using Geneious Prime for the following steps.

Tip

Remember that ADAPT generates specific spacers for LwaCas13a. However, the pipeline can also make PCR and RPA primers for standalone applications.

Warning

ADAPT designs are not ready to use straightaway. We will use Geneious Prime to make them functional.

Spacers and reverse primers (right-primers) have to be reverse-complemented.

Forward primers (left-primers) have to be prefixed with the T7 promoter at the 5' end.

Spacers must be converted to RNA and prefixed with a DR sequence at the 5' end to be functional.


Tip

Save the sequences given below in Geneious Prime as separate sequences in a dedicated folder. image

See below LwaCas13a DR sequence,

LwaCas13a DR sequence: GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC

This sequence has to be appended to the 5' end of the spacer sequence obtained.


See below the T7 promoter,

T7 promoter sequence: GAAATTAATACGACTCACTATAGGG

This sequence has to be appended to the 5' end of the forward primer sequence.


See below the poly-U5 sequence,

polyU5 sequence: /6-FAM/rUrUrUrUrU/BHQ-1/

This is the specific reporter for LwaCas13a. This nuclease has a strong, semi-specific preference for U-U dinucleotide motifs.
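The rules above (reverse complement, RNA conversion, DR and T7 addition) can be sketched in a few lines of Python as a cross-check for what is done by hand in Geneious Prime. This is an illustration only: the function names are hypothetical, the example sequence AAGC is a placeholder rather than a real design, only A/C/G/T bases are handled, and the primer mapping follows the column usage later in this manual (left primer as Fw, right primer as Rv).

```python
# Sequences copied from this manual.
DR_LWACAS13A = "GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC"  # RNA, goes 5' of the spacer
T7_PROMOTER = "GAAATTAATACGACTCACTATAGGG"              # DNA, goes 5' of the Fw primer

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Convert a DNA sequence to RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def make_crrna(guide_target_dna: str) -> str:
    """DR + RNA version of the reverse-complemented guide target sequence."""
    return DR_LWACAS13A + to_rna(reverse_complement(guide_target_dna))


def make_forward_primer(left_primer: str) -> str:
    """T7 promoter placed at the 5' end of the left (forward) primer."""
    return T7_PROMOTER + left_primer.upper()


def make_reverse_primer(right_primer: str) -> str:
    """Reverse complement of the right (reverse) primer target sequence."""
    return reverse_complement(right_primer).upper()


if __name__ == "__main__":
    placeholder = "AAGC"  # not a real ADAPT design
    print(reverse_complement(placeholder))  # GCTT
    print(make_crrna(placeholder))
    print(make_forward_primer(placeholder))
    print(make_reverse_primer(placeholder))
```

Comparing the printed constructs against the sequences edited in Geneious Prime is a quick way to catch a missed reverse complement or a DR pasted at the wrong end.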


Step 5. Functional guide-target pairs

Note

This section is done with Geneious Prime. Make sure to activate your educational license provided.

For the following example, we will use ONLY ONE GUIDE-TARGET PAIR: the best guide-target pair from the previous example, usually the first one in the Excel list. Copy each sequence of that guide-target pair from the Excel file and paste it as a separate sequence into a dedicated folder in Geneious Prime, then drag and drop the Test.fasta file into the same Geneious Prime folder.

image

When naming guide-target pairs, it is recommended to use the crRNA position for all IDs.

Note

Remember to create a dedicated folder in Geneious Prime!

image

After copying and pasting a single guide-target pair in your dedicated Geneious Prime folder, it should look like below: image

Warning

As this is a deep learning model, you might get a different result; proceed with the best predicted target.

Step 5.1 Off-target check

Caution

It is recommended to run BLAST on raw results.

Because SENTINEL uses the crRNAs as the main source of specificity while RPA is enrichment only, it is only required to BLAST the spacer sequence. It is also optional to BLAST the primers, but they will have off-targets. Moreover, RPA is highly tolerant of mismatches.

image

When BLASTing in Geneious, it will create a new folder with full-hit targets, and all hits shown in the list are guaranteed full activity.

image

As expected, the crRNA has lots of full-hit off-target sequences. This is completely expected as the '--specific-against-fastas' option was not used. We will continue with this one as a demonstration. However, if you want ADAPT to provide highly specific guide-target pairs, it is recommended to use '--specific-against-fastas'. You'll need to specify the path to the off-target .FASTA file.


Step 5.2 Guide-target pair preparation

Warning

Geneious Prime does not allow bulk 'reverse complement' of primers, so apply 'reverse complement' before converting the sequences into primers. This order is especially useful when working with bulk guide-target pairs.

Reverse complement the Rv and crRNA using the 'reverse complement' option from the Sequence bar. image


The sequences will change name automatically with a (reversed) after saving. image


Then, select all sequences and convert all guide-target pair sequences into primers using the Primer button. Screenshot 2025-04-24 at 2 05 52 PM


Now, unselect all and only select your crRNA construct and convert to RNA. Then save. Screenshot 2025-04-24 at 2 28 44 PM


Now, copy the LwaCas13a DR sequence, go to the crRNA544 sequence, click on 'Allow editing', click on the 5' end and paste the sequence. Repeat the same with the T7 promoter for the Fw primer.

image

crRNA:

image

Fw:

image

Step 5.3 Guide-target pair screening

As the final stage for all guide-target pairs, a screening has to be done. Select ONLY Test.fasta, go to the primer symbol and select 'Test Saved Primers'.

image

Click on 'Choose...' and select the guide-target pair sequences.

Screenshot 2025-04-24 at 2 33 05 PM

And use the following configuration for your screening, then select OK.

image

Now, the obtained guide-target pair is successfully screened.

image

Congratz! You've made it!!!

ADAPT throughput result dump (optional)

Note

Use Excel for this optional section.

For massive screenings of results, copying and pasting single guide-target pairs is not feasible. Accordingly, using the same result files in Excel, insert a new full column on the left-side of the columns: 'left-primer-target-sequences'; 'right-primer-target-sequences' and 'guide-target-sequence-positions'. Use the names 'Fw', 'Rv' and 'crRNA', respectively, for these new columns.

Then, type on the cell below Fw:

=("Fw"&W2) and then drag the formula down the column. W2 is the cell holding the crRNA position.

Screenshot 2025-04-24 at 2 42 27 PM

Repeat with Rv and crRNA.
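The same naming step can be done without Excel. The sketch below is a hedged alternative, not the method used in this manual: it reads tab-separated ADAPT results, adds 'Fw', 'Rv' and 'crRNA' name columns built from the guide position, and produces CSV text. The helper names are hypothetical, the example row is placeholder data, and it assumes the position cell holds a single position (the first integer found is used, so surrounding braces, if present, are ignored).

```python
import csv
import io
import re

NAME_PREFIXES = ("Fw", "Rv", "crRNA")
POSITION_COLUMN = "guide-target-sequence-positions"


def add_name_columns(rows):
    """Return copies of the rows with Fw/Rv/crRNA name columns added."""
    named = []
    for row in rows:
        match = re.search(r"\d+", row[POSITION_COLUMN])
        if match is None:
            raise ValueError(f"No position found in {row[POSITION_COLUMN]!r}")
        new_row = dict(row)
        for prefix in NAME_PREFIXES:
            # Same idea as the Excel formula ="Fw"&W2
            new_row[prefix] = prefix + match.group()
        named.append(new_row)
    return named


def to_csv(rows) -> str:
    """Serialise the rows as CSV text, ready to save and import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder row; real files contain many more columns.
    tsv = (
        "left-primer-target-sequences\tright-primer-target-sequences\t"
        "guide-target-sequence-positions\n"
        "AAGC\tGGTA\t544\n"
    )
    rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
    print(to_csv(add_name_columns(rows)))
```

For a real results file, replace the io.StringIO object with an open file handle and write the returned text to a .csv file before importing it into Geneious Prime.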


Then, save the Excel file as .CSV file.

Screenshot 2025-04-24 at 11 37 25 PM

Then, drag and drop the .CSV file into a dedicated folder in Geneious Prime. Select .CSV.

Screenshot 2025-04-24 at 2 46 08 PM

Next, select the 'Fw' column as name and the 'left-primer-target-sequences' as sequences, as shown below. Then click on OK.

Screenshot 2025-04-24 at 11 36 11 PM

All Fw primers are now inserted in bulk into Geneious Prime. Repeat the same steps for your Rv and crRNA sequences. You can then repeat Step 5 of this manual.

Screenshot 2025-04-24 at 2 48 36 PM
