ncRNAHD: Non-coding RNA Homolog Detection

ncRNAHD is a tool for detecting homologous non-coding RNA sequences using deep learning embeddings and efficient similarity search.

Features

Deep learning-based RNA sequence embedding using ncRNABert
Efficient similarity search with FAISS indexing
Multiple sequence alignment (MSA) generation support
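
The idea behind embedding-based homolog search can be illustrated with a minimal sketch: each sequence is represented by a vector, and candidates are ranked by similarity to the query vector. The example below is not ncRNAHD's implementation. It uses made-up three-dimensional vectors in place of ncRNABert embeddings and a brute-force cosine-similarity scan in place of a FAISS index; the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, database, k=2):
    """Return the k (sequence_id, score) pairs most similar to the query.

    This is a brute-force scan over every vector; an index such as FAISS
    exists to avoid a full scan on large databases.
    """
    scored = [(seq_id, cosine_similarity(query, vec))
              for seq_id, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings", made up purely for illustration.
toy_database = {
    "seq_A": [1.0, 0.0, 0.0],
    "seq_B": [0.9, 0.1, 0.0],
    "seq_C": [0.0, 1.0, 0.0],
}
toy_query = [1.0, 0.05, 0.0]

hits = top_k(toy_query, toy_database, k=2)
for seq_id, score in hits:
    print(f"{seq_id}\t{score:.4f}")
```

Here `seq_A` and `seq_B` rank above `seq_C` because their vectors point in nearly the same direction as the query.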

Installation

1. Clone the repository

git clone https://github.com/ISYSLAB-HUST/ncRNAHD
cd ncRNAHD

2. Create conda environment

conda env create -f environment.yml
conda activate ncRNAHD

3. Download and process the RNAcentral database

bash setup/download_data.sh
python setup/process_rna_sequences.py

4. Generate embeddings for the database

python embedding/generate_embeddings.py

5. Build FAISS index

python indexing/build_faiss_index.py

6. Set up custom rMSA

# Set up rMSA
bash msa/setup_rmsa.sh

# Replace with custom rMSA.pl
bash msa/replace_rmsa.sh
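
Before moving on to a search, it may help to confirm that setup produced the files listed under data/ in the File Structure section. The following is a minimal sketch, assuming it is run from the repository root; the helper name `missing_data_files` is hypothetical and not part of ncRNAHD.

```python
from pathlib import Path

# File names taken from the data/ listing in the File Structure section.
EXPECTED_DATA_FILES = [
    "rnacentral_active.fasta",
    "rnacentral_active_processed.fasta",
    "rna_embeddings.npy",
    "trained_index.faiss",
    "whiten_params.npz",
    "sequence_index.json",
]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DATA_FILES
            if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```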

Usage

Step 1: Homolog search

python homolog_search.py --query_fasta your_query.fasta --output_dir results

# Example 1:
python homolog_search.py --query_fasta examples/5kh8.fasta --output_dir results
# Example 2:
python homolog_search.py --query_fasta examples/batch_query.fasta --output_dir results
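
According to the File Structure section, each query yields a file named Homologs_{sequence_id}.fasta in the output directory. The sketch below lists the file names one might expect for a batch query. It assumes the sequence ID is the first whitespace-delimited token of each FASTA header, which this README does not confirm; the helper names and the example records are hypothetical.

```python
from pathlib import Path


def read_fasta_ids(fasta_text):
    """Return the record IDs found in FASTA-formatted text.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.append(header.split()[0])
    return ids


def expected_result_paths(fasta_text, output_dir="results"):
    """Map each query ID to its expected Homologs_{sequence_id}.fasta path."""
    return {seq_id: Path(output_dir) / f"Homologs_{seq_id}.fasta"
            for seq_id in read_fasta_ids(fasta_text)}


# Made-up records for illustration only.
example = ">query_1 hypothetical description\nGGCUAGC\n>query_2\nAUGCAUGC\n"
paths = expected_result_paths(example)
for seq_id, path in paths.items():
    print(seq_id, "->", path)
```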

Step 2: MSA generation

cd rMSA
# 1. Format the candidate database
database/script/makeblastdb -in ../results/Homologs_your_query.fasta -parse_seqids -hash_index -dbtype nucl
# 2. Generate MSA
perl rMSA.pl your_query.fasta -db1=../results/Homologs_your_query.fasta -cpu=16
# 3. Convert to A3M format (optional)
# perl ${WORK_DIR}/bin/reformat.pl fas a3m -l 10000 your_query.afa your_query.a3m

# Complete example:
database/script/makeblastdb -in ../results/Homologs_5kh8.fasta -parse_seqids -hash_index -dbtype nucl
perl rMSA.pl 5kh8.fasta -db1=../results/Homologs_5kh8.fasta -cpu=16
# perl ${WORK_DIR}/bin/reformat.pl fas a3m -l 10000 5kh8.afa 5kh8.a3m
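
When running Step 2 for many queries, the two commands above can be assembled programmatically. The following sketch only builds and prints the argument lists, using the same flags shown above; the function name `build_msa_commands` is hypothetical, and paths are interpreted relative to the rMSA directory as in the commands above.

```python
import shlex


def build_msa_commands(query_fasta, homologs_fasta, cpu=16):
    """Return the makeblastdb and rMSA.pl argument lists for one query."""
    makeblastdb = [
        "database/script/makeblastdb",
        "-in", homologs_fasta,
        "-parse_seqids", "-hash_index",
        "-dbtype", "nucl",
    ]
    rmsa = [
        "perl", "rMSA.pl", query_fasta,
        f"-db1={homologs_fasta}",
        f"-cpu={cpu}",
    ]
    return [makeblastdb, rmsa]


commands = build_msa_commands("5kh8.fasta", "../results/Homologs_5kh8.fasta")
for cmd in commands:
    # Print each command as a single shell-quoted line.
    print(shlex.join(cmd))
```

To execute rather than print, each list could be passed to `subprocess.run(cmd, cwd="rMSA", check=True)`, assuming the script is started from the repository root and rMSA has been set up as in the installation steps.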

File Structure

ncRNAHD/
├── homolog_search.py          # Main search tool
├── environment.yml            # Conda environment configuration
├── README.md                  # This file
├── setup/                     # Data download and preprocessing scripts
│   ├── download_data.sh
│   └── process_rna_sequences.py
├── embedding/                 # Embedding generation
│   └── generate_embeddings.py
├── indexing/                  # FAISS index building
│   └── build_faiss_index.py
├── search/                    # Search components
│   ├── embedding_generator.py
│   └── faiss_searcher.py
├── msa/                       # MSA generation tools
│   ├── setup_rmsa.sh
│   ├── setup_trrosetta.sh
│   ├── replace_rmsa.sh
│   └── rMSA.pl
├── data/                      # Generated data files (created during setup)
│   ├── rnacentral_active.fasta
│   ├── rnacentral_active_processed.fasta
│   ├── rna_embeddings.npy
│   ├── trained_index.faiss
│   ├── whiten_params.npz
│   └── sequence_index.json
├── results/                   # Search results (created during search)
│   └── Homologs_{sequence_id}.fasta    # Candidate sequences for each query
└── examples/                  # Example query files
    ├── batch_query.fasta
    ├── 5kh8.fasta
    └── Homologs_5kh8.fasta

Requirements

Python 3.12
PyTorch
BioPython
FAISS
ncRNABert
See environment.yml for the complete list of dependencies.
