Deep-learning models for predicting miRNA–mRNA interactions.
This repository ships two models:
- Pairwise binding-site model — a CNN that predicts whether a given miRNA binds a given target site (≈50 nt window). Use this to score candidate binding sites.
- Gene-level repression model — predicts the gene-level fold change a miRNA induces from a full 3'UTR sequence. Built on top of the binding-site model via transfer learning.
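The checkpoint name and the `--model_type pairwise_onehot` flag suggest that the binding-site CNN sees a 2D pairwise encoding of target and miRNA. The sketch below shows one plausible construction for intuition only. It assumes 16 channels (one per target/miRNA nucleotide pair), laid out channels-first for a 2D convolution, with U mapped to T. The function name `pairwise_onehot` and this channel layout are hypothetical. The encoder actually used lives in `code/pairwise_binding_site_model/` and may differ.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # assumption: U is treated as T before encoding


def pairwise_onehot(target: str, mirna: str) -> np.ndarray:
    """Hypothetical pairwise one-hot encoding (not necessarily the repo's).

    Cell (i, j) gets a 1 in channel 4*a + b, where a indexes target[i]
    and b indexes mirna[j]. Cells involving unknown bases (e.g. N) stay
    all-zero. Output shape: (16, len(target), len(mirna)).
    """
    target = target.upper().replace("U", "T")
    mirna = mirna.upper().replace("U", "T")
    enc = np.zeros((16, len(target), len(mirna)), dtype=np.float32)
    for i, t in enumerate(target):
        a = NUCLEOTIDES.find(t)
        if a < 0:
            continue  # unknown target base: leave the whole row empty
        for j, m in enumerate(mirna):
            b = NUCLEOTIDES.find(m)
            if b >= 0:
                enc[4 * a + b, i, j] = 1.0
    return enc
```

For a ≈50 nt site and a 22 nt miRNA this gives a 16 × 50 × 22 array, i.e. an image-like input that a 2D CNN can scan for complementary stretches.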
Clone the repo and install the dependencies (Python ≥ 3.9, PyTorch ≥ 1.9):
```bash
git clone https://github.com/BioGeMT/miRBind_2.0.git
cd miRBind_2.0
pip install -r code/pairwise_binding_site_model/requirements.txt
```

A GPU is recommended but not required; the models fall back to CPU automatically.
The trained binding-site model is included in models/pairwise_onehot_model_20260105_200141.pt.
The binding-site model takes as input a TSV file with at least these columns:
| column 0 (target/mRNA) | column 1 (miRNA) | label |
|---|---|---|
| TTTTTTTT...GACAGTGG | TGTGCAAATCTATGCAAAACTGA | 0 |
The label column is required by the data loader but is ignored at inference. Set it to 0 if you don't have ground truth. A small example is provided in data/chimeric_datasets/sample_dataset/.
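The binding-site model scores ≈50 nt windows, so a longer transcript has to be split into candidate sites before it can be scored. A minimal standard-library sketch, assuming a simple sliding window: the 50 nt width, the 10 nt step, the helper names `tile_utr`/`write_sites`, and the header names `target`, `mirna`, `label` are all illustrative assumptions. Compare the output with the files in `data/chimeric_datasets/sample_dataset/` and drop or rename the header row if the data loader expects something else.

```python
import csv


def tile_utr(utr, window=50, step=10):
    """Yield (start, site) pairs of `window`-nt slices across `utr`."""
    if len(utr) <= window:
        yield 0, utr  # short sequence: a single (shorter) window
        return
    starts = list(range(0, len(utr) - window + 1, step))
    if starts[-1] != len(utr) - window:
        starts.append(len(utr) - window)  # make sure the 3' end is covered
    for start in starts:
        yield start, utr[start:start + window]


def write_sites(fh, utr, mirna, window=50, step=10):
    """Write target / miRNA / label rows to an open text handle.

    label is always 0 because it is ignored at inference.
    Returns the number of site rows written.
    """
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(["target", "mirna", "label"])  # header names are a guess
    n = 0
    for _, site in tile_utr(utr, window, step):
        writer.writerow([site, mirna, 0])
        n += 1
    return n
```

In practice, open the output with `open("your_sites.tsv", "w", newline="")`, pass the handle to `write_sites` together with the 3'UTR and miRNA sequences, and give the resulting file to `--input_file`. A smaller step gives finer coverage of possible sites at the cost of more rows to score.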
```bash
cd code/pairwise_binding_site_model
python -m inference.predict \
  --model_path ../../models/pairwise_onehot_model_20260105_200141.pt \
  --input_file path/to/your_sites.tsv \
  --output_file predictions.tsv \
  --model_type pairwise_onehot \
  --batch_size 32
```

The output TSV is your input plus two columns:

- `prediction_score`: binding probability in [0, 1]
- `predicted_class`: 1 if `prediction_score > 0.5`, else 0
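Because `predicted_class` is just `prediction_score > 0.5`, a different operating point can be applied afterwards without rerunning inference. A small post-processing sketch with the standard library, assuming `predictions.tsv` has a header row that includes `prediction_score` (the helper names `read_predictions` and `top_sites` are hypothetical):

```python
import csv


def read_predictions(path):
    """Load the output TSV as a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def top_sites(rows, threshold=0.5, k=10):
    """Return up to k rows with prediction_score > threshold, highest first.

    The strict '>' mirrors how predicted_class is defined at 0.5.
    """
    hits = [r for r in rows if float(r["prediction_score"]) > threshold]
    hits.sort(key=lambda r: float(r["prediction_score"]), reverse=True)
    return hits[:k]
```

For example, `top_sites(read_predictions("predictions.tsv"), threshold=0.8)` keeps only high-confidence sites; raising the threshold trades recall for precision.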
There is also a ready-to-edit wrapper script at analysis/pairwise_binding_site_model/inference.sh.
See analysis/gene_level_model/README.md for the full walkthrough. Briefly:
```bash
# install gene-level model dependencies
pip install -r analysis/gene_level_model/requirements.txt
# download the training/eval data
bash analysis/gene_level_model/download_data.sh
# evaluate on a test set (or train your own: see analysis/gene_level_model/train.sh)
bash analysis/gene_level_model/evaluate.sh
```

The gene-level model takes a full 3'UTR sequence (up to several thousand nt) and a miRNA sequence and predicts a scalar fold change.
The binding-site model supports SHAP-based attribution (via Captum's GradientShap). See code/pairwise_binding_site_model/README.md for the SHAP, clustering, and aggregation pipelines.
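For intuition about what GradientShap computes: it averages gradient × (input - baseline) over randomly chosen baselines and random points on the straight line between baseline and input, optionally adding Gaussian noise to the input. The NumPy function below is a simplified re-implementation written for illustration only. The name `gradient_shap` and its arguments are not Captum's API; in this repository, attributions come from Captum's `GradientShap`.

```python
import numpy as np


def gradient_shap(grad_fn, x, baselines, n_samples=20, stdev=0.0, seed=0):
    """Approximate GradientShap attributions for a single input x.

    grad_fn   -- callable returning d(model output)/d(input) at a point
    baselines -- array of shape (n_baselines, *x.shape), e.g. all-zero
                 or shuffled encodings (illustrative choices)
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        baseline = baselines[rng.integers(len(baselines))]  # random reference
        noisy_x = x + rng.normal(0.0, stdev, size=x.shape)  # optional input noise
        alpha = rng.uniform(0.0, 1.0)  # random point on the baseline-to-input path
        point = baseline + alpha * (noisy_x - baseline)
        total += grad_fn(point) * (noisy_x - baseline)  # gradient x (input - baseline)
    return total / n_samples
```

As a sanity check, for a linear model f(x) = w · x with a single zero baseline, every sample contributes exactly w * x, so the attributions sum to f(x) - f(baseline). For a nonlinear CNN the result is a Monte Carlo estimate that improves with `n_samples`.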
To reproduce the published results or train from scratch:
```bash
bash data/scripts/run_zenodo_downloader.sh
```

This pulls the AGO2 eCLIP Manakov 2022 train / test / leftout splits from Zenodo into `data/chimeric_datasets/`.
- `code/`: model definitions, encoders, training and inference scripts.
- `analysis/`: runnable wrapper scripts (`train.sh`, `inference.sh`, etc.) for each model.
- `data/`: placeholder; populated by the download scripts above.
- `models/`: trained model checkpoints.
We track model performance on the Manakov22 test and leftout datasets, ranking models by the sum of their Average Precision (AP) scores, AP(test) + AP(leftout).
| Rank | Model | AP(test) | AP(leftout) | Checkpoint | Code | Date | Authors |
|---|---|---|---|---|---|---|---|
| 1 | Pairwise encoding with conservation (+2 channels) | 85.93 | 82.26 | model | code | 2025-03-27 | Dimos, David, Panos |
| 2 | Pairwise encoding CNN | 84.97 | 83.08 | model | code | 2025-03-19 | David, Panos |
| 3 | Retrained miRBind CNN (miRBench) | 84.00 | 81.00 | — | — | 2025-03-19 | Eva |
| 4 | TargetScanCNN | 77.00 | 76.00 | — | — | 2025-03-19 | TargetScan |
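The ranking rule can be made concrete. Assuming AP follows the usual step-wise definition, the sum over descending score thresholds n of (R_n - R_{n-1}) * P_n (as in scikit-learn's `average_precision_score`), and that the table reports it as a percentage, a dependency-free sketch of both steps looks like this. The `average_precision` and `rank_models` helpers are illustrative, not the repository's evaluation code; the leaderboard numbers are copied from the table.

```python
def average_precision(labels, scores):
    """Step-wise AP: sum of (R_n - R_{n-1}) * P_n over descending thresholds."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive labels")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores share one threshold, so consume them together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# AP(test), AP(leftout) in percent, copied from the leaderboard table.
LEADERBOARD = {
    "Pairwise encoding with conservation (+2 channels)": (85.93, 82.26),
    "Pairwise encoding CNN": (84.97, 83.08),
    "Retrained miRBind CNN (miRBench)": (84.00, 81.00),
    "TargetScanCNN": (77.00, 76.00),
}


def rank_models(board):
    """Sort model names by AP(test) + AP(leftout), best first."""
    return sorted(board, key=lambda name: sum(board[name]), reverse=True)


if __name__ == "__main__":
    for rank, name in enumerate(rank_models(LEADERBOARD), start=1):
        print(f"{rank}. {name}: {sum(LEADERBOARD[name]):.2f}")
```

Running it reproduces the table's order. The top entry wins on the combined score (168.19 vs 168.05) even though its AP(leftout) is lower than the runner-up's.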
Transcript repression predictions are stored on Google Drive.
If you use miRBind 2.0 in your work, please cite the corresponding manuscript: miRBind2 enables sequence-only prediction of miRNA binding and transcript repression.