Hoodini is a gene-centric comparative genomics toolkit that fetches public assemblies, extracts gene neighborhoods, runs pairwise protein and nucleotide comparisons, annotates neighborhoods with defense systems and mobile elements, and builds phylogenetic trees — all with GPU-accelerated interactive visualization.
| 🚀 Scales | ⚡ Fast | 🔬 Annotations | 🎨 Visualization |
|---|---|---|---|
| 1000s of genomes | Minutes | PADLOC, DefenseFinder, CCTyper | Publication-ready SVG |
- 📥 Automated data retrieval — Fetches assemblies from NCBI using protein or nucleotide accessions
- 🧬 Neighborhood extraction — Configurable genomic windows around target genes
- 🔗 Protein clustering — Groups homologous proteins for synteny comparison
- 📊 Pairwise comparisons — AAI (amino acid) and ANI (nucleotide) similarities
- 🌳 Tree construction — Phylogenetic trees from sequence identity
- 🛡️ Defense annotations — PADLOC, DefenseFinder, CCTyper, geNomad
- 🎨 Interactive visualization — Self-contained HTML with 50+ color palettes
# Single protein query
hoodini run --input WP_012345678.1 --output results
# With protein comparisons and phylogenetic tree
hoodini run --input proteins.txt --output results --prot-links --tree-mode aai_tree
# Full analysis with annotations
hoodini run --input proteins.txt --output results \
--prot-links --tree-mode aai_tree \
--padloc --deffinder --cctyper --genomad \
--num-threads 16

📖 See the Tutorial for a complete walkthrough.
Hoodini depends on both Python packages and external bioinformatics tools. The pixi and mamba/conda methods below install both; the Python-only method installs just the Python packages.
⚠️ Note: Bioconda and PyPI packages are coming soon. Use the development installation below.
Using pixi (recommended):
git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
pixi install
pixi run hoodini download databases

Using mamba/conda:
git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
mamba env create -f environment.yml
mamba activate hoodini
pip install -e .
hoodini download databases

Python-only installation (uv/pip)
⚠️ This only installs Python packages. Bioinformatics tools must be in your PATH.
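
Before choosing the Python-only route, it can help to check which external tools are already visible on your PATH. The following is a minimal sketch, not part of Hoodini: `check_tools` is a hypothetical helper, and the executable names passed to it are assumptions based on the annotation tools named in this README. Consult the Installation Guide for the tools Hoodini actually calls.

```shell
# Report which external tools are visible on PATH.
# ASSUMPTION: the tool names used below are illustrative only; the real list
# of required executables is not given in this README.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found.
    [ "$missing" -eq 0 ]
}

check_tools padloc defense-finder cctyper genomad || echo "some tools are missing from PATH"
```

If any tool is reported as missing, the pixi or mamba/conda installation is likely the simpler option.
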
Using uv:
git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
uv sync
uv run hoodini download databases

Using pip:
git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
pip install -e .
hoodini download databases

Docker
⚠️ A Docker image is available but has not been fully tested. Please report any issues.
docker volume create hoodini-data
docker run --rm -v hoodini-data:/app/src/hoodini/data \
pentamorfico/hoodini:latest hoodini download databases
docker run --rm -v hoodini-data:/app/src/hoodini/data -v "$(pwd)":/work \
pentamorfico/hoodini:latest hoodini run --input /work/proteins.txt --output /work/results

📖 See Installation Guide for detailed instructions.
| Command | Description |
|---|---|
| hoodini run | Run the main pipeline |
| hoodini download | Download required databases |

Input Options
| Option | Description |
|---|---|
| --input ID\|FILE | Single accession or file with one per line |
| --inputsheet FILE | TSV with accessions and custom metadata |
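
Since `--input` accepts a file with one accession per line, a hand-assembled list may need tidying first. The sketch below shows one way to do that with standard tools. The file name `raw_ids.txt` and the accessions are placeholders, not real query proteins.

```shell
# Raw list with a blank line, stray whitespace, and a duplicate.
# ASSUMPTION: these accessions are placeholders for illustration.
printf '%s\n' 'WP_012345678.1' '' '  WP_000000001.1  ' 'WP_012345678.1' > raw_ids.txt

# Trim whitespace, drop blank lines, and remove duplicates while keeping order.
sed 's/^[[:space:]]*//; s/[[:space:]]*$//' raw_ids.txt \
    | awk 'NF && !seen[$0]++' > proteins.txt

cat proteins.txt
```

The resulting `proteins.txt` can then be passed as `hoodini run --input proteins.txt --output results`.
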
Neighborhood Extraction
| Option | Description |
|---|---|
| --win-mode | win_genes (gene count) or win_nts (nucleotides) |
| --win INT | Window size (default: 10 genes or 10000 nt) |
| --sorfs | Re-annotate small ORFs |
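
To illustrate how the window options combine, here is a small sketch that prints (rather than runs) a command for a given window mode and size. `window_cmd` is a hypothetical helper, the sizes are arbitrary examples, and only flags listed in this README are used.

```shell
# Print a hoodini command line for a chosen window mode and size.
# ASSUMPTION: window_cmd is an illustrative helper, not part of Hoodini.
window_cmd() {
    mode=$1   # win_genes or win_nts
    size=$2   # number of genes or nucleotides, depending on the mode
    case $mode in
        win_genes|win_nts) ;;
        *) echo "unknown window mode: $mode" >&2; return 1 ;;
    esac
    echo "hoodini run --input proteins.txt --output results --win-mode $mode --win $size"
}

window_cmd win_genes 5      # 5 genes around each target
window_cmd win_nts 20000    # 20000 nt around each target
```
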
Comparisons & Trees
| Option | Description |
|---|---|
| --prot-links | All-vs-all protein similarities |
| --nt-links | Pairwise nucleotide alignments |
| --tree-mode | aai_tree or ani_tree |
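
The quick-start example pairs `--prot-links` with `--tree-mode aai_tree`. The sketch below extends that pairing to the nucleotide case. Note that pairing `--nt-links` with `ani_tree` is an assumption made here by analogy; this README does not state whether it is required. `links_for_tree` is a hypothetical helper, and the command is printed, not executed.

```shell
# Map a tree mode to the comparison flag assumed to feed it.
# ASSUMPTION: aai_tree goes with --prot-links (as in the quick start) and
# ani_tree goes with --nt-links (by analogy, not confirmed by the source).
links_for_tree() {
    case $1 in
        aai_tree) echo "--prot-links" ;;
        ani_tree) echo "--nt-links" ;;
        *) echo "unknown tree mode: $1" >&2; return 1 ;;
    esac
}

tree_mode=ani_tree
echo "hoodini run --input proteins.txt --output results $(links_for_tree "$tree_mode") --tree-mode $tree_mode"
```
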
Annotations
| Option | Description |
|---|---|
| --padloc | Defense systems (PADLOC) |
| --deffinder | Defense systems (DefenseFinder) |
| --cctyper | CRISPR-Cas typing |
| --genomad | Mobile genetic elements |
| --domains LIST | Domain databases |

📖 Full reference: CLI Documentation
Hoodini generates a hoodini-viz/ folder with:
- Self-contained HTML viewer
- Newick tree
- TSV and Parquet data files
📖 See Outputs Guide for details.
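
As a quick sanity check after a run, you could count the files of each type in the output folder. This is a rough sketch under stated assumptions: that `hoodini-viz/` sits inside the `--output` directory, and that the files use `.html`, `.nwk`, `.tsv`, and `.parquet` extensions. The file names in the demo are invented so the snippet runs without Hoodini installed.

```shell
# Count files of each expected type in a hoodini-viz/ folder.
# ASSUMPTION: the extensions below are guesses; see the Outputs Guide for the
# real file names and layout.
summarize_viz() {
    dir=$1
    for ext in html nwk tsv parquet; do
        n=$(find "$dir" -type f -name "*.$ext" | wc -l | tr -d ' ')
        printf '%-8s %s\n' "$ext" "$n"
    done
}

# Demo on a mock folder with made-up file names.
mkdir -p demo_results/hoodini-viz
touch demo_results/hoodini-viz/index.html demo_results/hoodini-viz/tree.nwk \
      demo_results/hoodini-viz/genes.tsv demo_results/hoodini-viz/genes.parquet
summarize_viz demo_results/hoodini-viz
```

A count of zero for any type suggests either that the corresponding step did not run or that the assumed extension does not match the real output.
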
| Resource | Description |
|---|---|
| 📖 Documentation | Full documentation |
| 🎮 Live Demo | Interactive examples |
| 🖼️ Gallery | Real-world examples from publications |
| 🧪 Colab | Run in Google Colab |
| 📦 hoodini-viz | Visualization library (npm) |
Hoodini is inspired by excellent tools in the field:
- GCsnap — Gene context visualization
- FlaGs — Flanking genes analysis
- Taxonium — Large trees visualization
- clinker — Gene cluster comparison
- gggenes — Gene arrow maps in R
- gggenomes — Comparative genomics visualization
[Citation pending publication]
MIT License. See LICENSE file.