pentamorfico/hoodini

Hoodini

Large-scale gene neighborhood analyses that feel like magic

🎮 Live Demo · 📖 Documentation · 🖼️ Gallery · 🧪 Colab


🧬 What is Hoodini?

Hoodini is a gene-centric comparative genomics toolkit. It fetches public assemblies, extracts gene neighborhoods, runs pairwise protein and nucleotide comparisons, annotates neighborhoods with defense systems and mobile elements, and builds phylogenetic trees. Results are explored through a GPU-accelerated interactive visualization.


  • 🚀 Scales: 1000s of genomes
  • Fast: Minutes
  • 🔬 Annotations: PADLOC, DefenseFinder, CCTyper
  • 🎨 Visualization: Publication-ready SVG

[Image: example visualization]


✨ Key Features

  • 📥 Automated data retrieval — Fetches assemblies from NCBI using protein or nucleotide accessions
  • 🧬 Neighborhood extraction — Configurable genomic windows around target genes
  • 🔗 Protein clustering — Groups homologous proteins for synteny comparison
  • 📊 Pairwise comparisons — AAI (amino acid) and ANI (nucleotide) similarities
  • 🌳 Tree construction — Phylogenetic trees from sequence identity
  • 🛡️ Defense annotations — PADLOC, DefenseFinder, CCTyper, geNomad
  • 🎨 Interactive visualization — Self-contained HTML with 50+ color palettes

🚀 Quick Start

# Single protein query
hoodini run --input WP_012345678.1 --output results

# With protein comparisons and phylogenetic tree
hoodini run --input proteins.txt --output results --prot-links --tree-mode aai_tree

# Full analysis with annotations
hoodini run --input proteins.txt --output results \
  --prot-links --tree-mode aai_tree \
  --padloc --deffinder --cctyper --genomad \
  --num-threads 16

📖 See the Tutorial for a complete walkthrough.


📦 Installation

Hoodini depends on both Python packages and external bioinformatics tools. The pixi and mamba/conda methods below install both; the Python-only method installs the Python packages only.

⚠️ Note: Bioconda and PyPI packages are coming soon. Use the development installation below.

Using pixi (recommended)

git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
pixi install
pixi run hoodini download databases

Using mamba/conda

git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
mamba env create -f environment.yml
mamba activate hoodini
pip install -e .
hoodini download databases

Python-only installation (uv/pip)

⚠️ This only installs Python packages. Bioinformatics tools must be in your PATH.

Using uv:

git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
uv sync
uv run hoodini download databases

Using pip:

git clone https://github.com/pentamorfico/hoodini.git
cd hoodini
pip install -e .
hoodini download databases

Docker

⚠️ A Docker image is available but has not been fully tested. Please report any issues.

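# Create a named volume so the databases persist between runs
docker volume create hoodini-data
# Download the databases into the volume
docker run --rm -v hoodini-data:/app/src/hoodini/data \
  pentamorfico/hoodini:latest hoodini download databases
# Run the pipeline with the current directory mounted at /work
docker run --rm -v hoodini-data:/app/src/hoodini/data -v "$(pwd)":/work \
  pentamorfico/hoodini:latest hoodini run --input /work/proteins.txt --output /work/results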

📖 See Installation Guide for detailed instructions.


🛠️ Usage

hoodini run      Run the main pipeline
hoodini download Download required databases

Input Options

Option              Description
--input ID|FILE     Single accession or file with one per line
--inputsheet FILE   TSV with accessions and custom metadata
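
As a minimal sketch of the file form of --input, the snippet below writes a plain-text list with one accession per line and passes it to the pipeline. WP_012345678.1 is the placeholder from the Quick Start; the other two accessions and the proteins.clean.txt file name are invented for illustration. Removing blank lines and duplicates is a precaution, not a documented requirement.

```shell
# Write a small accession list, one ID per line.
# All three accessions are placeholders, not real queries.
cat > proteins.txt <<'EOF'
WP_012345678.1
WP_000000001.1
WP_000000002.1
EOF

# Precaution (assumption): drop blank lines and duplicate IDs before the run.
grep -v '^[[:space:]]*$' proteins.txt | sort -u > proteins.clean.txt
wc -l < proteins.clean.txt   # number of unique accessions

# Launch only if hoodini is on PATH; --input and --output are the documented flags.
if command -v hoodini >/dev/null 2>&1; then
  hoodini run --input proteins.clean.txt --output results
else
  echo "hoodini not found; skipping run" >&2
fi
```

The same file can be reused with any of the comparison or annotation flags listed in this section.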

Neighborhood Extraction

Option       Description
--win-mode   win_genes (gene count) or win_nts (nucleotides)
--win INT    Window size (default: 10 genes or 10000 nt)
--sorfs      Re-annotate small ORFs
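
The two window options work as a pair: --win-mode chooses the unit and --win gives the size in that unit. A hedged sketch of a nucleotide-based window follows; the value 20000 and the results_nt output name are arbitrary choices for illustration, and the command is printed first so it can be reviewed before anything runs.

```shell
# Choose the unit first, then the size in that unit.
WIN_MODE="win_nts"   # alternative: win_genes
WIN=20000            # nucleotides under win_nts; a gene count under win_genes

# Assemble the command as a string so it can be inspected before running.
CMD="hoodini run --input WP_012345678.1 --output results_nt --win-mode $WIN_MODE --win $WIN"
echo "$CMD"

# Run only if hoodini is installed. Word splitting of $CMD is intentional here
# and safe because none of the arguments contain spaces.
if command -v hoodini >/dev/null 2>&1; then
  $CMD
fi
```

Switching WIN_MODE to win_genes and WIN to a small integer gives the gene-count form of the same run.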

Comparisons & Trees

Option         Description
--prot-links   All-vs-all protein similarities
--nt-links     Pairwise nucleotide alignments
--tree-mode    aai_tree or ani_tree

Annotations

Option           Description
--padloc         Defense systems (PADLOC)
--deffinder      Defense systems (DefenseFinder)
--cctyper        CRISPR-Cas typing
--genomad        Mobile genetic elements
--domains LIST   Domain databases

📖 Full reference: CLI Documentation


📁 Output

Hoodini generates a hoodini-viz/ folder with:

  • Self-contained HTML viewer
  • Newick tree
  • TSV and Parquet data files
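
To check what a run produced, a small helper like the one below can list the files in that folder. This is a sketch under two assumptions: that hoodini-viz/ sits inside the directory passed to --output, and that the files use the extensions shown (.html, .nwk, .tsv, .parquet). The function name list_viz_files is made up for this example; the Outputs Guide has the actual layout.

```shell
# List viewer, tree, and table files under <output>/hoodini-viz.
# The folder location and the file extensions are assumptions.
list_viz_files() {
  viz_dir="$1/hoodini-viz"
  if [ ! -d "$viz_dir" ]; then
    echo "no $viz_dir; run the pipeline first" >&2
    return 1
  fi
  find "$viz_dir" -type f \
    \( -name '*.html' -o -name '*.nwk' -o -name '*.tsv' -o -name '*.parquet' \) | sort
}

# "results" matches the --output value used in the Quick Start.
list_viz_files results || true
```

The helper returns a non-zero status when the folder is missing, so it can also gate later steps in a script.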

📖 See Outputs Guide for details.


📚 Learn More

Resource           Description
📖 Documentation   Full documentation
🎮 Live Demo       Interactive examples
🖼️ Gallery         Real-world examples from publications
🧪 Colab           Run in Google Colab
📦 hoodini-viz     Visualization library (npm)

🙏 Acknowledgments

Hoodini is inspired by excellent tools in the field:

  • GCsnap — Gene context visualization
  • FlaGs — Flanking genes analysis
  • Taxonium — Large trees visualization
  • clinker — Gene cluster comparison
  • gggenes — Gene arrow maps in R
  • gggenomes — Comparative genomics visualization

📄 Citation

[Citation pending publication]

📜 License

MIT License. See the LICENSE file.