(JA)TI is a command-line tool for maximum likelihood phylogenetic tree reconstruction. Currently it supports tree search with SPR moves on DNA and protein sequences, and allows two gap handling strategies: treating gaps in alignments as missing data or using the Poisson Indel Process (PIP) (paper) model.
- Multiple substitution models: JC69, K80, HKY85, TN93, GTR for DNA; WAG, HIVB, BLOSUM for proteins;
- Gap handling: PIP model or treating gaps as missing data;
- Character frequency optimisation: fixed or empirical frequencies (ML estimates to be added);
- Tree optimisation: SPR-based topology and branch length optimisation;
- Parallel computation: optional multi-threading support for faster tree search;
- Comprehensive logging: debug and info level logging with timestamps;
- Reproducible results: optional PRNG seed for deterministic runs.
- Input alignment: JATI requires aligned sequences in FASTA format as input.
- Starting tree: You can provide a starting tree in Newick format using the
--tree-fileoption. If no tree is provided, JATI will construct a starting tree using the Neighbor-Joining (NJ) algorithm. - Model parameters: You can optionally provide starting values for model parameters (e.g., λ and μ for PIP, substitution rate matrix values, kappa, frequencies) using the
--paramsand--freqsoptions. These values will be used as initial guesses and will be optimised during the run.
JATI uses an iterative optimisation approach:
- Model parameter optimisation: maximum likelihood estimation of substitution model parameters;
- Tree topology optimisation: SPR (Subtree Pruning and Regrafting) moves to improve the current tree topology;
- Branch length optimisation: optimises all branch lengths given the current topology.
The process continues until convergence (change in log-likelihood < epsilon) or maximum iterations reached.
- PIP model likelihood computation is more intensive and will take longer than treating gaps as missing data, as more computation needs to be done per site;
- Tree topology optimisation with SPR moves scales quadratically with the number of taxa (
$n$ ) and linearly with the number of sites ($m$ ), making the overall complexity$O(n^2m)$ .
When built with the parallel feature, JATI can use multiple CPU cores to speed up tree search by parallelising the evaluation of possible regrafting positions for each pruning location during SPR moves.
- Benefits: Speedup on multi-core systems during tree topology optimisation, especially for larger trees;
- Memory usage: Parallel execution may increase memory consumption due to thread-local data structures;
- Thread count: JATI automatically detects and uses all available CPU cores. You can control the number of threads using the
RAYON_NUM_THREADSenvironment variable:
# Limit to 4 threads
RAYON_NUM_THREADS=4 ./target/release/jati --seq-file data/dna_alignment.fasta --model JC69
# Use all available cores (default behavior)
./target/release/jati --seq-file data/dna_alignment.fasta --model JC69When using the parallel feature, results remain deterministic when a PRNG seed is specified, despite the parallel execution. The parallelisation is designed to maintain reproducible results across runs and the final tree and log-likelihood should be indentical to a run without the parallel feature given the same seed.
Pre-compiled binaries for different platforms are available on the release page.
- Minimum Supported Rust Version: 1.84.0 (detected using
cargo-msrv); - Git.
- Clone the repository:
git clone https://github.com/acg-team/JATI.git
cd JATI- Build the project: Standard build (single-threaded):
cargo build --releaseParallel build (multi-threaded support):
cargo build --release --features parallelThe compiled binary will be available at target/release/jati.
parallel: Enables multi-threading support for improved performance on multi-core systems. This feature uses Rayon for parallel computation of likelihood calculations and tree search operations.
./target/release/jati -s <sequence_file> -m <model># Basic DNA analysis with JC69 model and gaps as missing data
./target/release/jati \
--seq-file data/dna_alignment.fasta \
--model JC69 \
--gap-handling missing
# Protein analysis with input tree, WAG substitution model and PIP
./target/release/jati \
--seq-file data/protein_alignment.fasta \
--tree-file data/protein_starting_tree.newick \
--model WAG \
--gap-handling pip \
--out-folder results/| Parameter | Short | Description |
|---|---|---|
--seq-file |
-s |
Input sequence file in FASTA format |
--model |
-m |
Evolutionary model (see supported models below) |
| Parameter | Short | Default | Description |
|---|---|---|---|
--out-folder |
-d |
. |
Output directory for results |
--tree-file |
-t |
None | Input tree file in Newick format (if not provided, NJ tree will be built) |
--run-name |
-r |
None | Custom identifier for the run |
--max-iterations |
-x |
5 |
Maximum number of optimisation iterations |
--params |
-p |
None | Model-specific parameters (space-separated), when using PIP the first two parameters will be used for λ and μ |
--freqs |
-f |
None | Stationary frequencies: π_T π_C π_A π_G (DNA) or custom (protein) |
--freq-opt |
-o |
empirical |
Frequency optimisation: fixed or empirical |
--gap-handling |
-g |
pip |
Gap handling strategy: pip or missing |
--epsilon |
-e |
1e-5 |
Convergence threshold for optimisation |
--seed |
None | PRNG seed for reproducible results |
- JC69: Jukes-Cantor model (equal rates, equal frequencies);
- K80: Kimura 2-parameter model (equal frequencies):
- Parameters:
α(transition/transversion ratio),π_T π_C π_A π_G(stationary frequencies);
- Parameters:
- HKY85/HKY: Hasegawa-Kishino-Yano model:
- Parameters:
α(transition/transversion ratio),π_T π_C π_A π_G(stationary frequencies);
- Parameters:
- TN93: Tamura-Nei model:
- Parameters:
α₁α₂(transition rates),π_T π_C π_A π_G(stationary frequencies);
- Parameters:
- GTR: General Time Reversible model:
- Parameters:
r_TCr_TAr_TGr_CAr_CG(rate matrix values,r_AGfixed to 1.0),π_T π_C π_A π_G(stationary frequencies).
- Parameters:
- WAG: Whelan and Goldman model;
- HIVB: HIV between-host model;
- BLOSUM: BLOSUM62-based model.
- PIP: Uses the Poisson Indel Process model to handle gaps as evolutionary events;
- Missing: Treats gaps as missing data during likelihood calculation.
- fixed: Use provided frequencies without optimisation;
- empirical: Calculate frequencies from the input sequences.
JATI creates an output folder with the following structure:
<run_id>_out/
├── <run_id>_start_tree.newick # Starting tree (input or NJ tree)
├── <run_id>_tree.newick # Final optimised phylogenetic tree
├── <run_id>_logl.out # Final log-likelihood value
└── <run_id>.log # Detailed execution log
*_tree.newick: The final optimised phylogenetic tree in Newick format with branch lengths;*_start_tree.newick: The starting tree used for optimisation;*_logl.out: The final log-likelihood value of the optimised model and tree;*.log: Comprehensive log file with:- Run configuration and parameters;
- Model setup and gap handling strategy;
- Iteration-by-iteration optimisation progress;
- Final parameter values and frequencies;
- Timing and convergence information.
If no --run-name is provided, runs are identified by timestamp. Output folder format:
- With run name:
<run_name>_<timestamp>_out - Without run name:
<timestamp>_out
./target/release/jati \
--seq-file data/wickd3b_7705.processed.fasta \
--model JC69 \
--gap-handling pip \
--max-iterations 10 \
--out-folder results/./target/release/jati \
--seq-file data/whela7_Gene_0917.fasta \
--model WAG \
--gap-handling missing \
--freq-opt empirical \
--run-name protein_analysis./target/release/jati \
--seq-file data/wickd3b_7705.processed.fasta \
--model GTR \
--gap-handling pip \
--params 2.0 1.5 0.4 0.6 0.7 0.6 0.8 \
--freqs 0.28 0.22 0.25 0.25 \
--freq-opt fixed \
--seed 12345 \
--epsilon 1e-6./target/release/jati \
--seq-file data/wickd3b_7705.processed.fasta \
--model HKY \
--params 2.0 1.5 2.5 \
--freq-opt empirical \
--max-iterations 15# Build with parallel support
cargo build --release --features parallel
# Run with thread control
RAYON_NUM_THREADS=4 ./target/release/jati \
--seq-file data/HIV-1_env_DNA_mafft_alignment.fasta \
--model GTR \
--gap-handling pip \
--max-iterations 20 \
--run-name hiv_analysisThe repository includes example alignments in the data/ folder for testing purposes.
-
wickd3b_7705.processed.fasta: A medium size DNA alignment from the phylogenetic benchmarks (Zhou et al. 2018). -
whela7_Gene_0917.fasta: Protein alignment from the same phylogenetic benchmarks (Zhou et al. 2018). -
HIV-1_env_DNA_mafft_alignment.fasta: Large DNA alignment of HIV-1 env gene sequences.
At the moment JATI provides comprehensive logging with two levels:
- Console: Info-level messages showing run progress and major steps;
- File: Debug-level messages with detailed parameter information and convergence details.
Log timestamps are formatted as HH:MM:SS for clear timing information. The log file contains the complete run configuration at the start for reproducibility.
For reproducible results, you can specify a PRNG seed:
- If not provided: Uses current timestamp as seed (logged for reference);
- If provided: Uses the specified seed for all random operations.
This ensures deterministic results across runs on the same system with the same input and seed.
- Built with Rust 2021 edition;
- Uses the
phylolibrary for phylogenetic functionality; - Command-line parsing with
clap; - Logging with
logandftail; - Timestamps with
ntimestamp; - Error handling with
anyhow; - Optional:
rayonfor parallel computation (whenparallelfeature is enabled).
For questions, bug reports, or feature requests, please go to JATI discussion page and/or open an issue on GitHub.
This project is developed by Jūlija Pečerska (GitHub, email).
This project is licensed under either of
- Apache Licence, Version 2.0 (LICENSE-APACHE or www.apache.org/licenses/LICENSE-2.0), or
- MIT Licence (LICENSE-MIT or opensource.org/licenses/MIT)
at your option.