Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
113 commits
Select commit Hold shift + click to select a range
8e1ad87
Tree builder trait unimplemented
kalxed May 22, 2025
2dc20ec
NJBuilder class, no trait, public associated function implementation
kalxed May 22, 2025
fb70192
Mattes changes for deterministic tests passing
kalxed May 22, 2025
077a47e
Format Rust code using rustfmt
github-actions[bot] May 26, 2025
1573c0d
Added tests privately distance function still being debugged
kalxed May 27, 2025
f3026db
Merge branch 'feature/stochastic-tree-builder' of https://github.com/…
kalxed May 27, 2025
b08a0ad
Format Rust code using rustfmt
github-actions[bot] May 27, 2025
9acd004
Merge branch 'develop' of https://github.com/acg-team/rust-phylo into…
kalxed Jul 14, 2025
01f7a66
Custom distance function that aligns with bio impl, all but one test …
kalxed Jul 16, 2025
758f8df
Format Rust code using rustfmt
github-actions[bot] Jul 16, 2025
1f9170f
Options for both new() NJBuilder parameters, to work on adding stocha…
kalxed Jul 16, 2025
4b79e95
Merge branch 'feature/stochastic-tree-builder' of https://github.com/…
kalxed Jul 16, 2025
4d31305
Format Rust code using rustfmt
github-actions[bot] Jul 18, 2025
f631363
Used Into to circumvent calling Some() when constructing
kalxed Jul 18, 2025
05bac27
Temperature not implemented, default for NJBuilder and PhyloInfo Tree…
kalxed Jul 18, 2025
d0bded4
Format Rust code using rustfmt
github-actions[bot] Jul 18, 2025
abaf2a9
Stochastic aspect implemented, temperature implemented, tests to be a…
kalxed Jul 25, 2025
825500a
Merge branch 'develop' of https://github.com/acg-team/rust-phylo into…
kalxed Aug 5, 2025
4a88ccd
Add tree from the original paper to check NJ
junniest Aug 5, 2025
000b849
Reworked NJBuilder again with enum for temperature, and made it gener…
kalxed Aug 5, 2025
4422ea9
Merge branch 'feature/stochastic-tree-builder' of https://github.com/…
kalxed Aug 5, 2025
7ab93e1
Comment nj_uniform test before gloabl random implementation
kalxed Aug 5, 2025
6c2ca69
Fixed nj_tree_original_paper test
kalxed Aug 5, 2025
fe5a4a9
Naming for NJTreeBuilder and trait funciton changed, pub new(), comme…
kalxed Aug 6, 2025
ee394fa
Compute delta tree length reworked, temperature_branch working theore…
kalxed Aug 17, 2025
0395f26
Format Rust code using rustfmt
github-actions[bot] Aug 17, 2025
0099b13
Merge branch 'develop' of https://github.com/acg-team/rust-phylo into…
kalxed Aug 19, 2025
157fe16
Updated formatting with rest of comments, sample to be updated, and a…
kalxed Aug 19, 2025
618e8c8
Merge branch 'develop' of https://github.com/acg-team/rust-phylo into…
kalxed Aug 19, 2025
6739992
Softmax, Uniform, and build() tests to be added, also splitting curre…
kalxed Aug 19, 2025
e841a5d
Rename build_from_distances_w_rng
junniest Sep 25, 2025
2ef4ef4
Update comment and logs in softmax strategy
junniest Sep 25, 2025
acfb986
Remove coverage from tests and unnecessary comment
junniest Sep 25, 2025
a973ba4
Remove unnecessary argmin impl
junniest Sep 25, 2025
82a381a
Refactor softmax impl + avoid copying
junniest Sep 25, 2025
c69e931
Elide a lifetime to appease clippy
junniest Sep 25, 2025
6d67888
Add basic tests for softmax_from_distances
junniest Sep 25, 2025
784f69e
Move sampling from softmax to main build func
junniest Sep 25, 2025
150c7f5
Rename private tests -> tests
junniest Sep 25, 2025
0ce5b2f
Fix uniform weights in softmax
junniest Sep 25, 2025
56bccce
Add tests for softmax with temp
junniest Sep 25, 2025
1bce4bf
Aliases to make tests calls shorter
junniest Sep 26, 2025
d7013cc
Add derive Debug Clone PartialEq to Strategy Enum
junniest Sep 26, 2025
f389809
softmax_from_distances -> softmax_from_deltas
junniest Sep 26, 2025
9bbf557
Fix temp values, 0.0 is uniform, 1.0 is softmax
junniest Sep 26, 2025
5bcfb99
Check that temp values are set properly
junniest Sep 26, 2025
3e08c53
Move lower_triangle_index out of struct + tests
junniest Sep 26, 2025
2018638
Derive Clone for distance matrices
junniest Sep 26, 2025
db97160
Add softmax nj building tests
junniest Sep 26, 2025
88c04c8
Rename Strategy::Deterministic -> ArgMax
junniest Sep 26, 2025
ebbe2c0
Add log statement, fix comments
junniest Sep 26, 2025
2d761a3
Merge branch 'develop' into feature/stochastic-tree-builder
junniest Sep 26, 2025
2ffa019
Format Rust code using rustfmt
github-actions[bot] Sep 26, 2025
272b199
Remove unused use statement
junniest Sep 26, 2025
6e7ddf9
Add comment to indicate delta_tree_length is Q
junniest Sep 26, 2025
7361e01
Add doctring to TreeBuilder trait
junniest Sep 26, 2025
f4c5adb
Add missing semicolon
junniest Sep 26, 2025
052f979
Fix and clarify docstrings in NJTreeBuilder
junniest Sep 26, 2025
762b875
Tests for random sampling of 0.0/1.0 weights
junniest Sep 26, 2025
ffa1ffa
Remove redundant check from test and fix comment
junniest Sep 26, 2025
2e4967d
Fix comment on AA example with DNA distance func
junniest Sep 26, 2025
71c2d14
Clean up LevenshteinDNACorrected distance
junniest Sep 30, 2025
6cc307a
Add LevenshteinProteinCorrected distance
junniest Sep 30, 2025
9f577ba
Add sources for corrected distance formulas
junniest Sep 30, 2025
4a006e0
Add regular Levenshtein edit distance
junniest Sep 30, 2025
522f54e
Add tests for various distance measures
junniest Sep 30, 2025
4eb19dc
Add public use for NJBuilder and TreeBuilder
junniest Oct 2, 2025
98bd3e8
Add doctests and comments to NJTreeBuilder
junniest Oct 2, 2025
9d0498b
Add `build_nj_tree` method to `PhyloInfoBuilder`
junniest Oct 2, 2025
0c208a9
Remove unnecessary use statement
junniest Oct 2, 2025
e714d42
Fix comment for `PhyloInfoBuilder::build_nj_tree`
junniest Oct 2, 2025
9f856bc
Add basic test for `build_nj_tree`
junniest Oct 2, 2025
4eedb86
Remove unused `tree_builder` field
junniest Oct 2, 2025
2a0e2ec
Minor fixes for `PhyloInfoBuilder`
junniest Oct 3, 2025
3529f3b
Fix instantiation syntax for LDNACorr distance
junniest Oct 3, 2025
da9eb7a
Rename `delta_tree_length` to `compute_delta_...`
junniest Oct 3, 2025
f451ac7
Merge branch 'develop' into feature/stochastic-tree-builder
junniest Oct 27, 2025
f5f6eb6
Merge branch 'develop' into feature/stochastic-tree-builder
junniest Oct 31, 2025
e563d8a
Fix max_iter, precision tests for new start tree
junniest Oct 31, 2025
ab1e5e5
Rename q -> delta_lengths
junniest Oct 31, 2025
602181a
Merge branch 'develop' into feature/stochastic-tree-builder
junniest Nov 6, 2025
1689a65
Simplify NJBuilder::softmax
junniest Nov 10, 2025
644866a
Add back argmin randomisation and simplify build
junniest Nov 10, 2025
a5263e2
Fix fake values for fake softmax test
junniest Nov 10, 2025
ecef9d4
Add comment for lower_triangle_index transform
junniest Nov 10, 2025
66ee34e
Add comment to NJTreeBuilder::argmin
junniest Nov 10, 2025
56dd99b
Simplify optimise_tree macros and add passing rng
junniest Nov 10, 2025
15d2987
Add comment and test for FakeGenerator::shuffle
junniest Nov 10, 2025
864f350
FakeRng to make pip_optimise_model_tree test pass
junniest Nov 10, 2025
c862a1c
Modify 2 topo tests to start with phyml tree
junniest Nov 10, 2025
b3ffad9
Fix delta_tree_len matrix length formula
junniest Nov 10, 2025
5a1953a
Remove mut self from tree builder interface
junniest Nov 10, 2025
381adbf
Merge branch 'feature/stochastic-tree-builder' of github.com:acg-team…
junniest Nov 10, 2025
4a98a53
Remove unworking test
junniest Nov 10, 2025
c3a8882
Simplifying evolutionary distances
junniest Nov 10, 2025
d52cc5d
Fix comment to compute_delta_tree_length
junniest Nov 10, 2025
db9ea0e
Fix comment in test
junniest Nov 10, 2025
0e34e69
Fix docstring for NJ tie-breaking
junniest Nov 10, 2025
192cf2c
Remove unnecessary mut from doctests
junniest Nov 10, 2025
66fbaf9
Remove rogue pring statement
junniest Nov 10, 2025
54cf5fc
Fix info! statements on NJTreeBuilder::new
junniest Nov 11, 2025
2be8cf9
Replace unwrap() with ? in production code
junniest Nov 11, 2025
9e98cb8
Fix indentation in macro
junniest Nov 13, 2025
211d4aa
Refactor starting tree creation into a method
junniest Nov 13, 2025
993631c
Merge softmax examples into one test
junniest Nov 13, 2025
ae10615
Add test for softmax temperature interpolation
junniest Nov 13, 2025
1563be3
Add #[derive(PartialEq)] to Tree
junniest Nov 13, 2025
0a2fbb9
Fix `nj_builder_fake_softmax` test
junniest Nov 13, 2025
a86b7d8
Fix comment for argmax option in NJBuilder
junniest Nov 17, 2025
4d45b05
Merge branch 'feature/stochastic-tree-builder' of github.com:acg-team…
junniest Nov 17, 2025
1c5646f
Fix comment to clarify FakeRng::shuffle
junniest Nov 17, 2025
194af51
Remove unnecessary clone() from NJBuilder test
junniest Nov 17, 2025
5fbb6ba
Fix assertions in NJTreeBuilder::softmax test
junniest Nov 17, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 9 additions & 8 deletions phylo/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,18 @@ version = "0.1.0"
edition = "2021"
repository = "https://github.com/acg-team/rust-phylo"
license = "MIT OR Apache-2.0"
keywords = ["phylogenetics", "computational biology", "alignment", "MSA", "maximum likelihood"]
keywords = [
"phylogenetics",
"computational biology",
"alignment",
"MSA",
"maximum likelihood",
]
description = "A Rust library for phylogenetic analysis and multiple sequence alignment under maximum likelihood"
categories = ["science", "science::bioinformatics", "algorithms"]
exclude = [
"data/",
".github/",
"benches/",
"target/",
]
exclude = ["data/", ".github/", "benches/", "target/"]

rust-version = "1.82.0"
rust-version = "1.84.0"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

Expand Down
1 change: 1 addition & 0 deletions phylo/data/phyml_protein_example/phyml_nj_tree.newick
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
(((((((((PMarmoset:0.02931142,Tamarin:0.03977325):0.02582298,Squirrel:0.09657681):0.00554173,(Saki:0.03922165,Titi:0.03966531):0.02219892):0.00679583,(Howler:0.08093104,(Spider:0.03837468,Woolly:0.04073718):0.02217513):0.01497244):0.20396660,(((Chimp:0.01029413,Human:0.01228444):0.00350344,Gorilla:0.00833861):0.01134213,(Gibbon:0.06727730,Orangutan:0.03314814):0.00149069):0.03088307):0.04724073,(Colobus:0.00855577,DLangur:0.00980229):0.00782031):0.00711735,Patas:0.03048588):0.00608031,(AGM_cDNA:0.00848307,Tant_cDNA:0.00717932):0.01237098):0.00295050,Baboon:0.00807196,Rhes_cDNA:0.01226490);
224 changes: 224 additions & 0 deletions phylo/src/evolutionary_distances/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
use bio::{alignment::distance::levenshtein, io::fasta::Record};
use nalgebra::max;

pub trait EvolutionaryDistance {
fn dist(&self, a: &Record, b: &Record) -> f64;
}

#[derive(Clone, Copy, Debug, Default, Eq, Hash, PartialEq, PartialOrd, Ord)]
/// Levenshtein distance with Jukes-Cantor correction, meaningful for DNA sequences
pub struct LevenshteinDNACorrected;

impl EvolutionaryDistance for LevenshteinDNACorrected {
fn dist(&self, a: &Record, b: &Record) -> f64 {
// Distance formula corrected using the Jukes-Cantor model
// Formula 1.6 from "Computational Molecular Evolution" by Ziheng Yang (2006)
// d = -3/4 * ln(1 - (4/3) * p)
// where p is the proportion of differing sites (here, Levenshtein distance / max length).
// To avoid infinite distance when all characters are different, the maximum
// proportion of different characters is capped to 3/4=0.75.
let seq_i = a.seq();
let seq_j = b.seq();
let dist = levenshtein(seq_i, seq_j) as f64;
let p = f64::min(
dist / (max(seq_i.len(), seq_j.len()) as f64),
3.0 / 4.0 - f64::EPSILON,
);
-(3.0 / 4.0) * (1.0 - (4.0 / 3.0) * p).ln()
}
}

#[derive(Clone, Copy, Debug, Default, Eq, Hash, PartialEq, PartialOrd, Ord)]
/// Levenshtein distance with Jukes-Cantor-like correction, meaningful for protein sequences
pub struct LevenshteinProteinCorrected;

impl EvolutionaryDistance for LevenshteinProteinCorrected {
fn dist(&self, a: &Record, b: &Record) -> f64 {
// Corrected protein distance formula equivalent to Jukes-Cantor for proteins.
// Formula 2.3 from "Computational Molecular Evolution" by Ziheng Yang (2006)
// d = -19/20 * ln(1 - (20/19) * p)
// where p is the proportion of differing sites (here, Levenshtein distance / max length).
// To avoid infinite distance when all characters are different, the maximum
// proportion of different characters is capped to 19/20=0.95.

let seq_i = a.seq();
let seq_j = b.seq();
let dist = levenshtein(seq_i, seq_j) as f64;
let p = f64::min(
dist / (max(seq_i.len(), seq_j.len()) as f64),
19.0 / 20.0 - f64::EPSILON,
);
-(19.0 / 20.0) * (1.0 - (20.0 / 19.0) * p).ln()
}
}

#[derive(Clone, Copy, Debug, Default, Eq, Hash, PartialEq, PartialOrd, Ord)]
/// Simple Levenshtein distance without any correction, meaningful for both DNA
/// and protein sequences
pub struct Levenshtein;

impl EvolutionaryDistance for Levenshtein {
fn dist(&self, a: &Record, b: &Record) -> f64 {
let seq_i = a.seq();
let seq_j = b.seq();
levenshtein(seq_i, seq_j) as f64
}
}

#[cfg(test)]
#[cfg_attr(coverage, coverage(off))]
mod tests {
use super::*;

use crate::record_wo_desc as record;

#[test]
fn levenshtein_edit_dist() {
let dna_corr = Levenshtein {};

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"AAAAAAAA");
assert_eq!(dna_corr.dist(&s1, &s2), 0.0);
assert_eq!(dna_corr.dist(&s1, &s1), 0.0);

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"TTTTTTTT");
assert_eq!(dna_corr.dist(&s1, &s2), s1.seq().len() as f64);

let s1 = record!("s1", b"AAAAAAAAAAAAAAAA");
let s2 = record!("s2", b"AAAAA");
assert_eq!(
dna_corr.dist(&s1, &s2),
(s1.seq().len() - s2.seq().len()) as f64
);

let s1 = record!("s1", b"AAAAATAACAAAGTAAA");
let s2 = record!("s2", b"AAAAATCAGT");
assert_eq!(
dna_corr.dist(&s1, &s2),
(s1.seq().len() - s2.seq().len()) as f64
);

let s1 = record!("s1", b"AAAAAAAAAAAAAAAA");
let s2 = record!("s2", b"TTTTT");
assert_eq!(dna_corr.dist(&s1, &s2), s1.seq().len() as f64);
}

#[test]
fn levenshtein_dna_corrected_corners() {
let dna_corr = LevenshteinDNACorrected {};

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"AAAAAAAA");
assert_eq!(dna_corr.dist(&s1, &s2), 0.0);
assert_eq!(dna_corr.dist(&s1, &s1), 0.0);

let max_proportion = 3.0 / 4.0;
let max_distance =
-max_proportion * (1.0 - 1.0 / max_proportion * (max_proportion - f64::EPSILON)).ln();
// Max distance possible between sequences (all chars are different), computed with Python
assert_eq!(max_distance, 26.728641210756745);

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"TTTTTTTT");
assert_eq!(dna_corr.dist(&s1, &s2), max_distance);

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"");
assert_eq!(dna_corr.dist(&s1, &s2), max_distance);
}

#[test]
fn levenshtein_dna_corrected() {
let dna_corr = LevenshteinDNACorrected {};

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"AAAAAAAT");
// Levenshtein distance is 1, proportion is 1/8=0.125, distance computed with Python
assert_eq!(dna_corr.dist(&s1, &s2), 0.13674116759546595);

let s1 = record!("s1", b"AAAAATAACAAAGTAAA");
let s2 = record!("s2", b"AAAAATCAGT");
// Levenshtein distance is 7, proportion is 7/17=0.4117647058823529, distance computed with Python
assert_eq!(dna_corr.dist(&s1, &s2), 0.5972485625963819);
}

#[test]
fn levenshtein_dna_corrected_sanity() {
let dna_corr = LevenshteinDNACorrected {};

let s1 = record!("seq1", b"ACGTACGTXXXXXX");
let s2 = record!("seq2", b"AAGTACGTXXXXXX");
let s3 = record!("seq3", b"AAGTTCGTXXXXXX");
let s4 = record!("seq4", b"TTTTTTTTXXXXXX");

assert!(dna_corr.dist(&s1, &s2) < dna_corr.dist(&s1, &s3));
assert!(dna_corr.dist(&s1, &s2) < dna_corr.dist(&s1, &s4));
assert!(dna_corr.dist(&s1, &s3) < dna_corr.dist(&s1, &s4));
assert!(dna_corr.dist(&s2, &s3) < dna_corr.dist(&s2, &s4));
assert_eq!(dna_corr.dist(&s1, &s4), dna_corr.dist(&s2, &s4));
assert!(dna_corr.dist(&s3, &s4) < dna_corr.dist(&s2, &s4));

let s5 = record!("seq5", b"GGGGGGGGGGGGGG");
assert_eq!(dna_corr.dist(&s1, &s5), dna_corr.dist(&s2, &s5));
assert_eq!(dna_corr.dist(&s1, &s5), dna_corr.dist(&s3, &s5));
assert_eq!(dna_corr.dist(&s1, &s5), dna_corr.dist(&s4, &s5));
}

#[test]
fn levenshtein_protein_corrected_corners() {
let prot_corr = LevenshteinProteinCorrected {};

let s1 = record!("s1", b"PPPPPPPP");
let s2 = record!("s2", b"PPPPPPPP");
assert_eq!(prot_corr.dist(&s1, &s2), 0.0);
assert_eq!(prot_corr.dist(&s1, &s1), 0.0);

let max_proportion = 19.0 / 20.0;
let max_distance =
-max_proportion * (1.0 - 1.0 / max_proportion * (max_proportion - f64::EPSILON)).ln();
// Max distance possible between sequences (all chars are different), computed with Python
assert_eq!(max_distance, 33.85627886695854);

let s1 = record!("s1", b"PPPPPPPP");
let s2 = record!("s2", b"NNNNNNNN");
assert_eq!(prot_corr.dist(&s1, &s2), max_distance);

let s1 = record!("s1", b"PPPPPPPP");
let s2 = record!("s2", b"");
assert_eq!(prot_corr.dist(&s1, &s2), max_distance);
}

#[test]
fn levenshtein_protein_corrected() {
let prot_corr = LevenshteinProteinCorrected {};

let s1 = record!("s1", b"AAAAAAAA");
let s2 = record!("s2", b"AAAAAAAT");
// Levenshtein distance is 1, proportion is 1/8=0.125, distance computed with Python
assert_eq!(prot_corr.dist(&s1, &s2), 0.1340246683469102);

let s1 = record!("s1", b"AAAAATAACAAAGTAAA");
let s2 = record!("s2", b"AAAAATCAGT");
// Levenshtein distance is 7, proportion is 7/17=0.4117647058823529, distance computed with Python
assert_eq!(prot_corr.dist(&s1, &s2), 0.5397578618621736);
}

#[test]
fn levenshtein_protein_corrected_sanity() {
let prot_corr = LevenshteinProteinCorrected {};

let s1 = record!("seq1", b"ARNDCQEGHILKMFPSTWYV");
let s2 = record!("seq2", b"ARNDCQEGHILKMFPSTWVV");
let s3 = record!("seq3", b"ARNDCQEGHILKMFPSRRVV");
let s4 = record!("seq4", b"RRRRRRRRRRRRRRRRRRRR");

assert!(prot_corr.dist(&s1, &s2) < prot_corr.dist(&s1, &s3));
assert!(prot_corr.dist(&s1, &s2) < prot_corr.dist(&s1, &s4));
assert!(prot_corr.dist(&s1, &s3) < prot_corr.dist(&s1, &s4));
assert!(prot_corr.dist(&s2, &s3) < prot_corr.dist(&s2, &s4));
assert!(prot_corr.dist(&s3, &s4) < prot_corr.dist(&s2, &s4));
assert_eq!(prot_corr.dist(&s1, &s4), prot_corr.dist(&s2, &s4));
assert!(prot_corr.dist(&s1, &s4) > prot_corr.dist(&s3, &s4));
}
}
1 change: 1 addition & 0 deletions phylo/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ pub use bio::io::fasta::Record;
pub mod alignment;
pub mod alphabets;
pub mod asr;
pub mod evolutionary_distances;
pub mod evolutionary_models;
pub mod io;
pub mod likelihood;
Expand Down
Loading