Implementation of TKF92 #99

MattesMrzik · 2025-09-17T17:00:18Z

Implementation of the TKF92 cost for a given tree and Multiple Ancestral Sequence Alignment (MASA), i.e., an alignment that includes ancestral wildcard sequences.

Closes #58. Also implements TKF91, since it is the special case of TKF92 with parameter r = 0.
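To illustrate the r = 0 relationship, here is a minimal sketch based on one common pairwise (pair-HMM) formulation of the TKF models, not on the tree-based MASA cost implemented in this PR. The function names are hypothetical, and the sketch assumes 0 < lambda < mu and 0 <= r < 1.

```rust
/// TKF91 beta(t) = (1 - e^{(lambda - mu) t}) / (mu - lambda e^{(lambda - mu) t}).
/// Assumes 0 < lambda < mu (hypothetical helper, not the PR's API).
fn tkf91_beta(lambda: f64, mu: f64, t: f64) -> f64 {
    let e = ((lambda - mu) * t).exp();
    (1.0 - e) / (mu - lambda * e)
}

/// Match-to-match transition in the pairwise TKF91 formulation:
/// no insertion after the current link, another link exists, and it survives.
fn tkf91_match_to_match(lambda: f64, mu: f64, t: f64) -> f64 {
    let alpha = (-mu * t).exp(); // probability that a link survives time t
    let beta = tkf91_beta(lambda, mu, t);
    (1.0 - lambda * beta) * (lambda / mu) * alpha
}

/// Pairwise TKF92: with probability r the current fragment is extended,
/// otherwise the process takes an ordinary TKF91 step.
fn tkf92_match_to_match(lambda: f64, mu: f64, r: f64, t: f64) -> f64 {
    r + (1.0 - r) * tkf91_match_to_match(lambda, mu, t)
}
```

With r = 0 every fragment has length one, so the TKF92 expression collapses to the TKF91 one. Note also that beta is exactly 0 at t = 0 in this formulation.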

The functionality of fixing/re-estimating ancestral sequences after a tree move (see #115) will be addressed in another PR, which will include the implementation of TreeSearchCost.

- The MSA that caused the error had characters at the internal nodes but not at the leaves. During reassignment this made x equal to 1, because the ancestral characters are removed, which is correct. The integration is then wrong, though, since it integrates over nothing (the whole MSA column contains no characters). Added an assert to check for x != 1.
- find_brute_force_max can run in parallel and in series, both with a progress bar.
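The failure case described here (a column whose only characters sit at internal nodes) could be detected up front with a check along these lines. This is a hypothetical sketch: the column representation and the function name are assumptions, and it is not the assert on x that was added in the PR.

```rust
/// Returns true if at least one leaf has a character in this alignment column.
/// `column[i]` is `Some(char)` if node `i` has a character and `None` for a gap;
/// `is_leaf[i]` tells whether node `i` is a leaf (hypothetical representation).
fn column_has_leaf_char(column: &[Option<u8>], is_leaf: &[bool]) -> bool {
    column
        .iter()
        .zip(is_leaf)
        .any(|(site, &leaf)| leaf && site.is_some())
}
```

A column that fails this check becomes empty once the ancestral characters are removed, which is the situation that triggered the error.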

codecov · 2025-10-16T17:52:24Z

Codecov Report

❌ Patch coverage is 97.92000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.26%. Comparing base (527040d) to head (53f16b3).
⚠️ Report is 21 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| phylo/src/alignment/mod.rs | 33.33% | 6 Missing ⚠️ |
| phylo/src/tkf_model/tkf91.rs | 96.72% | 4 Missing ⚠️ |
| phylo/src/tkf_model/tkf_indel.rs | 99.26% | 2 Missing ⚠️ |
| phylo/src/tkf_model/tkf92.rs | 99.42% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #99      +/-   ##
===========================================
+ Coverage    96.32%   97.26%   +0.94%     
===========================================
  Files           32       45      +13     
  Lines         4355     5671    +1316     
===========================================
+ Hits          4195     5516    +1321     
+ Misses         160      155       -5

Copilot

Pull Request Overview

This PR introduces support for TKF (Thorne-Kishino-Felsenstein) evolutionary models, specifically TKF91 and TKF92, which are models for sequence evolution with insertions and deletions. The implementation includes model building, likelihood calculations, parameter optimization, and integration with the existing topology optimization framework.

Key changes:

- Implementation of TKF91 and TKF92 indel models with likelihood computation
- Integration with substitution models to create combined TKF+substitution costs
- Support for model parameter optimization and tree topology optimization (NNI)

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| phylo/src/tkf_model/mod.rs | Core implementation of TKF91/TKF92 models, builders, and likelihood calculations |
| phylo/src/tkf_model/tests.rs | Comprehensive test suite covering model functionality, likelihood calculations, and parameter validation |
| phylo/src/lib.rs | Module export for tkf_model |
| phylo/src/alignment/mod.rs | Added methods to support ancestral alignment updates needed by TKF models |
| phylo/src/optimisers/topo_optimiser.rs | Added compatibility traits for TKF models with NNI optimization |
| phylo/src/optimisers/spr_optimiser.rs | Added comment clarifying optimization logic |
| phylo/src/optimisers/model_optimiser_tests.rs | Added test for TKF model optimization |
| phylo/src/phylo_info/phyloinfo_builder.rs | Enhanced error messages and added logging for tree node ID setting |
| phylo/src/phylo_info/tests.rs | Updated test to match new error message format |
| phylo/data/tree_multiple.newick | Fixed formatting (restored newline) |
| phylo/benches/helpers.rs | Fixed typo in comment |
| phylo/Cargo.toml | Removed extra blank line |

phylo/src/phylo_info/phyloinfo_builder.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

junniest

This is quite hefty to review, so I think we'll need another round.
One more thought: I think it might be worth at this point to split the module into at the very least main (mod.rs), tkf91_model and tkf92_model or something along these lines (and tkfindel_model?). It becomes quite difficult to figure out which model is which in a file this long.

phylo/src/tkf_model/mod.rs

phylo/src/tkf_model/tests.rs

junniest

I have one major comment to the design at the moment. You have a lot of methods called get_something which to me indicates that they should only perform a computation and return the result. However, most if not all of these methods also set values in the temporary storage, meaning that they produce side effects.
My gut feeling is that methods with names like get should not produce side effects (set values) and methods with names like set should not return values, because otherwise it is very difficult to track where changes to the internal states are happening.
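The separation asked for here can be sketched as follows. All type and method names are hypothetical and do not come from the PR; the point is only that Rust's borrow rules make the split visible in the signatures.

```rust
/// Hypothetical scorer with a cached result (names are illustrative only).
struct IndelScorer {
    column_log_probs: Vec<f64>,
    cached_log_likelihood: Option<f64>,
}

impl IndelScorer {
    /// Pure computation: takes `&self`, so it cannot modify the cache.
    fn compute_log_likelihood(&self) -> f64 {
        self.column_log_probs.iter().sum()
    }

    /// Mutation only: takes `&mut self`, stores the value, returns nothing.
    fn set_cached_log_likelihood(&mut self, value: f64) {
        self.cached_log_likelihood = Some(value);
    }

    /// Read-only access to the cached value, if one has been stored.
    fn cached_log_likelihood(&self) -> Option<f64> {
        self.cached_log_likelihood
    }
}
```

A caller then writes `let ll = scorer.compute_log_likelihood(); scorer.set_cached_log_likelihood(ll);`, so every change to the internal state happens at an explicit `set_` call.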

phylo/src/optimisers/topo_optimiser.rs

phylo/src/tkf_model/mod.rs

phylo/src/tkf_model/tkf91.rs

phylo/src/tkf_model/mod.rs

junniest · 2025-11-11T21:17:42Z

phylo/src/tkf_model/mod.rs

+
+// TODO: link our paper once it is published. For now see original TKF92 paper: https://doi.org/10.1007/bf00163848
+#[derive(Clone, Debug)]
+struct TKFIndelModelInfo {


Doesn't this apply to all TKF-style models? If so, the name is confusing because it seems like it is just meant for the TKF indel model without the substitution models, rather than all of them.

Since this model info only cares about the indel process, I feel like the name is fitting. If we have a TKF cost with substitutions, it's just the TKF indel cost (with the TKF indel model info) plus the substitution cost (with its own model info). Also, for TKF with substitutions we simply do

self.indel_cost.cost() + self.subst_cost.cost()

So both the indel cost and subst cost have their own model info
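A minimal sketch of that composition, assuming a hypothetical `Cost` trait and placeholder component types (only the field names `indel_cost` and `subst_cost` and the summed `cost()` call are taken from the comment above):

```rust
/// Hypothetical trait: anything that can report a cost (e.g. a log likelihood).
trait Cost {
    fn cost(&self) -> f64;
}

/// Placeholder component that returns a fixed value; a real indel or
/// substitution cost would compute it from its own model info.
struct FixedCost(f64);

impl Cost for FixedCost {
    fn cost(&self) -> f64 {
        self.0
    }
}

/// Combined cost: each part keeps its own state, the total is their sum.
struct TkfWithSubstitutions<I: Cost, S: Cost> {
    indel_cost: I,
    subst_cost: S,
}

impl<I: Cost, S: Cost> Cost for TkfWithSubstitutions<I, S> {
    fn cost(&self) -> f64 {
        self.indel_cost.cost() + self.subst_cost.cost()
    }
}
```

Because the two parts are independent, changing a substitution parameter does not need to invalidate anything held by the indel part.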

Fair point!

I added a comment:

/// This struct holds intermediate values for the computation of the log likelihood
/// of an ancestral alignment and tree under a TKF indel model, i.e., without substitutions.
/// The intermediate values are needed for re-alignment.

phylo/src/tkf_model/tkf92.rs

junniest

There are some tiny nitpicks, but this looks great now! Well done! It is really well-decoupled now and it's a lot easier (at least for me) to understand what's going on.

phylo/src/alignment/mod.rs

junniest · 2025-11-17T11:55:44Z

phylo/src/alignment/mod.rs

+        if let Some(anc_map) = self.ancestral_maps.get_mut(node_idx) {
+            *anc_map = map;
+        } else {
+            panic!("NodeIdx {node_idx} is not an internal node");


Just wondering -- should this really panic? Or could you print a warning/error instead.
The reason why I'm asking is whether this would be something that breaks the reconstruction or could it just be safely ignored.

I feel like panic is fitting, since if this branch is reached, there is something wrong with the logic of the calling code. When people use update_ancestral_map and pass a leaf, something is flawed in their logic, and it's nice to point that out directly, as one might miss a warning in the log.

That's fair, I agree with this explanation!
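For comparison, the two options discussed in this thread could look roughly like this. The `Alignment` type, the `usize` node index, and the map contents are simplified stand-ins, and `try_update_ancestral_map` is a hypothetical alternative that is not part of the PR.

```rust
use std::collections::HashMap;

/// Simplified stand-in: one ancestral map per internal node, keyed by node index.
struct Alignment {
    ancestral_maps: HashMap<usize, Vec<Option<usize>>>,
}

impl Alignment {
    /// Panicking variant, mirroring the snippet under review:
    /// passing a non-internal node is treated as a bug in the caller.
    fn update_ancestral_map(&mut self, node_idx: usize, map: Vec<Option<usize>>) {
        if let Some(anc_map) = self.ancestral_maps.get_mut(&node_idx) {
            *anc_map = map;
        } else {
            panic!("NodeIdx {node_idx} is not an internal node");
        }
    }

    /// Hypothetical non-panicking variant: the caller decides how to react.
    fn try_update_ancestral_map(
        &mut self,
        node_idx: usize,
        map: Vec<Option<usize>>,
    ) -> Result<(), String> {
        match self.ancestral_maps.get_mut(&node_idx) {
            Some(anc_map) => {
                *anc_map = map;
                Ok(())
            }
            None => Err(format!("NodeIdx {node_idx} is not an internal node")),
        }
    }
}
```

The panicking variant fails loudly at the call site that contains the bug, which is the argument made above; the `Result` variant only makes sense if a caller can meaningfully recover.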

phylo/src/tkf_model/tests.rs

phylo/src/tkf_model/tkf92.rs

phylo/src/tkf_model/tkf_indel.rs

MattesMrzik added 6 commits September 17, 2025 18:58

dirty impl of tkf92, not tested in this repo.

fb0b0c5

- I just moved the code here from my own repo; it was tested there, but not yet here.

added reassignment and fixed almost all warnings

50995e8

added toy msa tree tkf92 test

84af102

test substitutions dont depend on indel points

6b80f73

test reassignment parallel, dp prob not same as calculated from

6fceced

reassigned, perhaps this is a numerical issue

forgot to commit test file

b64319e

MattesMrzik self-assigned this Sep 23, 2025

MattesMrzik added 12 commits September 23, 2025 10:09

compare runtimes of pip vs tkf cost (including reestimation)

320dff9

removed reassignment

872dfb0

TKFModelSearchCost & tests, still unused warning

fbd2db2

Merge remote-tracking branch 'upstream' into 58-TKF92

435b42f

missing added file change from last commit

55f6a82

added pub to TKFCostBuilder

09deae4

removed unnecessary import

be0f69e

sanity tests for model parameter change

f1d5fdf

params&freqs test, dont make indel model info invalid on substparam c…

27c80a6

…hange

fixed old bug

fd72ea4

why wasnt this noticed earlier?

clean up and removing duplicate code

57e4b42

MattesMrzik added 5 commits October 16, 2025 21:50

more tests for coverage

d09e977

added param valid checks to builder and set_param

e93addb

tkf trait and tkf91 impl, no tkf91 tests yet

d71f35c

tiny clean up

067ae57

more coverage; tk92 r cannot be zero fix

83ba149

MattesMrzik marked this pull request as ready for review October 20, 2025 17:22

MattesMrzik added 4 commits October 28, 2025 17:29

fix set param bug, strip fasta in info builder,

e6c2143

Merge remote-tracking branch 'upstream' into 58-TKF92

431ea6c

param ranges,

ec30d91

- removed plot r
- removed find fastas in benchmark dir
- removed test that found a beta = 0 when very short branch

clean up,fixed compatible: had spr instead of nni

40a1333

tkf91 now without r, tkf92 r not zero in check

2a19d81

- clean up
- model opti also for tkf91

MattesMrzik requested review from Copilot and junniest October 29, 2025 18:26

Copilot AI reviewed Oct 29, 2025

phylo/src/phylo_info/phyloinfo_builder.rs

MattesMrzik and others added 2 commits October 29, 2025 19:30

Update phylo/src/phylo_info/phyloinfo_builder.rs

644d635

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix copilot review bug

6c54ff2

MattesMrzik requested a review from Copilot October 29, 2025 18:34

MattesMrzik and others added 2 commits October 29, 2025 22:08

Update phylo/src/tkf_model/mod.rs

41344ad

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update phylo/src/tkf_model/mod.rs

b483408

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

junniest reviewed Oct 30, 2025

MattesMrzik added 5 commits October 30, 2025 16:35

implemented most requested changes

be968d0

- num enum for tkf params
- extracted actions function to separate getting indel x and factor n
- split the validate params into (lambda, mu) and (r)
- some minor renames and comments
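The split of parameter validation into (lambda, mu) and (r) might look like the sketch below. The function names, the error type, and the exact bounds are assumptions: the sketch uses the usual TKF requirement 0 < lambda < mu and the open interval 0 < r < 1, while the ranges enforced in the PR may differ at the borders.

```rust
/// Validates the indel rates, assuming the requirement 0 < lambda < mu.
/// NaN values fail every comparison and are therefore rejected as well.
fn validate_lambda_mu(lambda: f64, mu: f64) -> Result<(), String> {
    if lambda > 0.0 && lambda < mu && mu.is_finite() {
        Ok(())
    } else {
        Err(format!("expected 0 < lambda < mu, got lambda = {lambda}, mu = {mu}"))
    }
}

/// Validates the TKF92 fragment parameter, assuming 0 < r < 1.
fn validate_r(r: f64) -> Result<(), String> {
    if r > 0.0 && r < 1.0 {
        Ok(())
    } else {
        Err(format!("expected 0 < r < 1, got r = {r}"))
    }
}
```

Keeping the two checks separate lets a TKF91 model, which has no r, reuse `validate_lambda_mu` on its own.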

separated tkf into multiple files/modules

5f7451f

reworked intermediate value caching

a4dced0

- use bit flags instead of option
- cache some r values in tkf92 model struct

split tests into multiple fns

006962e

param range test

233d7a1

junniest reviewed Nov 11, 2025

before removing caching

3eac852

MattesMrzik mentioned this pull request Nov 12, 2025

Rerooting Tree #47

Open

MattesMrzik added 4 commits November 13, 2025 10:28

implemented requested changes

817efe3

- renamed vars
- added comments
- simplified caching (it's just all precomputed now)
- fns now either get or set stuff, not both at the same time (see functional programming)

moved tkf indel stuff to separate file

c2e7b59

more polish

11f6393

fixed borders of param range in tkf and tests

59bbf38

MattesMrzik requested a review from junniest November 14, 2025 14:37

junniest approved these changes Nov 17, 2025

MattesMrzik added 2 commits November 18, 2025 09:23

implemented requested changes

be6f230

- split template indel subst param test template into two - changed validate_r - bitset for previous ecent deletion

Merge remote-tracking branch 'upstream/develop' into 58-TKF92

53f16b3

MattesMrzik merged commit 792e96d into acg-team:develop Nov 18, 2025
7 checks passed
