Skip to content

Conversation

@MattesMrzik
Copy link
Collaborator

@MattesMrzik MattesMrzik commented Sep 17, 2025

Implementation of the TKF92 cost for given tree and Multiple Ancestral Sequence Alignment (MASA), ie an alignment that includes ancestral wildcard sequences.

Closes #58. Also implements TKF91 (since this is a special case of TKF92 when the parameter r = 0)

The functionality of fixing/re-estimating ancestral sequences after a tree move (see #115) will be addressed in another PR, which will include the impl of TreeSearchCost

@MattesMrzik MattesMrzik self-assigned this Sep 23, 2025
- the msa that caused the error was that internal nodes had chars but
  the leaves not. this caused the x to be 1 during reassignment since it
  removed the ancestral chars, which is correct, but then intergrating
  is wrong, since it integrates over nothing (no chars are there in the
  whole msa column). added an assert to check for x != 1
- can run find_brute_force_max in parallel and in series, both with
  progress bar
why wasnt this noticed earlier?
@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 97.92000% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.26%. Comparing base (527040d) to head (53f16b3).
⚠️ Report is 21 commits behind head on develop.

Files with missing lines Patch % Lines
phylo/src/alignment/mod.rs 33.33% 6 Missing ⚠️
phylo/src/tkf_model/tkf91.rs 96.72% 4 Missing ⚠️
phylo/src/tkf_model/tkf_indel.rs 99.26% 2 Missing ⚠️
phylo/src/tkf_model/tkf92.rs 99.42% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #99      +/-   ##
===========================================
+ Coverage    96.32%   97.26%   +0.94%     
===========================================
  Files           32       45      +13     
  Lines         4355     5671    +1316     
===========================================
+ Hits          4195     5516    +1321     
+ Misses         160      155       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MattesMrzik MattesMrzik marked this pull request as ready for review October 20, 2025 17:22
- clean up
- model opti also for tkf91
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for TKF (Thorne-Kishino-Felsenstein) evolutionary models, specifically TKF91 and TKF92, which are models for sequence evolution with insertions and deletions. The implementation includes model building, likelihood calculations, parameter optimization, and integration with the existing topology optimization framework.

Key changes:

  • Implementation of TKF91 and TKF92 indel models with likelihood computation
  • Integration with substitution models to create combined TKF+substitution costs
  • Support for model parameter optimization and tree topology optimization (NNI)

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
phylo/src/tkf_model/mod.rs Core implementation of TKF91/TKF92 models, builders, and likelihood calculations
phylo/src/tkf_model/tests.rs Comprehensive test suite covering model functionality, likelihood calculations, and parameter validation
phylo/src/lib.rs Module export for tkf_model
phylo/src/alignment/mod.rs Added methods to support ancestral alignment updates needed by TKF models
phylo/src/optimisers/topo_optimiser.rs Added compatibility traits for TKF models with NNI optimization
phylo/src/optimisers/spr_optimiser.rs Added comment clarifying optimization logic
phylo/src/optimisers/model_optimiser_tests.rs Added test for TKF model optimization
phylo/src/phylo_info/phyloinfo_builder.rs Enhanced error messages and added logging for tree node ID setting
phylo/src/phylo_info/tests.rs Updated test to match new error message format
phylo/data/tree_multiple.newick Fixed formatting (restored newline)
phylo/benches/helpers.rs Fixed typo in comment
phylo/Cargo.toml Removed extra blank line

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@MattesMrzik MattesMrzik requested a review from Copilot October 29, 2025 18:34

This comment was marked as resolved.

Copy link
Contributor

@junniest junniest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite hefty to review, so I think we'll need another round.
One more thought: I think it might be worth at this point to split the module into at the very least main (mod.rs), tkf91_model and tkf92_model or something along these lines (and tkfindel_model?). It becomes quite difficult to figure out which model is which in a file this long.

- num enum for tkf params
- extracted actions function to separate getting indel x and factor n
- split the validate params into (lambda, mu) and (r)
- some minor renames and comments
- use bit flags instead of option
- cache some r values in tkf92 model struct
Copy link
Contributor

@junniest junniest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one major comment to the design at the moment. You have a lot of methods called get_something which to me indicates that they should only perform a computation and return the result. However, most if not all of these methods also set values in the temporary storage, meaning that they produce side effects.
My gut feeling is that methods with names like get should not produce side effects (set values) and methods with names like set should not return values, because otherwise it is very difficult to track where changes to the internal states are happening.


// TODO: link our paper once it is published. For now see original TKF92 paper: https://doi.org/10.1007/bf00163848
#[derive(Clone, Debug)]
struct TKFIndelModelInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this apply to all TKF-style models? If so, the name is confusing because it seems like it is just meant for the TKF indel model without the substitution models, rather than all of them.

Copy link
Collaborator Author

@MattesMrzik MattesMrzik Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this model info only cares about the indel process, i feel like the name is fitting. if we have tkf cost with substitutions, it's just tkf indel cost (with tkf indel model info) and substitution cost (with its own model info). Also, for the TKF with substitutions we simply do

self.indel_cost.cost() + self.subst_cost.cost()

So both the indel cost and subst cost have their own model info

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment:

/// This struct holds intermediate values for the computation of the log likelihood
/// of an ancestral alignment and tree under a TKF indel model, i.e., without substitutions.
/// The intermediate values are needed for re-alignment.

@MattesMrzik MattesMrzik mentioned this pull request Nov 12, 2025
- renamed vars
- added comments
- simplified caching (its just all precomputed now)
- fns no either get or set stuff, not both at the same time (see
  functional programming)
@MattesMrzik MattesMrzik requested a review from junniest November 14, 2025 14:37
Copy link
Contributor

@junniest junniest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some tiny nitpicks, but this looks great now! Well done! It is really well-decoupled now and it's a lot easier (at least for me) to understand what's going on.

if let Some(anc_map) = self.ancestral_maps.get_mut(node_idx) {
*anc_map = map;
} else {
panic!("NodeIdx {node_idx} is not an internal node");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering -- should this really panic? Or could you print a warning/error instead.
The reason why I'm asking is whether this would be something that breaks the reconstruction or could it just be safely ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like panic is fitting since if this method is called, there is something wrong with the logic of the calling code. When people use update_ancestral_map and then have code where a leaf is passed, sth is flawed with their logic and its nice to point that out directly as one might miss the warning in the log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I agree with this explanation!

- split template indel subst param test template into two
- changed validate_r
- bitset for previous ecent deletion
@MattesMrzik MattesMrzik merged commit 792e96d into acg-team:develop Nov 18, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement TKF92 model

2 participants