-
Notifications
You must be signed in to change notification settings - Fork 2
Enhancement/clean up optimisers #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Removed unnecessary RefCells<Cost> from the blen_ and model_optimiser. The RefCell is still needed internally for CostFunction to work when using BrentOpt.
Added max_iters to blen_optimiser and limited the max number of iterations for blen optimisation to 5 during the spr search. The limit is 100 by default which is used at the end of each spr cycle.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #96 +/- ##
===========================================
+ Coverage 96.32% 96.79% +0.46%
===========================================
Files 32 34 +2
Lines 4355 4551 +196
===========================================
+ Hits 4195 4405 +210
+ Misses 160 146 -14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Refactor TreeSearchCost::update_tree(..., tree: Tree, dirty_nodes: &[NodeIdx]) to update_tree(..., tree: Tree) and make cost functions update their internal states based on the dirty nodes set in the tree.
Substitution Model likelihood computation only uses tree.dirty to set internal values when the tree is updated and does not rely on tree.dirty for updating values correctly.
Simplified new empty and incomplete tree creation and removed unnecessary setup for the tree.dirty FixedBitSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the tree optimization infrastructure by replacing Vec<bool> dirty tracking with FixedBitSet and removes unnecessary RefCell wrappers around cost objects for cleaner API design.
- Replaces
Vec<bool>withFixedBitSetfor dirty node tracking in trees - Removes
RefCellwrappers from optimizers and simplifies theTreeSearchCost::update_treeAPI - Adds configurable max iterations to branch length optimization for performance
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| phylo/src/tree/tree_parser.rs | Updates tree initialization to use FixedBitSet for dirty tracking |
| phylo/src/tree/mod.rs | Replaces Vec<bool> dirty field with FixedBitSet and updates related methods |
| phylo/src/tree/tests.rs | Updates test code to use new FixedBitSet API |
| phylo/src/substitution_models/mod.rs | Simplifies update_tree method signature and removes dirty nodes parameter |
| phylo/src/pip_model/mod.rs | Simplifies update_tree method signature and removes dirty nodes parameter |
| phylo/src/parsimony/cost/tests.rs | Updates test calls to use simplified update_tree API |
| phylo/src/parsimony/cost/dollo_parsimony_cost.rs | Simplifies update_tree method and removes dirty nodes parameter |
| phylo/src/parsimony/cost/basic_parsimony_cost.rs | Simplifies update_tree method and removes dirty nodes parameter |
| phylo/src/optimisers/topo_optimiser.rs | Updates to use simplified update_tree API and adds tree dirtying |
| phylo/src/optimisers/spr_optimiser.rs | Removes MoveCostInfo dirty nodes tracking and adds max iterations to branch optimizer |
| phylo/src/optimisers/nni_optimiser.rs | Updates to use FixedBitSet dirty tracking and simplified MoveCostInfo |
| phylo/src/optimisers/move_optimiser.rs | Removes dirty nodes tracking from MoveCostInfo struct |
| phylo/src/optimisers/model_optimiser.rs | Removes RefCell wrapper from cost object and adds NaN/negative value handling |
| phylo/src/optimisers/blen_optimiser.rs | Removes RefCell wrapper, adds max iterations support, and NaN/negative value handling |
| phylo/src/likelihood/mod.rs | Simplifies TreeSearchCost::update_tree trait method signature |
| phylo/src/alignment/mod.rs | Fixes logical operator from bitwise OR to logical OR |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
kalxed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has probably been already thoroughly tested and checked enough but based on what I saw, you moved from vector of booleans to fixed bit set, which is nice change, and reworked how functions work with dirty nodes so that there isn't redundant data being passed around and for the most part it is stored in this bit set for all tree optimisers. Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this basically so that dirty nodes are not redundantly kept track of within the struct and passed around in functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. The other issue I noticed is that after an SPR move the dirty nodes set in the tree were actually different to the nodes being passed to the update_tree() function. I did not notice that before, but the nodes that were passed weren't correct, it was the regraft and the prune branch, whereas after an SPR move the prune sibling and the new prune parent were the ones that needed to be cleaned up. They could be recalculated, but it seemed unnecessary to do twice, since the dirty nodes are set correctly during the move. With the way it's implemented now there's only one source of information for this and fewer options for errors.
MattesMrzik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, could you pls add the doc that I mentioned below?
| tree.complete = true; | ||
| tree.compute_postorder(); | ||
| tree.compute_preorder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we perhaps instead want to have tree.complete be set by Tree::new()? And perhaps also have the compute_postorder() and compute_preorder() be called also by Tree::new()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, new() doesn't actually make a complete tree, unless the tree only has a single node. It creates a "template" with the right amounts of memory allocated and that template should be filled out by the tree builder. The post- and pre-order traversals can't be generated until all the nodes are properly added.
I don't like the boolean var complete, and would love to get rid of it, and make sure that the orders are properly created (there's no point in time when the tree is accessible but in this weird incomplete stage). However, at this point I am waiting for Kai to merge his PR so that I don't modify too much and make it difficult for him to merge. Once that is merged I will definitely clean this up more!
Refactored the branch and model optimisation to remove unnecessary
RefCellwrappers around cost objects.Added max_iters to branch length optimisation to speed up long full tree optimisations.