Initial numba module #3225

benjeffery · 2025-06-18T11:26:32Z

Part of #3135

benjeffery · 2025-06-18T11:28:52Z

I've done quite a bit of numba investigation and found a way to use dataclasses in numba code. This seems to come at very little performance cost compared to tuples and is a lot nicer. Using a generator also seems to work fine!

codecov · 2025-06-18T11:32:13Z

Codecov Report

❌ Patch coverage is 0.81301% with 244 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.31%. Comparing base (ce09b35) to head (0e9ecb4).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
python/tskit/jit/numba.py	0.00%	244 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3225      +/-   ##
==========================================
- Coverage   89.61%   89.31%   -0.30%     
==========================================
  Files          28       27       -1     
  Lines       31983    30834    -1149     
  Branches     5888     5599     -289     
==========================================
- Hits        28660    27540    -1120     
- Misses       1888     1985      +97     
+ Partials     1435     1309     -126

Flag	Coverage Δ
c-tests	`86.59% <ø> (ø)`
lwt-tests	`?`
python-c-tests	`88.15% <ø> (ø)`
python-tests	`98.79% <100.00%> (-0.02%)`	⬇️
python-tests-numpy1	`50.78% <0.00%> (-1.66%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
python/tskit/drawing.py	`98.39% <100.00%> (ø)`
python/tskit/jit/numba.py	`0.00% <0.00%> (ø)`

... and 3 files with indirect coverage changes


benjeffery · 2025-06-18T13:05:01Z

Having some CI weirdness that I'm not yet able to recreate.

benjeffery · 2025-06-18T23:05:34Z

CI Fixed.

Here's some benchmarking with the "coalescent_nodes" method from #2778 on a TS with 12M edges:

Using ts.edge_diffs: 23.3s
Calculating edge diffs and coalescent nodes in a single numba.njit function: 0.085s
Using the classes here, calculating coalescent nodes in separate client numba.njit function: 0.093s

jeromekelleher · 2025-06-19T12:14:07Z

Shall we move the first commit into its own PR? It's cluttering up this one and making it hard to see the real changes.

jeromekelleher · 2025-06-19T12:19:56Z

I had imagined something lower level that was basically a copy of the TreePosition class from here: https://github.com/jeromekelleher/sc2ts/blob/7758245c3dc537aeec3b7cd6282241b65f8843dd/sc2ts/jit.py#L107

So, we don't try to provide Pythonic APIs, but just provide direct access to the edges out and edges in, which can be numba compiled like the example in the sc2ts code.

benjeffery · 2025-06-19T12:39:12Z

just provide direct access to the edges out and edges in

That's how this code works. Your sc2ts code here:

    while tree_pos.next():
        for j in range(tree_pos.out_range[0], tree_pos.out_range[1]):
            e = tree_pos.edge_removal_order[j]
            c = edges_child[e]
            p = edges_parent[e]
            parent[c] = -1
            u = p
            while u != -1:
                num_samples[u] -= num_samples[c]
                u = parent[u]

becomes

    for tree_pos in numba_ts.edge_diffs():
        for j in range(*tree_pos.edges_out_index_range):
            e = numba_ts.indexes_edge_removal_order[j]
            c = edges_child[e]
            p = edges_parent[e]
            parent[c] = -1
            u = p
            while u != -1:
                num_samples[u] -= num_samples[c]
                u = parent[u]

It is still compiled, and 30% faster (for the coalescent nodes example)!
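The loop body that both versions share can be tried on a toy tree. This is a minimal plain-Python sketch, not the PR's code: the function name `remove_edges` and the five-node example tree are made up for illustration, and no numba compilation is involved.

```python
def remove_edges(out_edges, edges_parent, edges_child, parent, num_samples):
    """Detach each outgoing edge and subtract the child's sample count
    from every ancestor, as in the loops quoted above."""
    for e in out_edges:
        c = edges_child[e]
        p = edges_parent[e]
        parent[c] = -1
        u = p
        while u != -1:
            num_samples[u] -= num_samples[c]
            u = parent[u]


# Hypothetical tree: samples 0, 1, 2; node 3 is the parent of 0 and 1;
# node 4 (the root) is the parent of 3 and 2.
edges_parent = [3, 3, 4, 4]
edges_child = [0, 1, 3, 2]
parent = [3, 3, 4, 4, -1]
num_samples = [1, 1, 1, 2, 3]

# Remove edge 0 (3 -> 0): nodes 3 and 4 each lose one sample.
remove_edges([0], edges_parent, edges_child, parent, num_samples)
```

After the call, node 0 is detached and the counts on its former ancestors have each dropped by one.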

jeromekelleher · 2025-06-19T12:51:36Z

Ahh, I didn't spot that sorry. How is it faster then?

I do think we should just stick with the TreePosition interface though, because we want to support seeking backwards as well, and ultimately randomly. There's no point in adding a layer of indirection on top of that.

benjeffery · 2025-06-19T13:56:49Z

How is it faster then?

Mutating numpy arrays to maintain the state involves the following:

Creating a temporary list (build_list).
Performing bounds checks for the slice.
Copying the data from the list into the array's memory.

Whereas yielding lightweight immutable objects is much more amenable to numba optimisation. We might be able to get the same gains by using native objects for the state rather than numpy arrays if you are set against iteration.
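To make "yielding lightweight immutable objects" concrete, here is a rough plain-Python sketch of a generator that yields one immutable record per tree, holding only index ranges into the edge insertion and removal orders. It is an illustration under assumptions, not the PR's implementation: the `EdgeDiff` type, the `edges_in_index_range` field name, the toy edges, and the simplified sort orders (by left and right coordinate only) are all invented here, and the sketch says nothing about numba performance.

```python
from typing import Iterator, NamedTuple, Sequence, Tuple


class EdgeDiff(NamedTuple):
    """Immutable per-tree record: interval plus index ranges."""
    left: float
    right: float
    edges_out_index_range: Tuple[int, int]
    edges_in_index_range: Tuple[int, int]


def edge_diffs(
    edges_left: Sequence[float],
    edges_right: Sequence[float],
    insertion_order: Sequence[int],
    removal_order: Sequence[int],
    sequence_length: float,
) -> Iterator[EdgeDiff]:
    num_edges = len(edges_left)
    j = 0  # position in insertion_order
    k = 0  # position in removal_order
    left = 0.0
    while j < num_edges or left < sequence_length:
        out_start = k
        while k < num_edges and edges_right[removal_order[k]] == left:
            k += 1
        in_start = j
        while j < num_edges and edges_left[insertion_order[j]] == left:
            j += 1
        # The next breakpoint is the nearest pending edge start or end.
        right = sequence_length
        if j < num_edges:
            right = min(right, edges_left[insertion_order[j]])
        if k < num_edges:
            right = min(right, edges_right[removal_order[k]])
        yield EdgeDiff(left, right, (out_start, k), (in_start, j))
        left = right


# Toy data: three edges on a sequence of length 10.
edges_left = [0.0, 0.0, 5.0]
edges_right = [10.0, 5.0, 10.0]
# Simplified orders (assumption): sort by coordinate only.
insertion_order = sorted(range(3), key=lambda e: edges_left[e])
removal_order = sorted(range(3), key=lambda e: edges_right[e])

diffs = list(edge_diffs(edges_left, edges_right, insertion_order, removal_order, 10.0))
```

Each yielded record is a small tuple of scalars, so the consumer reads ranges such as `diff.edges_out_index_range` without any shared array being mutated between trees.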

jeromekelleher · 2025-06-19T13:59:41Z

Let's talk it through in person - I don't have time to form an educated opinion I'm afraid.

benjeffery · 2025-06-27T15:30:17Z

I've tried to closely match the existing tsutil implementation with next and prev. Need to do some perf checks with this new code under numba.

benjeffery · 2025-07-01T09:30:07Z

New code looks just as fast, proceeding to add some more tests. Will merge this then before doing docs.

benjeffery · 2025-07-03T08:27:54Z

Getting some weird failures on Windows here, and coverage not counting for the new module, will fix.

I've added a stab at some docs.

hyanwong · 2025-07-03T08:35:43Z

Re docs, eventually we probably want a "high performance" tutorial with some of this stuff, but I can have a stab at that after 1.0. There's some comments here: tskit-dev/tutorials#150 (comment) and some code examples at tskit-dev/tutorials#63

hyanwong · 2025-07-03T08:43:33Z

docs/numba.md

+print(type(numba_ts))
+```
+
+## Tree Traversal


I normally think of "tree traversal" as iterating through the tree structure itself. Do you mean "Iterating through trees" here? I can't see any pre/postorder traversal code here.

Good point, I was avoiding the word "iteration" to not confuse it with a Python iterator - but I'll change it back as this is more confusing!

Maybe just "moving between trees"?

python/tskit/jit/numba.py

jeromekelleher

Looks great. Not obvious to me why we're setting up the tests like this though.

jeromekelleher · 2025-08-07T08:16:12Z

python/tests/test_jit.py

+
+    NODE_IS_SAMPLE = tskit.NODE_IS_SAMPLE
+
+    @numba.njit


Why are we importing the jit module within the test functions here and defining the algorithm? I think we can assume that developers have numba installed, and there are pytest ways of skipping the module for CI?

Further up the test file we test importing the tskit.jit.numba module while it is mocked out, which needs the module to not be imported. I could try and find another way to do that which doesn't require all the local imports?

Maybe do that test in its own module so? The local import stuff will confuse people into thinking it's necessary and the LLMs might also get this idea from the example code.

I've fixed this by making the imported module look absent in the test that requires it to be, and have removed the function-local imports.
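One standard-library way to make an already-imported module look absent is to temporarily map its name to `None` in `sys.modules`, which makes the import system raise `ModuleNotFoundError`. The sketch below is only an illustration of that technique, not necessarily what the PR's test does; the helper names are hypothetical, and `colorsys` stands in for the optional dependency.

```python
import importlib
import sys
from unittest import mock


def is_importable(name):
    """Return True if `import name` currently succeeds."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def is_importable_when_hidden(name):
    """Check importability while the module is masked in sys.modules."""
    # A None entry in sys.modules makes the import raise ModuleNotFoundError;
    # patch.dict restores the original entry when the block exits.
    with mock.patch.dict(sys.modules, {name: None}):
        return is_importable(name)


before = is_importable("colorsys")
hidden = is_importable_when_hidden("colorsys")
after = is_importable("colorsys")
```

Because the masking is scoped to the `with` block, other tests in the same module can keep their ordinary top-level imports.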

benjeffery force-pushed the numba branch from d22a9a1 to abf80c9 Compare June 18, 2025 12:29

benjeffery force-pushed the numba branch from df5b60e to 407eef1 Compare June 18, 2025 13:46

jeromekelleher mentioned this pull request Jun 19, 2025

Remove tsk_diff_iter_t #3221

Merged

benjeffery force-pushed the numba branch 2 times, most recently from 7aa7151 to 5d22c6c Compare June 27, 2025 15:22

benjeffery force-pushed the numba branch from 501260c to 89d4513 Compare July 2, 2025 10:19

hyanwong reviewed Jul 3, 2025


jeromekelleher reviewed Aug 4, 2025


jeromekelleher reviewed Aug 7, 2025


benjeffery added 7 commits August 7, 2025 12:06

Initial numba edge diffs

89e8f23

Fix CI, add extra test

95c4ea8

Refactor to remove iteration

b00bc38

Remove dataclass, add ts properties

9fc344a

Use tsutil style next and prev

055ebc3

More tests

3618765

Attempt to get coverage

a64b845

benjeffery added 8 commits August 7, 2025 12:06

Docs first pass

47875ac

Remove mentions of traversal

91fa299

Fix docs build

e030d5e

WIP

1198371

WIP

1a95d6f

WIP

8f9a7c3

Diversity working

da7ca6c

Move diversity to numba class

b0fa5b1

benjeffery force-pushed the numba branch from 56b1761 to 1fe85a7 Compare August 7, 2025 11:07

benjeffery mentioned this pull request Aug 7, 2025

Testing for numpy errors #3250

Closed

Fix windows errors

077823a

benjeffery force-pushed the numba branch from ae496ae to 077823a Compare August 7, 2025 14:09

benjeffery added 2 commits August 8, 2025 01:33

Fix docs

32e5073

Fix import test

0e9ecb4

benjeffery force-pushed the numba branch from 9367893 to 0e9ecb4 Compare August 8, 2025 00:54
