
Conversation

hahnjo
Member

@hahnjo hahnjo commented Aug 11, 2025

Switch the existing code to use the RNTupleParallelWriter with one RNTupleFillContext per slot. For sequential snapshotting, this should be (almost) as efficient as the RNTupleWriter (the only overhead is one additional cloned RNTupleModel for the single fill context), but it saves quite a bit of code duplication and testing effort.


github-actions bot commented Aug 11, 2025

Test Results

21 files, 21 suites — 3d 5h 51m 4s ⏱️
3 363 tests: 3 352 ✅, 0 💤, 11 ❌
68 855 runs: 68 833 ✅, 0 💤, 22 ❌

For more details on these failures, see this check.

Results for commit a0e8be9.

♻️ This comment has been updated with latest results.

@hahnjo hahnjo force-pushed the df-ntuple-snapshot-mt branch from ced6d97 to 8a09b36 Compare August 11, 2025 13:50
@pcanal
Member

pcanal commented Aug 11, 2025

(one additional cloned RNTupleModel for the only fill context)

I am confused by the wording of this part of the commit log. "only" suggests there is a single fill context for the whole process, which is contradicted by "one RNTupleFillContext per slot". What is meant by the above sentence?

for (decltype(values.size()) i = 0; i < values.size(); i++) {
outputEntry->BindRawPtr(fFieldTokens[i], values[i]);
}
fillContext->Fill(*outputEntry);
Member


Isn't it meant to be FillNoFlush?

Member Author


It's not needed here because UntypedSnapshotRNTupleHelper exclusively owns the TFile and there is only one RNTupleParallelWriter appending to it. Without the need to lock on the user / "framework" side, there is no benefit to using FillNoFlush (except it's longer and more code to write).

Member


Suggested change
fillContext->Fill(*outputEntry);
// Any synchronization needed is handled by the underlying `RNTupleParallelWriter`,
// which has exclusive access to its `TFile`.
fillContext->Fill(*outputEntry);

@hahnjo
Member Author

hahnjo commented Aug 12, 2025

(one additional cloned RNTupleModel for the only fill context)

I am confused by the wording of this part of the commit log. only suggest there is a single fill context for the whole process ... which is contradicted by one fill context per slot. What is means by the above sentence?

Agreed, it's not well formulated. What I'm trying to say is that for sequential snapshotting (that is already supported before the PR), changing from the RNTupleWriter to the RNTupleParallelWriter has only a slight overhead. I will revise the commit message.

@hahnjo hahnjo marked this pull request as ready for review August 12, 2025 06:43
@hahnjo hahnjo requested a review from jblomer August 12, 2025 06:44
Contributor

@enirolf enirolf left a comment


Really really nice! I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

hahnjo added 7 commits August 12, 2025 15:11
... instead of the default entry. Then we also only need a bare model.
This is less expensive than string comparisons of field names during
every call to Exec().
Switch the existing code to use the RNTupleParallelWriter with one
RNTupleFillContext per slot. For sequential snapshotting, this
should be (almost) as efficient as the RNTupleWriter (the only
overhead is one additional cloned RNTupleModel for the single fill
context), but it saves quite a bit of code duplication and testing
effort.
Use the same conditions as TTree, looking at fOutputFile instead of
the data source.
@hahnjo hahnjo force-pushed the df-ntuple-snapshot-mt branch from 8a09b36 to a0e8be9 Compare August 12, 2025 13:20
@hahnjo
Member Author

hahnjo commented Aug 12, 2025

I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

Yes, I wasn't sure either. We're of course massively benefiting from the fact that we use the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling: even the best test case will have to deal with non-determinism, and it may still not exercise the relevant code path because all events end up on a single thread...

@enirolf
Contributor

enirolf commented Aug 12, 2025

I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

Yes, I wasn't sure either. We're of course massively benefiting from the fact that we use the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling: even the best test case will have to deal with non-determinism, and it may still not exercise the relevant code path because all events end up on a single thread...

That's fair, not to forget about the fact that the behavior of the parallel writer is already tested in isolation. I'm okay with leaving this like this then!

Contributor

@enirolf enirolf left a comment


LGTM! Would probably be good to get a second approval from someone else as well :)

Contributor

@jblomer jblomer left a comment


Nice! Looks good to me but I'll leave approval to RDF owners.

Member

@vepadulano vepadulano left a comment


LGTM, thanks a lot! I have one question left, does not need to be addressed in this PR specifically.

{
// In principle we would not need to flush a cluster here, but we want to benefit from parallelism for compression.
// NB: RNTupleFillContext::FlushCluster() is a nop if there is no new entry since the last flush.
fFillContexts[slot]->FlushCluster();
Member


With Snapshot to TTree, users can specify a value (expressed in number of entries) for the size of the output TTree clusters via fAutoFlush. I wonder how this line impacts that feature, which at the moment is not supported for Snapshot to RNTuple, but I imagine it will be requested at some point.
