UDF checkpoints for aggregator #1470

ilongin · 2025-11-20T03:18:43Z

Implementing UDF checkpoints for aggregator.

codecov · 2025-11-20T03:26:41Z

Codecov Report

❌ Patch coverage is 92.30769% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/datachain/lib/udf.py	75.00%	0 Missing and 2 partials ⚠️
src/datachain/query/dataset.py	95.00%	2 Missing ⚠️

dreadatour · 2025-11-20T15:11:07Z

src/datachain/lib/udf.py

        self.setup()

+        # Check if partition_id is available (when partition_by is used)
+        partition_id_idx = None


Suggested change

-        partition_id_idx = None
+        partition_id_idx: int | None = None
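
The annotation is useful because the index starts as `None` and only becomes an `int` when the partition column is present. A minimal sketch of that pattern follows. The helper `find_partition_idx`, the `columns` list, and the placeholder value of `PARTITION_COLUMN_ID` are assumptions for illustration and are not taken from the PR:

```python
from __future__ import annotations

# Placeholder value for illustration only; the real constant is imported
# from datachain.data_storage.schema and may have a different value.
PARTITION_COLUMN_ID = "partition_id"


def find_partition_idx(columns: list[str]) -> int | None:
    """Return the position of the partition column, or None if it is absent."""
    partition_id_idx: int | None = None
    if PARTITION_COLUMN_ID in columns:
        partition_id_idx = columns.index(PARTITION_COLUMN_ID)
    return partition_id_idx
```

With the explicit `int | None` annotation, a type checker accepts the later assignment of an integer and flags any use of the index that forgets the `None` case.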

dreadatour · 2025-11-20T15:12:12Z

src/datachain/lib/udf.py

-            )
+            # Include sys__input_id to track which partition produced each output
+            output = [
+                {"sys__input_id": input_id}


Do we want to define sys__input_id as a constant, the same way we did for PARTITION_COLUMN_ID (from datachain.data_storage.schema import PARTITION_COLUMN_ID)?
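
One way the suggestion could look, as a rough sketch only: the constant name `INPUT_ID_COLUMN` and the helper `tag_outputs` are hypothetical and do not appear in the PR, which only shows the literal `"sys__input_id"` key.

```python
# Hypothetical constant name; the review only asks whether one should exist.
INPUT_ID_COLUMN = "sys__input_id"


def tag_outputs(input_id, results):
    """Attach the id of the producing input (partition) to every output row."""
    return [{INPUT_ID_COLUMN: input_id, **row} for row in results]
```

Keeping the column name in one place means the code that writes outputs and the code that later reads them back for checkpoint recovery cannot drift apart on spelling.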

src/datachain/query/dataset.py

+        Create table with partition mappings (sys__id -> partition_id).
+
+        Args:
+            query: Input query with sys__id column
+            table_name: Name for the partition table.
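
To illustrate the mapping this docstring describes, here is a simplified in-memory sketch. In the PR the mapping is a database table built from a query. The function `build_partition_map`, its parameters, and the use of plain dicts are assumptions made for illustration:

```python
def build_partition_map(rows, partition_by):
    """Map each row's sys__id to a partition id shared by rows with equal keys.

    Partition ids are assigned in order of first appearance, starting at 1.
    """
    partition_ids = {}  # partition key tuple -> partition id
    mapping = {}  # sys__id -> partition id
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        # A new key gets the next id; a known key reuses its existing id.
        pid = partition_ids.setdefault(key, len(partition_ids) + 1)
        mapping[row["sys__id"]] = pid
    return mapping
```

Rows that share the same values in the `partition_by` columns end up with the same partition id, which is what lets each output be traced back to the partition that produced it.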


ilongin added 2 commits November 6, 2025 16:14

implementing aggregator

6056529

fixing udf checkpoints agg

03fbc97

ilongin marked this pull request as draft November 20, 2025 03:18

ilongin linked an issue Nov 20, 2025 that may be closed by this pull request

Implement UDF checkpoints for aggregator #1469

Open

ilongin added 4 commits November 20, 2025 10:54

refactoring

326f102

refactoring

723b1ce

added another test

acf5811

refactoring

096376b

ilongin marked this pull request as ready for review November 20, 2025 14:37

ilongin requested review from amritghimire, dreadatour and shcheklein November 20, 2025 14:37

ilongin mentioned this pull request Nov 20, 2025

UDF Checkpoints #1422

Open

dreadatour reviewed Nov 20, 2025

src/datachain/query/dataset.py

Comment on lines +645 to +649

Create table with partition mappings (sys__id -> partition_id).

Args:
    query: Input query with sys__id column
    table_name: Name for the partition table.

This comment was marked as off-topic.

ilongin added 3 commits November 20, 2025 16:19

removing redundant if clause

8ee9326

Merge branch 'ilongin/1392-udf-checkpoints' into ilongin/1469-agg-udf…

0712161

…-checkpoints

Merge branch 'ilongin/1392-udf-checkpoints' into ilongin/1469-agg-udf…

16c3b74

…-checkpoints
