[Append Scan] Extract manifest group planning into separate class #2232

smaheshwar-pltr · 2025-07-22T12:52:04Z

Rationale for this change

Split out from the incremental append scan work (see #2031 (comment)). Unlike Spark, PyIceberg does not yet support incrementally reading the data appended between two snapshots.

This PR introduces a ManifestGroupPlanner that holds the logic for planning a table scan's files from manifests, so that the logic can be reused across scans. The IncrementalAppendScan will also use it (see #2234).
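To illustrate the shape of the refactor, here is a minimal sketch of two scans sharing one planner. It is not the code in this PR: `Manifest`, `ToyDataScan`, `ToyIncrementalAppendScan` and the `file_filter` parameter are simplified stand-ins invented for the example, and the real planner works on manifest entries, partition specs and expressions rather than plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Manifest:
    """Toy stand-in for a manifest: a path plus the data files it lists."""

    path: str
    data_files: Tuple[str, ...]


class ManifestGroupPlanner:
    """Holds the manifest-to-files planning logic so several scans can share it."""

    def __init__(self, file_filter: Callable[[str], bool]) -> None:
        # Hypothetical filter; stands in for partition and metrics evaluation.
        self._file_filter = file_filter

    def plan_files(self, manifests: Iterable[Manifest]) -> List[str]:
        return [f for m in manifests for f in m.data_files if self._file_filter(f)]


class ToyDataScan:
    """Plans over every manifest of a single snapshot."""

    def __init__(self, manifests: List[Manifest], file_filter: Callable[[str], bool] = lambda f: True) -> None:
        self._manifests = manifests
        self._planner = ManifestGroupPlanner(file_filter)

    def plan_files(self) -> List[str]:
        return self._planner.plan_files(self._manifests)


class ToyIncrementalAppendScan:
    """Plans only over manifests that belong to the selected append snapshots."""

    def __init__(
        self,
        manifests_by_snapshot: Dict[int, List[Manifest]],
        append_snapshot_ids: List[int],
        file_filter: Callable[[str], bool] = lambda f: True,
    ) -> None:
        self._manifests = [m for sid in append_snapshot_ids for m in manifests_by_snapshot[sid]]
        self._planner = ManifestGroupPlanner(file_filter)

    def plan_files(self) -> List[str]:
        # Same planner, different set of input manifests.
        return self._planner.plan_files(self._manifests)
```

In this sketch the two scans differ only in which manifests they hand to the planner, which is the reuse the extraction is meant to enable.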

Are these changes tested?

N / A

Are there any user-facing changes?

Yes, see #2232 (comment).

smaheshwar-pltr · 2025-07-22T13:12:31Z

pyiceberg/table/__init__.py

+    @property
+    def partition_filters(self) -> KeyDefaultDict[int, BooleanExpression]:
+        return self._manifest_planner.partition_filters


Keeping this public property so that DataScan does not get a breaking change. The private methods, on the other hand, have been moved into ManifestGroupPlanner. Technically, that could still break users who subclass DataScan and call the removed methods from the subclass (and users who access those private methods directly, but breaking them feels more acceptable than breaking the subclassing case).

I'm not familiar with PyIceberg's policy on breaking changes and deprecations. Would it be fine to remove these private methods outright, or is a deprecation cycle still required?
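If a deprecation cycle does turn out to be required, one option is to leave thin shims on the scan that warn and delegate to the planner. The sketch below uses only the standard-library `warnings` module; `ToyPlanner`, `ToyDataScan`, `plan_for` and `_legacy_helper` are hypothetical names, not the methods removed in this PR, and PyIceberg may have its own deprecation helper that would be preferable.

```python
import warnings


class ToyPlanner:
    """Hypothetical new home of the logic."""

    def plan_for(self, spec_id: int) -> str:
        return f"planned-{spec_id}"


class ToyDataScan:
    def __init__(self) -> None:
        self._planner = ToyPlanner()

    def _legacy_helper(self, spec_id: int) -> str:
        """Hypothetical private method kept only for subclasses that still call it."""
        warnings.warn(
            "_legacy_helper is deprecated; use the planner's plan_for instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this shim
        )
        return self._planner.plan_for(spec_id)
```

Subclasses that call the old name keep working for the length of the cycle and see a warning that points at their own call site.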

smaheshwar-pltr · 2025-07-22T13:15:21Z

pyiceberg/table/__init__.py

@@ -2075,6 +1957,160 @@ def count(self) -> int:
        return res


+class ManifestGroupPlanner:


The motivation for a manifest-based file scan task planner comes from the Java side, where BaseIncrementalAppendScan plans its files this way: https://github.com/apache/iceberg/blob/1911c94ea605a3d3f10a1994b046f00a5e9fdceb/core/src/main/java/org/apache/iceberg/BaseIncrementalAppendScan.java#L76-L97 (class here).

It also lets scans such as DataScan and IncrementalAppendScan, which both follow this flow, share the logic.

smaheshwar-pltr · 2025-07-22T16:22:11Z

pyiceberg/table/__init__.py

-                case_sensitive=self.case_sensitive,
-                schema=self.table_metadata.schema(),
-            )
+    def _manifest_planner(self) -> ManifestGroupPlanner:


This could also be a field on the class, set in the constructor. I kept the diff smaller here, but I'm happy to change it.
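For comparison, the two options could look roughly like this. The sketch assumes the planner is exposed as a lazily computed property (the decorator is not visible in the hunk above); `ToyPlanner`, `LazyScan`, `EagerScan` and `row_filter` are illustrative names only.

```python
from functools import cached_property


class ToyPlanner:
    """Hypothetical planner that counts how many times it is constructed."""

    instances_created = 0

    def __init__(self, row_filter: str) -> None:
        ToyPlanner.instances_created += 1
        self.row_filter = row_filter


class LazyScan:
    """Option 1: build the planner on first access and cache it on the instance."""

    def __init__(self, row_filter: str) -> None:
        self.row_filter = row_filter

    @cached_property
    def _manifest_planner(self) -> ToyPlanner:
        return ToyPlanner(self.row_filter)


class EagerScan:
    """Option 2: build the planner as a plain field in the constructor."""

    def __init__(self, row_filter: str) -> None:
        self.row_filter = row_filter
        self._manifest_planner = ToyPlanner(row_filter)
```

The lazy variant leaves the constructor untouched and builds nothing until planning starts; the eager variant makes the dependency explicit but constructs the planner even for scans that are never planned.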

Extract manifest group planning into separate class

f6537b9

smaheshwar-pltr mentioned this pull request Jul 22, 2025

[Append Scan] Introduce IncrementalAppendScan class (without integration tests) #2234

Open

smaheshwar-pltr mentioned this pull request Jul 22, 2025

Incremental Append Scan #2031

Closed

smaheshwar-pltr marked this pull request as ready for review July 22, 2025 13:36
