-
Notifications
You must be signed in to change notification settings - Fork 342
[Append Scan] Introduce an AbstractTableScan
with default methods
#2230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Append Scan] Introduce an AbstractTableScan
with default methods
#2230
Conversation
@@ -1630,16 +1630,17 @@ def _parse_row_filter(expr: Union[str, BooleanExpression]) -> BooleanExpression: | |||
return parser.parse(expr) if isinstance(expr, str) else expr | |||
|
|||
|
|||
S = TypeVar("S", bound="TableScan", covariant=True) | |||
A = TypeVar("A", bound="AbstractTableScan", covariant=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub's diff here got messed up - TableScan
and S
remains, just below.
|
||
return type(self)(**{**kwargs, **overrides}) | ||
|
||
def to_pandas(self, **kwargs: Any) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting this change. I've added default implementations based on to_arrow()
in this class, whereas before previously these methods were abstract on TableScan
.
This technically changes the TableScan
class in a user-facing way because it now has default implementations, but this felt fine to me. Subclasses can still override these methods if they wish.
The motivation here was to reduce duplication between DataScan
and IncrementalAppendScan
- by introducing these default implementations in the superclass, all table scans get those defaults for free.
""" | ||
return self.to_arrow().to_pandas(**kwargs) | ||
|
||
def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] = None) -> DuckDBPyConnection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting this also - to_duckdb
and to_ray
were previously only present on DataScan
, not even on TableScan
. With the change here moving them to AbstractTableScan
, TableScan
now has these additional methods. I felt like this was fine. Curious for reviewers' thoughts!
(Again, the motivation here was to reduce duplication)
@@ -1688,29 +1786,6 @@ def projection(self) -> Schema: | |||
|
|||
return current_schema.select(*self.selected_fields, case_sensitive=self.case_sensitive) | |||
|
|||
@abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointing out that these removed methods have just been moved to AbstractTableScan
, so they still exist on TableScan
that subclasses it. The goal of this PR is reorganising while not introducing a breaking change.
S = TypeVar("S", bound="TableScan", covariant=True) | ||
|
||
|
||
class TableScan(AbstractTableScan, ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methods on this class like use_ref
and snapshot
don't translate nicely into incremental scans. I understand that the motivation behind this class was an abstract class for table scans, like Java has, but I felt like a less restrictive superclass is better for us, hence AbstractTableScan
.
Rationale for this change
Split up from incremental append scan work - see #2031 (comment). PyIceberg doesn't support incremental reading of appended data between snapshots, like Spark does.
This PR introduces an
AbstractTableScan
to reduce duplication between table scans and provide a superclass not specific to reads of a single snapshot.Are these changes tested?
N/A
Are there any user-facing changes?
Yes, see #2230 (comment) / #2230 (comment).