[ISSUE-149] Add column projection support to Python LogScanner #151

fresh-borzoni · 2026-01-11T18:44:33Z

Adds two new methods for column projection:

1. Table.new_log_scanner_with_projection(column_indices: List[int])
   Project columns by index (C++ parity)
   Example: scanner = table.new_log_scanner_with_projection([0, 2, 4])

2. Table.new_log_scanner_with_column_names(column_names: List[str])
   Project columns by name (Python-specific, more idiomatic!)
   Example: scanner = table.new_log_scanner_with_column_names(['id', 'name', 'email'])

Both methods create a LogScanner that reads only the specified columns, which improves performance by reducing data transfer and processing overhead.

The implementation leverages the core scanner.project() and scanner.project_by_name() APIs.
Error handling validates the column indices/names before creating the scanner.
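
For readers who want to see the validation step spelled out, here is a minimal pure-Python sketch of the kind of check described above. It is not the code in this PR (the real validation lives in the Rust bindings); the helper names `validate_indices` and `resolve_names` and the three-column schema `["id", "name", "score"]` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def validate_indices(num_columns: int, column_indices: List[int]) -> List[int]:
    """Reject out-of-range indices before any scanner would be created (hypothetical helper)."""
    bad = [i for i in column_indices if not 0 <= i < num_columns]
    if bad:
        raise ValueError(f"Column indices out of range: {bad} (table has {num_columns} columns)")
    return list(column_indices)


def resolve_names(schema_columns: Sequence[str], column_names: List[str]) -> List[int]:
    """Map column names to their positions, rejecting unknown names (hypothetical helper)."""
    positions = {name: i for i, name in enumerate(schema_columns)}
    missing = [name for name in column_names if name not in positions]
    if missing:
        raise ValueError(f"Unknown columns: {missing}; available: {list(schema_columns)}")
    return [positions[name] for name in column_names]


# Assumed schema, for illustration only.
schema = ["id", "name", "score"]
by_index = validate_indices(len(schema), [0, 1])
by_name = resolve_names(schema, ["name", "score"])
print(by_index, by_name)
```

Under this assumed schema, projecting by name reduces to projecting by index once the names are resolved, which is why both entry points can fail early with a clear message instead of failing inside the scanner.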

Closes #149

leekeiabstraction

Thank you for the PR! Left a couple of comments. PTAL

leekeiabstraction · 2026-01-11T19:53:45Z

bindings/python/src/table.rs

+            let rust_scanner = table_scan.create_log_scanner().map_err(|e| {
+                FlussError::new_err(format!("Failed to create log scanner: {e}"))
+            })?;


This file seems to use both FlussError and PyErr; for example, lines 72 to 75 use PyErr. Can you clarify when each should be used?

    let rust_scanner = table_scan.create_log_scanner().map_err(|e| {
        PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(format!(
            "Failed to create log scanner: {e:?}"
        ))
    })?;

We should use FlussError; the PyErr usage is a leftover.

Thank you for catching this 👍

leekeiabstraction · 2026-01-11T19:58:48Z

bindings/python/example/example.py

+        # Project specific columns by index (C++ parity)
+        print("\n1. Projection by index [0, 1] (id, name):")
+        scanner_index = await table.new_log_scanner_with_projection([0, 1])
+        scanner_index.subscribe(None, None)
+        df_projected = scanner_index.to_pandas()
+        print(df_projected.head())
+        print(f"   Projected {df_projected.shape[1]} columns: {list(df_projected.columns)}")
+
+        # Project specific columns by name (Python-specific, more idiomatic!)
+        print("\n2. Projection by name ['name', 'score'] (Pythonic):")
+        scanner_names = await table.new_log_scanner_with_column_names(["name", "score"])
+        scanner_names.subscribe(None, None)
+        df_named = scanner_names.to_pandas()
+        print(df_named.head())
+        print(f"   Projected {df_named.shape[1]} columns: {list(df_named.columns)}")


Should the polling part also be included (as with C++ example)?

I'm not sure I get what you mean.

to_pandas() polls internally. If you mean adding a separate polling API to the Python bindings, we can add it.

Let's file an issue for it, as it's orthogonal to the column projection feature.

Created issue #152.

Ah, I didn't realise that to_pandas() polls. Thank you for the clarification.

fresh-borzoni · 2026-01-11T22:29:35Z

@leekeiabstraction Thank you for the review.

Addressed the comments. PTAL 🙏

[ISSUE-149] Add column projection support to Python LogScanner

6d190c1

leekeiabstraction reviewed Jan 11, 2026

fix PyError to FlussError

5f128af

fresh-borzoni requested a review from leekeiabstraction January 11, 2026 22:29

leekeiabstraction approved these changes Jan 11, 2026
