feat(r/sedonadb): add CRS printing for sedonadb_dataframe #475
base: main
Co-authored-by: Copilot <[email protected]>
Pull request overview
This pull request adds CRS (Coordinate Reference System) printing functionality to the sedonadb_dataframe print method. When printing a dataframe with geometry columns, the CRS information is now displayed below the header, showing the geometry column names along with their CRS identifiers (e.g., "EPSG:5070", "OGC:CRS84").
Key Changes:
- Added Rust function parse_crs_metadata to extract CRS information from GeoArrow metadata
- Enhanced print.sedonadb_dataframe to display geometry column CRS information with width-aware truncation
- Created comprehensive test suite covering various CRS scenarios including EPSG codes, engineering CRS, and edge cases
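As an illustration of what "width-aware truncation" could look like, here is a minimal sketch in Rust. It is not the PR's implementation (the actual logic lives in the R print method): the function name `format_crs_line`, the `# Geometry:` prefix, and the `name (CRS)` layout are all assumptions made for this example.

```rust
/// Hypothetical helper (not part of sedonadb): join "column (CRS)" entries
/// into one line and truncate it so it fits in `width` characters.
fn format_crs_line(columns: &[(&str, &str)], width: usize) -> String {
    // Build the untruncated line, e.g. "# Geometry: geom (EPSG:4326)"
    let full = columns
        .iter()
        .map(|(name, crs)| format!("{name} ({crs})"))
        .collect::<Vec<_>>()
        .join(", ");
    let line = format!("# Geometry: {full}");

    // Count characters rather than bytes so non-ASCII names are handled
    if line.chars().count() <= width {
        return line;
    }

    // Reserve three characters for the ellipsis; for widths below 3 the
    // result is just "..." and is therefore wider than requested
    let keep = width.saturating_sub(3);
    let mut out: String = line.chars().take(keep).collect();
    out.push_str("...");
    out
}
```

Calling `format_crs_line(&[("geom", "EPSG:4326")], 80)` returns the full line, while a narrow width such as 20 yields a 20-character line ending in `...`.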
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| r/sedonadb/src/rust/src/lib.rs | Implements parse_crs_metadata Rust function to parse CRS from GeoArrow JSON metadata |
| r/sedonadb/src/rust/api.h | Adds FFI declaration for the new parse_crs_metadata function |
| r/sedonadb/src/rust/Cargo.toml | Adds serde_json dependency for JSON parsing |
| r/sedonadb/src/init.c | Registers the new parse_crs_metadata C binding |
| r/sedonadb/R/000-wrappers.R | Adds R wrapper for parse_crs_metadata FFI function |
| r/sedonadb/R/crs.R | Introduces sd_parse_crs helper function for parsing CRS metadata |
| r/sedonadb/R/dataframe.R | Enhances print.sedonadb_dataframe to display CRS information for geometry columns with truncation support |
| r/sedonadb/tests/testthat/test-crs.R | Adds comprehensive tests for CRS parsing and display functionality |
paleolimbot
left a comment
Thank you for working on this! I love the column count, CRS, and geometry column information output when printing. I added a few high-level suggestions but with some tweaking we can merge the current approach too. The spirit of all my comments is that I'd love to reuse some of the places where we've already implemented some of this for Rust, Python, or geoarrow/r.
Something I added to the Python bindings but forgot to add here was a bare-bones wrapper around the SedonaDB/DataFusion schema, which provides access to column/type information including the CRS: https://github.com/apache/sedona-db/blob/e0e1d109480727faaf7be25923b57b4686144438/python/sedonadb/src/schema.rs . I added some suggestions inline about how to draw a few ideas from that hopefully without widening the scope of this PR too much 🙂
@paleolimbot I think I addressed all comments, but I might need some help with how to best approach the merge conflict.
paleolimbot
left a comment
Apologies for the merge conflicts...I was trying to make it easier to develop the R package but it definitely conflicted with this PR 😬 . You should be able to git pull and use tools/update-savvy.sh and air format with the package now.
This is looking great! A few things we should solve here I think but I love the improved output and I think this is close!
| expect_snapshot(sedonadb:::sd_parse_crs(meta)) | ||
| }) | ||
| # Tests for CRS display in print.sedonadb_dataframe |
nit: I like your other tests that put this comment inside the test_that() block (I think this is what I did for the tests in some of the other files too)
| test_that("sd_parse_crs handles empty string", { | ||
| expect_snapshot( | ||
| sedonadb:::sd_parse_crs(""), | ||
| error = TRUE | ||
| ) | ||
| }) |
This test feels like it should be renamed (or the behaviour modified such that it handles the empty string)
| out.set_name(0, "authority_code")?; | ||
| out.set_name(1, "srid")?; | ||
| out.set_name(2, "name")?; | ||
| out.set_name(3, "proj_string")?; |
It might be more appropriate to call this input (which is the term sf uses to describe this concept, sort of), or maybe definition. (A "proj string" carries the connotation of a specific format, which is not how this is typically formatted here.)
| inner: crs_arc.clone(), | ||
| }) | ||
| } else { | ||
| Err(savvy::Error::new("No CRS available for this geometry type")) |
The docstring says this returns NULL for the "no crs" case. If this is hard to do with savvy maybe just update the docstring explaining that.
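One way to picture the behaviour the docstring describes is to model "no CRS" as an absent value rather than an error. The sketch below uses only the standard library; `Crs`, `ColumnType`, and `crs_of` are hypothetical stand-ins and not the sedonadb or savvy types. In the real bindings, the `None` case would presumably map to an R NULL, as the SRID accessor in this PR already does with `savvy::NullSexp`.

```rust
/// Hypothetical stand-in for a parsed CRS (not the sedona_schema type).
#[derive(Debug, Clone, PartialEq)]
struct Crs {
    authority_code: String,
}

/// Hypothetical stand-in for a column type: geometry columns may or may
/// not carry a CRS, non-spatial columns never do.
#[derive(Debug)]
enum ColumnType {
    Geometry(Option<Crs>),
    NonSpatial,
}

/// Return the CRS if there is one; "no CRS" is `None`, not an error,
/// which matches a docstring promising NULL for that case.
fn crs_of(column: &ColumnType) -> Option<Crs> {
    match column {
        ColumnType::Geometry(crs) => crs.clone(),
        ColumnType::NonSpatial => None,
    }
}
```

With this shape, callers can distinguish "no CRS" from a genuine failure without catching an error, at the cost of not being told why the CRS is missing.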
| match self.inner.srid() { | ||
| Ok(Some(srid)) => savvy::Sexp::try_from(srid as i32), | ||
| Ok(None) => Ok(savvy::NullSexp.into()), | ||
| Err(e) => Err(savvy::Error::new(format!("Failed to get SRID: {e}"))), |
The docstring says this should return NULL for this case?
| /// Get a formatted CRS display string like " (CRS: EPSG:4326)" or empty string | ||
| fn crs_display(&self) -> savvy::Result<savvy::Sexp> { | ||
| use sedona_schema::datatypes::SedonaType; | ||
| match &self.inner { |
Do we need this one (from R you can do sd_type$crs()$display()?)
| // Use existing SedonaType infrastructure to parse the field | ||
| let inner = SedonaType::from_storage_field(&field) | ||
| .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?; |
| .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?; | |
| .map_err(|e| savvy_err!("Failed to create SedonaType: {e}"))?; |
(I've been trying to consistently use savvy_err!() elsewhere but I'm new to this so the conventions aren't perfect)
| +-----------------------------+----------------------------+ | ||
| +-----------------------------+----------------------------+ | ||
| Preview of up to 0 row(s) | ||
Should we add a snapshot test for printing without any geometry column?
No worries, and I will get back to you on the questions a bit later. Thanks for another thorough review!
Adds CRS printing to the sedonadb_dataframe print method and a relevant helper in Rust that can be reused in other functions (but I kept it unexported for now).