@e-kotov e-kotov commented Jan 1, 2026

Adds CRS printing to the sedonadb_dataframe print method, plus a supporting helper in Rust that can be reused in other functions (I kept it unexported for now).

library(sf)
library(sedonadb)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 5070)
x <- as_sedonadb_dataframe(nc)
x
# A sedonadb_dataframe: ? x 16
# Geometry: geometry (CRS: EPSG:5070)
┌─────────┬───────────┬─────────┬───┬────────────────────────────────────────────────┬───────┐
│   AREAPERIMETERCNTY_  ┆ … ┆                    geometrysid  │
│ float64float64float64 ┆   ┆                    geometryint32 │
╞═════════╪═══════════╪═════════╪═══╪════════════════════════════════════════════════╪═══════╡
│   0.1141.4421825.0 ┆ … ┆ MULTIPOLYGON(((1288822.3256753243 1563699.442… ┆     1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│   0.0611.2311827.0 ┆ … ┆ MULTIPOLYGON(((1307046.8945761852 1581379.933… ┆     2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│   0.1431.631828.0 ┆ … ┆ MULTIPOLYGON(((1378069.6323642111 1578869.811… ┆     3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│    0.072.9681831.0 ┆ … ┆ MULTIPOLYGON(((1765394.8318386688 1660922.198… ┆     4 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│   0.1532.2061832.0 ┆ … ┆ MULTIPOLYGON(((1661822.2229292062 1630503.661… ┆     5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│   0.0971.671833.0 ┆ … ┆ MULTIPOLYGON(((1703230.114237553 1638095.6588… ┆     6 │
└─────────┴───────────┴─────────┴───┴────────────────────────────────────────────────┴───────┘
Preview of up to 6 row(s)

Copilot AI review requested due to automatic review settings January 1, 2026 19:28
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds CRS (Coordinate Reference System) printing functionality to the sedonadb_dataframe print method. When printing a dataframe with geometry columns, the CRS information is now displayed below the header, showing the geometry column names along with their CRS identifiers (e.g., "EPSG:5070", "OGC:CRS84").

Key Changes:

  • Added Rust function parse_crs_metadata to extract CRS information from GeoArrow metadata
  • Enhanced print.sedonadb_dataframe to display geometry column CRS information with width-aware truncation
  • Created comprehensive test suite covering various CRS scenarios including EPSG codes, engineering CRS, and edge cases
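
The width-aware truncation mentioned above can be sketched in Rust as follows. This is only an illustration under stated assumptions, not the PR's code (the actual logic lives in the R print method): `geometry_line` and its `(name, crs)` input shape are hypothetical, and showing only the column name when a column has no CRS is an assumption.

```rust
/// Hypothetical helper: build the "# Geometry: ..." header line and
/// truncate it to `width` characters, ending with an ellipsis when cut.
fn geometry_line(columns: &[(&str, Option<&str>)], width: usize) -> String {
    // Render each column as "name (CRS: id)"; a column without a CRS
    // shows only its name (an assumption made for this sketch).
    let parts: Vec<String> = columns
        .iter()
        .map(|(name, crs)| match crs {
            Some(id) => format!("{name} (CRS: {id})"),
            None => name.to_string(),
        })
        .collect();
    let line = format!("# Geometry: {}", parts.join(", "));

    // Count characters rather than bytes so that non-ASCII column names
    // are never split in the middle of a code point.
    if line.chars().count() <= width {
        return line;
    }
    if width == 0 {
        return String::new();
    }
    // Keep width - 1 characters and spend the last one on the ellipsis.
    let mut out: String = line.chars().take(width - 1).collect();
    out.push('…');
    out
}
```

With a wide enough console the single-column case reproduces the header shown in the PR description; with a narrow one the line is cut and marked with a trailing ellipsis.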

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

File | Description
--- | ---
r/sedonadb/src/rust/src/lib.rs | Implements parse_crs_metadata Rust function to parse CRS from GeoArrow JSON metadata
r/sedonadb/src/rust/api.h | Adds FFI declaration for the new parse_crs_metadata function
r/sedonadb/src/rust/Cargo.toml | Adds serde_json dependency for JSON parsing
r/sedonadb/src/init.c | Registers the new parse_crs_metadata C binding
r/sedonadb/R/000-wrappers.R | Adds R wrapper for parse_crs_metadata FFI function
r/sedonadb/R/crs.R | Introduces sd_parse_crs helper function for parsing CRS metadata
r/sedonadb/R/dataframe.R | Enhances print.sedonadb_dataframe to display CRS information for geometry columns with truncation support
r/sedonadb/tests/testthat/test-crs.R | Adds comprehensive tests for CRS parsing and display functionality

Member

@paleolimbot paleolimbot left a comment

Thank you for working on this! I love the column count, CRS, and geometry column information in the printed output. I added a few high-level suggestions, but with some tweaking we can merge the current approach too. The spirit of all my comments is that I'd love to reuse the places where we've already implemented some of this in Rust, Python, or geoarrow/r.

Something I added to the Python bindings but forgot to add here was a bare-bones wrapper around the SedonaDB/DataFusion schema, which provides access to column/type information including the CRS: https://github.com/apache/sedona-db/blob/e0e1d109480727faaf7be25923b57b4686144438/python/sedonadb/src/schema.rs . I added some suggestions inline about how to draw a few ideas from that hopefully without widening the scope of this PR too much 🙂

Author

e-kotov commented Jan 7, 2026

@paleolimbot I think I addressed all comments, but I might need some help with how to best approach the merge conflict.

Member

@paleolimbot paleolimbot left a comment

Apologies for the merge conflicts...I was trying to make it easier to develop the R package but it definitely conflicted with this PR 😬 . You should be able to git pull and use tools/update-savvy.sh and air format with the package now.

This is looking great! A few things we should solve here I think but I love the improved output and I think this is close!

expect_snapshot(sedonadb:::sd_parse_crs(meta))
})

# Tests for CRS display in print.sedonadb_dataframe
Member

nit: I like your other tests that put this comment inside the test_that() block (I think this is what I did for the tests in some of the other files too)

Comment on lines 112 to 117
test_that("sd_parse_crs handles empty string", {
  expect_snapshot(
    sedonadb:::sd_parse_crs(""),
    error = TRUE
  )
})
Member

This test feels like it should be renamed (or the behaviour modified such that it handles the empty string)
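
The second option (handling the empty string rather than erroring) could look roughly like the sketch below. It is a toy stand-in under stated assumptions, not the PR's parser: `parse_crs_or_none` is a hypothetical name, and the "AUTHORITY:CODE" check is a placeholder for real CRS parsing.

```rust
/// Hypothetical stand-in for the CRS parser discussed above.
/// Empty or whitespace-only input means "no CRS" and yields Ok(None)
/// instead of an error.
fn parse_crs_or_none(input: &str) -> Result<Option<String>, String> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Ok(None);
    }
    // Toy validation only: accept inputs shaped like "AUTHORITY:CODE".
    match trimmed.split_once(':') {
        Some((authority, code)) if !authority.is_empty() && !code.is_empty() => {
            Ok(Some(format!("{}:{}", authority.to_uppercase(), code)))
        }
        _ => Err(format!("Unrecognized CRS definition: {trimmed:?}")),
    }
}
```

Under this design the test name "handles empty string" would match the behaviour, and the snapshot would record a missing CRS rather than an error.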

out.set_name(0, "authority_code")?;
out.set_name(1, "srid")?;
out.set_name(2, "name")?;
out.set_name(3, "proj_string")?;
Member

It might be more appropriate to call this input (the term sf uses to describe this concept, sort of), or maybe definition. (A "proj string" carries the connotation of a specific formatting, which is not how this is typically formatted here.)

inner: crs_arc.clone(),
})
} else {
Err(savvy::Error::new("No CRS available for this geometry type"))
Member

The docstring says this returns NULL for the "no crs" case. If this is hard to do with savvy maybe just update the docstring explaining that.

match self.inner.srid() {
    Ok(Some(srid)) => savvy::Sexp::try_from(srid as i32),
    Ok(None) => Ok(savvy::NullSexp.into()),
    Err(e) => Err(savvy::Error::new(format!("Failed to get SRID: {e}"))),
Member

The docstring says this should return NULL for this case?

Comment on lines +239 to +243
/// Get a formatted CRS display string like " (CRS: EPSG:4326)" or empty string
fn crs_display(&self) -> savvy::Result<savvy::Sexp> {
    use sedona_schema::datatypes::SedonaType;

    match &self.inner {
Member

Do we need this one? (From R you can do sd_type$crs()$display().)


// Use existing SedonaType infrastructure to parse the field
let inner = SedonaType::from_storage_field(&field)
    .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?;
Member

Suggested change
- .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?;
+ .map_err(|e| savvy_err!("Failed to create SedonaType: {e}"))?;

(I've been trying to consistently use savvy_err!() elsewhere but I'm new to this so the conventions aren't perfect)
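
To show why a formatting macro reads more cleanly than wrapping `format!` in a constructor, here is a std-only sketch. Everything in it is hypothetical: `SketchError`, `sketch_err!`, and `parse_len` are illustrative stand-ins and say nothing about how savvy implements its own macro.

```rust
/// Hypothetical error type standing in for a library error.
#[derive(Debug, PartialEq)]
struct SketchError(String);

/// Hypothetical macro: forwards its arguments to `format!` and wraps
/// the resulting message in `SketchError`.
macro_rules! sketch_err {
    ($($arg:tt)*) => {
        SketchError(format!($($arg)*))
    };
}

/// Parse a length, converting the parse failure into `SketchError`.
fn parse_len(text: &str) -> Result<usize, SketchError> {
    text.trim()
        .parse::<usize>()
        // Compare with: .map_err(|e| SketchError(format!("...: {e}")))
        .map_err(|e| sketch_err!("Failed to parse length: {e}"))
}
```

The call site drops one level of nesting and keeps the message next to the error construction, which is the consistency benefit the suggestion is after.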

+-----------------------------+----------------------------+
+-----------------------------+----------------------------+
Preview of up to 0 row(s)

Member

Should we add a snapshot test for printing without any geometry column?

Author

e-kotov commented Jan 7, 2026

> Apologies for the merge conflicts...I was trying to make it easier to develop the R package

no worries, and air.toml is very welcome! I previously disabled mine locally to prevent unnecessary code changes, as I did not know your preference for this.

I will get back to you on the questions a bit later. Thanks for another thorough review!
