perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader #171

petern48 · 2025-10-01T01:50:47Z

This PR leverages the new WKBBytesExecutor for dimension calculation, so we can implement functions like st_hasz and st_hasm without parsing the entire geometry. The logic turns out to be more complicated than I originally expected (due to edge cases relating to inferring the dimensionality).

To properly get the dimensionality, we need to OR all of the following (short-circuiting permitted, of course):

- the dimensionality of the geometry type (the obvious one)
  - e.g. POINT Z EMPTY -> xyz
- the dimensionality of the first nested geometry (if it's some sort of collection)
  - e.g. GEOMETRYCOLLECTION (POINT Z (0 0 0)) -> xyz
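A minimal sketch of the OR described above, assuming the two inputs have already been read from the WKB. `Dims` and `effective_dims` are hypothetical names for illustration, not the types used in this PR:

```rust
/// Coordinate dimensions of a geometry (hypothetical stand-in for the crate's own enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dims {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

impl Dims {
    pub fn has_z(self) -> bool {
        matches!(self, Dims::Xyz | Dims::Xyzm)
    }

    pub fn has_m(self) -> bool {
        matches!(self, Dims::Xym | Dims::Xyzm)
    }

    fn from_flags(has_z: bool, has_m: bool) -> Dims {
        match (has_z, has_m) {
            (false, false) => Dims::Xy,
            (true, false) => Dims::Xyz,
            (false, true) => Dims::Xym,
            (true, true) => Dims::Xyzm,
        }
    }
}

/// OR the declared top-level dimensions with those of the first nested
/// geometry. `first_nested` is None for non-collections and empty collections.
pub fn effective_dims(top_level: Dims, first_nested: Option<Dims>) -> Dims {
    let nested = first_nested.unwrap_or(Dims::Xy);
    Dims::from_flags(
        top_level.has_z() || nested.has_z(),
        top_level.has_m() || nested.has_m(),
    )
}
```

With this shape, POINT Z EMPTY resolves to xyz from the top-level type alone, while GEOMETRYCOLLECTION (POINT Z (0 0 0)) resolves to xyz only through the nested geometry.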

closes issue #170

Benchmark results:


rust/sedona-functions/src/st_haszm.rs

petern48 · 2025-10-01T02:05:01Z

I'd also want to convert that function to one that returns the dimensionality (e.g xy, xyz, etc) and then use that to implement st_haszm, in case that logic can be reused elsewhere.

paleolimbot

Cool! In general I think this is a great idea (lazy parsing just the header when that's all we need).

I left a suggestion about consolidating some of the first-few-bytes parsing we're doing so that we have a place to test it better.

python/sedonadb/tests/functions/test_functions.py

rust/sedona-functions/src/st_haszm.rs

paleolimbot · 2025-10-01T02:43:55Z

rust/sedona-functions/src/st_haszm.rs

+fn infer_haszm(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
+    if buf.len() < 5 {
+        return sedona_internal_err!("Invalid WKB: buffer too small ({} bytes)", buf.len());
+    }


We should consider consolidating this with the geometry type optimization like:

struct WkbHeader {
    geometry_type: u32,
    size: u32,
    first_coord_geometry_type: u32,
    first_coord_offset: u32,
}

impl WkbHeader {
    pub fn geometry_type_id(&self) -> GeometryTypeId {...}
    pub fn dimensions(&self) -> Dimensions { ... }
    pub fn num_dimensions(&self) -> usize { ... }
}

There are a few functions that can benefit from this (npoints in a few cases, hilbert, isempty), although it might make a better PR into the wkb crate where there's already the logic for the parsing.

Yeah, I like this idea. I was thinking just create it in another file in sedona-geometry for now, and then we can later decide to upstream to wkb if it makes sense to.

Done. Let me know what you think about the approach. I'll do a follow-up PR to make st_geometrytype use this, since I have another small feature to implement with it. Keep in mind that the dimension() calculation can potentially recurse a lot (e.g a bunch of nested GEOMETRYCOLLECTIONs), so I'd like to avoid just computing all of the fields at construction and saving them as fields. I instead went the route of computing them lazily and using the fields for caching values after they are calculated.
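For illustration, one way the "compute lazily, cache in a field" idea could look. This is only a sketch: `LazyHeader` is a hypothetical name, and using `std::cell::OnceCell` for the cache is my assumption, not necessarily what the PR does:

```rust
use std::cell::OnceCell;

/// Hypothetical header that defers work until a value is requested.
pub struct LazyHeader<'a> {
    buf: &'a [u8],
    // Cached result of the first call to `geometry_type()`.
    geometry_type: OnceCell<Option<u32>>,
}

impl<'a> LazyHeader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Self {
            buf,
            geometry_type: OnceCell::new(),
        }
    }

    /// Raw geometry type code, or None if the buffer is too short or the
    /// byte-order byte is not 0 or 1. Parsed once, then served from the cache.
    pub fn geometry_type(&self) -> Option<u32> {
        *self.geometry_type.get_or_init(|| {
            let byte_order = *self.buf.first()?;
            let s = self.buf.get(1..5)?;
            let bytes = [s[0], s[1], s[2], s[3]];
            match byte_order {
                0 => Some(u32::from_be_bytes(bytes)),
                1 => Some(u32::from_le_bytes(bytes)),
                _ => None,
            }
        })
    }
}
```

`OnceCell` keeps the accessor as `&self`, at the cost of the header not being `Sync`; plain `Option` fields behind `&mut self` would be the alternative.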

rust/sedona-functions/src/st_haszm.rs


petern48 · 2025-10-04T06:12:40Z

Added perf benchmarks to the PR description 🤠

petern48 · 2025-10-04T16:20:00Z

I can't import sedona-testing into sedona-geometry due to a circular dependency, and hence can't import the fixture into the wkb_header.rs to be tested. It is at least being tested in st_has_(z/m), so I'm confident the logic is right. Let me know if you'd rather move something around or copy-paste the test over, otherwise, this is how I'll leave it.

The unparseable WKT strings are still left in the code as comments at the moment, though I did also mention them in #162 as a separate reminder if / whenever that's fixed. Personally, I prefer to leave the comments in the code as an additional reminder, but if you'd rather have me delete them, let me know.

paleolimbot

This is going to be so cool! I left some suggestions about reorganizing the WkbHeader to support a few of the other things I'd like to do with it 🙂

python/sedonadb/tests/functions/test_functions.py

rust/sedona-testing/src/fixtures.rs

rust/sedona-geometry/src/wkb_header.rs

paleolimbot · 2025-10-05T03:02:39Z

rust/sedona-geometry/src/wkb_header.rs

+    match code / 1000 {
+        // If xy, it's possible we need to infer the dimension
+        0 => {}
+        1 => return Ok(Dimensions::Xyz),
+        2 => return Ok(Dimensions::Xym),
+        3 => return Ok(Dimensions::Xyzm),
+        _ => return sedona_internal_err!("Unexpected code: {code}"),
+    };


This should also handle EWKB high bit flags. Most of the time this will be ISO WKB from GeoParquet but not all tools have control over the type of WKB they generate and we're better for dealing with it (unless you can demonstrate measurable performance overhead, which I doubt is the case here). One notable data point is that WKB coming from Sedona Spark's dataframe_to_arrow() is EWKB.
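A sketch of what handling both encodings in one place could look like. The flag values are the usual EWKB high bits (Z `0x80000000`, M `0x40000000`, SRID `0x20000000`); `Dims` and `dims_from_type_code` are hypothetical names, and the error type is simplified to `String`:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dims {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

// EWKB stores Z, M and SRID presence in the high bits of the type code.
const EWKB_Z: u32 = 0x8000_0000;
const EWKB_M: u32 = 0x4000_0000;
const EWKB_SRID: u32 = 0x2000_0000;

/// Dimensions declared by a raw WKB geometry type code, accepting both
/// ISO codes (1001, 2001, 3001, ...) and EWKB high-bit flags.
pub fn dims_from_type_code(code: u32) -> Result<Dims, String> {
    let ewkb_z = (code & EWKB_Z) != 0;
    let ewkb_m = (code & EWKB_M) != 0;
    // Strip the flag bits before looking at the ISO thousands digit.
    let iso = code & !(EWKB_Z | EWKB_M | EWKB_SRID);
    let (iso_z, iso_m) = match iso / 1000 {
        0 => (false, false),
        1 => (true, false),
        2 => (false, true),
        3 => (true, true),
        _ => return Err(format!("Unexpected code: {code}")),
    };
    Ok(match (ewkb_z || iso_z, ewkb_m || iso_m) {
        (false, false) => Dims::Xy,
        (true, false) => Dims::Xyz,
        (false, true) => Dims::Xym,
        (true, true) => Dims::Xyzm,
    })
}
```

The extra cost over the ISO-only match is two mask tests per header.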

paleolimbot · 2025-10-05T03:07:47Z

rust/sedona-geometry/src/wkb_header.rs

+    // Try to infer dimension
+    // If geometry is a collection (MULTIPOINT, ... GEOMETRYCOLLECTION, code 4-7), we need to check the dimension of the first geometry
+    if code & 0x7 >= 4 {
+        // The next 4 bytes are the number of geometries in the collection
+        let num_geometries = match byte_order {
+            0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
+            1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
+            other => return sedona_internal_err!("Unexpected byte order: {other}"),
+        };
+        // Check the dimension of the first geometry since they all have to be the same dimension
+        // Note: Attempting to create the following geometries error and are thus not possible to create:
+        // - Nested geometry dimension doesn't match the **specified** geom collection z-dimension
+        //   - GEOMETRYCOLLECTION M (POINT Z (1 1 1))
+        // - Nested geometry doesn't have the specified dimension
+        //   - GEOMETRYCOLLECTION Z (POINT (1 1))
+        // - Nested geometries have different dimensions
+        //   - GEOMETRYCOLLECTION (POINT Z (1 1 1), POINT (1 1))


This part is, I believe, unique to st_hasz() and should possibly live in the file implementing that function (or be explicit in the name of the function...I think of dimensions as the explicitly declared dimensions at the top-level WKB).

I think this logic should be kept here, actually. The logic of st_hasz() is simply to get the dimensionality of the object and see if it has a z-dimension. No other special logic. This logic here you're referring to is for handling a slight nuance in how SedonaDB converts WKT to WKB. Specifically, it translates the following geometry into WKB where the top-most dimension is specified as xy, while all of the actual coordinates in the geometry have z dimension.

e.g GEOMETRYCOLLECTION (POINT Z (1 2 3))
(I'd expect the same issue with MULTIPOINT ((1 1 1)) if WKT parsing supported it)

I think of these examples as geometries that really are xyz dimension, but rely on us to infer the z-dimension.

SedonaDB parses the first example as follows:

select st_asbinary(st_geomfromtext('geometrycollection (point z (1 2 3))')); -- 01**07000000**0100000001e9030000000000000000f03f00000000000000400000000000000840

Notably, the top-level dimensionality is simply 7 (xy), whereas we should really be interpreting the whole thing as an xyz.

Interestingly, the same query on PostGIS, returns the binary as the following where the top-level dimensionality is xyz.

01**ef030000**0100000001e9030000000000000000f03f00000000000000400000000000000840

I'm not sure if this is a bug in how WKT is translated into WKB, but this logic should be necessary to interpret that WKT the same way as PostGIS interprets it. We'd want to keep this logic for ST_ZMFlag, for example. Are there any concrete functions you can think of where we'd want to take the top-level dimension and ignore any potential extra dimensions in the points?
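To make the difference between the two outputs concrete, here is a small sketch that decodes both hex strings (bold markers stripped) and reads the top-level and first nested type codes. The helper names are hypothetical and it assumes little-endian ISO WKB with no SRID, which is what both strings are:

```rust
/// Decode a hex string into bytes (None on odd length or non-hex input).
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Read a little-endian u32 at `offset`, if enough bytes are present.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let s = buf.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

pub const SEDONADB_HEX: &str =
    "01070000000100000001e9030000000000000000f03f00000000000000400000000000000840";
pub const POSTGIS_HEX: &str =
    "01ef0300000100000001e9030000000000000000f03f00000000000000400000000000000840";

/// (top-level type code, first nested type code) of a collection.
/// Layout: byte order (1) | type (4) | count (4) | nested byte order (1) | nested type (4)
pub fn collection_type_codes(hex: &str) -> Option<(u32, u32)> {
    let buf = decode_hex(hex)?;
    Some((read_u32_le(&buf, 1)?, read_u32_le(&buf, 10)?))
}
```

Both strings carry a nested type of 1001 (POINT Z); only the top-level code differs, 7 for SedonaDB and 1007 for PostGIS.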

Hmm, unless you want to move this dimensions() method from the WKBHeader class entirely. I do like your idea of removing the buf field from the class, but considering this edge case, I see two options we could do to maintain correctness:

1. Move the dimensions() function outside and don't provide any method inside of WKBHeader

2. In try_new(), also check the dimension of the first coordinate (e.g., it's xyz) and store that as a separate field to be retrieved in the dimensions() method. We could get this info during our pass to get first_xy.

edit: I'm working on option 2 atm, unless you say otherwise

Got it! How about two methods:

dimensions(&self) (top-level dimensions as declared by WKB)

first_coord_dimensions(&self)

Which one of those you want mostly depends why you're asking if a geometry has a Z component or what previous information you have (our implementation of st_z, for example, would have no need for the second version).

These are also both approximations...there's nothing stopping somebody from putting a Z value in the second collection item (does PostGIS only check the first one?). Since neither is truly correct, I don't think the WkbHeader should take sides...just provide information. If somebody really does need to wrangle badly written data from SQL there are other tools at their disposal (st_dump() maybe)...if a particular algorithm must know if there are Z values, it should probably check the entire collection.

rust/sedona-geometry/src/wkb_header.rs

rust/sedona-functions/src/st_haszm.rs


petern48 · 2025-10-07T07:31:43Z

rust/sedona-geometry/src/wkb_header.rs

+    // Dimensions of the first nested geometry of a collection or None if empty
+    // For POINT, LINESTRING, POLYGON, returns the dimensions of the geometry
+    first_geom_dimensions: Option<Dimensions>,


I made it first_geom_dimensions instead of first_coord_dimensions since that's logically what I'm actually doing, finding the first non-collection geometry (using first_geom_idx()) and then taking the dimensions field of that. I'm not somehow checking the values of the first coordinate and determining whether it's xy or xyz. Feel free to propose a different name if you think it's oddly named.


petern48 · 2025-10-07T07:36:10Z

rust/sedona-geometry/src/wkb_header.rs

+    // #[test]
+    // fn srid() {
+    //     // This doesn't work
+    //     let wkb = make_wkb("SRID=4326;POINT (1 2)");
+    //     println!("wkb: {:?}", wkb);
+    //     let header = WkbHeader::try_new(&wkb).unwrap();
+    //     assert_eq!(header.srid(), 4326);
+    // }


Can you give me some advice on how to test this? Any nice helper functions for SRIDs?

Gentle ping in case this slipped under your radar @paleolimbot. Looking here, it looks like the WKB crate doesn't support writing EWKB, and instead relies on the geos crate for writing (which we can't access from sedona-geometry). I was hoping to write a decent number of cases originally, but it's looking like I'll need to hard-code these as fixtures. Let me know if you have any alternative ideas.


rust/sedona-geometry/src/wkb_header.rs


petern48 · 2025-10-07T07:49:49Z

Nearly done; mainly just waiting for advice on how to test SRID. Might need to debug a bit, but otherwise this is close. Variable / function renaming suggestions are welcome: as things got complicated, I spent less time thinking about naming and shifted towards just getting everything to work right.

paleolimbot

Sorry this slipped under my radar on your last update!

This is structurally great! I have some specific comments on the parser...the parser is a pretty important piece to ensure the details are correct (i.e., getting it wrong can lead to incorrect results or crashes), which is why my comments there are rather picky 😬

paleolimbot · 2025-10-09T14:33:25Z

rust/sedona-geometry/src/wkb_header.rs

+    // Dimensions of the first nested geometry of a collection or None if empty
+    // For POINT, LINESTRING, POLYGON, returns the dimensions of the geometry
+    first_geom_dimensions: Option<Dimensions>,


How about first_sequence_geometry_type: u32? (Slightly more in tune with your existing pattern of storing raw data and calculating the value on request)

paleolimbot · 2025-10-09T14:34:29Z

rust/sedona-geometry/src/wkb_header.rs

+    pub fn try_new(buf: &[u8]) -> Result<Self> {
+        if buf.len() < 5 {
+            return exec_err!("Invalid WKB: buffer too small -> try_new");


We should probably use SedonaGeometryError here (this should avoid a datafusion-common and sedona-common dependency here)

paleolimbot · 2025-10-09T14:35:21Z

rust/sedona-geometry/Cargo.toml

 wkt = { workspace = true }

 [dependencies]
+datafusion-common = { workspace = true }


We probably should not depend on datafusion-common or sedona-common here (this is otherwise a pretty lightweight crate).

paleolimbot · 2025-10-09T14:44:21Z

rust/sedona-geometry/src/wkb_header.rs

+        let dimensions = match self.geometry_type / 1000 {
+            0 => Dimensions::Xy,
+            1 => Dimensions::Xyz,
+            2 => Dimensions::Xym,
+            3 => Dimensions::Xyzm,
+            _ => exec_err!("Unexpected code: {}", self.geometry_type)?,
+        };
+        Ok(dimensions)


This also needs to handle the EWKB Z or M mask. This match exists in a few places and would benefit from its own function.

paleolimbot · 2025-10-09T14:53:16Z

rust/sedona-geometry/src/wkb_header.rs

+            srid = match byte_order {
+                0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
+                1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
+                other => return sedona_internal_err!("Unexpected byte order: {other}"),


This pattern is also repeated quite a few times and would benefit from a function
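Something along these lines, as a sketch. `read_u32` here is a hypothetical free function with a `String` error for brevity; the real code would presumably return the crate's own error type:

```rust
/// Read a u32 at `offset` using the WKB byte-order flag
/// (0 = big endian, 1 = little endian). Never indexes out of bounds.
pub fn read_u32(buf: &[u8], offset: usize, byte_order: u8) -> Result<u32, String> {
    let end = offset
        .checked_add(4)
        .ok_or_else(|| "Invalid WKB: offset overflow".to_string())?;
    let s = buf
        .get(offset..end)
        .ok_or_else(|| format!("Invalid WKB: buffer too small ({} bytes)", buf.len()))?;
    let bytes = [s[0], s[1], s[2], s[3]];
    match byte_order {
        0 => Ok(u32::from_be_bytes(bytes)),
        1 => Ok(u32::from_le_bytes(bytes)),
        other => Err(format!("Unexpected byte order: {other}")),
    }
}
```

Each call site then shrinks to `read_u32(buf, 5, byte_order)?`, and the bounds check lives in one place.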

paleolimbot · 2025-10-09T15:13:45Z

rust/sedona-geometry/src/wkb_header.rs

+        _ => exec_err!("Unexpected code: {code:?}"),
+    }
+}
+


There is a lot of code here that is bookkeeping and byte swapping as you walk the buffer and a number of those elements are repeated. The part that makes this complicated is the collection part where you need to parse until the first sequence (otherwise you would just be copying the first few bytes of the buffer).

Many parsers manage abstracting that repetition with something like this:

struct WkbBuffer {
    buf: &[u8],
    offset: usize,
    remaining: usize,
    last_endian: u8
}

impl WkbBuffer {
    pub fn read_endian(&mut self) -> Result<()> {
        if self.remaining < 1 {
            return Err(...)
        }
        self.last_endian = buf[self.offset];
        self.remaining -= 1;
        self.offset += 1;
        Ok(())
    }

    pub fn read_u32(&mut self) -> Result<u32> {
        if self.remaining < 4 {
            return Err(...)
        }
        let out = match self.last_endian { ... }
        self.remaining -= 4;
        self.offset += 4;
        Ok(out)
    }
}
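For reference, a compilable version of the same idea might look like the sketch below. It fills in the elided parts with my own assumptions: `String` errors, a private `take` helper, and `remaining` derived from `buf.len() - offset` rather than stored as a field:

```rust
/// Cursor over a WKB buffer that centralizes bounds checks and byte swapping.
pub struct WkbBuffer<'a> {
    buf: &'a [u8],
    offset: usize,
    last_endian: u8,
}

impl<'a> WkbBuffer<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Self { buf, offset: 0, last_endian: 0 }
    }

    /// Consume `n` bytes, or fail without advancing if fewer remain.
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let buf: &'a [u8] = self.buf;
        let start = self.offset;
        let end = start
            .checked_add(n)
            .ok_or_else(|| "Invalid WKB: offset overflow".to_string())?;
        let out = buf.get(start..end).ok_or_else(|| {
            format!("Invalid WKB: needed {n} bytes at offset {start}, buffer has {}", buf.len())
        })?;
        self.offset = end;
        Ok(out)
    }

    /// Read the byte-order flag that precedes every WKB geometry.
    pub fn read_endian(&mut self) -> Result<(), String> {
        let flag = self.take(1)?[0];
        if flag > 1 {
            return Err(format!("Unexpected byte order: {flag}"));
        }
        self.last_endian = flag;
        Ok(())
    }

    /// Read a u32 using the most recently read byte order.
    pub fn read_u32(&mut self) -> Result<u32, String> {
        let s = self.take(4)?;
        let bytes = [s[0], s[1], s[2], s[3]];
        Ok(if self.last_endian == 0 {
            u32::from_be_bytes(bytes)
        } else {
            u32::from_le_bytes(bytes)
        })
    }
}
```

Walking a header is then a sequence of `read_endian()` / `read_u32()` calls, and every short-buffer case surfaces as an `Err` from `take`.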

paleolimbot · 2025-10-09T15:27:00Z

rust/sedona-geometry/src/wkb_header.rs

+    // #[test]
+    // fn srid() {
+    //     // This doesn't work
+    //     let wkb = make_wkb("SRID=4326;POINT (1 2)");
+    //     println!("wkb: {:?}", wkb);
+    //     let header = WkbHeader::try_new(&wkb).unwrap();
+    //     assert_eq!(header.srid(), 4326);
+    // }


It did...thanks for the ping!

I use R's wk package to generate these (long ago I wrote EWKT parsing and EWKB writing as a default, which was not a good idea in retrospect, but has proved very useful for generating test data). You can do this yourself or use these as fixtures (I think these are all the ones you'll need):

wk::as_wkb("SRID=4326;POINT (1 2)") |> dput() #> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb", #> "wk_vctr")) wk::as_wkb("SRID=4326;POINT Z (1 2 3)") |> dput() #> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xa0, 0xe6, 0x10, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x08, 0x40))), class = c("wk_wkb", "wk_vctr")) wk::as_wkb("SRID=4326;POINT M (1 2 4)") |> dput() #> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x60, 0xe6, 0x10, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb", "wk_vctr")) wk::as_wkb("SRID=4326;POINT ZM (1 2 3 4)") |> dput() #> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xe0, 0xe6, 0x10, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, #> 0x40))), class = c("wk_wkb", "wk_vctr")) wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT (1 2))") |> dput() #> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, #> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb", "wk_vctr" #> )) wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT Z (1 2 3))") |> dput() #> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, #> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x80, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, #> 0x08, 0x40))), class = 
c("wk_wkb", "wk_vctr")) wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT M (1 2 4))") |> dput() #> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, #> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x40, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, #> 0x10, 0x40))), class = c("wk_wkb", "wk_vctr")) wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT ZM (1 2 3 4))") |> dput() #> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, #> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0xc0, #> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, #> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, #> 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb", #> "wk_vctr"))
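As a rough sketch of how one of these fixtures could be hard-coded on the Rust side, the first one (`SRID=4326;POINT (1 2)`) is copied below as a byte array. `ewkb_srid` and its helper are hypothetical stand-ins for whatever `WkbHeader::srid()` ends up doing, with `String` errors for brevity:

```rust
/// EWKB flag bit marking that a 4-byte SRID follows the type code.
const EWKB_SRID_FLAG: u32 = 0x2000_0000;

/// Bytes of the `SRID=4326;POINT (1 2)` fixture (little-endian EWKB).
pub const POINT_SRID_4326: [u8; 25] = [
    0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xf0, 0x3f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40,
];

/// Read a u32 at `offset` with the given WKB byte order (0 = big, 1 = little).
fn read_u32(buf: &[u8], offset: usize, byte_order: u8) -> Result<u32, String> {
    let s = buf
        .get(offset..offset + 4)
        .ok_or_else(|| format!("Invalid WKB: buffer too small ({} bytes)", buf.len()))?;
    let bytes = [s[0], s[1], s[2], s[3]];
    match byte_order {
        0 => Ok(u32::from_be_bytes(bytes)),
        1 => Ok(u32::from_le_bytes(bytes)),
        other => Err(format!("Unexpected byte order: {other}")),
    }
}

/// SRID stored in an EWKB header, or None when the SRID flag is not set.
pub fn ewkb_srid(buf: &[u8]) -> Result<Option<u32>, String> {
    let byte_order = *buf
        .first()
        .ok_or_else(|| "Invalid WKB: empty buffer".to_string())?;
    let type_code = read_u32(buf, 1, byte_order)?;
    if (type_code & EWKB_SRID_FLAG) == 0 {
        return Ok(None);
    }
    read_u32(buf, 5, byte_order).map(Some)
}
```

The SRID bytes `0xe6, 0x10, 0x00, 0x00` are 4326 in little endian, which matches the fixture's WKT.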

rust/sedona-geometry/src/wkb_header.rs

paleolimbot · 2025-10-09T15:33:26Z

rust/sedona-geometry/src/wkb_header.rs

+        assert_eq!(header.first_geom_dimensions(), None);
+    }
+}


There also need to be tests here for incomplete buffers. In theory you have logic to check that, if there are an insufficient number of bytes available in the buffer, you don't call buf[i]; however, if your checks are wrong the process will crash.

This is another benefit of using something like the WkbBuffer I suggested above (that logic is consolidated and you don't have to test as many cases).
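One cheap way to cover those cases is to feed every truncated prefix of a valid buffer to the parser and require an error each time. A sketch, with `parse_type_code` as a toy stand-in for the real header parser and `accepted_truncations` as a hypothetical test helper:

```rust
/// Toy parser used only for this sketch: returns the raw geometry type code.
pub fn parse_type_code(buf: &[u8]) -> Result<u32, String> {
    let byte_order = *buf
        .first()
        .ok_or_else(|| "Invalid WKB: empty buffer".to_string())?;
    let s = buf
        .get(1..5)
        .ok_or_else(|| format!("Invalid WKB: buffer too small ({} bytes)", buf.len()))?;
    let bytes = [s[0], s[1], s[2], s[3]];
    match byte_order {
        0 => Ok(u32::from_be_bytes(bytes)),
        1 => Ok(u32::from_le_bytes(bytes)),
        other => Err(format!("Unexpected byte order: {other}")),
    }
}

/// Lengths of truncated prefixes of `valid` (shorter than `required_len`)
/// that the parser wrongly accepted. An empty result means every truncation
/// produced an error; an unchecked `buf[i]` would panic and fail the test.
pub fn accepted_truncations<T, E>(
    valid: &[u8],
    required_len: usize,
    parse: impl Fn(&[u8]) -> Result<T, E>,
) -> Vec<usize> {
    (0..required_len.min(valid.len()))
        .filter(|&len| parse(&valid[..len]).is_ok())
        .collect()
}
```

In a unit test this reduces to asserting that `accepted_truncations(&wkb, header_len, parser)` is empty for each fixture.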

petern48 added 4 commits September 30, 2025 18:33

Implement st_haszm using WKBBytesExecutor instead (missing one last e…

ccae8ff

…dge case)

Add note about handling last edge case

07aa206

Minor fix to the comments

cf031fb

Fix pre-commit

526fc3b

petern48 commented Oct 1, 2025


paleolimbot reviewed Oct 1, 2025


petern48 added 9 commits October 2, 2025 08:55

Fix cargo clippy

a491d91

Save progress

cf90bcf

Pull out dimension calculation logic into new wkb_header.rs

22a6087

Add MULTIPOINT_WITH_INFERRED_Z_DIMENSION_WKB fixture

5c616af

Fix dimension calculation to support all collection types and add fix…

207ecb1

…ture as test

Fix clippy and clean up

43009f8

Remove public byte_order method since it's not needed atm

1078bdd

Perform all wkb_header operations lazily and cache the values as Opti…

4d4e7e0

…on fields

Add python integration test benches

dfd6c1a

Add tests for wkb_header

0ef812d

petern48 marked this pull request as ready for review October 4, 2025 16:33

petern48 requested a review from paleolimbot October 4, 2025 16:33

paleolimbot reviewed Oct 5, 2025


petern48 and others added 5 commits October 4, 2025 21:04

Apply suggestion from @paleolimbot

075d6e6

Co-authored-by: Dewey Dunnington <[email protected]>

Remaining clean up

491b3c7

Update to method to dimensions plural

7efccc0

Rename method to try_new

06501e5

Update fixture to be multipoint ((1 2 3)) instead of all zeros

1b397fd

petern48 marked this pull request as draft October 5, 2025 22:51

jiayuasu mentioned this pull request Oct 6, 2025

feat(sql): Implement ST_Azimuth() #183

Merged

Implement refactor

9ce9f08

petern48 added 8 commits October 6, 2025 23:23

Remove inferred dimension case

617f9b8

Move logic to st_haszm

96311fa

Add empty_geometry_first_coord_dimensions test

d1d4fc8

Add test for size

573f7c8

Add tests

b5984dc

Implement fix for first_xy POLYGON logic

8c1d2f0

clean up

dc49aac

Rename from first_coord_dimensions to first_geom_dimensions and adjus…

89ebb4c

…t expected test results

petern48 commented Oct 7, 2025


petern48 commented Oct 7, 2025


Update name of method in st_haszm and update some sedona_internal_err…

bdc0fae

… to exec_err

petern48 changed the title ~~perf: Optimize st_has(z/m) using WKBBytesExecutor~~ perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader Oct 7, 2025

petern48 mentioned this pull request Oct 9, 2025

feat: Implement fast implementations of functions using WKBHeader #198

Open

paleolimbot reviewed Oct 9, 2025

