petern48
Collaborator

@petern48 petern48 commented Oct 1, 2025

This PR leverages the new WKBBytesExecutor for dimension calculation, so we can implement functions like st_hasz and st_hasm without parsing the entire geometry. The logic turns out to be more complicated than I originally expected (due to edge cases relating to inferring the dimensionality).

To properly get the dimensionality, we need to OR all of the following (short-circuiting permitted, of course):

  • the dimensionality of the geometry type (the obvious one)
    • e.g. POINT Z EMPTY -> xyz
  • the dimensionality of the first nested geometry (if it's some sort of collection)
    • e.g. GEOMETRYCOLLECTION (POINT Z (0 0 0)) -> xyz
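
The OR described in the list above can be sketched roughly as follows. `Dims` and `combined_dims` are hypothetical names used only for illustration; they are not the types or functions in this PR.

```rust
/// Hypothetical flag pair used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dims {
    pub z: bool,
    pub m: bool,
}

/// OR the dimensions declared by the top-level geometry type with those of
/// the first nested geometry, if there is one.
pub fn combined_dims(top_level: Dims, first_nested: Option<Dims>) -> Dims {
    match first_nested {
        Some(nested) => Dims {
            z: top_level.z || nested.z,
            m: top_level.m || nested.m,
        },
        // Not a collection, or an empty one: only the declared type counts.
        None => top_level,
    }
}
```

With this shape, `POINT Z EMPTY` is covered by the top-level flags alone, while `GEOMETRYCOLLECTION (POINT Z (0 0 0))` picks up its Z from the nested geometry.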

closes issue #170

Benchmark results:

[Two benchmark images; not preserved in this text]

@petern48
Collaborator Author

petern48 commented Oct 1, 2025

I'd also want to convert that function to one that returns the dimensionality (e.g., xy, xyz, etc.) and then use that to implement st_haszm, in case that logic can be reused elsewhere.

Member

@paleolimbot paleolimbot left a comment

Cool! In general I think this is a great idea (lazy parsing just the header when that's all we need).

I left a suggestion about consolidating some of the first-few-bytes parsing we're doing so that we have a place to test it better.

Comment on lines 114 to 117
fn infer_haszm(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
    if buf.len() < 5 {
        return sedona_internal_err!("Invalid WKB: buffer too small ({} bytes)", buf.len());
    }
Member

We should consider consolidating this with the geometry type optimization like:

struct WkbHeader {
  geometry_type: u32,
  size: u32,
  first_coord_geometry_type: u32,
  first_coord_offset: u32
}

impl WkbHeader {
  pub fn geometry_type_id(&self) -> GeometryTypeId {...}
  pub fn dimensions(&self) -> Dimensions { ... }
  pub fn num_dimensions(&self) -> usize { ... }
}

There are a few functions that can benefit from this (npoints in a few cases, hilbert, isempty), although it might make a better PR into the wkb crate where there's already the logic for the parsing.
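
As a rough illustration of what accessors like `dimensions()` and `num_dimensions()` would compute from an ISO geometry type code, here is a minimal sketch. `Dims` and `split_iso_type` are hypothetical names, EWKB flags are ignored here, and accepting only base types 1 through 7 is an assumption of the sketch.

```rust
/// Hypothetical stand-in for the crate's dimensions enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dims {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

impl Dims {
    /// Number of ordinates per coordinate.
    pub fn size(self) -> usize {
        match self {
            Dims::Xy => 2,
            Dims::Xyz | Dims::Xym => 3,
            Dims::Xyzm => 4,
        }
    }
}

/// Split an ISO WKB geometry type code into (base type, dimensions).
/// Returns None for codes outside the ranges this sketch accepts.
pub fn split_iso_type(code: u32) -> Option<(u32, Dims)> {
    let dims = match code / 1000 {
        0 => Dims::Xy,
        1 => Dims::Xyz,
        2 => Dims::Xym,
        3 => Dims::Xyzm,
        _ => return None,
    };
    let base = code % 1000;
    // Assumption: only POINT (1) through GEOMETRYCOLLECTION (7) are accepted.
    if (1..=7).contains(&base) {
        Some((base, dims))
    } else {
        None
    }
}
```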

Collaborator Author

Yeah, I like this idea. I was thinking of just creating it in another file in sedona-geometry for now, and then we can later decide to upstream it to wkb if that makes sense.

Collaborator Author

Done. Let me know what you think about the approach. I'll do a follow-up PR to make st_geometrytype use this, since I have another small feature to implement with it. Keep in mind that the dimension() calculation can potentially recurse a lot (e.g., a bunch of nested GEOMETRYCOLLECTIONs), so I'd like to avoid computing all of the values eagerly at construction. Instead, I compute them lazily and use the fields to cache values once they have been calculated.
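
One way to get this kind of compute-on-first-use caching is `std::cell::OnceCell`. The sketch below is only an illustration of the pattern, not the code in this PR: `LazyHeader` and `placeholder_scan` are hypothetical, and the scan itself is a stand-in rather than real WKB logic.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical header that defers an expensive calculation until requested.
pub struct LazyHeader<'a> {
    buf: &'a [u8],
    // Cached result; stays empty until has_z() is first called.
    has_z: OnceCell<bool>,
    // Counts scans so that the caching is observable in this sketch.
    scans: Cell<usize>,
}

impl<'a> LazyHeader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Self {
            buf,
            has_z: OnceCell::new(),
            scans: Cell::new(0),
        }
    }

    /// Runs the scan on the first call and returns the cached value afterwards.
    pub fn has_z(&self) -> bool {
        *self.has_z.get_or_init(|| {
            self.scans.set(self.scans.get() + 1);
            placeholder_scan(self.buf)
        })
    }

    pub fn scans(&self) -> usize {
        self.scans.get()
    }
}

/// Placeholder for the real (possibly recursive) walk; NOT actual WKB logic.
fn placeholder_scan(buf: &[u8]) -> bool {
    buf.iter().any(|b| b & 0x80 != 0)
}
```

Because `OnceCell` is not `Sync`, a header shared across threads would need `std::sync::OnceLock` instead.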

@petern48
Collaborator Author

petern48 commented Oct 4, 2025

Added perf benchmarks to the PR description 🤠

@petern48
Collaborator Author

petern48 commented Oct 4, 2025

I can't import sedona-testing into sedona-geometry due to a circular dependency, and hence can't import the fixture into the wkb_header.rs to be tested. It is at least being tested in st_has_(z/m), so I'm confident the logic is right. Let me know if you'd rather move something around or copy-paste the test over, otherwise, this is how I'll leave it.

The unparseable WKT strings are still left in the code as comments at the moment, though I did also mention them in #162 as a separate reminder if / whenever that's fixed. Personally, I prefer to leave the comments in the code as an additional reminder, but if you'd rather I delete them, let me know.

@petern48 petern48 marked this pull request as ready for review October 4, 2025 16:33
@petern48 petern48 requested a review from paleolimbot October 4, 2025 16:33
Member

@paleolimbot paleolimbot left a comment

This is going to be so cool! I left some suggestions about reorganizing the WkbHeader to support a few of the other things I'd like to do with it 🙂

Comment on lines 89 to 96
match code / 1000 {
    // If xy, it's possible we need to infer the dimension
    0 => {}
    1 => return Ok(Dimensions::Xyz),
    2 => return Ok(Dimensions::Xym),
    3 => return Ok(Dimensions::Xyzm),
    _ => return sedona_internal_err!("Unexpected code: {code}"),
};
Member

This should also handle EWKB high bit flags. Most of the time this will be ISO WKB from GeoParquet but not all tools have control over the type of WKB they generate and we're better for dealing with it (unless you can demonstrate measurable performance overhead, which I doubt is the case here). One notable data point is that WKB coming from Sedona Spark's dataframe_to_arrow() is EWKB.
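
A minimal sketch of handling both conventions in one place, assuming the commonly used EWKB flag values (`0x80000000` for Z, `0x40000000` for M, `0x20000000` for SRID). The function name `has_zm` is hypothetical.

```rust
const EWKB_Z: u32 = 0x8000_0000;
const EWKB_M: u32 = 0x4000_0000;
const EWKB_SRID: u32 = 0x2000_0000;

/// Returns (has_z, has_m) for a geometry type code that may use either the
/// ISO thousands convention or the EWKB high-bit flags.
pub fn has_zm(type_code: u32) -> Option<(bool, bool)> {
    let ewkb_z = type_code & EWKB_Z != 0;
    let ewkb_m = type_code & EWKB_M != 0;
    // Strip the EWKB flag bits before applying the ISO convention.
    let iso = type_code & !(EWKB_Z | EWKB_M | EWKB_SRID);
    let (iso_z, iso_m) = match iso / 1000 {
        0 => (false, false),
        1 => (true, false),
        2 => (false, true),
        3 => (true, true),
        _ => return None,
    };
    Some((ewkb_z || iso_z, ewkb_m || iso_m))
}
```

The extra work over the ISO-only match is a few bitwise operations per geometry.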

Comment on lines 98 to 114
// Try to infer dimension
// If geometry is a collection (MULTIPOINT, ... GEOMETRYCOLLECTION, code 4-7), we need to check the dimension of the first geometry
if code & 0x7 >= 4 {
    // The next 4 bytes are the number of geometries in the collection
    let num_geometries = match byte_order {
        0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
        1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
        other => return sedona_internal_err!("Unexpected byte order: {other}"),
    };
    // Check the dimension of the first geometry since they all have to be the same dimension
    // Note: Attempting to create the following geometries error and are thus not possible to create:
    // - Nested geometry dimension doesn't match the **specified** geom collection z-dimension
    //   - GEOMETRYCOLLECTION M (POINT Z (1 1 1))
    // - Nested geometry doesn't have the specified dimension
    //   - GEOMETRYCOLLECTION Z (POINT (1 1))
    // - Nested geometries have different dimensions
    //   - GEOMETRYCOLLECTION (POINT Z (1 1 1), POINT (1 1))
Member

This part is, I believe, unique to st_hasz() and should possibly live in the file implementing that function (or be explicit in the name of the function...I think of dimensions as the explicitly declared dimensions at the top-level WKB).

Collaborator Author

@petern48 petern48 Oct 5, 2025

I think this logic should be kept here, actually. The logic of st_hasz() is simply to get the dimensionality of the object and see if it has a z-dimension. No other special logic. This logic here you're referring to is for handling a slight nuance in how SedonaDB converts WKT to WKB. Specifically, it translates the following geometry into WKB where the top-most dimension is specified as xy, while all of the actual coordinates in the geometry have z dimension.

e.g. GEOMETRYCOLLECTION (POINT Z (1 2 3))
(I'd expect the same issue with MULTIPOINT ((1 1 1)) if WKT parsing supported it)

I think of these examples as geometries that really are xyz dimension, but rely on us to infer the z-dimension.

SedonaDB parses the first example as follows:

select st_asbinary(st_geomfromtext('geometrycollection (point z (1 2 3))'));
-- 01**07000000**0100000001e9030000000000000000f03f00000000000000400000000000000840

Notably, the top-level dimensionality is simply 7 (xy), whereas we should really be interpreting the whole thing as xyz.

Interestingly, the same query on PostGIS returns the following binary, where the top-level dimensionality is xyz.

01**ef030000**0100000001e9030000000000000000f03f00000000000000400000000000000840

I'm not sure if this is a bug in how WKT is translated into WKB, but this logic should be necessary to interpret that WKT the same way as PostGIS interprets it. We'd want to keep this logic for ST_ZMFlag, for example. Are there any concrete functions you can think of where we'd want to take the top-level dimension and ignore any potential extra dimensions in the points?
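
To make the difference between the two outputs concrete, here is a small sketch that reads only the top-level geometry type from the first five bytes. `top_level_type` is a hypothetical helper, not part of this PR.

```rust
/// Read the top-level geometry type code from a WKB header.
/// Returns None if the buffer is too short or the byte order is invalid.
pub fn top_level_type(wkb: &[u8]) -> Option<u32> {
    // get() avoids a panic when fewer than five bytes are available.
    let b = wkb.get(1..5)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    match wkb[0] {
        0 => Some(u32::from_be_bytes(bytes)),
        1 => Some(u32::from_le_bytes(bytes)),
        _ => None,
    }
}
```

Applied to the two headers above, the SedonaDB bytes `01 07 00 00 00` decode to 7 (GEOMETRYCOLLECTION, xy) and the PostGIS bytes `01 ef 03 00 00` decode to 0x03ef = 1007 (GEOMETRYCOLLECTION Z). In both outputs the nested point's type bytes `e9 03 00 00` decode to 1001 (POINT Z).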

Collaborator Author

@petern48 petern48 Oct 5, 2025

Hmm, unless you want to move this dimensions() method out of the WKBHeader class entirely. I do like your idea of removing the buf field from the class, but considering this edge case, I see two options we could take to maintain correctness:

  1. Move the dimensions() function outside and don't provide any method inside of WKBHeader
  2. In try_new(), also check the dimension of the first coordinate (e.g., it's xyz) and store that as a separate field to be retrieved in the dimensions() method. We could get this info during our pass to get first_xy.

edit: I'm working on option 2 atm, unless you say otherwise

Member

Got it! How about two methods:

  • dimensions(&self) (top-level dimensions as declared by WKB)
  • first_coord_dimensions(&self)

Which one of those you want mostly depends on why you're asking if a geometry has a Z component or what previous information you have (our implementation of st_z, for example, would have no need for the second version).

These are also both approximations...there's nothing stopping somebody from putting a Z value in the second collection item (does PostGIS only check the first one?). Since neither is truly correct, I don't think the WkbHeader should take sides...just provide information. If somebody really does need to wrangle badly written data from SQL there are other tools at their disposal (st_dump() maybe)...if a particular algorithm must know if there are Z values, it should probably check the entire collection.

@petern48 petern48 marked this pull request as draft October 5, 2025 22:51
Comment on lines +41 to +43
// Dimensions of the first nested geometry of a collection or None if empty
// For POINT, LINESTRING, POLYGON, returns the dimensions of the geometry
first_geom_dimensions: Option<Dimensions>,
Collaborator Author

I made it first_geom_dimensions instead of first_coord_dimensions since that's logically what I'm actually doing, finding the first non-collection geometry (using first_geom_idx()) and then taking the dimensions field of that. I'm not somehow checking the values of the first coordinate and determining whether it's xy or xyz. Feel free to propose a different name if you think it's oddly named.

Member

How about first_sequence_geometry_type: u32? (Slightly more in tune with your existing pattern of storing raw data and calculating the value on request)

Comment on lines +488 to +495
// #[test]
// fn srid() {
// // This doesn't work
// let wkb = make_wkb("SRID=4326;POINT (1 2)");
// println!("wkb: {:?}", wkb);
// let header = WkbHeader::try_new(&wkb).unwrap();
// assert_eq!(header.srid(), 4326);
// }
Collaborator Author

Can you give me some advice on how to test this? Any nice helper functions for SRIDs?

Collaborator Author

Gentle ping in case this slipped under your radar @paleolimbot. Looking here, it looks like the wkb crate doesn't support writing EWKB, and instead relies on the geos crate for writing (which we can't access from sedona-geometry). I was hoping to write a decent number of cases originally, but it's looking like I'll need to hard-code these as fixtures. Let me know if you have any alternative ideas.

Member

It did...thanks for the ping!

I use R's wk package to generate these (long ago I wrote EWKT parsing and EWKB writing as a default, which was not a good idea in retrospect, but has proved very useful for generating test data). You can do this yourself or use these as fixtures (I think these are all the ones you'll need):

wk::as_wkb("SRID=4326;POINT (1 2)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb", 
#> "wk_vctr"))
wk::as_wkb("SRID=4326;POINT Z (1 2 3)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xa0, 0xe6, 0x10, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x08, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;POINT M (1 2 4)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x60, 0xe6, 0x10, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;POINT ZM (1 2 3 4)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xe0, 0xe6, 0x10, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 
#> 0x40))), class = c("wk_wkb", "wk_vctr"))

wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT (1 2))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, 
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb", "wk_vctr"
#> ))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT Z (1 2 3))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, 
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x80, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
#> 0x08, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT M (1 2 4))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, 
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x40, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
#> 0x10, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT ZM (1 2 3 4))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10, 
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0xc0, 
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, 
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
#> 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb", 
#> "wk_vctr"))
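
One way to use these as hard-coded fixtures on the Rust side is sketched below. The byte array is copied from the first `wk` output above; `ewkb_srid` is a hypothetical helper that only handles little-endian input and assumes the usual EWKB SRID flag `0x20000000`.

```rust
/// EWKB for "SRID=4326;POINT (1 2)", copied from the wk output above.
pub const POINT_SRID_4326: [u8; 25] = [
    0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, 0x00, 0x00, // order, type, SRID
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, // x = 1.0
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, // y = 2.0
];

const EWKB_SRID_FLAG: u32 = 0x2000_0000;

/// Hypothetical helper: extract the SRID from little-endian EWKB, if present.
pub fn ewkb_srid(wkb: &[u8]) -> Option<u32> {
    // Byte order (1) + geometry type (4) + SRID (4) are needed.
    if wkb.len() < 9 || wkb[0] != 1 {
        return None;
    }
    let geometry_type = u32::from_le_bytes([wkb[1], wkb[2], wkb[3], wkb[4]]);
    if geometry_type & EWKB_SRID_FLAG == 0 {
        return None;
    }
    Some(u32::from_le_bytes([wkb[5], wkb[6], wkb[7], wkb[8]]))
}
```

The SRID bytes `e6 10 00 00` are little-endian 0x10e6, which is 4326.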

@petern48
Collaborator Author

petern48 commented Oct 7, 2025

Nearly done, mainly just waiting for advice on how to test SRID. Might need to debug a bit, but otherwise this is close. Variable / function renaming suggestions are welcome: as things got complicated, I spent less time thinking about naming and shifted towards just getting everything to work right.

@petern48 petern48 changed the title perf: Optimize st_has(z/m) using WKBBytesExecutor perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader Oct 7, 2025
Member

@paleolimbot paleolimbot left a comment

Sorry this slipped under my radar on your last update!

This is structurally great! I have some specific comments on the parser...the parser is a pretty important piece to ensure the details are correct (i.e., getting it wrong can lead to incorrect results or crashes), which is why my comments there are rather picky 😬

Comment on lines +48 to +50
pub fn try_new(buf: &[u8]) -> Result<Self> {
    if buf.len() < 5 {
        return exec_err!("Invalid WKB: buffer too small -> try_new");
Member

We should probably use SedonaGeometryError here (this should avoid a datafusion-common and sedona-common dependency here)

wkt = { workspace = true }

[dependencies]
datafusion-common = { workspace = true }
Member

We probably should not depend on datafusion-common or sedona-common here (this is otherwise a pretty lightweight crate).

Comment on lines +153 to +160
let dimensions = match self.geometry_type / 1000 {
    0 => Dimensions::Xy,
    1 => Dimensions::Xyz,
    2 => Dimensions::Xym,
    3 => Dimensions::Xyzm,
    _ => exec_err!("Unexpected code: {}", self.geometry_type)?,
};
Ok(dimensions)
Member

This also needs to handle the EWKB Z or M mask. This match exists in a few places and would benefit from its own function.

Comment on lines +69 to +72
srid = match byte_order {
    0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
    1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
    other => return sedona_internal_err!("Unexpected byte order: {other}"),
Member

This pattern is also repeated quite a few times and would benefit from a function
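
For example, the repeated match could be pulled into a helper along these lines. This is a sketch with a hypothetical signature and a plain `String` error standing in for whichever error type the crate ends up using.

```rust
/// Read a u32 at `offset` using the WKB byte-order flag (0 = big, 1 = little).
/// Returns Err instead of panicking when the buffer is too short.
pub fn read_u32(buf: &[u8], offset: usize, byte_order: u8) -> Result<u32, String> {
    let end = offset
        .checked_add(4)
        .ok_or_else(|| "Invalid WKB: offset overflow".to_string())?;
    let b = buf.get(offset..end).ok_or_else(|| {
        format!("Invalid WKB: need 4 bytes at offset {offset}, buffer has {}", buf.len())
    })?;
    let bytes = [b[0], b[1], b[2], b[3]];
    match byte_order {
        0 => Ok(u32::from_be_bytes(bytes)),
        1 => Ok(u32::from_le_bytes(bytes)),
        other => Err(format!("Unexpected byte order: {other}")),
    }
}
```

Each call site then shrinks to something like `read_u32(buf, 5, byte_order)?`, and the bounds check lives in one place.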

_ => exec_err!("Unexpected code: {code:?}"),
}
}

Member

There is a lot of code here that is bookkeeping and byte swapping as you walk the buffer and a number of those elements are repeated. The part that makes this complicated is the collection part where you need to parse until the first sequence (otherwise you would just be copying the first few bytes of the buffer).

Many parsers manage abstracting that repetition with something like this:

struct WkbBuffer<'a> {
    buf: &'a [u8],
    offset: usize,
    remaining: usize,
    last_endian: u8,
}

impl<'a> WkbBuffer<'a> {
    // Consume the one-byte byte-order flag and remember it for later reads
    pub fn read_endian(&mut self) -> Result<()> {
        if self.remaining < 1 {
            return Err(...)
        }
        self.last_endian = self.buf[self.offset];
        self.remaining -= 1;
        self.offset += 1;
        Ok(())
    }

    // Consume four bytes and decode them using the last byte-order flag read
    pub fn read_u32(&mut self) -> Result<u32> {
        if self.remaining < 4 {
            return Err(...)
        }
        let out = match self.last_endian { ... };
        self.remaining -= 4;
        self.offset += 4;
        Ok(out)
    }
}
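
A self-contained variant of the cursor idea above, filled in so it compiles. This is a sketch under assumptions: the name `WkbCursor`, the private `take` helper, and the `String` error type are all hypothetical choices, and `remaining` is derived from `offset` rather than stored.

```rust
/// Hypothetical cursor over a WKB buffer; every read is bounds-checked.
pub struct WkbCursor<'a> {
    buf: &'a [u8],
    offset: usize,
    last_endian: u8,
}

impl<'a> WkbCursor<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Self { buf, offset: 0, last_endian: 1 }
    }

    /// Consume exactly `n` bytes or fail without advancing.
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let buf: &'a [u8] = self.buf;
        let remaining = buf.len() - self.offset;
        if remaining < n {
            return Err(format!(
                "Invalid WKB: expected {n} bytes at offset {} but only {remaining} remain",
                self.offset
            ));
        }
        let out = &buf[self.offset..self.offset + n];
        self.offset += n;
        Ok(out)
    }

    /// Read the byte-order flag (0 = big endian, 1 = little endian).
    pub fn read_endian(&mut self) -> Result<(), String> {
        let flag = self.take(1)?[0];
        if flag > 1 {
            return Err(format!("Unexpected byte order: {flag}"));
        }
        self.last_endian = flag;
        Ok(())
    }

    /// Read a u32 using the most recently read byte-order flag.
    pub fn read_u32(&mut self) -> Result<u32, String> {
        let b = self.take(4)?;
        let bytes = [b[0], b[1], b[2], b[3]];
        Ok(if self.last_endian == 0 {
            u32::from_be_bytes(bytes)
        } else {
            u32::from_le_bytes(bytes)
        })
    }
}
```

Walking a collection down to its first nested geometry then becomes a sequence of `read_endian()` and `read_u32()` calls, with no direct indexing at the call sites.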

Comment on lines +726 to +728
assert_eq!(header.first_geom_dimensions(), None);
}
}
Member

There also need to be tests here for incomplete buffers. In theory you have logic to check that if there are an insufficient number of bytes available on the buffer you don't call buf[i]; however, if your checks are wrong the process will crash.
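
One inexpensive way to cover incomplete buffers is to feed every strict prefix of a valid header to the parser and require an error each time. The sketch below uses a hypothetical `parse_header` (byte order, geometry type, optional EWKB SRID) purely to show the shape of such a test; it is not the `WkbHeader::try_new` from this PR.

```rust
/// Bounds-checked u32 read; None if fewer than four bytes are available.
fn read_u32_at(buf: &[u8], offset: usize, little_endian: bool) -> Option<u32> {
    let b = buf.get(offset..offset.checked_add(4)?)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(if little_endian {
        u32::from_le_bytes(bytes)
    } else {
        u32::from_be_bytes(bytes)
    })
}

/// Hypothetical minimal parse: returns (geometry type code, optional SRID).
pub fn parse_header(buf: &[u8]) -> Result<(u32, Option<u32>), String> {
    let little_endian = match buf.first().copied() {
        Some(0) => false,
        Some(1) => true,
        Some(other) => return Err(format!("Unexpected byte order: {other}")),
        None => return Err("Invalid WKB: empty buffer".to_string()),
    };
    let geometry_type = read_u32_at(buf, 1, little_endian)
        .ok_or_else(|| "Invalid WKB: truncated geometry type".to_string())?;
    // Assumed EWKB SRID flag: 0x20000000.
    let srid = if geometry_type & 0x2000_0000 != 0 {
        let value = read_u32_at(buf, 5, little_endian)
            .ok_or_else(|| "Invalid WKB: truncated SRID".to_string())?;
        Some(value)
    } else {
        None
    };
    Ok((geometry_type, srid))
}

/// True if every strict prefix of the first `header_len` bytes is rejected.
pub fn all_prefixes_rejected(valid: &[u8], header_len: usize) -> bool {
    (0..header_len).all(|n| parse_header(&valid[..n]).is_err())
}
```

If a bounds check were wrong, the prefix loop would panic on an out-of-range index rather than return, which is exactly the failure the test is meant to surface.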

This is another benefit of using something like the WkbBuffer I suggested above (that logic is consolidated and you don't have to test as many cases).
