perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader #171
I'd also like to convert that function into one that returns the dimensionality (e.g., xy, xyz, etc.) and then use that to implement
Cool! In general I think this is a great idea (lazy parsing just the header when that's all we need).
I left a suggestion about consolidating some of the first-few-bytes parsing we're doing so that we have a place to test it better.
fn infer_haszm(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
    if buf.len() < 5 {
        return sedona_internal_err!("Invalid WKB: buffer too small ({} bytes)", buf.len());
    }
We should consider consolidating this with the geometry type optimization like:
struct WkbHeader {
    geometry_type: u32,
    size: u32,
    first_coord_geometry_type: u32,
    first_coord_offset: u32
}
impl WkbHeader {
    pub fn geometry_type_id(&self) -> GeometryTypeId {...}
    pub fn dimensions(&self) -> Dimensions { ... }
    pub fn num_dimensions(&self) -> usize { ... }
}
There are a few functions that can benefit from this (npoints in a few cases, hilbert, isempty), although it might make a better PR into the wkb crate, where there's already the logic for the parsing.
Yeah, I like this idea. I was thinking of just creating it in another file in sedona-geometry for now, and then we can decide later whether to upstream it to wkb if it makes sense to.
Done. Let me know what you think about the approach. I'll do a follow-up PR to make st_geometrytype use this, since I have another small feature to implement with it. Keep in mind that the dimension() calculation can potentially recurse a lot (e.g., a bunch of nested GEOMETRYCOLLECTIONs), so I'd like to avoid computing all of the values at construction and saving them as fields. Instead, I went the route of computing them lazily and using the fields to cache values after they are calculated.
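For readers following along, here is a minimal sketch of the compute-lazily-then-cache pattern described above. It is not the code in this PR: `LazyHeader`, `Dims`, and `compute_dims` are hypothetical names, the cache uses `std::cell::OnceCell` as an assumption about how one might do it, and the computation only looks at the ISO thousands digit of the top-level type (no EWKB flags, no recursion into collections).

```rust
use std::cell::OnceCell;

/// Hypothetical stand-in for the `Dimensions` enum discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dims {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

/// Hypothetical header that keeps a view of the bytes and fills its cache on demand.
struct LazyHeader<'a> {
    buf: &'a [u8],
    dims: OnceCell<Option<Dims>>,
}

impl<'a> LazyHeader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, dims: OnceCell::new() }
    }

    /// Computes the declared dimensions on the first call; later calls reuse the cached value.
    fn dimensions(&self) -> Option<Dims> {
        *self.dims.get_or_init(|| compute_dims(self.buf))
    }
}

/// Simplified computation: reads the top-level type code and maps its ISO thousands digit.
/// Returns None for short buffers, unknown byte orders, or unknown codes.
fn compute_dims(buf: &[u8]) -> Option<Dims> {
    let b = buf.get(1..5)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    let code = match buf[0] {
        0 => u32::from_be_bytes(bytes),
        1 => u32::from_le_bytes(bytes),
        _ => return None,
    };
    match code / 1000 {
        0 => Some(Dims::Xy),
        1 => Some(Dims::Xyz),
        2 => Some(Dims::Xym),
        3 => Some(Dims::Xyzm),
        _ => None,
    }
}
```

Constructing the header is then just a pointer copy, and callers that never ask for `dimensions()` never pay for it.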
Added perf benchmarks to the PR description 🤠
I can't import The unparseable WKT strings are still left in the code as comments at the moment, though I also mentioned them in #162 as a separate reminder for whenever that's fixed. Personally, I prefer to leave the comments in the code as an additional reminder, but if you'd rather I delete them, let me know.
This is going to be so cool! I left some suggestions about reorganizing the WkbHeader to support a few of the other things I'd like to do with it 🙂
match code / 1000 {
    // If xy, it's possible we need to infer the dimension
    0 => {}
    1 => return Ok(Dimensions::Xyz),
    2 => return Ok(Dimensions::Xym),
    3 => return Ok(Dimensions::Xyzm),
    _ => return sedona_internal_err!("Unexpected code: {code}"),
};
This should also handle EWKB high bit flags. Most of the time this will be ISO WKB from GeoParquet, but not all tools have control over the type of WKB they generate and we're better off dealing with it (unless you can demonstrate measurable performance overhead, which I doubt is the case here). One notable data point is that WKB coming from Sedona Spark's dataframe_to_arrow() is EWKB.
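To make the request concrete, here is a hedged sketch of one way to decode declared dimensions from either flavour of type code. `decode_dims` and `Dims` are hypothetical names, not functions from this PR; the flag constants are the high bits PostGIS-style EWKB uses for Z, M, and SRID (the fixtures further down the thread, e.g. `0xa0` for "SRID + Z", agree with them).

```rust
/// Hypothetical stand-in for the `Dimensions` enum used in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dims {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

// EWKB high-bit flags on the 32-bit geometry type code.
const EWKB_Z: u32 = 0x8000_0000;
const EWKB_M: u32 = 0x4000_0000;
const EWKB_SRID: u32 = 0x2000_0000;

/// Decodes declared dimensions from a raw type code, accepting both
/// ISO offsets (1000/2000/3000) and EWKB high-bit flags.
fn decode_dims(code: u32) -> Result<Dims, String> {
    let flag_z = (code & EWKB_Z) != 0;
    let flag_m = (code & EWKB_M) != 0;
    // Strip the flag bits before looking at the ISO thousands digit.
    let iso = code & !(EWKB_Z | EWKB_M | EWKB_SRID);
    let (iso_z, iso_m) = match iso / 1000 {
        0 => (false, false),
        1 => (true, false),
        2 => (false, true),
        3 => (true, true),
        _ => return Err(format!("Unexpected code: {code}")),
    };
    Ok(match (flag_z || iso_z, flag_m || iso_m) {
        (false, false) => Dims::Xy,
        (true, false) => Dims::Xyz,
        (false, true) => Dims::Xym,
        (true, true) => Dims::Xyzm,
    })
}
```

The extra work over the ISO-only match is two mask tests and one AND, which is why measurable overhead seems unlikely.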
// Try to infer dimension
// If geometry is a collection (MULTIPOINT, ... GEOMETRYCOLLECTION, code 4-7), we need to check the dimension of the first geometry
if code & 0x7 >= 4 {
    // The next 4 bytes are the number of geometries in the collection
    let num_geometries = match byte_order {
        0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
        1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
        other => return sedona_internal_err!("Unexpected byte order: {other}"),
    };
    // Check the dimension of the first geometry since they all have to be the same dimension
    // Note: Attempting to create the following geometries error and are thus not possible to create:
    // - Nested geometry dimension doesn't match the **specified** geom collection z-dimension
    //   - GEOMETRYCOLLECTION M (POINT Z (1 1 1))
    // - Nested geometry doesn't have the specified dimension
    //   - GEOMETRYCOLLECTION Z (POINT (1 1))
    // - Nested geometries have different dimensions
    //   - GEOMETRYCOLLECTION (POINT Z (1 1 1), POINT (1 1))
This part is, I believe, unique to st_hasz() and should possibly live in the file implementing that function (or be explicit in the name of the function...I think of dimensions as the explicitly declared dimensions at the top-level WKB).
I think this logic should be kept here, actually. The logic of st_hasz() is simply to get the dimensionality of the object and see if it has a z-dimension. No other special logic. The logic you're referring to handles a slight nuance in how SedonaDB converts WKT to WKB. Specifically, it translates the following geometry into WKB where the top-level dimension is specified as xy, while all of the actual coordinates in the geometry have a z dimension.
e.g. GEOMETRYCOLLECTION (POINT Z (1 2 3))
(I'd expect the same issue with MULTIPOINT ((1 1 1)) if WKT parsing supported it.)
I think of these examples as geometries that really are xyz dimension, but rely on us to infer the z-dimension.
SedonaDB parses the first example as follows:
select st_asbinary(st_geomfromtext('geometrycollection (point z (1 2 3))'));
-- 01**07000000**0100000001e9030000000000000000f03f00000000000000400000000000000840
Notably, the top-level geometry type is simply 7 (xy), whereas we should really be interpreting the whole thing as xyz.
Interestingly, the same query on PostGIS returns the following binary, where the top-level dimensionality is xyz.
01**ef030000**0100000001e9030000000000000000f03f00000000000000400000000000000840
I'm not sure if this is a bug in how WKT is translated into WKB, but this logic should be necessary to interpret that WKT the same way PostGIS interprets it. We'd want to keep this logic for ST_ZMFlag, for example. Are there any concrete functions you can think of where we'd want to take the top-level dimension and ignore any potential extra dimensions in the points?
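To spell out the difference between the two hex dumps, here is a small sketch that pulls the top-level and first-child type codes out of each one. The helper names are hypothetical, and the byte arrays are just the first 14 bytes of the two outputs quoted above (both little-endian).

```rust
/// Reads a little-endian u32 at `offset`, or None if the buffer is too short.
fn le_u32_at(buf: &[u8], offset: usize) -> Option<u32> {
    let b = buf.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns (top-level type code, first child type code) for a little-endian collection.
/// Layout: byte order (1) | type (4) | num_geometries (4) | child byte order (1) | child type (4)
fn collection_type_codes(buf: &[u8]) -> Option<(u32, u32)> {
    let top = le_u32_at(buf, 1)?;
    let child = le_u32_at(buf, 10)?;
    Some((top, child))
}

/// First 14 bytes of the SedonaDB output: 01 07000000 01000000 01 e9030000 ...
const SEDONADB_PREFIX: [u8; 14] = [
    0x01, 0x07, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0xe9, 0x03, 0x00, 0x00,
];

/// First 14 bytes of the PostGIS output: 01 ef030000 01000000 01 e9030000 ...
const POSTGIS_PREFIX: [u8; 14] = [
    0x01, 0xef, 0x03, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0xe9, 0x03, 0x00, 0x00,
];
```

`collection_type_codes(&SEDONADB_PREFIX)` gives `(7, 1001)` and `collection_type_codes(&POSTGIS_PREFIX)` gives `(1007, 1001)`: the child is POINT Z in both, and only the declared top-level code differs.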
Hmm, unless you want to move this dimensions() method out of the WKBHeader class entirely. I do like your idea of removing the buf field from the class, but considering this edge case, I see two options to maintain correctness:
- Move the dimensions() function outside and don't provide any method inside of WKBHeader
- In try_new(), also check the dimension of the first coordinate (e.g., it's xyz) and store that as a separate field to be retrieved in the dimensions() method. We could get this info during our pass to get first_xy.
edit: I'm working on option 2 atm, unless you say otherwise
Got it! How about two methods:
- dimensions(&self) (top-level dimensions as declared by WKB)
- first_coord_dimensions(&self)
Which one of those you want mostly depends on why you're asking if a geometry has a Z component or what previous information you have (our implementation of st_z, for example, would have no need for the second version).
These are also both approximations...there's nothing stopping somebody from putting a Z value in the second collection item (does PostGIS only check the first one?). Since neither is truly correct, I don't think the WkbHeader should take sides...just provide information. If somebody really does need to wrangle badly written data from SQL there are other tools at their disposal (st_dump() maybe)...if a particular algorithm must know if there are Z values, it should probably check the entire collection.
// Dimensions of the first nested geometry of a collection or None if empty
// For POINT, LINESTRING, POLYGON, returns the dimensions of the geometry
first_geom_dimensions: Option<Dimensions>,
I made it first_geom_dimensions instead of first_coord_dimensions since that's logically what I'm actually doing: finding the first non-collection geometry (using first_geom_idx()) and then taking the dimensions field of that. I'm not somehow checking the values of the first coordinate and determining whether it's xy or xyz. Feel free to propose a different name if you think it's oddly named.
How about first_sequence_geometry_type: u32? (Slightly more in tune with your existing pattern of storing raw data and calculating the value on request)
// #[test]
// fn srid() {
//     // This doesn't work
//     let wkb = make_wkb("SRID=4326;POINT (1 2)");
//     println!("wkb: {:?}", wkb);
//     let header = WkbHeader::try_new(&wkb).unwrap();
//     assert_eq!(header.srid(), 4326);
// }
Can you give me some advice on how to test this? Any nice helper functions for SRIDs?
Gentle ping in case this slipped under your radar @paleolimbot. Looking here, it looks like the WKB crate doesn't support writing EWKB, and instead relies on the geos crate for writing (which we can't access from sedona-geometry). I was hoping to write a decent number of cases originally, but it's looking like I'll need to hard-code these as fixtures. Let me know if you have any alternative ideas.
It did...thanks for the ping!
I use R's wk package to generate these (long ago I wrote EWKT parsing and EWKB writing as a default, which was not a good idea in retrospect, but has proved very useful for generating test data). You can do this yourself or use these as fixtures (I think these are all the ones you'll need):
wk::as_wkb("SRID=4326;POINT (1 2)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb",
#> "wk_vctr"))
wk::as_wkb("SRID=4326;POINT Z (1 2 3)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xa0, 0xe6, 0x10,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x08, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;POINT M (1 2 4)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0x60, 0xe6, 0x10,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;POINT ZM (1 2 3 4)") |> dput()
#> structure(list(as.raw(c(0x01, 0x01, 0x00, 0x00, 0xe0, 0xe6, 0x10,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10,
#> 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT (1 2))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10,
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x40))), class = c("wk_wkb", "wk_vctr"
#> ))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT Z (1 2 3))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10,
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x80,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
#> 0x08, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT M (1 2 4))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10,
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x40,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
#> 0x10, 0x40))), class = c("wk_wkb", "wk_vctr"))
wk::as_wkb("SRID=4326;GEOMETRYCOLLECTION (POINT ZM (1 2 3 4))") |> dput()
#> structure(list(as.raw(c(0x01, 0x07, 0x00, 0x00, 0x20, 0xe6, 0x10,
#> 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0xc0,
#> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00,
#> 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
#> 0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x40))), class = c("wk_wkb",
#> "wk_vctr"))
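As a rough illustration of how one of these could be dropped into a Rust test as a hard-coded fixture, here is a sketch using the first output above. `ewkb_srid` is a hypothetical helper written only for this example (it is not `WkbHeader::srid()` from the PR), and it returns `None` both when no SRID flag is set and when the buffer is malformed.

```rust
/// EWKB flag bit indicating that a 4-byte SRID follows the type code.
const EWKB_SRID_FLAG: u32 = 0x2000_0000;

/// EWKB for "SRID=4326;POINT (1 2)", copied from the first wk output above.
const POINT_SRID_4326: [u8; 25] = [
    0x01, 0x01, 0x00, 0x00, 0x20, 0xe6, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0xf0, 0x3f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40,
];

/// Reads a u32 at `offset` with the given WKB byte order (0 = big, 1 = little).
fn read_u32(buf: &[u8], offset: usize, byte_order: u8) -> Option<u32> {
    let b = buf.get(offset..offset.checked_add(4)?)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    match byte_order {
        0 => Some(u32::from_be_bytes(bytes)),
        1 => Some(u32::from_le_bytes(bytes)),
        _ => None,
    }
}

/// Hypothetical helper: returns the embedded SRID if the EWKB SRID flag is set.
fn ewkb_srid(buf: &[u8]) -> Option<u32> {
    let byte_order = *buf.first()?;
    let code = read_u32(buf, 1, byte_order)?;
    if (code & EWKB_SRID_FLAG) == 0 {
        return None; // plain ISO WKB or EWKB without an SRID
    }
    read_u32(buf, 5, byte_order)
}
```

With this fixture, `ewkb_srid(&POINT_SRID_4326)` yields `Some(4326)` (bytes `e6 10 00 00` in little-endian order).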
Nearly done, mainly just waiting for advice on how to test SRID. Might need to debug a bit, but otherwise this is close. Variable / function renaming suggestions are welcome. I spent less time thinking about naming; as things got complicated, I shifted towards just getting everything to work right.
Sorry this slipped under my radar on your last update!
This is structurally great! I have some specific comments on the parser...the parser is a pretty important piece to ensure the details are correct (i.e., getting it wrong can lead to incorrect results or crashes), which is why my comments there are rather picky 😬
pub fn try_new(buf: &[u8]) -> Result<Self> {
    if buf.len() < 5 {
        return exec_err!("Invalid WKB: buffer too small -> try_new");
We should probably use SedonaGeometryError here (this should avoid a datafusion-common and sedona-common dependency here)
wkt = { workspace = true }

[dependencies]
datafusion-common = { workspace = true }
We probably should not depend on datafusion-common or sedona-common here (this is otherwise a pretty lightweight crate).
let dimensions = match self.geometry_type / 1000 {
    0 => Dimensions::Xy,
    1 => Dimensions::Xyz,
    2 => Dimensions::Xym,
    3 => Dimensions::Xyzm,
    _ => exec_err!("Unexpected code: {}", self.geometry_type)?,
};
Ok(dimensions)
This also needs to handle the EWKB Z or M mask. This match exists in a few places and would benefit from its own function.
srid = match byte_order {
    0 => u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
    1 => u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]),
    other => return sedona_internal_err!("Unexpected byte order: {other}"),
This pattern is also repeated quite a few times and would benefit from a function
        _ => exec_err!("Unexpected code: {code:?}"),
    }
}
There is a lot of code here that is bookkeeping and byte swapping as you walk the buffer and a number of those elements are repeated. The part that makes this complicated is the collection part where you need to parse until the first sequence (otherwise you would just be copying the first few bytes of the buffer).
Many parsers manage abstracting that repetition with something like this:
struct WkbBuffer<'a> {
    buf: &'a [u8],
    offset: usize,
    remaining: usize,
    last_endian: u8
}
impl<'a> WkbBuffer<'a> {
    // Reads the one-byte byte-order marker and remembers it for later reads
    pub fn read_endian(&mut self) -> Result<()> {
        if self.remaining < 1 {
            return Err(...)
        }
        self.last_endian = self.buf[self.offset];
        self.remaining -= 1;
        self.offset += 1;
        Ok(())
    }
    // Reads a u32 using the most recently read byte order
    pub fn read_u32(&mut self) -> Result<u32> {
        if self.remaining < 4 {
            return Err(...)
        }
        let out = match self.last_endian { ... };
        self.remaining -= 4;
        self.offset += 4;
        Ok(out)
    }
}
        assert_eq!(header.first_geom_dimensions(), None);
    }
}
There also need to be tests here for incomplete buffers. In theory you have logic to check that, if there are an insufficient number of bytes available on the buffer, you don't call buf[i]; however, if your checks are wrong the process will crash.
This is another benefit of using something like the WkbBuffer I suggested above (that logic is consolidated and you don't have to test as many cases).
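To illustrate what that could look like, here is a self-contained sketch of a bounds-checked cursor along the lines of the WkbBuffer idea, with the elided parts filled in as assumptions: errors are plain `String`s rather than SedonaGeometryError, and `remaining` is derived from `offset` instead of being stored. Every read goes through one of two methods, so truncated input only has to be tested there.

```rust
/// Hypothetical bounds-checked cursor over a WKB buffer (sketch only).
struct WkbBuffer<'a> {
    buf: &'a [u8],
    offset: usize,
    last_endian: u8,
}

impl<'a> WkbBuffer<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, offset: 0, last_endian: 1 }
    }

    /// Number of unread bytes; `offset` never exceeds `buf.len()`.
    fn remaining(&self) -> usize {
        self.buf.len() - self.offset
    }

    /// Reads the one-byte byte-order marker and remembers it for later reads.
    fn read_endian(&mut self) -> Result<(), String> {
        let byte = *self
            .buf
            .get(self.offset)
            .ok_or_else(|| "Invalid WKB: missing byte order".to_string())?;
        if byte > 1 {
            return Err(format!("Unexpected byte order: {byte}"));
        }
        self.last_endian = byte;
        self.offset += 1;
        Ok(())
    }

    /// Reads a u32 using the most recent byte order, or errors on a short buffer.
    fn read_u32(&mut self) -> Result<u32, String> {
        if self.remaining() < 4 {
            return Err(format!(
                "Invalid WKB: need 4 bytes, have {}",
                self.remaining()
            ));
        }
        let b = &self.buf[self.offset..self.offset + 4];
        let bytes = [b[0], b[1], b[2], b[3]];
        self.offset += 4;
        Ok(match self.last_endian {
            0 => u32::from_be_bytes(bytes),
            _ => u32::from_le_bytes(bytes),
        })
    }
}
```

An incomplete-buffer test then reduces to feeding prefixes of a valid WKB (empty, one byte, three bytes, and so on) and asserting that each read returns `Err` instead of panicking.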
This PR leverages the new WKBBytesExecutor for dimension calculation, so we can implement functions like st_hasz and st_hasm without parsing the entire geometry. The logic turns out to be more complicated than I originally expected (due to edge cases relating to inferring the dimensionality).
To properly get the dimensionality, we need to OR all of the following (short-circuiting permitted, of course):
- POINT Z EMPTY -> xyz
- GEOMETRYCOLLECTION (POINT Z (0 0 0)) -> xyz
closes issue #170
Benchmark results: