Bug: `sjoin` produces duplicate rows for points on shared polygon boundaries

When performing a spatial join between region polygons and point coordinates, points that fall exactly on a shared boundary between two regions are matched to both adjacent polygons. This causes the resulting `joined` GeoDataFrame to have more rows than the input `coords`, silently producing incorrect results.

https://github.com/SheffieldSolar/Geocode/blob/4312fa817d48a674716fac005bad83b89e3285fc/geocode/utilities.py#L262-L267

### Expected behaviour

`len(joined) == len(coords)`

### Actual behaviour

`len(joined) == len(coords) + N`, where N is the number of extra matches: one for each point lying on a boundary shared by two regions.

Inspecting the duplicates:

```
        region_id    index                   geometry
351695  E01019227  18235.0  POINT (-2.96825 55.00966)
351695  E01019228  18236.0  POINT (-2.96825 55.00966)
```

The same point is matched to two adjacent regions because the default `predicate="intersects"` treats a point lying on a polygon's boundary as intersecting that polygon, so a point on a shared boundary matches both neighbours.

	x_coords, y_coords = zip(*[(x, y) for y, x in coords])
	coords = gpd.GeoDataFrame(
	    {"geometry": gpd.points_from_xy(x_coords, y_coords)}, crs="EPSG:4326"
	).to_crs(regions.crs)
	regions.set_index("region_id", inplace=True)
	joined = regions.sjoin(coords, how="right").to_crs(regions.crs)
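
A minimal sketch that reproduces the duplication and shows one possible workaround. The two square regions, the three points and the `deduped` name are made up for illustration and are not taken from the Geocode code base. Keeping the first match per point is an arbitrary tie-break, assumed here only to restore `len(joined) == len(coords)`; which region a boundary point should belong to is a separate decision.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square regions sharing the edge x == 1.
regions = gpd.GeoDataFrame(
    {"region_id": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
).set_index("region_id")

# The second point lies exactly on the shared edge.
coords = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.0, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Default predicate="intersects": the boundary point matches both regions,
# so the result has one more row than `coords`.
joined = regions.sjoin(coords, how="right")

# With how="right" the result is indexed by the index of `coords`, so a
# duplicated index value marks a point that matched more than one region.
# Keep only the first match for each point (arbitrary tie-break).
deduped = joined[~joined.index.duplicated(keep="first")]
```

A check such as `assert len(deduped) == len(coords)` after the join would make this kind of mismatch fail loudly instead of silently.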
