feat(rust/sedona-spatial-join-gpu): Add GPU-accelerated spatial join support #465

zhangfengcdt · 2025-12-17T17:51:09Z

This commit introduces GPU-accelerated spatial join capabilities to SedonaDB, enabling significant performance improvements for large-scale spatial join operations.

Key changes:

- Add new `sedona-spatial-join-gpu` crate that provides GPU-accelerated spatial join execution using CUDA via the `sedona-libgpuspatial` library.
- Implement `GpuSpatialJoinExec` execution plan with build/probe phases that efficiently handles partitioned data by sharing build-side data across probes.
- Add GPU backend abstraction (`GpuBackend`) for geometry data transfer and spatial predicate evaluation on GPU.
- Extend the spatial join optimizer to automatically select GPU execution when available and beneficial, with configurable thresholds and fallback to CPU.
- Add configuration options in `SedonaOptions` for GPU spatial join settings including enable/disable, row thresholds, and CPU fallback behavior.
- Include comprehensive benchmarks and functional tests for GPU spatial join correctness validation against CPU reference implementations.
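To make the selection rule in the list above concrete, here is a minimal sketch of how "available and beneficial, with configurable thresholds and fallback to CPU" could be decided. It is not the code in this PR: `GpuJoinOptions`, its field names, `JoinPlan`, and `choose_join_plan` are all hypothetical, and sending unknown row counts to the CPU is an assumption of the sketch.

```rust
/// Hypothetical settings mirroring the options described above
/// (enable/disable, row threshold, CPU fallback). Names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct GpuJoinOptions {
    pub enabled: bool,
    pub min_rows: usize,
    pub fallback_to_cpu: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum JoinPlan {
    Gpu,
    Cpu,
}

/// Decide which join to plan. `gpu_available` stands in for a runtime device
/// check; `estimated_rows` stands in for a planner statistic (None if unknown).
pub fn choose_join_plan(
    opts: GpuJoinOptions,
    gpu_available: bool,
    estimated_rows: Option<usize>,
) -> Result<JoinPlan, String> {
    if !opts.enabled {
        return Ok(JoinPlan::Cpu);
    }
    if !gpu_available {
        // Either degrade to the CPU join or surface an error, per configuration.
        return if opts.fallback_to_cpu {
            Ok(JoinPlan::Cpu)
        } else {
            Err("GPU join requested but no GPU is available".to_string())
        };
    }
    // Assumption: inputs below the threshold, or of unknown size, stay on the CPU.
    match estimated_rows {
        Some(rows) if rows >= opts.min_rows => Ok(JoinPlan::Gpu),
        _ => Ok(JoinPlan::Cpu),
    }
}
```

Keeping the decision in one pure function like this would let the thresholds be unit-tested without GPU hardware.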


zhangfengcdt · 2026-01-05T23:52:02Z

Will merge the patch to fix the failing example build once this is merged - #486

paleolimbot

Thank you for working on this!

In addition to specific comments, I'm concerned about the proliferation of conditional compilation (particularly within sedona-spatial-join, which is a rather important part of our engine to keep clean).

At a high level what sedona-spatial-join-gpu is doing is more like sedona-spatial-join-extension: it provides a simpler (FFI-friendly) mechanism to inject a join operator without dealing with the DataFusion-y details. I think most of the conditional compilation/dead code/unused/ignore directives could be avoided if we add a CPU join extension and use that for all the tests. The GPU extension itself would then be a runtime implementation detail (eventually loaded at runtime via FFI).
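One way to picture the extension idea is sketched below. It assumes a trait-based injection point, which is only one possible reading of the suggestion. `SpatialJoinExtension`, `CpuJoinExtension`, `JoinOperator`, and `Rect` are hypothetical names, and bounding boxes stand in for real geometries.

```rust
use std::sync::Arc;

/// Axis-aligned bounding box used as a stand-in for real geometries.
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

impl Rect {
    fn intersects(&self, other: &Rect) -> bool {
        self.min_x <= other.max_x
            && other.min_x <= self.max_x
            && self.min_y <= other.max_y
            && other.min_y <= self.max_y
    }
}

/// Hypothetical extension point: given build and probe geometries, return
/// the matching (build_index, probe_index) pairs. No DataFusion types appear.
pub trait SpatialJoinExtension: Send + Sync {
    fn join(&self, build: &[Rect], probe: &[Rect]) -> Vec<(usize, usize)>;
}

/// CPU reference implementation (nested loop) that tests can always run.
pub struct CpuJoinExtension;

impl SpatialJoinExtension for CpuJoinExtension {
    fn join(&self, build: &[Rect], probe: &[Rect]) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        for (i, b) in build.iter().enumerate() {
            for (j, p) in probe.iter().enumerate() {
                if b.intersects(p) {
                    out.push((i, j));
                }
            }
        }
        out
    }
}

/// The operator holds a trait object, so which implementation runs is a
/// runtime choice rather than a `#[cfg(...)]` decision.
pub struct JoinOperator {
    extension: Arc<dyn SpatialJoinExtension>,
}

impl JoinOperator {
    pub fn new(extension: Arc<dyn SpatialJoinExtension>) -> Self {
        Self { extension }
    }

    pub fn execute(&self, build: &[Rect], probe: &[Rect]) -> Vec<(usize, usize)> {
        self.extension.join(build, probe)
    }
}
```

With this shape, the operator tests would construct `JoinOperator::new(Arc::new(CpuJoinExtension))`, and a GPU-backed implementation of the same trait could be swapped in where hardware exists.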

paleolimbot · 2026-01-06T02:43:58Z

rust/sedona-spatial-join-gpu/benches/gpu_spatial_join.rs

+// Helper execution plan that returns a single pre-loaded batch
+struct SingleBatchExec {
+    schema: Arc<Schema>,
+    batch: RecordBatch,
+    props: datafusion::physical_plan::PlanProperties,
+}


This seems very similar to SessionContext::register_batch() and is a lot of lines of code. Do we need this?

paleolimbot · 2026-01-06T02:51:56Z

rust/sedona-spatial-join-gpu/benches/gpu_spatial_join.rs

+/// Generate random points within a bounding box
+fn generate_random_points(count: usize) -> Vec<String> {
+    use rand::Rng;
+    let mut rng = rand::thread_rng();
+    (0..count)
+        .map(|_| {
+            let x: f64 = rng.gen_range(-180.0..180.0);
+            let y: f64 = rng.gen_range(-90.0..90.0);
+            format!("POINT ({} {})", x, y)
+        })
+        .collect()
+}


We have a random geometry generator in sedona-testing (that is used in the non-GPU join tests and elsewhere) that I think we should be using here!

paleolimbot · 2026-01-06T02:55:01Z

rust/sedona-spatial-join-gpu/src/config.rs

+                sedona_libgpuspatial::SpatialPredicate::Intersects,
+            ),
+            device_id: 0,
+            batch_size: 8192,


Should this be Option<usize> so that it can default to the datafusion.batch_size setting?
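A minimal sketch of the `Option<usize>` variant, assuming the session value is passed in by the caller (in DataFusion the batch size lives under the execution options, `datafusion.execution.batch_size`). `GpuJoinConfig` here is a hypothetical slice of the real config and `effective_batch_size` is an illustrative helper, not an existing method.

```rust
/// Hypothetical slice of the GPU join config; only the fields relevant here.
#[derive(Debug, Clone, Default)]
pub struct GpuJoinConfig {
    pub device_id: i32,
    /// `None` means "inherit the session's batch size".
    pub batch_size: Option<usize>,
}

impl GpuJoinConfig {
    /// Resolve the batch size against the session setting supplied by the caller.
    pub fn effective_batch_size(&self, session_batch_size: usize) -> usize {
        self.batch_size.unwrap_or(session_batch_size)
    }
}
```

An explicit `Some(n)` still wins, so users who tuned the GPU batch size separately would keep that control.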

paleolimbot · 2026-01-06T02:57:01Z

rust/sedona-spatial-join-gpu/src/exec.rs

+        let properties = PlanProperties::new(
+            eq_props,
+            partitioning,
+            EmissionType::Final, // GPU join produces all results at once


Just checking that this is correct (I thought that because one side is streaming the output might be incremental?)
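To illustrate the distinction behind this question: if the build side is collected once and probe batches are streamed, output can in principle be produced one probe batch at a time. The sketch below shows that shape only; it does not claim the GPU operator works this way. Integer equality stands in for the spatial predicate, and `BuildSide` and `probe_incrementally` are hypothetical names.

```rust
use std::sync::Arc;

/// Build side collected once and shared by every probe partition.
pub struct BuildSide {
    pub keys: Vec<i64>,
}

/// Lazily joins each probe batch against the shared build side and yields one
/// output batch of (build_index, probe_index_within_batch) pairs per probe batch.
pub fn probe_incrementally<I>(
    build: Arc<BuildSide>,
    probe_batches: I,
) -> impl Iterator<Item = Vec<(usize, usize)>>
where
    I: Iterator<Item = Vec<i64>>,
{
    probe_batches.map(move |batch| {
        let mut matches = Vec::new();
        for (p, probe_key) in batch.iter().enumerate() {
            for (b, build_key) in build.keys.iter().enumerate() {
                if probe_key == build_key {
                    matches.push((b, p));
                }
            }
        }
        matches
    })
}
```

If the operator instead buffers every probe batch before emitting anything, `EmissionType::Final` would be the accurate description.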

paleolimbot · 2026-01-06T02:58:51Z

rust/sedona-spatial-join-gpu/src/gpu_backend.rs

+/// GPU backend for spatial operations
+#[allow(dead_code)]
+pub struct GpuBackend {
+    device_id: i32,
+    gpu_context: Option<GpuSpatialContext>,
+}
+
+#[allow(dead_code)]
+impl GpuBackend {


Can these dead code markers be removed?

paleolimbot · 2026-01-06T03:15:53Z

rust/sedona-spatial-join-gpu/tests/gpu_functional_test.rs

+    let kernels = scalar_kernels();
+    let sedona_type = SedonaType::Wkb(Edges::Planar, lnglat());
+
+    let _cpu_testers: std::collections::HashMap<&str, ScalarUdfTester> = [


Is there a reason this variable is not used / can we do this using a for loop to avoid this indirection?
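For reference, the loop shape being suggested could look roughly like the sketch below. The predicates are stand-ins on 1-D intervals rather than `ScalarUdfTester` instances, and `run_all` is a hypothetical helper.

```rust
/// Stand-in predicates on closed 1-D intervals; the real test would build one
/// tester per spatial predicate instead.
fn intersects(a: (i32, i32), b: (i32, i32)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

fn contains(a: (i32, i32), b: (i32, i32)) -> bool {
    a.0 <= b.0 && b.1 <= a.1
}

type Predicate = fn((i32, i32), (i32, i32)) -> bool;

/// Iterate over (name, predicate) pairs directly instead of collecting them
/// into a map that is never read.
pub fn run_all(a: (i32, i32), b: (i32, i32)) -> Vec<(&'static str, bool)> {
    let cases: [(&str, Predicate); 2] = [
        ("st_intersects", intersects),
        ("st_contains", contains),
    ];
    let mut results = Vec::new();
    for (name, predicate) in cases {
        results.push((name, predicate(a, b)));
    }
    results
}
```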

paleolimbot · 2026-01-06T03:31:39Z

rust/sedona-spatial-join/src/exec.rs

+
+    #[cfg(feature = "gpu")]
+    #[tokio::test]
+    #[ignore] // Requires GPU hardware


We need to figure out a way to not ignore tests in this repo (in this case I think these tests shouldn't exist if the gpu feature isn't enabled, so we shouldn't need to ignore them?)
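A sketch of the gating being described, assuming that builds with the `gpu` feature only run on machines that have a GPU. `cpu_match_count` and the test body are placeholders, not code from this PR.

```rust
/// Stand-in for the CPU reference path: counts equal pairs.
pub fn cpu_match_count(left: &[i32], right: &[i32]) -> usize {
    left.iter()
        .map(|l| right.iter().filter(|r| *r == l).count())
        .sum()
}

// Compiled only for `cargo test --features gpu`. Without the feature the module
// does not exist, so no test needs `#[ignore]`.
#[cfg(all(test, feature = "gpu"))]
mod gpu_tests {
    use super::cpu_match_count;

    #[test]
    fn gpu_result_matches_cpu_reference() {
        // A real test would run the GPU join here; the literal is a placeholder.
        let gpu_count = 2;
        assert_eq!(gpu_count, cpu_match_count(&[1, 2, 2], &[2]));
    }
}
```

Gating at the module level also keeps a single `cfg` attribute in the file rather than one per test.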

paleolimbot · 2026-01-06T03:34:17Z

rust/sedona-spatial-join/src/optimizer.rs

+                    SpatialRelationType::Intersects => LibGpuPred::Intersects,
+                    SpatialRelationType::Contains => LibGpuPred::Contains,
+                    SpatialRelationType::Covers => LibGpuPred::Covers,
+                    SpatialRelationType::Within => LibGpuPred::Within,
+                    SpatialRelationType::CoveredBy => LibGpuPred::CoveredBy,
+                    SpatialRelationType::Touches => LibGpuPred::Touches,
+                    SpatialRelationType::Equals => LibGpuPred::Equals,


Can we move SpatialRelationType to sedona-geometry or sedona-common to avoid two copies?
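A sketch of what a single shared enum could look like, assuming it moves to a common crate as suggested. `relation_from_function_name`, `GpuPredicateConfig`, and `gpu_config_for` are hypothetical. A conversion may still be needed at the FFI boundary if the C library defines its own predicate constants.

```rust
/// Would live in a shared crate (sedona-geometry or sedona-common, per the
/// suggestion above) so that both join crates import the same type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpatialRelationType {
    Intersects,
    Contains,
    Covers,
    Within,
    CoveredBy,
    Touches,
    Equals,
}

/// Optimizer side (hypothetical helper): map a function name to the shared enum.
pub fn relation_from_function_name(name: &str) -> Option<SpatialRelationType> {
    use SpatialRelationType::*;
    match name.to_ascii_lowercase().as_str() {
        "st_intersects" => Some(Intersects),
        "st_contains" => Some(Contains),
        "st_covers" => Some(Covers),
        "st_within" => Some(Within),
        "st_coveredby" => Some(CoveredBy),
        "st_touches" => Some(Touches),
        "st_equals" => Some(Equals),
        _ => None,
    }
}

/// GPU side (hypothetical): the config stores the shared enum directly, so no
/// second enum and no variant-by-variant match are needed.
#[derive(Debug)]
pub struct GpuPredicateConfig {
    pub predicate: SpatialRelationType,
}

pub fn gpu_config_for(name: &str) -> Option<GpuPredicateConfig> {
    relation_from_function_name(name).map(|predicate| GpuPredicateConfig { predicate })
}
```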

paleolimbot · 2026-01-06T03:43:24Z

c/sedona-s2geography/s2geography

git submodule update --recursive should remove this diff

paleolimbot · 2026-01-06T03:44:12Z

python/sedonadb/Cargo.toml

 default = ["mimalloc"]
 mimalloc = ["dep:mimalloc", "dep:libmimalloc-sys"]
 s2geography = ["sedona/s2geography"]
+gpu = ["sedona/gpu"]


Because we don't have any tests in Python for this feature I suggest leaving this out for now (a follow-up PR could add Python support + a test)


zhangfengcdt added 15 commits December 17, 2025 17:49

Merge upstream/main into feature/gpu-spatial-join-new

262f604

Add missing license files

c025a58

Added spdlog fmt to the vcpkg install command

31b83c4

default build to build release for consistent

d18f52b

simplified the workflow to run a single job instead of 4 identical ones

b7c2e91

add zstd library

0f2e836

Added zstd to c/sedona-libgpuspatial/libgpuspatial/vcpkg.json test dependencies

ecaae09

exclude gpu build from the rust.yml file

a23369c

exclude other gpu build

597c929

free disk space for rust build and test pipeline

3a3c300

modify cargo toml file to be consistent with other projects

189672a

clean up eprint and print

5261ad9

more cleanups

9a1ff23

restored rust.yml to match the main branch and keep gpu build exclusion

54c5a5b

pwrliang mentioned this pull request Dec 19, 2025

[WIP] feat(sedona-spatial-join-gpu): Implement the GPU-based spatial join #439

Closed

zhangfengcdt marked this pull request as ready for review December 19, 2025 15:52

pwrliang reviewed Dec 19, 2025


addre pr comments: spdlogd name and remove unused file

8abb915

zhangfengcdt requested review from Kontinuation, jiayuasu and paleolimbot December 22, 2025 19:42

zhangfengcdt and others added 4 commits January 5, 2026 12:54

Merge branch 'main' into feature/gpu-spatial-join-new

393de67

Merge branch 'main' of https://github.com/apache/sedona-db into feature/gpu-spatial-join-new

d947560

Merge branch 'feature/gpu-spatial-join-new' of github.com:zhangfengcdt/sedona-db into feature/gpu-spatial-join-new

df76031

fix require comfy-table 7.2+ for set_truncation_indicator method

63850f5

paleolimbot reviewed Jan 6, 2026


Merge branch 'main' of https://github.com/apache/sedona-db into feature/gpu-spatial-join-new

ad6362f
