[SPARK-54420][SS] Introduce StatePartitionWriter for Offline Repartitioning #53287

zifeif2 · 2025-12-02T18:11:22Z

What changes were proposed in this pull request?

This PR introduces StatePartitionAllColumnFamiliesWriter as part of the offline repartitioning project. For now, only operators with a single column family are supported.

This writer takes the repartitioned DataFrame returned from StatePartitionAllColumnFamiliesReader and writes it to a new version in the state store. See the comments for the DataFrame schema. The writer does not load the previous state, because the repartitioned data overwrites it, and every commit it makes is a snapshot.

Major Changes

Introduce StatePartitionAllColumnFamiliesWriter
Introduce a new parameter loadEmpty for StateStoreProvider.getStore()
Introduce a new function loadEmpty for RocksDB
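
To make the overwrite semantics concrete, here is a minimal toy model in plain Python. It is not Spark code: ToyCheckpoint, ToyStore, load_empty and force_snapshot are hypothetical names invented for this sketch, and the snapshot-plus-delta layout is an assumption about how a versioned store might behave, not a description of the RocksDB provider. It only illustrates why starting from empty state pairs naturally with committing a snapshot: a delta would be replayed on top of the old version and bring the old keys back.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyCheckpoint:
    """Hypothetical checkpoint: full snapshots and per-version deltas."""
    snapshots: Dict[int, Dict[str, int]] = field(default_factory=dict)
    deltas: Dict[int, Dict[str, Optional[int]]] = field(default_factory=dict)

    def load(self, version: int) -> Dict[str, int]:
        # Start from the latest snapshot at or below `version`, then replay deltas.
        base = max((v for v in self.snapshots if v <= version), default=0)
        state = dict(self.snapshots.get(base, {}))
        for v in range(base + 1, version + 1):
            for key, value in self.deltas.get(v, {}).items():
                if value is None:
                    state.pop(key, None)  # None marks a deleted key
                else:
                    state[key] = value
        return state


class ToyStore:
    """Hypothetical store handle opened at `version`."""

    def __init__(self, checkpoint: ToyCheckpoint, version: int, load_empty: bool = False):
        self.checkpoint = checkpoint
        self.version = version
        # With load_empty=True the previous state is never read.
        self.state = {} if load_empty else checkpoint.load(version)
        self.changes: Dict[str, Optional[int]] = {}

    def put(self, key: str, value: int) -> None:
        self.state[key] = value
        self.changes[key] = value

    def commit(self, force_snapshot: bool = False) -> int:
        new_version = self.version + 1
        if force_snapshot:
            # Self-contained: readers of new_version ignore older versions.
            self.checkpoint.snapshots[new_version] = dict(self.state)
        else:
            self.checkpoint.deltas[new_version] = dict(self.changes)
        return new_version


# A normal micro-batch writes two keys and commits a delta as version 1.
checkpoint = ToyCheckpoint()
store = ToyStore(checkpoint, version=0)
store.put("a", 1)
store.put("b", 2)
v1 = store.commit()

# The repartition writer starts empty and commits a snapshot as version 2.
# Only key "b" belongs to this partition after repartitioning.
writer = ToyStore(checkpoint, version=v1, load_empty=True)
writer.put("b", 2)
v2 = writer.commit(force_snapshot=True)
rewritten = checkpoint.load(v2)
```

In this model, loading version 2 returns only the rows the writer put, while version 1 still holds both original keys. Had the writer committed a delta instead, replaying it over version 1 would have kept key "a" in the rewritten partition.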

Why are the changes needed?

Offline repartitioning will use this writer so that OfflineRepartitioningRunner can write data directly to the state store.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Integration tests in sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionAllColumnFamiliesWriterSuite.scala
Unit tests in sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala

Was this patch authored or co-authored using generative AI tooling?

Yes. Sonnet 4.5

Ubuntu and others added 15 commits December 1, 2025 22:33

scan simple operator state

ac4bd31

add test and support for HDFS

f14e024

remove unused code

0cd1330

address comment

f6e15ed

refactor test

158c846

add more test

2129dcb

address comment

251f306

small changes

fa776e1

fix small issue

63e0753

address comment

aee5732

get keySchema from stateStoreColFamilySchemaOpt

48521c3

address comment

6003f54

initial commit

24a40f9

add one more test

a81b4c1

add test + do not load prev stores

4a48212

github-actions bot added the SQL and STRUCTURED STREAMING labels Dec 2, 2025
