@ganeshashree
### What changes were proposed in this pull request?

Refactor MemoryStream to use SparkSession instead of SQLContext.

### Why are the changes needed?

SQLContext is deprecated in newer versions of Spark.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Verified that the affected tests are passing successfully.

### Was this patch authored or co-authored using generative AI tooling?

No


```scala
test("three hop pipeline") {
  val session = spark
  implicit val sparkSession: SparkSession = spark
```

Contributor: where was the previous implicit SQLContext defined?

@ganeshashree (Author) commented Sep 22, 2025:
It seems the implicit sqlContext was defined in SharedSparkSession. Explicitly defining an implicit SparkSession is required because the existing SparkSession was assigned to a non-implicit `session` variable, so the compiler could not locate an implicit SparkSession within the anonymous block.
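To illustrate the resolution issue described above, here is a minimal, Spark-free sketch (all names are hypothetical stand-ins for SparkSession and the test suite's session; this is not the actual Spark code):

```scala
// Hypothetical stand-in for SparkSession; Spark itself is not needed for the sketch.
class Session(val name: String)

object ImplicitScopeSketch {
  // Stand-in for MemoryStream.apply, which needs an implicit session in scope.
  def makeStream()(implicit session: Session): String = s"stream on ${session.name}"

  val spark: Session = new Session("spark") // like the suite-level session

  def demo(): String = {
    // Assigning the session to a plain val does NOT make it available implicitly:
    val session = spark
    // so an explicit implicit definition is required for makeStream() to compile.
    implicit val sparkSession: Session = spark
    makeStream()
  }
}
```

Without the `implicit val`, the call to `makeStream()` fails with "could not find implicit value", which matches the behavior described in the comment above.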

@cloud-fan (Contributor):

cc @HeartSaVioR

@HeartSaVioR (Contributor):

@ganeshashree
Thanks for the proposal. The change looks OK to me.

Have we checked the warning (build/log) message when SQLContext is used here? If we aren't providing a message that makes migration easy, it might be beneficial to defer replacing apply() and add an intermediate migration step (deprecate the existing methods now and remove them in Spark 5.0.0).

@HeartSaVioR HeartSaVioR changed the title [SPARK-53656][SQL] Refactor MemoryStream to use SparkSession instead of SQLContext [SPARK-53656][SS] Refactor MemoryStream to use SparkSession instead of SQLContext Sep 22, 2025
@ganeshashree (Author) commented Sep 29, 2025:
> @ganeshashree Thanks for the proposal. The change looks OK to me.
>
> Have we checked the warning (build/log) message when SQLContext is used here? If we aren't providing a message that makes migration easy, it might be beneficial to defer replacing apply() and add an intermediate migration step (deprecate the existing methods now and remove them in Spark 5.0.0).

@HeartSaVioR Thanks for reviewing. Currently, no warning appears in the build log when SQLContext is used. Creating two versions of MemoryStream.apply, one for SparkSession and one for SQLContext, with a warning for the SQLContext version, would require resolving the ambiguity that arises when both sparkSession and sqlContext are in scope as implicit variables. Since this is an internal API, please review whether it is acceptable to make this change and update the callers to use MemoryStream with an implicit SparkSession instead of SQLContext, where applicable. I am exploring further how to resolve the ambiguity by preferring SparkSession over SQLContext.

@ganeshashree force-pushed the SPARK-53656 branch 2 times, most recently from 7a8de69 to ae36d87 on October 5, 2025.
@ganeshashree (Author):

Made changes to support two versions of MemoryStream.apply, for SparkSession and SQLContext, with a warning for the SQLContext version, and also addressed the ambiguity when both sparkSession and sqlContext are in scope as implicit variables by defining a low-priority trait.
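As a rough illustration of the low-priority trait mentioned above, here is a Spark-free sketch with hypothetical names (not the actual Spark code). The idea is that Scala's overload resolution prefers an alternative defined on the object itself over one inherited from a parent trait, so the SQLContext overload can be demoted without creating an ambiguity error:

```scala
// Hypothetical stand-ins; the real code uses SparkSession and SQLContext.
class SessionLike
class ContextLike

trait LowPriorityMemoryStreamImplicits {
  // Lower-priority overload: during overload resolution, an inherited member
  // loses to a member defined directly on the object.
  def apply(implicit ctx: ContextLike): String =
    "SQLContext overload (would log a deprecation warning here)"
}

object MemoryStreamSketch extends LowPriorityMemoryStreamImplicits {
  // Preferred overload: defined on the object itself, so it wins even when
  // both an implicit SessionLike and an implicit ContextLike are in scope.
  def apply(implicit session: SessionLike): String = "SparkSession overload"
}

object Demo {
  implicit val session: SessionLike = new SessionLike
  implicit val ctx: ContextLike = new ContextLike
  // Resolves to the SparkSession overload rather than failing as ambiguous.
  val chosen: String = MemoryStreamSketch.apply
}
```

Callers that only have an implicit SQLContext in scope still compile, picking up the trait's overload, which is where a deprecation warning can be emitted.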

```scala
  override def commit(end: Offset): Unit = {}
}

object ContinuousMemoryStream {
```

Contributor: shall we do the same low priority implicit trick here?

@ganeshashree (Author):
Done.

@cloud-fan (Contributor):

@HeartSaVioR do you have any other concerns with this change?

@HeartSaVioR (Contributor) left a comment:

+1

Could you please resolve the conflict?

@ganeshashree (Author) commented Oct 20, 2025:

> Could you please resolve the conflict?

Done.

@HeartSaVioR (Contributor) left a comment:
+1 pending CI

@HeartSaVioR (Contributor):

Thanks! Merging to master.

@manuzhang (Member) commented Oct 27, 2025:

Note this is not an internal API, as Iceberg uses it in tests. Of course, we can easily change it on the caller side.

https://github.com/apache/iceberg/blob/68e555b94f4706a2af41dcb561c84007230c0bc1/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestForwardCompatibility.java#L222-L224

@HeartSaVioR (Contributor):

I understand it is a "hard-to-understand" protocol, but historically the Apache Spark project has considered classes not documented in the Scala/Java/Python docs to be non-public APIs. I was not part of that discussion/decision, but IIUC there is an established protocol for it.

@cloud-fan (Contributor):

@manuzhang we didn't remove the old method, how does it break iceberg tests?

@manuzhang (Member):

@cloud-fan which old method do you mean? The constructor has changed and that's breaking for Java code.

@ganeshashree (Author):

> @cloud-fan which old method do you mean? The constructor has changed and that's breaking for Java code.

@manuzhang Thanks for reporting this. The current changes are backward compatible for Scala but not for Java. I see that two tests in Iceberg 4.0 are breaking because of this change. Is it fine to modify those tests to use SparkSession instead of sqlContext? Please let me know if you rely on the old version of the constructor that takes a sqlContext parameter; I can consider making this change backward compatible with Java as well. However, since sqlContext is deprecated, the best practice is to use the new version of the constructor and pass a sparkSession parameter.
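One possible way to keep Java callers working, sketched here with hypothetical stand-in names (the real classes are MemoryStream, SparkSession, and SQLContext), would be to retain a deprecated auxiliary constructor that forwards to the new one:

```scala
// Hypothetical stand-ins for the real Spark types.
class SessionLike
class ContextLike(val sparkSession: SessionLike)

// The primary constructor takes the session; the deprecated auxiliary
// constructor keeps the old context-based signature compiling and forwards on.
class StreamSketch(val id: Int, val session: SessionLike) {
  @deprecated("Use the SparkSession-based constructor instead", "4.1.0")
  def this(id: Int, ctx: ContextLike) = this(id, ctx.sparkSession)
}
```

Because Scala auxiliary constructors compile to ordinary overloaded Java constructors, Java test code calling the old signature would keep compiling, merely with a deprecation warning.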

@manuzhang (Member):

@ganeshashree Yes, I've already made the change in the 4.1.0 support and the tests passed for 4.1.0-preview3 (RC1). I just want to call out that this should not be considered an internal API, especially for downstream Java projects.

@cloud-fan (Contributor) commented Oct 28, 2025:

We have a clear definition of public APIs: the APIs listed in the public doc such as https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html are public.

Spark does not use modifiers like private[sql] to hide all of its internal APIs, so that Spark plugins can do powerful things easily without resorting to reflection. But that does not mean Spark guarantees backward compatibility for all of these compile-time-public internal APIs; that is simply not possible. Spark plugins are responsible for updating their code to keep up with changes to Spark's internal APIs.

And this is also handled case by case. If an internal API is unfortunately widely used by many Spark plugins, Spark should try its best to keep backward compatibility.

Yicong-Huang pushed a commit to Yicong-Huang/spark that referenced this pull request Oct 30, 2025
…f SQLContext

### What changes were proposed in this pull request?

Refactor MemoryStream to use SparkSession instead of SQLContext.

### Why are the changes needed?

SQLContext is deprecated in newer versions of Spark.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Verified that the affected tests are passing successfully.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#52402 from ganeshashree/SPARK-53656.

Authored-by: Ganesha S <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>