[SPARK-53656][SS] Refactor MemoryStream to use SparkSession instead of SQLContext #52402
Conversation
Review threads (resolved):
- ...src/main/scala/org/apache/spark/sql/execution/streaming/sources/ContinuousMemoryStream.scala
- ...org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecutionSuite.scala
- sql/pipelines/src/test/scala/org/apache/spark/sql/pipelines/graph/MaterializeTablesSuite.scala
    test("three hop pipeline") {
      val session = spark
      implicit val sparkSession: SparkSession = spark
where was the previous implicit SQLContext defined?
It seems the implicit sqlContext was defined in SharedSparkSession. Explicitly defining an implicit SparkSession is required here because the existing implicit SparkSession was assigned to a non-implicit session variable, so the compiler couldn't find an implicit SparkSession within the anonymous block.
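The scoping issue above can be shown with a minimal, Spark-free sketch (Session stands in for SparkSession; all names are illustrative):

```scala
// A method that needs an implicit, like MemoryStream's apply().
case class Session(name: String)
def needsSession(implicit s: Session): String = s.name

// Like SharedSparkSession's `spark` member: a plain (non-implicit) value here.
val shared: Session = Session("shared")

val result: String = {
  val session = shared                         // plain alias: invisible to implicit search
  implicit val sparkSession: Session = shared  // explicit implicit makes the lookup succeed
  needsSession
}
println(result) // prints "shared"
```

Without the `implicit val`, the call inside the block fails to compile because neither `session` nor `shared` participates in implicit resolution.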
cc @HeartSaVioR
@ganeshashree Have we checked the warning (build/log) message when we use SQLContext here? If we aren't providing a message to help users migrate easily, it might be beneficial to defer the replacement of apply() and add an intermediate migration step (deprecate the existing methods and remove them in Spark 5.0.0).
@HeartSaVioR Thanks for reviewing. Currently, no warning appears in the build log when we use SQLContext. Creating two versions of apply(), with the SQLContext-based one deprecated, sounds like a good intermediate migration step.
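One way such an intermediate step could look, sketched with stand-in classes (MemoryStreamLike and the other names are illustrative, not Spark's actual API): keep a deprecated SQLContext-based apply() alongside the new SparkSession-based one, delegating to the same path.

```scala
// Stand-ins for the real Spark classes.
class SparkSession(val id: String)
class SQLContext(val sparkSession: SparkSession)

class MemoryStreamLike(val session: SparkSession)

object MemoryStreamLike {
  // Preferred entry point: resolves an implicit SparkSession.
  def apply()(implicit session: SparkSession): MemoryStreamLike =
    new MemoryStreamLike(session)

  // Retained for source compatibility; delegates to the SparkSession path.
  @deprecated("Use apply()(implicit SparkSession) instead", "sketch")
  def apply(sqlContext: SQLContext): MemoryStreamLike =
    new MemoryStreamLike(sqlContext.sparkSession)
}

implicit val spark: SparkSession = new SparkSession("spark-1")
val viaSession = MemoryStreamLike()                        // new path
val viaContext = MemoryStreamLike(new SQLContext(spark))   // legacy path, warns at compile time
println(viaSession.session.id == viaContext.session.id) // prints "true"
```

The @deprecated annotation gives callers the compile-time warning discussed above, and both entry points end up on the same constructor.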
Made changes to support two versions of apply().
      override def commit(end: Offset): Unit = {}
    }

    object ContinuousMemoryStream {
Shall we do the same low-priority implicit trick here?
Done.
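A rough sketch of the low-priority-implicit idea with stand-in classes (this is illustrative, not Spark's actual implementation): put the derivation from SQLContext in a separate trait whose members are imported, so it only fires as a fallback for legacy call sites that have an implicit SQLContext but no implicit SparkSession in scope.

```scala
// Stand-ins for the real Spark classes.
class SparkSession2(val name: String)
class SQLContext2(val sparkSession: SparkSession2)

trait LowPriorityImplicits {
  // Fallback: derive a SparkSession2 from an implicit SQLContext2,
  // so legacy code that only has the latter keeps compiling.
  implicit def sessionFromSqlContext(implicit ctx: SQLContext2): SparkSession2 =
    ctx.sparkSession
}
object Sessions extends LowPriorityImplicits
import Sessions._

def describe(implicit s: SparkSession2): String = s.name

// Only an implicit SQLContext2 is in scope here, so the fallback derivation fires.
implicit val legacyCtx: SQLContext2 = new SQLContext2(new SparkSession2("derived"))
val fromContext = describe
println(fromContext) // prints "derived"
```

Keeping the derivation out of the companion object's own scope is what makes it "low priority": call sites that already have an implicit SparkSession never touch it.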
@HeartSaVioR do you have any other concerns with this change?
HeartSaVioR left a comment:
+1
Could you please resolve the conflict?
Done.
HeartSaVioR left a comment:
+1 pending CI
Thanks! Merging to master.
Note this is not an internal API, as Iceberg uses it in tests. Of course, we can easily change it on the caller side.
I understand it's a "hard-to-understand" protocol, but historically the Apache Spark project has considered classes not documented in the Scala/Java/Python docs to be non-public APIs. I wasn't part of that discussion/decision, but IIUC there is a protocol around it.
@manuzhang we didn't remove the old method; how does it break the Iceberg tests?
@cloud-fan which old method do you mean? The constructor has changed and that's breaking for Java code.
@manuzhang Thanks for reporting this. The current changes are backward compatible for Scala but not for Java. I see that two tests in Iceberg 4.0 are breaking due to this change. Is it fine to modify those tests to use SparkSession instead of sqlContext? Please let me know if you rely on the old version of the constructor that takes a sqlContext parameter; I can consider making this change backward compatible with Java as well. However, since sqlContext is deprecated, best practice is to use the new version of the constructor and pass a sparkSession parameter.
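For Java callers, implicits and apply() overloads don't help, since Java sees neither; keeping Java source compatibility would require an explicit overloaded constructor. A hedged sketch with stand-in classes (names are illustrative, not Spark's actual API):

```scala
// Stand-ins for the real Spark classes.
class SparkSession3(val name: String)
class SQLContext3(val sparkSession: SparkSession3)

// Primary constructor now takes a SparkSession; a deprecated auxiliary
// constructor keeps old Java call sites (`new MemoryStreamSketch(sqlContext)`) compiling.
class MemoryStreamSketch(val session: SparkSession3) {
  @deprecated("Use the SparkSession-based constructor", "sketch")
  def this(sqlContext: SQLContext3) = this(sqlContext.sparkSession)
}

val spark3 = new SparkSession3("s")
val viaCtor = new MemoryStreamSketch(new SQLContext3(spark3)) // legacy Java-style call
println(viaCtor.session.name) // prints "s"
```

Both constructors are visible from Java as ordinary overloads, which is what an implicit-based or default-argument approach cannot provide.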
@ganeshashree Yes, I've already made the change in the 4.1.0 support and the tests passed for 4.1.0-preview3 (RC1). I just want to call out that this should not be considered an internal API, especially for downstream Java projects.
We have a clear definition of public APIs: the APIs listed in the public doc, such as https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html, are public. Spark does not use modifiers like

And this is also case by case. If an internal API is unfortunately widely used by many Spark plugins, Spark should try its best to keep backward compatibility.
Closes apache#52402 from ganeshashree/SPARK-53656.
Authored-by: Ganesha S <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
What changes were proposed in this pull request?
Refactor MemoryStream to use SparkSession instead of SQLContext.
Why are the changes needed?
SQLContext is deprecated in newer versions of Spark.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Verified that the affected tests are passing successfully.
Was this patch authored or co-authored using generative AI tooling?
No