[SPARK-54560][SQL] Safe type casting in QueryPlan._subqueries
#53272
What changes were proposed in this pull request?
Safe type casting in QueryPlan._subqueries.

Why simply removing e.plan.asInstanceOf[PlanType] is not enough

Consider the foreachWithSubqueries API. PlanExpression's type parameter is erased at runtime. When we pattern match on a PlanExpression, the JVM only checks that the object is a PlanExpression; it does not verify that its plan is really a PlanType (e.g. SparkPlan). Because of @unchecked, the compiler suppresses the exhaustivity/type warning and happily treats the result as a PlanType. Later, when foreachWithSubqueries invokes the lambda f: SparkPlan => Unit, the JVM inserts a cast to SparkPlan, so we still end up with a ClassCastException if the actual object is a logical plan.
Why are the changes needed?

QueryPlan._subqueries is dangerous because it force-casts every subquery plan (code pointer). Imagine a SparkPlan instance invoking this API while some of its subqueries are still LogicalPlans. This can happen in AQE, where logical-to-physical planning runs separately for the main query and its subqueries, so the two can be out of sync at a given point in time.
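For illustration, one defensive pattern that makes the cast safe, reusing the toy types from the sketch above. This is a hypothetical sketch, not necessarily the exact change made in this PR; the real fix might instead fail fast or handle the logical plan differently:

```scala
import scala.reflect.ClassTag

object SafeCastSketch {
  sealed trait Plan
  final case class Logical(name: String) extends Plan
  final case class Physical(name: String) extends Plan
  final case class Container[T <: Plan](plan: T)

  // A ClassTag carries the expected runtime class, so the match on `p: PlanType`
  // is a real runtime check rather than an erased, always-succeeding cast.
  def subqueries[PlanType <: Plan : ClassTag](exprs: Seq[Any]): Seq[PlanType] =
    exprs
      .collect { case c: Container[_] => c.plan }
      .collect { case p: PlanType => p }

  def main(args: Array[String]): Unit = {
    val exprs: Seq[Any] = Seq(Container(Physical("scan")), Container(Logical("join")))
    // Prints List(Physical(scan)): the out-of-sync logical plan is filtered out
    // by the runtime check instead of being force-cast and blowing up later.
    println(subqueries[Physical](exprs))
  }
}
```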
Reasoning why we don't need a new API

Although this API is on the critical path of Spark SQL, there is no need to create a separate API: if we hit this class cast error, the whole query fails anyway, so it is always better to fix the issue than to preserve the buggy "failure" behavior.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
Existing UTs.
Was this patch authored or co-authored using generative AI tooling?
NO