
Commit ca629aa

[SPARK-53001][TESTS] Fix test_df_unpivot to pass with Spark 4.0
# Description

This PR proposes to fix the issue that `test_df_unpivot` doesn't pass with Spark 4.0.

```
---- dataframe::tests::test_df_unpivot stdout ----
SparkSession Setup
thread 'dataframe::tests::test_df_unpivot' panicked at crates/connect/src/dataframe.rs:2463:9:
assertion `left == right` failed
  left: RecordBatch { schema: Schema { fields: [Field { name: "id", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "var", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "val", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64> [ 1, 1, 2, 2, ], StringArray [ "int", "float", "int", "float", ], PrimitiveArray<Float32> [ 11.0, 1.1, 12.0, 1.2, ]], row_count: 4 }
 right: RecordBatch { schema: Schema { fields: [Field { name: "id", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "var", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "val", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Int64> [ 1, 1, 2, 2, ], StringArray [ "int", "float", "int", "float", ], PrimitiveArray<Float64> [ 11.0, 1.100000023841858, 12.0, 1.2000000476837158, ]], row_count: 4 }
```

As of Spark 4.0, ANSI mode is enabled by default, but this test doesn't take that into account. To fix the issue, I tweaked the test so that it passes whether ANSI mode is enabled or not.

## Related Issue(s)

SPARK-53001

Closes #4 from sarutak/fix-unpivot-test.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Kousuke Saruta <[email protected]>
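For reference, the cast works because `cast("float")` pins the `val` column to Spark's 32-bit FloatType, so the collected Arrow batch carries Float32 whether or not ANSI mode widened the unpivoted values to Float64. Below is a minimal sketch of the expected batch the assertion can compare against, assuming the `arrow` crate's array and schema types are in scope (the exact imports and crate layout in this repo may differ); the values and nullability mirror the `left` batch in the failure output above.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Float32Array, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn expected_unpivot_batch() -> Result<RecordBatch, ArrowError> {
    // With `cast("float")`, `val` is always Float32, independent of ANSI mode.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("var", DataType::Utf8, false),
        Field::new("val", DataType::Float32, false),
    ]));

    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![1, 1, 2, 2])),
        Arc::new(StringArray::from(vec!["int", "float", "int", "float"])),
        Arc::new(Float32Array::from(vec![11.0_f32, 1.1, 12.0, 1.2])),
    ];

    RecordBatch::try_new(schema, columns)
}
```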
1 parent 589ea87 commit ca629aa

File tree

1 file changed: +8 -6 lines changed


crates/connect/src/dataframe.rs

Lines changed: 8 additions & 6 deletions
@@ -2445,12 +2445,14 @@ mod tests {
 
         let df = spark.create_dataframe(&data)?;
 
-        let df = df.unpivot(
-            [col("id")],
-            Some(vec![col("int"), col("float")]),
-            "var",
-            "val",
-        );
+        let df = df
+            .unpivot(
+                [col("id")],
+                Some(vec![col("int"), col("float")]),
+                "var",
+                "val",
+            )
+            .select(vec![col("id"), col("var"), col("val").cast("float")]);
 
         let res = df.collect().await?;
 