### What changes were proposed in this pull request?
Implementation of [SPARK-52619](https://issues.apache.org/jira/browse/SPARK-52619), which adds casting from `TimeType` (i.e. the `TIME` data type) to `IntegralType` (`ByteType`, `ShortType`, `IntegerType`, `LongType`, i.e. `BYTE`, `SHORT`, `INT`, `LONG`), following the SQL standard by flooring `TIME` values to seconds since midnight.
#### Changes:
- Add `TIME` to the integral casting logic in `Cast.scala` with proper overflow handling
  - Casting can overflow for the narrower `ShortType` and `ByteType`
- Add unit test coverage for `TIME` casting scenarios in both ANSI and non-ANSI modes
- Update SQL integration tests with `TIME` casting examples for valid and error cases
- Handle fractional seconds by truncating (flooring) rather than rounding
**Note:** `TimeType` to `IntegralType` casting largely follows the existing `TimestampType` to integral casting, for a few reasons:
1. Both types are stored as `Long` internally (`TIME`: nanoseconds since midnight, `TIMESTAMP`: microseconds since epoch)
2. Since both are already stored as `Long`, the value is first converted to `Long` seconds as an intermediate step before narrowing to the smaller integral types
3. Both use a similar overflow check (i.e. `longValue == longValue.toByte`) to validate that the narrowing conversion is safe
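For illustration, here is a minimal sketch of the conversion idea (hypothetical helper names, not the actual `Cast.scala` implementation):
```scala
// Sketch only: TIME is stored as a Long of nanoseconds since midnight.
// Step 1: floor to whole seconds. Since the nanosecond count is non-negative,
// integer division truncates fractional seconds toward midnight.
def timeToSeconds(nanos: Long): Long = nanos / 1000000000L

// Step 2: narrow to a smaller integral type with an overflow check in the
// same style as the timestamp path (`longValue == longValue.toByte`).
def timeToByte(nanos: Long): Option[Byte] = {
  val secs = timeToSeconds(nanos)
  if (secs == secs.toByte) Some(secs.toByte) // safe to narrow
  else None // overflow: NULL in non-ANSI mode, CAST_OVERFLOW error in ANSI mode
}
```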
### Why are the changes needed?
Currently, Spark SQL's `TIME` data type cannot be cast to integral types. The reasons for adding this functionality are:
- Better overall SQL standard compliance
- Interoperability with, and easier code migrations from, other database systems that support `TIME` to integral casting
- Enabling simple arithmetic on `TIME` values by treating them as integers (see the example below)
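For instance, once `TIME` values can be cast to integers, the difference between two times in seconds becomes a plain subtraction (illustrative example):
```scala
// Difference between two TIME values in whole seconds
spark.sql(
  "SELECT CAST(TIME '10:10:10' AS INT) - CAST(TIME '09:00:00' AS INT) AS diff_secs"
).show()
// +---------+
// |diff_secs|
// +---------+
// |     4210|
// +---------+
```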
### Does this PR introduce _any_ user-facing change?
Yes. This PR adds new casting functionality from `TimeType` -> (`Byte`, `Short`, `Int`, `Long`) that previously did not exist.
**Before this change:**
```scala
spark.sql("SELECT CAST(TIME '10:10:10' AS INT)").show()
/*
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "CAST(TIME '10:10:10' AS INT)" due to data type mismatch: cannot cast "TIME(6)" to "INT". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(cast(10:10:10 as int))]
*/
```
**After this change:**
```sql
SELECT CAST(TIME '10:10:10' AS INT)
/**** Returns: 36610 (seconds since midnight) ****/
SELECT CAST(TIME '00:00:04' AS TINYINT)
/**** Returns: 4 ****/
SELECT CAST(TIME '23:59:59' AS TINYINT)
/**** Error: CAST_OVERFLOW - SQLSTATE: 22003 - when ANSI mode is enabled - (86399 > 127) ****/
/**** Returns: NULL when ANSI mode is disabled - (86399 > 127) ****/
-- Fractional seconds are truncated:
SELECT CAST(TIME '00:00:01.7' AS INT)
/**** Returns: 1 (floored, not rounded up, so 1 rather than 2) ****/
```
### How was this patch tested?
1. Unit tests: Added comprehensive test coverage in `CastSuiteBase`, `CastWithAnsiOnSuite`, and `CastWithAnsiOffSuite` covering:
- Basic `TimeType` to `IntegralType` casting
- Overflow scenarios for the smaller types (`ByteType`, `ShortType`)
- Fractional second truncation
- ANSI mode vs non-ANSI mode behavior (an illustrative test shape is sketched below)
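The tests roughly take this shape (hypothetical snippet; names and values are illustrative, not the actual suite code):
```scala
// Illustrative only: a Catalyst-style expression test for TIME -> INT casting
test("cast TIME to integral types") {
  // 10:10:10 is 36610 seconds since midnight
  checkEvaluation(cast(Literal(LocalTime.of(10, 10, 10)), IntegerType), 36610)
  // fractional seconds are floored: 1.7s -> 1
  checkEvaluation(cast(Literal(LocalTime.of(0, 0, 1, 700000000)), IntegerType), 1)
}
```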
<img width="1111" height="572" alt="Screenshot 2025-07-11 at 11 15 37 PM" src="https://github.com/user-attachments/assets/ff258b22-8b33-4b4a-9a87-4be1855d5a50" />
<img width="1111" height="572" alt="Screenshot 2025-07-11 at 11 14 28 PM" src="https://github.com/user-attachments/assets/39ab5724-c868-4a67-889e-b064b242c52b" />
2. SQL integration tests: Updated `cast.sql` with `TIME` casting examples and expected outputs - regenerated the golden files and then ran the tests successfully:
```bash
SPARK_GENERATE_GOLDEN_FILES=1 ./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql"
./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql"
```
3. Manual testing in Spark shell:
```scala
import java.time.LocalTime
import org.apache.spark.sql.types.{ByteType, IntegerType}
import spark.implicits._

val timeData = Seq(
  LocalTime.of(0, 0, 0),   // midnight = 0 seconds
  LocalTime.of(0, 0, 8),   // 8 seconds
  LocalTime.of(4, 41, 17), // 16,877 seconds
  LocalTime.of(23, 59, 59) // 86,399 seconds
).toDF("time_col")
timeData.select(
  $"time_col",
  $"time_col".cast(IntegerType).alias("time_to_int"),
  $"time_col".cast(ByteType).alias("time_to_byte")
).show()

// Results (with spark.sql.ansi.enabled=false, so overflow yields NULL):
// +--------+-----------+------------+
// |time_col|time_to_int|time_to_byte|
// +--------+-----------+------------+
// |00:00:00| 0| 0|
// |00:00:08| 8| 8|
// |04:41:17| 16877| NULL| // Overflow for BYTE
// |23:59:59| 86399| NULL| // Overflow for BYTE
// +--------+-----------+------------+
```
4. Manual fractional-second testing in spark-shell:
```scala
val edgeCases = Seq(
  LocalTime.of(0, 0, 0, 900000000), // 0.9 seconds -> 0
  LocalTime.of(0, 0, 0, 300000000), // 0.3 seconds -> 0
  LocalTime.of(0, 0, 1, 700000000)  // 1.7 seconds -> 1
).toDF("time_col")
edgeCases.select($"time_col", $"time_col".cast(IntegerType).alias("as_int")).show()
/*
+----------+------+
| time_col|as_int|
+----------+------+
|00:00:00.9| 0|
|00:00:00.3| 0|
|00:00:01.7| 1|
+----------+------+
*/
```
5. Overflow error message on overflow - manual test in spark-shell:
```scala
// With ANSI mode enabled, overflow raises an error instead of returning NULL
spark.conf.set("spark.sql.ansi.enabled", "true")
timeData.select(
  $"time_col",
  $"time_col".cast(ByteType).alias("time_to_byte")
).show()
/*
org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW] The value TIME '04:41:17' of the type "TIME(6)" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. SQLSTATE: 22003
at org.apache.spark.sql.errors.QueryExecutionErrors$.castingCauseOverflowError(QueryExecutionErrors.scala:87)
*/
```
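As the error message suggests, `try_cast` can be used to tolerate the overflow and return NULL instead (illustrative example):
```scala
// TRY_CAST returns NULL on overflow rather than raising CAST_OVERFLOW
spark.sql("SELECT TRY_CAST(TIME '04:41:17' AS TINYINT) AS time_to_byte").show()
// +------------+
// |time_to_byte|
// +------------+
// |        NULL|
// +------------+
```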
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51461 from fartzy/SPARK-51162_Cast_Time_Type_To_Integral_Type.
Authored-by: Mike Artz <[email protected]>
Signed-off-by: Max Gekk <[email protected]>