
Commit b777498

Mike Artz authored and MaxGekk committed
[SPARK-52619][SQL] Cast TimeType to IntegralType
### What changes were proposed in this pull request?

Implementation of [SPARK-52619](https://issues.apache.org/jira/browse/SPARK-52619), which adds casting from `TimeType` (the `TIME` data type) to `IntegralType` (`ByteType`, `ShortType`, `IntegerType`, `LongType`, i.e. `BYTE`, `SHORT`, `INT`, `LONG`), following the SQL standard by flooring `TIME` values to seconds since midnight.

#### Changes:

- Add `TIME` to integral casting logic in `Cast.scala` with proper overflow handling
  - Can potentially overflow on both `Short` and `Byte`
- Add unit test coverage for `TIME` casting scenarios for both ANSI and non-ANSI modes
- Update SQL integration tests with `TIME` casting examples for valid and error cases
- Handle fractional seconds by truncating them (no rounding up)

**Note:** `TimeType` to `IntegralType` largely follows `TimestampType` to integral casting for a few reasons:

1. Both types are stored as `Long` internally (`TIME`: nanoseconds since midnight, `TIMESTAMP`: microseconds since epoch)
2. Since both are already stored as `Long`, there is a conversion to `Long` seconds as an intermediate step before casting to smaller integral types
3. Both use a similar overflow check (i.e. `longValue == longValue.toByte`) to validate conversion safety

### Why are the changes needed?

Currently, Spark SQL's `TIME` data type cannot be cast to integral types. The reasons for this functionality are:

- Better overall SQL standard compliance
- Interoperability and code migrations with other database systems that support `TIME` to integral casting
- Enabling simple arithmetic operations by using `TIME` values as integers

### Does this PR introduce _any_ user-facing change?

Yes. This PR adds new casting functionality from `TimeType` -> (`Byte`, `Short`, `Int`, `Long`) that previously did not exist.

**Before this change:**

```scala
spark.sql("SELECT CAST(TIME '10:10:10' AS INT)").show()
/*
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "CAST(TIME '10:10:10' AS INT)" due to data type mismatch: cannot cast "TIME(6)" to "INT". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(cast(10:10:10 as int))]
*/
```

**After this change:**

```sql
SELECT CAST(TIME '10:10:10' AS INT)
/**** Returns: 36610 (seconds since midnight) ****/

SELECT CAST(TIME '00:00:04' AS TINYINT)
/**** Returns: 4 ****/

SELECT CAST(TIME '23:59:59' AS TINYINT)
/**** Error: CAST_OVERFLOW - SQLSTATE: 22003 - when ansi flag is enabled - (86399 > 127) ****/
/**** Returns: NULL when ansi flag is set to False - (86399 > 127) ****/

/**** Fractional seconds are truncated: ****/
SELECT CAST(TIME '00:00:01.7' AS INT)
/**** Returns: 1 (floored not rounded up so 1 and not 2) ****/
```

### How was this patch tested?

1. Unit tests: Added comprehensive test coverage in CastSuiteBase, CastWithAnsiOnSuite, and CastWithAnsiOffSuite covering:
   - Basic `TimeType` to `IntegralType` casting
   - Overflow scenarios for smaller types (`ByteType`, `ShortType`)
   - Fractional second truncation
   - `ANSI` mode vs `non-ANSI` mode behavior

   <img width="1111" height="572" alt="Screenshot 2025-07-11 at 11 15 37 PM" src="https://github.com/user-attachments/assets/ff258b22-8b33-4b4a-9a87-4be1855d5a50" />
   <img width="1111" height="572" alt="Screenshot 2025-07-11 at 11 14 28 PM" src="https://github.com/user-attachments/assets/39ab5724-c868-4a67-889e-b064b242c52b" />

2. SQL integration tests: Updated cast.sql with `TIME` casting examples and expected outputs, regenerated the golden files, and then ran the test successfully:

   ```bash
   SPARK_GENERATE_GOLDEN_FILES=1 ./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql"
   ./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql"
   ```

3. Manual testing in Spark shell:

   ```scala
   import java.time.LocalTime
   import org.apache.spark.sql.types.{ByteType, IntegerType}

   val timeData = Seq(
     LocalTime.of(0, 0, 0),    // midnight = 0 seconds
     LocalTime.of(0, 0, 8),    // 8 seconds
     LocalTime.of(4, 41, 17),  // 16,877 seconds
     LocalTime.of(23, 59, 59)  // 86,399 seconds
   ).toDF("time_col")

   timeData.select(
     $"time_col",
     $"time_col".cast(IntegerType).alias("time_to_int"),
     $"time_col".cast(ByteType).alias("time_to_byte")
   ).show()

   // Results:
   // +--------+-----------+------------+
   // |time_col|time_to_int|time_to_byte|
   // +--------+-----------+------------+
   // |00:00:00|          0|           0|
   // |00:00:08|          8|           8|
   // |04:41:17|      16877|        NULL| // Overflow for BYTE
   // |23:59:59|      86399|        NULL| // Overflow for BYTE
   // +--------+-----------+------------+
   ```

4. Fractional second testing manually in spark-shell:

   ```scala
   val edgeCases = Seq(
     LocalTime.of(0, 0, 0, 900000000), // 0.9 seconds -> 0
     LocalTime.of(0, 0, 0, 300000000), // 0.3 seconds -> 0 (input row implied by the output below)
     LocalTime.of(0, 0, 1, 700000000)  // 1.7 seconds -> 1
   ).toDF("time_col")

   edgeCases.select($"time_col", $"time_col".cast(IntegerType).alias("as_int")).show()
   /*
   +----------+------+
   |  time_col|as_int|
   +----------+------+
   |00:00:00.9|     0|
   |00:00:00.3|     0|
   |00:00:01.7|     1|
   +----------+------+
   */
   ```

5. Overflow error message when there is overflow (manual test in spark-shell):

   ```scala
   timeData.select(
     $"time_col",
     $"time_col".cast(ByteType).alias("time_to_byte")
   ).show()
   /*
   org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW] The value TIME '04:41:17' of the type "TIME(6)" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. SQLSTATE: 22003
     at org.apache.spark.sql.errors.QueryExecutionErrors$.castingCauseOverflowError(QueryExecutionErrors.scala:87)
   */
   ```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #51461 from fartzy/SPARK-51162_Cast_Time_Type_To_Integral_Type.

Authored-by: Mike Artz <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
1 parent 315c625 commit b777498
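
The flooring rule described in the commit message can be tried outside Spark. The following is a minimal sketch, not the Spark implementation: `TimeToSecondsSketch`, `NanosPerSecond`, and `secondsSinceMidnight` are hypothetical names introduced for illustration, and the constant is assumed to match the `NANOS_PER_SECOND` value the patch refers to.

```scala
import java.time.LocalTime

object TimeToSecondsSketch {
  // Assumption: same value as the NANOS_PER_SECOND constant used in the patch.
  val NanosPerSecond: Long = 1000000000L

  // Floor nanoseconds-since-midnight to whole seconds.
  def timeToSeconds(timeNanos: Long): Long =
    Math.floorDiv(timeNanos, NanosPerSecond)

  // Hypothetical convenience wrapper so java.time values can be used as input.
  def secondsSinceMidnight(t: LocalTime): Long =
    timeToSeconds(t.toNanoOfDay)

  def main(args: Array[String]): Unit = {
    println(secondsSinceMidnight(LocalTime.of(10, 10, 10)))            // 36610
    println(secondsSinceMidnight(LocalTime.of(0, 0, 1, 700000000)))    // 1
    println(secondsSinceMidnight(LocalTime.of(23, 59, 59, 999999000))) // 86399
  }
}
```

Because a `TIME` value is never negative, flooring and truncating toward zero give the same result here.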


9 files changed

+723
-2
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala

Lines changed: 70 additions & 0 deletions
@@ -135,6 +135,7 @@ object Cast extends QueryErrorsBase {
     case (_, VariantType) => variant.VariantGet.checkDataType(from, allowStructsAndMaps = false)

     case (_: TimeType, _: TimeType) => true
+    case (_: TimeType, _: IntegralType) => true

     // non-null variants can generate nulls even in ANSI mode
     case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
@@ -254,6 +255,7 @@ object Cast extends QueryErrorsBase {
     case (_, VariantType) => variant.VariantGet.checkDataType(from, allowStructsAndMaps = false)

     case (_: TimeType, _: TimeType) => true
+    case (_: TimeType, _: IntegralType) => true

     case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
       canCast(fromType, toType) &&
@@ -370,6 +372,7 @@ object Cast extends QueryErrorsBase {
     case (_, _: StringType) => false

     case (TimestampType, ByteType | ShortType | IntegerType) => true
+    case (_: TimeType, ByteType | ShortType) => true
     case (FloatType | DoubleType, TimestampType) => true
     case (TimestampType, DateType) => false
     case (_, DateType) => true
@@ -720,6 +723,9 @@ case class Cast(
   private[this] def timestampToDouble(ts: Long): Double = {
     ts / MICROS_PER_SECOND.toDouble
   }
+  private[this] def timeToLong(timeNanos: Long): Long = {
+    Math.floorDiv(timeNanos, NANOS_PER_SECOND)
+  }

   // DateConverter
   private[this] def castToDate(from: DataType): Any => Any = from match {
@@ -807,6 +813,8 @@ case class Cast(
       buildCast[Int](_, d => null)
     case TimestampType =>
       buildCast[Long](_, t => timestampToLong(t))
+    case _: TimeType =>
+      buildCast[Long](_, t => timeToLong(t))
     case x: NumericType if ansiEnabled =>
       val exactNumeric = PhysicalNumericType.exactNumeric(x)
       b => exactNumeric.toLong(b)
@@ -847,6 +855,15 @@ case class Cast(
           errorOrNull(t, from, IntegerType)
         }
       })
+    case _: TimeType =>
+      buildCast[Long](_, t => {
+        val longValue = timeToLong(t)
+        if (longValue == longValue.toInt) {
+          longValue.toInt
+        } else {
+          errorOrNull(t, from, IntegerType)
+        }
+      })
     case x: NumericType if ansiEnabled =>
       val exactNumeric = PhysicalNumericType.exactNumeric(x)
       b => exactNumeric.toInt(b)
@@ -883,6 +900,15 @@ case class Cast(
           errorOrNull(t, from, ShortType)
         }
       })
+    case _: TimeType =>
+      buildCast[Long](_, t => {
+        val longValue = timeToLong(t)
+        if (longValue == longValue.toShort) {
+          longValue.toShort
+        } else {
+          errorOrNull(t, from, ShortType)
+        }
+      })
     case x: NumericType if ansiEnabled =>
       val exactNumeric = PhysicalNumericType.exactNumeric(x)
       b =>
@@ -930,6 +956,15 @@ case class Cast(
           errorOrNull(t, from, ByteType)
         }
       })
+    case _: TimeType =>
+      buildCast[Long](_, t => {
+        val longValue = timeToLong(t)
+        if (longValue == longValue.toByte) {
+          longValue.toByte
+        } else {
+          errorOrNull(t, from, ByteType)
+        }
+      })
     case x: NumericType if ansiEnabled =>
       val exactNumeric = PhysicalNumericType.exactNumeric(x)
       b =>
@@ -1723,6 +1758,9 @@ case class Cast(
   private[this] def timestampToDoubleCode(ts: ExprValue): Block =
     code"$ts / (double)$MICROS_PER_SECOND"

+  private[this] def timeToLongCode(timeValue: ExprValue): Block =
+    code"Math.floorDiv($timeValue, ${NANOS_PER_SECOND}L)"
+
   private[this] def castToBooleanCode(
       from: DataType,
       ctx: CodegenContext): CastFunction = from match {
@@ -1782,6 +1820,33 @@ case class Cast(
       """
   }

+  private[this] def castTimeToIntegralTypeCode(
+      ctx: CodegenContext,
+      integralType: String,
+      from: DataType,
+      to: DataType): CastFunction = {
+
+    val longValue = ctx.freshName("longValue")
+    val fromDt = ctx.addReferenceObj("from", from, from.getClass.getName)
+    val toDt = ctx.addReferenceObj("to", to, to.getClass.getName)
+
+    (c, evPrim, evNull) =>
+      val overflow = if (ansiEnabled) {
+        code"""throw QueryExecutionErrors.castingCauseOverflowError($c, $fromDt, $toDt);"""
+      } else {
+        code"$evNull = true;"
+      }
+
+      code"""
+        long $longValue = ${timeToLongCode(c)};
+        if ($longValue == ($integralType) $longValue) {
+          $evPrim = ($integralType) $longValue;
+        } else {
+          $overflow
+        }
+      """
+  }
+
   private[this] def castDayTimeIntervalToIntegralTypeCode(
       startField: Byte,
       endField: Byte,
@@ -1888,6 +1953,7 @@ case class Cast(
     case DateType =>
       (c, evPrim, evNull) => code"$evNull = true;"
     case TimestampType => castTimestampToIntegralTypeCode(ctx, "byte", from, ByteType)
+    case _: TimeType => castTimeToIntegralTypeCode(ctx, "byte", from, ByteType)
     case DecimalType() => castDecimalToIntegralTypeCode("byte")
     case ShortType | IntegerType | LongType if ansiEnabled =>
       castIntegralTypeToIntegralTypeExactCode(ctx, "byte", from, ByteType)
@@ -1925,6 +1991,7 @@ case class Cast(
     case DateType =>
       (c, evPrim, evNull) => code"$evNull = true;"
     case TimestampType => castTimestampToIntegralTypeCode(ctx, "short", from, ShortType)
+    case _: TimeType => castTimeToIntegralTypeCode(ctx, "short", from, ShortType)
     case DecimalType() => castDecimalToIntegralTypeCode("short")
     case IntegerType | LongType if ansiEnabled =>
       castIntegralTypeToIntegralTypeExactCode(ctx, "short", from, ShortType)
@@ -1960,6 +2027,7 @@ case class Cast(
     case DateType =>
       (c, evPrim, evNull) => code"$evNull = true;"
     case TimestampType => castTimestampToIntegralTypeCode(ctx, "int", from, IntegerType)
+    case _: TimeType => castTimeToIntegralTypeCode(ctx, "int", from, IntegerType)
     case DecimalType() => castDecimalToIntegralTypeCode("int")
     case LongType if ansiEnabled =>
       castIntegralTypeToIntegralTypeExactCode(ctx, "int", from, IntegerType)
@@ -1996,6 +2064,8 @@ case class Cast(
       (c, evPrim, evNull) => code"$evNull = true;"
     case TimestampType =>
       (c, evPrim, evNull) => code"$evPrim = (long) ${timestampToLongCode(c)};"
+    case _: TimeType =>
+      (c, evPrim, evNull) => code"$evPrim = (long) ${timeToLongCode(c)};"
     case DecimalType() => castDecimalToIntegralTypeCode("long")
     case FloatType | DoubleType if ansiEnabled =>
       castFractionToIntegralTypeCode(ctx, "long", from, LongType)
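
The overflow test used throughout the `Cast.scala` changes (`longValue == longValue.toByte` and its `Short`/`Int` variants) is a round-trip check: narrow the value, widen it back, and compare. A minimal standalone sketch follows; the object and helper names are hypothetical, and `Option` stands in for whatever the caller does on overflow.

```scala
object NarrowingCheckSketch {
  // Each helper returns Some(value) when the seconds count survives the round
  // trip through the narrower type, and None when narrowing would wrap.
  def toByteExact(seconds: Long): Option[Byte] =
    if (seconds == seconds.toByte) Some(seconds.toByte) else None

  def toShortExact(seconds: Long): Option[Short] =
    if (seconds == seconds.toShort) Some(seconds.toShort) else None

  def toIntExact(seconds: Long): Option[Int] =
    if (seconds == seconds.toInt) Some(seconds.toInt) else None

  def main(args: Array[String]): Unit = {
    // 23:59:59 is 86399 seconds; plain narrowing keeps only the low bits.
    println(86399L.toByte)        // 127, not 86399, so the check fails
    println(86399L.toShort)       // 20863, not 86399, so the check fails
    println(toByteExact(4L))      // Some(4)
    println(toByteExact(86399L))  // None
    println(toIntExact(86399L))   // Some(86399)
  }
}
```

The largest possible `TIME` value floors to 86399 seconds, which fits in `Int` and `Long` but not in `Short` or `Byte`, which is consistent with the patch listing only `ByteType | ShortType` in the third `Cast` object hunk.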

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala

Lines changed: 42 additions & 0 deletions
@@ -1507,4 +1507,46 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
       }
     }
   }
+  test("SPARK-52619: cast time to integral types") {
+    // Test normal cases that should work with a small number like 112 seconds after midnight
+    val smallTime = Literal.create(LocalTime.of(0, 1, 52), TimeType(6))
+    checkEvaluation(cast(smallTime, ByteType), 112.toByte)
+    checkEvaluation(cast(smallTime, ShortType), 112.toShort)
+    checkEvaluation(cast(smallTime, IntegerType), 112)
+    checkEvaluation(cast(smallTime, LongType), 112L)
+
+    // Test midnight to all integral types
+    val midnight = Literal.create(LocalTime.MIDNIGHT, TimeType(6))
+    checkEvaluation(cast(midnight, ByteType), 0.toByte)
+    checkEvaluation(cast(midnight, ShortType), 0.toShort)
+    checkEvaluation(cast(midnight, IntegerType), 0)
+    checkEvaluation(cast(midnight, LongType), 0L)
+
+    // Precision rounding/truncation tests with fractional seconds
+    val time0 = Literal.create(LocalTime.NOON, TimeType(0))
+    val time2 = Literal.create(LocalTime.of(12, 0, 0, 120000000), TimeType(2))
+    val time4 = Literal.create(LocalTime.of(12, 0, 0, 345600000), TimeType(4))
+    val oneTwoThreeTime5 = Literal.create(LocalTime.of(1, 2, 3, 555550000), TimeType(5))
+    val maxTime4 = Literal.create(LocalTime.of(23, 59, 59, 999900000), TimeType(4))
+    val fractional5 = Literal.create(LocalTime.of(0, 0, 17, 500000000), TimeType(1))
+    val fractional000001 = Literal.create(LocalTime.of(0, 0, 17, 1000), TimeType(6))
+    val fractional999999 = Literal.create(LocalTime.of(0, 0, 17, 999999000), TimeType(6))
+    val fractional6 = Literal.create(LocalTime.of(0, 0, 17, 600000000), TimeType(1))
+    val fractional4 = Literal.create(LocalTime.of(0, 0, 17, 400000000), TimeType(1))
+    val fractional555 = Literal.create(LocalTime.of(0, 0, 17, 555000000), TimeType(3))
+    checkEvaluation(cast(fractional5, IntegerType), 17)
+    checkEvaluation(cast(fractional5, LongType), 17L)
+    checkEvaluation(cast(fractional000001, IntegerType), 17)
+    checkEvaluation(cast(fractional999999, IntegerType), 17)
+    checkEvaluation(cast(fractional6, IntegerType), 17)
+    checkEvaluation(cast(fractional4, IntegerType), 17)
+    checkEvaluation(cast(fractional555, IntegerType), 17)
+    checkEvaluation(cast(time0, IntegerType), 43200)
+    checkEvaluation(cast(time2, IntegerType), 43200)
+    checkEvaluation(cast(time4, IntegerType), 43200)
+    checkEvaluation(cast(oneTwoThreeTime5, IntegerType), 3723)
+    checkEvaluation(cast(oneTwoThreeTime5, LongType), 3723L)
+    checkEvaluation(cast(maxTime4, IntegerType), 86399)
+    checkEvaluation(cast(maxTime4, LongType), 86399L)
+  }
 }

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala

Lines changed: 17 additions & 1 deletion
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions

 import java.sql.{Date, Timestamp}
-import java.time.{Duration, Period}
+import java.time.{Duration, LocalTime, Period}
 import java.time.temporal.ChronoUnit

 import org.apache.spark.sql.Row
@@ -907,4 +907,20 @@ class CastWithAnsiOffSuite extends CastSuiteBase {
       checkEvaluation(cast(invalidInput, TimeType()), null)
     }
   }
+
+  test("SPARK-52619: cast time to integral types with overflow with ansi off") {
+    // Create a time that will overflow Byte and Short: 23:59:59 = 86399 seconds
+    val largeTime6 = Literal.create(LocalTime.of(23, 59, 59, 123456000), TimeType(6))
+    val largeTime1 = Literal.create(LocalTime.of(23, 59, 59, 100000000), TimeType(1))
+
+    // Long and Int should work (86399 fits in both)
+    // Short and Byte should overflow and return null (non-ANSI mode)
+    // 86399 > Short.MaxValue (32767) and > Byte.MaxValue (127)
+    checkEvaluation(cast(largeTime6, LongType), 86399L)
+    checkEvaluation(cast(largeTime6, IntegerType), 86399)
+    checkEvaluation(cast(largeTime6, ShortType), null)
+    checkEvaluation(cast(largeTime6, ByteType), null)
+    checkEvaluation(cast(largeTime1, ShortType), null)
+    checkEvaluation(cast(largeTime1, ByteType), null)
+  }
 }

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala

Lines changed: 27 additions & 1 deletion
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions

 import java.sql.Timestamp
-import java.time.DateTimeException
+import java.time.{DateTimeException, LocalTime}

 import org.apache.spark.{SparkArithmeticException, SparkRuntimeException}
 import org.apache.spark.sql.Row
@@ -797,4 +797,30 @@ class CastWithAnsiOnSuite extends CastSuiteBase with QueryErrorsBase {
         castErrMsg(invalidInput, TimeType()))
     }
   }
+
+  test("SPARK-52619: cast time to integral types with overflow with ansi on") {
+    // Test overflow cases: 23:59:59 = 86399 seconds
+    val largeTime6 = Literal.create(LocalTime.of(23, 59, 59, 123456000), TimeType(6))
+    val largeTime4 = Literal.create(LocalTime.of(23, 59, 59, 678900000), TimeType(4))
+
+    // Short and Byte should overflow and throw ArithmeticException (ANSI mode)
+    // 86399 > Short.MaxValue (32767) and > Byte.MaxValue (127)
+    Seq(
+      (largeTime6, ShortType),
+      (largeTime6, ByteType),
+      (largeTime4, ShortType),
+      (largeTime4, ByteType)
+    ).foreach { case (timeValue, targetType) =>
+      checkErrorInExpression[SparkArithmeticException](
+        cast(timeValue, targetType),
+        "CAST_OVERFLOW",
+        Map(
+          "value" -> s"TIME '${timeValue.toString}'",
+          "sourceType" -> s"\"${timeValue.dataType.sql}\"",
+          "targetType" -> s"\"${targetType.sql}\"",
+          "ansiConfig" -> "\"spark.sql.ansi.enabled\""
+        )
+      )
+    }
+  }
 }
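
The two overflow suites exercise the same cast under different settings: with ANSI off the result is NULL, with ANSI on a `CAST_OVERFLOW` error is raised. A rough model of that branching is sketched below. It is not Spark's `errorOrNull`: `AnsiOverflowSketch` and `castSecondsToShort` are hypothetical, a plain `ArithmeticException` stands in for `SparkArithmeticException`, and `None` stands in for SQL NULL.

```scala
object AnsiOverflowSketch {
  // Hypothetical stand-in for the overflow branch of the cast:
  // ANSI mode raises, non-ANSI mode yields SQL NULL (modelled as None).
  def castSecondsToShort(seconds: Long, ansiEnabled: Boolean): Option[Short] =
    if (seconds == seconds.toShort) {
      Some(seconds.toShort)
    } else if (ansiEnabled) {
      throw new ArithmeticException(
        s"CAST_OVERFLOW: $seconds seconds does not fit in SMALLINT")
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    println(castSecondsToShort(112L, ansiEnabled = true))     // Some(112)
    println(castSecondsToShort(86399L, ansiEnabled = false))  // None
    // castSecondsToShort(86399L, ansiEnabled = true) would throw.
  }
}
```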
