Conversation

@codetyri0n
Contributor

@codetyri0n codetyri0n commented Oct 20, 2025

Which issue does this PR close?

Closes #15916

What changes are included in this PR?

  • The main logic of the function remains the same as in Comet.
  • The function has been registered as a ScalarUDFImpl.
  • Tests have been added to the corresponding slt file.

Are these changes tested?

  • Yes

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and spark labels Oct 20, 2025
use datafusion_functions::utils::make_scalar_function;

/// <https://spark.apache.org/docs/latest/api/sql/index.html#ceil>
/// Difference from Spark: there is no optional second argument to control the rounding behaviour.
Contributor Author

I have highlighted the difference from Spark here.

Contributor

This is... interesting

I did not realise our ceil accepts two arguments 🤯

> select ceil(50.5123, 0);
+------------------------+
| ceil(Float64(50.5123)) |
+------------------------+
| 51.0                   |
+------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

> select ceil(50.5123, 1);
+------------------------+
| ceil(Float64(50.5123)) |
+------------------------+
| 51.0                   |
+------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

I'll try to understand what's happening here 🤔
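
For reference, the Spark documentation linked in the doc comment describes `ceil(expr[, scale])`, where the optional `scale` controls the position at which rounding up happens. Below is a minimal Rust sketch of that behaviour on plain `f64` values. The helper `ceil_with_scale` is hypothetical: it is not part of DataFusion, Comet, or this PR, and it only illustrates what honouring the second argument would mean.

```rust
/// Round `value` up at the decimal position given by `scale`.
/// Hypothetical helper for illustration only; not an existing DataFusion API.
/// A positive `scale` keeps that many fractional digits, a negative `scale`
/// rounds up to a multiple of 10^|scale|.
fn ceil_with_scale(value: f64, scale: i32) -> f64 {
    let factor = 10f64.powi(scale.abs());
    if scale >= 0 {
        // Shift the wanted digits left of the decimal point, round up, shift back.
        (value * factor).ceil() / factor
    } else {
        // Shift the unwanted integer digits right of the decimal point instead.
        (value / factor).ceil() * factor
    }
}
```

Under this sketch, `ceil_with_scale(50.5123, 1)` gives `50.6`, whereas both queries above return `51.0`. Because it works in binary floating point, the multiplication can occasionally push a value across an integer boundary, so a real implementation would likely operate on decimals instead.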

Contributor

Raised #18175

@codetyri0n
Contributor Author

CC: @alamb @andygrove @Jefffrey

}

fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
    Ok(Int64)
Contributor

I guess this is another difference; our ceil returns float I believe. So Spark requires integer?

Contributor Author

@codetyri0n codetyri0n Oct 20, 2025

After some research while implementing decimal support, I figured this should return float as well 😅. Thanks for raising this!

Ok(Arc::new(array))
}
Int64 => Ok(Arc::clone(&args[0])),
_ => {
Member

Will decimal also be supported?

Member

Here is Comet's ceil implementation for reference. It supports decimal types.

https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/math_funcs/ceil.rs

Contributor Author

Decimal should be supported as well; right now it would be coerced to a float.
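
To illustrate why coercing a decimal to a float is not ideal, here is a minimal sketch of a ceiling computed directly on a decimal's unscaled `i128` value, so no precision is lost along the way. The helper name `ceil_decimal_unscaled` and its signature are hypothetical; this is not the Comet or Sail implementation linked in this thread.

```rust
/// Ceiling of a decimal given as (unscaled value, scale); for example 50.5123
/// is represented as (505123, 4). Returns the result as an integer (scale 0).
/// Hypothetical helper for illustration only.
/// Assumes `scale <= 38`, so that 10^scale fits in an i128.
fn ceil_decimal_unscaled(unscaled: i128, scale: u32) -> i128 {
    let divisor = 10_i128.pow(scale);
    // Euclidean division floors for a positive divisor, also for negative inputs,
    // and leaves a remainder in the range 0..divisor.
    let quotient = unscaled.div_euclid(divisor);
    let remainder = unscaled.rem_euclid(divisor);
    // Any non-zero fractional part bumps the floored quotient up by one.
    if remainder > 0 {
        quotient + 1
    } else {
        quotient
    }
}
```

With this sketch, `ceil_decimal_unscaled(505123, 4)` is `51` and `ceil_decimal_unscaled(-505123, 4)` is `-50`, with every intermediate step done in integer arithmetic.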

@codetyri0n codetyri0n requested a review from Jefffrey October 20, 2025 20:11
@Jefffrey
Contributor

I wonder if we address #18175 then will we still need a separate ceil function for Spark? What differences are there with the DataFusion version now? 🤔

@codetyri0n
Contributor Author

> I wonder if we address #18175 then will we still need a separate ceil function for Spark? What differences are there with the DataFusion version now? 🤔

I was not aware of the attempts made in #15958. Having had a look, the return type there has been kept the same as the input type (and for floor it is i128), so I am now doubtful whether my conclusion on what it should return is correct. I would love to have the thoughts of @shehabgamin and @andygrove for some clarity.

@shehabgamin
Contributor

shehabgamin commented Oct 21, 2025

> > I wonder if we address #18175 then will we still need a separate ceil function for Spark? What differences are there with the DataFusion version now? 🤔
>
> I was not aware of the attempts made in #15958. Having had a look, the return type there has been kept the same as the input type (and for floor it is i128), so I am now doubtful whether my conclusion on what it should return is correct. I would love to have the thoughts of @shehabgamin and @andygrove for some clarity.

@codetyri0n You can port over the implementation from Sail if you'd like:
https://github.com/lakehq/sail/blob/main/crates/sail-function/src/scalar/math/spark_ceil_floor.rs

It should be a straightforward port over. Further, we have some additional tests not covered by the Spark test suite here:
https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_ceil.txt
https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_floor.txt
