-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Deduplicate range/gen_series nested functions code #18198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
($UDF:ident, $EXPR_FN:ident, $($arg:ident)*, $DOC:expr, $SCALAR_UDF_FN:ident) => { | ||
make_udf_expr_and_func!($UDF, $EXPR_FN, $($arg)*, $DOC, $SCALAR_UDF_FN, $UDF::new); | ||
}; | ||
($UDF:ident, $EXPR_FN:ident, $($arg:ident)*, $DOC:expr, $SCALAR_UDF_FN:ident, $CTOR:path) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like what window functions do:
datafusion/datafusion/functions-window/src/macros.rs
Lines 96 to 105 in 155b56e
macro_rules! get_or_init_udwf { | |
($UDWF:ident, $OUT_FN_NAME:ident, $DOC:expr) => { | |
get_or_init_udwf!($UDWF, $OUT_FN_NAME, $DOC, $UDWF::default); | |
}; | |
($UDWF:ident, $OUT_FN_NAME:ident, $DOC:expr, $CTOR:path) => { | |
paste::paste! { | |
#[doc = concat!(" Returns a [`WindowUDF`](datafusion_expr::WindowUDF) for [`", stringify!($OUT_FN_NAME), "`].")] | |
#[doc = ""] | |
#[doc = concat!(" ", $DOC)] |
So we can use same Range
struct with different constructor methods
start stop step, | ||
"create a list of values in the range between start and stop, include upper bound", | ||
gen_series_udf, | ||
Range::generate_series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom constructor here
"create a list of values in the range between start and stop, include upper bound", | ||
gen_series_udf | ||
); | ||
struct RangeDoc {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty struct just so I can use the user_doc
attribute; I was going to do it like this:
datafusion/datafusion/functions-window/src/rank.rs
Lines 107 to 140 in 155b56e
static RANK_DOCUMENTATION: LazyLock<Documentation> = LazyLock::new(|| { | |
Documentation::builder( | |
DOC_SECTION_RANKING, | |
"Returns the rank of the current row within its partition, allowing \ | |
gaps between ranks. This function provides a ranking similar to `row_number`, but \ | |
skips ranks for identical values.", | |
"rank()") | |
.with_sql_example(r#" | |
```sql | |
-- Example usage of the rank window function: | |
SELECT department, | |
salary, | |
rank() OVER (PARTITION BY department ORDER BY salary DESC) AS rank | |
FROM employees; | |
+-------------+--------+------+ | |
| department | salary | rank | | |
+-------------+--------+------+ | |
| Sales | 70000 | 1 | | |
| Sales | 50000 | 2 | | |
| Sales | 50000 | 2 | | |
| Sales | 30000 | 4 | | |
| Engineering | 90000 | 1 | | |
| Engineering | 80000 | 2 | | |
+-------------+--------+------+ | |
``` | |
"#) | |
.build() | |
}); | |
fn get_rank_doc() -> &'static Documentation { | |
&RANK_DOCUMENTATION | |
} |
But I prefer using the user_doc
attribute as it looks nicer imo; happy to switch if this isn't recommended
/// gen_range(3) => [0, 1, 2] | ||
/// gen_range(1, 4) => [1, 2, 3] | ||
/// gen_range(1, 7, 2) => [1, 3, 5] | ||
pub(super) fn gen_range_inner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled these into the impl Range
so they can easily access self.name()
and self.include_upper_bound
let [start, stop, step] = take_function_args(self.name(), args)?; | ||
|
||
// coerce_types fn should coerce all types to Timestamp(Nanosecond, tz) | ||
// TODO: remove these map_err once the signature is robust enough to guard against this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking to do this for #15881 itself
Which issue does this PR close?
generate_series
: Internal error: could not cast array of type Date32 #15881Rationale for this change
Range
andGenSeries
are essentially the same except for whether they include upper bounds or not; unify their function code to reduce duplication, making future changes easier.What changes are included in this PR?
Remove
GenSeries
struct, folding it intoRange
. Do some more minor refactoring to their code.Are these changes tested?
Existing tests (updated some error messages).
Are there any user-facing changes?
Not really (updated some error messages).