-
-
Notifications
You must be signed in to change notification settings - Fork 262
Fixes a loop in the GENERATE_SERIES function on boundary values. #8812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| result += step; | ||
| // Prevent freezing at boundary values | ||
| if (((step < 0) && (result == MIN_SINT64)) || | ||
| ((step > 0) && (result == MAX_SINT64))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logic correct for scaled numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic doesn't work if step is different from 1 or -1. I'll rework this solution for a more general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these dynamic type switches needed? Why not count in output data type from the beginning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these dynamic type switches needed? Why not count in output data type from the beginning?
The type is already chosen based on the most general of the three parameters. However, calculations with the INT128 type are handled differently than with INT64, so these are separate code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logic correct for scaled numbers?
This is a protection against internal type overflows. It's unlikely that you can specify scaled values at the function parameter level without causing an error sooner. At least these errors will be detected and won't simply lead to a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any significant advantages to this approach?
Not deal with different types every time.
The errors for me are correct, the step is applied and next the condition is checked. The overflow happens when applying the next step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I admit that for a FOR RANGE LOOP, this error is perfectly logical. However, generate_seriers works a little differently. Such errors are definitely not expected there.
PostgreSQL example:
select * from generate_series(-9223372036854775807, -9223372036854775808, -1) as x;|----------------------|
| x |
|----------------------|
| -9223372036854775807 |
| -9223372036854775808 |
select * from generate_series(-9223372036854775806, -9223372036854775807, -3) as x|----------------------|
| x |
|----------------------|
| -9223372036854775806 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, using ArithmeticNode::add to control computation boundaries is quite difficult. When a user uses the generate_series table function, they certainly don't think of it as a loop. And the goal of this PR was precisely to fix the loop/error issue where it shouldn't occur. A refactoring is possible in the future, if anyone deems it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, using
ArithmeticNode::addto control computation boundaries is quite difficult. When a user uses thegenerate_seriestable function, they certainly don't think of it as a loop. And the goal of this PR was precisely to fix the loop/error issue where it shouldn't occur. A refactoring is possible in the future, if anyone deems it useful.
You have the option to enclose it in try-catch and swallow the error. It will do only in boundaries case.
Isn't it a good compromise to avoid extra verifications and custom implementations for different data types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll try to do it this way.
The GENERATE_SERIES function behaves incorrectly at boundary values for the BIGINT and INT128 types. A loop occurs with the BIGINT type, and an exception is thrown with the INT128 type. Examples of incorrect behavior:
(OK, expected)
For
INT128example: