Contributor

@sim1984 sim1984 commented Nov 25, 2025

The GENERATE_SERIES function behaves incorrectly at boundary values for the BIGINT and INT128 types. With BIGINT the value wraps around and the series keeps looping; with INT128 an exception is thrown. Examples of the behavior:

select * from generate_series(-9223372036854775807, -9223372036854775809, -1) as x rows 10;
INPUT message field count: 0
 
OUTPUT message field count: 1
01: sqltype: 32752 INT128 scale: 0 subtype: 0 len: 16
  :  name: GENERATE_SERIES  alias: GENERATE_SERIES
  : table:   schema:   owner:
 
                              GENERATE_SERIES
=============================================
                         -9223372036854775807
                         -9223372036854775808
                         -9223372036854775809

(OK, expected)

select * from generate_series(-9223372036854775807, -9223372036854775808, -1) as x rows 10;
INPUT message field count: 0
 
OUTPUT message field count: 1
01: sqltype: 580 INT64 scale: 0 subtype: 0 len: 8
  :  name: GENERATE_SERIES  alias: GENERATE_SERIES
  : table:   schema:   owner:
 
      GENERATE_SERIES
=====================
 -9223372036854775807
 -9223372036854775808
  9223372036854775807 // <-- Error. 
  9223372036854775806
  9223372036854775805
  9223372036854775804
  9223372036854775803
  9223372036854775802
  9223372036854775801
  9223372036854775800
===============
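
The wrap-around in the BIGINT output above can be modelled with a few lines of C++. This is only a sketch of the symptom, not the Firebird code: `wrappingAdd` and `naiveSeries` are hypothetical names, and the row limit stands in for `rows 10`. The addition goes through unsigned arithmetic because signed overflow is undefined in C++; the conversion back to a signed value is modular on mainstream compilers (and guaranteed since C++20).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Two's-complement addition that wraps instead of overflowing.
inline std::int64_t wrappingAdd(std::int64_t a, std::int64_t b)
{
    return static_cast<std::int64_t>(
        static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b));
}

// Naive descending/ascending series: apply the step, then test the bound.
// At the type boundary the wrapped value still satisfies the bound test,
// so the series never terminates on its own; maxRows plays the role of ROWS.
std::vector<std::int64_t> naiveSeries(std::int64_t start, std::int64_t finish,
                                      std::int64_t step, std::size_t maxRows)
{
    std::vector<std::int64_t> out;
    std::int64_t v = start;
    while (out.size() < maxRows && (step > 0 ? v <= finish : v >= finish))
    {
        out.push_back(v);
        v = wrappingAdd(v, step); // INT64_MIN + (-1) wraps to INT64_MAX
    }
    return out;
}
```

Calling `naiveSeries(-9223372036854775807LL, INT64_MIN, -1, 10)` yields the two expected values followed by `INT64_MAX` and its predecessors, which is the pattern shown in the output above.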

INT128 example:

select * from generate_series(bin_shl(cast(2 as int128),126)+1, bin_shl(cast(2 as int128),126), -1) as x rows 10;
INPUT message field count: 0

OUTPUT message field count: 1
01: sqltype: 32752 INT128 scale: 0 subtype: 0 len: 16
  :  name: GENERATE_SERIES  alias: GENERATE_SERIES
  : table:   schema:   owner:

                              GENERATE_SERIES
=============================================
     -170141183460469231731687303715884105727
Statement failed, SQLSTATE = 22003
arithmetic exception, numeric overflow, or string truncation
-Integer overflow.  The result of an integer operation caused the most significant bit of the result to carry.

result += step;
// Prevent freezing at boundary values
if (((step < 0) && (result == MIN_SINT64)) ||
    ((step > 0) && (result == MAX_SINT64)))

Member

Is this logic correct for scaled numbers?

Contributor Author

This logic doesn't work if step is different from 1 or -1. I'll rework this solution for a more general case.
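
One way the general case could look, as a minimal sketch rather than the reworked patch: decide before adding whether another step still fits between the current value and the bound. `isLastValue` and `generateSeries` are hypothetical names, the sketch covers only unscaled 64-bit integers, and returning an empty series for a zero step is an assumption of the sketch.

```cpp
#include <cstdint>
#include <vector>

// True when adding `step` to `current` would pass `finish`.
// Assumes step != 0 and that `current` has not yet passed `finish`.
inline bool isLastValue(std::int64_t current, std::int64_t finish, std::int64_t step)
{
    // The remaining distance is computed in unsigned arithmetic, so the
    // subtraction cannot overflow even for operands of opposite sign.
    if (step > 0)
    {
        const std::uint64_t remaining =
            static_cast<std::uint64_t>(finish) - static_cast<std::uint64_t>(current);
        return remaining < static_cast<std::uint64_t>(step);
    }

    const std::uint64_t remaining =
        static_cast<std::uint64_t>(current) - static_cast<std::uint64_t>(finish);
    // Unsigned negation is well defined, even for step == INT64_MIN.
    const std::uint64_t magnitude = 0 - static_cast<std::uint64_t>(step);
    return remaining < magnitude;
}

std::vector<std::int64_t> generateSeries(std::int64_t start, std::int64_t finish,
                                         std::int64_t step)
{
    std::vector<std::int64_t> out;
    if (step == 0)
        return out; // assumption of this sketch: a zero step yields nothing
    if ((step > 0 && start > finish) || (step < 0 && start < finish))
        return out; // empty range

    for (std::int64_t v = start;; v += step)
    {
        out.push_back(v);
        if (isLastValue(v, finish, step))
            break; // checked before the addition, so `v += step` never overflows
    }
    return out;
}
```

Because the check runs before the addition, the same code handles any step size, including the `-3` case where the next value would fall outside the BIGINT range.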

Contributor

Are these dynamic type switches needed? Why not compute in the output data type from the beginning?

Contributor Author

> Are these dynamic type switches needed? Why not compute in the output data type from the beginning?

The type is already chosen based on the most general of the three parameters. However, calculations with the INT128 type are handled differently than with INT64, so these are separate code paths.

Contributor Author

> Is this logic correct for scaled numbers?

This is a protection against internal type overflows. It's unlikely that you can specify scaled values at the function parameter level without causing an error sooner. At least these errors will be detected and won't simply lead to a loop.

Member

> Are there any significant advantages to this approach?

Not having to deal with different types every time.

To me, the errors are correct: the step is applied and then the condition is checked. The overflow happens when the next step is applied.

Contributor Author

@sim1984 sim1984 Nov 26, 2025

I admit that for a FOR RANGE loop this error is perfectly logical. However, generate_series works a little differently. Such errors are definitely not expected there.

PostgreSQL example:

select * from generate_series(-9223372036854775807, -9223372036854775808, -1) as x;
|----------------------|
| x                    |
|----------------------|
| -9223372036854775807 |
| -9223372036854775808 |

select * from generate_series(-9223372036854775806, -9223372036854775807, -3) as x;
|----------------------|
| x                    |
|----------------------|
| -9223372036854775806 |

Contributor Author

In my opinion, using ArithmeticNode::add to control computation boundaries is quite difficult. When a user uses the generate_series table function, they certainly don't think of it as a loop. And the goal of this PR was precisely to fix the loop/error issue where it shouldn't occur. A refactoring is possible in the future, if anyone deems it useful.

Member

> In my opinion, using ArithmeticNode::add to control computation boundaries is quite difficult. When a user uses the generate_series table function, they certainly don't think of it as a loop. And the goal of this PR was precisely to fix the loop/error issue where it shouldn't occur. A refactoring is possible in the future, if anyone deems it useful.

You have the option to enclose it in a try-catch and swallow the error. The error will be raised only in the boundary case.
Isn't that a good compromise to avoid extra checks and custom implementations for different data types?
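
A rough sketch of that idea, under stated assumptions: `checkedAdd` is a stand-in for an engine-level checked addition (the real `ArithmeticNode::add` interface is not shown in this thread), `std::overflow_error` stands in for the engine's arithmetic exception, and `generateSeriesSwallow` is a hypothetical name. The template is meant for `std::int32_t` and `std::int64_t`; whether `std::numeric_limits` covers a 128-bit type depends on the compiler.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Stand-in for a checked addition that reports overflow by throwing.
template <typename T>
T checkedAdd(T a, T b)
{
    using Limits = std::numeric_limits<T>;
    if ((b > 0 && a > Limits::max() - b) || (b < 0 && a < Limits::min() - b))
        throw std::overflow_error("Integer overflow");
    return a + b;
}

// The same loop for every integer type: emit the value, then try to step.
template <typename T>
std::vector<T> generateSeriesSwallow(T start, T finish, T step)
{
    std::vector<T> out;
    if (step == 0)
        return out; // assumption of this sketch: a zero step yields nothing

    for (T v = start; step > 0 ? v <= finish : v >= finish;)
    {
        out.push_back(v);
        try
        {
            v = checkedAdd(v, step);
        }
        catch (const std::overflow_error&)
        {
            // The next value does not fit in T, so it necessarily lies
            // beyond `finish`: the series is complete, not in error.
            break;
        }
    }
    return out;
}
```

Swallowing the exception is safe here because an overflow can only occur once the next value lies past the type's range, and therefore past `finish`; every other error path is left untouched.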

Contributor Author

Okay, I'll try to do it this way.

@sim1984 sim1984 requested a review from asfernandes November 30, 2025 17:42