feat(snowflake)!: Transpilation support for TO_DECIMAL, TO_NUMBER,NUMERIC#7315
feat(snowflake)!: Transpilation support for TO_DECIMAL, TO_NUMBER,NUMERIC#7315fivetran-ashashankar wants to merge 3 commits intomainfrom
Conversation
| @@ -894,9 +894,9 @@ def eliminate_join_marks(expression: exp.Expr) -> exp.Expr: | |||
| if not left_join_table: | |||
There was a problem hiding this comment.
make style formatter modified the transforms.py
| @@ -4181,6 +4181,84 @@ def strtok_sql(self, expression: exp.Strtok) -> str: | |||
|
|
|||
| return self.function_fallback_sql(expression) | |||
|
|
|||
There was a problem hiding this comment.
Snowflake: TO_NUMBER(expr) defaults to NUMBER(38, 0) -> truncates decimals -> BIGINT
Oracle/others: TO_NUMBER(expr) defaults to NUMBER -> keeps decimals -> DOUBLE
Working on handling this difference. should I use is_snowflake here.
SQLGlot Integration Test ResultsComparing:
By Dialect
Overallmain: 4004 total, 4003 passed (pass rate: 100.0%), sqlglot version: sqlglot:RD-1069319_TO_NUMBER_DECIMAL_NUMERIC: 4004 total, 4003 passed (pass rate: 100.0%), sqlglot version: Transitions: |
…_NUMERIC . 3.9 support. string concatenation ("\\" + c) instead of f-string interpolation
| @@ -927,9 +927,9 @@ def eliminate_join_marks(expression: exp.Expr) -> exp.Expr: | |||
|
|
|||
| if query_from.alias_or_name in new_joins: | |||
| only_old_joins = old_joins.keys() - new_joins.keys() | |||
| assert len(only_old_joins) >= 1, ( | |||
| "Cannot determine which table to use in the new FROM clause" | |||
| ) | |||
| assert ( | |||
| len(only_old_joins) >= 1 | |||
| ), "Cannot determine which table to use in the new FROM clause" | |||
There was a problem hiding this comment.
Latest main, with a fresh venv should solve this.
| # Parse arguments: format_arg could be format string or precision | ||
| if format_arg and format_arg.is_string: | ||
| format_string = format_arg.this | ||
| precision, scale = precision_arg, scale_arg | ||
| else: | ||
| format_string = None | ||
| precision, scale = format_arg, precision_arg |
There was a problem hiding this comment.
This doesn't handle cases like:
WITH t AS (SELECT '$9,999.99' AS f) SELECT TO_DECIMAL('$3,741.72', f, 6, 2) FROM t
| precision, scale = format_arg, precision_arg | ||
|
|
||
| # Handle hexadecimal format | ||
| if format_string and format_string.lower() in ("xxx", "xxxx"): |
There was a problem hiding this comment.
This doesn't cover all the cases right ?
For example:
Snowflake:
SELECT TO_DECIMAL('ae5', '0XX')
> 2789
Transpiled Duckdb:
SELECT CAST('ae5' AS BIGINT)
> Conversion Error:
Could not convert string 'ae5' to INT64
LINE 1: SELECT CAST('ae5' AS BIGINT);
Also, again what happens if we dont have the value of format ?
WITH t AS (SELECT '0XX' AS f) SELECT TO_DECIMAL('ae5', f) from t;
We should generate the solution on top of the query right ? Instead of handling the values on parse time.
| # Parse arguments: format_arg could be format string or precision | ||
| if format_arg and format_arg.is_string: | ||
| format_string = format_arg.this | ||
| precision, scale = precision_arg, scale_arg |
There was a problem hiding this comment.
No need for re-assignment of persicion and scale.
| if format_string: | ||
| chars_to_remove = {",": ",", "$": "$", "£": "£", "€": "€", "¥": "¥"} | ||
| chars = [c for symbol, c in chars_to_remove.items() if symbol in format_string] | ||
|
|
||
| if chars: | ||
| if len(chars) == 1: | ||
| pattern = chars[0] | ||
| else: | ||
| # Escape special regex characters (Python 3.9 compatible) | ||
| regex_special = r".^$*+?{}[]\|()" | ||
| escaped = [c if c not in regex_special else "\\" + c for c in chars] | ||
| pattern = "[" + "".join(escaped) + "]" | ||
| this = exp.RegexpReplace( | ||
| this=this, | ||
| expression=exp.Literal.string(pattern), | ||
| replacement=exp.Literal.string(""), | ||
| modifiers=exp.Literal.string("g"), | ||
| ) |
There was a problem hiding this comment.
This should be done on the query side.
Check what we do in this PR: https://github.com/tobymao/sqlglot/pull/7283/changes#diff-a286b0dbb51576bf61912639b01aa715baa497e89392148823a841e1ef138d5dR4095
Also, let's ensure that we cover all the formats. For example do we cover select TO_NUMBER('-123.45', 'MI999.99') ?
| prec_val = int(precision.to_py()) | ||
|
|
||
| if scale and not isinstance(scale, exp.Literal): | ||
| self.unsupported( | ||
| "TO_NUMBER with non-literal scale is not supported. Using DOUBLE instead of DECIMAL." | ||
| ) | ||
| return self.sql(exp.cast(this, exp.DataType.Type.DOUBLE)) | ||
|
|
||
| scale_val = int(scale.to_py()) if scale else 0 |
There was a problem hiding this comment.
Do we need to do to_py and cast into int ?
The precision and the scale will always be literals in snowflake right ? (Can you search for the rest of the dialects ? if we always have a literal here and not an identifier)
So, we can directly use them in the cast, avoiding the back and forth.
No description provided.