[WIP][SPARK-52821][PYTHON] add int->DecimalType pyspark udf return type coercion #51538

Open
wants to merge 4 commits into base: master

benrobby

What changes were proposed in this pull request?

  • implements int to decimal coercion for data returned from the python UDF worker
  • the change is gated by a sql conf (default disabled)
  • we are making this change to all pandas_udfs to keep behavior consistent across them
    • affected evalTypes: SQL_ARROW_TABLE_UDF, SQL_COGROUPED_MAP_PANDAS_UDF, SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE, SQL_TRANSFORM_WITH_STATE_PANDAS_UDF, SQL_TRANSFORM_WITH_STATE_PANDAS_INIT_STATE_UDF, SQL_ARROW_BATCHED_UDF, SQL_SCALAR_PANDAS_UDF, SQL_SCALAR_PANDAS_ITER_UDF, SQL_MAP_PANDAS_ITER_UDF
  • mapInArrow UDFs are not affected; in general, no casting or coercion is done for UDFs that return Arrow data directly

Why are the changes needed?

  • python UDFs with useArrow=True do not support type coercion from int to DecimalType if the target precision of the DecimalType is too low. For example:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DecimalType

# assumes an active SparkSession named `spark`
@udf(returnType=DecimalType(2, 1), useArrow=True)
def test(_):
    return 1  # a Python int, not a Decimal

spark.range(1, 2, 1, 1).select(test(col("id"))).show()
# expected: (Decimal) 1.0
# actual: pyarrow.lib.ArrowInvalid: Precision is not great enough for the result. It should be at least 20
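
To make the intended semantics concrete, here is a minimal pure-Python sketch of what int to decimal coercion could look like for a single value. It is not the code in this PR: `coerce_int_to_decimal` and its parameters are hypothetical names chosen for illustration, and it uses only the standard `decimal` module rather than the Arrow conversion path. The assumption is that an int should be accepted whenever its integer digits fit in `precision - scale`, and rejected otherwise.

```python
from decimal import Decimal, localcontext


def coerce_int_to_decimal(value, precision, scale):
    """Hypothetical helper: turn a Python int into a Decimal with `scale` fractional digits."""
    # Leave non-int values alone (bool is a subclass of int, so exclude it explicitly).
    if isinstance(value, bool) or not isinstance(value, int):
        return value
    # The integer part must fit in the digits left over after reserving `scale`.
    if value != 0 and len(str(abs(value))) > precision - scale:
        raise ValueError(f"{value} does not fit DecimalType({precision}, {scale})")
    with localcontext() as ctx:
        ctx.prec = max(precision, 1)
        # Decimal(1).scaleb(-scale) is 10**-scale, e.g. Decimal("0.1") for scale=1.
        return Decimal(value).quantize(Decimal(1).scaleb(-scale))


print(coerce_int_to_decimal(1, 2, 1))  # prints 1.0
```

Under these assumptions, the value `1` from the example above becomes `Decimal("1.0")`, which fits `DecimalType(2, 1)`, while `10` would raise because it needs two integer digits and only one is available.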

Does this PR introduce any user-facing change?

  • yes, but this is a purely additive change. Before this, int->decimal threw an error.

How was this patch tested?

  • added unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@benrobby benrobby changed the title [SPARK-52821] add int->DecimalType pyspark udf return type coercion [SPARK-52821][PYTHON] add int->DecimalType pyspark udf return type coercion Jul 17, 2025
@benrobby benrobby changed the title [SPARK-52821][PYTHON] add int->DecimalType pyspark udf return type coercion [WIP][SPARK-52821][PYTHON] add int->DecimalType pyspark udf return type coercion Jul 17, 2025
@benrobby
Author

@HyukjinKwon could you take a look? :)

@HyukjinKwon
Member

I am fine with this change. Let's make the CI happy
