[SPARK-52394][PS] Fix autocorr divide-by-zero error under ANSI mode
### What changes were proposed in this pull request?
Fix autocorr divide-by-zero error under ANSI mode
### Why are the changes needed?
Ensure pandas on Spark works well with ANSI mode on.
Part of https://issues.apache.org/jira/browse/SPARK-52169.
### Does this PR introduce _any_ user-facing change?
When ANSI is on,
FROM
```py
>>> s = ps.Series([1, 0, 0, 0])
>>> s.autocorr()
...
25/08/04 13:25:13 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 33)
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"corr" was called from
...
```
TO
```py
>>> s = ps.Series([1, 0, 0, 0])
>>> s.autocorr()
nan
```
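The new result matches pandas, which returns `nan` when the lagged slice has zero variance (the denominator of the Pearson correlation is zero). A minimal pure-Python sketch of those semantics; `safe_autocorr` is an illustrative name, not the actual pandas-on-Spark implementation, which works on Spark columns:

```python
import math

def safe_autocorr(xs, lag=1):
    # Pearson correlation of the series with its lagged copy.
    # Returns nan when either side has zero variance, mirroring
    # how try_divide yields NULL instead of raising under ANSI mode.
    a, b = xs[lag:], xs[:-lag]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(va * vb)
    return float("nan") if denom == 0 else cov / denom

print(safe_autocorr([1.0, 0.0, 0.0, 0.0]))  # nan: lagged slice [0, 0, 0] has zero variance
```

With `[1, 0, 0, 0]` and lag 1, the lagged slice `[0, 0, 0]` is constant, so the denominator is zero and the guard returns `nan` rather than dividing.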
### How was this patch tested?
Unit tests.
The commands below passed, with ANSI mode both on and off:
```
SPARK_ANSI_SQL_MODE=true ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.series.test_stat SeriesStatTests.test_autocorr"
SPARK_ANSI_SQL_MODE=false ./python/run-tests --python-executables=python3.11 --testnames "pyspark.pandas.tests.series.test_stat SeriesStatTests.test_autocorr"
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #51192 from xinrong-meng/autocorr.
Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>