API: mode.nan_is_na to consistently distinguish NaN-vs-NA #62040

jbrockmendel · 2025-08-04T15:42:47Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

As discussed on the last dev call, this implements "mode.nan_is_na" (default True) to consider NaN as either always-equivalent or never-equivalent to NA.
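
To make the two modes concrete, here is a minimal pure-Python sketch of the semantics described above. It is not the pandas implementation: `NA`, `is_missing`, and the `nan_is_na` keyword are hypothetical stand-ins for `pd.NA` and the new option, used only for illustration.

```python
import math


class _NAType:
    """Hypothetical stand-in for pd.NA (a singleton missing-value marker)."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


def is_missing(value: object, nan_is_na: bool = True) -> bool:
    """Return True if `value` counts as missing under the given mode."""
    if value is NA:
        return True
    if isinstance(value, float) and math.isnan(value):
        # NaN counts as missing only when the mode treats NaN as NA.
        return nan_is_na
    return False


values = [1.0, float("nan"), NA]
print([is_missing(v, nan_is_na=True) for v in values])   # [False, True, True]
print([is_missing(v, nan_is_na=False) for v in values])  # [False, False, True]
```

With the flag on, NaN and NA are indistinguishable to a missingness check; with it off, only NA is missing and NaN is an ordinary float value.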

This sits on top of:

TST: nan->NA in non-construction tests #62021, which trims the diff here by updating some tests to use NA instead of NaN.
API: consistent NaN treatment for pyarrow dtypes #61732, which implements the option but only for pyarrow dtypes.
API: improve dtype in df.where with EA other #62038, which addresses an issue in DataFrame.where.
BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053, which addresses a kludge in read_csv with engine="pyarrow".

Still need to:

Add docs for the new option, including a whatsnew section.
Deal with a kludge in algorithms.rank; fixed by API: rank with nullable dtypes preserve NA #62043.
Deal with a kludge in read_csv with engine="pyarrow"; fixed by BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053.
Add tests for the issues this addresses.

jbrockmendel · 2025-08-13T22:06:09Z

Discussed in the dev call before last, where @mroeschke, @Dr-Irv, and I were +1. Joris was unenthused but "not necessarily opposed". On Slack, @rhshadrach expressed a +1. All of those opinions were about the concept, not the execution.

jbrockmendel · 2025-08-25T22:53:26Z

gentle ping @mroeschke

mroeschke · 2025-08-26T16:59:46Z

pandas/core/arrays/arrow/array.py

+        result = self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS)
+        if is_nan_na() and result.dtype.kind == "f":
+            parr = result._pa_array
+            mask = pc.is_nan(parr).to_numpy()


Does this need to be cast to_numpy()?

That surprised me too, but yes. Without it pc.replace_with_mask raises

Curious, what was the error?

pandas/core/arrays/arrow/array.py:1008: in _arith_method
    arr = pc.replace_with_mask(parr, mask, pa.scalar(None, type=parr.type))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.pyenv/versions/3.11.13/lib/python3.11/site-packages/pyarrow/compute.py:252: in wrapper
    return func.call(args, None, memory_pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyarrow/_compute.pyx:399: in pyarrow._compute.Function.call
    ???
pyarrow/error.pxi:155: in pyarrow.lib.pyarrow_internal_check_status
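
The snippet under review builds a boolean mask of NaN positions and then replaces those entries with nulls. A plain-Python analogue of that mask-then-replace step is sketched below. The helpers `nan_mask` and `replace_with_mask` are hypothetical and written only for illustration (they are not the pyarrow functions of similar name), and `None` stands in for an Arrow null.

```python
import math
from typing import Optional, Sequence


def nan_mask(values: Sequence[Optional[float]]) -> list[bool]:
    """True where the entry is a float NaN; existing nulls (None) map to False."""
    return [v is not None and math.isnan(v) for v in values]


def replace_with_mask(values, mask, replacement):
    """Return a copy of `values` with `replacement` wherever `mask` is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [replacement if m else v for v, m in zip(values, mask)]


result = [1.5, float("nan"), None, 0.0]
mask = nan_mask(result)
cleaned = replace_with_mask(result, mask, None)
print(mask)     # [False, True, False, False]
print(cleaned)  # [1.5, None, None, 0.0]
```

In this sketch the mask is a plain list of Python booleans with no nulls of its own, which is one reason a concrete (non-nullable) mask can be easier to reason about than a nullable one.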

mroeschke · 2025-08-26T17:05:00Z

pandas/core/internals/construction.py

-        arrays = [np.nan] * len(columns)
+        if dtype is not None and not isinstance(dtype, np.dtype):
+            # e.g. test_dataframe_from_dict_of_series
+            arrays = [NA] * len(columns)


Would we want the placeholder here to be nan for StringDtype(na_value=nan), i.e.

if ... and isinstance(dtype, ExtensionDtype):
    arrays = [dtype.na_value] * len(columns)

that'd probably be benign. would we expect pd.NA to ever not-work?

Yeah not sure if something like

df = pd.DataFrame({"a": ["b"]}, columns=["a", "b"], dtype=pd.StringDtype(na_value=np.nan))
df.loc[0, "b"]

Would correctly return nan here

yes it does
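
The suggestion in this thread amounts to choosing the placeholder for missing columns from the dtype itself. A minimal sketch of that selection logic follows. `SketchExtensionDtype`, `NA`, and `placeholder_for` are hypothetical stand-ins rather than pandas objects, and the fallback branch only loosely mirrors the diff above.

```python
import math
from dataclasses import dataclass

NA = object()  # hypothetical stand-in for pd.NA


@dataclass(frozen=True)
class SketchExtensionDtype:
    """Hypothetical stand-in for an ExtensionDtype carrying its own na_value."""

    name: str
    na_value: object


def placeholder_for(dtype, n_columns: int) -> list:
    """Pick the per-column placeholder used when a requested column has no data."""
    if isinstance(dtype, SketchExtensionDtype):
        # Reviewer's suggestion: defer to the dtype's own missing-value marker.
        return [dtype.na_value] * n_columns
    # No dtype (or a NumPy-style dtype) given: keep the original NaN placeholder.
    return [float("nan")] * n_columns


nan_string = SketchExtensionDtype("str", na_value=float("nan"))
na_string = SketchExtensionDtype("string", na_value=NA)

print(placeholder_for(na_string, 2)[0] is NA)          # True
print(math.isnan(placeholder_for(nan_string, 2)[0]))   # True
print(math.isnan(placeholder_for(None, 1)[0]))         # True
```

Deferring to `na_value` keeps a NaN-backed string dtype on NaN while NA-backed dtypes get NA, without the caller having to special-case either.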

pandas/tests/extension/test_arrow.py

jbrockmendel mentioned this pull request Aug 4, 2025

POC: NA-only behavior for numpy-nullable dtypes #61708

Closed

jbrockmendel force-pushed the api-nan-vs-na branch 2 times, most recently from 1d85ad8 to 1ccaaa4 Compare August 4, 2025 20:41

This was referenced Aug 5, 2025

API: NaN vs NA in mixed reduction #62024

Open

BUG: read_csv loses precision when engine='pyarrow' and dtype Int64 #56136

Closed

BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053

Merged

jbrockmendel force-pushed the api-nan-vs-na branch 3 times, most recently from f0e5e34 to 71d1c03 Compare August 6, 2025 14:45

jbrockmendel added 21 commits August 12, 2025 09:07

BUG: read_csv with engine=pyarrow and numpy-nullable dtype

5e88fde

mypy fixup, error message compat for 32bit builds

eae6f64

minimum version compat

2861b16

not-infer-string compat

5369afa

mypy fixup

db35a9c

update usage

505bfb6

CLN: remove redundant check

febe83c

Use Matts idea

c81cbec

re-xfail

26a3049

API: rank with nullable dtypes preserve NA

a70b429

API: improve dtype in df.where with EA other

99a71b7

GH refs

c86747d

doc fixup

9d222d8

BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with timestamp type

6f800b3

GH ref

514a56f

BUG: ArrowEA constructor with timestamp type

fca3c7c

POC: consistent NaN treatment for pyarrow dtypes

f20758a

comment

cc416fa

Down to 40 failing tests

7094d85

Fix rank, json tests

eeb0d32

CLN: remove outdated

814d001

jbrockmendel added 13 commits August 12, 2025 09:07

Test for setitem/construction

cf7b229

update ufunc test

eb12ea1

Improve rank test skips

f0262ef

ENH: mode.nan_is_na for numpy-nullable dtypes

544faf1

update style test

6c4b68f

update asvs, mypy ignores

90d3a28

pre-commit fixup

408aa06

doc fixup

9e5ebec

Remove special-casing

0fd2e2d

comment

7de9f40

ruff format

2f61a58

Set default to True

36143ad

whatsnew

b7ea9ae

jbrockmendel force-pushed the api-nan-vs-na branch from 71d1c03 to b7ea9ae Compare August 12, 2025 16:30

jbrockmendel marked this pull request as ready for review August 12, 2025 17:47

jbrockmendel added this to the 3.0 milestone Aug 13, 2025

jbrockmendel mentioned this pull request Aug 19, 2025

ENH: EA._cast_pointwise_result #62105

Merged

5 tasks

jbrockmendel added 5 commits August 20, 2025 09:25

Merge branch 'main' into api-nan-vs-na

a625190

update _cast_pointwise_result

d471aa8

update cast_pointwise_result

27cd097

Merge branch 'main' into api-nan-vs-na

1bb0a4e

Merge branch 'main' into api-nan-vs-na

7cc3b41

mroeschke reviewed Aug 26, 2025

pandas/core/arrays/arrow/array.py (resolved)

mroeschke reviewed Aug 26, 2025

pandas/tests/extension/test_arrow.py (outdated, resolved)

jbrockmendel added 2 commits August 26, 2025 10:20

Merge branch 'main' into api-nan-vs-na

5f76e19

remove unnecessary import

b2a64bb
