Member

@jbrockmendel jbrockmendel commented Aug 4, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

As discussed on the last dev call, this implements "mode.nan_is_na" (default True) to consider NaN as either always-equivalent or never-equivalent to NA.
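To make the two settings concrete, here is a minimal pure-Python sketch of the semantics described above. It is not the pandas implementation: `NA` is a stand-in sentinel for `pd.NA`, and `is_missing` and its `nan_is_na` parameter are hypothetical names used only for illustration.

```python
import math

# Hypothetical stand-in for pd.NA, so the sketch has no dependencies.
NA = object()


def is_missing(value, nan_is_na: bool = True) -> bool:
    """Return True if `value` counts as missing under the given mode.

    nan_is_na=True:  NaN is always treated as equivalent to NA.
    nan_is_na=False: NaN is an ordinary float value, never equivalent to NA.
    """
    if value is NA:
        return True
    is_nan = isinstance(value, float) and math.isnan(value)
    return nan_is_na and is_nan


values = [1.5, float("nan"), NA]
missing_default = [is_missing(v) for v in values]
missing_distinct = [is_missing(v, nan_is_na=False) for v in values]
```

Under the default mode both NaN and NA are flagged as missing; with the mode switched off only NA is.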

This sits on top of

Still need to

@jbrockmendel jbrockmendel marked this pull request as ready for review August 12, 2025 17:47
@jbrockmendel jbrockmendel added this to the 3.0 milestone Aug 13, 2025
@jbrockmendel
Member Author

Discussed in the dev call before last, where I, @mroeschke, and @Dr-Irv were +1. Joris was unenthused but "not necessarily opposed". On Slack @rhshadrach expressed a +1. All of those opinions were about the concept, not the execution.

@jbrockmendel
Member Author

gentle ping @mroeschke

result = self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS)
if is_nan_na() and result.dtype.kind == "f":
    parr = result._pa_array
    mask = pc.is_nan(parr).to_numpy()
Member

Does this need to be cast to_numpy()?

Member Author

That surprised me too, but yes. Without it, pc.replace_with_mask raises.

Member

Curious, what was the error?

Member Author

pandas/core/arrays/arrow/array.py:1008: in _arith_method
    arr = pc.replace_with_mask(parr, mask, pa.scalar(None, type=parr.type))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.pyenv/versions/3.11.13/lib/python3.11/site-packages/pyarrow/compute.py:252: in wrapper
    return func.call(args, None, memory_pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyarrow/_compute.pyx:399: in pyarrow._compute.Function.call
    ???
pyarrow/error.pxi:155: in pyarrow.lib.pyarrow_internal_check_status

arrays = [np.nan] * len(columns)
if dtype is not None and not isinstance(dtype, np.dtype):
    # e.g. test_dataframe_from_dict_of_series
    arrays = [NA] * len(columns)
Member

Would we want the placeholder here to be nan for StringDtype(na_value=nan), i.e.

if ... and isinstance(dtype, ExtensionDtype):
    arrays = [dtype.na_value] * len(columns)

Member Author

that'd probably be benign. would we expect pd.NA to ever not-work?

Member

@mroeschke mroeschke Aug 26, 2025

Yeah not sure if something like

df = pd.DataFrame({"a": ["b"]}, columns=["a", "b"], dtype=pd.StringDtype(na_value=np.nan))
df.loc[0, "b"]

Would correctly return nan here

Member Author

yes it does
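
For reference, the reviewer's alternative of taking the placeholder from the dtype itself can be sketched as a standalone helper. This is a simplified illustration, not the code in the PR: `placeholder_values` is a hypothetical name, and the condition is reduced to a single `isinstance` check rather than the elided one in the suggestion. It assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def placeholder_values(columns, dtype=None):
    """Build one missing-value placeholder per column (hypothetical helper).

    Extension dtypes supply their own missing-value sentinel via
    `dtype.na_value`; anything else falls back to NaN.
    """
    if isinstance(dtype, ExtensionDtype):
        return [dtype.na_value] * len(columns)
    return [np.nan] * len(columns)


nullable_int = placeholder_values(["a", "b"], dtype=pd.Int64Dtype())
plain_float = placeholder_values(["a", "b"], dtype=np.dtype("float64"))
```

With this approach a dtype whose `na_value` is NaN gets NaN placeholders, while dtypes that use `pd.NA` get `pd.NA`, without special-casing either.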


2 participants