Open
Changes from 17 commits
19 commits
a9c8d85
Initial test case
eicchen Jul 10, 2025
f303a04
Updated test case to account for results of mul being NaN if both inp…
eicchen Jul 10, 2025
5ac26a4
Removed test cases which expect an error from fill_value
eicchen Jul 10, 2025
a60fbb0
Updated test case to include other operators which included fill_value
eicchen Jul 10, 2025
87ecfc4
Removed restriction on using fill_value with series
eicchen Jul 10, 2025
bc805fd
Included PR suggestions, added separate dtype test (WIP)
eicchen Jul 15, 2025
be09616
temp files
eicchen Jul 16, 2025
1ebcf6e
Added test case to test EA and NUMPY dtypes
eicchen Aug 18, 2025
98fb07f
Merge branch 'pandas-dev:main' into BUG-#61581-DataFrame.mul
eicchen Aug 19, 2025
a5940d5
addressed changes brought up in PR, converted test cases to not use n…
eicchen Aug 21, 2025
dcf3391
Limit np conversion to IntegerArray and FloatArray
eicchen Aug 21, 2025
1179098
Updated EA catch method in _maybe_align_series_as_frame
eicchen Aug 21, 2025
2dfb4bf
Addressed errors from changes in some tests
eicchen Aug 21, 2025
ce0b2ef
removed comment and errant print statement
eicchen Aug 21, 2025
eaac655
Commented out test_add_frame's xfail to test CI
eicchen Aug 23, 2025
a719a1d
Allows frames to be added to strings, with modifications to tests tha…
eicchen Aug 25, 2025
06bcebc
Merge branch 'main' into BUG-#61581-DataFrame.mul
eicchen Aug 25, 2025
801b39e
Moved type conversion within add and radd if statement, removed datea…
eicchen Aug 26, 2025
4be6817
Removed PeriodArray special casing and modified test case
eicchen Aug 27, 2025

1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -960,6 +960,7 @@ MultiIndex
 - :func:`MultiIndex.get_level_values` accessing a :class:`DatetimeIndex` does not carry the frequency attribute along (:issue:`58327`, :issue:`57949`)
 - Bug in :class:`DataFrame` arithmetic operations in case of unaligned MultiIndex columns (:issue:`60498`)
 - Bug in :class:`DataFrame` arithmetic operations with :class:`Series` in case of unaligned MultiIndex (:issue:`61009`)
+- Bug in :class:`DataFrame` arithmetic operations with :class:`Series` raising ``NotImplementedError`` when ``fill_value`` was passed (:issue:`61581`)
 - Bug in :meth:`MultiIndex.from_tuples` causing wrong output with input of type tuples having NaN values (:issue:`60695`, :issue:`60988`)
 - Bug in :meth:`DataFrame.reindex` and :meth:`Series.reindex` where reindexing :class:`Index` to a :class:`MultiIndex` would incorrectly set all values to ``NaN``.(:issue:`60923`)

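A minimal pure-Python sketch of the rule the new tests check, assuming it matches the documented ``fill_value`` behavior of the DataFrame/DataFrame flex methods: a value missing on one side is replaced by ``fill_value`` before the operation, while a position missing on both sides stays missing. `is_missing` and `flex_op_with_fill` are hypothetical helpers for illustration, not pandas APIs.

```python
import math
import operator


def is_missing(x):
    # Simplified stand-in for pd.isna: None or float NaN
    return x is None or (isinstance(x, float) and math.isnan(x))


def flex_op_with_fill(left, right, op, fill_value):
    """Hypothetical helper: apply `op` elementwise, filling one-sided gaps."""
    result = []
    for a, b in zip(left, right):
        a_missing, b_missing = is_missing(a), is_missing(b)
        if a_missing and b_missing:
            # Missing on both sides: fill_value is not applied
            result.append(math.nan)
            continue
        if a_missing:
            a = fill_value
        if b_missing:
            b = fill_value
        result.append(op(a, b))
    return result


row = [1.0, math.nan, 3.0, math.nan]
ser = [10.0, 20.0, math.nan, math.nan]
out = flex_op_with_fill(row, ser, operator.mul, fill_value=5)
print(out)  # [10.0, 100.0, 15.0, nan]
```

Before this PR, the Series case never reached this logic because `_flex_arith_method` raised `NotImplementedError` as soon as `fill_value` was not None.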
15 changes: 5 additions & 10 deletions pandas/core/arrays/arrow/array.py
@@ -865,7 +865,9 @@ def _op_method_error_message(self, other, op) -> str:
     def _evaluate_op_method(self, other, op, arrow_funcs) -> Self:
         pa_type = self._pa_array.type
         other_original = other
-        other = self._box_pa(other)
+        other_NA = self._box_pa(other)
+        # pyarrow gets upset if you try to join a NullArray
+        other = other_NA.cast(pa_type)
Member

is it obvious this is always right? e.g. what if self is pa.timestamp("us") and other is pa.int64()?

Contributor Author

That's fair, I did try to only check for NullArrays, but that returned the error about how it couldn't concatenate the frame in the original add_to_frame testcase.

We could circumvent that by casting the initial df as an object but I didn't want to mess with the test case because I didn't know if that was something it was testing for.

Alternatively, I can just reimplement a check and check for dtypes we'd want to let go through

Member

Looks like this is the cause of a bunch of test failures, e.g. FAILED pandas/tests/extension/test_arrow.py::test_arithmetic_temporal[pa_type11] - pyarrow.lib.ArrowNotImplementedError: Unsupported cast from duration[us] to timestamp using function cast_timestamp.

Are you running the tests locally before committing/pushing?

Contributor Author

I only ran the array folder because the full suite takes a lot of time. I'll be sure to run the full thing going forward. That's on me.

Contributor Author

eicchen Aug 26, 2025

I'll add official test cases once the build clears CI, due to the weird tack-on nature of this bug fix. Just from some local testing, it looks like there is already a preexisting error message for trying to use the add operation on dtypes like Datetime and TimeDelta.

That being said, it looks like the CI is throwing errors on some of the builds but not others again, and what do you know, they're not replicated on my local machine. Would you know who I could talk to to figure out why that is?

Member

If the exception message in a test needs to be updated, that's fine as long as the new one makes sense.

Contributor Author

Sounds good to me.

Any pointers for the CI, or should I ask during the meeting tomorrow?

Member

I haven't looked too closely, but the CI failures all look like cases of "the test needs to be updated to check for the new exception message".

None of the edits to the ArrowEA are necessary, nor is the special-casing for Period.

Contributor Author

The issue is that 2/3 of the unit tests succeed as-is, so it doesn't make sense that only 7 are failing, especially since the error is about a float being concatenated with a string, which all the other builds are able to do. My guess was that something was different about their setup process.


         if (
             pa.types.is_string(pa_type)
@@ -886,7 +888,7 @@ def _evaluate_op_method(self, other, op, arrow_funcs) -> Self:
                 return self._from_pyarrow_array(result)
             elif op in [operator.mul, roperator.rmul]:
                 binary = self._pa_array
-                integral = other
+                integral = other_NA
                 if not pa.types.is_integer(integral.type):
                     raise TypeError("Can only string multiply by an integer.")
                 pa_integral = pc.if_else(pc.less(integral, 0), 0, integral)
@@ -903,14 +905,7 @@ def _evaluate_op_method(self, other, op, arrow_funcs) -> Self:
                 raise TypeError("Can only string multiply by an integer.")
             pa_integral = pc.if_else(pc.less(integral, 0), 0, integral)
             result = pc.binary_repeat(binary, pa_integral)
-            return self._from_pyarrow_array(result)
-        if (
-            isinstance(other, pa.Scalar)
-            and pc.is_null(other).as_py()
-            and op.__name__ in ARROW_LOGICAL_FUNCS
-        ):
-            # pyarrow kleene ops require null to be typed
-            other = other.cast(pa_type)
+            return type(self)(result)
 
         pc_func = arrow_funcs[op.__name__]
         if pc_func is NotImplemented:
27 changes: 16 additions & 11 deletions pandas/core/frame.py
@@ -8450,13 +8450,23 @@ def _maybe_align_series_as_frame(self, series: Series, axis: AxisInt):
         blockwise.
         """
         rvalues = series._values
-        if not isinstance(rvalues, np.ndarray):
-            # TODO(EA2D): no need to special-case with 2D EAs
-            if rvalues.dtype in ("datetime64[ns]", "timedelta64[ns]"):
-                # We can losslessly+cheaply cast to ndarray
-                rvalues = np.asarray(rvalues)
+        if isinstance(rvalues, PeriodArray):
+            return series
+        if not isinstance(rvalues, np.ndarray) and rvalues.dtype not in (
+            "datetime64[ns]",
+            "timedelta64[ns]",
+        ):
+            if axis == 0:
+                df = DataFrame(dict.fromkeys(range(self.shape[1]), rvalues))
             else:
-                return series
+                nrows = self.shape[0]
+                df = DataFrame(
+                    {i: rvalues[[i]].repeat(nrows) for i in range(self.shape[1])},
+                    dtype=rvalues.dtype,
+                )
+            df.index = self.index
+            df.columns = self.columns
+            return df
 
         if axis == 0:
             rvalues = rvalues.reshape(-1, 1)
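To make the two branches of the new extension-array path concrete, here is a pure-Python sketch that models a frame as a dict of column lists. With `axis == 0`, every column receives the whole Series (the same values `dict.fromkeys(range(self.shape[1]), rvalues)` supplies); otherwise column `i` is `rvalues[i]` repeated once per row. The function name and the list-based representation are illustrative assumptions, not pandas internals.

```python
def broadcast_series_as_columns(values, nrows, ncols, axis):
    """Hypothetical model of the EA branch of _maybe_align_series_as_frame."""
    if axis == 0:
        # Series aligned with the index: one value per row,
        # and every column receives the whole Series.
        assert len(values) == nrows
        return {i: list(values) for i in range(ncols)}
    # Series aligned with the columns: one value per column,
    # and column i is values[i] repeated down all rows.
    assert len(values) == ncols
    return {i: [values[i]] * nrows for i in range(ncols)}


by_index = broadcast_series_as_columns([1, 2, 3], nrows=3, ncols=2, axis=0)
by_columns = broadcast_series_as_columns([10, 20], nrows=3, ncols=2, axis=1)
print(by_index)    # {0: [1, 2, 3], 1: [1, 2, 3]}
print(by_columns)  # {0: [10, 10, 10], 1: [20, 20, 20]}
```

Building a DataFrame of 1-D columns presumably keeps the extension dtype intact, whereas the NumPy path below reshapes `rvalues` to 2-D, which 1-D extension arrays cannot do (see the removed `TODO(EA2D)` comment).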
@@ -8480,11 +8490,6 @@ def _flex_arith_method(
         if self._should_reindex_frame_op(other, op, axis, fill_value, level):
             return self._arith_method_with_reindex(other, op)
 
-        if isinstance(other, Series) and fill_value is not None:
-            # TODO: We could allow this in cases where we end up going
-            # through the DataFrame path
-            raise NotImplementedError(f"fill_value {fill_value} not supported.")
-
         other = ops.maybe_prepare_scalar_for_op(other, self.shape)
         self, other = self._align_for_op(other, axis, flex=True, level=level)
 
7 changes: 1 addition & 6 deletions pandas/tests/arithmetic/test_period.py
@@ -1361,12 +1361,7 @@ def test_period_add_timestamp_raises(self, box_with_array):
             arr + ts
         with pytest.raises(TypeError, match=msg):
             ts + arr
-        if box_with_array is pd.DataFrame:
-            # TODO: before implementing resolution-inference we got the same
-            # message with DataFrame and non-DataFrame. Why did that change?
-            msg = "cannot add PeriodArray and Timestamp"
-        else:
-            msg = "cannot add PeriodArray and DatetimeArray"
+        msg = "cannot add PeriodArray and DatetimeArray"
         with pytest.raises(TypeError, match=msg):
             arr + Series([ts])
         with pytest.raises(TypeError, match=msg):
2 changes: 1 addition & 1 deletion pandas/tests/arrays/boolean/test_arithmetic.py
@@ -118,7 +118,7 @@ def test_error_invalid_values(data, all_arithmetic_operators):
         ops(pd.Timestamp("20180101"))
 
     # invalid array-likes
-    if op not in ("__mul__", "__rmul__"):
+    if op not in ("__mul__", "__rmul__", "__add__", "__radd__"):
         # TODO(extension) numpy's mul with object array sees booleans as numbers
         msg = "|".join(
             [
34 changes: 32 additions & 2 deletions pandas/tests/arrays/floating/test_arithmetic.py
@@ -152,8 +152,38 @@ def test_error_invalid_values(data, all_arithmetic_operators):
         ops(pd.Timestamp("20180101"))
 
     # invalid array-likes
-    with pytest.raises(TypeError, match=msg):
-        ops(pd.Series("foo", index=s.index))
+    str_ser = pd.Series("foo", index=s.index)
+    if all_arithmetic_operators in [
+        "__add__",
+        "__radd__",
+    ]:
+        res = ops(str_ser)
+        if all_arithmetic_operators == "__radd__":
+            data_expected = []
+            for i in data:
+                if pd.isna(i):
+                    data_expected.append(i)
+                elif i.is_integer():
+                    data_expected.append("foo" + str(int(i)))
+                else:
+                    data_expected.append("foo" + str(i))
+
+            expected = pd.Series(data_expected, index=s.index)
+        else:
+            data_expected = []
+            for i in data:
+                if pd.isna(i):
+                    data_expected.append(i)
+                elif i.is_integer():
+                    data_expected.append(str(int(i)) + "foo")
+                else:
+                    data_expected.append(str(i) + "foo")
+
+            expected = pd.Series(data_expected, index=s.index)
+        tm.assert_series_equal(res, expected)
+    else:
+        with pytest.raises(TypeError, match=msg):
+            ops(str_ser)
 
     msg = "|".join(
         [
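The expected values in the `__add__`/`__radd__` branch follow a simple rule: missing entries are carried over unchanged, integral floats are rendered without a trailing `.0`, and other floats use `str()`. For `__radd__`, `s.__radd__(str_ser)` evaluates `str_ser + s`, so `"foo"` comes first. A standalone sketch of that rule, with `None`/`math.nan` standing in for `pd.NA` and the helper name `expected_concat` invented for illustration:

```python
import math


def expected_concat(data, token, prepend=False):
    """Hypothetical helper mirroring the test's expected-value loops."""
    out = []
    for x in data:
        if x is None or math.isnan(x):
            # Missing values are carried over unchanged
            out.append(x)
            continue
        # Integral floats drop the trailing ".0": 1.0 -> "1", 0.5 -> "0.5"
        text = str(int(x)) if x.is_integer() else str(x)
        out.append(token + text if prepend else text + token)
    return out


print(expected_concat([1.0, 0.5, None], "foo"))
# ['1foo', '0.5foo', None]
print(expected_concat([1.0, 0.5, None], "foo", prepend=True))
# ['foo1', 'foo0.5', None]
```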
16 changes: 16 additions & 0 deletions pandas/tests/arrays/integer/test_arithmetic.py
@@ -197,6 +197,22 @@ def test_error_invalid_values(data, all_arithmetic_operators):
         # assert_almost_equal stricter, but the expected with pd.NA seems
         # more-correct than np.nan here.
         tm.assert_series_equal(res, expected)
+    elif all_arithmetic_operators in [
+        "__add__",
+        "__radd__",
+    ]:
+        res = ops(str_ser)
+        if all_arithmetic_operators == "__radd__":
+            expected = pd.Series(
+                [np.nan if pd.isna(x) == 1 else "foo" + str(x) for x in data],
+                index=s.index,
+            )
+        else:
+            expected = pd.Series(
+                [np.nan if pd.isna(x) == 1 else str(x) + "foo" for x in data],
+                index=s.index,
+            )
+        tm.assert_series_equal(res, expected)
     else:
         with tm.external_error_raised(TypeError):
             ops(str_ser)
9 changes: 5 additions & 4 deletions pandas/tests/arrays/string_/test_string.py
@@ -253,7 +253,6 @@ def test_mul(dtype):
     tm.assert_extension_array_equal(result, expected)
 
 
-@pytest.mark.xfail(reason="GH-28527")
 def test_add_strings(dtype):
     arr = pd.array(["a", "b", "c", "d"], dtype=dtype)
     df = pd.DataFrame([["t", "y", "v", "w"]], dtype=object)
@@ -268,20 +267,22 @@ def test_add_strings(dtype):
     tm.assert_frame_equal(result, expected)
 
 
-@pytest.mark.xfail(reason="GH-28527")
 def test_add_frame(dtype):
     arr = pd.array(["a", "b", np.nan, np.nan], dtype=dtype)
     df = pd.DataFrame([["x", np.nan, "y", np.nan]])
 
     assert arr.__add__(df) is NotImplemented
 
+    # TODO
+    # pyarrow returns a different dtype despite the values being the same
+    # could be addressed in this PR if needed
     result = arr + df
     expected = pd.DataFrame([["ax", np.nan, np.nan, np.nan]]).astype(dtype)
-    tm.assert_frame_equal(result, expected)
+    tm.assert_frame_equal(result, expected, check_dtype=False)
 
     result = df + arr
     expected = pd.DataFrame([["xa", np.nan, np.nan, np.nan]]).astype(dtype)
-    tm.assert_frame_equal(result, expected)
+    tm.assert_frame_equal(result, expected, check_dtype=False)
 
 
 def test_comparison_methods_scalar(comparison_op, dtype):
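The expected frames in `test_add_frame` encode elementwise string concatenation in which any missing operand makes the result missing: `"a" + "x"` gives `"ax"`, while `"b" + NaN`, `NaN + "y"` and `NaN + NaN` are all missing. A pure-Python sketch of that rule; the helper name is hypothetical and `None` stands in for `np.nan`:

```python
def concat_propagating_na(left, right):
    """Hypothetical helper: elementwise str concat, missing if either side is."""
    return [
        None if a is None or b is None else a + b
        for a, b in zip(left, right)
    ]


arr = ["a", "b", None, None]
row = ["x", None, "y", None]
print(concat_propagating_na(arr, row))  # ['ax', None, None, None]
print(concat_propagating_na(row, arr))  # ['xa', None, None, None]
```

Only the values are pinned down this way; as the TODO notes, the result dtype can differ by string storage, which is why the assertions pass `check_dtype=False`.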
77 changes: 58 additions & 19 deletions pandas/tests/frame/test_arithmetic.py
@@ -626,12 +626,6 @@ def test_arith_flex_frame_corner(self, float_frame):
         expected = float_frame.sort_index() * np.nan
         tm.assert_frame_equal(result, expected)
 
-        with pytest.raises(NotImplementedError, match="fill_value"):
-            float_frame.add(float_frame.iloc[0], fill_value=3)
-
-        with pytest.raises(NotImplementedError, match="fill_value"):
-            float_frame.add(float_frame.iloc[0], axis="index", fill_value=3)
-
     @pytest.mark.parametrize("op", ["add", "sub", "mul", "mod"])
     def test_arith_flex_series_ops(self, simple_frame, op):
         # after arithmetic refactor, add truediv here
@@ -665,19 +659,6 @@ def test_arith_flex_series_broadcasting(self, any_real_numpy_dtype):
         result = df.div(df[0], axis="index")
         tm.assert_frame_equal(result, expected)
 
-    def test_arith_flex_zero_len_raises(self):
-        # GH 19522 passing fill_value to frame flex arith methods should
-        # raise even in the zero-length special cases
-        ser_len0 = Series([], dtype=object)
-        df_len0 = DataFrame(columns=["A", "B"])
-        df = DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
-
-        with pytest.raises(NotImplementedError, match="fill_value"):
-            df.add(ser_len0, fill_value="E")
-
-        with pytest.raises(NotImplementedError, match="fill_value"):
-            df_len0.sub(df["A"], axis=None, fill_value=3)
-
     def test_flex_add_scalar_fill_value(self):
         # GH#12723
         dat = np.array([0, 1, np.nan, 3, 4, 5], dtype="float")
@@ -2192,3 +2173,61 @@ def test_mixed_col_index_dtype(string_dtype_no_object):
     expected.columns = expected.columns.astype(string_dtype_no_object)
 
     tm.assert_frame_equal(result, expected)
+
+
+dt_params = [
+    (tm.ALL_INT_NUMPY_DTYPES[0], 5),
+    (tm.ALL_INT_EA_DTYPES[0], 5),
+    (tm.FLOAT_NUMPY_DTYPES[0], 4.9),
+    (tm.FLOAT_EA_DTYPES[0], 4.9),
+]
+
+axes = [0, 1]
+
+
+@pytest.mark.parametrize(
+    "data_type,fill_val, axis",
+    [(dt, val, axis) for axis in axes for dt, val in dt_params],
+)
+def test_df_fill_value_dtype(data_type, fill_val, axis):
+    # GH 61581
+    base_data = np.arange(25).reshape(5, 5)
+    mult_list = [1, np.nan, 5, np.nan, 3]
+    np_int_flag = 0
+
+    try:
+        mult_data = pd.array(mult_list, dtype=data_type)
+    except ValueError as e:
+        # Numpy int type cannot represent NaN, it will end up here
+        if "cannot convert float NaN to integer" in str(e):
+            mult_data = np.asarray(mult_list)
+            np_int_flag = 1
+
+    columns = list("ABCDE")
+    df = DataFrame(base_data, columns=columns)
+
+    for i in range(df.shape[0]):
+        try:
+            df.iat[i, i] = np.nan
+            df.iat[i + 1, i] = pd.NA
+            df.iat[i + 3, i] = pd.NA
+        except IndexError:
+            pass
+
+    mult_mat = np.broadcast_to(mult_data, df.shape)
+    if axis == 0:
+        mask = np.isnan(mult_mat).T
+    else:
+        mask = np.isnan(mult_mat)
+    mask = df.isna().values & mask
+
+    df_result = df.mul(mult_data, axis=axis, fill_value=fill_val)
+    if np_int_flag == 1:
+        mult_np = np.nan_to_num(mult_data, nan=fill_val)
+        df_expected = (df.fillna(fill_val).mul(mult_np, axis=axis)).mask(mask, np.nan)
+    else:
+        df_expected = (
+            df.fillna(fill_val).mul(mult_data.fillna(fill_val), axis=axis)
+        ).mask(mask, np.nan)
+
+    tm.assert_frame_equal(df_result, df_expected)
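The expected frame in `test_df_fill_value_dtype` is built in three steps: fill missing values on both sides with `fill_val`, multiply with the Series broadcast along `axis`, then restore NaN wherever both operands were missing. A pure-Python sketch of that recipe on nested lists; the function name is hypothetical, and `math.nan` stands in for both `np.nan` and `pd.NA`:

```python
import math


def expected_mul_with_fill(frame, ser, axis, fill):
    """Hypothetical reference for frame.mul(ser, axis=axis, fill_value=fill)."""
    nrows, ncols = len(frame), len(frame[0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            a = frame[r][c]
            # axis=0 aligns ser with the rows; axis=1 aligns it with the columns
            b = ser[r] if axis == 0 else ser[c]
            a_nan, b_nan = math.isnan(a), math.isnan(b)
            if a_nan and b_nan:
                # Masked: missing on both sides, so no fill is applied
                row.append(math.nan)
            else:
                row.append((fill if a_nan else a) * (fill if b_nan else b))
        out.append(row)
    return out


frame = [[1.0, math.nan], [math.nan, 4.0]]
ser = [2.0, math.nan]
print(expected_mul_with_fill(frame, ser, axis=1, fill=5))
# [[2.0, nan], [10.0, 20.0]]
```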