
Commit 9edf890

61760: merge with main
2 parents: 8de38e8 + 9b3c6ac

3 files changed (+40, -0 lines)


doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -927,6 +927,7 @@ Other
 - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
 - Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
 - Bug in :meth:`Series.describe` where median percentile was always included when the ``percentiles`` argument was passed (:issue:`60550`).
+- Bug in :meth:`Series.describe` where statistics with multiple dtypes for ExtensionArrays were coerced to ``float64``, which raised a ``DimensionalityError`` (:issue:`61707`)
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)

pandas/core/methods/describe.py

Lines changed: 13 additions & 0 deletions
@@ -12,6 +12,7 @@
 )
 from typing import (
     TYPE_CHECKING,
+    Any,
     cast,
 )

@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
     return names


+def has_multiple_internal_dtypes(d: list[Any]) -> bool:
+    """Check if the sequence has multiple internal dtypes."""
+    if not d:
+        return False
+
+    return any(type(item) != type(d[0]) for item in d)
+
+
 def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
     """Describe series containing numerical data.

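The helper added above compares the exact Python type of every statistic against the first one. A standalone copy makes that behaviour easy to try without pandas. It is a sketch, not the committed code: it uses `is not` instead of the diff's `!=` (equivalent for ordinary classes), and the `stats` list is a hypothetical example shaped like the GH61707 test, with an integer count followed by `Decimal` values.

```python
from decimal import Decimal
from typing import Any


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    """Return True if the items of ``d`` are not all of the same Python type."""
    if not d:
        # An empty list has nothing to compare.
        return False
    first = type(d[0])
    # Exact type check: a subclass such as bool is not treated as int.
    return any(type(item) is not first for item in d)


# Hypothetical statistics shaped like the test: an int count, then Decimals.
stats = [3, Decimal("2.166666666666666666666666667"), Decimal("1"), Decimal("3")]

print(has_multiple_internal_dtypes(stats))            # True
print(has_multiple_internal_dtypes([1.0, 2.5, 3.0]))  # False
print(has_multiple_internal_dtypes([True, 1]))        # True
print(has_multiple_internal_dtypes([]))               # False
```

Because the comparison is on exact types rather than numeric compatibility, an `int` next to a `float` also counts as a mix.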
@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
                 import pyarrow as pa

                 dtype = ArrowDtype(pa.float64())
+        elif has_multiple_internal_dtypes(d):
+            # GH61707: describe() doesn't work on EAs
+            # with multiple internal dtypes, so return object dtype
+            dtype = None
         else:
             dtype = Float64Dtype()
     elif series.dtype.kind in "iufb":
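Read together, the branches in this hunk form one chain: an Arrow-backed branch first, then the new mixed-type check, then the `Float64Dtype()` fallback. The sketch below is a simplified, hypothetical model of that chain, not pandas code: `pick_extension_result_dtype` and its `arrow_backed` flag are illustrative names standing in for the condition that sits above the hunk, and the returned strings are labels for the real dtype objects.

```python
from decimal import Decimal
from typing import Any, Optional


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    # Same exact-type check as the helper added in this commit.
    if not d:
        return False
    return any(type(item) is not type(d[0]) for item in d)


def pick_extension_result_dtype(arrow_backed: bool, d: list[Any]) -> Optional[str]:
    """Hypothetical model of the extension-dtype branches in describe_numeric_1d.

    Returns a label for the dtype the statistics Series would get; None means
    no dtype is forced and the Series constructor infers one.
    """
    if arrow_backed:
        # Simplified: the Arrow path is taken before the new check runs.
        return "ArrowDtype(pa.float64())"
    elif has_multiple_internal_dtypes(d):
        # GH61707: mixed statistic types are left to inference.
        return None
    else:
        return "Float64Dtype()"


decimal_stats = [3, Decimal("2.166666666666666666666666667"), Decimal("1"), Decimal("3")]
print(pick_extension_result_dtype(False, decimal_stats))    # None
print(pick_extension_result_dtype(False, [1.0, 2.0, 3.0]))  # Float64Dtype()
print(pick_extension_result_dtype(True, decimal_stats))     # ArrowDtype(pa.float64())
```

Placing the new check before the fallback means uniform statistics still get `Float64Dtype()`, while mixed ones are left for inference, which is consistent with the `dtype="object"` result the new test expects.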

pandas/tests/series/methods/test_describe.py

Lines changed: 26 additions & 0 deletions
@@ -93,6 +93,32 @@ def test_describe_empty_object(self):
         assert np.isnan(result.iloc[2])
         assert np.isnan(result.iloc[3])

+    def test_describe_multiple_dtypes(self):
+        """
+        GH61707: describe() doesn't work on EAs which generate
+        statistics with multiple dtypes.
+        """
+        from decimal import Decimal
+
+        from pandas.tests.extension.decimal import to_decimal
+
+        s = Series(to_decimal([1, 2.5, 3]), dtype="decimal")
+
+        expected = Series(
+            [
+                3,
+                Decimal("2.166666666666666666666666667"),
+                Decimal("0.8498365855987974716713706849"),
+                Decimal("1"),
+                Decimal("3"),
+            ],
+            index=["count", "mean", "std", "min", "max"],
+            dtype="object",
+        )
+
+        result = s.describe(percentiles=[])
+        tm.assert_series_equal(result, expected)
+
     def test_describe_with_tz(self, tz_naive_fixture):
         # GH 21332
         tz = tz_naive_fixture
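The long `mean` value in the expected Series matches plain Python `decimal` arithmetic. Assuming the default decimal context (28 significant digits), the mean of 1, 2.5 and 3 is 6.5 / 3 rounded to 28 digits. This stdlib-only sketch checks that value and the `min`/`max` entries; it leaves `std` aside and does not reproduce how pandas itself computes the statistics.

```python
from decimal import Decimal

# The test's input values as Decimals; 2.5 is exactly representable as a float.
values = [Decimal(1), Decimal(2.5), Decimal(3)]

# Decimal division rounds to the context precision (28 digits by default).
mean = sum(values) / len(values)

print(mean)                      # 2.166666666666666666666666667
print(min(values), max(values))  # 1 3
```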
