Commit 9b3c6ac

Fix describe() for ExtensionArrays with multiple internal dtypes
1 parent 4d67bb2 commit 9b3c6ac

3 files changed, +40 -0 lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -909,6 +909,7 @@ Other
 - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
 - Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
 - Bug in :meth:`Series.describe` where median percentile was always included when the ``percentiles`` argument was passed (:issue:`60550`).
+- Bug in :meth:`Series.describe` where statistics with multiple dtypes for ExtensionArrays were coerced to ``float64`` which raised a ``DimensionalityError`` (:issue:`61707`)
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)

pandas/core/methods/describe.py

Lines changed: 13 additions & 0 deletions
@@ -12,6 +12,7 @@
 )
 from typing import (
     TYPE_CHECKING,
+    Any,
     cast,
 )
 
@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
     return names
 
 
+def has_multiple_internal_dtypes(d: list[Any]) -> bool:
+    """Check if the sequence has multiple internal dtypes."""
+    if not d:
+        return False
+
+    return any(type(item) != type(d[0]) for item in d)
+
+
 def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
     """Describe series containing numerical data.
 
@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
                 import pyarrow as pa
 
                 dtype = ArrowDtype(pa.float64())
+        elif has_multiple_internal_dtypes(d):
+            # GH61707: describe() doesn't work on EAs
+            # with multiple internal dtypes, so return object dtype
+            dtype = None
         else:
             dtype = Float64Dtype()
     elif series.dtype.kind in "iufb":
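
The helper added in this diff can be tried on its own, outside pandas. The sketch below copies its body from the hunk above and runs it on a few hand-made lists; the sample inputs are illustrative assumptions, not values taken from the commit.

```python
from decimal import Decimal
from typing import Any


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    """Return True if any item's type differs from the first item's type."""
    if not d:
        return False

    return any(type(item) != type(d[0]) for item in d)


# Illustrative inputs (assumed for this sketch, not from the commit).
print(has_multiple_internal_dtypes([]))                            # False
print(has_multiple_internal_dtypes([Decimal("1"), Decimal("3")]))  # False
print(has_multiple_internal_dtypes([3, Decimal("2.5")]))           # True
print(has_multiple_internal_dtypes([1, True]))                     # True
```

Because the check compares exact types rather than using `isinstance`, a `bool` next to an `int` also counts as "multiple dtypes". In `describe_numeric_1d` that makes no difference for the case targeted here, where an integer `count` sits next to `Decimal` statistics.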

pandas/tests/series/methods/test_describe.py

Lines changed: 26 additions & 0 deletions
@@ -95,6 +95,32 @@ def test_describe_empty_object(self):
         assert np.isnan(result.iloc[2])
         assert np.isnan(result.iloc[3])
 
+    def test_describe_multiple_dtypes(self):
+        """
+        GH61707: describe() doesn't work on EAs which generate
+        statistics with multiple dtypes.
+        """
+        from decimal import Decimal
+
+        from pandas.tests.extension.decimal import to_decimal
+
+        s = Series(to_decimal([1, 2.5, 3]), dtype="decimal")
+
+        expected = Series(
+            [
+                3,
+                Decimal("2.166666666666666666666666667"),
+                Decimal("0.8498365855987974716713706849"),
+                Decimal("1"),
+                Decimal("3"),
+            ],
+            index=["count", "mean", "std", "min", "max"],
+            dtype="object",
+        )
+
+        result = s.describe(percentiles=[])
+        tm.assert_series_equal(result, expected)
+
     def test_describe_with_tz(self, tz_naive_fixture):
         # GH 21332
         tz = tz_naive_fixture
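
To see why the expected result in this test needs `object` dtype, the same statistics can be recomputed with the standard-library `decimal` module alone. This is a minimal sketch, not how pandas or the test-only `DecimalArray` computes them. In particular, dividing the variance by `n` (a population standard deviation) is an assumption, chosen because it is numerically consistent with the expected `std` value in the test.

```python
from decimal import Decimal

values = [Decimal("1"), Decimal("2.5"), Decimal("3")]

n = len(values)          # count is a plain Python int
mean = sum(values) / n   # Decimal / int stays Decimal

# Assumption for this sketch: population variance (divide by n).
variance = sum((v - mean) ** 2 for v in values) / n
std = variance.sqrt()

stats = [n, mean, std, min(values), max(values)]

# The statistics mix two Python types, so no single numeric dtype fits them.
kinds = {type(s).__name__ for s in stats}
print(sorted(kinds))  # ['Decimal', 'int']
print(mean, std)
```

The integer count alongside `Decimal` values is the situation `has_multiple_internal_dtypes` detects, and it is why the fix falls back to `dtype = None` (object) instead of coercing to `Float64Dtype`.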
