Commit c44e2af

Merge remote-tracking branch 'upstream/main' into series-round-object-dtype
2 parents a919646 + 188b2da commit c44e2af

69 files changed: +1150 -1017 lines changed

README.md

Lines changed: 6 additions & 6 deletions
@@ -19,9 +19,9 @@
 **pandas** is a Python package that provides fast, flexible, and expressive data
 structures designed to make working with "relational" or "labeled" data both
 easy and intuitive. It aims to be the fundamental high-level building block for
-doing practical, **real world** data analysis in Python. Additionally, it has
-the broader goal of becoming **the most powerful and flexible open source data
-analysis / manipulation tool available in any language**. It is already well on
+doing practical, **real-world** data analysis in Python. Additionally, it has
+the broader goal of becoming **the most powerful and flexible open-source data
+analysis/manipulation tool available in any language**. It is already well on
 its way towards this goal.

 ## Table of Contents
@@ -64,7 +64,7 @@ Here are just a few of the things that pandas does well:
 data sets
 - [**Hierarchical**][mi] labeling of axes (possible to have multiple
 labels per tick)
-- Robust IO tools for loading data from [**flat files**][flat-files]
+- Robust I/O tools for loading data from [**flat files**][flat-files]
 (CSV and delimited), [**Excel files**][excel], [**databases**][db],
 and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
 - [**Time series**][timeseries]-specific functionality: date range
@@ -138,7 +138,7 @@ or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_in


 ```sh
-python -m pip install -ve . --no-build-isolation -Ceditable-verbose=true
+python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true
 ```

 See the full instructions for [installing from source](https://pandas.pydata.org/docs/dev/development/contributing_environment.html).
@@ -155,7 +155,7 @@ has been under active development since then.

 ## Getting Help

-For usage questions, the best place to go to is [StackOverflow](https://stackoverflow.com/questions/tagged/pandas).
+For usage questions, the best place to go to is [Stack Overflow](https://stackoverflow.com/questions/tagged/pandas).
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).

 ## Discussion and Development

doc/redirects.csv

Lines changed: 0 additions & 1 deletion
@@ -643,7 +643,6 @@ generated/pandas.Index.get_slice_bound,../reference/api/pandas.Index.get_slice_b
 generated/pandas.Index.groupby,../reference/api/pandas.Index.groupby
 generated/pandas.Index.has_duplicates,../reference/api/pandas.Index.has_duplicates
 generated/pandas.Index.hasnans,../reference/api/pandas.Index.hasnans
-generated/pandas.Index.holds_integer,../reference/api/pandas.Index.holds_integer
 generated/pandas.Index,../reference/api/pandas.Index
 generated/pandas.Index.identical,../reference/api/pandas.Index.identical
 generated/pandas.Index.inferred_type,../reference/api/pandas.Index.inferred_type

doc/source/index.rst.template

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ programming language.
     :titlesonly:

     {{ single_doc[:-4] }}
-{% elif single_doc and single_doc.count('.') <= 1 %}
+{% elif single_doc and ((single_doc.count('.') <= 1) or ('tseries' in single_doc)) -%}
 .. autosummary::
     :toctree: reference/api/
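
Reviewer note: the new `elif` condition above can be read as a plain predicate. The sketch below restates it in Python for illustration only. The function name `matches_autosummary_branch` and the sample names are assumptions, not part of the template, and the sketch does not model the template's earlier `if` branch, which is not shown in this hunk.

```python
def matches_autosummary_branch(single_doc: str) -> bool:
    """Mirror of the updated Jinja condition, written as plain Python.

    Hypothetical helper for illustration: the real logic lives in
    doc/source/index.rst.template and is evaluated by Jinja.
    """
    # Same shape as the template expression: a non-empty name that either
    # has at most one dot, or mentions 'tseries' anywhere in it.
    return bool(single_doc) and (
        single_doc.count(".") <= 1 or "tseries" in single_doc
    )


# Sample inputs chosen for illustration.
one_dot = matches_autosummary_branch("pandas.DataFrame")
two_dots = matches_autosummary_branch("pandas.DataFrame.head")
tseries_name = matches_autosummary_branch("pandas.tseries.offsets.Day")
empty = matches_autosummary_branch("")
```

Under the old condition, a dotted name such as `pandas.tseries.offsets.Day` would not have matched because it contains more than one dot. The added `'tseries' in single_doc` clause lets it through.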

doc/source/reference/general_functions.rst

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ Top-level evaluation
 .. autosummary::
    :toctree: api/

+   col
    eval

 Datetime formats

doc/source/user_guide/dsintro.rst

Lines changed: 6 additions & 0 deletions
@@ -553,6 +553,12 @@ a function of one argument to be evaluated on the DataFrame being assigned to.

     iris.assign(sepal_ratio=lambda x: (x["SepalWidth"] / x["SepalLength"])).head()

+or, using :meth:`pandas.col`:
+
+.. ipython:: python
+
+    iris.assign(sepal_ratio=pd.col("SepalWidth") / pd.col("SepalLength")).head()
+
 :meth:`~pandas.DataFrame.assign` **always** returns a copy of the data, leaving the original
 DataFrame untouched.

doc/source/user_guide/migration-3-strings.rst

Lines changed: 45 additions & 20 deletions
@@ -188,6 +188,14 @@ let pandas do the inference. But if you want to be specific, you can specify the
 This is actually compatible with pandas 2.x as well, since in pandas < 3,
 ``dtype="str"`` was essentially treated as an alias for object dtype.

+.. attention::
+
+    While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
+    specifying it as the dtype in :meth:`~Series.astype` runs into the issue
+    of also stringifying missing values in pandas 2.x. See the section
+    :ref:`string_migration_guide-astype_str` for more details.
+
+
 The missing value sentinel is now always NaN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -310,52 +318,69 @@ case.
 Notable bug fixes
 ~~~~~~~~~~~~~~~~~

+.. _string_migration_guide-astype_str:
+
 ``astype(str)`` preserving missing values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
+The stringifying of missing values is a long-standing "bug" or misfeature, as
+discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
+introduces a significant behaviour change.

-With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
-``astype("str")``!), the operation would convert every element to a string,
-including the missing values:
+With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
+would convert every element to a string, including the missing values:

 .. code-block:: python

     # OLD behavior in pandas < 3
-    >>> ser = pd.Series(["a", np.nan], dtype=object)
+    >>> ser = pd.Series([1.5, np.nan])
     >>> ser
-    0      a
+    0    1.5
     1    NaN
-    dtype: object
-    >>> ser.astype(str)
-    0      a
+    dtype: float64
+    >>> ser.astype("str")
+    0    1.5
     1    nan
     dtype: object
-    >>> ser.astype(str).to_numpy()
-    array(['a', 'nan'], dtype=object)
+    >>> ser.astype("str").to_numpy()
+    array(['1.5', 'nan'], dtype=object)

 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
 not the intended behavior, and it was inconsistent with how other dtypes handled
 missing values.

-With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
-for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
-the missing values:
+With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
+to the new string dtype, which preserves the missing values:

 .. code-block:: python

     # NEW behavior in pandas 3
     >>> pd.options.future.infer_string = True
-    >>> ser = pd.Series(["a", np.nan], dtype=object)
-    >>> ser.astype(str)
-    0      a
+    >>> ser = pd.Series([1.5, np.nan])
+    >>> ser.astype("str")
+    0    1.5
     1    NaN
     dtype: str
-    >>> ser.astype(str).values
-    array(['a', nan], dtype=object)
+    >>> ser.astype("str").to_numpy()
+    array(['1.5', nan], dtype=object)

 If you want to preserve the old behaviour of converting every object to a
-string, you can use ``ser.map(str)`` instead.
+string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
+while preserving the missing values in a way that works with both pandas 2.x and
+3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
+can do ``ser.astype("str")``).
+
+If you want to convert to object or string dtype for pandas 2.x and 3.x,
+respectively, without needing to stringify each individual element, you will
+have to use a conditional check on the pandas version.
+For example, to convert a categorical Series with string categories to its
+dense non-categorical version with object or string dtype:
+
+.. code-block:: python
+
+    >>> import pandas as pd
+    >>> ser = pd.Series(["a", np.nan], dtype="category")
+    >>> ser.astype(object if pd.__version__ < "3" else "str")


 ``prod()`` raising for string data
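
Reviewer note: the added example compares `pd.__version__ < "3"` as strings. That works for the 2.x and 3.x versions the guide targets, but string comparison is lexicographic, so a hypothetical two-digit major version would sort before `"3"`. A minimal standard-library sketch of a numeric check follows. The helper names `pandas_major` and `target_dtype` are assumptions for illustration and are not part of pandas or of this commit.

```python
def pandas_major(version: str) -> int:
    """Return the leading integer component of a version string like '2.3.2'."""
    # Only the part before the first dot is parsed, so suffixes such as
    # '3.0.0.dev0' do not affect the result.
    return int(version.split(".")[0])


def target_dtype(version: str):
    """Pick object dtype before pandas 3 and the 'str' dtype from 3 onwards."""
    return object if pandas_major(version) < 3 else "str"


# Lexicographic comparison treats '10.0.0' as smaller than '3'.
naive_result_for_10 = "10.0.0" < "3"

dtype_for_2 = target_dtype("2.3.2")
dtype_for_3 = target_dtype("3.0.0")
dtype_for_10 = target_dtype("10.0.0")
```

In real code the argument would be `pd.__version__`, for example `ser.astype(target_dtype(pd.__version__))`.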

doc/source/whatsnew/v2.3.1.rst

Lines changed: 1 addition & 1 deletion
@@ -73,4 +73,4 @@ Bug fixes
 Contributors
 ~~~~~~~~~~~~

-.. contributors:: v2.3.0..v2.3.1|HEAD
+.. contributors:: v2.3.0..v2.3.1

doc/source/whatsnew/v2.3.2.rst

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,6 @@
 .. _whatsnew_232:

-What's new in 2.3.2 (August XX, 2025)
+What's new in 2.3.2 (August 21, 2025)
 -------------------------------------

 These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
@@ -28,9 +28,13 @@ Bug fixes
 - Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
 - Fixed ``~Series.str.match``, ``~Series.str.fullmatch`` and ``~Series.str.contains``
   with compiled regex for the Arrow-backed string dtype (:issue:`61964`, :issue:`61942`)
+- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently
+  replacing matching values when missing values are present for string dtypes (:issue:`56599`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:

 Contributors
 ~~~~~~~~~~~~
+
+.. contributors:: v2.3.1..v2.3.2|HEAD

doc/source/whatsnew/v3.0.0.rst

Lines changed: 21 additions & 4 deletions
@@ -117,10 +117,28 @@ process in more detail.

 `PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__

-.. _whatsnew_300.enhancements.enhancement2:
+.. _whatsnew_300.enhancements.col:

-Enhancement2
-^^^^^^^^^^^^
+``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can now use ``pd.col`` to create callables for use in dataframe methods which accept them. For example, if you have a dataframe
+
+.. ipython:: python
+
+    df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
+
+and you want to create a new column ``'c'`` by summing ``'a'`` and ``'b'``, then instead of
+
+.. ipython:: python
+
+    df.assign(c = lambda df: df['a'] + df['b'])
+
+you can now write:
+
+.. ipython:: python
+
+    df.assign(c = pd.col('a') + pd.col('b'))

 New Deprecation Policy
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -1094,7 +1112,6 @@ Other
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
 - Bug in :meth:`Series.mode` where an exception was raised when taking the mode with nullable types with no null values in the series. (:issue:`58926`)
 - Bug in :meth:`Series.rank` that doesn't preserve missing values for nullable integers when ``na_option='keep'``. (:issue:`56976`)
-- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently replacing matching instances when ``regex=True`` and missing values are present. (:issue:`56599`)
 - Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` throwing ``ValueError`` when ``regex=True`` and all NA values. (:issue:`60688`)
 - Bug in :meth:`Series.to_string` when series contains complex floats with exponents (:issue:`60405`)
 - Bug in :meth:`read_csv` where chained fsspec TAR file and ``compression="infer"`` fails with ``tarfile.ReadError`` (:issue:`60028`)
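
Reviewer note: the whatsnew entry above describes `pd.col` as creating callables for dataframe methods that accept them. The toy model below illustrates that general idea in pure Python on a dict of lists. It is not pandas' implementation: `ToyCol`, `toy_col`, and `toy_assign` are hypothetical names invented for this sketch, and the real `pandas.core.col` module may work quite differently.

```python
import operator


class ToyCol:
    """Toy deferred column expression: calling it with a table evaluates it."""

    def __init__(self, func):
        self._func = func

    def __call__(self, table):
        return self._func(table)

    def _binary(self, other, op):
        # Build a new deferred expression; nothing is computed until called.
        def evaluate(table):
            left = self(table)
            if isinstance(other, ToyCol):
                right = other(table)
            else:
                right = [other] * len(left)  # broadcast a scalar operand
            return [op(a, b) for a, b in zip(left, right)]

        return ToyCol(evaluate)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __truediv__(self, other):
        return self._binary(other, operator.truediv)


def toy_col(name):
    """Hypothetical stand-in for a column reference by name."""
    return ToyCol(lambda table: table[name])


def toy_assign(table, **new_columns):
    """Return a copy of the table with new columns; callables receive the copy."""
    result = dict(table)
    for name, value in new_columns.items():
        result[name] = value(result) if callable(value) else value
    return result


table = {"a": [1, 1, 2], "b": [4, 5, 6]}
with_lambda = toy_assign(table, c=lambda t: [x + y for x, y in zip(t["a"], t["b"])])
with_col = toy_assign(table, c=toy_col("a") + toy_col("b"))
```

Both calls produce the same new column and leave `table` itself unchanged, which mirrors the lambda versus `pd.col` comparison in the release note.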

pandas/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -105,6 +105,7 @@
     Series,
     DataFrame,
 )
+from pandas.core.col import col

 from pandas.core.dtypes.dtypes import SparseDtype

@@ -281,6 +282,7 @@
     "array",
     "arrays",
     "bdate_range",
+    "col",
     "concat",
     "crosstab",
     "cut",
