
DEPR: PeriodDtype.freq #61897

Open: wants to merge 4 commits into main

Conversation

jbrockmendel
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Motivated by #47227.

First (and by a wide margin smallest) of several PRs to deprecate using freq to refer to a Period's resolution. If we go down this route, the other PRs will incrementally do the same for freq in the Period constructor, then freq as a Period attribute, then the same for PeriodArray/Index. We shouldn't merge this until we're agreed to do this across the board.

This uses "unit" as the replacement, as that is also what we use for Timestamp/Timedelta. But while it has roughly the same semantic meaning, Period.unit does return a different type from Timestamp.unit, which might be a reason to use a third term? Also "unit" is overloaded in to_datetime/to_timedelta. So while "unit" is the best idea I've had, I'll understand if people want to bikeshed.
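The type mismatch mentioned above can be seen with the current attributes. This is a minimal sketch, assuming pandas 2.0 or later (where `Timestamp.unit` exists); the variable names are illustrative only, and the warning filter is there only because accessing `PeriodDtype.freq` would warn if this deprecation lands.

```python
import warnings

import pandas as pd
from pandas.tseries.offsets import BaseOffset

with warnings.catch_warnings():
    # Assumption: on a build with this deprecation, the access below would
    # emit a warning; silence it so the sketch runs either way.
    warnings.simplefilter("ignore")
    period_resolution = pd.PeriodDtype("M").freq

timestamp_unit = pd.Timestamp("2024-01-01").unit

# The Period side is an offset object; the Timestamp side is a plain string.
print(type(period_resolution).__name__, isinstance(period_resolution, BaseOffset))
print(type(timestamp_unit).__name__)
```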

cc @mroeschke @jorisvandenbossche

@jbrockmendel
Member Author

Could do "offset" for the BaseOffset object and "unit" for the string?
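For reference, `Period` already exposes an object/string pair today via `freq` and `freqstr`, which is roughly the split this suggestion would rename. A small sketch using only the existing attributes (the names `offset_obj` and `unit_str` are hypothetical and only mirror the proposed terms):

```python
import pandas as pd

p = pd.Period("2024-01", freq="M")

offset_obj = p.freq      # existing attribute: a BaseOffset subclass instance
unit_str = p.freqstr     # existing attribute: the string alias for that offset

print(type(offset_obj).__name__)
print(type(unit_str).__name__)
```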

@mroeschke added the Period (Period data type) and Deprecate (Functionality to remove in pandas) labels on Jul 21, 2025
@mroeschke
Member

Period.unit does return a different type from Timestamp.unit, which might be a reason to use a third term?

I think this is a good reason to use a different term. I think "offset" or "interval" would be good terms

@jbrockmendel
Member Author

I think this is a good reason to use a different term. I think "offset" or "interval" would be good terms

I'd like to avoid "interval" since that already has a meaning in pandas. I'm happy with "offset".

@jorisvandenbossche are you on board with this plan?

@jorisvandenbossche
Member

Agreed that we should ideally avoid overloading unit.

While I definitely understand the issue with the freq attribute on DatetimeIndex vs PeriodIndex, and that it would be nice for those two cases to have the same meaning, I am not entirely sure this is worth deprecating freq in all the period-related places (and the code churn for its users). Especially if we think the alternative we can come up with is not necessarily a better name (I don't know if that is the case though; freq is also not the ideal term).
(If we do this, IMO we could start by adding the alternatives before actually deprecating.)
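One way to see the difference in meaning between the two `freq` attributes, as a hedged sketch with made-up, irregularly spaced dates: on a `DatetimeIndex`, `freq` describes the regular spacing between elements (and is `None` when there is none), whereas on a `PeriodIndex` it describes the span of each element.

```python
import pandas as pd

# Irregularly spaced dates: there is no regular step between elements.
dates = ["2024-01-01", "2024-01-02", "2024-01-05"]

dti = pd.DatetimeIndex(dates)
pi = pd.PeriodIndex(dates, freq="D")

print(dti.freq)  # no regular spacing is set on this index
print(pi.freq)   # each element still spans one day
```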

An alternative would be to add a new method to DTI/PI to avoid the conflict?


On alternative names for Period's freq: while "offset" is an obvious choice given that we use this term in our implementation, and therefore I am also used to hearing this term, I am not actually sure this is a very clear term for newcomers? Outside of the context of pandas, I wouldn't directly think about those kinds of periods when hearing "offset" (and for "time offset" I would mostly think about the +/-HH:MM offset for tz-aware timestamps).

Looking at R's lubridate / Java's JodaTime, they have the concepts of durations, periods and intervals, where a duration is essentially our timedelta (absolute length of time in seconds), a period is a calendar time length, and an interval is a span with specific start/stop instants (so like our Interval but specific to timestamps).
So in that sense, you could say that the pandas.Period represents a "period" (day, month, etc) at a certain point in time. But using pd.Period.period is probably not going to be less confusing ..?
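The duration vs. calendar-period distinction can be illustrated in pandas terms. This is only a sketch: treating `pd.DateOffset` as the analogue of a lubridate/JodaTime "period" is an assumption made for illustration, not something stated in this thread.

```python
import pandas as pd

start = pd.Timestamp("2024-01-31")

# Duration: an absolute length of time (always exactly 30 days).
after_duration = start + pd.Timedelta(days=30)

# Calendar period: "one month later", whose absolute length depends on the date.
after_period = start + pd.DateOffset(months=1)

print(after_duration)  # 2024-03-01 00:00:00
print(after_period)    # 2024-02-29 00:00:00
```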

Arrow (and some SQL systems) call the period from JodaTime/lubridate an "interval", so that could also be an option as mentioned above, but indeed that then conflicts with our pd.Interval ..

Suppose we added a pandas extension dtype for Arrow's interval type of data (i.e. what you can now store as pd.ArrowDtype(pa.month_day_nano_interval())), what would we call that?

@jorisvandenbossche
Member

I think I am also coming around to liking unit .. ("the unit of time that each period represents")
While it is annoying that the type is different (str vs object), it does match in meaning. Timestamps just represent an instant in time and only support small (fixed-size) units, while periods represent the span for the same units and additionally support larger (and relative) units.

But as Brock mentioned in the top post, unit is then also already overloaded in to_datetime

Yet another idea: "span", indicating the time span of the Period (although I don't know if people's first connotation for "time span" is the absolute vs the relative version .., while here it would of course represent a relative time span)

(sorry, not being very helpful here in coming closer to a decision ..)

3 participants