Skip to content

Conversation

@Ben-Cutler
Copy link

This PR fixes some behavior with the datetime guesser code. Previously, if it was supplied with an array such as:

["01/02/2025", "30/07/2025"]

Then the parser would call tslib.first_non_null and assume that the format string for the entire array was month/day/year while the series is actually day/month/year.

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@Ben-Cutler Ben-Cutler changed the title [BUG] Make the datetime guesser cover edge cases [BUG] Make the datetime guesser cover edge case where the first date's formatting isn't the whole list Nov 3, 2025
@Ben-Cutler Ben-Cutler changed the title [BUG] Make the datetime guesser cover edge case where the first date's formatting isn't the whole list [BUG] Make the datetime guesser cover edge case where the first date's formatting isn't the whole list's formatting Nov 3, 2025
Comment on lines -1075 to -1097
cdef void _maybe_warn_about_dayfirst(format: str, bint dayfirst) noexcept:
"""Warn if guessed datetime format doesn't respect dayfirst argument."""
cdef:
int day_index = format.find("%d")
int month_index = format.find("%m")

if (day_index != -1) and (month_index != -1):
if (day_index > month_index) and dayfirst:
warnings.warn(
f"Parsing dates in {format} format when dayfirst=True was specified. "
"Pass `dayfirst=False` or specify a format to silence this warning.",
UserWarning,
stacklevel=find_stack_level(),
)
if (day_index < month_index) and not dayfirst:
warnings.warn(
f"Parsing dates in {format} format when dayfirst=False (the default) "
"was specified. "
"Pass `dayfirst=True` or specify a format to silence this warning.",
UserWarning,
stacklevel=find_stack_level(),
)

Copy link
Author

@Ben-Cutler Ben-Cutler Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what to do with this code. Are these kinds of warnings helpful? They were causing my tests to fail on pytest teardown, and was being logged on tests which had different formatting for the input string (So if I specified dayfirst = True on a month first input it'd raise the warning, and if I specified dayfirst=False on a day first test case it'd fail.

Happy to revert this or make some other change

) -> str | None:
# Try to guess the format based on the first non-NaN element, return None if can't
if (first_non_null := tslib.first_non_null(arr)) != -1:
if type(first_non_nan_element := arr[first_non_null]) is str:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to replace if type(x) == str with isinstance(x, str) so that we could consume np.str types

@jbrockmendel
Copy link
Member

cc @MarcoGorelli i think this was your idea

@Ben-Cutler
Copy link
Author

Let me know if you like the changes (from a high level) and I can fix the rest of the test cases and tag you when things are ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants