fix: BaberghDistrictCouncil - handle "Today - " / "Tomorrow - " date prefix #2080

Open
pacso wants to merge 2 commits into robbrad:master from pacso:fix/babergh-today-prefix-date-parsing


@pacso pacso commented May 15, 2026

Summary

Babergh's collection-day site now prefixes the Next Collection value with Today - or Tomorrow - , e.g. Today - Thu 14 May 2026. The parser previously called strptime(single_date, "%a %d %b %Y") which raises ValueError on the prefixed form, crashing every scrape since the format change on 14 May 2026.

Switched to the same regex-based date extraction MidSuffolkDistrictCouncil already uses — Babergh and Mid Suffolk are paired districts on the same web platform, so the formats track in lockstep. The regex pulls every well-formed Day DD Mon YYYY token out of each <p>, naturally ignoring the Today - / Tomorrow - / Frequency: prefixes and still handling the existing multi-date Following Collections: case.
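A minimal sketch of the extraction described above. The diff itself is not part of this conversation, so `DATE_RE`, `extract_dates`, and the sample strings are illustrative assumptions rather than the PR's actual `_DATE_RE` or page markup. Note that `%a` and `%b` are locale-dependent, so this assumes an English locale.

```python
import re
from datetime import datetime

# Hypothetical pattern (not copied from the PR): a weekday abbreviation,
# a 1-2 digit day, a 3-letter month abbreviation and a 4-digit year.
DATE_RE = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2} [A-Z][a-z]{2} \d{4}\b"
)
DATE_FORMAT = "%a %d %b %Y"


def extract_dates(text: str) -> list[datetime]:
    """Return every 'Day DD Mon YYYY' token in text as a datetime.

    Anything around the tokens ("Today - ", "Frequency:", commas)
    is simply never matched, so it needs no special handling.
    """
    return [datetime.strptime(m, DATE_FORMAT) for m in DATE_RE.findall(text)]


# Illustrative inputs modelled on the strings quoted in the summary.
print(extract_dates("Next Collection: Today - Thu 14 May 2026"))
print(extract_dates("Following Collections: Thu 21 May 2026, Thu 28 May 2026"))
print(extract_dates("Frequency: Fortnightly"))
```

Because the pattern only matches complete date tokens, a paragraph without one yields an empty list instead of an exception.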

Error this fixes

ValueError: time data 'Today - Thu 14 May 2026' does not match format '%a %d %b %Y'
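The failure can be reproduced outside the scraper with the standard library alone. This is a small sketch, assuming an English locale; `parse_bare` is a hypothetical stand-in for the old call site, not a function from the repository.

```python
from datetime import datetime


def parse_bare(value: str) -> datetime:
    """Parse a bare 'Day DD Mon YYYY' string, as the old code expected."""
    return datetime.strptime(value, "%a %d %b %Y")


# The bare form parses without trouble.
bare = parse_bare("Thu 14 May 2026")

# The prefixed form does not match the format string and raises ValueError.
try:
    parse_bare("Today - Thu 14 May 2026")
except ValueError as exc:
    message = str(exc)
    print(message)
```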

Summary by CodeRabbit

  • Bug Fixes
    • Improved collection date extraction for Babergh District Council to ensure more reliable and accurate bin collection schedules.


…prefix

The council's website now renders "Next Collection:" values as
"Today - Thu 14 May 2026" (or "Tomorrow - ..."), which broke the
strptime call expecting bare "%a %d %b %Y" strings. Switch to the same
regex-based date extraction MidSuffolkDistrictCouncil already uses —
these are paired councils on the same web platform, so the formats
move in lockstep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 15, 2026

Walkthrough

Refactored date parsing in the Babergh District Council module from string manipulation to regex-based extraction. Added an import re statement, defined a compiled regex pattern (_DATE_RE) to match the council's collection date format, and updated the parse_data method to extract dates from paragraph text using the regex instead of splitting on colons and commas.

Changes

Date Extraction Refactor

| Layer / File(s) | Summary |
|---|---|
| Regex-based date extraction implementation `uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py` | Import `re`, define `_DATE_RE` pattern to match dates like `Thu 14 May 2026`, and refactor `parse_data` to skip frequency paragraphs and apply `_DATE_RE.findall()` to extract collection dates, parsing each via `datetime.strptime`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related issues

Suggested reviewers

  • dp247

Poem

🐰 A regex feast for dates so fine,
No more splitting on colons divine,
Thu 14 May is caught in the net,
Babergh's bin days won't be forgot yet! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main fix: handling date prefixes ('Today - ' / 'Tomorrow - ') in Babergh's collection date parsing, which aligns directly with the code changes and PR objectives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py`:
- Around line 127-148: When parsing each p_tag text, add an explicit guard that
fails fast if a collection-labelled line contains no parseable dates: after
computing matches = _DATE_RE.findall(text) and before iterating, if the text
begins with "Next Collection" or "Following Collections" (or otherwise appears
to describe collections) and matches is empty, raise a clear exception (e.g.,
ValueError) that includes the offending text and the collection_type; otherwise
proceed to parse each date_str into collection_date and append to data["bins"]
using date_format. This ensures _DATE_RE, collection_type, date_format and the
data["bins"] append logic are unchanged but will surface format changes
immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e39a5ff4-3553-4f99-a0c5-a3ebec565d57

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 1c34dab.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py

Add a guard so a `<p>` starting with `Next Collection` or
`Following Collections` that yields zero regex matches raises
ValueError instead of silently emitting nothing. The previous
strptime-based code failed loudly on format changes (that's how
this PR's bug was discovered); the regex approach would otherwise
mask the next format change as "no upcoming collection". Addresses
CodeRabbit review feedback on robbrad#2080.
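The guard described in this commit message might look roughly like the sketch below. The commit's diff is not shown here, so `DATE_RE`, `COLLECTION_LABELS`, and `dates_or_raise` are hypothetical names, and the pattern is an assumed stand-in for the PR's `_DATE_RE`. It also assumes an English locale for `%a` and `%b`.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the PR's _DATE_RE.
DATE_RE = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2} [A-Z][a-z]{2} \d{4}\b"
)
# Labels that mark a paragraph as one that must contain dates.
COLLECTION_LABELS = ("Next Collection", "Following Collections")


def dates_or_raise(text: str, collection_type: str) -> list[datetime]:
    """Extract dates, failing loudly if a collection line has none."""
    matches = DATE_RE.findall(text)
    if not matches and text.strip().startswith(COLLECTION_LABELS):
        # A labelled line without dates most likely means the site
        # changed format again; surface that instead of returning [].
        raise ValueError(
            f"No parseable dates in {text!r} for {collection_type!r}"
        )
    return [datetime.strptime(m, "%a %d %b %Y") for m in matches]
```

Paragraphs that do not start with a collection label, such as a frequency line, still return an empty list, so only lines that are expected to carry dates can trigger the error.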

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>