fix: BaberghDistrictCouncil - handle "Today - " / "Tomorrow - " date prefix by pacso · Pull Request #2080 · robbrad/UKBinCollectionData

pacso · 2026-05-15T08:55:58Z

Summary

Babergh's collection-day site now prefixes the Next Collection value with `Today - ` or `Tomorrow - `, e.g. `Today - Thu 14 May 2026`. The parser previously called `strptime(single_date, "%a %d %b %Y")`, which raises `ValueError` on the prefixed form, crashing every scrape since the format change on 14 May 2026.

Switched to the same regex-based date extraction `MidSuffolkDistrictCouncil` already uses. Babergh and Mid Suffolk are paired districts on the same web platform, so the formats track in lockstep. The regex pulls every well-formed `Day DD Mon YYYY` token out of each `<p>`, naturally ignoring the `Today - ` / `Tomorrow - ` / `Frequency:` prefixes and still handling the existing multi-date `Following Collections:` case.
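For illustration, the extraction described above could look roughly like the sketch below. The pattern and the `extract_dates` helper are assumptions written for this example (the PR's actual `_DATE_RE` is not shown on this page), and the sample paragraph strings are modelled on the formats mentioned in the summary rather than copied from the council site.

```python
import re
from datetime import datetime

# Hypothetical pattern: matches tokens such as "Thu 14 May 2026".
# The real _DATE_RE in the PR may differ.
DATE_RE = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2} "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}\b"
)


def extract_dates(text: str) -> list[datetime]:
    """Return every 'Day DD Mon YYYY' date found in a paragraph's text."""
    # findall returns the full match because all groups are non-capturing.
    return [datetime.strptime(m, "%a %d %b %Y") for m in DATE_RE.findall(text)]


# Prefixes such as "Today - " are simply never matched by the pattern.
print(extract_dates("Next Collection: Today - Thu 14 May 2026"))
print(extract_dates("Following Collections: Thu 28 May 2026, Thu 11 Jun 2026"))
print(extract_dates("Frequency: Fortnightly"))
```

Because the regex only picks out complete date tokens, any surrounding label or prefix can change without breaking the `strptime` call.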

Error this fixes

ValueError: time data 'Today - Thu 14 May 2026' does not match format '%a %d %b %Y'
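The failure can be reproduced in isolation with a small sketch like the one below. `try_parse` is a hypothetical helper for this example only, not a function from the repository.

```python
from datetime import datetime


def try_parse(value: str):
    """Parse with the old bare format; return the error text on failure."""
    try:
        return datetime.strptime(value, "%a %d %b %Y")
    except ValueError as exc:
        return str(exc)


# The bare form parses; the prefixed form raises ValueError inside strptime.
print(try_parse("Thu 14 May 2026"))
print(try_parse("Today - Thu 14 May 2026"))
```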

Summary by CodeRabbit

Bug Fixes
- Improved collection date extraction for Babergh District Council to ensure more reliable and accurate bin collection schedules.

…prefix The council's website now renders "Next Collection:" values as "Today - Thu 14 May 2026" (or "Tomorrow - ..."), which broke the strptime call expecting bare "%a %d %b %Y" strings. Switch to the same regex-based date extraction MidSuffolkDistrictCouncil already uses — these are paired councils on the same web platform, so the formats move in lockstep. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-15T08:56:12Z

Walkthrough

Refactored date parsing in the Babergh District Council module from string manipulation to regex-based extraction. Added an `import re` statement, defined a compiled regex pattern (`_DATE_RE`) to match the council's collection date format, and updated the `parse_data` method to extract dates from paragraph text using the regex instead of splitting on colons and commas.

Changes

Date Extraction Refactor

| Layer / File(s) | Summary |
| --- | --- |
| Regex-based date extraction implementation `uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py` | Import `re`, define `_DATE_RE` pattern to match dates like `Thu 14 May 2026`, and refactor `parse_data` to skip frequency paragraphs and apply `_DATE_RE.findall()` to extract collection dates, parsing each via `datetime.strptime`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related issues

Mid Suffolk District Council - Date Parsing Error #1904: Addresses the same root cause—robust date extraction for council collection parsing using regex before datetime.strptime.

Suggested reviewers

dp247

Poem

🐰 A regex feast for dates so fine,
No more splitting on colons divine,
Thu 14 May is caught in the net,
Babergh's bin days won't be forgot yet! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main fix: handling date prefixes ('Today - ' / 'Tomorrow - ') in Babergh's collection date parsing, which aligns directly with the code changes and PR objectives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py`:
- Around line 127-148: When parsing each p_tag text, add an explicit guard that
fails fast if a collection-labelled line contains no parseable dates: after
computing matches = _DATE_RE.findall(text) and before iterating, if the text
begins with "Next Collection" or "Following Collections" (or otherwise appears
to describe collections) and matches is empty, raise a clear exception (e.g.,
ValueError) that includes the offending text and the collection_type; otherwise
proceed to parse each date_str into collection_date and append to data["bins"]
using date_format. This ensures _DATE_RE, collection_type, date_format and the
data["bins"] append logic are unchanged but will surface format changes
immediately.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e39a5ff4-3553-4f99-a0c5-a3ebec565d57

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 1c34dab.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py

Add a guard so a `<p>` starting with `Next Collection` or `Following Collections` that yields zero regex matches raises ValueError instead of silently emitting nothing. The previous strptime-based code failed loudly on format changes (that's how this PR's bug was discovered); the regex approach would otherwise mask the next format change as "no upcoming collection". Addresses CodeRabbit review feedback on robbrad#2080. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
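A minimal sketch of such a guard is shown below, under stated assumptions: the regex, the `parse_paragraph` helper, the `%d/%m/%Y` output format, and the `type` / `collectionDate` keys are all illustrative choices for this example and are not taken from the diff.

```python
import re
from datetime import datetime

# Hypothetical stand-ins for the module's _DATE_RE and date_format.
DATE_RE = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{1,2} "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}\b"
)
DATE_FORMAT = "%d/%m/%Y"  # assumed output format
COLLECTION_LABELS = ("Next Collection", "Following Collections")


def parse_paragraph(text: str, collection_type: str) -> list[dict]:
    """Extract bin entries from one paragraph, failing loudly on format drift."""
    matches = DATE_RE.findall(text)
    # A collection-labelled line with no dates means the site format changed.
    if text.strip().startswith(COLLECTION_LABELS) and not matches:
        raise ValueError(
            f"No parseable date for {collection_type!r} in {text!r}"
        )
    return [
        {
            "type": collection_type,
            "collectionDate": datetime.strptime(m, "%a %d %b %Y").strftime(DATE_FORMAT),
        }
        for m in matches
    ]
```

With this shape, a `Frequency:` paragraph still yields an empty list, while a `Next Collection` line in an unrecognised format raises instead of silently reporting no upcoming collection.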

coderabbitai Bot reviewed May 15, 2026

pacso wants to merge 2 commits into robbrad:master from pacso:fix/babergh-today-prefix-date-parsing
