fix: BaberghDistrictCouncil - handle "Today - " / "Tomorrow - " date prefix #2080
pacso wants to merge 2 commits into
…prefix

The council's website now renders "Next Collection:" values as "Today - Thu 14 May 2026" (or "Tomorrow - ..."), which broke the `strptime` call expecting bare "%a %d %b %Y" strings. Switch to the same regex-based date extraction MidSuffolkDistrictCouncil already uses; these are paired councils on the same web platform, so the formats move in lockstep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
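A minimal sketch of the failure and the fix, assuming a date regex of roughly this shape. `DATE_RE`, `old_parse` and `new_parse` are hypothetical names; the real `_DATE_RE` shared with `MidSuffolkDistrictCouncil` may be written differently. `strptime` requires the whole string to match the format, so any prefix breaks it, while `findall` only picks out well-formed date tokens.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the module's date regex; the real pattern in
# BaberghDistrictCouncil.py / MidSuffolkDistrictCouncil.py may differ.
# It matches one "Day DD Mon YYYY" token, e.g. "Thu 14 May 2026".
DATE_RE = re.compile(r"\b[A-Z][a-z]{2} \d{1,2} [A-Z][a-z]{2} \d{4}\b")
DATE_FMT = "%a %d %b %Y"


def old_parse(value: str) -> datetime:
    # Previous approach: the entire string must match the format exactly,
    # so "Today - Thu 14 May 2026" raises ValueError.
    return datetime.strptime(value, DATE_FMT)


def new_parse(value: str) -> list[datetime]:
    # Regex approach: find every well-formed date token and ignore any
    # surrounding text such as "Today - " or "Tomorrow - ".
    return [datetime.strptime(s, DATE_FMT) for s in DATE_RE.findall(value)]


if __name__ == "__main__":
    value = "Today - Thu 14 May 2026"
    try:
        old_parse(value)
    except ValueError as exc:
        print("old parser failed:", exc)
    print("new parser:", new_parse(value))
```

`new_parse` returns a list because `findall` yields every match, so a value with one date gives a one-element list and the same code also covers lines carrying several dates.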
📝 Walkthrough

Refactored date parsing in the Babergh District Council module from string manipulation to regex-based extraction. Added an

Changes

Date Extraction Refactor
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py`:
- Around line 127-148: When parsing each p_tag text, add an explicit guard that
fails fast if a collection-labelled line contains no parseable dates: after
computing matches = _DATE_RE.findall(text) and before iterating, if the text
begins with "Next Collection" or "Following Collections" (or otherwise appears
to describe collections) and matches is empty, raise a clear exception (e.g.,
ValueError) that includes the offending text and the collection_type; otherwise
proceed to parse each date_str into collection_date and append to data["bins"]
using date_format. This ensures _DATE_RE, collection_type, date_format and the
data["bins"] append logic are unchanged but will surface format changes
immediately.
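A sketch of the guard the review asks for, folded into the parse-and-append loop. `add_bins`, `_COLLECTION_PREFIXES` and the regex body are hypothetical; the `date_format` value of `"%d/%m/%Y"` and the `{"type": ..., "collectionDate": ...}` entry shape are assumed from the project's usual output rather than taken from this diff.

```python
import re
from datetime import datetime

# Assumption: the project's shared output format is "%d/%m/%Y"; check the
# package's common module for the real date_format value.
date_format = "%d/%m/%Y"

# Hypothetical stand-in for _DATE_RE; the real pattern may differ.
_DATE_RE = re.compile(r"\b[A-Z][a-z]{2} \d{1,2} [A-Z][a-z]{2} \d{4}\b")

# Line prefixes that always describe collections, per the review comment.
_COLLECTION_PREFIXES = ("Next Collection", "Following Collections")


def add_bins(data: dict, collection_type: str, p_texts: list[str]) -> None:
    """Append one entry to data["bins"] per date found in the <p> texts."""
    for text in p_texts:
        matches = _DATE_RE.findall(text)
        if not matches and text.startswith(_COLLECTION_PREFIXES):
            # Fail fast: a collection line without a parseable date means
            # the site's format changed again.
            raise ValueError(
                f"No parseable date for {collection_type!r} in {text!r}"
            )
        # Lines such as "Frequency: ..." yield no matches and are skipped.
        for date_str in matches:
            collection_date = datetime.strptime(date_str, "%a %d %b %Y")
            data["bins"].append(
                {
                    "type": collection_type,
                    "collectionDate": collection_date.strftime(date_format),
                }
            )
```

With illustrative input `["Next Collection: Today - Thu 14 May 2026", "Following Collections: Thu 28 May 2026, Thu 11 Jun 2026", "Frequency: Fortnightly"]`, this appends three entries (14/05/2026, 28/05/2026, 11/06/2026) and skips the `Frequency:` line, whereas `"Next Collection: TBC"` raises `ValueError`.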
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e39a5ff4-3553-4f99-a0c5-a3ebec565d57
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py
Add a guard so a `<p>` starting with `Next Collection` or `Following Collections` that yields zero regex matches raises `ValueError` instead of silently emitting nothing. The previous `strptime`-based code failed loudly on format changes (that's how this PR's bug was discovered); the regex approach would otherwise mask the next format change as "no upcoming collection".

Addresses CodeRabbit review feedback on robbrad#2080.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Babergh's collection-day site now prefixes the Next Collection value with `Today -` or `Tomorrow -`, e.g. `Today - Thu 14 May 2026`. The parser previously called `strptime(single_date, "%a %d %b %Y")`, which raises `ValueError` on the prefixed form, crashing every scrape since the format change on 14 May 2026.

Switched to the same regex-based date extraction `MidSuffolkDistrictCouncil` already uses: Babergh and Mid Suffolk are paired districts on the same web platform, so the formats track in lockstep. The regex pulls every well-formed `Day DD Mon YYYY` token out of each `<p>`, naturally ignoring the `Today -` / `Tomorrow -` / `Frequency:` prefixes and still handling the existing multi-date `Following Collections:` case.

Error this fixes
Summary by CodeRabbit