
feat: add westminster city council scraper #2086

Open
InertiaUK wants to merge 3 commits into robbrad:master from InertiaUK:feat/westminster-city-council

@InertiaUK
Contributor

@InertiaUK InertiaUK commented May 22, 2026

Summary

  • New scraper for Westminster City Council (population ~261k, central London)
  • Westminster uses a USRN-based street schedule at transact.westminster.gov.uk, not the standard UPRN system
  • Parses the rubbish and recycling HTML tables and converts day-of-week schedules (e.g. "Mon-Fri") into concrete collection dates for the next 14 days
  • Filters out business-only collections
  • Pure HTTP with requests + BeautifulSoup - no Selenium needed
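A minimal sketch of the date-expansion step, assuming the weekday text has already been parsed into Python weekday numbers (Monday = 0) and that the window starts tomorrow and spans 14 days, as the summary describes. The name `next_dates_for_weekdays` is modelled on the `_next_dates_for_weekdays()` helper mentioned in the walkthrough, but this body is illustrative, not the PR's code.

```python
from datetime import date, timedelta

# Assumed to mirror the PR's LOOKAHEAD_DAYS constant (14 days per the summary).
LOOKAHEAD_DAYS = 14


def next_dates_for_weekdays(weekdays, today=None, lookahead=LOOKAHEAD_DAYS):
    """Expand a set of weekday numbers (Mon=0 ... Sun=6) into concrete dates.

    The window starts tomorrow and covers `lookahead` days, so a
    "Mon-Fri" schedule yields ten dates over a 14-day window.
    """
    today = today or date.today()
    dates = []
    for offset in range(1, lookahead + 1):
        day = today + timedelta(days=offset)
        if day.weekday() in weekdays:
            dates.append(day)
    return dates
```

Passing `today` explicitly keeps the function deterministic, which makes it easy to unit-test without freezing the clock.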

Notes

Test plan

  • Tested with USRN 8401405 (Marylebone High Street) - returns rubbish + recycling dates
  • Also verified with USRNs 8401123 (Baker Street) and 8400910 (Wood's Mews)

Summary by CodeRabbit

  • New Features

    • Added Westminster City Council support: lookup by postcode/USRN and conversion of recurring weekday schedules into concrete upcoming pickup dates.
    • Improved handling of weekday ranges and lists for accurate schedule interpretation.
  • Tests

    • Updated test data entries to reflect minor content adjustments and the new council sample.


Westminster uses a USRN-based street schedule at transact.westminster.gov.uk.
Parses rubbish and recycling tables, converts day-of-week schedules to concrete
collection dates. Pure HTTP - no Selenium needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Warning

Review limit reached

@InertiaUK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 2 reviews/hour. Refill in 21 minutes and 17 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14cf10ad-e3c4-462b-817b-97a35fb23f3a

📥 Commits

Reviewing files that changed from the base of the PR and between ce6fff6 and 22af78c.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py
📝 Walkthrough


Adds a WestminsterCityCouncil scraper plus test-data entry. It resolves USRNs from postcodes, parses Westminster street report HTML for weekday-based rubbish and recycling schedules, converts weekday sets into upcoming concrete collection dates, and outputs deduplicated, sorted bin collection entries.

Changes

Westminster City Council Support

Layer | File(s) | Summary
--- | --- | ---
Test data updates and new Westminster entry | uk_bin_collection/tests/input.json | Edited EnvironmentFirst wiki_note, reinserted NorthHertfordshireDistrictCouncil block, and added a new WestminsterCityCouncil test configuration (postcode, transact.westminster.gov.uk lookup URL, wiki_name/wiki_note, LAD24CD).
Module constants and street normalisation | uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py (lines 1–67) | Adds imports, DAY_ABBREV_MAP, LOOKAHEAD_DAYS, Westminster endpoint URLs, a road-name normalisation regex, and _normalise_street() helper.
USRN resolution | uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py (lines 69–190) | Adds _resolve_usrn_from_postcode() implementing postcodes.io lookup, Nominatim reverse-geocoding, Westminster street search dropdown parsing, and exact/fuzzy matching to select a USRN or raise ValueError.
Weekday parsing and date generation | uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py (lines 192–255) | Adds _parse_day_list() to convert textual weekday ranges/lists to weekday numbers and _next_dates_for_weekdays() to generate upcoming dates within lookahead starting from tomorrow.
Scraper core: parse_data | uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py (lines 257–383) | Implements CouncilClass.parse_data() to resolve USRN/uprn, fetch and parse street report tables (rubbish table 0, recycling table 1 excluding business-only), merge weekdays by service, expand to dates, deduplicate and format bin entries, sort chronologically, and raise on no collections.
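The last stage of parse_data (deduplicate, format, sort, raise on no collections) can be sketched as below. This is a hypothetical helper, not the PR's code: the `{"bins": [{"type": ..., "collectionDate": ...}]}` shape and the dd/mm/YYYY format are assumptions based on how other councils in this project usually return results, since the PR text does not show them.

```python
DATE_FORMAT = "%d/%m/%Y"  # assumption: the project's usual dd/mm/YYYY output format


def build_bins(service_dates):
    """service_dates: mapping of bin type -> iterable of datetime.date."""
    seen = set()
    entries = []
    for bin_type, dates in service_dates.items():
        for d in dates:
            key = (bin_type, d)
            if key in seen:
                continue  # emit each service at most once per day
            seen.add(key)
            entries.append((d, bin_type))
    if not entries:
        # Fail loudly rather than returning an empty schedule.
        raise ValueError("No collections found for this street")
    entries.sort()  # chronological; ties broken by bin type name
    return {
        "bins": [
            {"type": bin_type, "collectionDate": d.strftime(DATE_FORMAT)}
            for d, bin_type in entries
        ]
    }
```

Sorting on real `date` objects before formatting avoids the classic bug of sorting dd/mm/YYYY strings lexically.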

Sequence Diagram

sequenceDiagram
  participant Caller as parse_data caller
  participant PostcodesIO as postcodes.io
  participant Nominatim as Nominatim
  participant WestminsterSearch as Westminster street search
  participant WestminsterReport as transact.westminster.gov.uk
  participant HTMLParser as HTML parser
  participant DateHelpers as Date helpers
  participant Output as Formatted bins

  Caller->>PostcodesIO: lookup(postcode)
  PostcodesIO-->>Caller: lat,lng
  Caller->>Nominatim: reverse_geocode(lat,lng)
  Nominatim-->>Caller: street name
  Caller->>WestminsterSearch: search(street)
  WestminsterSearch-->>Caller: USRN candidates
  Caller->>WestminsterReport: fetch(street_report?usrn)
  WestminsterReport-->>HTMLParser: HTML page
  HTMLParser-->>Caller: rubbish/recycling weekday text
  Caller->>DateHelpers: parse days & generate dates
  DateHelpers-->>Caller: concrete collection dates
  Caller->>Output: dedupe, format, sort -> bins list
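The first two hops of the diagram (postcode to coordinates via postcodes.io, then coordinates to a street name via Nominatim reverse geocoding) might look like the sketch below. The function names and the User-Agent string are hypothetical; the Westminster dropdown step is omitted because its URL and form fields are not given in the PR text. The fetcher is injectable so the logic can be tested without network access.

```python
POSTCODES_IO_URL = "https://api.postcodes.io/postcodes/{}"
NOMINATIM_REVERSE_URL = "https://nominatim.openstreetmap.org/reverse"


def http_get_json(url, params=None):
    """Default fetcher; requests is imported lazily so tests can stub this out."""
    import requests

    # Nominatim's usage policy asks for an identifying User-Agent (placeholder value).
    resp = requests.get(
        url,
        params=params,
        headers={"User-Agent": "uk-bin-collection-example"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def road_from_postcode(postcode, get_json=http_get_json):
    """postcode -> (lat, lng) via postcodes.io -> road name via Nominatim."""
    result = get_json(POSTCODES_IO_URL.format(postcode.replace(" ", ""))).get("result")
    if not result:
        raise ValueError(f"postcodes.io returned no result for {postcode!r}")
    place = get_json(
        NOMINATIM_REVERSE_URL,
        params={"lat": result["latitude"], "lon": result["longitude"], "format": "jsonv2"},
    )
    road = place.get("address", {}).get("road")
    if not road:
        raise ValueError(f"Nominatim returned no road name for {postcode!r}")
    return road
```

Raising `ValueError` at each hop keeps a failed lookup distinguishable from a street that simply has no collections.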

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested reviewers

  • dp247

Poem

🐰 Westminster streets I sniff and map,
USRN whispers guide my nap,
Weekday words to dates I turn,
Rubbish and recycling in ordered churn,
A rabbit hops—collections on the map!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The pull request title clearly and directly describes the main change: adding a new Westminster City Council scraper module.
Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (22af78c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2086   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/tests/input.json`:
- Line 882: The "wiki_note" JSON value contains a Unicode replacement character
(�); replace that character with a proper em-dash (U+2014) or a simple hyphen,
e.g. change the string containing "you can use [FindMyAddress]... of your
property�you can use" to use "—" (or "-" if preferred) and save the file with
UTF-8 encoding (or escape the em-dash as "\u2014" if you must keep ASCII-only
escapes).

In `@uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py`:
- Around line 17-18: The file docstring/comment incorrectly states "next 7 days"
while the constant LOOKAHEAD_DAYS is set to 14; update the comment(s) (including
the header comment and the line around 38) to accurately reflect the actual
behavior (e.g., "outputs the next 14 days of collection dates") or change
LOOKAHEAD_DAYS to 7 if the intended behavior was 7 days; locate references to
LOOKAHEAD_DAYS and the top-of-file comment in WestminsterCityCouncil.py and make
the text consistent with the chosen value.
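A small stdlib-only check can catch the stray U+FFFD from the first inline comment before it lands again. `find_replacement_chars` and `check_file` are hypothetical helpers, not part of the repository; they walk the parsed JSON and report the path of every string that still contains the replacement character.

```python
import json

REPLACEMENT_CHAR = "\ufffd"


def find_replacement_chars(node, path="$"):
    """Yield JSON paths of string values that contain U+FFFD."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_replacement_chars(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_replacement_chars(value, f"{path}[{i}]")
    elif isinstance(node, str) and REPLACEMENT_CHAR in node:
        yield path


def check_file(filename):
    # Read explicitly as UTF-8 so the result does not depend on the platform locale.
    with open(filename, encoding="utf-8") as fh:
        return list(find_replacement_chars(json.load(fh)))
```

If the file must stay ASCII-only, `json.dumps` with its default `ensure_ascii=True` writes an em-dash as the `\u2014` escape the review mentions.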

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8cbe53e6-2f70-4bd9-8f41-1209c2295395

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 9e2c4c8.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py

Comment thread uk_bin_collection/tests/input.json Outdated
Comment thread uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py Outdated
Adds 3-step USRN resolution: postcodes.io for lat/lng, Nominatim for
street name, Westminster dropdown for USRN match. No API keys needed.
Falls back to a direct USRN if the uprn kwarg is provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py (1)

222-237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail on unsupported weekday tokens instead of partially parsing them.

The part[:3] fallback turns any token that merely starts with a weekday into a valid match. For example, "Mon - Fri except bank holidays" becomes just Monday, and other unknown fragments are silently dropped. That will generate wrong collection dates instead of detecting a format change. Validate each non-empty token against the supported formats and raise on anything else. Based on learnings: in uk_bin_collection/**/*.py, prefer explicit failures on unexpected formats over silent defaults or swallowed errors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py`
around lines 222 - 237, The parser currently accepts any token that starts with
a weekday because of the fallback using part[:3], which silently swallows
unexpected text; change the logic in WestminsterCityCouncil.py (the loop that
fills days, the inner_range handling and the fallback using DAY_ABBREV_MAP) so
that after splitting and stripping each part you only accept two forms: a
matched range (regex ^(\w{3})\s*[-–]\s*(\w{3})$ already handled) or an exact
weekday token that maps to DAY_ABBREV_MAP (case-insensitive match of the full
token or its 3-letter abbreviation), and for any non-empty token that does not
match either form raise an exception (e.g. ValueError or a ParseError) instead
of silently ignoring it; ensure you reference DAY_ABBREV_MAP and the existing
days list/inner_range variables when implementing the validation.
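A sketch of the stricter parser this comment asks for, assuming the cell text separates entries with commas, "&" or "/" (the real separators are not shown in the PR). `DAY_ABBREV_MAP` here is a hypothetical stand-in for the PR's constant, and the range regex is the one quoted in the comment. Every non-empty token must be an exact weekday or a three-letter range; anything else raises `ValueError` instead of being truncated to its first three letters.

```python
import re

# Hypothetical stand-in for the PR's DAY_ABBREV_MAP (Mon=0 ... Sun=6).
DAY_ABBREV_MAP = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}
FULL_DAY_NAMES = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
                  "friday": 4, "saturday": 5, "sunday": 6}
RANGE_RE = re.compile(r"^(\w{3})\s*[-–]\s*(\w{3})$")


def _day_number(token):
    """Map an exact abbreviation or full day name to a weekday number."""
    key = token.strip().lower()
    if key in DAY_ABBREV_MAP:
        return DAY_ABBREV_MAP[key]
    if key in FULL_DAY_NAMES:
        return FULL_DAY_NAMES[key]
    raise ValueError(f"Unrecognised weekday token: {token!r}")


def parse_day_list(text):
    """Parse 'Mon-Fri', 'Mon, Wed, Fri' or 'Saturday' into a set of weekday numbers."""
    days = set()
    for part in re.split(r"[,&/]", text):
        part = part.strip()
        if not part:
            continue
        inner_range = RANGE_RE.match(part)
        if inner_range:
            start, end = (_day_number(g) for g in inner_range.groups())
            # Walk forward so wrap-around ranges such as "Sat-Mon" also work.
            day = start
            while True:
                days.add(day)
                if day == end:
                    break
                day = (day + 1) % 7
        else:
            days.add(_day_number(part))
    if not days:
        raise ValueError(f"No weekdays found in {text!r}")
    return days
```

With this shape, "Mon - Fri except bank holidays" fails the range regex and then fails the exact-token lookup, so a format change on the council's page surfaces as an error rather than a silently wrong schedule.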
🧹 Nitpick comments (2)
uk_bin_collection/tests/input.json (2)

2886-2886: 💤 Low value

Consider clarifying USRN terminology for user understanding.

The wiki_note correctly explains the functionality, but users unfamiliar with UK addressing standards might not understand that Westminster uses USRN (Unique Street Reference Number) values instead of UPRN (Unique Property Reference Number) values. Consider explicitly noting this distinction, for example: "Pass postcode for auto-resolution, or pass USRN (street reference, not property UPRN) directly as uprn parameter."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/tests/input.json` at line 2886, Update the "wiki_note" text
to explicitly clarify that Westminster expects USRN (Unique Street Reference
Number) values rather than UPRN (Unique Property Reference Number) so users
understand the distinction; modify the wiki_note string (key "wiki_note" in the
test JSON) to read something like: "Pass postcode for auto USRN resolution, or
pass USRN (Unique Street Reference Number — street-level, not property-level
UPRN) directly as uprn. Street-level (not address-level) collection schedules."
ensuring the message clearly calls out USRN vs UPRN.

2882-2882: 💤 Low value

Consider using a test postcode that maps to a verified USRN.

The postcode "SW1P 3BU" is valid, but the PR test plan documents testing with specific USRNs: 8401405 (Marylebone High Street), 8401123 (Baker Street), and 8400910 (Wood's Mews). Using a postcode that resolves to one of these tested USRNs would better align the test configuration with the verified functionality.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/tests/input.json` at line 2882, Replace the "postcode"
value in the test input (the "postcode" JSON field) with a postcode that maps to
one of the verified USRNs used in the PR test plan (8401405, 8401123, or
8400910) so the test aligns with documented verification; update the "postcode"
string to a known postcode that resolves to one of those USRNs (e.g., use the
postcode that maps to USRN 8401123 or 8401405) and ensure the JSON remains
valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py`:
- Around line 136-147: The code collapses distinct streets when building options
because options currently maps a normalized key to a single (name, USRN); change
options to map each _normalise_street(txt) to a list of (txt, val) by appending
from the loop over select.find_all("option"). When checking the exact-match path
using road_norm and options[road_norm], if the list has length 1 return that
USRN, but if it contains multiple entries do not blindly return the last
one—instead try to disambiguate by finding an entry whose original txt equals
the input road (case-insensitive) and return that USRN if found; otherwise
preserve the ambiguity by not returning and allowing the later fuzzy/alternate
matching logic to run.
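The grouping fix proposed above can be sketched as follows. Dropdown options are modelled as plain `(text, value)` pairs, as they would come out of `select.find_all("option")`; the BeautifulSoup parsing itself is omitted. The PR's real `_normalise_street()` regex is not shown, so `normalise_street` below is a hypothetical version that drops punctuation and a few road-type words, which is enough to make distinct streets such as "Baker Street" and "Baker Mews" collide on one key.

```python
import re
from collections import defaultdict

# Hypothetical normaliser; the PR's actual regex may differ.
_ROAD_TYPE_RE = re.compile(r"\b(street|st|road|rd|mews|place|pl)\b")


def normalise_street(name):
    name = re.sub(r"[^\w\s]", "", name.lower())
    name = _ROAD_TYPE_RE.sub("", name)
    return " ".join(name.split())


def group_options(options):
    """Map each normalised key to every (text, USRN) pair, instead of the last one."""
    grouped = defaultdict(list)
    for txt, val in options:
        grouped[normalise_street(txt)].append((txt, val))
    return grouped


def exact_match_usrn(road, grouped):
    """Return a USRN only when the exact match is unambiguous, else None."""
    candidates = grouped.get(normalise_street(road), [])
    if len(candidates) == 1:
        return candidates[0][1]
    for txt, val in candidates:
        if txt.strip().lower() == road.strip().lower():
            return val
    return None  # ambiguous or absent: leave it to the fuzzy-matching path
```

Returning `None` rather than an arbitrary candidate keeps the ambiguity visible to the later fuzzy/alternate matching logic, which is the behaviour the comment asks for.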


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b04c4efa-1fb3-469e-aa96-d568c4f713cf

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2c4c8 and ce6fff6.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py

Comment thread uk_bin_collection/uk_bin_collection/councils/WestminsterCityCouncil.py Outdated

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant