fix: harden forest of dean address search and data extraction #2072

Open
InertiaUK wants to merge 2 commits into robbrad:master from InertiaUK:fix/forest-of-dean-address-and-extraction

Conversation

@InertiaUK
Contributor

@InertiaUK InertiaUK commented May 13, 2026

Summary

  • Fix address construction to avoid duplicating postcode when already present in house_number
  • Click combobox before typing for reliable Salesforce Lightning component focus
  • Skip search header in dropdown results (first li is "Search", not an address)
  • Add text-based extraction fallback for Chrome versions where Shadow DOM hides table elements
  • Reduce unnecessary sleeps for faster scraping
  • Use data-cell-value attribute with get_text fallback for robust date extraction
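
The first bullet can be illustrated with a minimal sketch. The function name `build_search_address` and the space-joined output format are assumptions made for illustration; the PR's actual construction logic is not shown on this page.

```python
def build_search_address(house_number: str, postcode: str) -> str:
    """Join house_number and postcode, skipping the postcode if already present.

    Hypothetical helper: the name and the exact output format are assumptions.
    """
    # Collapse runs of whitespace so comparisons are not thrown off by spacing.
    house = " ".join(house_number.split())
    pc = " ".join(postcode.upper().split())

    # Compare case-insensitively and without spaces, so "gl156jt" matches "GL15 6JT".
    if pc.replace(" ", "") in house.upper().replace(" ", ""):
        return house
    return f"{house} {pc}".strip()


full = build_search_address("ELMOGAL, PARKEND ROAD, BREAM, LYDNEY", "GL15 6JT")
already = build_search_address("ELMOGAL, PARKEND ROAD, BREAM, LYDNEY, GL15 6JT", "gl156jt")
```

Comparing with spaces stripped means a postcode typed as `GL156JT` is still recognised when `house_number` already ends in `GL15 6JT`.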

Testing

  • Full address path: ELMOGAL, PARKEND ROAD, BREAM, LYDNEY + GL15 6JT
  • Tested via API end-to-end ✅

Summary by CodeRabbit

Bug Fixes

  • Improved reliability of bin collection data retrieval for Forest of Dean District Council by enhancing form element detection and data extraction.
  • Enhanced date parsing to handle multiple date formats, including relative dates.
  • Added fallback mechanisms for more robust extraction when expected data structure varies.

- fix address construction: avoid duplicating postcode when it's already in the house_number field
- click combobox before typing for reliable lightning component focus
- skip search header in dropdown results (first li is "Search", not an address)
- add text-based extraction fallback for chrome versions where shadow dom hides table elements
- reduce unnecessary sleeps for faster scraping
- use data-cell-value attribute with get_text fallback for robust date extraction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67dd3dc4-ccdd-40b9-96a3-d51d9de750ce

📥 Commits

Reviewing files that changed from the base of the PR and between 9c35a78 and 5c66014.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py
📝 Walkthrough

Walkthrough

The Forest of Dean District Council scraper is updated with revised form-element handling, a two-path parsing strategy, and date normalization. The parse_data method now uses new Selenium selectors and implements primary and fallback data extraction flows; date string parsing is delegated to helper methods that handle relative dates and day-of-week strings, with year rollover.
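
The fallback flow (scanning body text when table rows are absent) might look roughly like the sketch below. The bin type names, the regular expression, and the assumption that each date sits on the line directly after its bin type are all hypothetical; the real `_BIN_TYPES` values and page layout are not shown in this PR page.

```python
import re

# Hypothetical bin type names; the real _BIN_TYPES constant is not shown here.
BIN_TYPES = ("Refuse", "Recycling", "Garden waste", "Food waste")

# Accepts "today", "tomorrow", or strings such as "Wed, 20 May".
_DATE_RE = re.compile(
    r"^(today|tomorrow|(mon|tue|wed|thu|fri|sat|sun)\w*,?\s+\d{1,2}\s+\w+)$",
    re.IGNORECASE,
)


def looks_like_date(text: str) -> bool:
    """Return True if the text resembles one of the supported date strings."""
    return bool(_DATE_RE.match(text.strip()))


def extract_from_text(body_text: str) -> list[tuple[str, str]]:
    """Pair each recognised bin type with the date on the following line."""
    lines = [line.strip() for line in body_text.split("\n") if line.strip()]
    results = []
    for index, line in enumerate(lines[:-1]):
        next_line = lines[index + 1]
        if line in BIN_TYPES and looks_like_date(next_line):
            results.append((line, next_line))
    return results


sample = "Search\nRefuse\nWed, 20 May\nRecycling\nTomorrow\nFood waste\nContact us\n"
pairs = extract_from_text(sample)
```

Requiring both a known bin type and a date-like following line keeps unrelated page text, such as navigation labels, out of the results.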

Changes

Forest of Dean Bin Collection Parser

| Layer / File(s) | Summary |
| --- | --- |
| Module imports, bin types, and date parsing helpers / uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py | Imports updated to include time module; new _BIN_TYPES constant defines recognized collection bin types; _looks_like_date and _parse_date static methods detect and normalize date strings including "today", "tomorrow", day-of-week prefixes, and handle year rollover for past-dated results. |
| Form interaction and bin collection extraction / uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py | parse_data method refactored to use updated form element selectors (combobox and text input), new wait/sleep strategy, and dual-path parsing: primary extraction from page source HTML with BeautifulSoup soup rows, fallback to body text line scanning when table rows absent; bin types and dates validated and normalized through the helper methods. |
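
The date normalization described in the summary can be sketched as follows. This is not the PR's `_parse_date`: the function name, the explicit `today` parameter, and the rollover rule (a date earlier than today is moved to the next year) are assumptions chosen to make the example deterministic.

```python
import re
from datetime import date, datetime, timedelta


def parse_collection_date(raw: str, today: date) -> date:
    """Normalize 'today', 'tomorrow', or 'Wed, 20 May' style strings to a date.

    Hypothetical helper; `today` is passed in so the result is reproducible.
    """
    text = raw.strip().lower()
    if text == "today":
        return today
    if text == "tomorrow":
        return today + timedelta(days=1)

    # Drop stray punctuation but keep the comma that the format string expects.
    cleaned = re.sub(r"[^\w\s,]", "", raw).strip()
    # Appending the year avoids relying on strptime's default year of 1900.
    parsed = datetime.strptime(f"{cleaned} {today.year}", "%a, %d %B %Y").date()

    # Assumed rollover rule: a date already in the past belongs to next year.
    if parsed < today:
        parsed = parsed.replace(year=today.year + 1)
    return parsed


reference = date(2026, 5, 13)
same_year = parse_collection_date("Wed, 20 May", reference)
rolled_over = parse_collection_date("Fri, 2 January", date(2026, 12, 30))
```

The sketch does not special-case 29 February, which would need extra handling when the assumed year is not a leap year.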

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dp247

Poem

🐰 A Forest of dates, now parsed with care,
The Dean's bins dance through autumn air,
Tomorrow, today, or Wednesday's call—
These rabbity helpers normalize all! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: harden forest of dean address search and data extraction' accurately describes the main changes: improving address search and data extraction for the Forest of Dean council module. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codecov

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (8ecf878) to head (5c66014).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2072   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152           


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py (2)

130-142: 💤 Low value

Consider wrapping strptime with a clearer error message.

If the date format changes on the council website, strptime will raise a generic ValueError. Wrapping it to include the original raw_date value would aid debugging.

Optional enhancement
         cleaned = re.sub(r"[^\w\s,]", "", raw_date)
-        parsed = datetime.strptime(cleaned, "%a, %d %B")
+        try:
+            parsed = datetime.strptime(cleaned, "%a, %d %B")
+        except ValueError as e:
+            raise ValueError(f"Unable to parse date '{raw_date}': {e}") from e
         parsed = parsed.replace(year=current_year)
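
A self-contained variant of the suggestion above, as a minimal sketch. The function name `parse_day_month` and the explicit `year` parameter are assumptions, and the year is appended before parsing rather than applied afterwards with `replace` as in the diff.

```python
import re
from datetime import datetime


def parse_day_month(raw_date: str, year: int) -> datetime:
    """Parse strings such as 'Wed, 20 May', reporting the raw input on failure."""
    cleaned = re.sub(r"[^\w\s,]", "", raw_date).strip()
    try:
        return datetime.strptime(f"{cleaned} {year}", "%a, %d %B %Y")
    except ValueError as exc:
        # Chain the original error so the underlying strptime message is kept.
        raise ValueError(f"Unable to parse date {raw_date!r}: {exc}") from exc


parsed_ok = parse_day_month("Wed, 20 May", 2026)
```

Because the exception is raised with `from exc`, the original `strptime` error remains available as `__cause__` in the traceback.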
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py`
around lines 130 - 142, In _parse_date, wrap the datetime.strptime call in a
try/except so parsing failures raise or log a clearer error that includes the
original raw_date (and optionally cleaned) and the expected format; catch
ValueError around datetime.strptime(cleaned, "%a, %d %B") in the _parse_date
staticmethod, and re-raise a new ValueError or add to the log with a message
like "Failed to parse date '{raw_date}' (cleaned: '{cleaned}') with format '%a,
%d %B'" to make future format changes easier to diagnose.

107-107: ⚡ Quick win

Rename ambiguous variable l.

The single-letter l is easily confused with 1 or I. Use a descriptive name.

Proposed fix
-                body_text = driver.find_element(By.TAG_NAME, "body").text
-                lines = [l.strip() for l in body_text.split("\n") if l.strip()]
+                body_text = driver.find_element(By.TAG_NAME, "body").text
+                lines = [ln.strip() for ln in body_text.split("\n") if ln.strip()]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py`
at line 107, The list comprehension in ForestOfDeanDistrictCouncil (variable
`lines = [l.strip() for l in body_text.split("\n") if l.strip()]`) uses the
ambiguous single-letter `l`; change it to a clear name (e.g., `line`) so it
reads `lines = [line.strip() for line in body_text.split("\n") if
line.strip()]`, and update any other occurrences in that function/method that
reference `l` to the new name to avoid confusion.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 630546b2-b9c3-4eb6-867b-43664571421c

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecf878 and 9c35a78.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py

Comment on lines +103 to +104
                    except Exception:
                        continue

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log exceptions instead of silently continuing.

The bare except Exception: continue swallows all errors without any visibility into what failed. When the council website format changes, this will silently produce incomplete data rather than alerting to the issue. At minimum, log the exception to aid debugging.

Proposed fix
-                    except Exception:
-                        continue
+                    except Exception as e:
+                        print(f"Warning: failed to parse row: {e}")
+                        continue

Based on learnings: "In uk_bin_collection/**/*.py, when parsing council bin collection data, prefer explicit failures over silent defaults or swallowed errors. This ensures format changes are detected early."
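
As an alternative to `print`, the logging approach mentioned in the agent prompt might look like the sketch below. The row shape, the `%d/%m/%Y` input format, the logger name, and the output keys are assumptions for illustration, not the module's actual code.

```python
import logging
from datetime import datetime

# Hypothetical module-level logger; the real module may not define one.
_LOGGER = logging.getLogger(__name__)


def parse_rows(rows: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Parse (bin_type, raw_date) pairs, logging rows that fail to parse."""
    parsed = []
    for bin_type, raw_date in rows:
        try:
            when = datetime.strptime(raw_date, "%d/%m/%Y")
        except ValueError:
            # logging.exception records the stack trace alongside the message.
            _LOGGER.exception(
                "Forest of Dean: failed to parse row %r with date %r",
                bin_type,
                raw_date,
            )
            continue
        parsed.append({"type": bin_type, "collectionDate": when.strftime("%d/%m/%Y")})
    return parsed


bins = parse_rows([("Refuse", "20/05/2026"), ("Recycling", "not a date")])
```

Catching `ValueError` instead of `Exception` also addresses the Ruff BLE001 warning, since unexpected error types still propagate.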

🧰 Tools
🪛 Ruff (0.15.12)

[error] 103-104: try-except-continue detected, consider logging the exception

(S112)


[warning] 103-103: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/ForestOfDeanDistrictCouncil.py`
around lines 103 - 104, The bare "except Exception: continue" inside the
ForestOfDeanDistrictCouncil class swallows errors; change it to capture the
exception (except Exception as e) and log it before continuing (e.g.,
logger.exception or _LOGGER.exception with a helpful message that includes
context such as the council name or the current item being parsed), so parsing
failures are visible; if this module uses a module-level logger (e.g., _LOGGER)
use that, otherwise import logging and call logging.exception to record the
stack trace, then continue.
