feat(events): add parsed start_date and end_date fields to event info #32

Merged: Vanshbordia merged 1 commit into main from edge-cases-update-series-events
Conversation
Add date parsing functionality to convert date_text strings into actual datetime.date objects. The Info model now includes start_date and end_date fields that are parsed from the date range text.

Update list_events() to use a three-tier fallback system for date accuracy:
- Primary: fetch dates from the event info page (includes the correct year)
- Secondary: parse from the listing page text
- Tertiary: fetch from the matches page as a last resort

Also fix team name extraction in the series module to handle nested elements, and improve team metadata matching with a multi-strategy approach.
Note
| Cohort / File(s) | Summary |
|---|---|
| **Version Bumps** betterdocs/app/(home)/page.tsx, pyproject.toml, src/vlrdevapi/__init__.py, docs/source/conf.py | Version updated from 1.6.0 to 1.6.1 across project manifests and UI. |
| **Event API Documentation** betterdocs/content/docs/api/events/info.mdx, betterdocs/content/docs/api/events/list-events.mdx, docs/source/api/events.rst, docs/source/api/models.rst | Added start_date and end_date fields to API documentation, a Date Parsing section, and example usage demonstrating optional date handling and filtering. |
| **Event Model & Core Logic** src/vlrdevapi/events/models.py, src/vlrdevapi/events/info.py, src/vlrdevapi/events/list_events.py | Introduced optional start_date and end_date fields on the Info model; implemented date range parsing via regex and a month-abbreviation map in info.py; enhanced list_events.py with layered fallbacks (info lookup → listing page → matches page). |
| **Series API Refactoring** src/vlrdevapi/series/info.py, src/vlrdevapi/series/matches.py | Refactored team name extraction to use direct text nodes; introduced a `_match_team_to_meta` helper with multi-strategy matching (exact match, short name, fuzzy containment, player-table fallback, position-based fallback). |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
🐰 Hop along to 1.6.1!
Dates now parse with regex fun,
Start and end in hand so neat,
Teams match up a clever feat,
Better docs, refactored code—
Smoother travels down the road! 🎉
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature being added: parsed start_date and end_date fields to event info, which aligns with the primary objective of adding date parsing functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
- 📝 Generate docstrings
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Post copyable unit tests in a comment
- Commit unit tests in branch `edge-cases-update-series-events`
Deploying vlrdevapi with Cloudflare Pages

| | |
|---|---|
| Latest commit: | 80c9163 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://a7709c01.vlrdevapi.pages.dev |
| Branch Preview URL: | https://edge-cases-update-series-eve.vlrdevapi.pages.dev |
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
41-47: ⚠️ Potential issue | 🟠 Major

`pytest` should not be a runtime dependency.

`pytest` (line 45) is a test framework listed under the main dependencies, meaning it will be installed for all end users. It already appears under [project.optional-dependencies] for test and dev. Move it out of the core dependencies.

Proposed fix

```diff
 dependencies = [
     "beautifulsoup4>=4.12",
     "httpx[brotli,http2]>=0.27",
     "lxml>=5.0",
-    "pytest>=8.4.2",
     "python-dateutil>=2.8.0",
 ]
```
🤖 Fix all issues with AI agents
In `@src/vlrdevapi/events/info.py`:
- Around line 26-63: The _parse_date_range function currently uses a narrow
regex and returns (None, None) for many valid formats; replace the custom regex
logic by calling the existing utilities split_date_range and parse_date from
utils.py: use split_date_range(date_text) to get the two sides, then run
parse_date(...) on each side (handling single-day vs range results) and
construct start_date/end_date accordingly, preserving the same return type as
_parse_date_range; ensure you still return (None, None) when parsing fails and
keep callers like info() and list_events working unchanged.
In `@src/vlrdevapi/events/list_events.py`:
- Around line 203-210: The current loop in list_events calls
get_event_info(ev_id, effective_timeout) for every event which causes N+1 HTTP
requests; instead, invert the priority so you first use the already-fetched
listing-page parse (the current SECONDARY logic that extracts
start_date/end_date from the listing HTML) and only call get_event_info(ev_id,
effective_timeout) when those listing-derived dates are missing or incomplete;
update the code around the start_date/end_date assignments in list_events to
check listing-parsed fields first, and call get_event_info as a fallback for
missing dates (use ev_id and effective_timeout as before).
In `@src/vlrdevapi/series/matches.py`:
- Around line 62-70: The fuzzy substring rule in matches.py (the block using
name_clean, short_lookup and short_clean within Strategy 3a) is producing false
positives for very short abbreviations; change the containment heuristic to
avoid 2-character bidirectional matches by increasing the minimum length
threshold (e.g., >=3 or >=3 for both sides) and implement a simple tie-breaker
when multiple short_lookup entries match—collect all matches instead of
returning the first, then pick the match with the longest common overlap (or
highest similarity ratio) before returning meta.get("id") and meta.get("short").
Ensure changes are made where name_clean and short_clean are compared so that
short abbreviations like "G2"/"NV" do not prematurely match unrelated names.
🧹 Nitpick comments (4)
src/vlrdevapi/series/matches.py (2)
45-48: Duplicate `canonical` helper — extract to module level. This nested function is identical to the one defined inside `matches()` at line 146. Consider extracting a single module-level `_canonical(value)` to eliminate the duplication.

20-27: Use parameterized type hints instead of bare `list`. `info_teams: list` (line 22) and `info_teams_list: list` (line 156) lose the element type. Since these hold `TeamInfo` objects, annotating as `list[TeamInfo]` (with a forward reference or import) would improve readability and enable better static analysis.

src/vlrdevapi/series/info.py (1)
111-124: Direct text path skips NFC normalization that `extract_text` applies. When `extract_direct_text` finds direct text nodes (lines 116-119), it returns without NFC normalization. The fallback path delegates to `extract_text`, which does normalize. This inconsistency could cause team names with special characters (accented names, etc.) to differ in representation depending on which path is taken.

Proposed fix

```diff
+import unicodedata
+
 def extract_direct_text(element) -> str:
     """Extract only direct text content from element, excluding nested elements."""
     if not element:
         return ""
     # Get only direct text nodes (not text from nested elements)
     direct_text = "".join(
         str(child) for child in element.children if isinstance(child, str)
     ).strip()
     # Fallback to full text if no direct text found
-    return direct_text if direct_text else extract_text(element)
+    if direct_text:
+        try:
+            return unicodedata.normalize("NFC", direct_text)
+        except Exception:
+            return direct_text
+    return extract_text(element)
```
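As a standalone illustration of the concern (not library code), NFC composition makes visually identical strings compare equal even when one uses a combining accent:

```python
import unicodedata

# Two encodings of the same visible string: decomposed ("e" + combining
# acute accent U+0301) vs. precomposed "é" (U+00E9). Raw comparison
# treats them as different strings.
decomposed = "Te\u0301am"
composed = "T\u00e9am"

print(decomposed == composed)                                # → False
print(unicodedata.normalize("NFC", decomposed) == composed)  # → True
```

Whichever extraction path runs, normalizing both sides to the same form keeps downstream name comparisons stable.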
Note: `unicodedata` is already available via `extract_text` in src/vlrdevapi/utils.py (lines 24-33). You could alternatively extract this as a utility function alongside `extract_text` to keep normalization consistent.

src/vlrdevapi/events/info.py (1)
53-63: Nit (Ruff TRY300): Move the success return out of the `try` block. The `return start_date, end_date` on line 61 doesn't need `try` protection. Moving it to an `else` clause makes the intent clearer — only the date construction can raise.

Proposed fix

```diff
     try:
         year = int(year_str)
         start_day = int(start_day_str)
         end_day = int(end_day_str)

         start_date = datetime.date(year, month_num, start_day)
         end_date = datetime.date(year, month_num, end_day)
-
-        return start_date, end_date
     except (ValueError, TypeError):
         return None, None
+    else:
+        return start_date, end_date
```
```python
def _parse_date_range(date_text: str | None) -> tuple[datetime.date | None, datetime.date | None]:
    """Parse a date range string into start and end dates.

    Parses format like "Dec 17 - 22, 2025" into (2025-12-17, 2025-12-22).

    Args:
        date_text: The date range string to parse.

    Returns:
        A tuple of (start_date, end_date), or (None, None) if parsing fails.
    """
    if not date_text:
        return None, None

    # Pattern: "Month Day - EndDay, Year" (e.g., "Dec 17 - 22, 2025")
    pattern = r"^([A-Za-z]{3})\s+(\d{1,2})\s*-\s*(\d{1,2}),\s*(\d{4})$"
    match = re.match(pattern, date_text.strip())

    if not match:
        return None, None

    month_abbr, start_day_str, end_day_str, year_str = match.groups()

    month_num = _MONTH_MAP.get(month_abbr.lower())
    if month_num is None:
        return None, None

    try:
        year = int(year_str)
        start_day = int(start_day_str)
        end_day = int(end_day_str)

        start_date = datetime.date(year, month_num, start_day)
        end_date = datetime.date(year, month_num, end_day)

        return start_date, end_date
    except (ValueError, TypeError):
        return None, None
```
_parse_date_range only handles same-month, abbreviated ranges — many common formats will silently return (None, None).
The regex on line 41 only matches "Mon DD - DD, YYYY" (e.g., "Dec 17 - 22, 2025"). It will fail for:
- Cross-month ranges: "Feb 8 - Mar 1, 2025" or "Dec 28, 2025 - Jan 5, 2026"
- Full month names: "December 17 - 22, 2025"
- Dash variants (em-dash, en-dash) — though `split_date_range` in utils.py already normalizes these
The codebase already has split_date_range and parse_date in utils.py that handle a broader set of formats. Consider reusing those utilities here instead of a bespoke regex, or at minimum extending the regex to cover cross-month ranges, since those are common for multi-week events.
Direct callers of info() (not going through list_events) won't benefit from the secondary dateutil fallback in list_events.py, so they'll get None dates for any range this regex doesn't match.
Proposed fix: reuse existing utils

```diff
-from ..utils import extract_text, normalize_whitespace
+from ..utils import extract_text, normalize_whitespace, split_date_range, parse_date

+_DATE_FORMATS = [
+    "%b %d, %Y", "%B %d, %Y",
+    "%d %b, %Y", "%d %B, %Y",
+    "%m/%d/%Y", "%d/%m/%Y",
+]

 def _parse_date_range(date_text: str | None) -> tuple[datetime.date | None, datetime.date | None]:
-    ...regex-only approach...
+    if not date_text:
+        return None, None
+    start_raw, end_raw = split_date_range(date_text)
+    start_date = parse_date(start_raw, _DATE_FORMATS) if start_raw else None
+    end_date = parse_date(end_raw, _DATE_FORMATS) if end_raw else None
+    return start_date, end_date
```

🧰 Tools
🪛 Ruff (0.15.0)
[warning] 61-61: Consider moving this statement to an else block
(TRY300)
```python
            # PRIMARY: Get accurate dates from event info page (includes proper year)
            # This is the most reliable source as the listing page may omit years
            event_info = get_event_info(ev_id, effective_timeout)
            if event_info:
                if event_info.start_date:
                    start_date = event_info.start_date
                if event_info.end_date:
                    end_date = event_info.end_date
```
Critical N+1 HTTP request problem: get_event_info is called for every event in the listing.
Each iteration of the loop makes a synchronous HTTP request to fetch the individual event info page. For a listing with, say, 20 events, this turns a single list_events() call into 21+ HTTP requests (1 listing + 20 info pages), and potentially 41+ if the tertiary fallback also fires. This will cause severe latency for callers.
Consider inverting the priority: parse from the already-fetched listing page first (current SECONDARY), and only fall back to get_event_info when listing-page parsing fails. This way the expensive HTTP call is only made when needed.
Proposed fix: swap PRIMARY and SECONDARY tiers
```diff
-            # PRIMARY: Get accurate dates from event info page (includes proper year)
-            # This is the most reliable source as the listing page may omit years
-            event_info = get_event_info(ev_id, effective_timeout)
-            if event_info:
-                if event_info.start_date:
-                    start_date = event_info.start_date
-                if event_info.end_date:
-                    end_date = event_info.end_date
-
-            # SECONDARY: Parse dates from listing page if info() didn't return dates
-            # This handles cases where info() fails or returns None for dates
-            if start_date is None or end_date is None:
-                # Try to parse start date
-                if start_date is None and start_text:
+            # PRIMARY: Parse dates from listing page text (no extra HTTP request)
+            if start_text:
+                try:
+                    parsed = dateutil_parser.parse(start_text, fuzzy=False)
+                    start_date = parsed.date()
+                except (ValueError, TypeError, dateutil_parser.ParserError):
+                    start_date = parse_date(start_text, _DATE_FORMATS)
+
+            if end_text:
+                try:
+                    parsed = dateutil_parser.parse(end_text, fuzzy=False)
+                    end_date = parsed.date()
+                except (ValueError, TypeError, dateutil_parser.ParserError):
+                    end_date = parse_date(end_text, _DATE_FORMATS)
+
+            # SECONDARY: Fetch from event info page only when listing-page parsing failed
+            if start_date is None or end_date is None:
+                event_info = get_event_info(ev_id, effective_timeout)
+                if event_info:
+                    if start_date is None and event_info.start_date:
+                        start_date = event_info.start_date
+                    if end_date is None and event_info.end_date:
+                        end_date = event_info.end_date
```
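The cost argument can be seen with a toy model (the stubs below are hypothetical stand-ins, not the library's API): fetching per-event info only as a fallback keeps the request count proportional to how many listings are missing dates, not to the listing size.

```python
fetch_count = 0


def fetch_event_info(ev_id: str) -> dict:
    # Hypothetical stand-in for get_event_info; counts simulated HTTP requests.
    global fetch_count
    fetch_count += 1
    return {"start_date": f"from-info-{ev_id}"}


# (id, date text already scraped from the single listing-page request)
listing = [("e1", "Dec 17"), ("e2", None), ("e3", "Dec 19")]

resolved = []
for ev_id, listing_date in listing:
    # PRIMARY: reuse the already-fetched listing text; SECONDARY: per-event fetch.
    resolved.append(listing_date or fetch_event_info(ev_id)["start_date"])

print(resolved)     # → ['Dec 17', 'from-info-e2', 'Dec 19']
print(fetch_count)  # → 1
```

With the original ordering, `fetch_count` would be 3 for this listing; inverted, only the one event with a missing listing date triggers a fetch.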
```python
        # Strategy 3: Fuzzy match - check if names/shorts contain each other
        name_clean = game_name.lower().replace(" ", "").replace("-", "").replace("_", "")

        # 3a: Check against short names
        for short_key, meta in short_lookup.items():
            short_clean = short_key.lower().replace(" ", "").replace("-", "")
            if short_clean and len(short_clean) >= 2:
                if short_clean in name_clean or name_clean in short_clean:
                    return meta.get("id"), meta.get("short")
```
Fuzzy substring matching can produce false positives with short abbreviations.
Strategy 3a performs a bidirectional substring check (short_clean in name_clean or name_clean in short_clean) with a floor of only 2 characters. Common esports tags like "G2", "T1", or "NV" will trivially match unrelated names (e.g., "nv" ⊂ "envyus"). Since the loop returns the first hit from short_lookup, the wrong team can be selected when both entries match.
Consider adding a tie-breaking heuristic (e.g., prefer the longer overlap or highest similarity ratio) or raising the minimum length threshold for containment checks to reduce false matches.
```diff
             # Extract team short names from player tables for additional matching
             player_table_shorts: set[str] = set()
             for table in game.select("table.wf-table-inset"):
                 for short_el in table.select(".mod-player .ge-text-light"):
                     short = extract_text(short_el)
                     if short:
                         player_table_shorts.add(short.upper())

-            t1_id_val = t1_meta.get("id") if t1_meta else None
-            t1_short_val = t1_meta.get("short") if t1_meta else None
-            t2_id_val = t2_meta.get("id") if t2_meta else None
-            t2_short_val = t2_meta.get("short") if t2_meta else None
+            # Use multi-strategy matching for team metadata
+            t1_id_val, t1_short_val = _match_team_to_meta(
+                t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0, player_table_shorts
+            )
+            t2_id_val, t2_short_val = _match_team_to_meta(
+                t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1, player_table_shorts
+            )
```
player_table_shorts is shared across both teams — Strategy 3.5 can misattribute metadata.
The set is built from all player tables in the game section (both teams' tables), then the same set is passed to _match_team_to_meta for both t1 and t2. In Strategy 3.5, the first short in the set found in short_lookup is returned — since set iteration order is arbitrary, this can resolve to the wrong team's metadata for either or both calls.
Consider collecting per-table shorts and passing only the relevant subset to each call:
Proposed fix
```diff
     if t1_name and t2_name:
-        # Extract team short names from player tables for additional matching
-        player_table_shorts: set[str] = set()
-        for table in game.select("table.wf-table-inset"):
-            for short_el in table.select(".mod-player .ge-text-light"):
-                short = extract_text(short_el)
-                if short:
-                    player_table_shorts.add(short.upper())
-
-        # Use multi-strategy matching for team metadata
-        t1_id_val, t1_short_val = _match_team_to_meta(
-            t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0, player_table_shorts
-        )
-        t2_id_val, t2_short_val = _match_team_to_meta(
-            t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1, player_table_shorts
-        )
+        # Extract per-team short names from player tables
+        per_table_shorts: list[set[str]] = []
+        for table in game.select("table.wf-table-inset"):
+            shorts: set[str] = set()
+            for short_el in table.select(".mod-player .ge-text-light"):
+                short = extract_text(short_el)
+                if short:
+                    shorts.add(short.upper())
+            per_table_shorts.append(shorts)
+
+        # Use multi-strategy matching for team metadata
+        t1_id_val, t1_short_val = _match_team_to_meta(
+            t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0,
+            per_table_shorts[0] if len(per_table_shorts) > 0 else None,
+        )
+        t2_id_val, t2_short_val = _match_team_to_meta(
+            t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1,
+            per_table_shorts[1] if len(per_table_shorts) > 1 else None,
+        )
```
Summary by CodeRabbit

Release Notes v1.6.1

New Features
- Added `start_date` and `end_date` fields to event details for improved date handling

Documentation
- Updated the event API docs with the new date fields and example usage

Improvements
- More robust team name extraction and metadata matching in the series module

Chores
- Version bumped from 1.6.0 to 1.6.1