feat(events): add parsed start_date and end_date fields to event info #32

Merged: Vanshbordia merged 1 commit into main from edge-cases-update-series-events on Feb 14, 2026

Conversation

@Vanshbordia (Owner) commented Feb 14, 2026

Add date parsing functionality to convert date_text strings into actual datetime.date objects. The Info model now includes start_date and end_date fields that are parsed from the date range text.

Update list_events() to use a three-tier fallback system for date accuracy:

  • Primary: fetch dates from event info page (includes correct year)
  • Secondary: parse from listing page text
  • Tertiary: fetch from matches page as last resort

Also fix team name extraction in the series module to handle nested elements, and improve team metadata matching with a multi-strategy approach.
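The three-tier fallback above amounts to a priority chain that stops as soon as both dates are resolved, so the cheaper sources short-circuit the more expensive ones. A minimal illustrative sketch (resolve_dates and the stand-in fetchers below are hypothetical, not the library's actual implementation):

```python
import datetime
from typing import Callable, Optional

DatePair = tuple[Optional[datetime.date], Optional[datetime.date]]

def resolve_dates(fetchers: list[Callable[[], DatePair]]) -> DatePair:
    """Try each date source in priority order, keeping the first non-None values."""
    start, end = None, None
    for fetch in fetchers:
        s, e = fetch()
        if start is None:
            start = s
        if end is None:
            end = e
        if start is not None and end is not None:
            break  # both dates resolved; skip remaining (more expensive) sources
    return start, end

# Stand-in sources: the info page knows both dates, so the others never run.
info_page = lambda: (datetime.date(2025, 12, 17), datetime.date(2025, 12, 22))
listing_page = lambda: (None, None)
matches_page = lambda: (None, None)

print(resolve_dates([info_page, listing_page, matches_page]))
# → (datetime.date(2025, 12, 17), datetime.date(2025, 12, 22))
```

Each tier only fills in whichever of the two dates is still missing, so a source that knows only the start date still contributes.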

Summary by CodeRabbit

Release Notes v1.6.1

  • New Features

    • Added optional start_date and end_date fields to event details for improved date handling
  • Documentation

    • Updated API documentation with new date field descriptions and examples
    • Added date parsing guidance for event filtering and sorting
  • Improvements

    • Enhanced team name extraction reliability
    • Improved team metadata matching with multi-strategy resolution
  • Chores

    • Version bumped to 1.6.1
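Since the new start_date and end_date fields are optional, the filtering and sorting guidance boils down to handling None explicitly. A hedged sketch (the Event dataclass here is a stand-in for the library's model, not its real class):

```python
import datetime
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """Stand-in for the library's event Info model (field names assumed)."""
    name: str
    start_date: Optional[datetime.date]
    end_date: Optional[datetime.date]

events = [
    Event("Champions", datetime.date(2025, 12, 17), datetime.date(2025, 12, 22)),
    Event("Unknown Dates", None, None),
    Event("Masters", datetime.date(2025, 6, 1), datetime.date(2025, 6, 12)),
]

# Filter: keep only events whose dates could be parsed.
dated = [e for e in events if e.start_date is not None]

# Sort: push undated events last by using date.max as the fallback key.
ordered = sorted(events, key=lambda e: e.start_date or datetime.date.max)
print([e.name for e in ordered])  # → ['Masters', 'Champions', 'Unknown Dates']
```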

coderabbitai bot commented Feb 14, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: '+language', 'include', 'exclude', 'focus', 'rules', 'documentation', 'testing', 'style', 'dependencies', 'performance', 'integrations', 'severity', 'messages', 'automation'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

This release adds optional start_date and end_date fields to the event Info model, implementing date parsing from event text and supporting fallback retrieval via event info lookup. Additionally, team name extraction and team metadata matching logic are refactored for improved reliability. Version is bumped to 1.6.1 across all manifests and documentation.

Changes

Cohort / File(s) Summary
Version Bumps
betterdocs/app/(home)/page.tsx, pyproject.toml, src/vlrdevapi/__init__.py, docs/source/conf.py
Version updated from 1.6.0 to 1.6.1 across project manifests and UI.
Event API Documentation
betterdocs/content/docs/api/events/info.mdx, betterdocs/content/docs/api/events/list-events.mdx, docs/source/api/events.rst, docs/source/api/models.rst
Added start_date and end_date fields to API documentation, Date Parsing section, and example usage demonstrating optional date handling and filtering.
Event Model & Core Logic
src/vlrdevapi/events/models.py, src/vlrdevapi/events/info.py, src/vlrdevapi/events/list_events.py
Introduced start_date and end_date optional fields to Info model; implemented date range parsing via regex and month abbreviation map in info.py; enhanced list_events.py with layered fallbacks (info lookup → listing page → matches page).
Series API Refactoring
src/vlrdevapi/series/info.py, src/vlrdevapi/series/matches.py
Refactored team name extraction to use direct text nodes; introduced _match_team_to_meta helper with multi-strategy matching (exact match, short name, fuzzy containment, player-table fallback, position-based fallback).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along to 1.6.1!
Dates now parse with regex fun,
Start and end in hand so neat,
Teams match up a clever feat,
Better docs, refactored code—
Smoother travels down the road! 🎉

🚥 Pre-merge checks: ✅ 4 passed

  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: Passed. The title accurately describes the main feature being added: parsed start_date and end_date fields to event info, which aligns with the primary objective of adding date parsing functionality.
  • Docstring Coverage: Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.
  • Merge Conflict Detection: Passed. No merge conflicts detected when merging into main.



@cloudflare-workers-and-pages commented

Deploying vlrdevapi with Cloudflare Pages

Latest commit: 80c9163
Status: ✅ Deploy successful!
Preview URL: https://a7709c01.vlrdevapi.pages.dev
Branch Preview URL: https://edge-cases-update-series-eve.vlrdevapi.pages.dev

@coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

41-47: ⚠️ Potential issue | 🟠 Major

pytest should not be a runtime dependency.

pytest (line 45) is a test framework listed under the main dependencies, meaning it will be installed for all end users. It already appears under [project.optional-dependencies] for test and dev. Move it out of the core dependencies.

Proposed fix
 dependencies = [
     "beautifulsoup4>=4.12",
     "httpx[brotli,http2]>=0.27",
     "lxml>=5.0",
-    "pytest>=8.4.2",
     "python-dateutil>=2.8.0",
 ]
🤖 Fix all issues with AI agents
In `@src/vlrdevapi/events/info.py`:
- Around line 26-63: The _parse_date_range function currently uses a narrow
regex and returns (None, None) for many valid formats; replace the custom regex
logic by calling the existing utilities split_date_range and parse_date from
utils.py: use split_date_range(date_text) to get the two sides, then run
parse_date(...) on each side (handling single-day vs range results) and
construct start_date/end_date accordingly, preserving the same return type as
_parse_date_range; ensure you still return (None, None) when parsing fails and
keep callers like info() and list_events working unchanged.

In `@src/vlrdevapi/events/list_events.py`:
- Around line 203-210: The current loop in list_events calls
get_event_info(ev_id, effective_timeout) for every event which causes N+1 HTTP
requests; instead, invert the priority so you first use the already-fetched
listing-page parse (the current SECONDARY logic that extracts
start_date/end_date from the listing HTML) and only call get_event_info(ev_id,
effective_timeout) when those listing-derived dates are missing or incomplete;
update the code around the start_date/end_date assignments in list_events to
check listing-parsed fields first, and call get_event_info as a fallback for
missing dates (use ev_id and effective_timeout as before).

In `@src/vlrdevapi/series/matches.py`:
- Around line 62-70: The fuzzy substring rule in matches.py (the block using
name_clean, short_lookup and short_clean within Strategy 3a) is producing false
positives for very short abbreviations; change the containment heuristic to
avoid 2-character bidirectional matches by increasing the minimum length
threshold (e.g., >=3 or >=3 for both sides) and implement a simple tie-breaker
when multiple short_lookup entries match—collect all matches instead of
returning the first, then pick the match with the longest common overlap (or
highest similarity ratio) before returning meta.get("id") and meta.get("short").
Ensure changes are made where name_clean and short_clean are compared so that
short abbreviations like "G2"/"NV" do not prematurely match unrelated names.
🧹 Nitpick comments (4)
src/vlrdevapi/series/matches.py (2)

45-48: Duplicate canonical helper — extract to module level.

This nested function is identical to the one defined inside matches() at line 146. Consider extracting a single module-level _canonical(value) to eliminate the duplication.


20-27: Use parameterized type hints instead of bare list.

info_teams: list (line 22) and info_teams_list: list (line 156) lose the element type. Since these hold TeamInfo objects, annotating as list[TeamInfo] (with a forward reference or import) would improve readability and enable better static analysis.

src/vlrdevapi/series/info.py (1)

111-124: Direct text path skips NFC normalization that extract_text applies.

When extract_direct_text finds direct text nodes (line 116-119), it returns without NFC normalization. The fallback path delegates to extract_text which does normalize. This inconsistency could cause team names with special characters (accented names, etc.) to differ in representation depending on which path is taken.

Proposed fix
+import unicodedata
+
 def extract_direct_text(element) -> str:
     """Extract only direct text content from element, excluding nested elements."""
     if not element:
         return ""
     # Get only direct text nodes (not text from nested elements)
     direct_text = "".join(
         str(child) for child in element.children 
         if isinstance(child, str)
     ).strip()
     # Fallback to full text if no direct text found
-    return direct_text if direct_text else extract_text(element)
+    if direct_text:
+        try:
+            return unicodedata.normalize("NFC", direct_text)
+        except Exception:
+            return direct_text
+    return extract_text(element)

Note: unicodedata is already available via extract_text in src/vlrdevapi/utils.py (lines 24-33). You could alternatively extract this as a utility function alongside extract_text to keep normalization consistent.

src/vlrdevapi/events/info.py (1)

53-63: Nit (Ruff TRY300): Move the success return out of the try block.

The return start_date, end_date on line 61 doesn't need try protection. Moving it to an else clause makes the intent clearer — only the date construction can raise.

Proposed fix
     try:
         year = int(year_str)
         start_day = int(start_day_str)
         end_day = int(end_day_str)
 
         start_date = datetime.date(year, month_num, start_day)
         end_date = datetime.date(year, month_num, end_day)
-
-        return start_date, end_date
     except (ValueError, TypeError):
         return None, None
+    else:
+        return start_date, end_date

Comment on lines +26 to +63

def _parse_date_range(date_text: str | None) -> tuple[datetime.date | None, datetime.date | None]:
    """Parse a date range string into start and end dates.

    Parses format like "Dec 17 - 22, 2025" into (2025-12-17, 2025-12-22).

    Args:
        date_text: The date range string to parse.

    Returns:
        A tuple of (start_date, end_date), or (None, None) if parsing fails.
    """
    if not date_text:
        return None, None

    # Pattern: "Month Day - EndDay, Year" (e.g., "Dec 17 - 22, 2025")
    pattern = r"^([A-Za-z]{3})\s+(\d{1,2})\s*-\s*(\d{1,2}),\s*(\d{4})$"
    match = re.match(pattern, date_text.strip())

    if not match:
        return None, None

    month_abbr, start_day_str, end_day_str, year_str = match.groups()

    month_num = _MONTH_MAP.get(month_abbr.lower())
    if month_num is None:
        return None, None

    try:
        year = int(year_str)
        start_day = int(start_day_str)
        end_day = int(end_day_str)

        start_date = datetime.date(year, month_num, start_day)
        end_date = datetime.date(year, month_num, end_day)

        return start_date, end_date
    except (ValueError, TypeError):
        return None, None
⚠️ Potential issue | 🟠 Major

_parse_date_range only handles same-month, abbreviated ranges — many common formats will silently return (None, None).

The regex on line 41 only matches "Mon DD - DD, YYYY" (e.g., "Dec 17 - 22, 2025"). It will fail for:

  • Cross-month ranges: "Feb 8 - Mar 1, 2025" or "Dec 28, 2025 - Jan 5, 2026"
  • Full month names: "December 17 - 22, 2025"
  • Dash variants (em-dash, en-dash) — though split_date_range in utils.py already normalizes these

The codebase already has split_date_range and parse_date in utils.py that handle a broader set of formats. Consider reusing those utilities here instead of a bespoke regex, or at minimum extending the regex to cover cross-month ranges, since those are common for multi-week events.

Direct callers of info() (not going through list_events) won't benefit from the secondary dateutil fallback in list_events.py, so they'll get None dates for any range this regex doesn't match.

Proposed fix: reuse existing utils
+from ..utils import extract_text, normalize_whitespace, split_date_range, parse_date
-from ..utils import extract_text, normalize_whitespace

+_DATE_FORMATS = [
+    "%b %d, %Y", "%B %d, %Y",
+    "%d %b, %Y", "%d %B, %Y",
+    "%m/%d/%Y", "%d/%m/%Y",
+]

 def _parse_date_range(date_text: str | None) -> tuple[datetime.date | None, datetime.date | None]:
-    ...regex-only approach...
+    if not date_text:
+        return None, None
+    start_raw, end_raw = split_date_range(date_text)
+    start_date = parse_date(start_raw, _DATE_FORMATS) if start_raw else None
+    end_date = parse_date(end_raw, _DATE_FORMATS) if end_raw else None
+    return start_date, end_date
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 61-61: Consider moving this statement to an else block

(TRY300)


Comment on lines +203 to +210

        # PRIMARY: Get accurate dates from event info page (includes proper year)
        # This is the most reliable source as the listing page may omit years
        event_info = get_event_info(ev_id, effective_timeout)
        if event_info:
            if event_info.start_date:
                start_date = event_info.start_date
            if event_info.end_date:
                end_date = event_info.end_date

⚠️ Potential issue | 🔴 Critical

Critical N+1 HTTP request problem: get_event_info is called for every event in the listing.

Each iteration of the loop makes a synchronous HTTP request to fetch the individual event info page. For a listing with, say, 20 events, this turns a single list_events() call into 21+ HTTP requests (1 listing + 20 info pages), and potentially 41+ if the tertiary fallback also fires. This will cause severe latency for callers.

Consider inverting the priority: parse from the already-fetched listing page first (current SECONDARY), and only fall back to get_event_info when listing-page parsing fails. This way the expensive HTTP call is only made when needed.

Proposed fix: swap PRIMARY and SECONDARY tiers
-        # PRIMARY: Get accurate dates from event info page (includes proper year)
-        # This is the most reliable source as the listing page may omit years
-        event_info = get_event_info(ev_id, effective_timeout)
-        if event_info:
-            if event_info.start_date:
-                start_date = event_info.start_date
-            if event_info.end_date:
-                end_date = event_info.end_date
-        
-        # SECONDARY: Parse dates from listing page if info() didn't return dates
-        # This handles cases where info() fails or returns None for dates
-        if start_date is None or end_date is None:
-            # Try to parse start date
-            if start_date is None and start_text:
+        # PRIMARY: Parse dates from listing page text (no extra HTTP request)
+        if start_text:
+            try:
+                parsed = dateutil_parser.parse(start_text, fuzzy=False)
+                start_date = parsed.date()
+            except (ValueError, TypeError, dateutil_parser.ParserError):
+                start_date = parse_date(start_text, _DATE_FORMATS)
+
+        if end_text:
+            try:
+                parsed = dateutil_parser.parse(end_text, fuzzy=False)
+                end_date = parsed.date()
+            except (ValueError, TypeError, dateutil_parser.ParserError):
+                end_date = parse_date(end_text, _DATE_FORMATS)
+
+        # SECONDARY: Fetch from event info page only when listing-page parsing failed
+        if start_date is None or end_date is None:
+            event_info = get_event_info(ev_id, effective_timeout)
+            if event_info:
+                if start_date is None and event_info.start_date:
+                    start_date = event_info.start_date
+                if end_date is None and event_info.end_date:
+                    end_date = event_info.end_date

Comment on lines +62 to +70

    # Strategy 3: Fuzzy match - check if names/shorts contain each other
    name_clean = game_name.lower().replace(" ", "").replace("-", "").replace("_", "")

    # 3a: Check against short names
    for short_key, meta in short_lookup.items():
        short_clean = short_key.lower().replace(" ", "").replace("-", "")
        if short_clean and len(short_clean) >= 2:
            if short_clean in name_clean or name_clean in short_clean:
                return meta.get("id"), meta.get("short")

⚠️ Potential issue | 🟡 Minor

Fuzzy substring matching can produce false positives with short abbreviations.

Strategy 3a performs a bidirectional substring check (short_clean in name_clean or name_clean in short_clean) with a floor of only 2 characters. Common esports tags like "G2", "T1", or "NV" will trivially match unrelated names (e.g., "nv" is contained in "envyus"). Since the loop returns the first hit from short_lookup, the wrong team can be selected when both entries match.

Consider adding a tie-breaking heuristic (e.g., prefer the longer overlap or highest similarity ratio) or raising the minimum length threshold for containment checks to reduce false matches.
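One possible shape for that heuristic, raising the containment floor to 3 characters and scoring ties with difflib, is sketched below. This is an assumption-laden illustration, not the module's actual code: best_short_match is hypothetical, and short_lookup is assumed to map short names to metadata dicts with "id" and "short" keys.

```python
from difflib import SequenceMatcher

def best_short_match(name: str, short_lookup: dict[str, dict], min_len: int = 3):
    """Pick the short-name entry with the highest similarity, not the first hit.

    Sketch of the suggested fix: require at least `min_len` characters for the
    containment check, collect all candidates, and break ties by similarity.
    """
    name_clean = name.lower().replace(" ", "").replace("-", "").replace("_", "")
    candidates = []
    for short_key, meta in short_lookup.items():
        short_clean = short_key.lower().replace(" ", "").replace("-", "")
        if len(short_clean) >= min_len and (
            short_clean in name_clean or name_clean in short_clean
        ):
            # Similarity ratio serves as the tie-breaker between candidates.
            score = SequenceMatcher(None, short_clean, name_clean).ratio()
            candidates.append((score, meta))
    if not candidates:
        return None, None
    _, meta = max(candidates, key=lambda c: c[0])
    return meta.get("id"), meta.get("short")

lookup = {"NV": {"id": 1, "short": "NV"}, "ENV": {"id": 2, "short": "ENV"}}
# "NV" (length 2) no longer qualifies; "ENV" (length 3) is contained in "envyus".
print(best_short_match("EnVyUs", lookup))  # → (2, 'ENV')
```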


Comment on lines +276 to +290

                    # Extract team short names from player tables for additional matching
                    player_table_shorts: set[str] = set()
                    for table in game.select("table.wf-table-inset"):
                        for short_el in table.select(".mod-player .ge-text-light"):
                            short = extract_text(short_el)
                            if short:
                                player_table_shorts.add(short.upper())

                    t1_id_val = t1_meta.get("id") if t1_meta else None
                    t1_short_val = t1_meta.get("short") if t1_meta else None
                    t2_id_val = t2_meta.get("id") if t2_meta else None
                    t2_short_val = t2_meta.get("short") if t2_meta else None
                    # Use multi-strategy matching for team metadata
                    t1_id_val, t1_short_val = _match_team_to_meta(
                        t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0, player_table_shorts
                    )
                    t2_id_val, t2_short_val = _match_team_to_meta(
                        t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1, player_table_shorts
                    )

⚠️ Potential issue | 🟠 Major

player_table_shorts is shared across both teams — Strategy 3.5 can misattribute metadata.

The set is built from all player tables in the game section (both teams' tables), then the same set is passed to _match_team_to_meta for both t1 and t2. In Strategy 3.5, the first short in the set found in short_lookup is returned — since set iteration order is arbitrary, this can resolve to the wrong team's metadata for either or both calls.

Consider collecting per-table shorts and passing only the relevant subset to each call:

Proposed fix
                if t1_name and t2_name:
-                    # Extract team short names from player tables for additional matching
-                    player_table_shorts: set[str] = set()
-                    for table in game.select("table.wf-table-inset"):
-                        for short_el in table.select(".mod-player .ge-text-light"):
-                            short = extract_text(short_el)
-                            if short:
-                                player_table_shorts.add(short.upper())
-                    
-                    # Use multi-strategy matching for team metadata
-                    t1_id_val, t1_short_val = _match_team_to_meta(
-                        t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0, player_table_shorts
-                    )
-                    t2_id_val, t2_short_val = _match_team_to_meta(
-                        t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1, player_table_shorts
-                    )
+                    # Extract per-team short names from player tables
+                    per_table_shorts: list[set[str]] = []
+                    for table in game.select("table.wf-table-inset"):
+                        shorts: set[str] = set()
+                        for short_el in table.select(".mod-player .ge-text-light"):
+                            short = extract_text(short_el)
+                            if short:
+                                shorts.add(short.upper())
+                        per_table_shorts.append(shorts)
+                    
+                    # Use multi-strategy matching for team metadata
+                    t1_id_val, t1_short_val = _match_team_to_meta(
+                        t1_name, info_teams_list, team_meta_lookup, team_short_to_meta, 0,
+                        per_table_shorts[0] if len(per_table_shorts) > 0 else None,
+                    )
+                    t2_id_val, t2_short_val = _match_team_to_meta(
+                        t2_name, info_teams_list, team_meta_lookup, team_short_to_meta, 1,
+                        per_table_shorts[1] if len(per_table_shorts) > 1 else None,
+                    )

@Vanshbordia Vanshbordia merged commit 51cb9f8 into main Feb 14, 2026
8 checks passed