vevetron
Contributor

Description

Adds a script that checks whether any RT parse jobs failed in the past and saves the results to a file.

Contributes to #4284

Type of change

  • New feature

How has this been tested?

➜  data git:(script_to_find_missing_rt_data) ✗ python find_missing_rt_parsed_files.py                                  
trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T20:00:00+00:00
Elapsed time: 75.82 seconds
['trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T20:00:00+00:00', 'trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T21:00:00+00:00', 'trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T22:00:00+00:00', 'trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T23:00:00+00:00', 'trip_updates_outcomes/dt=2022-09-16/hour=2022-09-16T00:00:00+00:00']
Missing files: 567
Extra files: 222
Sample missing files: ['trip_updates_outcomes/dt=2025-07-11/hour=2025-07-11T13:00:00+00:00', 'trip_updates_outcomes/dt=2022-09-15/hour=2022-09-15T10:00:00+00:00', 'trip_updates_outcomes/dt=2022-10-26/hour=2022-10-26T10:00:00+00:00', 'service_alerts_outcomes/dt=2025-01-31/hour=2025-01-31T00:00:00+00:00', 'trip_updates_outcomes/dt=2025-07-14/hour=2025-07-14T20:00:00+00:00']
Sample extra files: ['trip_updates_outcomes/dt=2025-09-10/hour=2025-09-10T12:00:00+00:00', 'trip_updates_outcomes/dt=2025-09-09/hour=2025-09-09T06:00:00+00:00', 'trip_updates_outcomes/dt=2025-09-10/hour=2025-09-10T20:00:00+00:00', 'service_alerts_outcomes/dt=2025-09-11/hour=2025-09-11T13:00:00+00:00', 'trip_updates_outcomes/dt=2025-09-12/hour=2025-09-12T02:00:00+00:00']
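
The "Missing files" and "Extra files" counts above suggest a set comparison over hourly partition paths. The following is a minimal sketch of that idea, not the code in this PR: the helper names `hourly_partitions` and `compare_partitions` are hypothetical, the `found` list stands in for a real storage listing, and it assumes "missing" means expected but not found while "extra" means found but not expected.

```python
from datetime import datetime, timedelta, timezone


def hourly_partitions(prefix, start, end):
    """Yield one partition path per hour in [start, end) (hypothetical helper)."""
    current = start
    while current < end:
        # Mirrors the path shape seen in the output above:
        # <prefix>/dt=YYYY-MM-DD/hour=<ISO 8601 timestamp with UTC offset>
        yield f"{prefix}/dt={current:%Y-%m-%d}/hour={current.isoformat()}"
        current += timedelta(hours=1)


def compare_partitions(expected, found):
    """Return (missing, extra) as sorted lists (hypothetical helper)."""
    expected, found = set(expected), set(found)
    missing = sorted(expected - found)  # expected but not present
    extra = sorted(found - expected)    # present but not expected
    return missing, extra


start = datetime(2022, 9, 15, 20, tzinfo=timezone.utc)
end = datetime(2022, 9, 16, 1, tzinfo=timezone.utc)
expected = list(hourly_partitions("trip_updates_outcomes", start, end))

# Stand-in for a real bucket listing: drop the last hour, add an unexpected one.
found = expected[:-1] + [
    "trip_updates_outcomes/dt=2022-09-16/hour=2022-09-16T05:00:00+00:00"
]

missing, extra = compare_partitions(expected, found)
print(f"Missing files: {len(missing)}")
print(f"Extra files: {len(extra)}")
```

Using sets keeps the comparison linear in the number of paths, which matters little for a few hundred partitions but avoids quadratic list scans if the date range grows.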

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

erikamov force-pushed the script_to_find_missing_rt_data branch from c0bfdc9 to fdc6062 on September 15, 2025 17:56
Contributor

erikamov commented Sep 16, 2025

Maybe it could live in the airflow folder, either in airflow/plugins/scripts/ or in a new folder.
