fix: Handle TimeoutError in fetch_info to prevent double wait #600
Merged
This pull request introduces two main improvements: (1) URL validation is now performed asynchronously in a thread to avoid blocking the event loop, and (2) extraction timeouts are dynamically increased ("budgeted") for requests that are expected to involve significant sleep intervals, improving reliability for sleep-heavy sources. The changes also include new tests to ensure these behaviors are correct.
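Both changes can be illustrated with a minimal sketch, assuming a blocking extractor callable and Python 3.9+ (for `asyncio.to_thread`). Several parts are assumptions. `validate_url` here is a hypothetical stand-in for the project's validator. The `fetch_info` signature is illustrative, not the project's. The budget formula is also assumed: it multiplies the sleep interval by a hypothetical `expected_sleeps` count and caps the extra at 300 seconds, whereas the PR only says the timeout grows with the configured sleep interval. The `except asyncio.TimeoutError` branch shows one reading of the title's fix: the timeout is handled where it occurs, so the work is not awaited a second time.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Optional

MAX_EXTRA_SECONDS = 300.0  # cap on the extra time, as stated in the PR


def validate_url(url: str) -> bool:
    """Hypothetical blocking validator standing in for the project's validate_url."""
    time.sleep(0.05)  # simulate blocking work (e.g. a DNS lookup)
    return url.startswith(("http://", "https://"))


def budgeted_timeout(base: float, sleep_interval: float, expected_sleeps: int) -> float:
    """Assumed budget: base timeout plus total expected sleep, extra capped at 300 s."""
    extra = max(sleep_interval, 0.0) * max(expected_sleeps, 0)
    return base + min(extra, MAX_EXTRA_SECONDS)


async def fetch_info(
    url: str,
    extract: Callable[[str], Dict[str, Any]],
    base_timeout: float = 30.0,
    sleep_interval: float = 0.0,
    expected_sleeps: int = 0,
    budget_sleep: bool = False,
) -> Optional[Dict[str, Any]]:
    """Hypothetical fetch_info: run a blocking extractor with a single bounded wait."""
    timeout = (
        budgeted_timeout(base_timeout, sleep_interval, expected_sleeps)
        if budget_sleep
        else base_timeout
    )
    try:
        # The extractor runs in a worker thread; wait_for bounds the wait exactly once.
        return await asyncio.wait_for(asyncio.to_thread(extract, url), timeout)
    except asyncio.TimeoutError:
        # Handle the timeout here instead of awaiting the same work again.
        # The worker thread cannot be cancelled and may keep running in the background.
        return None


async def handle_request(
    url: str, extract: Callable[[str], Dict[str, Any]], **kwargs: Any
) -> Optional[Dict[str, Any]]:
    # Offload the blocking validator so the event loop stays responsive.
    if not await asyncio.to_thread(validate_url, url):
        raise ValueError(f"invalid URL: {url}")
    return await fetch_info(url, extract, **kwargs)
```

Returning `None` on timeout is a choice made for this sketch. Raising a domain-specific error would work equally well, provided the bounded wait stays in one place.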
Asynchronous URL Validation:
`validate_url` calls in request handlers now run via `asyncio.to_thread`, so potentially blocking validation no longer blocks the main event loop.
Timeout Budgeting for Sleep-Heavy Extraction:
A `budget_sleep` parameter is added to `fetch_info` and all its intended call sites. When enabled, the extraction timeout is increased based on the configured sleep interval, by at most 300 extra seconds. This helps prevent premature timeouts for sources that require many sleep intervals.
Other Improvements:
In `app/routes/api/images.py`, the thumbnail fetcher now explicitly sets a 10-second timeout on its HTTP requests.
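The explicit thumbnail timeout can be sketched with the standard library's `urllib.request.urlopen`, which accepts a `timeout` in seconds. The PR does not name the HTTP client used in `app/routes/api/images.py`, so the library choice and the `fetch_thumbnail` helper are assumptions. With `urlopen`, the timeout applies to each blocking socket operation (connecting, individual reads), not to the download as a whole.

```python
import urllib.request
from typing import Optional

THUMBNAIL_TIMEOUT_SECONDS = 10.0  # value stated in the PR description


def fetch_thumbnail(url: str, timeout: float = THUMBNAIL_TIMEOUT_SECONDS) -> Optional[bytes]:
    """Hypothetical helper: download thumbnail bytes or return None on network failure."""
    try:
        # Without an explicit timeout, urlopen falls back to the global socket default,
        # which is None (block indefinitely) unless it has been changed.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return None
```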