@alberba alberba commented May 11, 2025

This commit improves how the get_url_soup function handles rate-limiting errors (HTTP 429, "too many requests"). The changes are detailed below:

Changes Made:

  1. Added max_attempts Parameter:

    • Introduced a new parameter max_attempts with a default value of 5, allowing multiple retry attempts when temporary errors occur.

  2. Handling "Too Many Requests" (HTTP 429):

    • Implemented an exponential backoff retry mechanism in case the page content indicates "too many requests."
    • Added random jitter to the delay between retries to reduce the likelihood of triggering server rate limits.

  3. Improved Error Messages:

    • Enhanced error messages to include the URL and number of failed attempts for better debugging.

  4. Additional Logic:

    • Checks if the page contains a <pre> element with the text "too many requests" and retries after a delay if detected.
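
For reviewers, the retry flow described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: the names fetch_with_retries, is_rate_limited, and backoff_delay, the injected fetch and sleep callables, and the base delay and jitter values are all assumptions made for the example. It also returns raw HTML and uses a regular expression for the <pre> check, whereas get_url_soup presumably works with a parsed soup.

```python
import random
import re
import time
from typing import Callable

# Hypothetical helper: matches the contents of <pre> elements in raw HTML.
PRE_RE = re.compile(r"<pre[^>]*>(.*?)</pre>", re.IGNORECASE | re.DOTALL)


def is_rate_limited(html: str) -> bool:
    """Return True if any <pre> element contains 'too many requests'."""
    return any("too many requests" in text.lower() for text in PRE_RE.findall(html))


def backoff_delay(attempt: int, base: float = 1.0, jitter: float = 1.0) -> float:
    """Exponential backoff (base * 2**attempt) plus random jitter in [0, jitter]."""
    return base * (2 ** attempt) + random.uniform(0, jitter)


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], str],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch url, retrying while the page body signals rate limiting.

    `fetch` is an assumed callable that returns the page HTML as a string.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if not is_rate_limited(html):
            return html
        # Wait before the next attempt; skip the wait after the final one.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    raise RuntimeError(
        f"Failed to fetch {url} after {max_attempts} attempts: too many requests"
    )
```

Passing fetch and sleep as arguments is only a convenience for the sketch: it lets the retry logic be exercised without network access or real delays.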

… extraction of subtitles and dates, with error handling for dates that are not found. A retry mechanism is implemented for fetching pages.
@alberba alberba closed this May 11, 2025
@alberba alberba deleted the fix/too-many-requests branch May 11, 2025 08:43
@alberba alberba restored the fix/too-many-requests branch May 12, 2025 15:13
@alberba alberba reopened this May 12, 2025