@charlaie charlaie commented Aug 25, 2025

Summary

Add redirected_status_code to CrawlResult so that callers can tell whether the final response after a redirect succeeded or failed.

Fixes #1434

List of files changed and why

  • async_crawler_strategy.py - store and return the redirected_status_code from _crawl_web
  • async_webcrawler.py - copy the redirected_status_code from the async_response to the crawl_result
  • models.py - add the optional redirected_status_code field to CrawlResult and AsyncCrawlResponse
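For readers skimming the change, the shape of it can be sketched roughly as below. This is a simplified illustration, not the actual diff: the class names ending in Sketch and the to_crawl_result helper are hypothetical stand-ins, plain dataclasses are used only to keep the example self-contained, and just the fields relevant to this PR are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncCrawlResponseSketch:
    """Hypothetical stand-in for AsyncCrawlResponse (relevant fields only)."""
    html: str
    status_code: int
    redirected_url: Optional[str] = None
    # New optional field: status code of the final response after redirects.
    redirected_status_code: Optional[int] = None


@dataclass
class CrawlResultSketch:
    """Hypothetical stand-in for CrawlResult (relevant fields only)."""
    url: str
    html: str
    success: bool
    status_code: Optional[int] = None
    redirected_url: Optional[str] = None
    # New optional field, copied over from the async response.
    redirected_status_code: Optional[int] = None


def to_crawl_result(url: str, response: AsyncCrawlResponseSketch) -> CrawlResultSketch:
    """Copy the response fields, including the new one, onto the result."""
    return CrawlResultSketch(
        url=url,
        html=response.html,
        success=True,
        status_code=response.status_code,
        redirected_url=response.redirected_url,
        redirected_status_code=response.redirected_status_code,
    )
```

Because the field is optional and defaults to None, existing code that builds either object without it keeps working.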

How Has This Been Tested?

I tested against an example site containing both working and failing redirect links, and redirected_status_code is set correctly in each case.
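A site of this kind can be approximated with the sketch below. It is not the server used for the test (the response headers in the results show Werkzeug); it is a standard-library stand-in, and the ROUTES table, RedirectDemoHandler, and make_server names are made up for illustration. The paths and status codes mirror the ones visible in the results.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical route table: path -> (status, Location header or None, body).
ROUTES = {
    "/": (
        200,
        None,
        b'<h1>Home Page</h1>'
        b'<p><a href="/redirect-fail">Redirect to rate limit</a></p>'
        b'<p><a href="/redirect-success">Redirect to success</a></p>',
    ),
    "/redirect-fail": (302, "/fail", b""),
    "/redirect-success": (302, "/success", b""),
    "/fail": (429, None, b"Too Many Requests - rate limited (simulated)"),
    "/success": (200, None, b"Success"),
}


class RedirectDemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Unknown paths fall back to a plain 404.
        status, location, body = ROUTES.get(self.path, (404, None, b"Not Found"))
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build the demo server; pass port=0 to let the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), RedirectDemoHandler)
```

Calling make_server().serve_forever() starts it on port 8000. Crawling / then follows one link that ends in a 429 and one that ends in a 200.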

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Results

Here are the new responses for the home page and for the failing and successful redirect links (some fields removed for conciseness):

{
  "url": "http://www.localhost:8000/",
  "html": "<!DOCTYPE html><html lang=\"en\"><head>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    <title>Home Page</title>\n  </head>\n  <body>\n    <h1>Home Page</h1>\n    <p><a href=\"/redirect-fail\">Redirect to rate limit</a></p>\n    <p><a href=\"/redirect-success\">Redirect to success</a></p>\n  \n  \n\n</body></html>",
  "fit_html": "<html lang=\"en\"><body>\n    <h1>Home Page</h1>\n    <p><a>Redirect to rate limit</a></p>\n    <p><a>Redirect to success</a></p>\n  \n  \n\n</body></html>",
  "success": true,
  "cleaned_html": "<html>\n<head>\n    <title>Home Page</title>\n  </head>\n  <body>\n    <h1>Home Page</h1>\n    <p><a href=\"/redirect-fail\">Redirect to rate limit</a></p>\n    <p><a href=\"/redirect-success\">Redirect to success</a></p>\n  \n  \n\n</body>\n</html>",
  "links": {
    "internal": [
      {
        "href": "http://www.localhost:8000/redirect-fail",
        "text": "Redirect to rate limit",
        "title": "",
        "base_domain": "localhost",
        "head_data": null,
        "head_extraction_status": null,
        "head_extraction_error": null,
        "intrinsic_score": 0.0,
        "contextual_score": null,
        "total_score": null
      },
      {
        "href": "http://www.localhost:8000/redirect-success",
        "text": "Redirect to success",
        "title": "",
        "base_domain": "localhost",
        "head_data": null,
        "head_extraction_status": null,
        "head_extraction_error": null,
        "intrinsic_score": 0.0,
        "contextual_score": null,
        "total_score": null
      }
    ],
    "external": [
      {
        "href": "http://www.localhost:8000/redirect-fail",
        "text": "Redirect to rate limit",
        "title": "",
        "base_domain": "localhost",
        "head_data": null,
        "head_extraction_status": null,
        "head_extraction_error": null,
        "intrinsic_score": 0.0,
        "contextual_score": null,
        "total_score": null
      },
      {
        "href": "http://www.localhost:8000/redirect-success",
        "text": "Redirect to success",
        "title": "",
        "base_domain": "localhost",
        "head_data": null,
        "head_extraction_status": null,
        "head_extraction_error": null,
        "intrinsic_score": 0.0,
        "contextual_score": null,
        "total_score": null
      }
    ]
  },
  "metadata": {
    "depth": 0,
    "parent_url": null
  },
  "error_message": "",
  "response_headers": {
    "connection": "close",
    "content-length": "365",
    "content-type": "text/html; charset=utf-8",
    "date": "Mon, 25 Aug 2025 05:48:33 GMT",
    "server": "Werkzeug/3.1.3 Python/3.13.4"
  },
  "status_code": 200,
  "redirected_url": "http://www.localhost:8000/",
  "redirected_status_code": 200
}
{
  "url": "http://www.localhost:8000/redirect-fail",
  "html": "<html><head></head><body>Too Many Requests - rate limited (simulated)</body></html>",
  "fit_html": "<html><body>Too Many Requests - rate limited (simulated)</body></html>",
  "success": true,
  "cleaned_html": "<html><body>Too Many Requests - rate limited (simulated)</body></html>",
  "metadata": {
    "title": null,
    "description": null,
    "keywords": null,
    "author": null,
    "depth": 1,
    "parent_url": "http://www.localhost:8000/"
  },
  "error_message": "",
  "response_headers": {
    "connection": "close",
    "content-length": "197",
    "content-type": "text/html; charset=utf-8",
    "date": "Mon, 25 Aug 2025 05:48:33 GMT",
    "location": "/fail",
    "server": "Werkzeug/3.1.3 Python/3.13.4"
  },
  "status_code": 302,
  "redirected_url": "http://www.localhost:8000/fail",
  "redirected_status_code": 429,
  "tables": []
}
{
  "url": "http://www.localhost:8000/redirect-success",
  "html": "<html><head></head><body>Success</body></html>",
  "fit_html": "<html><body>Success</body></html>",
  "success": true,
  "cleaned_html": "<html><body>Success</body></html>",
  "metadata": {
    "depth": 1,
    "parent_url": "http://www.localhost:8000/"
  },
  "error_message": "",
  "response_headers": {
    "connection": "close",
    "content-length": "203",
    "content-type": "text/html; charset=utf-8",
    "date": "Mon, 25 Aug 2025 05:48:36 GMT",
    "location": "/success",
    "server": "Werkzeug/3.1.3 Python/3.13.4"
  },
  "status_code": 302,
  "redirected_url": "http://www.localhost:8000/success",
  "redirected_status_code": 200,
  "tables": []
}
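
In the output above, success is true and status_code is 302 for both redirect links, so only redirected_status_code distinguishes the 429 target from the 200 one. A caller could use it along these lines. This is a minimal sketch over plain dicts shaped like the results above; final_status and redirect_target_ok are hypothetical helper names, not part of the library.

```python
from typing import Any, Mapping, Optional


def final_status(result: Mapping[str, Any]) -> Optional[int]:
    """Prefer the post-redirect status; fall back to status_code if it is absent."""
    redirected = result.get("redirected_status_code")
    if redirected is not None:
        return redirected
    return result.get("status_code")


def redirect_target_ok(result: Mapping[str, Any]) -> bool:
    """Treat only a 2xx final status as success."""
    status = final_status(result)
    return status is not None and 200 <= status < 300


# Values taken from the three results above (other fields omitted).
results = [
    {"url": "http://www.localhost:8000/",
     "status_code": 200, "redirected_status_code": 200},
    {"url": "http://www.localhost:8000/redirect-fail",
     "status_code": 302, "redirected_status_code": 429},
    {"url": "http://www.localhost:8000/redirect-success",
     "status_code": 302, "redirected_status_code": 200},
]

for r in results:
    print(r["url"], "ok" if redirect_target_ok(r) else "failed")
```

The fallback to status_code is a design choice for this sketch so that results produced without the new field are still classified.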

coderabbitai bot commented Aug 25, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
