Crawler may be too polite #247

@laundmo

When trying Stract with one of my common searches, specifically searching docs.rs with "bevy Commands site:docs.rs", I noticed there were no results at all. Even searching for the whole crate with "bevy site:docs.rs" returned no relevant results.

Reading the crawler documentation, especially the section about politeness, it's immediately obvious why: 1 request every 5 seconds is simply not fast enough.

Some very rough math:
bevy has ~3200 items in their docs (counted on the "all items" docs.rs page)
3200 * 5 seconds = 16,000 seconds ≈ 4.4 h, and 24 h / 4.4 h ≈ 5.4 scans of bevy-equivalent docs per day

docs.rs receives at least around 800 releases per day, with one recent day having 1800 releases. It's very likely that a few of these will be of similar size to bevy, or at least that a few added together will reach that level.

That means, assuming my math isn't horribly off in some way, at 5 seconds per request the crawler can never catch up.
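To make the estimate easier to check, here is a small Python sketch of the same arithmetic. It assumes one page per request, a single sequential fetcher per domain, and that the politeness delay is the only cost. Those are my assumptions, not something I confirmed in the crawler code, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_hours(pages: int, delay_s: float) -> float:
    """Hours needed to fetch `pages` pages at one request per `delay_s` seconds."""
    return pages * delay_s / 3600


def max_pages_per_day(delay_s: float) -> float:
    """Upper bound on pages fetched from one domain per day (assumes no parallelism)."""
    return SECONDS_PER_DAY / delay_s


def break_even_pages_per_release(releases_per_day: int, delay_s: float) -> float:
    """Average pages per release above which the crawler falls behind."""
    return max_pages_per_day(delay_s) / releases_per_day


if __name__ == "__main__":
    delay = 5.0        # seconds between requests, from the politeness docs
    bevy_pages = 3200  # rough count from the "all items" page

    print(f"bevy-sized crate: {crawl_hours(bevy_pages, delay):.1f} h")
    print(f"pages per day:    {max_pages_per_day(delay):.0f}")
    print(f"bevy scans/day:   {max_pages_per_day(delay) / bevy_pages:.1f}")
    for releases in (800, 1800):
        limit = break_even_pages_per_release(releases, delay)
        print(f"{releases} releases/day: falls behind above {limit:.1f} pages/release")
```

Under these assumptions the budget is 17,280 pages per day for the whole domain, so at 800 releases per day the crawler falls behind as soon as releases average more than 21.6 pages each.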

I'm sure there are other domains like this, ones hosting a lot of new pages per day.

It may be worth considering whether the crawler is too polite.
