
Conversation

flagist0

The middleware was emitting requests with dont_filter=True, causing multiple uncaught duplicate requests.

dont_filter is not needed by itself, but it was protecting the request queue from exhaustion: the middleware emits one request at a time, so there is always only one request in the Scrapy queue. If that request is a duplicate and is dropped by the dupefilter, the Scrapy request queue becomes empty and the spider is closed, even though many requests may remain in the middleware's queue.

The solution is to catch the spider_idle signal and supply the next request from the queue, as sketched below.
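A minimal sketch of the approach, assuming a middleware that keeps its own request queue. The class name SingleRequestMiddleware and the enqueue helper are hypothetical illustrations, not the PR's actual code:

```python
from collections import deque

from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class SingleRequestMiddleware:
    """Holds pending requests in an internal queue and feeds them to
    Scrapy one at a time, without marking them dont_filter=True."""

    def __init__(self, crawler):
        self.crawler = crawler
        self.queue = deque()  # requests waiting outside the Scrapy scheduler
        crawler.signals.connect(self.on_spider_idle,
                                signal=signals.spider_idle)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def enqueue(self, request):
        # Requests are no longer dont_filter, so the dupefilter can
        # drop duplicates once they reach the scheduler.
        self.queue.append(request)

    def on_spider_idle(self, spider):
        # Fired when the Scrapy scheduler runs dry. If the dupefilter
        # dropped the last request we submitted, the next queued request
        # gets its turn here instead of the spider closing.
        if self.queue:
            request = self.queue.popleft()
            self.crawler.engine.crawl(request, spider)  # newer Scrapy: crawl(request)
            raise DontCloseSpider  # keep the spider running
```

Raising DontCloseSpider from a spider_idle handler is Scrapy's standard way to veto the shutdown, so the spider only closes once both the scheduler and the middleware's own queue are empty.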
