
Handling a lost connection with the Playwright process that makes the scrape hang on error #331

@milan-cp-dev

Description


I run long scrapes that require long sequences of actions during each Playwright request. I have handled most of the problems with the scrape and established long runs, but I am now facing issues with lost connections to the Playwright process. We obviously can't do much about a process that has died, but please help me ensure that such a request ends up in the errback so we can handle it properly and continue scraping.

The minimal spider setup is described in minimal_spider_setyp.txt; a rough sketch of that kind of setup is shown below.
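
For context only, this is a minimal sketch of the kind of spider involved, not the exact contents of the attachment; the spider name, URL, and callback bodies are placeholders.

```python
import scrapy


class MinimalPlaywrightSpider(scrapy.Spider):
    # Hypothetical name and start URL, stand-ins for the real spider.
    name = "minimal_playwright"

    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={
                "playwright": True,
                # The page object is needed for the long sequences of actions.
                "playwright_include_page": True,
            },
            callback=self.parse,
            errback=self.errback,  # a dead Playwright process should end up here
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        # ... long sequence of Playwright actions ...
        await page.close()
        yield {"url": response.url}

    async def errback(self, failure):
        # Close the page if we got one, then let the crawl continue.
        page = failure.request.meta.get("playwright_page")
        if page is not None:
            await page.close()
        self.logger.error("Request failed: %r", failure)
```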

The root cause of the error is:
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323044 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323042 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"

To surface this condition we used ScrapyPlaywrightMemoryUsageExtension, and we caught the error as shown in inital_error.txt.

We have extended ScrapyPlaywrightMemoryUsageExtension so that such exceptions can be caught with try/except. We have attempted to raise a known Scrapy/scrapy-playwright error so that it is routed back to the errback function, which should handle the remainder and let the scrape proceed.

Can you please evaluate our CustomScrapyPlaywrightMemoryUsageExtension, advise whether IgnoreRequest is a suitable exception here, and suggest what we can do moving forward? We are still debugging the current solution as I report this; a rough outline of the approach follows, and the full version is attached as custom_memusage_extension.txt.
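
The sketch below is an illustration of the approach rather than the exact contents of custom_memusage_extension.txt. It assumes the extension inherits the get_virtual_size() hook from Scrapy's MemoryUsage extension, and the module path used in the settings snippet is hypothetical.

```python
import logging

from scrapy.exceptions import IgnoreRequest
from scrapy_playwright.memusage import ScrapyPlaywrightMemoryUsageExtension

logger = logging.getLogger(__name__)


class CustomScrapyPlaywrightMemoryUsageExtension(ScrapyPlaywrightMemoryUsageExtension):
    """Wrap the periodic memory check so a dead Playwright node process is
    reported loudly instead of failing silently inside the extension."""

    def get_virtual_size(self):
        try:
            return super().get_virtual_size()
        except Exception as exc:
            # The node process backing Playwright appears to be gone. We would
            # like the in-flight requests to fail into their errback instead of
            # hanging; whether raising IgnoreRequest here is the right way to
            # achieve that is exactly what this issue asks.
            logger.error("Lost connection with the Playwright process: %r", exc)
            raise IgnoreRequest("Playwright process is gone") from exc
```

The custom class is enabled through the standard EXTENSIONS setting, along these lines (the "myproject.extensions" path is a placeholder):

```python
EXTENSIONS = {
    "myproject.extensions.CustomScrapyPlaywrightMemoryUsageExtension": 0,
}
```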

minimal_spider_setyp.txt
inital_error.txt
custom_memusage_extension.txt
