
fix: use Crawlee dataset.drop() for graceful storage cleanup #745

Draft
younglim wants to merge 1 commit into master from improv/use-dataset-to-destroy

Conversation

younglim (Collaborator) commented on Jun 8, 2026

Summary

  • Replace platform-specific EPERM workaround in generateArtifacts with Crawlee's native dataset.drop() and requestQueue.drop() APIs
  • Call Configuration.getStorageClient().teardown() to flush pending background writes before dropping storage
  • Eliminate the 3-5 second delay and the Windows retry logic that were needed to work around file locks
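
The ordering described above (flush first, then drop) can be sketched roughly as follows. This is an illustration, not the code in this PR: `cleanUpStorage`, `Droppable`, and `StorageClientLike` are hypothetical names, and the interfaces are minimal stand-ins so the snippet runs without Crawlee installed. `teardown()` is treated as optional here, on the assumption that not every storage client implements it.

```typescript
// Minimal structural stand-ins (hypothetical names) for the Crawlee objects
// mentioned in the summary. Anything with an async drop() fits Droppable.
interface Droppable {
  drop(): Promise<void>;
}

// Assumption: teardown() may be absent on some storage clients, so it is optional.
interface StorageClientLike {
  teardown?(): Promise<void>;
}

// Hypothetical helper: flush pending writes, then drop each store.
async function cleanUpStorage(
  client: StorageClientLike,
  dataset: Droppable,
  requestQueue: Droppable,
): Promise<void> {
  // Wait for background writes to finish so no files are still being written.
  await client.teardown?.();
  // Remove the stores through the storage API instead of deleting folders by hand.
  await dataset.drop();
  await requestQueue.drop();
}
```

With Crawlee, the call would presumably look like `await cleanUpStorage(Configuration.getStorageClient(), dataset, requestQueue)`, where `dataset` and `requestQueue` are the instances opened for the scan. Awaiting each step in sequence is a deliberate choice in this sketch: dropping before the flush completes is the situation the old delay and retry logic was compensating for.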

Test plan

  • Run a full scan on macOS/Linux — verify crawlee folder is cleanly removed with no warnings
  • Run on Windows — verify no EPERM errors or Node crashes
  • Verify scan results (HTML report, CSV, JSON) are generated correctly
  • Test error path: kill scan mid-way, verify cleanUp() doesn't hang
  • Run existing test suite: npm test
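
For the "doesn't hang" check in the error-path item, one option is to race the cleanup against a timer so a stuck call fails the test instead of blocking it. A minimal sketch follows, assuming `cleanUp()` returns a promise; `withTimeout` is a hypothetical helper and not part of the codebase or of Crawlee.

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not finish within ${ms} ms`)),
      ms,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A test could then wrap the call, for example `await withTimeout(cleanUp(), 10_000, "cleanUp()")`. The 10 second limit is an arbitrary placeholder.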

