
Copilot AI commented Nov 26, 2025

Related Issues

  • fixes #959

Proposed Changes:

Users processing large volumes of articles need to know where cache files are stored and how to delete them. Expanded the Caching section in docs/user_guide/advanced.rst with:

  • Cache Location: OS-specific paths (/tmp/.newspaper_scraper/ on Linux/macOS) and programmatic discovery via newspaper.settings
  • Cache Contents: Describes the memoized/ directory (article URLs per domain) and the category_cache/ directory (category detection results, 24h TTL)
  • Clearing the Cache: Three approaches:
    • Per-source: source.clean_memo_cache()
    • Programmatic global: shutil.rmtree(settings.TOP_DIRECTORY)
    • CLI: rm -rf /tmp/.newspaper_scraper
  • Quick Tips: Best practices for cache management at scale
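
A minimal sketch of the programmatic global approach listed above. It takes the cache path as a parameter rather than importing newspaper, so it runs anywhere; `clear_cache` and the demo file name are hypothetical names used only for illustration, and the demo operates on a throwaway directory that mimics the memoized/ and category_cache/ layout.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir: Path) -> bool:
    """Delete cache_dir and everything under it.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


# Build a throwaway directory with the layout described in this PR.
# The file name below is made up for illustration only.
demo_root = Path(tempfile.mkdtemp()) / ".newspaper_scraper"
(demo_root / "memoized").mkdir(parents=True)
(demo_root / "category_cache").mkdir()
(demo_root / "memoized" / "example.com.txt").write_text("https://example.com/a\n")

removed_first = clear_cache(demo_root)   # the directory existed
removed_second = clear_cache(demo_root)  # already gone, nothing to do
print(removed_first, removed_second)

# Remove the temporary parent directory created for the demo.
shutil.rmtree(demo_root.parent, ignore_errors=True)
```

With newspaper installed, the same helper could presumably be called as clear_cache(Path(settings.TOP_DIRECTORY)), using the setting name given in this PR. Checking for the directory first avoids the error shutil.rmtree raises when the path does not exist.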

Example to find cache location:

from newspaper import settings

print(f"Cache directory: {settings.TOP_DIRECTORY}")
print(f"Article memoization cache: {settings.MEMO_DIR}")
print(f"Category cache: {settings.CACHE_DIRECTORY}")
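
For users handling large volumes of articles, it may also help to see how much disk space the cache occupies before deleting it. The following is a sketch only: `dir_size_bytes` is a hypothetical helper, not part of newspaper, and the demo uses a temporary directory with made-up file names in place of the real cache.

```python
import tempfile
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real cache directory; file names are illustrative.
    cache = Path(tmp) / ".newspaper_scraper"
    (cache / "memoized").mkdir(parents=True)
    (cache / "memoized" / "example.com.txt").write_bytes(b"x" * 10)
    (cache / "category_cache").mkdir()
    (cache / "category_cache" / "entry").write_bytes(b"y" * 5)

    total = dir_size_bytes(cache)
    missing = dir_size_bytes(cache / "does_not_exist")

print(total, missing)  # 15 0
```

Passing Path(settings.TOP_DIRECTORY) instead of the temporary path would report the size of the actual cache, assuming the setting resolves to the directory printed by the example above.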

How did you test it?

Built the documentation with make html; the build succeeds with no new warnings.

Notes for the reviewer

Documentation-only change. The existing caching content was reorganized into a "Disabling Caching" subsection so that it sits alongside the new subsections.

Checklist

  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • docs.python.org
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)
  • docs.readthedocs.io
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)
  • www.sphinx-doc.org
    • Triggering command: /home/REDACTED/.local/bin/sphinx-build sphinx-build -M html . _build (dns block)


Original prompt

This section details the original issue you should resolve.

<issue_title>Where to find and delete all articles?</issue_title>
<issue_description> Issue by steeljardas
Fri Jan 6 08:33:40 2023
Originally opened as codelucas/newspaper#959


I am using Newspaper3k on around 20k articles. Where would I need to go to delete all the articles that Newspaper3k is downloading?
</issue_description>

<agent_instructions>Improve the documentation in Advanced/Caching section with information addressing this issue. Give some code examples and quick tips</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@AndyTheFactory **Comment by [johnbumgarner](https://github.com/johnbumgarner)** _Sat Jan 14 17:16:18 2023_

If memoize_articles is not set to False, Newspaper will cache the article URLs and associated data in your system's temp directory. There are some details on this cache in my Newspaper3k Overview Document.



Copilot AI and others added 2 commits November 26, 2025 08:53
…ons, and tips

Co-authored-by: AndyTheFactory <863810+AndyTheFactory@users.noreply.github.com>
…guration

Co-authored-by: AndyTheFactory <863810+AndyTheFactory@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve documentation for deleting articles in Newspaper3k" to "docs: Improve caching documentation with location, clearing instructions, and tips" Nov 26, 2025
Copilot AI requested a review from AndyTheFactory November 26, 2025 08:56
AndyTheFactory marked this pull request as ready for review November 26, 2025 08:58