Description
Describe the bug
GoogleNewsSource does not respect the config.language setting when changed after initialization. The underlying gnews.GNews object is created during __init__ with the default language ('en'), and modifying source.config.language afterwards has no effect. This causes English articles to be fetched regardless of the specified country or language configuration, making the class unusable for international news fetching.
To Reproduce
Steps to reproduce the behavior:
- Install newspaper4k with gnews support:
    pip install newspaper4k[gnews]
- Use the following code to fetch Spanish news from Spain:
    from newspaper.google_news import GoogleNewsSource

    # Attempt to fetch Spanish news from Spain
    source = GoogleNewsSource(country="ES", period="3d", max_results=3)
    source.config.language = "es"  # Set Spanish language after initialization
    source.build(top_news=True)

    # Print article titles
    print(f"Found {len(source.articles)} articles:")
    for i, art in enumerate(source.articles, 1):
        print(f"{i}. {art.title}")
- Observe that the articles are in English (US news) instead of Spanish:
    Found 3 articles:
    1. FBI arrests suspect in 2021 D.C. pipe bomb case, sources say - CBS News
    2. What to know about Adm. 'Mitch' Bradley, commander at the center of boat strike...
    3. Trump 'garbage' rhetoric about Somalis draws cheers from administration...
- For comparison, using gnews directly (without the newspaper4k wrapper) works correctly:
    import gnews

    g = gnews.GNews(language='spanish', country='ES', max_results=3)
    articles = g.get_top_news()
    print(f"Found {len(articles)} articles:")
    for i, art in enumerate(articles, 1):
        print(f"{i}. {art['title']}")
  Output (correct Spanish articles):
    Found 3 articles:
    1. Ferraz se niega a llevar a Salazar a la Fiscalía como piden dirigentes del PSOE
    2. La empresa gestora del hospital de Torrejón despidió a cuatro directivos...
    3. Los sindicatos médicos convocan una huelga indefinida en enero contra el estatuto
Expected behavior
When setting source.config.language = "es" with country="ES", the source should fetch Spanish-language articles from Spanish news sources. The articles should be from Spanish news outlets (like Cadena SER, ABC, elDiario.es) and in the Spanish language.
Expected output example:
- "Ferraz se niega a llevar a Salazar a la Fiscalía como piden dirigentes del PSOE"
- "La empresa gestora del hospital de Torrejón despidió a cuatro directivos tras denunciar órdenes..."
- "Los sindicatos médicos convocan una huelga indefinida en enero contra el estatuto"
Screenshots
Not applicable - this is a console/API behavior issue without visual components.
System information
- OS: Linux (kernel 6.8.0-88-generic)
- Python version: 3.12.3
- Library version: newspaper4k 0.9.4.1
- Dependency version: gnews 0.4.2
- Installation method: pip install newspaper4k[gnews] in a virtual environment
Additional context
Root Cause
The issue occurs in newspaper/google_news.py, in the GoogleNewsSource.__init__() method (around lines 95-104):
self.gnews = gnews.GNews(
    language=self.config.language,  # Reads config.language at initialization time
    country=self.country,
    period=self.period,
    start_date=self.start_date,
    end_date=self.end_date,
    max_results=self.max_results,
    exclude_websites=self.exclude_websites,
    proxy=proxy,
)
At initialization, self.config.language defaults to 'en', and the gnews.GNews object is created with this default value. When users subsequently set source.config.language = 'es', only the configuration object is updated. The underlying gnews object is neither recreated nor updated, so it continues to use language='en'.
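The capture-at-construction behavior described above can be reproduced without any network access. The following is a minimal sketch using stand-in classes: `FakeConfig`, `FakeGNews`, and `FakeSource` are hypothetical names for illustration only and are not part of newspaper4k or gnews.

```python
class FakeConfig:
    """Stand-in for the newspaper4k configuration object (hypothetical)."""

    def __init__(self):
        self.language = "en"


class FakeGNews:
    """Stand-in for gnews.GNews; it only stores the language it was given."""

    def __init__(self, language):
        self.language = language


class FakeSource:
    """Mimics the construction pattern shown in the Root Cause snippet."""

    def __init__(self):
        self.config = FakeConfig()
        # The language string is read once, here, and copied into the client.
        self.gnews = FakeGNews(language=self.config.language)


source = FakeSource()
source.config.language = "es"  # updates the config object only

print(source.config.language)  # es
print(source.gnews.language)   # en (the client still holds the old value)
```

Because the client receives a copy of the string rather than a reference to the config, later assignments to `config.language` cannot reach it.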
Suggested Solutions
- Accept a language parameter in __init__ (Recommended):
    def __init__(self, country=None, language='en', period=None, start_date=None, ...):
        # Pass language to parent Source class or set before creating gnews
- Make the config.language setter update the gnews object:
    # In Configuration class
    @language.setter
    def language(self, value):
        self._language = value
        # Update gnews object if it exists
        if hasattr(self, '_source') and hasattr(self._source, 'gnews'):
            self._source.gnews.language = value
- Lazy initialization of the gnews object:
  Create the gnews object in the build() method instead of __init__(), so it picks up the current config values.
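Of the suggestions above, the lazy-initialization option can be sketched in isolation. This is an illustration under stated assumptions, not the actual newspaper4k code: `FakeConfig`, `FakeGNews`, and `LazySource` are hypothetical stand-ins, and a real fix would construct `gnews.GNews` with the full argument list shown in the Root Cause snippet.

```python
class FakeConfig:
    """Stand-in for the newspaper4k configuration object (hypothetical)."""

    def __init__(self):
        self.language = "en"


class FakeGNews:
    """Stand-in for gnews.GNews; it only records its arguments."""

    def __init__(self, language, country):
        self.language = language
        self.country = country


class LazySource:
    """Hypothetical source that defers client creation until build()."""

    def __init__(self, country=None):
        self.config = FakeConfig()
        self.country = country
        self.gnews = None  # not created yet

    def build(self):
        # config.language is read at build time, so changes made after
        # construction are honored.
        self.gnews = FakeGNews(
            language=self.config.language,
            country=self.country,
        )
        return self.gnews


lazy = LazySource(country="ES")
lazy.config.language = "es"  # set after construction, before build()
client = lazy.build()

print(client.language, client.country)  # es ES
```

One trade-off of this design is that `source.gnews` is `None` until `build()` runs, so code that touches the client earlier (including the workaround in this report) would need adjusting.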
Current Workaround
For users who need this functionality now, the following workaround bypasses newspaper4k's validation:
from newspaper.google_news import GoogleNewsSource
source = GoogleNewsSource(country="ES", period="3d", max_results=10)
# Bypass newspaper4k by setting gnews language directly
source.gnews.language = 'spanish' # Use gnews full language name
source.build(top_news=True)
Important notes about the workaround:
- Use gnews full language names ('spanish', 'portuguese portugal', 'portuguese brasil') instead of 2-char ISO codes ('es', 'pt')
- This directly modifies the gnews object, bypassing newspaper4k's validation
- This is fragile and may break with future updates
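To keep the workaround's name translation in one place, a small lookup helper may be useful. The sketch below is an assumption-laden illustration: `ISO_TO_GNEWS_NAME` and `gnews_language_name` are hypothetical names, and the table lists only the language names mentioned in this report, not the full set that gnews supports.

```python
# Hypothetical lookup table; only names that appear in this report are listed.
ISO_TO_GNEWS_NAME = {
    "es": ["spanish"],
    "pt": ["portuguese portugal", "portuguese brasil"],
}


def gnews_language_name(iso_code, prefer=None):
    """Translate a 2-char ISO code into a gnews full language name.

    Raises ValueError if the code is unknown, or if it maps to several
    names and no preference is given.
    """
    candidates = ISO_TO_GNEWS_NAME.get(iso_code.lower())
    if candidates is None:
        raise ValueError(f"no gnews language name known for {iso_code!r}")
    if prefer is not None:
        if prefer not in candidates:
            raise ValueError(f"{prefer!r} is not a known name for {iso_code!r}")
        return prefer
    if len(candidates) > 1:
        raise ValueError(f"{iso_code!r} is ambiguous; choose one of {candidates}")
    return candidates[0]


print(gnews_language_name("es"))                              # spanish
print(gnews_language_name("pt", prefer="portuguese brasil"))  # portuguese brasil
```

With the workaround above, this would be used as `source.gnews.language = gnews_language_name("es")`. Raising on ambiguous or unknown codes avoids adding another silent failure on top of the one reported here.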
Impact
This bug makes GoogleNewsSource completely unusable for fetching non-English news when following the documented pattern of setting config.language after initialization. It affects:
- All non-English languages
- Any application that needs to fetch international news
- Multi-language news aggregation systems
The issue is particularly problematic because:
- The documented pattern (creating source, then setting config) doesn't work
- There's no clear way to pass language during initialization
- The failure is silent - wrong results instead of an error
- Users may not notice they're getting wrong-language content until they inspect the results