
[BUG] GoogleNewsSource ignores language configuration after initialization #688

@antoniocosta

Description


Describe the bug

GoogleNewsSource does not respect the config.language setting when it is changed after initialization. The underlying gnews.GNews object is created in __init__ from config.language, which at that point still holds the default ('en'), so modifying source.config.language afterwards has no effect. As a result, English articles are fetched regardless of the specified country or language, which makes the class unusable for international news fetching.

To Reproduce

Steps to reproduce the behavior:

  1. Install newspaper4k with gnews support:

    pip install newspaper4k[gnews]
  2. Use the following code to fetch Spanish news from Spain:

    from newspaper.google_news import GoogleNewsSource
    
    # Attempt to fetch Spanish news from Spain
    source = GoogleNewsSource(country="ES", period="3d", max_results=3)
    source.config.language = "es"  # Set Spanish language after initialization
    source.build(top_news=True)
    
    # Print article titles
    print(f"Found {len(source.articles)} articles:")
    for i, art in enumerate(source.articles, 1):
        print(f"{i}. {art.title}")
  3. Observe that articles are in English (US news) instead of Spanish:

    Found 3 articles:
    1. FBI arrests suspect in 2021 D.C. pipe bomb case, sources say - CBS News
    2. What to know about Adm. 'Mitch' Bradley, commander at the center of boat strike...
    3. Trump 'garbage' rhetoric about Somalis draws cheers from administration...
    
  4. For comparison, using gnews directly (without newspaper4k wrapper) works correctly:

    import gnews
    
    g = gnews.GNews(language='spanish', country='ES', max_results=3)
    articles = g.get_top_news()
    
    print(f"Found {len(articles)} articles:")
    for i, art in enumerate(articles, 1):
        print(f"{i}. {art['title']}")

    Output (correct Spanish articles):

    Found 3 articles:
    1. Ferraz se niega a llevar a Salazar a la Fiscalía como piden dirigentes del PSOE
    2. La empresa gestora del hospital de Torrejón despidió a cuatro directivos...
    3. Los sindicatos médicos convocan una huelga indefinida en enero contra el estatuto
    

Expected behavior

When setting source.config.language = "es" with country="ES", the source should fetch Spanish-language articles from Spanish news sources. The articles should be from Spanish news outlets (like Cadena SER, ABC, elDiario.es) and in the Spanish language.

Expected output example:

  • "Ferraz se niega a llevar a Salazar a la Fiscalía como piden dirigentes del PSOE"
  • "La empresa gestora del hospital de Torrejón despidió a cuatro directivos tras denunciar órdenes..."
  • "Los sindicatos médicos convocan una huelga indefinida en enero contra el estatuto"

Screenshots

Not applicable - this is a console/API behavior issue without visual components.

System information

  • OS: Linux (kernel 6.8.0-88-generic)
  • Python version: 3.12.3
  • Library version: newspaper4k 0.9.4.1
  • Dependency version: gnews 0.4.2
  • Installation method: pip install newspaper4k[gnews] in virtual environment

Additional context

Root Cause

The issue occurs in newspaper/google_news.py in the GoogleNewsSource.__init__() method (around lines 95-104):

self.gnews = gnews.GNews(
    language=self.config.language,  # Reads config.language at initialization time
    country=self.country,
    period=self.period,
    start_date=self.start_date,
    end_date=self.end_date,
    max_results=self.max_results,
    exclude_websites=self.exclude_websites,
    proxy=proxy,
)

At initialization, self.config.language defaults to 'en', so the gnews.GNews object is created with this default value. When users subsequently set source.config.language = 'es', only the configuration object changes; the underlying gnews object is neither recreated nor updated and continues to use language='en'.
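The stale-snapshot behavior can be reproduced offline. The sketch below uses hypothetical stand-in classes (FakeConfig, FakeGNews, EagerSource), not newspaper4k's or gnews's real code; it only mimics the reported pattern of copying config.language into a client object once, in __init__.

```python
from dataclasses import dataclass


@dataclass
class FakeConfig:
    # Hypothetical stand-in for newspaper4k's Configuration; 'en' mirrors its default
    language: str = "en"


class FakeGNews:
    # Hypothetical stand-in for gnews.GNews: it simply stores what it was given
    def __init__(self, language: str, country: str) -> None:
        self.language = language
        self.country = country


class EagerSource:
    # Mirrors the reported pattern: the client is built exactly once, in __init__
    def __init__(self, country: str) -> None:
        self.config = FakeConfig()
        self.country = country
        # The current value of config.language ('en') is copied into the client here
        self.gnews = FakeGNews(language=self.config.language, country=country)


source = EagerSource(country="ES")
source.config.language = "es"  # only the config object is updated
print(source.config.language, source.gnews.language)  # es en
```

Because the client received the string value rather than a reference to the config, nothing links the two after construction.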

Suggested Solutions

  1. Accept language parameter in __init__ (Recommended):

    def __init__(self, country=None, language='en', period=None, start_date=None, ...):
        # Pass language to parent Source class or set before creating gnews
  2. Make config.language setter update the gnews object:

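    # In Configuration class (sketch; assumes a _source back-reference
    # to the owning source, which this approach would need to add)
    @property
    def language(self):
        return self._language

    @language.setter
    def language(self, value):
        self._language = value
        # Update the owning source's gnews object, if one is attached
        source = getattr(self, '_source', None)
        if source is not None and hasattr(source, 'gnews'):
            source.gnews.language = value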
  3. Lazy initialization of gnews object:
    Create the gnews object in build() method instead of __init__(), so it picks up the current config values.
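Solutions 1 and 3 can be combined. The following is a minimal sketch using hypothetical stand-in classes (FakeConfig, FakeGNews, LazySource), not a patch against newspaper4k: the source accepts language up front and only creates its client when it is first needed, recreating it if the relevant settings have changed since. Any conversion between newspaper4k's two-letter codes and the form gnews expects is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeConfig:
    # Hypothetical stand-in for newspaper4k's Configuration
    language: str = "en"


class FakeGNews:
    # Hypothetical stand-in for gnews.GNews
    def __init__(self, language: str, country: Optional[str]) -> None:
        self.language = language
        self.country = country


class LazySource:
    # Hypothetical source that defers client creation until first use
    def __init__(self, country: Optional[str] = None, language: str = "en") -> None:
        self.config = FakeConfig(language=language)  # solution 1: accept language in __init__
        self.country = country
        self._gnews: Optional[FakeGNews] = None
        self._built_with: Optional[tuple] = None

    @property
    def gnews(self) -> FakeGNews:
        # Solution 3: build the client lazily, and rebuild it if language/country changed
        current = (self.config.language, self.country)
        if self._gnews is None or self._built_with != current:
            self._gnews = FakeGNews(language=current[0], country=current[1])
            self._built_with = current
        return self._gnews

    def build(self) -> str:
        # A real build() would fetch articles; this stand-in reports the language used
        return self.gnews.language


# Setting config.language after construction is now picked up
source = LazySource(country="ES")
source.config.language = "es"
print(source.build())  # es

# Passing language at construction time also works
early = LazySource(country="ES", language="es")
print(early.build())  # es
```

Rebuilding on change, rather than only on first access, also covers callers who switch language between two build() calls, at the cost of constructing a new client each time the settings differ.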

Current Workaround

For users who need this functionality now, the following workaround bypasses newspaper4k's validation:

from newspaper.google_news import GoogleNewsSource

source = GoogleNewsSource(country="ES", period="3d", max_results=10)
# Bypass newspaper4k by setting gnews language directly
source.gnews.language = 'spanish'  # Use gnews full language name
source.build(top_news=True)

Important notes about the workaround:

  • Use gnews full language names ('spanish', 'portuguese portugal', 'portuguese brasil') instead of 2-char ISO codes ('es', 'pt')
  • This directly modifies the gnews object, bypassing newspaper4k's validation
  • This is fragile and may break with future updates
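To make the workaround fail loudly rather than silently, it can be wrapped in a small helper. This is a hypothetical sketch: set_gnews_language and GNEWS_LANGUAGE_NAMES are illustrative names, and the mapping only contains the 'es' -> 'spanish' pair shown in this issue; other entries would have to be filled in from gnews's own list of supported languages. The offline check uses SimpleNamespace stand-ins instead of a real GoogleNewsSource.

```python
from types import SimpleNamespace
from typing import Mapping

# Hypothetical, deliberately partial mapping from 2-char codes to gnews language names.
# Only 'es' -> 'spanish' is taken from this issue; extend it from gnews's supported list.
GNEWS_LANGUAGE_NAMES = {"es": "spanish"}


def set_gnews_language(source, iso_code: str,
                       names: Mapping[str, str] = GNEWS_LANGUAGE_NAMES) -> str:
    """Apply the workaround, raising instead of silently keeping the wrong language."""
    if not hasattr(source, "gnews"):
        raise AttributeError("source has no 'gnews' attribute; the workaround no longer applies")
    try:
        name = names[iso_code]
    except KeyError:
        raise ValueError(f"no gnews language name known for {iso_code!r}") from None
    source.gnews.language = name       # the actual workaround
    source.config.language = iso_code  # keep the newspaper4k config in agreement
    return name


# Offline check with stand-in objects (a real GoogleNewsSource would be passed instead)
fake = SimpleNamespace(gnews=SimpleNamespace(language="en"),
                       config=SimpleNamespace(language="en"))
set_gnews_language(fake, "es")
print(fake.gnews.language)  # spanish
```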

Impact

This bug makes GoogleNewsSource completely unusable for fetching non-English news when following the documented pattern of setting config.language after initialization. It affects:

  • All non-English languages
  • Any application that needs to fetch international news
  • Multi-language news aggregation systems

The issue is particularly problematic because:

  1. The documented pattern (creating source, then setting config) doesn't work
  2. There's no clear way to pass language during initialization
  3. The failure is silent - wrong results instead of an error
  4. Users may not notice they're getting wrong-language content until they inspect the results

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
