@Sarthacker Sarthacker commented Nov 26, 2025

Closes Issue #499

PR Title

Upgraded Existing Web Scraper Using Custom Search Engine ID and Google API

Summary

This PR replaces the existing scraper with a command-line tool that fetches search results through the Google Custom Search API, logs all key actions, and performs a basic analysis of the gathered data by summarizing it with Google Gemini.

Description

The changes are as follows:

  • Scope change: the old script scrapes pages using requests + BeautifulSoup. The new script runs Google Custom Search queries, saves the search results (Title/URL/Snippet), and then generates a summary of them using Google Gemini (Generative AI).
  • The old script scrapes <h2 class="blog-title"> elements, prints them, and writes them to blog_titles.txt.
  • The new script:
    • Accepts a search query via command-line argument.
    • Queries Google Custom Search API to fetch search results (Title, URL, Snippet).
    • Summarizes snippets using Google Gemini (Generative AI).
    • Adds structured logging into data/logs/.
    • Saves the results to a structured CSV file.
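
The search and logging steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses only the standard library, and the function names, the logger name, the log file name scraper.log, and the api_key/cx parameters are assumptions made for the example. The endpoint and the title/link/snippet fields follow the Custom Search JSON API response format. The Gemini summarization step is left out.

```python
import json
import logging
import urllib.parse
import urllib.request
from pathlib import Path

# Endpoint of the Google Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def setup_logging(log_dir="data/logs"):
    """Create the log directory and return a logger writing to a file in it."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("scraper_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(
            Path(log_dir) / "scraper.log", encoding="utf-8"  # hypothetical file name
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def build_search_url(query, api_key, cx):
    """Build the request URL from the API key, search engine ID (cx), and query."""
    params = {"key": api_key, "cx": cx, "q": query}
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_rows(payload):
    """Reduce an API response to Title/URL/Snippet rows."""
    return [
        {
            "Title": item.get("title", ""),
            "URL": item.get("link", ""),
            "Snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])  # "items" is absent when nothing matches
    ]


def search(query, api_key, cx, logger):
    """Run one search request and return the extracted rows, logging both steps."""
    url = build_search_url(query, api_key, cx)
    logger.info("Searching for %r", query)
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    rows = extract_rows(payload)
    logger.info("Fetched %d results", len(rows))
    return rows
```

Keeping URL construction and row extraction separate from the network call means both can be checked without an API key.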

Screenshots

[Screenshots: logs, csv_file, data_analysis]
  • The user enters the query through the CLI as shown. The results are stored in a CSV file, and a summary of those results is stored as a text file.
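
The CLI and output handling described here might look something like the sketch below. It is an assumption-laden illustration rather than the PR's actual code: the function names, the --out-dir option, and the file paths are hypothetical, and only the Title/URL/Snippet columns come from the description above.

```python
import argparse
import csv
from pathlib import Path

# Column names taken from the PR description.
FIELDS = ["Title", "URL", "Snippet"]


def parse_args(argv=None):
    """Read the search query (and a hypothetical output directory) from the CLI."""
    parser = argparse.ArgumentParser(description="Search and save results (sketch).")
    parser.add_argument("query", help="search query to run")
    parser.add_argument("--out-dir", default="data", help="hypothetical output directory")
    return parser.parse_args(argv)


def save_results(rows, csv_path):
    """Write the result rows to a CSV file with a header line."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def save_summary(summary, txt_path):
    """Store the generated summary as a plain text file."""
    txt_path = Path(txt_path)
    txt_path.parent.mkdir(parents=True, exist_ok=True)
    txt_path.write_text(summary, encoding="utf-8")
```

With a hypothetical script name, a run would then look like python scraper.py "web scraping", leaving one CSV file and one text file in the output directory.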

Checks

in the repository

  • Made no changes that degrade the functioning of the repository
  • Gave each commit a descriptive title (unlike "updated README.md")

in the PR

  • Followed the format of the pull_request_template
  • Kept the pull request small (for the creator's welfare)
  • Tested the changes you made

Thank You,
Sarthak
