This project performs web scraping and sentiment analysis on verified Gartner reviews of popular BMC Software products, using Python NLP techniques and data visualization.
BMC Product Review Scrapping & Sentiment Analysis is an open-source project for performing sentiment analysis on customer reviews of BMC Software products scraped from public platforms such as Gartner Peer Insights. It uses Natural Language Processing (NLP) techniques and visualization tools to extract actionable insights from those reviews.
This project is perfect for beginners and intermediate contributors who want hands-on experience with web scraping, NLP, data visualization, and open source collaboration.
It includes:
- Web scraping from Gartner Peer Insights
- Preprocessing text with NLP
- VADER-based sentiment scoring
- Charts, word clouds, and Excel exports
We scrape verified reviews from the following Gartner pages:
| Product Name | Review Page |
|---|---|
| 🧠 BMC Helix ITSM | Link |
| 📈 BMC Helix Operations Management | Link |
| ⚙️ TrueSight Server Automation | Link |
| 📊 Control-M | Link |
Your final analysis should look like this (in Excel or CSV):
| Product Name | Review Title | Overall Rating | Industry | Function | Date | Other Vendors | Country | Pros | Cons | Overall Comment | Sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|
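As a rough illustration of this layout, the sketch below writes one row with these columns to CSV using only the standard library. The `COLUMNS` constant, the `write_reviews` helper, and the sample row are hypothetical and are not taken from the project's scripts; the project itself lists Pandas for this job, so treat this as one possible shape rather than the actual export code.

```python
import csv
import io

# Column order taken from the table above.
COLUMNS = [
    "Product Name", "Review Title", "Overall Rating", "Industry", "Function",
    "Date", "Other Vendors", "Country", "Pros", "Cons", "Overall Comment",
    "Sentiment",
]


def write_reviews(rows, handle):
    """Write review dicts to an open text handle as CSV.

    Fields missing from a row are written as empty strings, and keys that
    are not in COLUMNS are ignored.
    """
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical placeholder row, not real review data.
sample = [
    {
        "Product Name": "Control-M",
        "Review Title": "Example title",
        "Overall Rating": 4,
        "Sentiment": "Positive",
    }
]

buffer = io.StringIO()
write_reviews(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass a handle opened with `open("outputs/reviews.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer (the file name here is only an example).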
Visuals like pie charts and word clouds should be stored in the outputs/ folder.
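A minimal sketch of how a sentiment pie chart might be written to that folder is shown below. The function names, the output file name, and the sample labels are assumptions made for illustration; `visualize.py` may be organized differently. Matplotlib is imported inside the plotting function so the counting helper works without it.

```python
from collections import Counter
from pathlib import Path


def sentiment_counts(labels):
    """Count how many reviews carry each sentiment label."""
    return Counter(labels)


def save_pie_chart(counts, out_dir="outputs", filename="sentiment_pie.png"):
    """Render the counts as a pie chart, save it under out_dir, return the path."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is needed
    import matplotlib.pyplot as plt

    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)

    fig, ax = plt.subplots()
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Sentiment distribution")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
    return out_path


# Hypothetical labels, standing in for the Sentiment column of the export.
counts = sentiment_counts(["Positive", "Positive", "Negative", "Neutral"])

# Uncomment to write outputs/sentiment_pie.png:
# save_pie_chart(counts)
```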
BMC-Product-Review-Scrapping-and-Sentiment-Analysis/
│
├── 📂 data/ # Sample scraped data files (Excel/CSV)
├── 📂 notebooks/ # Jupyter notebooks for quick experimentation
├── 📂 scripts/
│ ├── scraper.py # Scraper module
│ ├── nlp_preprocessing.py # Text cleaning + POS + lemmatization
│ ├── sentiment.py # VADER-based sentiment scoring
│ └── visualize.py # Wordclouds, pie charts, bar graphs
│
├── 📂 outputs/ # Saved images, processed files
│
├── requirements.txt # Install dependencies
├── README.md # Project overview
├── CONTRIBUTING.md # Contribution guidelines
├── LICENSE # Open-source license
└── .gitignore

- Robust product review scraper for BMC products
- Clean text with:
  - Tokenization
  - Lemmatization
  - POS Tagging
  - Stopword Removal
- Sentiment classification using VADER
- Generate sentiment reports and dashboards
- Modularized structure for easy expansion and contributions
- Export analysis to Excel and visual graphs
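The text-cleaning steps listed above (tokenization, POS tagging, stopword removal, lemmatization) could be combined roughly as follows with NLTK. This is a sketch under assumptions: the function names are hypothetical, the order of steps in `nlp_preprocessing.py` is not documented here, and the NLTK resources for tokenizing, tagging, stopwords, and WordNet must already be downloaded with `nltk.download(...)` (the exact resource names vary between NLTK versions).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to the single-letter WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # default to noun, as WordNetLemmatizer does


def preprocess(text):
    """Return lowercased, lemmatized, alphabetic, non-stopword tokens."""
    # Imported lazily so the tag-mapping helper works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    tokens = nltk.word_tokenize(text)  # tokenization
    tagged = nltk.pos_tag(tokens)      # POS tagging on the original casing

    cleaned = []
    for word, tag in tagged:
        word = word.lower()
        # Drop punctuation, numbers, and stopwords.
        if not word.isalpha() or word in stop_words:
            continue
        # Lemmatize using the POS tag so verbs and adjectives reduce properly.
        cleaned.append(lemmatizer.lemmatize(word, penn_to_wordnet(tag)))
    return cleaned
```

Passing the POS tag to the lemmatizer matters because `WordNetLemmatizer` treats every word as a noun by default, which leaves many verb and adjective forms unchanged.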
- Python 3.x
- Selenium / Playwright (for scraping)
- NLTK, VADER (for sentiment)
- Pandas, Matplotlib, WordCloud
- Excel output (xlsxwriter/openpyxl)
- Any
git clone https://github.com/Yash22222/BMC-Product-Review-Scrapping-and-Sentiment-Analysis.git
cd BMC-Product-Review-Scrapping-and-Sentiment-Analysis
pip install -r requirements.txt

- Scrape reviews using the scraper.py script.
- Clean and preprocess with nlp_preprocessing.py.
- Analyze sentiment using sentiment.py.
- Visualize using visualize.py.
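For the sentiment step above, a minimal sketch of VADER-based labeling might look like this. The function names and the label strings are hypothetical, and the ±0.05 cutoffs are the convention commonly suggested for VADER's compound score, not thresholds confirmed by `sentiment.py`. The scorer uses NLTK's VADER wrapper, which needs the `vader_lexicon` resource downloaded beforehand.

```python
def label_from_compound(compound, threshold=0.05):
    """Turn a VADER compound score in [-1, 1] into a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def score_review(text):
    """Return (compound_score, label) for one review text."""
    # Imported lazily; requires nltk plus nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return compound, label_from_compound(compound)
```

When scoring many reviews, it is cheaper to create one `SentimentIntensityAnalyzer` and reuse it than to build a new one per call as this short version does.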
We welcome contributions from GSSoC contributors and all open source enthusiasts!
- Fork the repository
- Clone your fork
  git clone https://github.com/YOUR_USERNAME/BMC-Product-Review-Scrapping-and-Sentiment-Analysis.git
- Create a feature branch
  git checkout -b feature/your-feature-name
- Commit your changes
  git commit -m "✨ Added sentiment model for XYZ"
- Push to your fork
  git push origin feature/your-feature-name
- Open a Pull Request with a clear explanation.
| Type | Ideas |
|---|---|
| 🔄 Add new BMC products | Expand the scraper |
| 🎨 Streamlit UI | Upload reviews & analyze sentiment |
| 🧾 PDF/Excel report generator | Auto reports for each product |
| 🤖 Add BERT | Use HuggingFace transformer models |
| 🌐 Multi-language support | Translate & analyze non-English reviews |
| 🛠 Docker Support | Add Dockerfile for easy setup |
This project is licensed under the MIT License.
- Proudly open for contributions under GSSoC 2025