🎓 Sentiment & Theme Analysis of High- vs. Low-Performing Schools Using Online Reviews and Discussions
A working demonstration of this project can be found at TJAlytix.streamlit.app (tee-jah-lyticks).
To get started with this project, please clone the repository and navigate to it:
> git clone https://github.com/junclemente/ads509-final_project.git
> cd ads509-final_project
This project uses a conda environment specified in a YAML file for reproducibility and consistent development. Ensure you have Anaconda or Miniconda installed.
Run the following:
> conda env create -f environment/ads509-streamlit.yaml
If there are any updates to the environment, you can update the environment with the following:
> conda env update -f environment/ads509-streamlit.yaml --prune
The --prune option cleans the environment by removing packages that are no longer required.
- main → stable, production-ready branch (protected).
- develop → active development branch where new features are merged.
- feature/* → short-lived branches for specific tasks.
- Create a feature branch from develop.
- Commit your changes with clear messages.
- Open a Pull Request into develop.
- Once reviewed, your changes will be merged into develop.
- At milestones, develop is merged into main.
👉 See CONTRIBUTING.md for full guidelines.
- Exploratory Data Analysis
- Text Cleaning
- Topic Modeling
- Sentiment Analysis
- Python 3.11+
- Pandas
- NLTK
- Numpy
- Jupyter Notebook
- Matplotlib / Seaborn
- Scikit-learn
- Streamlit
- ChatGPT
- VSCode
This project looks at how people talk about schools in high-performing vs. low-performing districts. We're pulling reviews and discussions from Reddit to see both the overall sentiment (positive or negative) and the main themes that come up in these conversations.
Our goals are to:
- Run sentiment analysis to check whether high-performing schools are discussed more positively than low-performing ones.
- Use topic modeling to pull out key themes in each group (like academics, safety, teacher quality, resources).
- Compare the sentiment and themes between high- and low-performing schools to get a clearer picture of how school quality is being perceived online.
The text data for this project was gathered through the Reddit API.
This project uses aggregated Reddit data obtained via the Reddit API.
- No raw Reddit posts or user comments are displayed, stored, or redistributed.
- Only aggregated outputs such as keywords, topics, and sentiment scores are shown.
- All analysis is for academic, non-commercial research purposes.
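As an illustration of the "aggregated outputs only" policy above, the sketch below reduces per-post sentiment scores to group-level statistics with pandas and leaves the raw text column out of the result. The column names, the function name `aggregate_sentiment`, and the scores are hypothetical; how the project actually computes or stores its scores is not specified here.

```python
import pandas as pd

# Hypothetical per-post records: a group label plus a sentiment score in [-1, 1].
# Values are invented for illustration; no real Reddit content is included.
scored = pd.DataFrame(
    {
        "group": ["high", "high", "high", "low", "low", "low"],
        "text": ["post a", "post b", "post c", "post d", "post e", "post f"],
        "sentiment": [0.6, 0.2, -0.1, -0.4, 0.1, -0.3],
    }
)


def aggregate_sentiment(df):
    """Return per-group summary statistics; the raw text column is not carried over."""
    return (
        df.groupby("group")["sentiment"]
        .agg(
            n_posts="count",
            mean_sentiment="mean",
            share_positive=lambda s: (s > 0).mean(),
        )
        .reset_index()
    )


summary = aggregate_sentiment(scored)
print(summary)
```

Only the `summary` table would be passed on for display, so individual posts never reach the output.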
- Web Application: TJAlytics.streamlit.app
- Project Presentation: Canva.com (TBD)
- Project Repository: https://github.com/junclemente/ads509-final_project
Parts of this project were developed with help from ChatGPT (OpenAI):
- Debugging Python functions and pipeline logic
- Drafting/rewriting docstrings and short notebook summaries
- Creating small code snippets
All generated code and text were reviewed and edited by the authors.