Description of the files in this repository:
- The final dataset after cleaning. The original can be found on Kaggle at the link below
- The EDA ipynb file
- The clustering ipynb file
- The pbix file containing the dashboard
- A pdf file containing the presentation
Working as a pair over 48 hours, we were tasked with analyzing and cleaning a dataset of OkCupid user profiles from 2012. The dataset can be found on Kaggle here: https://www.kaggle.com/datasets/andrewmvd/okcupid-profiles
The goals were an interactive dashboard for observing trends and recurring user profiles, a set of profile categories to identify the most popular profile types, and a presentation stand to showcase our work.
- EDA with Python: inspect the dataset, clean it and, above all, reorganize the columns
- Identify the most relevant information and encode it numerically for clustering (to create the required profile categories)
- Design dashboard in Power BI
- Prepare a presentation and write the accompanying speech
- Analyze the clusters created with k-means in more depth and see how we can improve them, for example by using more relevant information.
- Apply NLP to the columns containing the presentation texts and other free text that users enter on their profiles. This would have let us better observe trends in what users are searching for and in how they describe themselves.
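
The encoding and clustering steps listed above could look roughly like the following minimal sketch. It is not the code from the notebooks in this repository: the toy `profiles` table, its column names and values, and the helper names `encode_profiles` and `score_k` are all assumptions made for illustration. It one-hot encodes categorical columns, standardizes the result, and compares several values of k for k-means using the silhouette score, which is one possible way to judge whether the clusters improve.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the cleaned dataset. Column names and values are
# illustrative assumptions, not rows taken from the real data.
profiles = pd.DataFrame({
    "age":    [22, 25, 24, 41, 45, 43, 31, 33, 30, 58, 60, 62],
    "height": [65, 70, 68, 72, 66, 69, 64, 71, 67, 63, 70, 68],
    "drinks": ["socially", "often", "socially", "rarely", "not at all",
               "rarely", "socially", "often", "socially", "not at all",
               "rarely", "not at all"],
})


def encode_profiles(df):
    """One-hot encode categorical columns, then standardize every column."""
    encoded = pd.get_dummies(df, dtype=float)
    return StandardScaler().fit_transform(encoded)


def score_k(features, k_values, seed=0):
    """Fit k-means for each k and return a {k: silhouette score} mapping."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


features = encode_profiles(profiles)
scores = score_k(features, range(2, 6))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score
print(scores, best_k)
```

A higher silhouette score suggests better separated clusters, but it is only one criterion; the chosen k should also yield profile categories that are easy to interpret in the dashboard.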
