This project uses Azure, Apache Spark, and Python to process and analyze Olympic data. Below are the main components and technologies used in the project:
- Azure: Microsoft's cloud platform used for data storage and processing.
- Apache Spark: A unified analytics engine for processing large volumes of data.
- Python: The programming language used to write data processing and analysis scripts.
The goal of this project is to extract information about athletes, coaches, teams, and events from Olympic data. The data is stored in Azure and processed with Apache Spark, which handles large volumes of data efficiently.
- CSVs/: Contains all the CSV files with Olympic data.
  - Athletes.csv
  - Coaches.csv
  - EntriesGender.csv
  - Medals.csv
  - Teams.csv
- tokyo_olympic.ipynb: The Jupyter notebook containing the data processing and analysis scripts.
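
As a rough illustration of how the files above could be loaded into Spark, here is a minimal sketch. It is not taken from the notebook: the helper names `csv_paths` and `load_tables`, the lowercase table keys, and the local `CSVs` base directory are all assumptions made for the example. When reading from Azure storage, the base directory would be replaced by the appropriate storage path.

```python
from pathlib import PurePosixPath

# File names as listed under CSVs/ in this repository.
CSV_FILES = [
    "Athletes.csv",
    "Coaches.csv",
    "EntriesGender.csv",
    "Medals.csv",
    "Teams.csv",
]


def csv_paths(base_dir="CSVs"):
    """Map a lowercase table name (e.g. 'athletes') to its CSV path.

    Hypothetical helper: base_dir defaults to the local CSVs/ folder.
    """
    return {
        PurePosixPath(name).stem.lower(): str(PurePosixPath(base_dir) / name)
        for name in CSV_FILES
    }


def load_tables(spark, base_dir="CSVs"):
    """Read every CSV into a Spark DataFrame, keyed by table name.

    `spark` is an existing SparkSession. The first row of each file is
    treated as the header and column types are inferred (an assumption
    about the file layout).
    """
    return {
        table: spark.read.csv(path, header=True, inferSchema=True)
        for table, path in csv_paths(base_dir).items()
    }
```

With PySpark installed, one way to use this is to create a session with `SparkSession.builder.appName("tokyo-olympic").getOrCreate()`, pass it to `load_tables`, and inspect a table with, for example, `tables["athletes"].printSchema()`.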