This project fetches and integrates gas prices, consumer price index (CPI), and simulated sales data, validates the merged dataset, and sends an alert email if validation fails.
Ensure the required Python libraries are installed:

```bash
pip install pandas python-dotenv
```

Create a `.env` file in your project directory with the following content:

```
FRED_API=your_fred_api_key
SENDER_EMAIL=your_sender_email@gmail.com
RECEIVER_EMAIL=your_receiver_email@example.com
EMAIL_PASSWORD=your_email_password
```

You can run the script manually:

```bash
python main_commented.py
```

Or schedule it monthly using cron. This entry runs at 07:00 on the 1st of every month and appends both standard output and errors to a log file:

```
0 7 1 * * /usr/bin/python3 /path/to/main_commented.py >> /path/to/log.txt 2>&1
```
Data Fetching:
- Weekly gas price data and monthly CPI data are fetched from the FRED API using the `FRED_API` key.
- Simulated sales data is generated using an internal module.
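A sketch of a FRED fetch, assuming the `fred/series/observations` JSON endpoint and the standard library's `urllib`, so no extra dependency is needed. This README does not say which series the script requests; `GASREGW` (weekly regular gasoline price) and `CPIAUCSL` (monthly CPI for all urban consumers) in the comment below are assumptions, and the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import pandas as pd

FRED_URL = "https://api.stlouisfed.org/fred/series/observations"


def observations_to_frame(payload: dict, column: str) -> pd.DataFrame:
    """Hypothetical helper: turn a FRED observations payload into (date, column) rows."""
    frame = pd.DataFrame(payload["observations"], columns=["date", "value"])
    frame["date"] = pd.to_datetime(frame["date"])
    # FRED reports missing observations as "."; coerce those to NaN.
    frame[column] = pd.to_numeric(frame["value"], errors="coerce")
    return frame[["date", column]]


def fetch_series(series_id: str, api_key: str, column: str) -> pd.DataFrame:
    """Hypothetical helper: download one FRED series as JSON and parse it."""
    query = urlencode({"series_id": series_id, "api_key": api_key, "file_type": "json"})
    with urlopen(f"{FRED_URL}?{query}", timeout=30) as response:
        payload = json.load(response)
    return observations_to_frame(payload, column)


# Assumed series IDs, for illustration only:
# gas = fetch_series("GASREGW", api_key, "gas_price")
# cpi = fetch_series("CPIAUCSL", api_key, "cpi")
```

Parsing is kept separate from the HTTP call so it can be checked against a saved payload without network access.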
Preprocessing:
- CPI and sales data are matched by month.
- The combined sales and CPI data are then merged with the weekly gas price data using ISO week numbers.
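A minimal pandas sketch of the two joins, assuming each frame has a `date` column of datetimes plus one value column (`sales`, `cpi`, `gas_price`; these names are hypothetical). It keys the weekly join on ISO year and week together, a small deviation from a week-number-only merge: around New Year, a date such as 2024-12-30 already falls in ISO week 1 of 2025, so week numbers alone can pair rows from different years.

```python
import pandas as pd


def add_iso_keys(frame: pd.DataFrame) -> pd.DataFrame:
    """Add ISO year and ISO week columns derived from the `date` column."""
    iso = frame["date"].dt.isocalendar()
    return frame.assign(iso_year=iso["year"], iso_week=iso["week"])


def merge_sources(sales: pd.DataFrame, cpi: pd.DataFrame, gas: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: match CPI to sales by month, then join weekly gas prices."""
    # 1. Attach the monthly CPI value to each sales row by calendar month.
    sales = sales.assign(month=sales["date"].dt.to_period("M"))
    cpi = cpi.assign(month=cpi["date"].dt.to_period("M")).drop(columns="date")
    sales_cpi = sales.merge(cpi, on="month", how="left")

    # 2. Join weekly gas prices on (ISO year, ISO week) so week 1 of one
    #    year cannot match week 1 of another.
    gas = add_iso_keys(gas).drop(columns="date")
    merged = add_iso_keys(sales_cpi).merge(gas, on=["iso_year", "iso_week"], how="left")
    return merged.drop(columns="month")
```

Left joins keep every sales row, so a week with no gas price shows up as a null that the validation step can report.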
Validation:
- Ensures all required columns are present.
- Checks for nulls, duplicates, unexpected data types, and suspicious value spikes.
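One way to express these checks, assuming hypothetical column names and reading "suspicious spikes" as a change of more than ±50% between consecutive weekly rows. The function collects every problem it finds instead of stopping at the first, so an alert can list them all:

```python
import pandas as pd

# Hypothetical schema; the real column names come from the script.
REQUIRED_COLUMNS = ["date", "sales", "cpi", "gas_price"]
VALUE_COLUMNS = ["sales", "cpi", "gas_price"]
SPIKE_THRESHOLD = 0.5  # flag changes larger than ±50% between consecutive weeks


def validate(df: pd.DataFrame) -> list[str]:
    """Return validation error messages; an empty list means the data passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # The remaining checks rely on these columns, so report and stop here.
        return [f"Missing columns: {missing}"]

    errors = []
    null_counts = df[REQUIRED_COLUMNS].isna().sum()
    for column, count in null_counts[null_counts > 0].items():
        errors.append(f"{column}: {count} null value(s)")

    duplicates = int(df.duplicated().sum())
    if duplicates:
        errors.append(f"{duplicates} duplicate row(s)")

    if not pd.api.types.is_datetime64_any_dtype(df["date"]):
        errors.append("date: expected a datetime column")

    ordered = df.sort_values("date")
    for column in VALUE_COLUMNS:
        if not pd.api.types.is_numeric_dtype(df[column]):
            errors.append(f"{column}: expected a numeric column")
            continue
        values = ordered[column]
        # Relative change versus the previous week; the first row has no predecessor.
        change = (values / values.shift(1) - 1).abs()
        spikes = int((change > SPIKE_THRESHOLD).sum())
        if spikes:
            errors.append(f"{column}: {spikes} week-over-week change(s) above ±50%")
    return errors
```

The relative change is computed directly with `shift` rather than `pct_change`, which keeps missing values as NaN instead of forward-filling them before the comparison.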
Notification:
- If validation fails, an email is sent with all error messages.
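A sketch of the alert using only the standard library's `smtplib` and `email.message`, assuming Gmail's SMTP server on port 587 with an app-password login, and the `SENDER_EMAIL`, `RECEIVER_EMAIL`, and `EMAIL_PASSWORD` settings from `.env`. The helper names and subject wording are hypothetical:

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_alert(errors: list[str], sender: str, receiver: str) -> EmailMessage:
    """Hypothetical helper: one plain-text email listing every validation error."""
    message = EmailMessage()
    message["Subject"] = f"Data validation failed ({len(errors)} issue(s))"
    message["From"] = sender
    message["To"] = receiver
    message.set_content("\n".join(f"- {error}" for error in errors))
    return message


def send_alert(message: EmailMessage, password: str) -> None:
    """Hypothetical helper: send through Gmail's SMTP server on port 587 with STARTTLS."""
    context = ssl.create_default_context()  # verify the server's TLS certificate
    with smtplib.SMTP("smtp.gmail.com", 587, timeout=30) as server:
        server.starttls(context=context)
        # Log in as the sender; the password is assumed to be a Gmail app password.
        server.login(str(message["From"]), password)
        server.send_message(message)
```

Building the message separately from sending it keeps the formatting testable without network access.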
Output:
- The cleaned, validated dataset is saved as `data/final_data_1.csv`.
Assumptions:
- Sales data is simulated and assumes a fixed schema.
- Gas and CPI data are available and up-to-date from the FRED API.
- Merging on `week_num` alone is sufficient, even though ISO week numbers repeat every year and could in principle match rows from different years.
- The threshold for suspicious spikes is ±50% per week.
- Emails are sent via Gmail's SMTP server (`smtp.gmail.com:587`), authenticating with a Gmail app password (`EMAIL_PASSWORD`) for secure access.
The final merged and validated dataset is saved at `data/final_data_1.csv`.
For support or questions, feel free to raise an issue or email the maintainer.