FabricioPorcelli/Solar-Energy-Forecasting


Solar Power Generation Forecasting with Machine Learning

Overview

This project builds a production-style ML pipeline that forecasts hourly solar power generation (kW) from historical weather and temporal data. The goal is to support grid stability and renewable integration with accurate, explainable predictions.

Problem Statement

Accurate solar energy forecasting is essential for grid operators and energy planners to manage supply, demand, and storage. This project addresses the regression problem of predicting hourly solar power output based on past weather and time features.

Data

  • ~3000 hourly observations
  • Target: Power generated (kW), renamed internally as power_generated_kw
  • Features:
    • Temporal: Year, Month, Day, Day of Year, First Hour of Period
    • Solar: Distance to Solar Noon, Is Daylight
    • Weather: Temperature, Wind Speed, Wind Direction, Sky Cover, Humidity, Pressure, Visibility

Methodology

  • Data cleaning and preprocessing (src/preprocessing.py)
  • Feature engineering (cyclic encoding, solar geometry) (src/features.py)
  • TimeSeriesSplit cross-validation: folds are chronological, so each validation fold follows its training data in time (no random split)
  • Model comparison: Baseline, Linear Regression, Ridge, ElasticNet, Random Forest, ExtraTrees, HistGradientBoostingRegressor, XGBoost, LightGBM, CatBoost
  • Main metric: MAE (Mean Absolute Error)
  • Final model: ExtraTreesRegressor (scikit-learn)
  • Baseline: naive last-value forecast (each prediction is the previous observed value)
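
The cyclic encoding and time-ordered validation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/features.py or the training script: the function names `add_cyclic_features` and `time_series_cv_mae`, the column names, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit


def add_cyclic_features(df: pd.DataFrame, column: str, period: int) -> pd.DataFrame:
    """Return a copy of df with sine/cosine encodings of a periodic column.

    Hypothetical helper: mapping the value onto a circle keeps, for example,
    hour 23 close to hour 0 instead of 23 units apart.
    """
    out = df.copy()
    angle = 2 * np.pi * out[column] / period
    out[f"{column}_sin"] = np.sin(angle)
    out[f"{column}_cos"] = np.cos(angle)
    return out


def time_series_cv_mae(X: pd.DataFrame, y: pd.Series, n_splits: int = 5, seed: int = 0):
    """Mean MAE over chronological folds for ExtraTrees and a last-value baseline.

    Rows are assumed to be sorted by time. Hyperparameters are placeholders.
    """
    model_scores, naive_scores = [], []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        # Each training fold ends before its test fold starts.
        model = ExtraTreesRegressor(n_estimators=100, random_state=seed)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = model.predict(X.iloc[test_idx])
        model_scores.append(mean_absolute_error(y.iloc[test_idx], preds))

        # Naive baseline: predict each value with the observation just before it.
        naive_preds = y.shift(1).iloc[test_idx]
        naive_scores.append(mean_absolute_error(y.iloc[test_idx], naive_preds))
    return float(np.mean(model_scores)), float(np.mean(naive_scores))
```

Calling `add_cyclic_features(df, "hour", 24)` on a frame with an assumed `hour` column adds `hour_sin` and `hour_cos`. `time_series_cv_mae(X, y)` returns the two average MAE values so the model can be compared against the baseline on the same folds.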

Results

  • Best Model: ExtraTreesRegressor
  • MAE: ~1900 kW
  • Outperforms naive baseline and linear models
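
For reference, the metric and the naive baseline can be written out as below. The symbols are assumptions introduced here: y_t is the observed power at hour t, ŷ_t is the prediction, and n is the number of evaluated hours. MAE is expressed in the same unit as the target (kW).

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,
\qquad
\hat{y}_t^{\text{naive}} = y_{t-1}
```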

Deployment

  • Interactive Streamlit app for real-time prediction and visualization
  • Trained model saved with joblib; feature importances exported as CSV
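
Persisting the model and its feature importances might look like the sketch below. The helper name `save_model_artifacts` and the file names `model.joblib` and `feature_importances.csv` are hypothetical. The actual paths used under models/ are not stated here.

```python
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor


def save_model_artifacts(model, feature_names, out_dir):
    """Save a fitted tree ensemble with joblib and its importances as CSV.

    Hypothetical helper; file names are placeholders chosen for this example.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / "model.joblib"
    importances_path = out_dir / "feature_importances.csv"

    # Serialize the fitted estimator so an app can reload it without retraining.
    joblib.dump(model, model_path)

    # One row per feature, most important first.
    importances = pd.DataFrame(
        {"feature": list(feature_names), "importance": model.feature_importances_}
    ).sort_values("importance", ascending=False)
    importances.to_csv(importances_path, index=False)
    return model_path, importances_path
```

An application can then restore the estimator with `joblib.load(model_path)` and read the CSV with `pd.read_csv(importances_path)` for plotting.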

Tech Stack

  • Python 3.11
  • pandas, numpy
  • scikit-learn
  • matplotlib, seaborn
  • streamlit
  • joblib

Project Structure

  • src/: Pipeline scripts (preprocessing, features, training, evaluation)
  • data/: Raw and processed data
  • models/: Trained models and feature importances
  • app/: Streamlit application

For details, see the code and documentation in each module.

About

This repository contains a production-oriented machine learning pipeline for forecasting solar power generation using historical weather and temporal data. The project includes data preprocessing, feature engineering, model training with time series validation, baseline comparisons, and a Streamlit web application for user-friendly predictions.
