A flexible and extensible PyTorch-based framework for multi-modal regression tasks. This project combines time-series and tabular data to predict a continuous target value, leveraging various deep learning architectures like LSTMs, 1D-CNNs, and Transformers for time-series feature extraction.
## Features

- Multi-Modal Architecture: Fuses features from both time-series and tabular data for more accurate predictions (see the sketch after this list).
- Pluggable Encoders: Easily switch between time-series encoders (`LSTM`, `1D-CNN`, `Transformer`) via a configuration file.
- Configuration-Driven: All experiment parameters, including model architecture and training settings, are managed through a single `config.yaml` file.
- Automated Experiment Tracking: Automatically saves model checkpoints, training history, and the exact configuration for each run, ensuring reproducibility.
- Modular Codebase: A clean, organized structure that separates data processing, model definition, and training logic.
- End-to-End Scripts: Ready-to-use scripts for `train`, `test`, and `inference`.
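The fusion idea behind the first bullet can be made concrete with a minimal sketch. The class name, encoder choice, and tensor shapes below are illustrative assumptions; the actual model is defined in `src/model/regression_dl.py`:

```python
import torch
import torch.nn as nn

class MultiModalRegressor(nn.Module):
    """Sketch of the fusion idea: encode the time series, concatenate the
    embedding with the tabular features, and regress with an MLP head."""

    def __init__(self, ts_features: int, tab_features: int, hidden: int = 64):
        super().__init__()
        # An LSTM encoder as one example; the project selects
        # lstm / cnn / transformer via config.yaml.
        self.encoder = nn.LSTM(ts_features, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + tab_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single continuous target
        )

    def forward(self, x_ts: torch.Tensor, x_tab: torch.Tensor) -> torch.Tensor:
        # x_ts: (batch, seq_len, ts_features); x_tab: (batch, tab_features)
        _, (h_n, _) = self.encoder(x_ts)         # h_n: (1, batch, hidden)
        fused = torch.cat([h_n[-1], x_tab], dim=1)
        return self.head(fused).squeeze(-1)      # (batch,)

model = MultiModalRegressor(ts_features=8, tab_features=5)
y_hat = model(torch.randn(4, 30, 8), torch.randn(4, 5))  # shape: (4,)
```

The same pattern holds for the other encoders: each maps the time-series input to a fixed-size embedding before fusion with the tabular features.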
## Project Structure

```
.
├── data/                     # (Optional) Directory for your datasets
├── experiments/              # Directory to save all experiment results
├── src/
│   ├── config/
│   │   └── config.py         # Configuration dataclasses
│   ├── data/
│   │   ├── dataset.py        # PyTorch Dataset class
│   │   └── preprocessing.py  # Data loading and preprocessing logic
│   ├── model/
│   │   ├── regression_dl.py  # Multi-modal model definition
│   │   └── trainer.py        # Training and evaluation logic
│   └── utils.py              # Utility functions (e.g., loading experiments)
├── config.yaml               # Main configuration file for experiments
├── inference.py              # Script to run inference on a trained model
├── test.py                   # Script to evaluate a trained model
├── train.py                  # Main script to start model training
├── README.md                 # This file
└── requirements.txt          # Project dependencies
```
## Installation

- Clone the repository:

  ```bash
  git clone <your-repository-url>
  cd <repository-name>
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Data Preparation

- Place your dataset (in `.pkl` format) into a directory (e.g., `data/`).
- Update `path_dataset` in `config.yaml` to point to your file.
- Crucially, you must customize `src/data/preprocessing.py` to correctly load your DataFrame, perform any necessary cleaning or feature engineering, and separate the data into time-series features, tabular features, and the target variable (see the sketch after this list).
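As a starting point, here is a sketch of the kind of function `preprocessing.py` is expected to provide. The function name, column names, and return convention are illustrative assumptions, not the project's actual API:

```python
import numpy as np
import pandas as pd

def load_and_split(path: str):
    """Illustrative sketch: load a pickled DataFrame and split it into
    time-series features, tabular features, and the regression target.
    All column names below are placeholders for your own schema."""
    df = pd.read_pickle(path)

    # Example cleaning step: drop rows with a missing target.
    df = df.dropna(subset=["target"])

    # Suppose each time-series cell holds a fixed-length array of shape
    # (seq_len,), while tabular columns are plain scalars.
    ts_cols = ["sensor_a", "sensor_b"]
    tab_cols = ["age", "category_encoded"]

    # Stack into (n_samples, seq_len, n_ts_features) and (n_samples, n_tab_features).
    x_ts = np.stack([np.stack(df[c].to_numpy()) for c in ts_cols], axis=-1)
    x_tab = df[tab_cols].to_numpy(dtype=np.float32)
    y = df["target"].to_numpy(dtype=np.float32)
    return x_ts.astype(np.float32), x_tab, y
```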
## Configuration

Modify `config.yaml` to define your experiment. Key parameters include:

- `model.model_type`: Choose between `lstm`, `cnn`, or `transformer`.
- `model.*`: Adjust model-specific hyperparameters.
- `train.num_epochs`, `train.batch_size`, `train.learning_rate`: Core training settings.
- `data.path_dataset`: Path to your data file.
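For reference, one plausible shape for the dataclasses in `src/config/config.py` that these keys map onto. Field names beyond those listed above are assumptions; check the actual file:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    model_type: str = "lstm"   # "lstm", "cnn", or "transformer"
    hidden_size: int = 64      # example model-specific hyperparameter

@dataclass
class TrainConfig:
    num_epochs: int = 50
    batch_size: int = 32
    learning_rate: float = 1e-3

@dataclass
class DataConfig:
    path_dataset: str = "data/dataset.pkl"

@dataclass
class Config:
    model: ModelConfig = field(default_factory=ModelConfig)
    train: TrainConfig = field(default_factory=TrainConfig)
    data: DataConfig = field(default_factory=DataConfig)
```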
## Training

Run the training script from the root directory:

```bash
python train.py
```

This will create a new directory in `experiments/` (e.g., `experiments/20250908_160152_lstm/`) containing:

- `best_model.pth`: The model checkpoint with the lowest validation loss.
- `last_model.pth`: The model checkpoint from the final epoch.
- `training_history.json`: A log of training and validation losses.
- `config.yaml`: A copy of the configuration used for this run.
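To inspect a run afterwards, the history file can be read with the standard library. The exact JSON keys are an assumption here; adjust them to whatever your trainer writes:

```python
import json
from pathlib import Path

run_dir = Path("experiments") / "20250908_160152_lstm"  # example run from above
history = json.loads((run_dir / "training_history.json").read_text())

# Assuming the trainer logs per-epoch losses under these keys.
for epoch, (tr, va) in enumerate(zip(history["train_loss"], history["val_loss"]), 1):
    print(f"epoch {epoch:3d}  train={tr:.4f}  val={va:.4f}")
```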
## Evaluation

To evaluate the best model on the held-out test set, run the `test.py` script with the path to your experiment directory:

```bash
python test.py experiments/<your_experiment_name>
```

This will print the final Mean Squared Error (MSE) on the test data.
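Conceptually, the evaluation in `test.py` amounts to a loop like the following. This is a sketch, assuming a model and a test `DataLoader` that yield `(x_ts, x_tab, y)` batches:

```python
import torch

@torch.no_grad()
def evaluate_mse(model, test_loader, device="cpu"):
    """Accumulate squared error over the test set and return the mean."""
    model.eval()
    total_sq_err, n = 0.0, 0
    for x_ts, x_tab, y in test_loader:
        x_ts, x_tab, y = x_ts.to(device), x_tab.to(device), y.to(device)
        pred = model(x_ts, x_tab)
        total_sq_err += torch.sum((pred - y) ** 2).item()
        n += y.numel()
    return total_sq_err / n
```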
## Inference

To make a prediction on a single (dummy) data point, use the `inference.py` script:

```bash
python inference.py experiments/<your_experiment_name>
```
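Under the hood, single-sample inference reduces to building tensors with a batch dimension of one and calling the model. The sketch below reuses the hypothetical `MultiModalRegressor` from the Features section; the real script would rebuild the model from the run's saved `config.yaml`, and the shapes are assumptions:

```python
import torch

# Rebuild the model and load the best checkpoint from the experiment directory.
model = MultiModalRegressor(ts_features=8, tab_features=5)
state = torch.load("experiments/<your_experiment_name>/best_model.pth",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

# One dummy sample: batch of 1, 30 time steps, 8 time-series + 5 tabular features.
x_ts, x_tab = torch.randn(1, 30, 8), torch.randn(1, 5)
with torch.no_grad():
    print(model(x_ts, x_tab).item())
```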
## Customization

- Adding a New Model: To add a new time-series encoder, add the logic to `src/model/regression_dl.py` and update the `model_type` options in `config.yaml` (see the sketch after this list).
- Data Preprocessing: The core of adapting this framework to your own data is implementing your custom logic in `src/data/preprocessing.py`.
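For example, a GRU encoder could be added along these lines. This is a sketch; the actual class names and wiring inside `regression_dl.py` will differ:

```python
import torch
import torch.nn as nn

class GRUEncoder(nn.Module):
    """Example of a new pluggable time-series encoder: maps
    (batch, seq_len, n_features) to a (batch, hidden) embedding."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)

    def forward(self, x_ts: torch.Tensor) -> torch.Tensor:
        _, h_n = self.gru(x_ts)   # h_n: (num_layers, batch, hidden)
        return h_n[-1]            # last layer's final hidden state

# The model-building code would then map a new `model_type: "gru"` value in
# config.yaml to this class, alongside the existing lstm/cnn/transformer options.
```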
## License

This project is licensed under the MIT License.