This repository contains a framework for training, evaluating, and fine-tuning various language models using Hydra for configuration management and Hugging Face's Transformers library for model handling. The framework supports a wide range of models, including GPT-NeoX, Mamba, and a bidirectional Mamba.
This project builds on existing open-source work and includes the following libraries as Git submodules:
- Mamba (Apache 2.0 License)
- Causal Conv1D (MIT License)
The repository is organized as follows:

```
.
├── README.md
├── conf
│   ├── config.yaml              # Main configuration file
│   ├── dataset/                 # Dataset-specific configurations
│   ├── model/                   # Model-specific configurations
│   └── run/                     # Run-specific configurations
├── main.py                      # Entry point for running tasks (train, evaluate, preprocess)
├── models
│   ├── bidirectional_mamba.py   # Model definition for Bidirectional Mamba
│   ├── libs/                    # Supporting libraries
│   └── model_utils.py           # Utilities for loading models and tokenizers
└── scripts
    ├── evaluate.py              # Script for evaluating models
    ├── preprocess.py            # Script for preprocessing datasets
    ├── preprocess_flipped.py    # Script for preprocessing flipped datasets
    ├── preprocess_original.py   # Script for preprocessing original datasets
    └── train.py                 # Script for training models
```
Install the required dependencies:

```bash
pip install --upgrade transformers datasets accelerate python-dotenv wandb evaluate torchprofile gputil hydra-core causal-conv1d mamba-ssm
pip3 install torch torchvision torchaudio
```

All configurations are managed with Hydra. The primary configuration file is conf/config.yaml, which sets the defaults and top-level options. You can override any configuration parameter from the command line.
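The internals of main.py are not shown in this README; as a rough orientation, a minimal Hydra entry point could look like the sketch below. The `task` field and the script-level function names are assumptions inferred from the commands shown later, not the project's confirmed API.

```python
# main.py (sketch) -- a minimal Hydra entry point; the real script may differ.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra merges conf/config.yaml with any command-line overrides,
    # e.g. `python main.py task=train run.batch_size=64`.
    print(OmegaConf.to_yaml(cfg))

    if cfg.task == "train":
        from scripts import train
        train.main(cfg)          # hypothetical dispatch
    elif cfg.task == "evaluate":
        from scripts import evaluate
        evaluate.main(cfg)       # hypothetical dispatch
    elif cfg.task == "preprocess":
        from scripts import preprocess
        preprocess.main(cfg)     # hypothetical dispatch


if __name__ == "__main__":
    main()
```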
The scripts in this project use Weights & Biases (Wandb) for experiment tracking and logging. They expect a .env file in the project root directory containing the following environment variables:

```
WANDB_USERNAME=<your_wandb_username>
WANDB_PROJECT=<your_wandb_project_name>
```
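As a minimal sketch of how these variables might be picked up at startup, using python-dotenv and the wandb client from the dependency list above (the scripts' actual initialization may differ):

```python
# Sketch: read the Wandb settings from .env and start a tracked run.
import os

import wandb
from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into os.environ

run = wandb.init(
    entity=os.environ["WANDB_USERNAME"],
    project=os.environ["WANDB_PROJECT"],
)
```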
You can train models with various configurations directly from the command line. For example:

```bash
python main.py model=gpt-neox run=basic-training dataset=squad task=train group="test" tags=[gpt-neox] run.batch_size=64
```

The datasets used in this project were obtained with the Hugging Face datasets library; specifically, the SQuAD v2 dataset was used.
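To inspect the raw data outside the training pipeline, it can be fetched directly with the datasets library. This is a standalone sketch, not part of the project's scripts:

```python
# Sketch: fetch SQuAD v2 with the Hugging Face `datasets` library.
from datasets import load_dataset

squad = load_dataset("squad_v2")       # splits: train, validation
print(squad["train"][0]["question"])   # inspect one example
```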
To preprocess the dataset for training, run:

```bash
python main.py -m dataset=squad_v2 task=preprocess
```

This applies the preprocessing steps defined in scripts/preprocess.py and in the configuration files under conf/.
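The actual preprocessing steps are whatever scripts/preprocess.py and the conf/ files define. Purely as an illustration, a typical tokenization pass over SQuAD v2 might look like the following; the tokenizer choice, sequence length, and column handling here are assumptions:

```python
# Illustrative sketch only -- the real steps live in scripts/preprocess.py.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")  # assumed model
squad = load_dataset("squad_v2")

def tokenize(batch):
    # Join question and context into a single sequence per example.
    return tokenizer(
        batch["question"],
        batch["context"],
        truncation=True,
        max_length=512,
    )

tokenized = squad.map(
    tokenize,
    batched=True,
    remove_columns=squad["train"].column_names,
)
```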
To evaluate a trained model, use:

```bash
python main.py \
    -m model=gpt-neox \
    run=basic-evaluation \
    dataset=squad_v2 \
    task=evaluate \
    group="evaluation" \
    tags=[evaluate_best] \
    run.eval_dataset=test \
    model.checkpoint=./models/checkpoints/best_model_gpt-neox/
```

The framework supports the following models:

- GPT-NeoX
- Mamba (multiple configurations)
- Bidirectional Mamba
- Pythia (multiple configurations)
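For reference, a checkpoint directory like the one passed via model.checkpoint above can be reloaded outside the framework with plain Transformers, assuming it was saved with save_pretrained. Within the framework itself, model loading goes through models/model_utils.py:

```python
# Sketch: reload a trained checkpoint with Transformers, assuming it was
# saved with `save_pretrained` (the framework loads models through
# models/model_utils.py instead).
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./models/checkpoints/best_model_gpt-neox/"
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model.eval()  # inference mode for evaluation
```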