Question Answering with Mamba

Overview

This repository contains a framework for training, evaluating, and fine-tuning language models, using Hydra for configuration management and Hugging Face's Transformers library for model handling. The framework supports a range of models, including GPT-NeoX, Pythia, Mamba, and a bidirectional Mamba variant.

Acknowledgments

This project builds upon the following codebases:

This project also includes the following libraries as submodules:

Project Structure

.
├── README.md
├── conf
│   ├── config.yaml          # Main configuration file
│   ├── dataset/             # Dataset-specific configurations
│   ├── model/               # Model-specific configurations
│   └── run/                 # Run-specific configurations
├── main.py                  # Entry point for running tasks (train, evaluate, preprocess)
├── models
│   ├── bidirectional_mamba.py  # Model definition for Bidirectional Mamba
│   ├── libs/                   # Supporting libraries
│   └── model_utils.py          # Utilities for loading models and tokenizers
└── scripts
    ├── evaluate.py            # Script for evaluating models
    ├── preprocess.py          # Script for preprocessing datasets
    ├── preprocess_flipped.py  # Script for preprocessing flipped datasets
    ├── preprocess_original.py # Script for preprocessing original datasets
    └── train.py               # Script for training models

Getting Started

Installation

Install PyTorch first; causal-conv1d and mamba-ssm compile CUDA kernels and need torch already present at build time (an NVIDIA GPU with a working CUDA toolchain is required):

pip3 install torch torchvision torchaudio

Then install the remaining dependencies:

pip install --upgrade transformers datasets accelerate python-dotenv wandb evaluate torchprofile gputil hydra-core causal-conv1d mamba-ssm

Configuration

All configurations are managed with Hydra. The primary configuration file is conf/config.yaml, which defines the defaults and top-level options; the dataset/, model/, and run/ subdirectories hold the per-component configs. You can override any configuration parameter from the command line.
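For orientation, main.py is essentially a Hydra entry point along these lines (a minimal sketch, not the repository's exact code; the cfg.task dispatch is inferred from the commands shown below):

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra merges conf/config.yaml with any command-line overrides
    # (e.g. model=gpt-neox run.batch_size=64) before this function runs.
    print(OmegaConf.to_yaml(cfg))
    if cfg.task == "train":
        ...  # dispatch to scripts/train.py, and likewise for evaluate/preprocess

if __name__ == "__main__":
    main()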

Wandb Integration

The scripts in this project use Weights & Biases (wandb) for experiment tracking and logging. They expect a .env file in the project root directory containing the following environment variables:

WANDB_USERNAME=<your_wandb_username>
WANDB_PROJECT=<your_wandb_project_name>
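A script can then pick these up with python-dotenv and hand them to wandb; a minimal sketch (how the variables are consumed here is an assumption, and the repository's scripts may wire this differently):

import os
from dotenv import load_dotenv
import wandb

load_dotenv()  # reads WANDB_USERNAME and WANDB_PROJECT from .env in the project root
wandb.init(
    entity=os.environ["WANDB_USERNAME"],
    project=os.environ["WANDB_PROJECT"],
)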

Running the Project

Training a Model

You can train models with various configurations directly from the command line.

python main.py model=gpt-neox run=basic-training dataset=squad task=train group="test" tags=[gpt-neox] run.batch_size=64
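Under the hood, the train task fine-tunes a causal language model on the selected dataset. As a rough, self-contained sketch of what that can look like with Hugging Face's Trainer (the checkpoint, prompt format, and hyperparameters are illustrative assumptions; the real logic lives in scripts/train.py):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

dataset = load_dataset("squad_v2", split="train[:1%]")  # small slice for the sketch

def tokenize(batch):
    # Hypothetical prompt format: question followed by the gold answer.
    texts = [f"question: {q} answer: {a['text'][0] if a['text'] else 'unanswerable'}"
             for q, a in zip(batch["question"], batch["answers"])]
    return tokenizer(texts, truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=64)  # run.batch_size=64
trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()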

Data Sources and Processing

The data comes from Hugging Face's datasets library; specifically, the SQuAD v2 dataset.
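For reference, fetching it takes one call:

from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")  # DatasetDict with "train" and "validation" splits
print(squad_v2["train"][0])          # fields: id, title, context, question, answers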

Data Processing

To preprocess the dataset for training, use the following command:

python main.py -m dataset=squad_v2 task=preprocess

This applies the preprocessing steps defined in scripts/preprocess.py and the configuration files in the conf/ directory.
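As an illustration of the kind of transformation involved, here is a hypothetical prompt-formatting pass over SQuAD v2 (the actual template and steps are defined in scripts/preprocess.py and may differ):

from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")

def format_example(example):
    # SQuAD v2 marks unanswerable questions with an empty answers list.
    texts = example["answers"]["text"]
    answer = texts[0] if texts else "unanswerable"
    example["text"] = (f"context: {example['context']}\n"
                       f"question: {example['question']}\n"
                       f"answer: {answer}")
    return example

processed = squad_v2.map(format_example)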

Evaluating a Model

To evaluate a trained model, use:

python main.py \
    -m model=gpt-neox \
    run=basic-evaluation \
    dataset=squad_v2 \
    task=evaluate \
    group="evaluation" \
    tags=[evaluate_best] \
    run.eval_dataset=test \
    model.checkpoint=./models/checkpoints/best_model_gpt-neox/
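SQuAD v2 predictions are conventionally scored with exact-match and F1 via the evaluate library's squad_v2 metric; a minimal sketch of the expected input formats (whether scripts/evaluate.py calls this metric directly is an assumption):

import evaluate

metric = evaluate.load("squad_v2")
predictions = [{"id": "qid-0", "prediction_text": "Normans",
                "no_answer_probability": 0.0}]
references = [{"id": "qid-0",
               "answers": {"text": ["Normans"], "answer_start": [4]}}]
print(metric.compute(predictions=predictions, references=references))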

Additional Features

Supported Models

  • GPT-NeoX
  • Mamba (multiple configurations)
  • Bidirectional Mamba
  • Pythia (multiple configurations)
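As a sketch of how these backbones are typically instantiated (the checkpoints below are public Hugging Face models chosen for illustration; the repository's own loading logic lives in models/model_utils.py and models/bidirectional_mamba.py):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# GPT-NeoX-family model (Pythia uses the GPT-NeoX architecture).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
gpt_neox = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

# Mamba via the mamba-ssm package (requires an NVIDIA GPU).
mamba = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m",
                                         device="cuda", dtype=torch.float16)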
