
Add a GPT-2 training example #19

@bwdGitHub

Description

We would like to use these issues to gauge user interest.

The GPT-2 implementation can be used for further language model training, but there is currently no example demonstrating this, either in the repo or elsewhere.

Making this practical on a typical consumer GPU will likely require some technique for reducing the GPU memory needed during training. There are a number of options (rough sketches of a few of them follow the list):

  1. Add support for a smaller GPT-2 model.
  2. Only train a subset of the GPT-2 parameters.
  3. Use gradient accumulation.
  4. Use gradient checkpointing.
  5. Use reduced precision gradients.
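
To make options 2 and 3 concrete, here is a rough sketch of a training loop that freezes most of the network and accumulates gradients over several micro-batches. It is written against the Hugging Face PyTorch GPT-2 model purely for illustration (it does not assume this repo's implementation), and the corpus, learning rate, and number of unfrozen blocks are placeholder choices.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Option 2: freeze everything, then unfreeze only the last two transformer
# blocks and the final layer norm, so far fewer gradients need to be stored.
for param in model.parameters():
    param.requires_grad = False
for block in model.transformer.h[-2:]:
    for param in block.parameters():
        param.requires_grad = True
for param in model.transformer.ln_f.parameters():
    param.requires_grad = True

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=5e-5
)

# Option 3: gradient accumulation -- take an optimizer step only every
# `accum_steps` micro-batches, so each micro-batch can stay small.
accum_steps = 8
texts = ["placeholder training sentence"] * 64  # stand-in for a real corpus

model.train()
optimizer.zero_grad()
for step, text in enumerate(texts):
    batch = tokenizer(text, return_tensors="pt").to(device)
    loss = model(**batch, labels=batch["input_ids"]).loss / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```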

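Options 4 and 5 can be layered on top of the same loop. The sketch below shows what the inner loop might look like with gradient checkpointing (recomputing activations during the backward pass) and automatic mixed precision; it reuses `model`, `tokenizer`, `optimizer`, `device`, `accum_steps`, and `texts` from the sketch above, and again targets the Hugging Face implementation only as an illustration.

```python
import torch

# Option 4: gradient checkpointing -- activations are recomputed during the
# backward pass instead of being kept in memory for every layer.
model.gradient_checkpointing_enable()

# Option 5: reduced precision -- run the forward/backward pass in float16,
# with a GradScaler to avoid underflow in the accumulated gradients.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

model.train()
optimizer.zero_grad()
for step, text in enumerate(texts):
    batch = tokenizer(text, return_tensors="pt").to(device)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = model(**batch, labels=batch["input_ids"]).loss / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```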