Open
Labels: enhancement (New feature or request)
Description
We would like to use these issues to gauge user interest.
It is possible to use the GPT-2 implementation for further language-model training, but there is currently no example demonstrating this in the repo or elsewhere.
Making this practical on a typical consumer GPU will likely require techniques that reduce the GPU memory needed for training. There are several options:
- Add support for a smaller GPT-2 model.
- Train only a subset of the GPT-2 parameters.
- Use gradient accumulation.
- Use gradient checkpointing.
- Use reduced-precision gradients.
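Training only a subset of the parameters mainly saves optimizer state: Adam keeps two extra tensors per trainable weight, so freezing most of the model shrinks that footprint proportionally. A minimal sketch, assuming a PyTorch-style training loop (the four-layer `nn.Sequential` below is a hypothetical stand-in for the repo's GPT-2 module):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the GPT-2 model: a small stack of linear
# "blocks". The freezing pattern is the same for a real transformer.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])

# Freeze everything, then unfreeze only the last block, so gradients and
# optimizer state (momentum, variance) exist for a fraction of the weights.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[-1].parameters():
    p.requires_grad_(True)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()   # dummy loss for illustration
loss.backward()                 # frozen layers receive no gradients
optimizer.step()

print(sum(p.numel() for p in trainable), "of",
      sum(p.numel() for p in model.parameters()), "parameters trained")
# → 272 of 1088 parameters trained
```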
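Gradient accumulation splits one large batch into micro-batches whose gradients are summed before a single optimizer step, so peak activation memory scales with the micro-batch size. A sketch under the same PyTorch assumption (the `nn.Linear` model and batch sizes are placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)            # placeholder for the language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
initial = model.weight.detach().clone()

accum_steps = 4                     # effective batch = 4 micro-batches
micro_batches = [torch.randn(2, 16) for _ in range(8)]

optimizer.zero_grad()
for step, batch in enumerate(micro_batches):
    loss = model(batch).pow(2).mean()   # dummy loss for illustration
    # Scale so the summed gradients match one large-batch gradient.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:   # step once per accumulation window
        optimizer.step()
        optimizer.zero_grad()
```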
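Gradient checkpointing drops intermediate activations during the forward pass and recomputes them during backward, trading compute for memory. If the implementation is PyTorch-based, `torch.utils.checkpoint` provides this directly (the `nn.ModuleList` of linear layers below is a placeholder for the transformer blocks):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
# Placeholder blocks; GPT-2's transformer blocks would slot in the same way.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

def forward(x):
    for block in blocks:
        # Activations inside `block` are freed after the forward pass and
        # recomputed during backward, trading compute for memory.
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(8, 16)
loss = forward(x).pow(2).mean()   # dummy loss for illustration
loss.backward()   # re-runs each block's forward, then its backward
```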
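Reduced precision roughly halves activation memory by running the forward pass in float16 or bfloat16. A sketch using PyTorch's automatic mixed precision, again assuming a PyTorch model (the `nn.Linear` is a placeholder; float16 gradients need loss scaling to avoid underflow, bfloat16 does not):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 on CUDA needs loss scaling; bfloat16 (used on CPU here) does not.
dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Linear(16, 1).to(device)     # placeholder for the language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
with torch.autocast(device_type=device, dtype=dtype):
    loss = model(x).pow(2).mean()   # forward runs in reduced precision

scaler.scale(loss).backward()   # scale loss up to avoid float16 underflow
scaler.step(optimizer)          # unscales gradients, then steps
scaler.update()
```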
Shipley1105