What would you like to be added?
This issue tracks caching of tokenized datasets to accelerate data loading for fine-tuning use cases. Caching the tokenized data avoids re-tokenizing the dataset on every GPU node and significantly improves training speed, especially for hyperparameter optimization once support for a common initializer is available.
The goal is to offload the tokenization step to data-cache CPU nodes, freeing GPU nodes to focus exclusively on training.
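As a rough illustration of the pattern (not the proposed Trainer API), a data-cache CPU node could tokenize the dataset once and persist it to shared storage, and each GPU node could then load the cached result directly. The sketch below uses Hugging Face `datasets`; the cache path, model name, and dataset are hypothetical examples:

```python
# Sketch only: tokenize once on a CPU data-cache node, then let GPU nodes
# load the cached tensors. Paths, model, and dataset are hypothetical.
from datasets import load_dataset, load_from_disk
from transformers import AutoTokenizer

CACHE_DIR = "/shared/cache/tokenized-alpaca"  # hypothetical shared volume


def build_cache():
    """Runs on the data-cache CPU node: tokenize and persist to shared storage."""
    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
    dataset = load_dataset("tatsu-lab/alpaca", split="train")  # example dataset
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=dataset.column_names,
    )
    tokenized.save_to_disk(CACHE_DIR)


def load_cache():
    """Runs on each GPU node: reuse the cached tensors, skipping tokenization."""
    return load_from_disk(CACHE_DIR)
```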
Why is this needed?
Caching tokenized tensors will boost GPU utilization by enabling their reuse across training nodes.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.