What would you like to be added?
This issue tracks caching of tokenized datasets to accelerate data loading for fine-tuning use cases. Caching the tokenized data avoids re-tokenizing the dataset on every GPU node and significantly improves training speed, especially for hyperparameter optimization once support for a common initializer is available.
The goal is to offload the tokenization step to data-cache CPU nodes, freeing GPU nodes to focus exclusively on training.
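As a rough illustration of the pattern (not the proposed Trainer API), a data-cache CPU node could tokenize the dataset once and persist it to shared storage, and each GPU node could then load the cached result directly. The sketch below uses Hugging Face `datasets`; the cache path, model name, and dataset are hypothetical examples:

```python
# Sketch only: tokenize once on a CPU data-cache node, then let GPU nodes
# load the cached tensors. Paths, model, and dataset are hypothetical.
from datasets import load_dataset, load_from_disk
from transformers import AutoTokenizer

CACHE_DIR = "/shared/cache/tokenized-alpaca"  # hypothetical shared volume


def build_cache():
    """Runs on the data-cache CPU node: tokenize and persist to shared storage."""
    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example model
    dataset = load_dataset("tatsu-lab/alpaca", split="train")  # example dataset
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=dataset.column_names,
    )
    tokenized.save_to_disk(CACHE_DIR)


def load_cache():
    """Runs on each GPU node: reuse the cached tensors, skipping tokenization."""
    return load_from_disk(CACHE_DIR)
```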
Why is this needed?
Caching tokenized tensors will boost GPU utilization by enabling their reuse across training nodes.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.