# always_save_context=False, # Optional, defaults to False
# write_thread_count=1, # Optional, defaults to 1
# initial_write_buffer_size_bytes=DESIRED_NUM_BYTES, # Optional, defaults to 16 GB
+# use_optimized_save=True, # Optional, defaults to True. Uses the optimized save method to reduce write time.
+# use_cached_ckpt_structure=True, # Optional, defaults to False. Caches the checkpoint structure after identifying 2 consecutive save plan structures that are equal.
where `base_container` is the base `CheckpointContainerId` path used for all checkpoints in the current job, e.g. `"/tmp/mlf-checkpoints/job123"`.
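To make the defaults above concrete, here is a minimal, self-contained sketch that models the save options as a plain Python dataclass. The class name `SaveOptions` is an illustration only, not the real ml-flashpoint API; the field names and defaults mirror the commented parameters above.

```python
from dataclasses import dataclass

# Hypothetical container for the save options listed above; the actual
# ml-flashpoint constructor may expose these parameters differently.
@dataclass
class SaveOptions:
    always_save_context: bool = False                     # defaults to False
    write_thread_count: int = 1                           # defaults to 1
    initial_write_buffer_size_bytes: int = 16 * 1024**3   # defaults to 16 GB
    use_optimized_save: bool = True                       # optimized save reduces write time
    use_cached_ckpt_structure: bool = False               # cache structure after 2 equal save plans

# Override only the options you need; the rest keep their defaults.
opts = SaveOptions(write_thread_count=4)
print(opts.write_thread_count)  # 4
```

Grouping the options this way makes it easy to see which knobs are tuned away from their defaults in a given job configuration.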
@@ -229,7 +232,7 @@ Code: See the [`ml_flashpoint.adapter.pytorch`](https://github.com/google/ml-fla
To use directly with PyTorch DCP, use the provided `StorageWriter` and `StorageReader` implementations.
You can use whatever `Planner` implementations suit your use case, or fall back to the defaults.
-
+If your per-rank checkpoint data exceeds the default buffer size (16 GB as of this writing), you can increase it using the optional `initial_buffer_size_bytes` parameter.
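A sketch of wiring the adapter into PyTorch Distributed Checkpoint (DCP) might look like the following. The class names `FlashpointStorageWriter` and `FlashpointStorageReader`, the import path, and the constructor arguments are assumptions for illustration; consult `ml_flashpoint.adapter.pytorch` for the actual API. `model`, `optimizer`, and `base_container` are presumed to already exist.

```python
import torch.distributed.checkpoint as dcp

# Hypothetical names; the real adapter classes may differ.
from ml_flashpoint.adapter.pytorch import (
    FlashpointStorageWriter,
    FlashpointStorageReader,
)

state_dict = {"model": model.state_dict(), "optim": optimizer.state_dict()}

# Save: DCP uses its default SavePlanner unless one is passed explicitly.
dcp.save(
    state_dict,
    storage_writer=FlashpointStorageWriter(
        base_container,                          # e.g. "/tmp/mlf-checkpoints/job123"
        initial_buffer_size_bytes=32 * 1024**3,  # raise if per-rank data exceeds 16 GB
    ),
)

# Load back into the same (pre-allocated) state_dict in place.
dcp.load(
    state_dict,
    storage_reader=FlashpointStorageReader(base_container),
)
```

Because DCP accepts any `StorageWriter`/`StorageReader` pair, swapping the backend in and out requires no changes to the surrounding training loop.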