[Auto-Parallel] shard_dataloader fix #75629
Open
PR Category
Auto Parallel
PR Types
Others
Description
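Problem 1: shard_dataloader does not handle non-tensor data types.
Solution 1: Make shard_dataloader more compatible. When an input item is not a tensor, keep it in its original format without conversion and store it directly in dist_batch_data for model training.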
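
The pass-through behavior described in Solution 1 can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `FakeTensor`, `FakeDistTensor`, `to_dist_batch`, and the `placement` argument are hypothetical stand-ins, and only the name `dist_batch_data` comes from the description.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor."""
    values: list


@dataclass
class FakeDistTensor:
    """Hypothetical stand-in for a distributed (sharded) tensor."""
    values: list
    placement: str


def to_dist_batch(batch, placement="shard(0)"):
    """Convert tensor entries of a batch; pass everything else through.

    Assumption: the batch is a flat dict. A real dataloader may also
    receive lists, tuples, or nested containers.
    """
    dist_batch_data = {}
    for key, value in batch.items():
        if isinstance(value, FakeTensor):
            # Tensor input: wrap it as a distributed tensor.
            dist_batch_data[key] = FakeDistTensor(value.values, placement)
        else:
            # Non-tensor input (str, int, None, ...): keep the original
            # object untouched instead of attempting a conversion.
            dist_batch_data[key] = value
    return dist_batch_data


batch = {"input_ids": FakeTensor([1, 2, 3]), "task": "pretrain", "seq_len": 3}
out = to_dist_batch(batch)
```

Here `out["input_ids"]` is wrapped, while `out["task"]` and `out["seq_len"]` are the same objects that went in.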
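Problem 2: The iterator reset mechanism of shard_dataloader is incomplete. In some scenarios the reset fails and the iterator is left exhausted.
Solution 2: Route all dataloader reset logic through the __next__ method. The __iter__ and __call__ methods set self.iter to None, and __next__ performs the reset when it is called, which avoids the exhausted-iterator problem.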
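
The reset scheme in Solution 2 can be sketched as a small wrapper class. This is an illustration under assumptions, not the PR's implementation: the class name `ResettableLoader` and the `_dataloader` attribute are hypothetical, and returning `self` from `__call__` is a guess. Only `self.iter`, `__iter__`, `__call__`, and `__next__` are named in the description.

```python
class ResettableLoader:
    """Hypothetical wrapper showing a lazy iterator reset in __next__."""

    def __init__(self, dataloader):
        self._dataloader = dataloader  # any re-iterable object
        self.iter = None               # no underlying iterator yet

    def __iter__(self):
        # Only mark the iterator as stale; do not build a new one here.
        self.iter = None
        return self

    def __call__(self):
        # Same marking as __iter__ (returning self is an assumption).
        self.iter = None
        return self

    def __next__(self):
        # Single place where the underlying iterator is (re)created.
        if self.iter is None:
            self.iter = iter(self._dataloader)
        return next(self.iter)


loader = ResettableLoader([10, 20, 30])
first_epoch = list(loader)
second_epoch = list(loader)  # a fresh pass, not an exhausted iterator
```

Because the underlying iterator is created in only one place, a loop that stops early or a second epoch over the same object starts again from the first batch.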