[QEff. Finetune]: Added logger and its test cases. #630
Conversation
    assert "Rank zero message" in caplog.text

    @patch("QEfficient.finetune.experimental.core.logger.get_rank")
    def test_log_rank_zero_not_zero(self, mock_get_rank, caplog):
Is there a typo in the func name?
I will update it to test_log_rank_zero_negative_case.
    @patch("QEfficient.finetune.experimental.core.logger.get_rank")
    def test_log_rank_zero_not_zero(self, mock_get_rank, caplog):
        """Test that non-rank zero messages are not logged"""
Here also, should it say "non-zero rank messages"?
I'll update the description.
Could we also add a small example script, perhaps as part of the documentation, with usage examples for the logger?
    return dist.is_available() and dist.is_initialized()

    def get_rank() -> int:
Will this work correctly in the case of PP + DDP? Currently, we use os.getenv("LOCAL_RANK", 0) to retrieve the rank in QEff.
When training on multiple clusters of machines, each with multiple devices, dist.get_rank() returns os.environ["RANK"], which is a global value across nodes and devices.
For us that won't be a problem as long as we don't do multi-machine training, because for single-machine training LOCAL_RANK == RANK.
For the sake of clarity, I have implemented get_local_rank() as well, and we will use it internally wherever we intend to refer to local rank 0.
The change will be reflected in the latest revision.
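The RANK vs. LOCAL_RANK distinction described above can be sketched as follows. The helper names mirror the ones discussed in this thread, but the bodies are simplified assumptions that read the launcher-set environment variables directly (the real get_rank would defer to dist.get_rank() when the process group is initialized):

```python
# Sketch (assumed implementations): RANK is the global rank across all nodes,
# LOCAL_RANK is the rank within one node; they coincide only on a single machine.
import os

def get_rank() -> int:
    """Global rank; dist.get_rank() ultimately reflects the launcher-set RANK."""
    return int(os.getenv("RANK", 0))

def get_local_rank() -> int:
    """Per-node rank from LOCAL_RANK, as used for local rank-0 checks."""
    return int(os.getenv("LOCAL_RANK", 0))

# Example: second device (local rank 1) on the second node of a 2-node x 4-GPU job.
os.environ["RANK"] = "5"
os.environ["LOCAL_RANK"] = "1"
print(get_rank(), get_local_rank())  # 5 1
```

On a single-machine run the launcher sets RANK and LOCAL_RANK to the same value, which is why the two helpers agree there.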
I will add some sample commented text in this PR to give users a hint on how to use the logger. Later, I will add the same content, in extended form, to the documentation as well.
Yes, that would be helpful, thanks.
…s utility code when dealing with distributed training. Signed-off-by: meetkuma <[email protected]>
Force-pushed from cdc22a8 to 0fb3f03.
Discarding this PR in favor of #644.
TODO: Enable test cases via Jenkins infra.