Conversation

@quic-meetkuma quic-meetkuma commented Nov 21, 2025

  • Added a logger that logs to both console and file. This code is similar to the existing QEff finetuning logger code.
  • Added dist_utils, which serves as utility code for distributed training.
  • Added logger test cases for sanity checks.

TODO: Enable test cases via Jenkins infra.

assert "Rank zero message" in caplog.text

@patch("QEfficient.finetune.experimental.core.logger.get_rank")
def test_log_rank_zero_not_zero(self, mock_get_rank, caplog):
Contributor

Is there a typo in the function name?

Contributor Author

I will update it to test_log_rank_zero_negative_case.


@patch("QEfficient.finetune.experimental.core.logger.get_rank")
def test_log_rank_zero_not_zero(self, mock_get_rank, caplog):
"""Test that non-rank zero messages are not logged"""
Contributor

Same question here: should this say non-zero-rank messages?

Contributor Author

I'll update the description.

@quic-akuruvil
Contributor

Can we also add a small example script, maybe as part of the documentation, with usage examples for the logger?

return dist.is_available() and dist.is_initialized()


def get_rank() -> int:
Contributor

Will this work fine in case of PP + DDP? Currently, we use os.getenv("LOCAL_RANK", 0) to retrieve the rank in QEff.

Contributor Author

When training on multiple machines, each with multiple devices, dist.get_rank() returns os.environ["RANK"], which is a global rank across all nodes and devices. That won't be a problem for us as long as we don't do multi-machine training, because for single-machine training LOCAL_RANK == RANK.

For the sake of clarity, I implemented get_local_rank() as well, and we will use it internally wherever we intend to refer to local rank 0.

The change will be reflected in the latest revision.
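Based on this discussion, the two helpers could take roughly the following shape. This is a sketch, not the PR's exact code: the `dist.is_available() and dist.is_initialized()` check is quoted from the diff above, and the env-var fallbacks mirror QEff's current `os.getenv("LOCAL_RANK", 0)` usage; the guarded torch import is an added assumption so the sketch also runs where torch is absent.

```python
import os

try:
    import torch.distributed as dist
except ImportError:  # torch may be absent outside training environments
    dist = None


def is_dist_initialized() -> bool:
    """True when torch.distributed is importable and a process group exists."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def get_rank() -> int:
    """Global rank across all nodes and devices; 0 for non-distributed runs."""
    if is_dist_initialized():
        return dist.get_rank()
    return int(os.getenv("RANK", 0))


def get_local_rank() -> int:
    """Rank within the current node; equals the global rank on one machine."""
    return int(os.getenv("LOCAL_RANK", 0))
```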

@quic-meetkuma
Contributor Author

> Can we also add a small example script as part of documentation may be, which helps with usage examples for the logger.

I will add some sample commented text in this PR that gives the user a hint on how to use the logger. Later, I will add the same in extended form to the documentation as well.

@quic-akuruvil
Contributor

> Can we also add a small example script as part of documentation may be, which helps with usage examples for the logger.
>
> I will add some sample commented text in this PR which will give user hint on how to use the logger. Later on add the same in an extended manner to the documentation as well.

Yes that would be helpful, thanks.

…s utility code when dealing with distributed training.

Signed-off-by: meetkuma <[email protected]>
Signed-off-by: meetkuma <[email protected]>
Signed-off-by: meetkuma <[email protected]>
@quic-meetkuma
Contributor Author

Discarding this PR in favor of #644

