

@Egor-Krivov commented Oct 24, 2025

Resolves #5292

This PR also refactors the Liger-Kernel installation to make it consistent with the sglang and vllm installations.

PR depends on:

Before merging:

Egor-Krivov commented Oct 24, 2025

@Egor-Krivov

Since my patch has now been merged upstream, this PR only does refactoring:
https://github.com/linkedin/Liger-Kernel/pull/917/files

```shell
echo "****** Running Liger-Kernel tests ******"
echo "************************************************"

run_pytest_command -vvv -n ${PYTEST_MAX_PROCESSES:-4} Liger-Kernel/test/
```

Liger hangs with -n 4 for an unknown reason, but passes without parallelization once the cache cleanup is merged.


So I made the tests sequential.

@Egor-Krivov, Oct 25, 2025

In my local experiments, this doesn't change the total runtime, possibly because the GPU is the bottleneck: we only have 1 GPU for these tests.
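To make the difference between the parallel and sequential runs concrete, here is a minimal sketch. It assumes, as the `run_pytest_command` line in the review snippet suggests, that the worker count is passed through pytest-xdist's `-n` flag. `liger_pytest_args` is a hypothetical helper, not part of the CI scripts; it only prints the pytest arguments and does not run anything.

```shell
# Hypothetical helper (not from the CI scripts): print the pytest arguments
# for the Liger-Kernel suite. A worker count above 1 adds pytest-xdist's
# "-n N" flag; the default of 1 omits it, so pytest runs sequentially.
# The subshell body "( ... )" keeps the variables from leaking to the caller.
liger_pytest_args() (
  workers="${1:-1}"
  args="-vvv"
  if [ "$workers" -gt 1 ]; then
    args="$args -n $workers"
  fi
  echo "$args Liger-Kernel/test/"
)
```

Called with no argument, `liger_pytest_args` produces the sequential form this PR switched to. `liger_pytest_args 4` reproduces the earlier `-n 4` form.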


@whitneywhtsang

> Since my patch got merged upstream now this PR only does refactoring https://github.com/linkedin/Liger-Kernel/pull/917/files

Should we update the PR title?


Egor-Krivov commented Oct 27, 2025

> Since my patch got merged upstream now this PR only does refactoring https://github.com/linkedin/Liger-Kernel/pull/917/files
>
> Should we update the PR title?

This PR still changes the number of pytest workers (it removes -n 4), which helps avoid the hangs. Without this change, the tests can still hang.

@Egor-Krivov enabled auto-merge (squash) October 27, 2025 12:21
@Egor-Krivov merged commit 2fadfe7 into main Oct 27, 2025
23 of 31 checks passed
@Egor-Krivov deleted the egor/liger_fix branch October 27, 2025 13:29



Successfully merging this pull request may close these issues.

[CI][Liger-Kernels] Liger kernel tests take >12hrs in CI

2 participants