
Fix noncontiguous input for rmsnorm#117

Open
yangw1234 wants to merge 3 commits into sgl-project:main from yangw1234:fix_norn_noncontiguous

Conversation


@yangw1234 yangw1234 commented Mar 7, 2026

The issue

Deepseek VL2 Small and Deepseek Coder Lite produce garbage output.
The root cause is that rmsnorm does not support non-contiguous input.
This PR provides a workaround.
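For illustration (this sketch is not part of the PR), the failure mode can be reproduced in NumPy, which exposes the same contiguity model as torch: a transposed view shares the original buffer with swapped strides, so a stride-unaware kernel that walks the raw buffer row-major reads the wrong element order. `np.ascontiguousarray` plays the role of torch's `.contiguous()` here.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # [[0,1,2],[3,4,5]]
b = a.T                                            # transposed *view*: [[0,3],[1,4],[2,5]]
assert not b.flags["C_CONTIGUOUS"]                 # same buffer, non-contiguous strides

# A kernel that ignores strides and reads b's underlying buffer sequentially
# would effectively see a's element order reshaped to b's shape -- i.e.
# "garbage" from b's point of view:
naive = np.frombuffer(a.tobytes(), dtype=np.float32).reshape(b.shape)
assert not bool((naive == b).all())

# The workaround in this PR, in NumPy terms: materialize a contiguous copy.
c = np.ascontiguousarray(b)
assert c.flags["C_CONTIGUOUS"] and bool((c == b).all())
```

The copy fixes correctness at the cost of an extra memcpy, which is exactly the trade-off debated below.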

@yangw1234
Author

@airMeng could you take a look?

    output: torch.Tensor
        Normalized tensor, shape (batch_size, hidden_size).
    """
    input = input.contiguous()
Collaborator


How about adding a check for whether the input is contiguous?

Author

Isn't .contiguous() a no-op when the tensor is already contiguous?
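For what it's worth, this claim checks out: torch's `Tensor.contiguous()` returns `self` when the tensor is already contiguous. The NumPy analogue, sketched here as the lighter dependency, behaves the same way:

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float32)   # freshly allocated: C-contiguous
assert np.ascontiguousarray(a) is a      # already contiguous: no copy, same object

b = a.T                                  # non-contiguous view
assert np.ascontiguousarray(b) is not b  # non-contiguous: a new contiguous copy
```

So the only cost on the already-contiguous path is the contiguity check itself.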

Collaborator

yes, something like:

    if not input.is_contiguous():
        # requires `import warnings` at the top of the module
        warnings.warn("Input of RMSNorm is not contiguous. Converting to contiguous tensor.")
        input = input.contiguous()

Author

Won't this print too many log messages in the sglang server log?

@airMeng
Collaborator

airMeng commented Mar 8, 2026

Generally I would suggest refactoring the kernel to support non-contiguous input rather than doing the reordering implicitly, which might make performance analysis complex. If that is not possible, probably throw a warning and then do the reorder.

@yangw1234
Author

> Generally I would suggest refactoring the kernel to support non-contiguous input rather than doing the reordering implicitly, which might make performance analysis complex. If that is not possible, probably throw a warning and then do the reorder.

Yes, I believe we will eventually support it in the kernel, but I don't think this is the bottleneck right now. I don't think adding an op will complicate the analysis, since an op is clearly visible in the profiling trace. Throwing a warning is probably too noisy.

airMeng requested a review from mingfeima on March 9, 2026 at 01:01
    output: torch.Tensor
        Normalized tensor, shape (batch_size, hidden_size).
    """
    input = input.contiguous()
Collaborator

We should not fix this issue by converting the input to a contiguous tensor.

This would result in a silent memcpy whenever non-contiguous input is encountered.

How to fix it properly?

  • fix it in the C++ kernels
  • add non-contiguous support

I prefer to do contiguity checks in the C++ kernels:

    TORCH_CHECK(input.is_contiguous(), "xxx");

If we get a non-contiguous input, just fix it at the call site. That would be better than a silent memcpy.
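The fail-fast approach described above can be sketched at the Python wrapper level as well (the C++ `TORCH_CHECK` behaves analogously). The names here are illustrative, not the actual sgl-project API, and a duck-typed fake tensor is used so the check can be exercised without torch:

```python
def rmsnorm(input, weight, eps=1e-6):
    """Illustrative wrapper only -- not the actual sgl-project code."""
    if not input.is_contiguous():
        # Fail fast instead of doing a silent memcpy; the caller should fix
        # the producer of the non-contiguous tensor (or call .contiguous()
        # explicitly at the call site).
        raise ValueError("rmsnorm expects a contiguous input tensor")
    ...  # dispatch to the (hypothetical) C++ kernel here


# Minimal duck-typed stand-in for a tensor, so the check can be exercised:
class FakeTensor:
    def __init__(self, contiguous):
        self._contiguous = contiguous

    def is_contiguous(self):
        return self._contiguous


try:
    rmsnorm(FakeTensor(contiguous=False), weight=None)
except ValueError as e:
    print("rejected:", e)
```

The design choice is to surface the layout problem at the boundary, so the memcpy (if any) is written explicitly by the caller and shows up in code review rather than only in a profile.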

Author

@yangw1234 yangw1234 Mar 10, 2026

We can wait for the C++ kernel fix; SGLANGT-500 is tracking it. I'll leave this PR open as is so that people are aware of this problem.

@mingfeima
Collaborator

Avoid using input = input.contiguous() in both Python and C++.

@mingfeima
Collaborator

> Generally I would suggest refactoring the kernel to support non-contiguous input rather than doing the reordering implicitly, which might make performance analysis complex. If that is not possible, probably throw a warning and then do the reorder.

> Yes, I believe we will eventually support it in the kernel, but I don't think this is the bottleneck right now. I don't think adding an op will complicate the analysis, since an op is clearly visible in the profiling trace. Throwing a warning is probably too noisy.

That's not how it works. We should nip problems in the bud and avoid letting them accumulate until they are too entrenched to fix.
People writing models can work this way, but people doing performance work must not think this way.

@yangw1234
Author

yangw1234 commented Mar 10, 2026

> Generally I would suggest refactoring the kernel to support non-contiguous input rather than doing the reordering implicitly, which might make performance analysis complex. If that is not possible, probably throw a warning and then do the reorder.

> Yes, I believe we will eventually support it in the kernel, but I don't think this is the bottleneck right now. I don't think adding an op will complicate the analysis, since an op is clearly visible in the profiling trace. Throwing a warning is probably too noisy.

> That's not how it works. We should nip problems in the bud and avoid letting them accumulate until they are too entrenched to fix. People writing models can work this way, but people doing performance work must not think this way.

That's a very good point, @mingfeima; I totally agree from the performance engineering perspective. But from a user experience point of view, a model producing garbage output is such a bad experience that we instantly lose the user's or developer's trust when they hit the problem, while a few percent of performance drop doesn't seem to matter that much. In addition, looking at the whole development process, this workaround not only unblocks people working on model/feature/CI enabling, but also lets the folks working on low-level kernels and performance prioritize the kernels that are truly the bottleneck. That's why I prefer moving fast and working around such issues as soon as possible.

Let me know your thoughts, @mingfeima and @airMeng.
