forked from vllm-project/vllm
Disable skinny gemms by default #568
Open
k-artem wants to merge 839 commits into main from disable_skinny_gemms_by_default
Conversation
Upstream merge 25 01 27
* updating code blocks
* typo
* updated manifest
* Including feedback
* whitespace
* Deepseek instructions
* hyperlink fix
* hyperlink fix
* updating what is new
* cpx update
* typo
* whitespace
* whitespace
* integrate new cpa kernel, update tests and benchmark
* added comments to mfma4 kernel
* further comments for mfma16 kernel
* clang-format
* Lint
* add flag for logits rtz conversion and disable by default
* lint
* [Bugfix]: Fix paged attention unit tests of #372 (#389)
  * [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`, `csrc/attention/paged_attention_v2.cu` and `csrc/rocm/attention.cu`.
  * improve code documentation.
  * lint

Co-authored-by: vllmellm <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Joe Shajrawi <[email protected]>
Co-authored-by: TJian <[email protected]>
… quant is not supported Signed-off-by: Hongxia Yang <[email protected]>
Signed-off-by: Hongxia Yang <[email protected]>
Signed-off-by: Hongxia Yang <[email protected]>
* Aiter section
* Aiter section in docker
* Enablement
* Only exposing a single knob
* More details on env defaults
…pstream_merge_25_02_03
Correct initial values
Upstream merge 25 02 03
* Enabling P3L.py & P3L_mling.py tests to run with multiple batched queries. This alteration adds minimal measurement noise; the underlying testing material is the same, and the resulting measurements are comparable to the old (BS=1) testing runs.
* Making linters happy.
* Changed the device specification for the 'forced_sample' tensor. The resulting implementation produces identical measurements and actually became faster (3.21 s/it vs 3.42 s/it with the previous commit).
* Fixing reporting to reflect processed intervals.

Signed-off-by: Alexei V. Ivanov <[email protected]>
* fix quark fp8 loading
* fix undefined variables

Co-authored-by: Bowen Bao <[email protected]>
…n ROCm (#406) Signed-off-by: Gregory Shtrasberg <[email protected]>
* Update README.md 20250205_aiter
* whitespace
* adding VLLM_USE_AITER=0 advice (see the sketch below)
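The commits above mention exposing AITER through a single knob and documenting its env default. A minimal sketch of what such an env-gated toggle typically looks like is below; the knob name `VLLM_USE_AITER` comes from the commit messages, but the helper name and the default value shown are assumptions, not the fork's actual API.

```python
import os

def use_aiter() -> bool:
    """Hypothetical helper: a single env knob gating AITER kernel paths."""
    # The default ("1" = enabled) is an assumption; the commits mention
    # documenting env defaults but the actual default is not shown here.
    return os.environ.get("VLLM_USE_AITER", "1") == "1"
```

Per the README advice added in this commit, users who hit problems with the AITER path can export `VLLM_USE_AITER=0` to fall back to the default kernels.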
* fix rocm get_device name: use 'market_name', hard-code names for mi308 & mi300
* use gfx and num_CU for device name (sketched below)
* using market_name
* rename MI325_OAM to MI325X
* rm (duplicate) MI300X_OAM
* rename mi308
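The approach in this commit, deriving a market name from the gfx architecture plus the compute-unit count instead of hard-coding device strings, can be sketched as follows on a ROCm build of PyTorch. The mapping-table entries are illustrative assumptions, not the fork's actual table.

```python
import torch

# Illustrative (gfx arch, CU count) -> market name table; the CU counts
# here are assumptions for the sketch.
_MARKET_NAMES = {
    ("gfx942", 304): "MI300X",
    ("gfx942", 80): "MI308X",  # assumed CU count
}

def rocm_device_name(device: int = 0) -> str:
    props = torch.cuda.get_device_properties(device)
    arch = props.gcnArchName.split(":")[0]  # e.g. "gfx942:sramecc+:xnack-"
    return _MARKET_NAMES.get((arch, props.multi_processor_count), arch)
```

Keying on architecture and CU count distinguishes parts like MI300X and MI308 that share a gfx target but differ in compute-unit count.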
* Add tuned moe config for qwen1.5_moe_A2.7B
* Add more sweep parameters on qwen2_moe
* Add tp = 1,2,4,8 after applying PR12838
* Rename config name by deleting "_OAM" (see the sketch below)

Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Divakar Verma <[email protected]>
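For context on the "_OAM" rename: vLLM looks up tuned fused-MoE kernel configs from JSON files keyed by expert count, shard size, and device name, so the filename must match the detected device string. The sketch below approximates that lookup key; the exact filename format is an assumption based on vLLM's convention, not taken from this PR.

```python
def moe_config_file_name(E: int, N: int, device_name: str) -> str:
    """Approximation of the tuned fused-MoE config filename convention."""
    # The commit above drops the "_OAM" suffix so that filenames match
    # renamed device strings like "MI325X".
    return f"E={E},N={N},device_name={device_name.replace('_OAM', '')}.json"
```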
Signed-off-by: charlifu <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
* Added benchmark results and commit hash
* Added release notes/changelog
* Update README.md
Signed-off-by: charlifu <[email protected]>
…tream_merge_2025_05_29
Upstream merge 2025 06 02
Upstream merge 2025 06 03
Enabling this feature by default in commit 188b7f9 broke inference of models via vLLM (refs SWDEV-531223). Since support for this feature is currently limited, we propose disabling it by default and re-enabling it once Navi support is complete.
Please note the check at https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/utils.py#L75
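The check referenced above gates whether the custom skinny-GEMM kernels are used at all. A minimal sketch of such an env-gated dispatch is below, assuming a knob like `VLLM_ROCM_USE_SKINNY_GEMM` (which exists in vLLM's env settings); defaulting it to off mirrors what this PR proposes. The shape threshold and function name are illustrative assumptions.

```python
import os
import torch

def maybe_skinny_gemm(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Disabled unless explicitly opted in, mirroring this PR's proposal;
    # the "skinny" batch threshold below is an assumption for illustration.
    use_skinny = os.environ.get("VLLM_ROCM_USE_SKINNY_GEMM", "0") == "1"
    if use_skinny and x.shape[0] <= 4:
        # The fork would dispatch to its custom ROCm skinny kernels here
        # (e.g. wvSplitK); omitted so this sketch stays runnable anywhere.
        pass
    return torch.nn.functional.linear(x, weight)
```

With the default flipped to off, unsupported hardware such as Navi takes the standard GEMM path unless a user explicitly opts back in.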
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
Force-pushed from 1d2c43d to eb9d4de
Please direct your PRs to the upstream vLLM repository (https://github.com/vllm-project/vllm.git).
Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) requires a clear, previously communicated exception.