@k-artem k-artem commented Jun 5, 2025

Enabling this feature by default in commit 188b7f9 broke model inference via vLLM (refs SWDEV-531223). Because support for this feature is currently limited, we propose disabling it by default and re-enabling it once Navi support is complete.

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git)

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception
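
The proposal amounts to switching a feature gate from on-by-default to off-by-default while keeping an opt-in path. A minimal sketch of such a gate follows, assuming the feature is controlled by an environment variable. The variable name `VLLM_EXAMPLE_FEATURE` and the helper `env_flag` are hypothetical and do not come from this PR or from vLLM.

```python
import os

# Strings treated as "enabled"; any other non-empty value counts as disabled.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return the boolean value of environment variable `name`.

    Falls back to `default` when the variable is unset or empty, so a
    feature gated this way stays off unless the user opts in.
    """
    raw = os.environ.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY


# Hypothetical variable name, used only for illustration.
FEATURE_ENABLED = env_flag("VLLM_EXAMPLE_FEATURE", default=False)
```

With this shape, re-enabling the feature later is a one-word change to the `default` argument, and users on supported hardware can opt in earlier by exporting the variable.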

gshtras and others added 30 commits January 28, 2025 16:54
* updating code blocks

* typo

* updated manifest

* Including feedback

* whitespace

* Deepseek instructions

* hyperlink fix

* hyperlink fix

* updating what is new

* cpx update

* typo

* whitespace

* whitespace
* integrate new cpa kernel, update tests and benchmark

* added comments to mfma4 kernel

* further comments for mfma16 kernel

* clang-format

* Lint

* add flag for logits rtz conversion and disable by default

* lint

* [Bugfix]: Fix paged attention unit tests of #372 (#389)

* [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`, `csrc/attention/paged_attention_v2.cu` and `csrc/rocm/attention.cu`.

* improve code documentation.

* lint

---------

Co-authored-by: vllmellm <[email protected]>

---------

Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Joe Shajrawi <[email protected]>
Co-authored-by: TJian <[email protected]>
Co-authored-by: vllmellm <[email protected]>
Signed-off-by: Hongxia Yang <[email protected]>
Signed-off-by: Hongxia Yang <[email protected]>
* Aiter section

* Aiter section in docker

* Enablement

* Only exposing a single knob

* More details on env defaults
* Enabling P3L.py & P3L_mling.py tests to run with multiple batched
queries.

This alteration adds minimal measurement noise.

The underlying testing material is the same, and the resulting measurements
are comparable to the old (BS=1) testing runs.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* Making linters happy.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* Changed the device specification for the 'forced_sample' tensor.
The resulting implementation produces identical measurements and is
actually faster (3.21s/it vs 3.42s/it with the previous commit).

Signed-off-by: Alexei V. Ivanov <[email protected]>

* Fixing reporting to reflect processed intervals.

Signed-off-by: Alexei V. Ivanov <[email protected]>

---------

Signed-off-by: Alexei V. Ivanov <[email protected]>
* fix quark fp8 loading

* fix undefined variables

---------

Co-authored-by: Bowen Bao <[email protected]>
* Update README.md 20250205_aiter

* whitespace

* adding VLLM_USE_AITER=0 advice
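
The `VLLM_USE_AITER=0` advice named in the commit above can also be applied when launching a process from Python. This is a sketch under the assumption that the variable is read from the process environment at startup; the helper names `env_without_aiter` and `run_with_aiter_disabled` are hypothetical.

```python
import os
import subprocess


def env_without_aiter(base=None):
    """Return a copy of the environment with VLLM_USE_AITER set to "0".

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["VLLM_USE_AITER"] = "0"
    return env


def run_with_aiter_disabled(argv):
    """Run `argv` as a child process that sees VLLM_USE_AITER=0."""
    return subprocess.run(argv, env=env_without_aiter(), check=True)
```

Passing the modified copy through `env=` keeps the setting scoped to the child process instead of changing the parent's environment.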
* fix rocm get_device name

use 'market_name'
hard-code names for mi308 & mi300

* use gfx and num_CU for device name

* using market_name

* rename MI325_OAM to MI325X

* rm (duplicate) MI300X_OAM

* rename mi308
* Add tuned moe config for qwen1.5_moe_A2.7B

* Add more sweep parameters on qwen2_moe

* Add tp = 1,2,4,8 after applying PR12838

* Rename config name by deleting "_OAM"

---------

Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Divakar Verma <[email protected]>
Enabling this feature by default in commit 188b7f9 broke model
inference via vLLM (refs SWDEV-531223). Because support for this
feature is currently limited, we propose disabling it by default
and re-enabling it once Navi support is complete.
Collaborator

gshtras commented Jun 5, 2025

github-actions bot commented Sep 4, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Sep 4, 2025
@gshtras gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de Compare September 9, 2025 16:43
@github-actions github-actions bot added unstale and removed stale labels Sep 10, 2025