forked from NVIDIA/cutlass
Simplify Flash Attention Decode benchmarks generation #437
Open
muhammad-tanvir-1211 wants to merge 29 commits into intel:main from muhammad-tanvir-1211:flash_decode_simplify_benchmarks
Commits
2a5d95c Add more tests and benchmark configurations (muhammad-tanvir-1211)
beabe4d Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
55030f0 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
df9a1e1 Merge branch 'sycl-develop' into flash_decode_separate_out_configs (muhammad-tanvir-1211)
d30c6be Fix license year (muhammad-tanvir-1211)
338d7fe Workaround to skip today's DPCPP nightly on CI (#425) (aacostadiaz)
a1811a4 Split example for prefill attention with cachedkv (#409) (aacostadiaz)
649f904 Avoid failures if latest nightly DPCPP tag didn't provide binaries (… (carlewis)
80e4b83 Add BF16BF16FP32 CUTE Example on BMG (#422) (leslie-fang-intel)
580e8c8 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
2ad93de Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
ea4376f Simplify test generation (muhammad-tanvir-1211)
540084a Merge branch 'sycl-develop' into flash_decode_separate_out_configs (muhammad-tanvir-1211)
ae03894 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
0975c01 Fix benchmark api (muhammad-tanvir-1211)
188fdce Merge branch 'flash_decode_separate_out_configs' of https://github.co… (muhammad-tanvir-1211)
ff198f5 Fix benchmark names (muhammad-tanvir-1211)
4d446bb Change intel workflow (muhammad-tanvir-1211)
30c3a79 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
7f45907 Simplify benchmark generation (muhammad-tanvir-1211)
90d7637 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
4327493 Increase timeout (muhammad-tanvir-1211)
a859948 Added check for head_size_vo (muhammad-tanvir-1211)
a9173c0 Fix the CI (muhammad-tanvir-1211)
e4f8462 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
c64c66f Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
8568330 Merge branch 'sycl-develop' of https://github.com/codeplaysoftware/cu… (muhammad-tanvir-1211)
287b5af Remove test changes, hardcode head_size_vo (muhammad-tanvir-1211)
96ba5a7 Merge branch 'sycl-develop' into flash_decode_simplify_benchmarks (muhammad-tanvir-1211)
benchmarks/device/bmg/input_files/input_sglang_flash_attention_decode_kvcache.in (110 changes: 55 additions & 55 deletions)
benchmarks/device/bmg/input_files/input_sglang_flash_attention_decode_nokvcache.in (108 changes: 54 additions & 54 deletions)
benchmarks/flash_attention/flash_attention_decode/benchmarks_h128_1024_nonpaged.cpp (69 changes: 0 additions & 69 deletions; this file was deleted)
CUTLASS_SYCL_RUNNING_CI doesn't seem to do anything as far as I can tell?