

@kefeiyao commented Oct 10, 2025

…t prefill

The NON_BLOCKING mode is functionally ready, while the ASYNC mode still has pieces missing and is currently for development use only.
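The following minimal Python sketch is meant to make the difference between the two modes concrete. It illustrates how a KV fetch can be taken partly or fully off the prefill critical path, and it is not the PR's implementation. `fetch_kv`, `prefill_compute`, the three `*_step` functions and the sleep-based latencies are all hypothetical. Mapping NON_BLOCKING to "start the fetch early, wait before the step returns" and ASYNC to "return without waiting" is an assumed reading of the mode names.

```python
from __future__ import annotations

import time
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical latencies (assumption, not from the PR): the fetch is
# slower than the compute it can overlap with.
FETCH_S = 0.2
COMPUTE_S = 0.1


def fetch_kv(req_id: int) -> str:
    time.sleep(FETCH_S)  # simulated KV transfer latency
    return f"kv-{req_id}"


def prefill_compute(req_id: int) -> str:
    time.sleep(COMPUTE_S)  # simulated prefill forward pass
    return f"hidden-{req_id}"


def blocking_step(pool: ThreadPoolExecutor, req_id: int) -> tuple[str, str]:
    # Baseline: the fetch sits on the critical path before compute starts.
    kv = fetch_kv(req_id)
    return prefill_compute(req_id), kv


def non_blocking_step(pool: ThreadPoolExecutor, req_id: int) -> tuple[str, str]:
    # Start the fetch in the background, compute meanwhile, then wait
    # only for the part of the fetch that compute did not hide.
    fut = pool.submit(fetch_kv, req_id)
    hidden = prefill_compute(req_id)
    return hidden, fut.result()


def async_step(pool: ThreadPoolExecutor, req_id: int) -> tuple[str, Future]:
    # Return right after compute; the caller resolves the future later,
    # so this step never waits on the fetch itself.
    fut = pool.submit(fetch_kv, req_id)
    return prefill_compute(req_id), fut


def timed(step, pool: ThreadPoolExecutor, req_id: int):
    start = time.perf_counter()
    result = step(pool, req_id)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in (blocking_step, non_blocking_step, async_step):
            _, elapsed = timed(step, pool, 0)
            print(f"{step.__name__}: {elapsed * 1000:.0f} ms")
```

With the fetch (0.2 s) slower than the compute (0.1 s), the blocking step pays for both. The non-blocking step pays only for the unhidden remainder of the fetch, and the async step pays none of it inside the step. This matches the qualitative shape of the reported numbers without claiming to reproduce them.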


Purpose

This commit significantly reduces the KV-fetching overhead at prefill.

Test Plan

Prefill-only test and E2E test

Test Result

Tested on G3D with bs=8, the KV-fetching overhead went from 287 ms to:

  • 97 ms (NON_BLOCKING mode)
  • 0 ms (ASYNC mode)

In addition, a prefill-only test with a G3D P/D setup (1P2D, 3.5K/1) shows that the NON_BLOCKING mode increases prefill throughput by around 8%.
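The bs=8 numbers can also be read in relative terms. The small Python check below uses an illustrative helper named `reduction`, which is not part of the PR. It shows that NON_BLOCKING removes 190 ms, about 66% of the original 287 ms overhead (roughly a 3x reduction), while ASYNC removes all of it from the measured step.

```python
from __future__ import annotations

# Figures taken from the test result above (bs=8 on G3D), in milliseconds.
BASELINE_MS = 287.0
MODES_MS = {"NON_BLOCKING": 97.0, "ASYNC": 0.0}


def reduction(baseline_ms: float, new_ms: float) -> tuple[float, float]:
    """Return (ms saved, percent of the baseline overhead removed)."""
    saved = baseline_ms - new_ms
    return saved, 100.0 * saved / baseline_ms


for mode, ms in MODES_MS.items():
    saved, pct = reduction(BASELINE_MS, ms)
    print(f"{mode}: {BASELINE_MS:.0f} -> {ms:.0f} ms, saves {saved:.0f} ms ({pct:.1f}%)")
# NON_BLOCKING: 287 -> 97 ms, saves 190 ms (66.2%)
# ASYNC: 287 -> 0 ms, saves 287 ms (100.0%)
```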

@kefeiyao (Author)

@czhu15 @xinyu-intel, are you OK with this change? Somehow I couldn't find other people in the reviewer list...

@kefeiyao merged commit f8f3b68 into deepseek_r1 on Oct 15, 2025 (1 check failed)
@kefeiyao deleted the deepseek_r1_kf_ww39 branch on October 15, 2025 11:16
