
Conversation

underfituu
Contributor

@underfituu underfituu commented Jul 29, 2025

What this PR does / why we need it?

This PR addresses a critical issue where Node D (Decode) failures cause Node P (Prefill) to hang because it cannot release its KV cache.

Trigger Scenarios:

  1. Node D fails mid-inference (e.g., network disconnection)
  2. Node D rejects requests at a certain stage (e.g., via API server)
  3. Load-test script termination causes Node P or D to abort queued requests

Root Cause Analysis:

  1. Currently, Node D sends a "KV cache pull complete, release approved" message to Node P
  2. This message is transmitted via the worker connector. If the P-D connection breaks or requests are rejected upstream, Node D cannot send the message
  3. Node P never releases the KV cache without receiving this message (a sketch of this flow follows below)
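As a rough illustration of this flow, here is a minimal sketch of the producer-side bookkeeping it implies. All names here (pending_pull, on_request_finished, on_pull_notification, free_blocks) are illustrative assumptions, not the actual vllm-ascend connector API:

```python
# Sketch only: the prefill/producer node parks finished requests until the
# decode node confirms that the KV cache pull completed.

pending_pull: dict[str, list[int]] = {}  # request_id -> KV block ids held on P

def on_request_finished(request_id: str, block_ids: list[int]) -> None:
    # P has produced the KV cache; keep the blocks until D confirms the pull.
    pending_pull[request_id] = block_ids

def on_pull_notification(request_id: str, free_blocks) -> None:
    # D says "KV cache pull complete, release approved" -> P may free the blocks.
    blocks = pending_pull.pop(request_id, None)
    if blocks is not None:
        free_blocks(blocks)
    # If this message never arrives (link broken, request rejected upstream),
    # the entry stays in pending_pull forever and the KV cache leaks.
```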

Solution:
Following the vLLM community's approach (the NIXL connector timeout mechanism), we are implementing:

  • A timeout mechanism with comprehensive warnings
  • Updated README documentation
  • Reference: vLLM's optimization PR #20139

Note: The full disaster-recovery solution is still being designed. This PR lands a simple fix on the v0.9.1-dev branch; the complete solution will evolve in main (PR #2174). A minimal sketch of the timeout idea follows.
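The sketch below shows the general shape of such a timeout-based fallback on the producer node. The timeout value, helper names, and call site are assumptions for illustration, not the configuration or code introduced by this PR:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Illustrative default; the real timeout is whatever the PR/README defines.
ABORT_PULL_TIMEOUT_S = 120.0

# request_id -> (finish timestamp, KV block ids still held on the producer)
pending_pull: dict[str, tuple[float, list[int]]] = {}

def expire_unpulled_requests(free_blocks) -> None:
    """Called periodically on the producer node (e.g. once per scheduler step).

    Any request whose pull notification has not arrived within the timeout is
    assumed lost (D failed, link broken, or request aborted upstream), and its
    KV blocks are released so the producer does not stall on exhausted cache.
    """
    now = time.monotonic()
    for request_id in list(pending_pull):
        finished_at, blocks = pending_pull[request_id]
        if now - finished_at > ABORT_PULL_TIMEOUT_S:
            logger.warning(
                "Releasing KV cache for request %s: no pull notification "
                "within %.0fs; the decode node may have failed.",
                request_id, ABORT_PULL_TIMEOUT_S)
            free_blocks(blocks)
            del pending_pull[request_id]
```

The key design point is that the producer no longer relies solely on the connector message: a periodic scan with a generous timeout acts as a safety net, with a warning logged so operators can tell a normal release from a timed-out one.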

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: underfituu <[email protected]>
@jianzs
Collaborator

jianzs commented Jul 29, 2025

When is this feature needed?

@underfituu underfituu closed this Aug 2, 2025
@underfituu underfituu reopened this Aug 4, 2025
@underfituu
Contributor Author

When is this feature needed?

  • Thank you for your attention. This feature is needed whenever a Node D (Decode) failure would otherwise leave Node P (Prefill) hanging with unreleased KV cache; the trigger scenarios, root-cause analysis, and proposed solution are described in the PR description above.

  • We sincerely welcome your valuable feedback on this approach.

@ganyi1996ppo ganyi1996ppo merged commit 2b97c69 into vllm-project:v0.9.1-dev Aug 5, 2025
17 checks passed
liyu119 added a commit to rjg-lyh/vllm-ascend that referenced this pull request Aug 11, 2025
…nto qwen30-dev

* 'qwen30-dev' of https://github.com/rjg-lyh/vllm-ascend:
  [V0.9.1] Replace FA ops with FA_V2 to optimize perf
  [0.9.1]remove chunked_prefill_for_mla (vllm-project#2177)
  move with_prefill allreduce from cpu to npu (vllm-project#2230)
  [v0.9.1] Add release note for v0.9.1rc2 (vllm-project#2233)
  [Docs] Sync main doc to v0.9.1-dev (vllm-project#2227)
  [0.9.1] Enable external distributed dp deployments in vllm ascend(0.9.1 only) (vllm-project#2109)
  [V0.9.1][BugFix] Fix the bug in decoraotor patch (vllm-project#2199)
  [v0.9.1][Bugfix][PD] Auto-clear producer KV cache if no pull notification (vllm-project#2085)
  [BUGFIX][0.9.1] FIX ring_mla input ‘query_lens’ to cpu (vllm-project#2170)
  [0.9.1][Prefill Perf] add D2H & initRoutingQuantV2 (vllm-project#2038)
  [bugfix] add with_prefill cpu allreduce to handle D-node recomputatio… (vllm-project#2129)