tests : add non-cont K,V FA tests #14756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open · wants to merge 1 commit into master
Conversation

@ggerganov (Member) commented Jul 18, 2025

cont #14363

With the introduction of a split KV cache, the K and V tensors passed to the flash-attention (FA) op can now be non-contiguous. This PR adds tests in test-backend-ops to cover that case.

This issue was reported here: #14363 (comment)

It can be reproduced with the CUDA backend using this command:

```shell
make -j && LLAMA_SET_ROWS=1 ./bin/llama-parallel -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF -np 8 -ns 128 -s 1 -c 4096 -fa -ngl 99 --top-k 1 -ctk q8_0 -ctv q8_0
```

```
0.02.205.072 I common_init_from_params: setting dry_penalty_last_n to ctx_size = 4608
0.02.205.072 W common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.02.233.146 I No new questions so proceed with build-in defaults.
0.02.233.146 I

0.02.240.868 I main: Simulating parallel requests from clients:
0.02.240.870 I main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
0.02.240.870 I
0.02.240.870 I Processing requests ...

0.02.241.045 I main: clearing the KV cache
0.02.248.040 I Client   0, seq    0, junk =    0, prompt = 284, started decoding ...
0.02.254.999 I Client   1, seq    1, junk =    0, prompt = 284, started decoding ...
0.02.262.112 I Client   2, seq    2, junk =    0, prompt = 284, started decoding ...
0.02.269.266 I Client   3, seq    3, junk =    0, prompt = 290, started decoding ...
0.02.276.355 I Client   4, seq    4, junk =    0, prompt = 288, started decoding ...
0.02.283.337 I Client   5, seq    5, junk =    0, prompt = 285, started decoding ...
0.02.290.405 I Client   6, seq    6, junk =    0, prompt = 286, started decoding ...
0.02.297.367 I Client   7, seq    7, junk =    0, prompt = 284, started decoding ...
/home/ggerganov/development/github/llama.cpp/ggml/src/ggml-cuda/template-instances/../fattn-common.cuh:748: GGML_ASSERT(ggml_is_contiguously_allocated(K)) failed
```

cc @JohannesGaessler

@github-actions bot added the testing label Jul 18, 2025
@ggerganov mentioned this pull request Jul 18, 2025
@OrangeDoro left a comment


Hi! I'm a grad student working on a research project about using large language models to automate code review. Based on your commit a856a56 and the changes in tests/test-backend-ops.cpp, my tool generated this comment:

  1. Error Handling: Implement error handling for tensor creation functions. Check if ggml_new_tensor_4d or ggml_view_4d returns a null pointer to prevent dereferencing null pointers later in the code.
  2. Null Pointer Checks: The code does not check if the ctx pointer is null before passing it to functions like ggml_new_tensor_4d, ggml_view_4d, and ggml_permute. It is advisable to add a check for ctx at the beginning of the create_permuted function.
  3. Input Validation: The function does not validate the dimensions passed to ggml_new_tensor_4d or ggml_view_4d. Consider adding checks to ensure that the dimensions are within acceptable bounds before proceeding with tensor creation.
  4. Parameter Addition: Ensure that the calling code correctly specifies the is_view parameter for each tensor creation.
  5. Tensor Creation Logic: Verify the doubling of ne_perm[1] when is_view is true to ensure it is the intended behavior. If this is conditional, it should be documented or validated.
  6. Variable Initialization: Initialize the variable ggml_tensor * t; to nullptr to avoid potential undefined behavior.
  7. Memory Management: The code creates a tensor t0 when is_view is true, but it does not appear to free t0 after it is used to create the view t. Ensure that there is a mechanism to free t0 when it is no longer needed.
  8. Redundant Tensor Creation Logic: The logic for creating a tensor is duplicated in the if and else branches of the is_view check. This could be refactored to reduce redundancy.
  9. Concurrency Issues: If this code is executed in a multi-threaded environment, ensure that the context ctx is thread-safe.
  10. Test Coverage for create_permuted: Add unit tests for create_permuted with is_view set to true to verify that the tensor is created correctly as a view.

As part of my research, I'm trying to understand how useful these comments are in real-world development. If you have a moment, I'd be super grateful if you could quickly reply to these two yes/no questions:

  1. Does this comment provide suggestions from a dimension you hadn’t considered?

  2. Do you find this comment helpful?

Thanks a lot for your time and feedback! And sorry again if this message is a bother.
