Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: no
version: 5944 (36c1532)
built with MSVC 19.44.35208.0 for x64
Operating systems
Windows
GGML backends
CUDA
Hardware
Ryzen 5800x
3090 + P40
Models
magnum-v4-22b-Q6_K.gguf
TheSkullery_L3.3-Unnamed-Exp-70B-v0.8-IQ4_XS.gguf
Problem description & steps to reproduce
Running with the environment variable LLAMA_SET_ROWS=0 results in normal output; setting it to 1 results in gibberish.
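For reference, a minimal sketch of how the variable is toggled before launching (PowerShell syntax; the launch command is the same one listed in the log section at the end):
$env:LLAMA_SET_ROWS = "0"   # normal output
$env:LLAMA_SET_ROWS = "1"   # gibberish on this dual-GPU setup
.\llama-server.exe -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -ts 40/43 -fa -c 32768
An example chat transcript of the gibberish follows.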
Helpful AI
21 July 2025 7:42 AM
How can I help?
12t
User
21 July 2025 7:43 AM
Can you recite for me the intro to sesame street
6.4s
85t
Helpful AI
21 July 2025 8:25 PM
Sunnynynynyy dayyy day,,
day!…
AItSunIt''''ssIts
unnme of
time for for to to
talk you play
play S S
S
ThisSThis is is is is the is a the way street best w song sest
way of way you a
to explore to be come in a
to play learn
play
However, if I restrict inference to just a single GPU (either the P40 or the 3090), LLAMA_SET_ROWS=1 works with no issues.
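One way to reproduce that single-GPU case is to hide one device before launching (a sketch only; CUDA_VISIBLE_DEVICES is the standard CUDA way to expose a single GPU, not necessarily how the runs above were restricted):
$env:CUDA_VISIBLE_DEVICES = "0"   # expose only device 0 (the 3090)
$env:LLAMA_SET_ROWS = "1"
.\llama-server.exe -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa -c 16368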
First Bad Commit
Haven't tested earlier versions.
Relevant log output
With LLAMA_SET_ROWS=1
Gibberish:
.\llama-server.exe -m D:\text-generation-webui\models\TheSkullery_L3.3-Unnamed-Exp-70B-v0.8-IQ4_XS.gguf -ngl 99 -ts 40/43 -fa -c 32768
.\llama-server.exe -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -ts 40/43 -fa -c 32768
Sane (I can't fit 32k ctx on one GPU):
.\llama-server.exe -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 99 -fa -c 16368
.\llama-server.exe -m D:\text-generation-webui\models\magnum-v4-22b-Q6_K.gguf -ngl 53 -fa -c 32768 (4 layers offloaded to the CPU)