@compilade compilade commented Aug 5, 2025

This fixes a problem I noticed while working on #15060 and running llama-imatrix with -ub 32768 -b 32768 to compute 64 chunks (of 512 tokens each) at once with https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507. I think any model with a vocab size of at least 131073 tokens should trigger this problem (since 2**32 == 32768 * 131072). That model has a vocab size of 151936.

At least two places can overflow with big batches:

ggml_backend_tensor_get_async(backend_res, t_logits, logits_out, 0, n_outputs*n_vocab*sizeof(float));

and

std::swap(logits[i0*n_vocab + k], logits[i1*n_vocab + k]);

This PR should fix that.

Before (notice the high perplexity of later chunks in the huge batch):

compute_imatrix: 1096.04 seconds per pass - ETA 36.53 minutes
[1]4.9921,[2]3.5230,[3]3.2906,[4]3.6947,[5]3.5449,[6]3.1976,[7]3.5389,[8]3.5263,[9]6.2084,[10]17.0548,[11]38.9870,[12]77.6513,[13]139.1047,[14]229.2792,[15]353.5501,[16]516.4480,[17]721.5067,[18]971.2237,[19]1267.1001,[20]1609.7314,[21]1998.9256,[22]2433.8312,[23]2913.0640,[24]3434.8252,[25]3997.0067,[26]4597.2832,[27]5233.1890,[28]5902.1827,[29]6601.6984,[30]7329.1878,[31]8082.1512,[32]8858.1627,[33]9654.8871,[34]10470.0929,[35]11301.6604,[36]12147.5862,[37]13005.9851,[38]13875.0902,[39]14753.2511,[40]15638.9308,[41]16530.7019,[42]17427.2419,[43]18327.3288,[44]19229.8356,[45]20133.7248,[46]21038.0440,[47]21941.9197,[48]22844.5530,[49]23745.2143,[50]24643.2388,[51]25538.0218,[52]26429.0147,[53]27315.7209,[54]28197.6915,[55]29074.5227,[56]29945.8515,[57]30811.3532,[58]31670.7381,[59]32523.7490,[60]33370.1584,[61]34209.7666,[62]35042.3990,[63]35867.9044,[64]36686.1527,

After (looks more normal):

compute_imatrix: 1000.44 seconds per pass - ETA 33.33 minutes
[1]4.9921,[2]3.5230,[3]3.2906,[4]3.6947,[5]3.5449,[6]3.1976,[7]3.5389,[8]3.5263,[9]3.9438,[10]3.8663,[11]3.8256,[12]4.2541,[13]4.8149,[14]5.0722,[15]5.5042,[16]5.8128,[17]6.0388,[18]6.4330,[19]6.2236,[20]6.3602,[21]6.3347,[22]6.3306,[23]6.2010,[24]6.4108,[25]6.6068,[26]6.5006,[27]6.5882,[28]6.6548,[29]6.8055,[30]6.7468,[31]6.5401,[32]6.2833,[33]6.1323,[34]6.0287,[35]5.9729,[36]5.9470,[37]5.9240,[38]5.9597,[39]5.9534,[40]6.1014,[41]6.1480,[42]6.3101,[43]6.4466,[44]6.6048,[45]6.7291,[46]6.8099,[47]6.7464,[48]6.8177,[49]6.8992,[50]6.9471,[51]6.8584,[52]6.9443,[53]7.0865,[54]7.1735,[55]7.2375,[56]7.3005,[57]7.3728,[58]7.4331,[59]7.4543,[60]7.4642,[61]7.4415,[62]7.4002,[63]7.4459,[64]7.5015,


@compilade compilade added the bugfix fixes an issue or bug label Aug 5, 2025
@CISC CISC merged commit ee3a9fc into master Aug 5, 2025
45 of 47 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 5, 2025
* context : fix overflow when re-ordering huge outputs

* context : fix logits size overflow for huge batches