feat: Batch records before writing to sinks #20821

Open

salvacorts wants to merge 22 commits into main from salvacorts/buffered-sinks

Conversation

@salvacorts
Contributor

What this PR does / why we need it:

We noticed that in many queries where the physical planning phase was taking long, most of the time was spent waiting to write to sinks.

Schedulers create one task per pointer scan. Before this PR, each task would write N times to sinks.

We are adding record_batch_size to control how many responses are merged into a single response before writing to a sink, thus reducing the number of round trips. Note that this also applies to other tasks besides the metastore scan ones.
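
A minimal Go sketch of the idea as described here. All type and field names (Record, Sink, bufferedSink, maxBatchBytes) are illustrative stand-ins, not the engine's actual types:

```go
package main

import "fmt"

// Record stands in for an engine result record; Size is its encoded
// size in bytes.
type Record struct{ data []byte }

func (r Record) Size() int { return len(r.data) }

// Sink stands in for whatever a task writes its results to.
type Sink interface {
	Write(batch []Record) error
}

// bufferedSink merges writes: records accumulate in memory and are
// written to the inner sink in one call once the buffered size
// crosses maxBatchBytes (the proposed record_batch_size).
type bufferedSink struct {
	inner         Sink
	maxBatchBytes int
	buf           []Record
	bufBytes      int
}

func (b *bufferedSink) Write(r Record) error {
	b.buf = append(b.buf, r)
	b.bufBytes += r.Size()
	if b.bufBytes >= b.maxBatchBytes {
		return b.Flush()
	}
	return nil
}

// Flush writes all buffered records as a single batch and resets the buffer.
func (b *bufferedSink) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	err := b.inner.Write(b.buf)
	b.buf, b.bufBytes = b.buf[:0], 0
	return err
}

// printSink counts round trips, standing in for a remote sink.
type printSink struct{ writes int }

func (s *printSink) Write(batch []Record) error {
	s.writes++
	fmt.Printf("write %d: %d records\n", s.writes, len(batch))
	return nil
}

func main() {
	sink := &printSink{}
	buffered := &bufferedSink{inner: sink, maxBatchBytes: 1 << 10}
	// 100 records of 64 bytes each against a 1 KiB threshold:
	// 7 round trips instead of 100.
	for i := 0; i < 100; i++ {
		_ = buffered.Write(Record{data: make([]byte, 64)})
	}
	_ = buffered.Flush() // flush the tail
}
```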

@github-actions
Contributor

github-actions bot commented Feb 16, 2026

💻 Deploy preview available (feat: Batch records before writing to sinks):

salvacorts force-pushed the salvacorts/buffered-sinks branch from 762962f to 95025fa (February 16, 2026 09:29)
salvacorts force-pushed the salvacorts/buffered-sinks branch from 95025fa to 6bd1ddb (February 16, 2026 09:33)
salvacorts force-pushed the salvacorts/buffered-sinks branch from 6bd1ddb to 3506e9a (February 16, 2026 09:35)
salvacorts marked this pull request as ready for review (February 16, 2026 09:55)
salvacorts requested a review from a team as a code owner (February 16, 2026 09:55)
Member

I think there's a fair bit of complexity being introduced here by batching by number of records. We already have batch_size as a configuration option in the engine; have you tried using that instead of adding a new limit?

Contributor Author

I should have copied over my comment from the original PR: #20763 (comment)

by batching by number of records

We don't batch by the number of records/rows, but by their size (bytes).

My hesitation around using the already existing batch_size (row count) is that (e.g.) 6k rows of metric records will be fairly small, whereas 6k rows of log records can be a lot and cause OOMs.

In my opinion, byte-size-based batching fits all use cases; ideally, the write-to-sinks logic should also be agnostic of what's actually written.

Happy to change my mind, though.

Member

My hesitation around using the already existing batch_size (row count) is that (e.g.) 6k rows of metric records will be fairly small, whereas 6k rows of log records can be a lot and cause OOMs.

If 6k rows of log records can cause OOMs, our batch size shouldn't be 6k :) batch_size determines the largest record that a pipeline can produce, so it needs to be able to fit in memory. (Batching records by row count also helps performance as you'll have more contiguous regions of memory)

Based on that intent, do you think it makes sense to consistently use batch_size throughout?
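
To make the trade-off in this exchange concrete, here is a minimal sketch of the two flush criteria being debated. The helper names are hypothetical; the batch_size and record_batch_size semantics are as described in this thread:

```go
// Row-count batching (the existing batch_size): flush after a fixed
// number of rows, however wide each row is. A high limit is cheap for
// narrow metric rows but can buffer many bytes for wide log rows.
func shouldFlushByRows(bufferedRows, batchSize int) bool {
	return bufferedRows >= batchSize
}

// Byte-size batching (the proposed record_batch_size): flush once the
// buffered bytes cross a threshold, so wide log rows flush earlier
// than narrow metric rows and memory use is bounded either way.
func shouldFlushByBytes(bufferedBytes, maxBatchBytes int) bool {
	return bufferedBytes >= maxBatchBytes
}
```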

Contributor Author

our batch size shouldn't be 6k

Fair, my point is that 6k may be fine (and beneficial) for some record types and too much for others. So reducing it reduces the benefit for the smaller record types.

Still, I'm all for simplicity here, and this is something we can revisit in the future if we need to. I changed it so we use the already existing BatchSize.

Contributor

A single arrowagg instance can be used in drainPipeline instead of batch []arrow.RecordBatch, I think. arrowagg has a Reset method, so it can be reset after flushing.
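
A sketch of what this suggestion could look like. Only the Reset method is confirmed by this thread; RecordBatch, Pipeline, Sink, and the aggregator's other methods are stand-ins for the real arrowagg API:

```go
package sketch

import (
	"errors"
	"io"
)

// RecordBatch stands in for arrow.RecordBatch.
type RecordBatch struct{ rows int }

// Pipeline stands in for the engine pipeline being drained.
type Pipeline interface {
	Read() (RecordBatch, error) // returns io.EOF when exhausted
}

// Aggregator stands in for the arrowagg type. Only Reset is confirmed
// by the discussion; Append, Aggregate, and Rows are illustrative.
type Aggregator interface {
	Append(RecordBatch)
	Aggregate() (RecordBatch, error)
	Reset()
	Rows() int
}

type Sink interface {
	Write(RecordBatch) error
}

// drainPipeline accumulates records in a single reusable aggregator
// and flushes once batchSize rows are buffered, resetting the
// aggregator after each flush instead of reallocating a slice.
func drainPipeline(p Pipeline, agg Aggregator, sink Sink, batchSize int) error {
	flush := func() error {
		if agg.Rows() == 0 {
			return nil
		}
		merged, err := agg.Aggregate()
		if err != nil {
			return err
		}
		agg.Reset()
		return sink.Write(merged)
	}

	for {
		rec, err := p.Read()
		if errors.Is(err, io.EOF) {
			return flush() // flush whatever is left
		}
		if err != nil {
			return err
		}
		agg.Append(rec)
		if agg.Rows() >= batchSize {
			if err := flush(); err != nil {
				return err
			}
		}
	}
}
```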

Contributor Author

Good call. Changed.
