
feat: Add chunking query support for datasource plugins #1449

Merged
dgiagio merged 1 commit into main from dgiagio/query-chunked-data on Jan 19, 2026

Conversation

Contributor

@dgiagio dgiagio commented Jan 5, 2026

This PR introduces support for Chunked Query Responses in the Grafana Plugin SDK. This architectural improvement allows datasource plugins to stream data back to Grafana in batches (chunks) rather than buffering the entire result set in memory before transmission. This implementation addresses critical Out-of-Memory (OOM) crashes and significantly reduces peak memory pressure when handling large datasets.

Technical Details

  • New Streaming gRPC Method: Adds QueryChunkedData operation to backend.proto to support server-side gRPC streaming. This change is fully backwards compatible with existing plugins.
  • QueryChunkedDataHandler Interface: Adds a new (optional) interface for plugins to implement chunked query handlers, distinct from the existing QueryDataHandler.
  • ChunkedDataWriter Interface: Provides a helper interface modeled after Go's http.ResponseWriter to simplify sending frames and rows sequentially.
  • Automatic Buffering: The writer automatically buffers rows (defaulting to 1000 rows) to ensure efficient network transmission.
  • Frame Splitting: To handle incomplete data frame transmissions across chunks, a "marker" data frame (an empty frame) is sent immediately after a complete frame is finished. This signals to the server that the specific frame transmission is complete.
  • Adoption: Existing plugins continue to work as-is. Adoption of the new streaming interface is opt-in for plugins requiring memory optimization.
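
The buffering and `Close` semantics described above can be sketched with stand-in types. This is a minimal illustration, not the SDK's actual `ChunkedDataWriter` implementation; the type and method names here are assumptions based on the description (the real default chunk size is 1000 rows, shrunk below for the demo).

```go
package main

import "fmt"

// row is a stand-in for a single data frame row.
type row []any

// chunkBuffer sketches the automatic-buffering behavior: rows accumulate
// until the limit is reached, then the chunk is "sent" over the stream.
type chunkBuffer struct {
	limit   int
	rows    []row
	flushed [][]row // chunks actually emitted to the stream
}

func (b *chunkBuffer) WriteRow(r row) {
	b.rows = append(b.rows, r)
	if len(b.rows) >= b.limit {
		b.flush()
	}
}

func (b *chunkBuffer) flush() {
	if len(b.rows) == 0 {
		return
	}
	chunk := make([]row, len(b.rows))
	copy(chunk, b.rows)
	b.flushed = append(b.flushed, chunk)
	b.rows = b.rows[:0]
}

// Close flushes any remaining buffered rows, mirroring the writer's
// contract that plugins call Close when the response is complete.
func (b *chunkBuffer) Close() {
	b.flush()
}

func main() {
	// Small limit for the demo; the SDK defaults to 1000 rows per chunk.
	w := &chunkBuffer{limit: 3}
	for i := 0; i < 7; i++ {
		w.WriteRow(row{i})
	}
	w.Close()
	fmt.Println(len(w.flushed)) // 3 chunks: 3 + 3 + 1 rows
}
```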

Testing

Unit tests have been added to ensure data integrity and backward compatibility:

  • Parity Verification: Validated that chunked responses produced by the new QueryChunkedData implementation are identical in content and structure to existing QueryData responses.
  • Capability Detection & Fallback: Verified that the SDK correctly identifies plugins that do not implement the new QueryChunkedDataHandler interface, allowing the system to switch to the legacy QueryData when needed.
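
The capability-detection-and-fallback behavior maps naturally onto a Go type assertion. The interfaces below are simplified stand-ins, not the SDK's real signatures; they only illustrate the dispatch pattern being tested.

```go
package main

import "fmt"

// Simplified stand-ins for the SDK's handler interfaces.
type QueryDataHandler interface {
	QueryData() string
}

type QueryChunkedDataHandler interface {
	QueryChunkedData() string
}

type legacyPlugin struct{}

func (legacyPlugin) QueryData() string { return "buffered response" }

// chunkedPlugin implements both interfaces, as a real opt-in plugin would.
type chunkedPlugin struct{ legacyPlugin }

func (chunkedPlugin) QueryChunkedData() string { return "streamed response" }

// dispatch prefers the chunked handler when the plugin implements it,
// falling back to the legacy QueryData path otherwise.
func dispatch(p QueryDataHandler) string {
	if c, ok := p.(QueryChunkedDataHandler); ok {
		return c.QueryChunkedData()
	}
	return p.QueryData()
}

func main() {
	fmt.Println(dispatch(legacyPlugin{}))  // legacy path
	fmt.Println(dispatch(chunkedPlugin{})) // chunked path
}
```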

Performance

Benchmarks were conducted simulating a query returning ~96MB of data. The results demonstrate a drastic reduction in peak memory usage.

Resources

@grafana-plugins-platform-bot grafana-plugins-platform-bot bot moved this from 📬 Triage to 🔬 In review in Grafana Catalog Team Jan 5, 2026
@dgiagio dgiagio force-pushed the dgiagio/query-chunked-data branch from 9c7b4eb to d93b1a1 Compare January 5, 2026 18:28
@dgiagio dgiagio marked this pull request as ready for review January 5, 2026 18:35
@dgiagio dgiagio requested review from a team as code owners January 5, 2026 18:35
@wbrowne wbrowne self-requested a review January 6, 2026 09:09
Contributor

@wbrowne wbrowne left a comment


Great job taking this on 👍

It's hard to strike the complexity balance with this type of feature as you want to give people control, but not too much so that they become overwhelmed with the API. The other things that come to mind relate to usability / high level API design, so I'll just leave them as food for thought for now:

  • It might be nice to have a fallback to automatically call .Close() on the writer in case a plugin dev forgets to do it
  • For ChunkedDataWriter:
    • Since frames are processed one at a time (and refID is already attached to the frame struct), we could technically remove refID from the API and avoid the repetition across methods (replacing WriteFrameRow with AppendRow, for example). We do obviously lose more control this way, however
    • Having the ability to Flush() manually might be a nice addition eventually for those who want more control
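
The auto-Close safety net suggested above could look something like this. This is a hypothetical sketch with stand-in types, not SDK code: the idea is simply that the SDK invokes the plugin handler and defers Close itself, so a forgotten Close still flushes the stream.

```go
package main

import "fmt"

// writer is a stand-in for the SDK's chunked data writer.
type writer struct{ closed bool }

func (w *writer) Close() { w.closed = true }

// runHandler invokes a plugin-supplied handler and closes the writer
// afterwards regardless of whether the handler remembered to do it.
func runHandler(handler func(w *writer) error) (*writer, error) {
	w := &writer{}
	defer w.Close() // safety net: runs even on early return or error
	err := handler(w)
	return w, err
}

func main() {
	w, _ := runHandler(func(w *writer) error {
		return nil // handler "forgets" to call w.Close()
	})
	fmt.Println(w.closed) // true: the SDK closed it anyway
}
```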


Copilot AI left a comment


Pull request overview

This PR introduces Chunked Query Responses support for the Grafana Plugin SDK, enabling datasource plugins to stream data back to Grafana in batches rather than buffering entire result sets in memory. This addresses OOM crashes and reduces memory pressure when handling large datasets.

Key Changes:

  • Adds new gRPC streaming method QueryChunkedData with full backwards compatibility
  • Introduces QueryChunkedDataHandler interface and ChunkedDataWriter for plugin implementations
  • Implements automatic buffering with configurable chunk sizes (default 1000 rows)
  • Uses marker frames (empty frames) to signal frame transmission boundaries

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated no comments.

Show a summary per file
File Description
proto/backend.proto Adds new gRPC streaming endpoint and related message types
genproto/pluginv2/*.go Generated protobuf code with updated versions and spelling fixes
backend/data_adapter.go Core chunking implementation with writer and state management
backend/data.go New interfaces for chunked data handling
experimental/datasourcetest/chunking_test.go Comprehensive tests verifying parity and fallback behavior
data/field.go New AppendAll utility for efficient field merging
backend/serve.go Integration of chunked handler into serve options
internal/testutil/freeport.go Refactored utility for test port allocation


@dgiagio dgiagio force-pushed the dgiagio/query-chunked-data branch 3 times, most recently from 8a98104 to feaf0c5 Compare January 12, 2026 21:31
map<string,string> headers = 2;

// List of data queries
repeated DataQuery queries = 3;
Member

@toddtreece toddtreece Jan 15, 2026


is there a benefit to sending multiple queries per request? i'm wondering if there could be backpressure issues if multiple large responses are trying to write to the same stream

Contributor Author


Discussed offline, summary is as follows:

@toddtreece raised concerns regarding potential bottlenecks if the receiver processes data slower than the source, especially when multiple responses are being sent. Specifically, could this lead to memory pressure or TCP timeouts during chunked reading from the source API?

We reviewed how gRPC stream backpressure handles flow control, which should mitigate these risks by throttling the sender. We've decided to proceed without changes for now but will monitor real-world performance to ensure it behaves as expected.

Contributor


Sorry, just wanted to add to this: with multiplexing, if Query A's WriteFrame() blocks (buffer full), it blocks the entire stream, preventing Query B/C/etc. from sending. Did you consider independent streams per query (i.e. isolating backpressure per query)? I understand the trade-offs (API compatibility with QueryData, increased connections), but I'm curious whether this alternative was evaluated.

Contributor Author

@dgiagio dgiagio Jan 15, 2026


I haven't evaluated implementing independent streams "automatically" in this PR, but it's a path we can explore if real-world usage suggests a need for it.

In the meantime, the current design allows developers to achieve parallelism by calling QueryChunkedData multiple times with individual queries. Since each call triggers grpc.ClientConn.NewStream, they will receive responses over independent, parallel streams, each with their own backpressure control.
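
The fan-out pattern described above might be sketched like this. The `queryChunked` helper is hypothetical, standing in for a per-query `QueryChunkedData` call; in the real SDK each call would open its own gRPC stream with independent backpressure.

```go
package main

import (
	"fmt"
	"sync"
)

// queryChunked stands in for one QueryChunkedData call carrying a single
// query; each real call would get its own gRPC stream.
func queryChunked(query string) []string {
	return []string{query + ":chunk1", query + ":chunk2"}
}

// queryAllParallel issues one chunked call per query concurrently, so
// backpressure on one query's stream cannot stall the others.
func queryAllParallel(queries []string) map[string][]string {
	var mu sync.Mutex
	var wg sync.WaitGroup
	results := make(map[string][]string)
	for _, q := range queries {
		wg.Add(1)
		go func(q string) {
			defer wg.Done()
			chunks := queryChunked(q) // independent stream per query
			mu.Lock()
			results[q] = chunks
			mu.Unlock()
		}(q)
	}
	wg.Wait()
	return results
}

func main() {
	res := queryAllParallel([]string{"A", "B", "C"})
	fmt.Println(len(res)) // one result set per query
}
```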

Contributor


I believe some of the datasources (newrelic? datadog? honestly I don't remember... it has been a LONG time) allow references to the other query results -- allowing behavior that became generic with expressions.

So it seems best to keep multiple queries in the request for compatibility, and I agree the client could split this into multiple streams if that appears to make a real difference

return nil, err
}

f, err := data.UnmarshalArrowFrame(sr.Frame)
Contributor


I like that the client gets access to the raw bytes!

This will let us support direct-to-browser passthrough without decoding in the future

Member


@dgiagio is the plan to add a helper for demuxing the stream into existing query response types in the future? or is that up to each client?

Contributor Author


For now, it's up to each client. Once we have identified the best patterns to do so, I think we can move the code to the SDK so it can be reused.

This is because different clients can have different requirements - some will pass through the frames, others will have to buffer to run calculations, and probably a third way I haven't thought about.
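
One possible client-side demux, given the marker-frame protocol described in the PR, might look like the sketch below. The chunk layout and field names here are assumptions for illustration; the real wire format is the Arrow-encoded frames from the stream.

```go
package main

import "fmt"

// chunk is a simplified stand-in for one streamed response item: rows
// tagged with a refID, where an empty chunk is the "marker" signaling
// that the current frame for that refID is complete.
type chunk struct {
	refID string
	rows  []int // empty = marker frame
}

// demux groups incoming chunks into completed frames, keyed by refID.
func demux(stream []chunk) map[string][][]int {
	pending := make(map[string][]int)
	frames := make(map[string][][]int)
	for _, c := range stream {
		if len(c.rows) == 0 { // marker: frame for this refID is done
			frames[c.refID] = append(frames[c.refID], pending[c.refID])
			pending[c.refID] = nil
			continue
		}
		pending[c.refID] = append(pending[c.refID], c.rows...)
	}
	return frames
}

func main() {
	stream := []chunk{
		{refID: "A", rows: []int{1, 2}},
		{refID: "A", rows: []int{3}},
		{refID: "A"}, // marker: frame A complete
		{refID: "B", rows: []int{9}},
		{refID: "B"}, // marker: frame B complete
	}
	frames := demux(stream)
	fmt.Println(len(frames["A"][0])) // rows reassembled for frame A
}
```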

@andresmgot
Contributor

Good initiative! I'm okay merging this for testing, but before we do, we should mark it as experimental (not production-ready) so it's clear to Grafana and community developers. Documentation would also be needed but it can wait until it's ready for others to try. In particular, I think it's important that we add guidance on when to use this chunked approach versus the existing streaming one.

Contributor

@ryantxu ryantxu left a comment


LGTM -- the breaking change failure seems like a false positive based on adding a function to a service.

However, since the adapter allows the second function to be unimplemented, I do not think the failure is valid

@dgiagio dgiagio force-pushed the dgiagio/query-chunked-data branch from 2828130 to ba1455b Compare January 16, 2026 16:02
Member

@toddtreece toddtreece left a comment


this looks good to me as well, but it should probably wait for @wbrowne's final approval before merging

Contributor

@wbrowne wbrowne left a comment


Great work @dgiagio!

@dgiagio dgiagio merged commit 3a04692 into main Jan 19, 2026
7 of 8 checks passed
@dgiagio dgiagio deleted the dgiagio/query-chunked-data branch January 19, 2026 17:22
@github-project-automation github-project-automation bot moved this from 🔬 In review to 🚀 Shipped in Grafana Catalog Team Jan 19, 2026