feat: Add chunking query support for datasource plugins #1449
wbrowne
left a comment
Great job taking this on 👍
It's hard to strike the complexity balance with this type of feature as you want to give people control, but not too much so that they become overwhelmed with the API. The other things that come to mind relate to usability / high level API design, so I'll just leave them as food for thought for now:
- It might be nice to have a fallback to automatically call `.Close()` on the writer in case a plugin dev forgets to do it
- For `ChunkedDataWriter`: since frames are processed one at a time (and refID is already attached to the frame struct), we could technically remove refID from the API and avoid the repetition across methods (replace `WriteFrameRow` with `AppendRow`, for example). We do obviously lose more control this way, however
- Having the ability to `Flush()` manually might be a nice addition eventually for those who want more control
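The auto-`Close()` fallback suggested above can be sketched as a wrapper around the handler. This is a minimal, self-contained illustration, not the SDK's actual API: `ChunkedWriter` and `withAutoClose` are hypothetical names, and the key point is that `Close` is idempotent so the SDK-side fallback is safe even when the plugin already closed the writer.

```go
package main

import "fmt"

// ChunkedWriter is a hypothetical stand-in for the SDK's ChunkedDataWriter.
type ChunkedWriter struct {
	closed bool
}

// Close is idempotent, so a fallback call is a harmless no-op when the
// plugin developer already closed the writer.
func (w *ChunkedWriter) Close() error {
	if w.closed {
		return nil
	}
	w.closed = true
	fmt.Println("stream closed")
	return nil
}

// withAutoClose wraps a plugin handler and guarantees Close runs even if
// the handler forgets to call it (or returns early on error).
func withAutoClose(handler func(w *ChunkedWriter) error) error {
	w := &ChunkedWriter{}
	defer w.Close() // fallback: no-op if the handler already closed
	return handler(w)
}

func main() {
	// Handler that forgets to call Close; the wrapper closes for it.
	_ = withAutoClose(func(w *ChunkedWriter) error {
		return nil
	})
}
```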
Pull request overview
This PR introduces Chunked Query Responses support for the Grafana Plugin SDK, enabling datasource plugins to stream data back to Grafana in batches rather than buffering entire result sets in memory. This addresses OOM crashes and reduces memory pressure when handling large datasets.
Key Changes:
- Adds new gRPC streaming method `QueryChunkedData` with full backwards compatibility
- Introduces `QueryChunkedDataHandler` interface and `ChunkedDataWriter` for plugin implementations
- Implements automatic buffering with configurable chunk sizes (default 1000 rows)
- Uses marker frames (empty frames) to signal frame transmission boundaries
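The buffering and marker-frame mechanics listed above can be sketched roughly as follows. This is a simplified, self-contained model, not the PR's actual implementation: `chunk`, `chunkWriter`, and the method names are hypothetical, and a plain slice stands in for the gRPC stream.

```go
package main

import "fmt"

// chunk is a hypothetical wire unit: either a batch of rows, or an
// empty "marker" chunk signalling that the current frame is complete.
type chunk struct {
	refID  string
	rows   [][]any
	marker bool
}

// chunkWriter buffers rows and flushes them in fixed-size chunks,
// mirroring the PR's automatic buffering (default 1000 rows).
type chunkWriter struct {
	chunkSize int
	buf       [][]any
	refID     string
	out       []chunk // stands in for the gRPC stream
}

func (w *chunkWriter) AppendRow(refID string, row []any) {
	w.refID = refID
	w.buf = append(w.buf, row)
	if len(w.buf) >= w.chunkSize {
		w.flush()
	}
}

func (w *chunkWriter) flush() {
	if len(w.buf) == 0 {
		return
	}
	w.out = append(w.out, chunk{refID: w.refID, rows: w.buf})
	w.buf = nil
}

// EndFrame flushes pending rows and emits an empty marker chunk to
// signal the frame boundary to the receiver.
func (w *chunkWriter) EndFrame() {
	w.flush()
	w.out = append(w.out, chunk{refID: w.refID, marker: true})
}

func main() {
	w := &chunkWriter{chunkSize: 2}
	for i := 0; i < 5; i++ {
		w.AppendRow("A", []any{i})
	}
	w.EndFrame()
	// 5 rows with chunk size 2 -> chunks of 2, 2, 1, plus one marker.
	fmt.Println(len(w.out)) // 4
}
```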
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| proto/backend.proto | Adds new gRPC streaming endpoint and related message types |
| genproto/pluginv2/*.go | Generated protobuf code with updated versions and spelling fixes |
| backend/data_adapter.go | Core chunking implementation with writer and state management |
| backend/data.go | New interfaces for chunked data handling |
| experimental/datasourcetest/chunking_test.go | Comprehensive tests verifying parity and fallback behavior |
| data/field.go | New AppendAll utility for efficient field merging |
| backend/serve.go | Integration of chunked handler into serve options |
| internal/testutil/freeport.go | Refactored utility for test port allocation |
```proto
map<string,string> headers = 2;

// List of data queries
repeated DataQuery queries = 3;
```
Is there a benefit to sending multiple queries per request? I'm wondering if there would be backpressure issues if multiple large responses are trying to write to the same stream.
Discussed offline, summary is as follows:
@toddtreece raised concerns regarding potential bottlenecks if the receiver processes data slower than the source, especially when multiple responses are being sent. Specifically, could this lead to memory pressure or TCP timeouts during chunked reading from the source API?
We reviewed how gRPC stream backpressure handles flow control, which should mitigate these risks by throttling the sender. We've decided to proceed without changes for now but will monitor real-world performance to ensure it behaves as expected.
Sorry, just wanted to add to this: with multiplexing, if Query A's WriteFrame() blocks (buffer full), it blocks the entire stream, preventing Query B/C/etc. from sending. Did you consider independent streams per query (i.e. isolating backpressure per query)? I understand the trade-offs (API compatibility with QueryData, increased connections), but curious if this alternative was evaluated.
I haven't evaluated implementing independent streams "automatically" in this PR, but it's a path we can explore if real-world usage suggests a need for it.
In the meantime, the current design allows developers to achieve parallelism by calling QueryChunkedData multiple times with individual queries. Since each call triggers grpc.ClientConn.NewStream, they will receive responses over independent, parallel streams, each with their own backpressure control.
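The parallelism described above can be sketched with goroutines. This is an illustrative model only: `queryChunked` is a hypothetical stand-in for issuing one `QueryChunkedData` call per query, and a buffered channel stands in for the independent gRPC stream, so a slow consumer of one stream does not block the others.

```go
package main

import (
	"fmt"
	"sync"
)

// queryChunked stands in for one QueryChunkedData call; each call gets
// its own stream (here, a channel), so backpressure on one query's
// stream does not block the others.
func queryChunked(refID string, rows int) <-chan string {
	stream := make(chan string, 1) // small buffer = per-stream backpressure
	go func() {
		defer close(stream)
		for i := 0; i < rows; i++ {
			stream <- fmt.Sprintf("%s:row%d", refID, i)
		}
	}()
	return stream
}

func main() {
	// Issue one call per query; consume each stream concurrently.
	queries := map[string]int{"A": 3, "B": 2}
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for refID, rows := range queries {
		wg.Add(1)
		go func(refID string, rows int) {
			defer wg.Done()
			for range queryChunked(refID, rows) {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}(refID, rows)
	}
	wg.Wait()
	fmt.Println(total) // 5
}
```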
I believe some of the datasources (newrelic? datadog? honestly I don't remember... it has been a LONG time) allow references to the other query results -- allowing behavior that became generic with expressions.
So it seems best to keep multiple queries in the request for compatibility, and I agree the client could split this into multiple streams if that appears to make a real difference
```go
	return nil, err
}

f, err := data.UnmarshalArrowFrame(sr.Frame)
```
I like that the client gets access to the raw bytes!
This will let us support direct-to-browser passthrough without decoding in the future.
@dgiagio is the plan to add a helper for demuxing the stream into existing query response types in the future? or is that up to each client?
For now, it's up to each client. Once we have identified the best patterns to do so, I think we can move the code to the SDK so it can be reused.
This is because different clients can have different requirements - some will pass-thru the frames, others will have to buffer to run calculations, and probably a third way I haven't thought about.
Good initiative! I'm okay merging this for testing, but before we do, we should mark it as experimental (not production-ready) so it's clear to Grafana and community developers. Documentation would also be needed, but it can wait until it's ready for others to try. In particular, I think it's important that we add guidance on when to use this chunked approach versus the existing streaming one.
ryantxu
left a comment
LGTM -- the breaking change failure seems like a false positive based on adding a function to a service.
However, the adapter allows the second function to be unimplemented so I do not think it is valid
toddtreece
left a comment
this looks good to me as well, but it should probably wait for @wbrowne's final approval before merging
This PR introduces support for Chunked Query Responses in the Grafana Plugin SDK. This architectural improvement allows datasource plugins to stream data back to Grafana in batches (chunks) rather than buffering the entire result set in memory before transmission. This implementation addresses critical Out-of-Memory (OOM) crashes and significantly reduces peak memory pressure when handling large datasets.
Technical Details
- Adds a new `QueryChunkedData` operation to `backend.proto` to support server-side gRPC streaming. This change is fully backwards compatible with existing plugins.
- Introduces a `QueryChunkedDataHandler` interface alongside the existing `QueryDataHandler`.
- Adds a `ChunkedDataWriter`, modeled after `http.ResponseWriter`, to simplify sending frames and rows sequentially.
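The writer-style handler described above can be sketched as follows. This is a hedged illustration of the `http.ResponseWriter`-inspired shape, not the SDK's actual signatures: `DataWriter`, `WriteRow`, and `handleChunked` are hypothetical names.

```go
package main

import "fmt"

// DataWriter is a hypothetical writer interface modeled after
// http.ResponseWriter: the handler sends rows sequentially and closes
// the stream when done, instead of building the whole response in memory.
type DataWriter interface {
	WriteRow(refID string, values ...any) error
	Close() error
}

// countWriter is a trivial DataWriter that just counts rows.
type countWriter struct{ rows int }

func (c *countWriter) WriteRow(refID string, values ...any) error {
	c.rows++
	return nil
}
func (c *countWriter) Close() error { return nil }

// handleChunked streams rows one at a time; with a real stream-backed
// writer, each row could leave the plugin before the next is produced.
func handleChunked(w DataWriter) error {
	defer w.Close()
	for i := 0; i < 3; i++ {
		if err := w.WriteRow("A", i, i*i); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	w := &countWriter{}
	if err := handleChunked(w); err != nil {
		panic(err)
	}
	fmt.Println(w.rows) // 3
}
```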
Unit tests have been added to ensure data integrity and backward compatibility:
- Verifies that responses produced by the `QueryChunkedData` implementation are identical in content and structure to existing `QueryData` responses.
- Verifies fallback behavior for plugins that do not implement the `QueryChunkedDataHandler` interface, allowing the system to switch to the legacy `QueryData` when needed.

Performance
Benchmarks were conducted simulating a query returning ~96MB of data. The results demonstrate a drastic reduction in memory usage.
Resources