plugin/decision: upload events as soon as buffer limit is reached #7516
Conversation
Keeping this in draft until #7521 is merged; there's some overlap, so I'd rather fix any merge conflicts before this is reviewed.
This pull request has been automatically marked as stale because it has not had any activity in the last 30 days.
@johanfylling finally ready for review 🥳
Thanks!
A couple of thoughts.
```go
if dropped > 0 {
    b.incrMetric(logBufferEventDropCounterName)
    b.incrMetric(logBufferSizeLimitExDropCounterName)
    b.logger.Error("Dropped %v chunks from buffer. Reduce reporting interval or increase buffer size.", dropped)
```
If we're in immediate mode, and we still have a margin until the upload limit, I'd not expect us to be dropping events, but instead upload the current buffer and start a new buffer with the event that didn't fit. Is that happening, but just not reflected here?
```go
if full && *p.config.Reporting.Trigger == plugins.TriggerImmediate {
    select {
    case p.triggerUpload <- struct{}{}:
```
Could we have dropped events on the above push, e.g. from the front of the size buffer at this point?
```go
if len(receivedEvents) != 1 {
    t.Fatalf("Expected %d events, got %d", 1, len(receivedEvents))
}
```
Should we also assert that the second event hasn't been dropped?
Good idea, although at the moment there isn't a guarantee the event won't drop.
Like we discussed, it's worth looking into a way to guarantee no events are dropped, perhaps by flushing the buffer when full like you suggested. Will loop back to this 👍
```go
util.PushFIFO(b.buffer, event, b.metrics, logBufferEventDropCounterName)
```
```go
return len(b.buffer) == cap(b.buffer)
```
Could we have dropped an event here when running in immediate mode?
Why are the changes in this PR needed?
resolve: #7455
What are the changes in this PR?
This introduces a new trigger mode:
`decision_logs.reporting.trigger=immediate`

This trigger mode will upload events either when the buffer limit is reached or after the maximum delay. Both the size and event buffer types support this trigger mode. For the event buffer that means the buffered channel is full; for the size buffer that means the chunk buffer is full.
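As a sketch, the new option sits under the existing `decision_logs.reporting` configuration (key names as described above; the values here are illustrative, not taken from this PR):

```yaml
decision_logs:
  reporting:
    trigger: immediate              # new mode: upload as soon as the buffer is full
    buffer_size_limit_bytes: 32768  # illustrative limit that triggers the upload
    max_delay_seconds: 600          # upper bound if the buffer never fills
```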
Using this trigger mode disables configuring `decision_logs.reporting.min_delay_seconds`; only `decision_logs.reporting.max_delay_seconds` can be changed. When using the size buffer type it also requires setting `decision_logs.reporting.buffer_size_limit_bytes`, because otherwise the limit defaults to unlimited, which could cause confusion since uploads would then only trigger on `decision_logs.reporting.max_delay_seconds`.

Example
Take for example this contrived scenario to illustrate the benefit of the immediate trigger.
Given the following configuration with a small size limit and a large delay:
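Something along these lines; the `logger` service name, its URL, and the specific values are illustrative rather than the exact config from the PR:

```yaml
services:
  logger:
    url: http://localhost:4195      # illustrative "log eater" service, see below

decision_logs:
  service: logger
  reporting:
    buffer_size_limit_bytes: 2048   # deliberately small
    min_delay_seconds: 300
    max_delay_seconds: 600          # deliberately large
```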
Perhaps this config works well most of the time, but then peak traffic hits, generated here with the vegeta CLI:
```
echo 'POST http://localhost:8181/v1/data/example/allow' | vegeta attack --duration=10s -rate=500 | tee results.bin | vegeta report
```

Checking the metrics, you see a ridiculous number of recorded drops. Only 10 events weren't dropped (vegeta is sending 500 requests per second for 10 seconds), and those 10 events were probably still in the buffer:

```
"counter_decision_logs_dropped_buffer_size_limit_exceeded": 4990
```

Now update the configuration to use the immediate trigger mode, keeping the same small buffer size and max delay:
Attacking it with the same vegeta configuration, there isn't a single dropped log! 🥳 Of course there are pros and cons, but this can make a huge difference if logs are vital and the user has the resources to keep up with the uploads.
In this example the log eater service is a simple Go server that consumes the logs.
Notes to assist PR review:
Updated the upload loop function to support immediate uploads. Also added a new loop function used when the `manual` trigger mode is set or when no `service` is configured. This separates the logic, making it easier to read, and moves the decision making to reconfiguration time only rather than while the loop is running. That is why the `loopType` logic is there.
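For context, a sketch of the kind of configuration that takes the `manual` path (the `manual` trigger mode already exists in OPA; only the value shown is relevant here):

```yaml
decision_logs:
  reporting:
    trigger: manual   # uploads happen only when explicitly triggered, never on a timer
```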