plugin/decision: upload events as soon as buffer limit is reached #7516
Conversation
Keeping this in draft until #7521 is merged; there's some overlap, so I'd rather fix any merge conflicts before this is reviewed.
This pull request has been automatically marked as stale because it has not had any activity in the last 30 days.
@johanfylling finally ready for review 🥳
Thanks!
A couple of thoughts.
```go
if dropped > 0 {
    b.incrMetric(logBufferEventDropCounterName)
    b.incrMetric(logBufferSizeLimitExDropCounterName)
    b.logger.Error("Dropped %v chunks from buffer. Reduce reporting interval or increase buffer size.", dropped)
```
If we're in immediate mode, and we still have a margin until the upload limit, I'd not expect us to be dropping events, but instead upload the current buffer and start a new buffer with the event that didn't fit. Is that happening, but just not reflected here?
```go
if full && *p.config.Reporting.Trigger == plugins.TriggerImmediate {
    select {
    case p.triggerUpload <- struct{}{}:
```
Could we have dropped events on the above push, e.g. from the front of the size buffer at this point?
```go
if len(receivedEvents) != 1 {
    t.Fatalf("Expected %d events, got %d", 1, len(receivedEvents))
}
```
Should we also assert that the second event hasn't been dropped?
Good idea, although at the moment there isn't a guarantee the event won't drop.
Like we discussed, it's worth looking into a way to guarantee no events are dropped, perhaps by flushing the buffer when full like you suggested. Will loop back to this 👍
```go
util.PushFIFO(b.buffer, event, b.metrics, logBufferEventDropCounterName)
```
```go
return len(b.buffer) == cap(b.buffer)
```
Could we have dropped an event here when running in immediate mode?
Why are the changes in this PR needed?
resolve: #7455
What are the changes in this PR?
This introduces a new trigger mode:
`decision_logs.reporting.trigger=immediate`

This trigger mode will upload events either when the buffer limit is reached or after the maximum delay. Both the size and event buffer types support this trigger mode. For the event buffer that means the buffered channel is full; for the size buffer that means the chunk buffer is full.
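As a sketch, the new option sits under the existing `decision_logs.reporting` configuration (key names as described above; the values here are illustrative, not taken from this PR):

```yaml
decision_logs:
  reporting:
    trigger: immediate              # new mode: upload as soon as the buffer is full
    buffer_size_limit_bytes: 32768  # illustrative limit that triggers the upload
    max_delay_seconds: 600          # upper bound if the buffer never fills
```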
Using this trigger mode disables configuring `decision_logs.reporting.min_delay_seconds`; only `decision_logs.reporting.max_delay_seconds` can be changed. When using the size buffer type it also requires setting `decision_logs.reporting.buffer_size_limit_bytes`, because otherwise the limit defaults to unlimited, which could cause confusion since uploads would then only trigger on `decision_logs.reporting.max_delay_seconds`.

Example
Take for example this contrived scenario to illustrate the benefit of the immediate trigger.
Given the following configuration with a small size limit and a large delay:
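Something along these lines; the `logger` service name, its URL, and the specific values are illustrative rather than the exact config from the PR:

```yaml
services:
  logger:
    url: http://localhost:4195      # illustrative "log eater" service, see below

decision_logs:
  service: logger
  reporting:
    buffer_size_limit_bytes: 2048   # deliberately small
    min_delay_seconds: 300
    max_delay_seconds: 600          # deliberately large
```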
Perhaps this config works well most of the time, but then peak traffic hits, generated here with the vegeta CLI:
```
echo 'POST http://localhost:8181/v1/data/example/allow' | vegeta attack --duration=10s -rate=500 | tee results.bin | vegeta report
```

Checking the metrics, you see a ridiculous number of recorded drops. Only 10 events weren't dropped (vegeta is sending 500 requests per second for 10 seconds), and those 10 events were probably still in the buffer:

```
"counter_decision_logs_dropped_buffer_size_limit_exceeded": 4990
```

Now update the configuration to use the immediate trigger mode, keeping the same small buffer size and max delay:
Attacking it with the same vegeta configuration, there isn't a single dropped log! 🥳 Of course there are pros and cons, but this can make a huge difference if logs are vital and the user has the resources to keep up with the uploads.
In this example the log eater service is a simple Go server that consumes the logs.
Notes to assist PR review:
Updated the upload loop function to support immediate uploads. Also added a new loop function used when the `manual` trigger mode is set or when no `service` is configured. This separates the logic, making it easier to read, and moves the decision making to reconfiguration time only rather than while the loop is running. That is why the `loopType` logic is there.
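For context, a sketch of the kind of configuration that takes the `manual` path (the `manual` trigger mode already exists in OPA; only the value shown is relevant here):

```yaml
decision_logs:
  reporting:
    trigger: manual   # uploads happen only when explicitly triggered, never on a timer
```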