Conversation

@sspaink
Member

@sspaink sspaink commented Apr 14, 2025

Why the changes in this PR are needed?

resolve: #7455

What are the changes in this PR?

This introduces a new trigger mode:

decision_logs.reporting.trigger=immediate

This trigger mode uploads events either when the buffer limit is reached or after the maximum delay, whichever comes first. Both the size and event buffer types support it. For the event buffer, the limit is reached when the buffered channel is full; for the size buffer, when the chunk buffer is full.

With this trigger mode, decision_logs.reporting.min_delay_seconds cannot be configured; only decision_logs.reporting.max_delay_seconds can be changed. With the size buffer type you must also set decision_logs.reporting.buffer_size_limit_bytes. Otherwise the limit defaults to unlimited, which could be confusing because uploads would then trigger only on decision_logs.reporting.max_delay_seconds.
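
To make the "buffer limit or maximum delay, whichever comes first" behavior concrete, here is a minimal sketch of such a loop. It is not the code from this PR: uploadLoop, firstReason, and the channel names are hypothetical, and the real plugin has more states to handle.

```go
package main

import (
	"fmt"
	"time"
)

// uploadLoop is a hypothetical sketch, not the plugin's actual loop.
// It calls upload when trigger fires (buffer full) or when maxDelay has
// elapsed since the last upload, and returns once stop is closed.
func uploadLoop(trigger, stop <-chan struct{}, maxDelay time.Duration, upload func(reason string)) {
	timer := time.NewTimer(maxDelay)
	defer timer.Stop()
	for {
		select {
		case <-trigger:
			upload("buffer full")
			// Restart the delay so the next timed upload is maxDelay from now.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(maxDelay)
		case <-timer.C:
			upload("max delay")
			timer.Reset(maxDelay)
		case <-stop:
			return
		}
	}
}

// firstReason runs the loop until the first upload and reports what caused it.
func firstReason(maxDelay time.Duration, bufferFull bool) string {
	trigger := make(chan struct{}, 1)
	stop := make(chan struct{})
	defer close(stop)
	reasons := make(chan string, 1)
	if bufferFull {
		trigger <- struct{}{} // simulate the buffer reporting that it is full
	}
	go uploadLoop(trigger, stop, maxDelay, func(r string) {
		select {
		case reasons <- r:
		default: // ignore later uploads; only the first matters here
		}
	})
	return <-reasons
}

func main() {
	fmt.Println(firstReason(time.Hour, true))             // buffer full
	fmt.Println(firstReason(20*time.Millisecond, false)) // max delay
}
```

The trigger channel has capacity 1 so that a full buffer can signal without blocking, which matches the non-blocking select on p.triggerUpload quoted in the review comments.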

Example

Take for example this contrived scenario to illustrate the benefit of the immediate trigger.

Given the following configuration with a small size limit and large delay.

decision_logs:
  service: logeater
  reporting:
    buffer_type: event
    trigger: periodic
    buffer_size_limit_events: 10
    min_delay_seconds: 10
    max_delay_seconds: 20

This config might work well most of the time, but consider peak traffic, simulated here with the vegeta CLI:

echo 'POST http://localhost:8181/v1/data/example/allow' | vegeta attack --duration=10s -rate=500 | tee results.bin | vegeta report

Checking the metrics, you see a ridiculous number of dropped events: only 10 events weren't dropped (vegeta sends 500 requests per second for 10 seconds, 5000 in total), and those 10 were probably still in the buffer:
"counter_decision_logs_dropped_buffer_size_limit_exceeded": 4990

Now updating the configuration to use the immediate trigger mode, keeping the same small buffer size and max delay:

decision_logs:
  service: logeater
  reporting:
    buffer_type: event
    trigger: immediate
    buffer_size_limit_events: 10
    max_delay_seconds: 20

Attacking it with the same vegeta configuration... There isn't a single dropped log! 🥳 Of course there are trade-offs, but this could make a huge difference if logs are vital and the user has the resources to keep up with the uploads.

In this example the log eater service is a simple Go server that eats the logs.
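
For anyone who wants to reproduce this, a log eater can be very small. The sketch below is not the server used for the numbers above; it assumes uploads arrive as a gzip-compressed JSON array of events, and eatLogs, postLogs, and eaten are names made up for this example. It drives the handler in-process so it runs without a listening port.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// eaten counts every decision log event received so far.
var eaten atomic.Int64

// eatLogs decodes one upload, counts its events, and discards them.
// Assumption: the body is a gzip-compressed JSON array of events.
func eatLogs(w http.ResponseWriter, r *http.Request) {
	zr, err := gzip.NewReader(r.Body)
	if err != nil {
		http.Error(w, "expected a gzip body", http.StatusBadRequest)
		return
	}
	defer zr.Close()

	var events []json.RawMessage
	if err := json.NewDecoder(zr).Decode(&events); err != nil {
		http.Error(w, "expected a JSON array", http.StatusBadRequest)
		return
	}
	eaten.Add(int64(len(events)))
	w.WriteHeader(http.StatusOK)
}

// postLogs gzips payload, feeds it to eatLogs in-process, and returns the
// response status code.
func postLogs(payload string) int {
	var body bytes.Buffer
	zw := gzip.NewWriter(&body)
	zw.Write([]byte(payload))
	zw.Close()

	rec := httptest.NewRecorder()
	eatLogs(rec, httptest.NewRequest(http.MethodPost, "/logs", &body))
	return rec.Code
}

func main() {
	status := postLogs(`[{"decision_id":"a"},{"decision_id":"b"}]`)
	fmt.Println(status, eaten.Load())
}
```

To run it as a real service, register eatLogs with http.HandleFunc("/logs", eatLogs) and call http.ListenAndServe on whatever address the logeater service URL in the OPA config points at.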

Notes to assist PR review:

Updated the upload loop function to support immediate uploads. Also added a new loop function for when the manual trigger mode is set or no service is configured. This separates the logic, making it easier to read, and moves the decision making to reconfiguration time rather than while the loop is running. That is why the loopType logic is there.
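
As a rough illustration of that idea (the names and values here are hypothetical and do not mirror the actual loopType code), the choice of loop can be a pure function of the configuration, evaluated once at reconfiguration:

```go
package main

import "fmt"

// loopType names which loop function should run. Hypothetical values.
type loopType string

const (
	loopUpload loopType = "upload" // periodic or immediate uploads
	loopIdle   loopType = "idle"   // manual trigger, or no service configured
)

// chooseLoop decides once, at (re)configuration time, which loop to start,
// so the running loop does not have to re-check the config on every pass.
func chooseLoop(trigger string, serviceConfigured bool) loopType {
	if !serviceConfigured || trigger == "manual" {
		return loopIdle
	}
	return loopUpload
}

func main() {
	fmt.Println(chooseLoop("immediate", true))
	fmt.Println(chooseLoop("manual", true))
	fmt.Println(chooseLoop("periodic", false))
}
```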

@netlify

netlify bot commented Apr 14, 2025

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit ea8b0af
🔍 Latest deploy log https://app.netlify.com/projects/openpolicyagent/deploys/68e9298a88cf090008b4cf06
😎 Deploy Preview https://deploy-preview-7516--openpolicyagent.netlify.app

@sspaink sspaink force-pushed the decisionplugin_uploadtrigger branch 4 times, most recently from ec256d9 to c0bfd4c Compare April 21, 2025 15:46
@sspaink sspaink marked this pull request as ready for review April 21, 2025 15:58
@sspaink sspaink changed the title feat: new "immediate" trigger mode for decision log plugin plugin/decision: new "immediate" trigger mode Apr 30, 2025
@sspaink sspaink added the monitoring Issues related to decision log and status plugins label Apr 30, 2025
@sspaink sspaink marked this pull request as draft May 20, 2025 16:01
@sspaink
Member Author

sspaink commented May 23, 2025

Keeping this in draft until #7521 is merged. There is some overlap, so I'd rather fix any merge conflicts before this is reviewed.

@stale

stale bot commented Jun 22, 2025

This pull request has been automatically marked as stale because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Jun 22, 2025
@sspaink sspaink changed the title plugin/decision: new "immediate" trigger mode plugin/decision: upload events as soon as buffer limit is reached Sep 18, 2025
@sspaink sspaink force-pushed the decisionplugin_uploadtrigger branch from 813f807 to 8b2b5e7 Compare September 18, 2025 23:36
@stale stale bot removed the inactive label Sep 18, 2025
@sspaink sspaink force-pushed the decisionplugin_uploadtrigger branch 3 times, most recently from 8a818be to 05f9d29 Compare September 19, 2025 01:05
Signed-off-by: Sebastian Spaink <[email protected]>
@sspaink sspaink force-pushed the decisionplugin_uploadtrigger branch from 05f9d29 to fbeb948 Compare September 19, 2025 01:37
@sspaink sspaink marked this pull request as ready for review September 19, 2025 02:30
@sspaink
Member Author

sspaink commented Sep 19, 2025

@johanfylling finally ready for review 🥳

sspaink and others added 2 commits September 18, 2025 22:04
Contributor

@johanfylling johanfylling left a comment

Thanks!
A couple of thoughts.

if dropped > 0 {
	b.incrMetric(logBufferEventDropCounterName)
	b.incrMetric(logBufferSizeLimitExDropCounterName)
	b.logger.Error("Dropped %v chunks from buffer. Reduce reporting interval or increase buffer size.", dropped)
Contributor

If we're in immediate mode, and we still have a margin until the upload limit, I'd not expect us to be dropping events, but instead upload the current buffer and start a new buffer with the event that didn't fit. Is that happening, but just not reflected here?
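
To spell out the behavior being asked about, here is a minimal sketch of a size-limited buffer that flushes instead of dropping. It is only an illustration of the suggestion, not what the PR currently does; flushingBuffer and its fields are hypothetical.

```go
package main

import "fmt"

// flushingBuffer is a hypothetical size-limited buffer that hands its
// contents to flush instead of dropping events when the limit is reached.
type flushingBuffer struct {
	limitBytes int
	usedBytes  int
	events     [][]byte
	flush      func(events [][]byte)
}

// Push adds event, flushing the current contents first if the event would
// not fit. It reports false only if the event alone exceeds the limit.
func (b *flushingBuffer) Push(event []byte) bool {
	if len(event) > b.limitBytes {
		return false
	}
	if b.usedBytes+len(event) > b.limitBytes {
		b.flush(b.events)
		// Start a new buffer; the event that didn't fit becomes its first entry.
		b.events, b.usedBytes = nil, 0
	}
	b.events = append(b.events, event)
	b.usedBytes += len(event)
	return true
}

func main() {
	uploads := 0
	b := &flushingBuffer{limitBytes: 10, flush: func(e [][]byte) {
		uploads++
		fmt.Printf("uploading %d events\n", len(e))
	}}
	for _, e := range []string{"aaaa", "bbbb", "cccc"} {
		b.Push([]byte(e))
	}
	fmt.Println("uploads:", uploads, "still buffered:", len(b.events))
}
```

With a 10-byte limit, the third 4-byte event does not fit, so the first two are uploaded and the third starts the next buffer; nothing is dropped unless a single event is larger than the limit.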


if full && *p.config.Reporting.Trigger == plugins.TriggerImmediate {
	select {
	case p.triggerUpload <- struct{}{}:
Contributor

Could we have dropped events on the above push, e.g. from the front of the size buffer at this point?


if len(receivedEvents) != 1 {
	t.Fatalf("Expected %d events, got %d", 1, len(receivedEvents))
}
Contributor

Should we also assert that the second event hasn't been dropped?

Member Author

Good idea, although at the moment there isn't a guarantee the event won't drop.

Like we discussed, it's worth looking into a way to guarantee no events are dropped, perhaps by flushing the buffer when full like you suggested. Will loop back to this 👍


util.PushFIFO(b.buffer, event, b.metrics, logBufferEventDropCounterName)

return len(b.buffer) == cap(b.buffer)
Contributor

Could we have dropped an event here when running in immediate mode?
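
A small sketch of why that can happen, assuming the push discards the oldest entry when the channel is full. pushFIFO below is a hypothetical stand-in written for this comment, not the real util.PushFIFO, whose signature and internals may differ.

```go
package main

import "fmt"

// pushFIFO is a hypothetical stand-in for a FIFO push onto a buffered
// channel: when the channel is full it discards the oldest event to make
// room, and returns how many events were discarded.
func pushFIFO(buf chan string, event string) (dropped int) {
	for {
		select {
		case buf <- event:
			return dropped
		default:
		}
		// The channel is full: discard the oldest event and try again.
		select {
		case <-buf:
			dropped++
		default:
		}
	}
}

func main() {
	buf := make(chan string, 2)
	for _, e := range []string{"e1", "e2", "e3"} {
		dropped := pushFIFO(buf, e)
		fmt.Printf("pushed %s dropped=%d full=%v\n", e, dropped, len(buf) == cap(buf))
	}
}
```

Under that assumption, the buffer first reports full after e2. If e3 arrives before the triggered upload has drained the channel, e1 is discarded even though the check after the push still returns true.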

Signed-off-by: Sebastian Spaink <[email protected]>

Labels

monitoring Issues related to decision log and status plugins

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decision log plugin: trigger upload as soon as the buffer limit is reached

2 participants